This blog post is part of a series on how to set up and grow a Hadoop version 2 ecosystem for your Big data analyses. The methods outlined can be implemented on a single computer or a cluster of computers. Similarly, the methods can be implemented for business applications as well as non-business applications. The main non-business applications include academic and personal research.
The main technology fields include computing
(software and hardware), programming (statistical and non-statistical), cloud computing,
applications development, (Big) data science, mathematical statistics and (pure)
mathematics. The tools and applications that will be considered can be applied
in e-commerce, environmental science, library science, linguistics and many
other fields.
The series has two aims. The first is to provide a
simple overview of the field of DevOps. The second is to give an account of
what can be achieved with a very minimal set of DevOps tools. This
distinction will naturally be reflected in the hardware and software settings. A professional DevOps specialist would be deploying
a much (much) more advanced configuration. This is especially so for a data science DevOps
specialist from a company/organisation like Apache, Google, Canonical,
DataStax, Yahoo!, Amazon, Cloudera, MapR, Hortonworks, IBM, Microsoft Azure, SAS, Microsoft Revolution
Analytics and others.
The idea is to think of each communication between
two machines over the internet as a private cloud. The private cloud can then
be generalized to more complex scenarios based on the purpose of establishing
the cloud. The cloud can be private or
public according to your preference and purpose.
The post describes the steps involved in setting up
an OpenStack cloud from the perspective of a single Windows host machine. The OpenStack cloud will have the characteristics of a Hadoop version 2 ecosystem, but without Hadoop version 2 itself installed at this stage.
1. Oracle VM VirtualBox
The Oracle VM VirtualBox (OVB) User Manual gives the
following key reasons for virtualization:
- Run multiple operating systems simultaneously
- Easier software installations
- Testing and disaster recovery
- Infrastructure consolidation
Below are two screenshots of the OVB Virtual Manager and Virtual Machines.
2. Oracle VM VirtualBox download
The OVB can be downloaded from the downloads section of the OVB website. The site also offers downloadable and online user manuals, tutorials, update information and the OVB Extension Pack. Download the Extension Pack together with the OVB, because it unlocks additional features such as USB 2.0 device support. Note that shared folders depend on the Guest Additions, which are installed inside the VM, rather than on the Extension Pack. It is also a very good idea to download a copy of the user manual and to set up the OVB's features fully.
3. Ubuntu Linux Server and OpenStack Autopilot download
The next step after downloading the OVB
components is to download the operating system and cloud software for the Virtual
Machine (VM). The download for Ubuntu Server 14.04.3 LTS can be found in the Server
downloads section of Canonical’s Ubuntu website.
The download for Ubuntu OpenStack Autopilot can be found at Canonical’s Ubuntu website at the Cloud downloads section.
These software bundles suit a DevOps project because of the range of cloud communications they support, most importantly with the Apache Hadoop project and other Hadoop ecosystem projects. Notable ones include Apache Ambari, Apache Avro, Apache Chukwa, Apache Zookeeper, Apache Hive, Apache Maven, Apache HBase, Apache Pig, Apache Cassandra (and DataStax Apache Cassandra), Apache Mahout, Apache Spark, Apache Storm and Apache Tez. It is also very important for our purposes to keep in mind the excellent projects under the Google Cloud Platform, such as Hadoop on Google Cloud Platform, Google BigQuery and other related projects.
Canonical Ubuntu also has two other exciting options that can be used at this stage. The first is to test drive installing OpenStack Autopilot on vSphere. The second is to manually install Ubuntu OpenStack using MAAS and Juju.
Both the test drive of OpenStack Autopilot on vSphere and the instructions for installing OpenStack using MAAS and Juju can be found in the Cloud downloads section.
4. Virtual Machine Setup
The installation files can be saved in a Windows folder such as C:\Downloads or in ~/Downloads on Ubuntu Linux. The user manual gives a more detailed account of the steps in setting up the OVB and the VM.
Install the OVB by double-clicking
the downloaded file. This starts the installation wizard, as shown below. Choose the
options in each wizard step according to your preferences.
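Before installing anything from the downloads folder, it can be worth confirming that a large file such as the server image arrived intact. The following is a minimal Python sketch, assuming the publisher provides a SHA-256 checksum for the file (Ubuntu publishes SHA256SUMS files alongside its images). The function names are illustrative and the file name in the usage example is hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals the end of the file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare the file's digest with the checksum copied from the download site."""
    return sha256_of_file(path) == published_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical file name; replace it with the file you actually downloaded
    print(sha256_of_file("ubuntu-server.iso"))
```

If the printed digest differs from the published one, download the file again before going any further.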
After the installation completes successfully, the Virtual Manager Window will appear. Click on New to create your VM. Give your machine a name. Choose your type to be Linux. Choose your version to be Ubuntu (64-bit).
Specify the file location and size of the disk. My recommendation is 25 GB. You can start with 8 GB, but a smaller disk means more management later, such as adding new hard drives and partitioning them. This only complicates the management of your Hadoop cluster ecosystem.
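The same choices (name, Linux type, Ubuntu 64-bit version and a 25 GB disk) can also be expressed with VirtualBox's VBoxManage command-line tool. The Python sketch below only builds the commands and prints them; it does not run anything. The VM name, the disk file name and the 2048 MB memory value are assumptions for illustration, not settings from this post. Newer VirtualBox releases name the disk command createmedium and keep createhd for compatibility, so check the manual for your version.

```python
def build_vm_commands(name, disk_file, disk_gb=25, memory_mb=2048):
    """Return VBoxManage invocations (as argument lists) for a new Ubuntu 64-bit VM."""
    disk_mb = disk_gb * 1024  # VBoxManage takes the disk size in megabytes
    return [
        # Create and register the VM with the Ubuntu (64-bit) OS type
        ["VBoxManage", "createvm", "--name", name,
         "--ostype", "Ubuntu_64", "--register"],
        # Assign base memory (assumed value, adjust to your host)
        ["VBoxManage", "modifyvm", name, "--memory", str(memory_mb)],
        # Create the virtual hard disk file
        ["VBoxManage", "createhd", "--filename", disk_file,
         "--size", str(disk_mb)],
        # Add a SATA controller and attach the disk to it
        ["VBoxManage", "storagectl", name, "--name", "SATA", "--add", "sata"],
        ["VBoxManage", "storageattach", name, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd",
         "--medium", disk_file],
    ]


if __name__ == "__main__":
    # Hypothetical VM and disk names
    for command in build_vm_commands("hadoop-devops", "hadoop-devops.vdi"):
        print(" ".join(command))
```

Printing the commands first makes it easy to review them against the user manual before running them by hand.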
The important point for DevOps is that you need enough local disk space for: your Hadoop cluster; management of the cluster; interactions with the cluster; development of cluster applications; data sets for your applications; system log files; Ubuntu 14.04 server updates and features; Ubuntu OpenStack cloud management; Juju OpenStack cloud management; management of your cloud server access; Oracle VM VirtualBox features; and so on.
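Since all of these items compete for the same disk, it helps to confirm that the host drive can actually hold the virtual disk you plan to create. Below is a minimal Python sketch, assuming a 25 GB virtual disk plus a 5 GB safety margin; the margin is an arbitrary illustrative figure, not a recommendation from this post.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def has_room_for_disk(folder, disk_gb, reserve_gb=5):
    """Return True if the drive holding `folder` has room for the disk plus a reserve."""
    free_bytes = shutil.disk_usage(folder).free
    return free_bytes >= (disk_gb + reserve_gb) * GIB


if __name__ == "__main__":
    # "." stands for the folder where the VM files will live
    if has_room_for_disk(".", 25):
        print("Enough free space for a 25 GB virtual disk.")
    else:
        print("Not enough free space; free some up or choose another drive.")
```

A dynamically allocated virtual disk starts small and grows towards its maximum size, so the check above is deliberately conservative.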
The Virtual Manager is where you manage the VMs and their settings and preferences. This is also where you can create new hard drives, manage network connections, preferences and settings. It is most important to read the OVB manual to set up all the features of your VMs and in our case our Ubuntu Server (or private cloud).
5. Installing Ubuntu 14.04.3 LTS Server and Ubuntu OpenStack Autopilot on the Virtual Machine
The installation instructions for Ubuntu
Server 14.04.3 LTS can be found on Canonical’s Ubuntu website under the Download>Server Installation guides section.
The installation instructions for OpenStack can be found on Canonical’s Ubuntu website under the Download>Cloud Installation guides section. The installation required in the VM combines the two sets of instructions: those for installing the server and those for installing the software for an OpenStack cloud.
6. Setting up Ubuntu OpenStack cloud
Setting up the OpenStack cloud starts with the getting started guides on Ubuntu OpenStack, found in the Cloud>OpenStack section of the Canonical Ubuntu website.
The next step is to go to the Canonical Ubuntu Cloud>Juju section for the getting started guides on Juju.
The next step is to go to the Canonical Ubuntu Cloud> MAAS section for the getting started guides on MAAS.
The next step is to go to the Apache Hadoop website for the downloads and installations for the Hadoop version 2 Ecosystem.
It will also be very important to go and visit the Google Cloud Big Data projects, especially the Hadoop on Google Cloud Platform, Google BigQuery project and its related projects.
A good place for online courses on Apache Cassandra is the DataStax Academy.
I hope this post was helpful to you. The next post in the series will outline the procedures for setting up OpenStack, Juju, Hadoop version 2.6.0 and the Hadoop version 2 Ecosystem.