StatsCosmos: Setting up an Ubuntu OpenStack cloud using Oracle VM VirtualBox, Ubuntu Linux Server 14.04.3 LTS and Ubuntu OpenStack Autopilot

This blog post is part of a series on how to set up and grow a Hadoop version 2 ecosystem for your Big Data analyses. The methods outlined can be implemented on a single computer or on a cluster of computers, and for business as well as non-business applications. The main non-business applications are academic and personal research.

The main technology fields include computing (software and hardware), programming (statistical and non-statistical), cloud computing, applications development, (Big) data science, mathematical statistics and (pure) mathematics. The tools and applications that will be considered can be applied in e-commerce, environmental science, library science, linguistics and many other fields.

The series has two aims. The first is to provide a simple overview of the field of DevOps. The second is to give an account of what can be achieved with a very minimal set of DevOps tools. This distinction will naturally be reflected in the hardware and software settings. A professional DevOps specialist would deploy a much (much) more advanced configuration. This is especially so for a data science DevOps specialist at a company or organisation such as Apache, Google, Canonical, DataStax, Yahoo!, Amazon, Cloudera, MapR, Hortonworks, IBM, Microsoft Azure, SAS, Microsoft Revolution Analytics and others.

The idea is to think of each communication between two machines over the internet as a private cloud. The private cloud can then be generalized to more complex scenarios based on the purpose of establishing the cloud. The cloud can be private or public according to your preference and purpose.

The post describes the steps involved in setting up an OpenStack cloud on a single Windows host machine. The OpenStack cloud will have the characteristics of a Hadoop version 2 ecosystem, but Hadoop version 2 itself is not installed at this stage.

1. Oracle VM VirtualBox

The Oracle VM VirtualBox (OVB) User Manual gives the following key reasons for virtualization:

Run multiple operating systems simultaneously
Easier software installations
Testing and disaster recovery
Infrastructure consolidation

Below are two screenshots of the VirtualBox Manager and the virtual machines.

2. Oracle VM VirtualBox download

The OVB can be downloaded from the downloads section of the OVB website. The site also hosts downloadable and online user manuals, tutorials and other materials, as well as update information and the OVB Extension Pack. The Extension Pack should be downloaded together with the OVB (in the same version) to unlock additional features such as USB 2.0 device support and VirtualBox Remote Desktop (VRDP); shared folders, by contrast, are provided by the Guest Additions installed inside each VM. It is also a very good idea to download a copy of the user manual and use it to set up the OVB's features fully.
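Once downloaded, the Extension Pack can be installed from the VirtualBox Manager or from the command line with VBoxManage. The sketch below only builds the command rather than running it. It assumes VBoxManage is on the PATH (on a Windows host it sits in the VirtualBox installation folder, by default C:\Program Files\Oracle\VirtualBox), and the file name used in the example is hypothetical.

```python
from pathlib import Path
from typing import List


def extpack_install_command(extpack_file: str) -> List[str]:
    """Build (but do not run) the VBoxManage command that installs an Extension Pack.

    Assumes VBoxManage is reachable on the PATH of the host machine.
    """
    path = Path(extpack_file)
    # Extension Packs are distributed as .vbox-extpack archives
    if path.suffix != ".vbox-extpack":
        raise ValueError(f"expected a .vbox-extpack file, got {path.name!r}")
    # --replace overwrites an Extension Pack that is already installed (e.g. an older one)
    return ["VBoxManage", "extpack", "install", "--replace", str(path)]


if __name__ == "__main__":
    # Hypothetical file name; use the name of the file you actually downloaded
    cmd = extpack_install_command("Oracle_VM_VirtualBox_Extension_Pack.vbox-extpack")
    print(" ".join(cmd))
```

The printed command can be run in a terminal, or the list can be passed to subprocess.run. Installing an Extension Pack usually requires administrator (or root) rights.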

3. Ubuntu Linux Server and OpenStack Autopilot download

The next step after downloading the OVB components is to download the operating system and cloud software for the Virtual Machine (VM). The download for Ubuntu Server 14.04.3 LTS can be found in the Server downloads section of Canonical's Ubuntu website.

The download for Ubuntu OpenStack Autopilot can be found in the Cloud downloads section of Canonical's Ubuntu website.

The software bundles are well suited to a DevOps project because of the various cloud communications they support, the most important being with the Apache Hadoop project and the other Hadoop ecosystem projects. Notable ones include Apache Ambari, Apache Avro, Apache Chukwa, Apache ZooKeeper, Apache Hive, Apache Maven, Apache HBase, Apache Pig, Apache Cassandra (and DataStax's distribution of Apache Cassandra), Apache Mahout, Apache Spark, Apache Storm and Apache Tez. For our purposes it is also very important to keep in mind the excellent projects on the Google Cloud Platform, such as Hadoop on Google Cloud Platform, Google BigQuery and other related projects.

Canonical Ubuntu also has two other exciting options that can be used at this stage. The first is to test drive installing OpenStack Autopilot on vSphere. The second is to manually install Ubuntu OpenStack using MAAS and Juju.

Both options, the test drive of OpenStack Autopilot on vSphere and the manual installation of OpenStack using MAAS and Juju, can be found in the Cloud downloads section.

4. Virtual Machine Setup

The installation files can be saved in a Windows folder such as C:\Downloads\ (or ~/Downloads on an Ubuntu Linux host). The OVB user manual gives a more detailed account of the steps involved in setting up the OVB and the VM.

Install the OVB by double-clicking the downloaded file. This starts the setup wizard, as shown below. Choose the various options according to your preferences.

After the installation completes successfully, the VirtualBox Manager window will appear. Click New to create your VM. Give your machine a name, choose Linux as the type and Ubuntu (64-bit) as the version.

Set the memory size to 4 GB (4096 MB) and click Next.

In the next step, select Create a virtual hard disk now.

Select VHD (Virtual Hard Disk).

Select Dynamically allocated.

Specify the file location and size of the disk. My recommendation is 25 GB. One can start off with 8 GB, but a smaller disk means more management later, such as adding new hard drives and partitioning drives. This only adds complications when managing your Hadoop cluster ecosystem.

The important point to consider is that for DevOps you need enough local space for: your Hadoop cluster; management of the cluster; managing interactions with the cluster; developing cluster applications; data sets for your applications; system log files; Ubuntu 14.04 server updates and features; Ubuntu OpenStack cloud management; Juju OpenStack cloud management; managing access to your cloud servers; Oracle VM VirtualBox features; and so on.
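The GUI steps for creating the VM (Ubuntu 64-bit type, 4 GB of memory, a dynamically allocated VHD disk of 25 GB) can also be scripted with VBoxManage. The following is a minimal sketch that only builds the command lines; the VM name, disk file name and the "SATA" controller name are hypothetical, and it assumes VirtualBox 5.x, where the disk command is createmedium (older releases use createhd). In VBoxManage the "Standard" disk variant corresponds to "Dynamically allocated", and disk sizes are given in megabytes.

```python
from typing import List


def vm_setup_commands(name: str, disk_path: str,
                      memory_mb: int = 4096, disk_gb: int = 25) -> List[List[str]]:
    """Build (but do not run) VBoxManage commands mirroring the GUI VM setup steps."""
    disk_mb = disk_gb * 1024  # VBoxManage expects the disk size in megabytes
    return [
        # New VM with type Linux, version Ubuntu (64-bit), registered with VirtualBox
        ["VBoxManage", "createvm", "--name", name, "--ostype", "Ubuntu_64", "--register"],
        # Memory size (4096 MB = 4 GB by default)
        ["VBoxManage", "modifyvm", name, "--memory", str(memory_mb)],
        # VHD disk, dynamically allocated ("Standard" variant)
        ["VBoxManage", "createmedium", "disk", "--filename", disk_path,
         "--size", str(disk_mb), "--format", "VHD", "--variant", "Standard"],
        # A SATA controller to attach the disk to (controller name is hypothetical)
        ["VBoxManage", "storagectl", name, "--name", "SATA", "--add", "sata"],
        # Attach the new disk to port 0 of that controller
        ["VBoxManage", "storageattach", name, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", disk_path],
    ]


if __name__ == "__main__":
    # Hypothetical VM and disk names
    for cmd in vm_setup_commands("ubuntu-openstack", "ubuntu-openstack.vhd"):
        print(" ".join(cmd))
```

Each printed line can be run in order in a terminal, or each list can be passed to subprocess.run. Using an absolute path for the disk file makes it clearer where the VHD ends up on the host.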

The VirtualBox Manager is where you manage the VMs and their settings and preferences. This is also where you can create new hard drives and manage network connections. It is very important to read the OVB manual in order to set up all the features of your VMs, in our case the Ubuntu Server VM (our private cloud).

5. Installing Ubuntu 14.04.3 LTS Server and Ubuntu OpenStack Autopilot on the Virtual Machine

The installation instructions for Ubuntu Server 14.04.3 LTS can be found on Canonical's Ubuntu website under the Download > Server installation guides section.

The installation instructions for OpenStack can be found on Canonical's Ubuntu website under the Download > Cloud installation guides section. Installing the required software in the VM means combining the instructions for installing the server with those for installing the OpenStack cloud software.
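To start the Ubuntu Server installation, the downloaded ISO image has to be inserted into the VM's virtual DVD drive and the VM booted from it. The sketch below builds the corresponding VBoxManage commands without running them. The VM name, the "IDE" controller name and the ISO file name ubuntu-14.04.3-server-amd64.iso are assumptions; use the names from your own setup.

```python
from typing import List


def install_media_commands(name: str, iso_path: str) -> List[List[str]]:
    """Build (but do not run) VBoxManage commands that boot a VM from an install ISO."""
    return [
        # An IDE controller to hold the virtual DVD drive (controller name is hypothetical)
        ["VBoxManage", "storagectl", name, "--name", "IDE", "--add", "ide"],
        # Insert the Ubuntu Server ISO into the virtual DVD drive
        ["VBoxManage", "storageattach", name, "--storagectl", "IDE",
         "--port", "0", "--device", "0", "--type", "dvddrive", "--medium", iso_path],
        # Boot from the DVD first, then from the hard disk once the system is installed
        ["VBoxManage", "modifyvm", name, "--boot1", "dvd", "--boot2", "disk"],
        # Open the VM in a window so the interactive installer can be followed
        ["VBoxManage", "startvm", name, "--type", "gui"],
    ]


if __name__ == "__main__":
    # Hypothetical VM name and assumed ISO file name
    for cmd in install_media_commands("ubuntu-openstack", "ubuntu-14.04.3-server-amd64.iso"):
        print(" ".join(cmd))
```

After the installer finishes, the ISO can be ejected from the virtual drive in the VM's storage settings so that the VM boots from its hard disk.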

6. Setting up Ubuntu OpenStack cloud

To set up the OpenStack cloud, start with the getting started guides on Ubuntu OpenStack in the Cloud > OpenStack section of Canonical's Ubuntu website.

Next, go to the Cloud > Juju section for the getting started guides on Juju.

Then go to the Cloud > MAAS section for the getting started guides on MAAS.

After that, go to the Apache Hadoop website for the downloads and installation instructions for the Hadoop version 2 ecosystem.

It is also very important to visit the Google Cloud Big Data projects, especially Hadoop on Google Cloud Platform, Google BigQuery and related projects.

A good place for online courses on Apache Cassandra is the DataStax Academy.

I hope this post was helpful to you. The next post in the series will outline the procedures for setting up OpenStack, Juju, Hadoop version 2.6.0 and the Hadoop version 2 ecosystem.

In the meantime, interested in seeing other digital and social media materials from the Stats Cosmos blog?

Check out our other blog posts and screencast series.