I can HaaS…Hadoop as a Service Setup

There seems to be a lot of focus today around XaaS (anything as a service). There is also lots of talk about big data, typically Hadoop, and being able to automate the Hadoop lifecycle. We’ve had a number of customers ask us about deploying Hadoop clusters using the XaaS model instead of deploying each component manually. Here’s an introduction video series to Hadoop if you’re curious. As I started to build this out in the Varrow Innovation labs I found, or rather couldn’t find, an end-to-end tutorial for setting up Hadoop as a Service. Here’s everything you need to get started.

Big Data Extensions (BDE) is used to deploy big data clusters in a VMware environment. BDE runs on top of an open source project started by VMware called Project Serengeti. Project Serengeti consists of:

  • Serengeti management server — provides services/framework for big data clusters on vSphere. Provides management of resources, provisioning, monitoring, etc.
  • Serengeti CLI — provides the tools to manage and monitor big data clusters within vSphere. If you’re only using the open source version of Project Serengeti, the CLI is the only option you have
  • BDE — the enterprise version of Project Serengeti. Provides a UI to manage your big data clusters as well as enterprise support

Multiple distributions of Hadoop are supported by BDE; check out what distros are supported here. In this walk-through we’ll be working with the Hortonworks Data Platform (HDP), specifically HDP 2.1

Install VMware Big Data Extensions 2.1

  • Deploy the OVA file via the vSphere Web Client — you don’t HAVE to use the web client
  • Accept the EULA
  • Select a name and location for the appliance
  • Select your storage and network
  • On the Customize template screen ensure the Initialize Resources checkbox is checked
  • Still on the Customize template screen, enter in the vCenter SSO URL. The description of the field tells you how to format the URL. The port is different for vSphere 5.x (port 7444) and vSphere 6.0 (port 443)
  • Set up your DNS and IP settings


  • The next screen tells you that the appliance is going to bind to the vCenter Extension vService. This means that the BDE appliance will have unfettered access to the vCenter APIs.
  • Select the Power on after deployment checkbox and click Finish

Configure VMware Big Data Extensions 2.1

  • Once you have the vApp deployed and powered on, open up the console for the management-server
  • Here you will notice a temporary password and instructions on how to change it


  • Log in as serengeti and run sudo /opt/serengeti/sbin/set-password -u to change the serengeti user password
  • Close the console and SSH into the management-server with the serengeti user
  • We’ll need to configure the YUM repository. Do so by running the following commands:
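The command listing itself didn’t survive in this post. To sketch what a YUM repo configuration typically looks like, here is a hypothetical .repo file — the repo name, baseurl, and server IP below are placeholders, so substitute the values for your environment:

```shell
# Hypothetical repo definition — name, baseurl, and path are placeholders
sudo tee /etc/yum.repos.d/local.repo <<'EOF'
[local-base]
name=Local mirror
baseurl=http://<yum-server-ip>/centos/6/base
enabled=1
gpgcheck=0
EOF
# Confirm yum picks up the new repo
yum repolist
```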

Install the BDE 2.1 Plugin

  • From a browser, navigate to https://<management-server-ip>:8443/register-plugin
  • Choose Install, enter the vCenter Server address and credentials, and click Submit
  • Log out and back in to the vSphere Web Client. You should now see Big Data Extensions in the tree on the left as well as in the Home tab on the right


Connect the Serengeti Management Server

  • Within the vSphere Web Client click on Big Data Extensions
  • In the Summary tab click on the Connect Server… hyperlink
  • Browse through the inventory tree and find the management-server


  • Click OK > click OK to accept the certificate
  • You will now see the server connected in the Summary tab

NOTE: If you get an SSO error, either you didn’t check the Initialize Resources checkbox during the OVA deployment or the SSO URL is wrong. To rectify this, run sudo /opt/serengeti/sbin/EnableSSOAuth from the command line of the management-server

Adding a Hadoop Distribution to BDE

As I mentioned earlier, we’ll be adding HDP 2.1 to use for our big data clusters. Reference the link above to see what other distros are supported. You should be able to add other supported distros in this same fashion. The steps below come from VMware KB 2091054. I’ve included the steps here for completeness. If you really LOVE VMware knowledge base article formatting feel free to follow the linked KB.

  • To get started you need a box that can act as a YUM repo server. All you need is a CentOS box (or similar flavor): install httpd, ensure it starts at boot (chkconfig httpd on), and install yum-utils and createrepo (yum install -y yum-utils createrepo).
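The repo-server prep above boils down to a few commands (assuming CentOS 6 and root access):

```shell
# Install Apache plus the repo-building tools
yum install -y httpd yum-utils createrepo
# Start Apache now and make sure it comes up at boot
service httpd start
chkconfig httpd on
```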

Now that you have a new YUM repo server set up, you’ll need to configure it to host the HDP 2.1 distribution

  • SSH into the YUM repo server as root
  • Type the following commands to configure HDP 2.1:
  • Make sure you can browse to the new repo using the following URL: http://myyumserverIP/hdp/2
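The actual commands didn’t survive in this post. Per VMware KB 2091054, the gist is to download the HDP 2.1 RPM tarball from Hortonworks and extract it under Apache’s document root. A hedged sketch — the tarball URL and version number below are illustrative, so check the KB and the Hortonworks repo for the current ones:

```shell
# Assumes the Apache docroot is /var/www/html; URL/version are examples only
mkdir -p /var/www/html/hdp
cd /var/www/html/hdp
# Pull down and unpack the HDP 2.1 RPM tarball
wget http://public-repo-1.hortonworks.com/HDP/centos6/HDP-2.1.5.0-centos6-rpm.tar.gz
tar -xzf HDP-2.1.5.0-centos6-rpm.tar.gz
```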

Now you’ll need to configure the new distribution on the Serengeti server

  • SSH into the Serengeti management server with the serengeti user account
  • Run the following commands
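The command listing was lost here; per VMware KB 2091054 the distribution gets registered with the config-distro.rb script, roughly as follows. The --repos URL and --version value are placeholders for your environment — only --name myhdp is taken from this post:

```shell
# Hedged sketch — flags and URLs are illustrative; see VMware KB 2091054 for exact syntax
cd /opt/serengeti/sbin
./config-distro.rb --name myhdp --vendor HDP --version 2.1 \
  --repos http://<yum-server-ip>/hdp/2/hdp.repo
# Restart the Serengeti service so the new distribution is picked up
sudo service tomcat restart
```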
If you’ve done everything correctly (and I haven’t missed something in the steps above), you will be able to log on to the vCenter Web Client and view the new HDP 2.1 distribution you just added.

  • Log into the vCenter Web Client > click on Big Data Extension from the tree on the left
  • Click on Big Data Clusters > click the icon to add a new cluster (servers with a green +)
  • In the Hadoop Distribution you should see the name of the new HDP 2.1 distro. Keep in mind the name will match the parameter specified when you ran the ./config-distro.rb command above, which was myhdp. In the screenshot below you’ll see I changed it to HDP_21


Alright, congratulations! You can now deploy new HDP 2.1 Hadoop clusters from the vCenter Web Client. This works great: in the wizard you can specify the type of Hadoop cluster you want, the size of the name/data nodes, and the distribution to use.

If all you were looking for was to get Big Data Extensions up and running and configure a new distribution then you’re [finally] done! If you’re looking to wrap this into vRealize Automation then you’ll need to continue reading, but not too much more.

Bringing Big Data Extensions into vRealize Automation

In order to bring Big Data Extensions into vRealize Automation you’ll need to import the vRealize Orchestrator plugin for Hadoop, add a custom resource, expose the workflows to vRealize Automation as an advanced service, and then add resource actions to the advanced service. Let’s start with importing the plugin. You can download the plugin from the Solution Exchange here.

  • Log in to the vRealize Orchestrator configuration page (https://<vro-name-or-ip>:8283)
  • Once logged in click on the Plug-ins tab on the left
  • On the right, scroll to the bottom and click the magnifying glass > browse to the plugin file and select it
  • Click the Upload and Install button > accept the license agreement
  • On the left click Startup Options > restart the vCO server
  • Restart the vCO configuration server
  • Log into the vCO client > under Run select Workflows
  • Under Library you should now see Hadoop Cluster as a Service

  • Expand Hadoop Cluster as a Service > expand Configuration > right-click Configure The Serengeti Host and click Start Workflow…
  • Type in the URL for the Serengeti management server in the following format: https://<mgmt-server-ip>:8443/serengeti
  • Enter in the username for an administrative user on vCenter in UPN format (e.g. user@domain.com)
  • Enter in the password for the administrative user
  • Click Submit

  • You’ll get a question about importing the certificate; on the last page of the form select Install certificate… from the dropdown

The Hadoop plugin should now be installed and ready to go. Log into your vRealize Automation instance to configure the advanced service. In order to create the advanced service we’ll need to add a custom resource so that we can later apply resource actions to our Hadoop clusters.

  • Log into your vRealize Automation instance as a service architect
  • Navigate to Advanced Services > from the left click Custom Resources
  • Click the Add button > start typing BigData and you should get a drop down, select BigDataExtension:Cluster
  • Type in the name for this resource, such as Hadoop Cluster

  • Click Next > click Add
  • Navigate to Advanced Services > from the left click Service Blueprints
  • Click the Add button > expand the Orchestrator folder > expand Library > select Create Basic Hadoop Cluster
  • Click Next

  • Change the name if you’d like and add a description > click Next 
  • Nothing needs to be changed on this next page, click Next
  • In the Provisioned Resource tab choose cluster [Hadoop Cluster]
  • Click Add
  • Select the newly created service blueprint > click Publish

Alright everyone, we’re ALMOST done. The last steps we need to do are add custom actions, add the new actions to an entitlement and add the blueprint to the service catalog.

Adding Resource Actions

  • While still logged into vRealize Automation under Advanced Services click Resource Actions
  • Click the Add button > expand Orchestrator > expand Library > expand Hadoop Cluster as a Service > expand Cluster Operation Service
  • Select Delete Cluster > click Next
  • On the Input Resource page click Next
  • In the Details page check the Disposal checkbox > click Next
  • Click Add


  • Select the newly created resource action and click Publish

Repeat the above steps for the remaining custom actions. DO NOT choose Disposal like you did above for the actions below:

  • Resize Cluster
  • Start Cluster
  • Stop Cluster
  • Update Cluster

The last items are to add the new service to the service catalog and add the new resource actions to an entitlement.

Add Service to Catalog

  • Log into vRealize Automation and navigate to Administration
  • From the left click Catalog Management > click Catalog Items
  • Select the name you chose for your Hadoop Cluster creation advanced service blueprint, in my case it is Create a Hadoop Cluster > click Configure
  • Select an existing service from the dropdown at the bottom and change the icon if you’d like
  • Click Update


Add Resource Actions to Entitlement

  • Log into vRealize Automation and navigate to Administration
  • Click Catalog Management > select Entitlements
  • I only have one entitlement, but if you have more than one, choose the one that makes sense to you
  • Choose an entitlement > click Edit > click the Items & Approvals tab
  • Click the green + button next to Entitled Actions
  • Select the new resource actions you added earlier > click OK
  • Click Update


That should be it! Now that you’ve spent four hours of your time going through this article, you have an end product: a new item in your catalog for a Hadoop cluster. Once you deploy a new Hadoop cluster from the catalog it will show up in your Items tab.

New Hadoop as a Service Catalog Item

Actions Available to Provisioned Hadoop Cluster


In the future I hope to do some more work around some of the different Application Managers and how they integrate into Big Data Extensions.

Comments

  1. This is a great post. I am currently setting this up in my environment. It’s always great to see others share their experiences and detailed documentation. Thanks again!!!

  2. Great article. On my side, with Hortonworks 2.1.1, I had an error at bootstrap (yum install hadoop -y): “No packages available”, even though my repo seemed correct …

    How do you add a distribution choice in vRA (via the vCO workflow)?

  3. Great article; it would be great if the same article applied to BDE 2.3.x with the vRA/vRO 7 versions..

    1. Thanks JC, that would be a great help for me, as I already have both BDE 2.3.1 and vRA/vRO 7 infrastructure and am searching for good articles/docs on the integration

