TripleO in Hong Kong

A few weeks ago, I spent two and a half weeks in Hong Kong, preparing for, and then participating in, OpenStack Summit.

There, at the Red Hat booth, we did a live demo of using TripleO to deploy an 8-node overcloud. I arrived a week before the demo and worked with the fine folks at Tech-21 Systems to get everything installed. (They set up all of the hardware, provided by Quanta, and graciously let me use their office while I did the software setup. They even took me out for authentic dim sum.)

The demo

The rack of gear had 11 servers. I set one up as a utility machine (essentially a router / bastion host), and then we had a control node, a leaf node, and eight resource nodes that would be provisioned as the overcloud. The images we used are available here. Basically I booted the Undercloud-Control and Undercloud-Leaf images, did some quick configuration, imported the other images into glance, registered the other systems, and provisioned the machines. (In fairness, there’s still a lot of work to be done to make things go smoothly.)
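For the curious, the flow looked roughly like the sketch below. All of the names, addresses, and sizes are placeholders, and the exact commands depend on the TripleO tooling in use; treat it as an illustration, not a recipe.

```shell
# Rough sketch of the provisioning flow described above. Image names,
# hardware specs, MACs, and IPMI credentials are all placeholders.

# Load an overcloud image into the undercloud's Glance:
glance image-create --name overcloud-compute \
    --disk-format qcow2 --container-format bare \
    --file overcloud-compute.qcow2

# Register one physical machine with the baremetal driver (repeat
# per node, with its real IPMI address, credentials, and MAC):
nova baremetal-node-create --pm_address 10.0.0.21 \
    --pm_user admin --pm_password secret \
    undercloud-leaf 8 16384 500 00:11:22:33:44:55

# Then kick off the deployment itself via Heat:
heat stack-create overcloud -f overcloud.yaml
```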

While I gave live demos, we also prepared a recorded video of the process, in case the demo gods chose to smite us. Here’s the video:

Performance and scalability

Our demo involved a single controller node and a single leaf node. The idea was to model a much larger deployment, where you’d have a central controller and then one leaf node per rack. The leaf node takes commands from the controller and handles provisioning the machines in its rack. That makes it easy to keep racks on isolated L2 domains, and also helps ensure things scale. In our setup, with just one leaf node and one controller, the leaf node wasn’t strictly necessary, but it demonstrated the intended architecture.

Provisioning 8 nodes took us 10-15 minutes. Early on, we were seeing deploy times closer to 30 minutes, until I realized that the NIC used for provisioning had come up at 100 Mbps instead of gigabit. Fixing that made a huge difference—image push throughput jumped from about 80 Mbps to about 800 Mbps. (I’m eager to see what happens with 10Gb Ethernet!)
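To put those link speeds in perspective, here's a quick back-of-the-envelope calculation. The 4 GB image size is my assumption for illustration; the throughput figures are the ones measured above.

```shell
# Rough transfer-time estimate per image. image_gb is an assumed
# size; the two throughput figures are the ones we actually saw.
image_gb=4

transfer_secs() {   # $1 = effective throughput in Mbps
    echo $(( image_gb * 8 * 1000 / $1 ))
}

echo "at  80 Mbps: $(transfer_secs 80) s"    # 400 s per image
echo "at 800 Mbps: $(transfer_secs 800) s"   # 40 s per image
```

A 10x bandwidth bump cutting each push from minutes to under a minute lines up with the deploy times dropping from ~30 minutes to 10-15.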

10-15 minutes to provision eight machines seems pretty fast overall, but there’s still plenty of room to make this much faster. A surprising amount of the time was spent just waiting for machines to POST. (Stuff like getting the boot order right is also important in keeping this under control.) One problem I found is that the image deploy process from the leaf node seems to happen serially, rather than doing all of the machines in parallel. We were more or less bottlenecked on the network speed, so parallelizing that wouldn’t necessarily have led to a massive improvement, but it’s something that should be fixed.
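The fix could be as simple as fanning the per-node pushes out with shell job control. Here's a minimal sketch, where `push_image` is a hypothetical stand-in for whatever the leaf node actually runs per machine:

```shell
# Illustrative only: deploy to all nodes concurrently instead of
# one at a time. push_image is a placeholder for the real per-node
# image copy.
push_image() {
    # placeholder for the actual image push to node "$1"
    echo "node $1 done"
}

for node in node1 node2 node3 node4 node5 node6 node7 node8; do
    push_image "$node" &    # the serial version would omit the '&'
done
wait                        # block until all eight pushes finish
```

Even with the network saturated, overlapping the pushes hides the per-node setup and POST latency.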

OpenStack Summit in general

There were something like 3,500 people in attendance. I knew OpenStack was a substantial project, but seeing the crowd—and knowing that many of them travelled literally thousands of miles to be there—really drove home just how enormous the community is, and how quickly it is growing. And there was certainly no lack of enthusiasm.

I spent most of the Summit at the booth instead of in sessions. (We’ll have to work on this next time!) Most of the sessions were recorded, though, and can be watched here. Even without attending the sessions, it was a great opportunity to meet many of the people I’ve only worked with online thus far.

Here’s a short video Red Hat put together of the Summit (in which I have a silent 3-second cameo—my big break into stardom, I think):

Fellow Red Hatter Steve Gordon captured a lot more of the Summit here.

Introducing Tuskar

I’ve been working on a new project we’re starting in the OpenStack community, Tuskar, a set of tools to help with deploying and managing large OpenStack installations.

We started with an internal, unadvertised prototype, the conclusion of which is demonstrated here:

I’m excited to note that we’re now in Stackforge, so everything is developed in the open.

Tuskar is the back-end project, with tuskar-ui (where I’m spending most of my time) being the Django interface on top of it. We also have python-tuskarclient as a client library for Tuskar.

Come join us!

HP Cloud working with Aeolus

I’m happy to report that our latest code, which adds OpenStack support to Aeolus and will ship with the next release, is working successfully with HP Cloud, expanding our repertoire of public clouds.

While this should, in principle, let us support all OpenStack-based public cloud providers, in practice the APIs various providers expose often diverge enough to prevent them from working. Rackspace, for example, has modified its authentication API enough to prevent authentication with Deltacloud today. I had similar issues trying Internap’s AgileCLOUD, which uses the hAPI interface that Voxel provided. (I understand that there’s a proper OpenStack environment in the works, though.)

But enough about what doesn’t work — HP’s cloud service does work! Getting it set up took a little figuring out, though, so I wanted to share some details.

First things first, you’ll need to activate one or more of the Availability Zones for its Compute service:

Until at least one is activated, you’ll have a tough time authenticating and it won’t be apparent why. (Or, at least, this was my experience.)

Once in, you’ll want to head over to the API Keys section to (you guessed it) get your API keys. Here’s an example of what it might look like (with randomized values):

(Just to be clear, the keys and tenant information were artificially-generated for this screenshot.)

At the bottom is the Keystone entrypoint you’ll want to put in to set up the Provider:

This much is straightforward. Adding a Provider Account is a little more of an adventure.

Despite what their documentation may say, the only way I’ve been able to authenticate through Deltacloud has been with my username and the tenant name shown — not the API keys, and not the Tenant ID.

In the example above, my Tenant Name is “example-tenant1”, with a username of “example”. So in Conductor, I’d want to enter “example+example-tenant1”, since we need to join username and tenant name that way. Password is what you use to log into the account.
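In shell terms, composing the credential looks like this, using the randomized example values from the screenshot:

```shell
# HP Cloud credentials as Deltacloud wants them. These are the
# randomized example values from the screenshot; substitute your own.
hp_username="example"
hp_tenant="example-tenant1"      # tenant *name*, not the Tenant ID

# Deltacloud expects username and tenant name joined with '+':
api_user="${hp_username}+${hp_tenant}"
echo "$api_user"                 # prints: example+example-tenant1
```

Again, note that it's the tenant name, not the Tenant ID, on the right side of the `+`.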

Here you’ll notice that I cheat — Glance URL is currently a required field in Conductor. As best as I can tell, HP Cloud does not currently expose Glance to users, so there is not actually a valid Glance URL available. I’ve opened an issue to fix this in Conductor, but for the moment I just used localhost:1234, which passes validation.

As this may imply, we don’t presently support building images for HP Cloud either, though there’s work underway to allow snapshot-style builds (in which a minimal OS is booted on the cloud, customized in place, and then snapshotted). Image imports, however, do work today.

It took me a moment to figure out how to import a reference to an HP Cloud image. If you view the Servers tab within an Availability Zone and click “Create a new server from an Image”, you’ll get a dialog like this:

The orangey-red arrows point to the image IDs — 54021 for the first one, 78265 for a CentOS 6.3 image, and so on. These integers are what you enter into Conductor to import an image:

With an image imported, the launch process is just like with other providers, and you’ll be able to download a generated keypair and ssh in.

Of course, the job isn’t finished. The ability to build and push images is important for our cross-cloud workflow, and it’s something that’s in progress. And the Glance URL process is quite broken. But, despite these headaches, it works — I’ve got an instance running there launched through Conductor.

Digital Ocean

I came across Digital Ocean today, and am fairly interested. (Though I’m not really planning on jumping from my current host.)

The premise is that they offer SSD-backed cloud servers (virtual machines). They’re not the only ones doing that, but their pricing is beyond competitive. The front page advertises a VM with 20GB of SSD storage and 512 MB RAM for $5/month. (And unmetered transfer.) Prices climb a bit as you go, but are pretty proportional — $20/month for a 2GB instance with 2 cores and 40GB of SSD-backed storage. That’s a very good deal — but almost frighteningly low, along the lines of “Would you like this 40-cent bottle of champagne?” in that it leaves me a bit worried about what’s “wrong”. (Though I have yet to find anything.)

In not-very-scientific (nor real-world) tests, hdparm -t showed 310.29 MB/sec throughput (932 MB in 3.00 seconds). Various speed tests gave scattered results, from 2-6 MB/sec (16-48 Mbps), though it’s entirely possible that the bottleneck was the remote server. I must say, though, that yum is faster than I have ever seen it before.

They do seem to block outbound ICMP, probably due to abuse problems. They also appear to block NTP, which is odd and makes me wonder what else is blocked.

I don’t plan on switching over any time soon, but at the same time, it’s tempting to think of $10/month as a reasonable expenditure if I find myself needing something to host the occasional app or whatnot.