Aeolus Project | ma.ttwagner.com

For Aeolus, we’ve been talking a lot lately about how to improve the user interface and make things generally more intuitive.

One thing I noticed the other day was more of a general difference between desktop apps and many web apps — the presence of explanatory text on each page, often with a sort of conversational tone. In a desktop app, you might be presented with a form with various controls that you’re expected to understand. In a web app, you might get asked, “Hey bro, now that you’ve X’ed the Y, it’s time to choose your Z!”

We probably don’t want to address anyone as “bro,” but I’ve been thinking a bit about what this might look like in Conductor. We have a variety of forms users are asked to fill out to complete certain actions, and it’s not always immediately obvious what each form expects. And for someone new to Conductor, our terminology isn’t always intuitive, either. (And for intermediate users, it’s still easy to get a little bit confused from time to time.)

Here’s an example of what I’m thinking. When you go to create a deployable, here’s what you see:

That’s great for power users, but what if you’re not one? Wouldn’t it be nice if there were a little bit of a description? (And if you are a power user, be honest: you’re not going to read the instructions anyway, so they won’t do you any harm.) What about something like this:

The text there is really just a quick example, so please don’t worry too much about the exact tone or what we should do for styling of the text. For now I just threw a %p tag at the top of the page and took a quick stab at telling the user what they’re meant to do here. We should probably do better than I did, and actually explain what a “Deployable XML file” is, and how they could create one.

What do you think about this? Are there best practices for this sort of thing?

This Netflix blog post starts off sounding like an April Fools’ prank:

We have found that the best defense against major unexpected failures is to fail often. By frequently causing failures, we force our services to be built in a way that is more resilient. We are excited to make a long-awaited announcement today that will help others who embrace this approach.
We have written about our Simian Army in the past and we are now proud to announce that the source code for the founding member of the Simian Army, Chaos Monkey, is available to the community.
Do you think your applications can handle a troop of mischievous monkeys loose in your infrastructure? Now you can find out.

But it turns out it’s not a prank. And it’s actually pretty neat. It’s a script that runs during standard business hours (to make sure people are in the office), randomly killing instances in Netflix’s Auto-Scaling Groups on EC2. But Chaos Monkey isn’t meant to cause headaches. Or maybe it is, but only in the short term. By forcing random failures when plenty of people are on hand and no one is being woken up in the middle of the night, Netflix has been able to find all the little bits of a highly available system that aren’t actually highly available: the type of things you normally only discover when everything goes catastrophically wrong.
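
To make the idea concrete, here’s a rough sketch in Python. To be clear, this is not Netflix’s code, and it doesn’t talk to EC2 at all. The 9-to-5 weekday window, the function names, the group names, and the instance IDs are all things I made up for illustration, and the terminate callback just stands in for whatever would really kill an instance.

```python
import random
from datetime import datetime


def in_business_hours(now):
    """Assumed window: Monday to Friday, 09:00 up to (not including) 17:00."""
    return now.weekday() < 5 and 9 <= now.hour < 17


def pick_victims(groups, now, rng):
    """Pick one random instance per non-empty group.

    `groups` maps a (hypothetical) group name to a list of instance IDs.
    Outside the assumed business-hours window, nothing is picked.
    """
    if not in_business_hours(now):
        return {}
    return {
        name: rng.choice(instances)
        for name, instances in groups.items()
        if instances
    }


def unleash(groups, now, rng, terminate):
    """Call `terminate(group, instance)` for each victim and return the victims."""
    victims = pick_victims(groups, now, rng)
    for name, instance in victims.items():
        terminate(name, instance)
    return victims


if __name__ == "__main__":
    # Made-up groups and instance IDs, purely for illustration.
    groups = {"api": ["i-aaa", "i-bbb"], "web": ["i-ccc"]}
    unleash(
        groups,
        datetime.now(),
        random.Random(),
        terminate=lambda group, instance: print(f"would kill {instance} in {group}"),
    )
```

The one design choice worth noticing is that the business-hours check sits in front of everything else: if no one is in the office to respond, the monkey stays in its cage.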

Can your infrastructure handle the Chaos Monkey?


Matt Wagner's blog


Musing: Conversational Guides

Netflix’s Chaos Monkey