Lazy distro mirrors with squid

I have a problem that I think a lot of fellow developers probably have: I have enough computers (or virtual machines!) running the same operating system version(s) that I would benefit from a local mirror, but not so many that it’s actually reasonable for me to run a full mirror. A full mirror would entail rsyncing a bunch of content daily, much of which may be packages I would never use. And using a proxy server isn’t terribly practical, because with a bunch of semi-round-robin mirrors, it’s likely that two systems would pull the same package from different mirrors. A proxy server would have no way of knowing (ahead of time) that the two documents were actually the same.

What I wanted for a long time was a “lazy” mirror: something that would appear to my systems as a full mirror, but would act more as a proxy. When a client installed a particular version of a particular package for the first time, it would fetch the package from a “real” mirror, and then cache it for a long time. Subsequent requests for the same package from my “mirror” would be served from cache. I was convinced that this was impossible to do with a proxy server. Worse, I wanted to mirror multiple repos: Fedora and CentOS and EPEL, and maybe even Ubuntu. Surely there was no way squid could do that.

I was wrong. squid is pretty awesome. We just pull a few tricks:

  • Instead of using squid as a traditional proxy server that listens on port 3128, use it as a reverse proxy / accelerator that listens on port 80. (This is, incidentally, what sites like Wikipedia do.)
  • Massage (read: abuse) the refresh_pattern rules to cache RPM files (etc.) for a very long time. Normally it is an awful, awful idea for proxy servers to interfere with the Cache-Control / Expires headers that sites serve. But in the case of a mirror, we know that any updates to a package will necessarily bump the version number in the URL. Ergo, we can pretty safely cache RPMs indefinitely.
  • Set up name-based virtual hosting with squid, so that centos-mirror.lan and fedora-mirror.lan can point to different mirrors.

Two other important steps involve setting up cache_dir reasonably (by default, at least in the packages on CentOS 6, squid will only cache data in RAM), and bumping up maximum_object_size from the default of 4MB.

Here is the relevant section of my squid.conf. (The “irrelevant” section of my squid.conf is a bunch of acl lines that I haven’t really customized and can probably be deleted.)

# Listen on port 80, not 3128
# 'accel' tells squid that it's a reverse proxy
# 'defaultsite' sets the hostname that will be used if none is provided
# 'vhost' tells squid to use the Host header for name-based virtual
#   hosting, which the cache_peer_domain lines below depend on.
http_port 80 accel defaultsite=mirror.lowell.lan vhost

# Create a disk-based cache of up to 10GB in size:
# (10000 is the size in MB. 16 and 256 are the numbers of first- and
#  second-level subdirectories to create, and are the default values.)
cache_dir ufs /var/spool/squid 10000 16 256

# Use the LFUDA cache eviction policy -- Least Frequently Used, with
#  Dynamic Aging. http://www.squid-cache.org/Doc/config/cache_replacement_policy/
# It's more important to me to keep bigger files in cache than to keep
# more, smaller files -- I am optimizing for bandwidth savings, not latency.
cache_replacement_policy heap LFUDA

# Do unholy things with refresh_pattern.
# The top two are new lines, and probably aren't everything you would ever
# want to cache -- I don't account for VM images, .deb files, etc.
# They're cached for 129600 minutes, which is 90 days.
# refresh-ims and override-expire are described in the configuration here:
#  http://www.squid-cache.org/Doc/config/refresh_pattern/
# but basically, refresh-ims makes squid check with the backend server
# when someone does a conditional get, to be cautious.
# override-expire lets us override the specified expiry time. (This is
#  illegal per the RFC, but works for our specific purposes.)
# You will probably want to tune this part.
refresh_pattern -i \.rpm$ 129600 100% 129600 refresh-ims override-expire
refresh_pattern -i \.iso$ 129600 100% 129600 refresh-ims override-expire
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320

# This is OH SO IMPORTANT: squid defaults to not caching objects over
# 4MB, which may be a reasonable default, but is awful behavior on our
# pseudo-mirror. Let's make it 4GB:
maximum_object_size 4096 MB

# Now, let's set up several mirrors. These work sort of like Apache
# name-based virtual hosts -- you get different content depending on
# which hostname you use in your request, even on the same IP. This lets
# us mirror more than one distro on the same machine.

# cache_peer is used here to set an upstream origin server:
#   'mirror.us.as6453.net' is the hostname of the mirror I connect to.
#   'parent' tells squid that this is a 'parent' server, not a peer
#    '80 0' sets the HTTP port (80) and ICP port (0)
#    'no-query' stops ICP queries, which should only be used between squid servers
#    'originserver' tells squid that this is a server that originates content,
#      not another squid server.
#    'name=as6453' tags it with a name we use on the next line.
# cache_peer_domain is used for virtual hosting.
#    'as6453' is the name we set on the previous line (for cache_peer)
#    subsequent words are virtual hostnames it answers to. (This particular
#     mirror has Fedora and Debian content mirrored.) These are the hostnames
#     you set up and will use to access content.
# Taken together, these two lines tell squid that, when it gets a request for
#  content on fedora-mirror.lowell.lan or debian-mirror.lowell.lan, it should
#  route the request to mirror.us.as6453.net and cache the result.
cache_peer mirror.us.as6453.net parent 80 0 no-query originserver name=as6453
cache_peer_domain as6453 fedora-mirror.lowell.lan debian-mirror.lowell.lan

# Another, for CentOS:
cache_peer mirrors.seas.harvard.edu parent 80 0 no-query originserver name=harvard
cache_peer_domain harvard centos-mirror.lowell.lan

You will really want to customize this. The as6453.net and harvard.edu mirrors happen to be geographically close to me and very fast, but that might not be true for you. Check out the CentOS mirror list and Fedora mirror list to find something close by. (And perhaps fetch a file or two with wget to check speeds.) And I’m reasonably confident that you don’t have a lowell.lan domain in your home.

If you can find one mirror that has all the distros you need, you don’t need to bother with virtual hosts.

You can edit the respective repos in /etc/yum.repos.d/ to point to the hostnames you set up. Pay attention to whether the mirror matches the URL structure the file defaults to or not.

You can just drop the hostnames in /etc/hosts if you don’t have a home DNS server, e.g.:

172.16.1.100 fedora-mirror.lowell.lan centos-mirror.lowell.lan
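
To see which URLs the refresh_pattern rules above would hold on to, here is a rough Python sketch of squid's first-match-wins lookup. It is an approximation, not squid's actual code: Python's re module stands in for squid's regex engine, the dot before rpm and iso is escaped so it only matches a literal dot, and the example URLs are made up.

```python
import re

# (regex, case-insensitive?, min age in minutes), in the same order as the
# refresh_pattern lines in the config. Order matters: the first match wins.
RULES = [
    (r"\.rpm$", True, 129600),
    (r"\.iso$", True, 129600),
    (r"^ftp:", False, 1440),
    (r"^gopher:", False, 1440),
    (r"(/cgi-bin/|\?)", True, 0),
    (r".", False, 0),
]


def min_age_minutes(url):
    """Return the 'min' value of the first rule whose regex matches url."""
    for pattern, ignore_case, minutes in RULES:
        flags = re.IGNORECASE if ignore_case else 0
        if re.search(pattern, url, flags):
            return minutes
    return None


# Hypothetical URLs, only for illustration.
for url in (
    "http://fedora-mirror.lowell.lan/releases/foo-1.0-1.noarch.rpm",
    "http://centos-mirror.lowell.lan/repodata/repomd.xml",
):
    print(url, "->", min_age_minutes(url) // 1440, "days minimum")
```

Note that repository metadata such as repomd.xml falls through to the catch-all rule, which is what you want: metadata changes in place, while package files do not.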

Software Circuit Breakers in Ruby

I found an interesting article in this week’s Ruby Weekly newsletter—a post from Martin Fowler about the circuit breaker concept in Ruby.

The idea is pretty simple, but pretty slick: wrap calls to external services that can fail in a ‘circuit breaker’, which will detect when the call is failing (or acting particularly slow) and short-circuit calls. In the simplest case, this can help avoid slow-downs when a non-critical remote service fails. For example, if you normally made an inline call to send a welcome email to new signups, you might fall back to just enqueuing the task if the mailserver call slows down—or perhaps just take them to a webpage with the same content.

In the best case, this can prevent cascading failures. Picture the failure it guards against: webpages make a blocking call to an external service, the service goes down, every available application server ends up stuck waiting on it, and the result is a full service outage.
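
To make the idea concrete, here is a minimal sketch of a circuit breaker. It is written in Python rather than Ruby, and it is not the implementation from Fowler's post: the class name, the threshold and timeout parameters, and the send_welcome_email stand-in are all assumptions made for illustration. It only counts failures; detecting slow calls would need a timeout around the wrapped function.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the wrapped call is skipped."""


class CircuitBreaker:
    def __init__(self, func, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.func = func
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to wait before retrying
        self.clock = clock                  # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout has elapsed: let one trial call through ("half-open").
        try:
            result = self.func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def send_welcome_email(address):
    # Hypothetical stand-in for a mail server that is currently down.
    raise ConnectionError("mail server unreachable")


breaker = CircuitBreaker(send_welcome_email, failure_threshold=2)


def welcome(address):
    try:
        breaker.call(address)
        return "sent inline"
    except CircuitOpenError:
        return "enqueued (circuit open, mail server not contacted)"
    except ConnectionError:
        return "enqueued (call failed)"


for _ in range(3):
    print(welcome("new.user@example.com"))
```

The first two calls fail against the mail server and trip the breaker. The third never reaches the mail server at all, which is the point: once the service is known to be down, requests stop queuing up behind it.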

TripleO / Ironic Meetup in Sunnyvale

Last week a group of developers working on the OpenStack projects TripleO and Ironic convened in the sunny vale of Sunnyvale for a mid-cycle meetup.

Yahoo! Sunnyvale Campus

My focus was primarily on Ironic, though lots of discussion about TripleO happened. (Here is some Tuskar documentation, for example.) I thought it would be worthwhile to quickly summarize my experiences:

  • About 40 people turned out, including some really bright folks from HP, Yahoo!, Mirantis, Rackspace, and Red Hat. (And surely some others that I’ve temporarily forgotten—sorry!) Just meeting everyone I’ve been working with online was pretty valuable.
  • A whole ton of patches got rapidly written, tested, and merged, since sitting in the same room instead of being on separate continents made the process much more efficient.
  • We hit feature freeze Tuesday. On Monday, -2s were given to bigger patches to ensure that we had time to review everything. The -2 will be lifted once development for Juno opens up. Some of the things bumped include:
  • Because of feature freeze across projects, the Ironic driver for Nova was temporarily copied into Ironic’s source tree so we can work on it there.
  • Described in the same email linked above, a lot of work went into extending CI coverage for Ironic though it hasn’t yet landed. This test integration will be necessary to graduate from incubation.
  • We also identified end-user documentation as an important task: it is required to graduate from incubation, and it can be done during feature freeze alongside bugfixes. This Etherpad tries to outline what’s required.
  • A lot of whiteboarding was done around a ramdisk agent for Ironic. The idea is that nodes can boot up a richer agent to support more advanced configuration and lifecycle management. The link here goes to the introduction of a pretty interesting thread.

Fixing Alembic error with multiple heads

Today I went to run automated tests in my Ironic development setup, and tests failed with a slew of errors like this:

CommandError: Only a single head is supported. The script
 directory has multiple heads (due to branching), which must be 
 resolved by manually editing the revision files to form a linear 
 sequence. Run `alembic branches` to see the divergence(s).

This was quite baffling to me, and I at first assumed it was a git thing, given the references to branches and head.

I am still moderately confused. But here is how I fixed it.

Please note that:

  • I have little idea what I’m doing. This is what worked for me, not necessarily what someone clueful would do.
  • My fix involves trashing your database and recreating it. That’s perfectly acceptable for my test database, but proceed with caution.

Run alembic branches

The error told me to run alembic branches, but that failed:

$ alembic branches
  No config file 'alembic.ini' found, or file
  has no '[alembic]' section

The solution is to find your alembic.ini file. In Ironic, it lives in ironic/db/sqlalchemy. cd into that directory, and the command will work:

$ alembic branches
None -> 2581ebaf0cb2 (branchpoint), initial migration
     -> 2581ebaf0cb2 -> 21b7629c61e7 (head), 
     -> 2581ebaf0cb2 -> 21b331f883ef (head), Add provision_updated_at

The alembic documentation talks a little bit about this, for further reading.
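
The output above shows two revisions marked (head) that share the parent 2581ebaf0cb2. As a rough illustration of what "multiple heads" means (this is not Alembic's own code), the sketch below maps each revision to its down_revision and reports every revision that nothing else points back to. The hashes are the ones from the output above; the find_heads name is made up.

```python
def find_heads(down_revisions):
    """Return revisions that no other revision lists as its parent."""
    parents = set(down_revisions.values())
    return sorted(rev for rev in down_revisions if rev not in parents)


# revision -> down_revision, as read off the `alembic branches` output.
revisions = {
    "2581ebaf0cb2": None,            # initial migration
    "21b7629c61e7": "2581ebaf0cb2",
    "21b331f883ef": "2581ebaf0cb2",  # Add provision_updated_at
}

print(find_heads(revisions))  # ['21b331f883ef', '21b7629c61e7']
```

A healthy history would print a single head. Two heads mean two migration files both claim to follow the same parent.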

Look for .pyc files

Each of those hashes should point to a Python file in alembic/versions/.

I think where I went wrong is that I had switched branches and ended up with a .pyc file, but no matching .py. The stale file’s name should match one of the hashes above. (I suppose you could safely just delete all the .pyc files, actually.)
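
If you would rather find the stale files than delete everything, a small sketch like this lists .pyc files with no .py next to them. It assumes the .pyc files sit in the same directory as their sources (as they do under Python 2) and that you run it from the directory containing alembic/versions; the orphaned_pyc name is made up.

```python
from pathlib import Path


def orphaned_pyc(versions_dir):
    """Return .pyc files in versions_dir that have no sibling .py file."""
    versions_dir = Path(versions_dir)
    if not versions_dir.is_dir():
        return []
    return sorted(
        pyc for pyc in versions_dir.glob("*.pyc")
        if not pyc.with_suffix(".py").exists()
    )


# Only reports the files; delete them by hand once you have looked at the list.
for path in orphaned_pyc("alembic/versions"):
    print(path)
```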

Downgrade to alembic base

I’m actually not sure if this was necessary, but I did this: alembic downgrade base. If it bombs out with errors, I think the next steps will take care of them.

Delete the tables from your database

Now we’re basically going to start a new database.

I fired up mysql, ran USE ironic, and then deleted all the tables. You might have to play with ordering to get it right, due to foreign key constraints.
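
One way to sidestep the ordering problem, assuming MySQL, is to switch off foreign key checks for the session while dropping the tables. The sketch below only builds the SQL statements to paste into the mysql prompt; it does not connect to anything. The table names are illustrative, not a definitive list of Ironic's tables, and the function name is made up.

```python
def drop_all_tables_sql(table_names):
    """Build MySQL statements that drop the given tables in any order."""
    statements = ["SET FOREIGN_KEY_CHECKS = 0;"]
    for name in table_names:
        if "`" in name:
            raise ValueError("unexpected backtick in table name: %r" % name)
        statements.append("DROP TABLE IF EXISTS `%s`;" % name)
    # Re-enable the checks so the rest of the session behaves normally.
    statements.append("SET FOREIGN_KEY_CHECKS = 1;")
    return statements


# Illustrative table names; list your own with SHOW TABLES.
for statement in drop_all_tables_sql(["ports", "nodes", "chassis"]):
    print(statement)
```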

Re-run alembic

In Ironic, we have ironic-dbsync to do this for us. If not using Ironic, I think alembic upgrade head is analogous.

And then you should be good. Hopefully.