Setting up replication to an RDS instance

This is basically the official RDS documentation rephrased in a way that makes sense to my brain. These steps take data from a “normal” MySQL server (e.g., one you installed yourself on an EC2 instance), import it into an RDS instance, and then enable replication. Amazon’s instructions are correct, but they caused me a good bit of confusion and didn’t prepare me for some gotchas.

You’ll have two instances, which I’ll refer to as such:

  • Master, the non-RDS instance (Amazon calls this the “Replication Source”)
  • Slave, the RDS instance which will pull data from the master

Launch a slave RDS instance

This one is normal. Log into AWS, and start up an RDS instance. Amazon says that you should not enable multi-AZ support until the import is complete. I missed that detail, and importing my trivial (one row in one table, for testing) database went fine. They’re probably right, though. Don’t forget the credentials you create! For this post, I used ‘dbuser’ as a username, and ‘dbpassword’ as a password. (Obviously, use something better in the real world.)
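
If you prefer the CLI to the console, the same launch can be sketched with the AWS CLI; the instance identifier, instance class, and storage size below are just placeholder values, so adjust to taste:

aws rds create-db-instance \
    --db-instance-identifier my-rds-slave \
    --db-instance-class db.t2.micro \
    --engine mysql \
    --master-username dbuser \
    --master-user-password dbpassword \
    --allocated-storage 20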

Make sure to get security groups / VPC ACLs right. I put them in the same VPC, and just enabled 3306 all around and it was good. They have more detailed instructions in the docs.
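
For example, if both instances share a security group in the VPC, opening 3306 within that group looks roughly like this with the AWS CLI (the group ID here is a placeholder):

aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 3306 \
    --source-group sg-0123456789abcdef0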

Configure the master

You’ll need to do several things on the master:

Enable binlogs and set a server-id

MySQL requires that a binary log (binlog) be used before replication is possible. You also need to set a server-id parameter, with a unique ID.

I just dropped this in the [mysqld] section of /etc/mysql.conf:

log-bin=mysql-bin
server-id=101

If this is the only server, server-id doesn’t really matter.

You’ll need to restart MySQL (service mysqld restart) for this to apply.
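
To sanity-check that binary logging is on after the restart, you can ask MySQL directly; note the File and Position values, since the dump you take later will record coordinates like these:

mysql -u root -p -e "SHOW MASTER STATUS;"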

Add a replication user

This one wasn’t abundantly clear to me. You need to add a replication user to the master, which the slave will use.

You’ll want the following two statements (this example is taken directly from the MySQL docs):

CREATE USER 'repl'@'%.mydomain.com' IDENTIFIED BY 'slavepass';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%.mydomain.com';

Obviously, customize the hostname part. I just used ‘%’ because I was doing a POC test in a VPC, but that should be locked down for anything real.
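
A quick way to double-check that the grant took effect (shown here with the docs’ example hostname; substitute whatever you actually used):

mysql -u root -p -e "SHOW GRANTS FOR 'repl'@'%.mydomain.com';"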

Export a DB dump

Use mysqldump to create a snapshot.

I just wanted to copy one database, so I ran something like this:

mysqldump -u root -p --database test_db1 --master-data > test_db1.dump

That will prompt for a password, and then write a dump of the database to test_db1.dump. Next, we’ll import this.
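
Before moving on, it’s worth peeking at the dump for the binlog coordinates that --master-data recorded; you’ll need to know about this line (and eventually remove it) in the next steps:

grep "CHANGE MASTER" test_db1.dump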

Import the dump to RDS

Hopefully by now the RDS instance has come online. Test that you can connect to it over MySQL. (Note: you cannot ssh into the RDS node. It only exposes MySQL as a service.)
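
A quick connectivity check from the master box, using the same example endpoint as the rest of this post:

mysql -u dbuser -p -h REDACTED.us-east-1.rds.amazonaws.com -e "SELECT VERSION();"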

We now want to import that database dump, and then we can start replication. But first, we need to tweak one thing in the dump we just created!

With --master-data, a line like this is written near the top of the dump file:

CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107;

I had to remove that line, or else I got this error:

Access denied; you need (at least one of) the SUPER privilege(s) for this operation
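
One way to strip it is with sed; this is just a sketch that deletes any CHANGE MASTER line from the dump in place, so keep a copy of the original if you’re nervous:

sed -i '/^CHANGE MASTER TO/d' test_db1.dump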

With that fixed, it’s time to import the data. The thing that’s not necessarily intuitive is that you want to run the MySQL client from your existing database server, and use -h to specify a remote hostname. You can’t ssh to the RDS instance and run it locally, because they don’t have ssh enabled. Here’s the command I used:

mysql -u dbuser -p -h REDACTED.us-east-1.rds.amazonaws.com < test_db1.dump

Enable replication

With the old database imported on RDS, it’s time to enable replication to get it to sync up with anything since the dump was taken, and then stay current. Since we don’t have ssh access, Amazon gives us a few custom procedures in MySQL we can run.

Connect to MySQL on your RDS slave (e.g., mysql -u dbuser -p -h REDACTED.us-east-1.rds.amazonaws.com or whatever).

In that MySQL shell, use their mysql.rds_set_external_master procedure. Its arguments are the master’s hostname (your EC2 instance, not the RDS endpoint), the port, the replication user and password you created earlier, the binlog file name and position from the CHANGE MASTER TO line in the dump, and a flag for SSL. Run something like this (read the docs for more details):

CALL mysql.rds_set_external_master (
'your-master-hostname-or-IP',
3306,
'repl',
'slavepass',
'mysql-bin.000001',
107,
0
);

It’s important to note that you need to use the credentials for the replication user you created, not the normal admin credentials.

Once that’s configured, start replication, with mysql.rds_start_replication. That one is much simpler, as it doesn’t take any arguments:

CALL mysql.rds_start_replication;

Then, you can run SHOW SLAVE STATUS\G to view the replication status. If all went well, there will be no errors. Yay! You can skip replication errors with another procedure they implement, mysql.rds_skip_repl_error, though ideally that won’t be necessary.
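
If you’d rather check from a shell on the master box, the same information is available non-interactively; the fields worth watching are Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master:

mysql -u dbuser -p -h REDACTED.us-east-1.rds.amazonaws.com -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Last_Error"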

At this point, data inserted to the master should show up on the slave automatically. (Don’t insert rows into the slave yet, or you’ll end up with a real mess!)
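
A minimal end-to-end check, assuming a hypothetical table test_db1.widgets with a name column exists on the master: insert a row there, then query the slave a moment later:

# on the master:
mysql -u root -p -e "INSERT INTO test_db1.widgets (name) VALUES ('replication-test');"
# then, against the RDS slave:
mysql -u dbuser -p -h REDACTED.us-east-1.rds.amazonaws.com -e "SELECT * FROM test_db1.widgets;"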

Promote the slave

Amazon provides these instructions for the purpose of importing a database and then cutting over to use the RDS node as the master. When the RDS slave has caught up and your application is ready, you can stop replication, decommission the old master, and start using the RDS instance as your new master.

There are two procedures you’ll be interested in here: mysql.rds_stop_replication to stop replication, and mysql.rds_reset_external_master to unset the master information. Remember to clean up security groups, the old master, etc.
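
Non-interactively, the cutover looks something like this against the RDS endpoint (same example hostname as before):

# stop pulling changes from the old master:
mysql -u dbuser -p -h REDACTED.us-east-1.rds.amazonaws.com -e "CALL mysql.rds_stop_replication;"
# then clear the external master configuration:
mysql -u dbuser -p -h REDACTED.us-east-1.rds.amazonaws.com -e "CALL mysql.rds_reset_external_master;"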

Quick-start with Gluster on AWS

I wanted to play around with Gluster a bit, and EC2 has gotten cheap enough that it makes sense to spin up a few instances. My goal is simple: set up Gluster running on two servers in different regions, and see how everything works between them. This is in no way a production-ready guide, or even necessarily good practice, but I found the official guides lacking and confusing. (For reference, they have a Really, Really Quick Start Guide and also one tailored to EC2. Both took some tweaking.) Here’s what I did:

  • Start two EC2 instances. I used “Amazon Linux” on a t2.micro, and started one each in Sydney and Oregon. (Using different regions is in no way required; I’m doing that because I’m specifically curious how it will behave in that case.)
  • Configure the security groups from the outset. Every node needs access to every other node on the following ports (these were different for older versions):
    • TCP and UDP 111 (portmap)
    • TCP 49152
    • TCP 24007-24008
  • Create a 5GB (or whatever you like, really) EBS volume for each instance; attach them. This will be our ‘brick’ that Gluster uses.
  • Pop this in /etc/yum.repos.d/glusterfs-epel.repo:
# Place this file in your /etc/yum.repos.d/ directory

[glusterfs-epel]
name=GlusterFS is a clustered file-system capable of scaling to several petabytes.
baseurl=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-6/$basearch/
enabled=1
skip_if_unavailable=1
gpgcheck=0

[glusterfs-noarch-epel]
name=GlusterFS is a clustered file-system capable of scaling to several petabytes.
baseurl=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-6/noarch
enabled=1
skip_if_unavailable=1
gpgcheck=0

[glusterfs-source-epel]
name=GlusterFS is a clustered file-system capable of scaling to several petabytes. - Source
baseurl=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-6/SRPMS
enabled=0
skip_if_unavailable=1
gpgcheck=0
  • sudo yum install glusterfs glusterfs-fuse glusterfs-geo-replication glusterfs-server. This should pull in the necessary dependencies.
  • Now, set up those volumes:
    • sudo fdisk /dev/sdf (or whatever it was attached as); create a partition spanning the disk
    • Create a filesystem on it; I used sudo mkfs.ext4 /dev/sdf1 for now
  • Create a mountpoint, mount the new filesystem, and create a brick directory:
sudo mkdir -p /exports/sdf1
sudo mount /dev/sdf1 /exports/sdf1
sudo mkdir -p /exports/sdf1/brick
  • Edit /etc/fstab and add the appropriate line, like:
/dev/sdf1   /exports/sdf1 ext4  defaults        0   0
  • Start gluster on each node; sudo service glusterd start
  • Peer detection… This tripped me up big time. The only way I got this to work was by creating fake hostnames for each box in /etc/hosts. I used gluster01 and gluster02 as the names. On each box, /etc/hosts maps its own name to 127.0.0.1 and the other box’s name to that box’s public IP (there’s a sketch of this at the end of the list). Then, from one node (it doesn’t matter which), probe the other by the hostname you just set up: sudo gluster peer probe gluster02. You don’t need to repeat this from the other host; they’ll see each other.
  • Create the volume, with replication level 2 (both nodes), on one of them:
sudo gluster volume create test1 rep 2 gluster01:/exports/sdf1/brick gluster02:/exports/sdf1/brick

This will fail miserably if you didn’t get the hostname thing right. You can’t do it by public IP, and you can’t directly use localhost. If it works right, you’ll see “volume create: test1: success: please start the volume to access data”. So, let’s do that.

  • sudo gluster volume start test1 (you can then inspect it with sudo gluster volume status)
  • Now, mount it. On each box, sudo mkdir /mnt/storage. Then, on each box, mount it with a reference to one of the Gluster nodes: sudo mount -t glusterfs gluster01:test1 /mnt/storage (you could use gluster01:test1 or gluster02:test1; either will find the right volume). This may take a bit if it’s going across oceans.
  • cd into /mnt/storage, create a file, and see that it appears on the other box. Magic!
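
For reference (as promised in the peer-detection step above), here is roughly what the hostname setup and peering look like; the public IPs below are placeholders, and each box maps its own name to 127.0.0.1 as described:

# /etc/hosts on gluster01 (use gluster02's real public IP)
127.0.0.1     gluster01
203.0.113.20  gluster02

# /etc/hosts on gluster02 (use gluster01's real public IP)
127.0.0.1     gluster02
203.0.113.10  gluster01

# then, from gluster01 (or gluster02; either works):
sudo gluster peer probe gluster02
sudo gluster peer status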

Please keep in mind that this was the bare minimum for a cobbled-together test, and is surely not a good production setup.

Also, replicating Gluster between Sydney and Oregon is horribly slow. Don’t do that! Even when it’s not across continents, Gluster doesn’t do well across a WAN.

GPG/PGP Keysigning

I just got back from this year’s OpenStack Summit, which was a great experience. In addition to many fruitful sessions about OpenStack itself, a keysigning party was held. This was the first such session I’ve attended, and the use of PKI for signing/encrypting mail is something that’s only recently drawn my interest.

One thing that I find interesting is that there’s no central authority from which keys derive trust, unlike SSL in browsers. Instead, it’s a web-of-trust model. Individuals cryptographically sign each others’ public keys to denote trust in them. If you’ve verified my key, and I sign Bob’s key saying I’ve verified it, then, if you trust me, you can trust Bob’s key.

At the keysigning party, we used the Sassaman Projected Method, in which we each stood up, presented something like a passport on the projector, and verbally confirmed that our entry in the list of key fingerprints compiled before the event was valid. (We also verified the MD5 and SHA sums of the list itself before beginning, so that we knew we were all working from the same list.)
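
If you’re curious what that verification looks like in practice, it’s just a matter of everyone comparing checksums of the distributed list; the filename here is a placeholder for whatever the organizers hand out:

md5sum keysigning-list.txt
sha256sum keysigning-list.txt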

GPG setup notes

I’m not going to cover the basics, because myriad other sources already do a much better job. But a few helpful hints for your gpg.conf:

  • You can set a default-key value if you have more than one key.
  • Ensure that require-cross-certification is present
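
As a concrete sketch, those two hints end up looking like this in ~/.gnupg/gpg.conf (the key ID here is just the short form of the example fingerprint used later in this post):

default-key 2BE02E05
require-cross-certification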

You may also want to set up a keyserver different from the default. Here is what I have:

keyserver hkps://hkps.pool.sks-keyservers.net
keyserver-options ca-cert-file=~/.gnupg/sks-keyservers.netCA.pem
keyserver-options auto-key-retrieve
keyserver-options no-honor-keyserver-url

This uses the SKS Keyservers pool, a pool of almost 100 keyservers that all exchange keys with each other. More specifically, it selects the HKPS members of the pool, which serve SSL on port 443. To use this, you must grab their self-signed SSL certificate. (Note that the SSL here is more about preventing a middleman from eavesdropping than about preventing tampering with your keys; that protection comes from the keys themselves.)
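
Grabbing that certificate is a one-liner; the URL below is where the pool published it at the time of writing, so double-check it hasn’t moved:

curl -o ~/.gnupg/sks-keyservers.netCA.pem https://sks-keyservers.net/sks-keyservers.netCA.pem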

The auto-key-retrieve option is so that when I get new email in mutt with a key I haven’t seen before, it will be fetched automatically. The no-honor-keyserver-url ensures that we always use our HKPS-enabled one, even if a key points to another server, so we ensure we stay on HKPS.

Keysigning Process

caff automates much of this. On Fedora, it’s provided by pgp-tools.

  • After installing it, run caff once to have it generate a ~/.caffrc file.
  • Edit ~/.caffrc to taste:
    • Make sure that $CONFIG{'owner'} and $CONFIG{'email'} are set properly.
    • If your machine doesn’t run a properly-configured MTA, add a line to relay mail through a mailserver, like so: $CONFIG{'mailer-send'} = [ 'smtp.corp.example.com'].

caff maintains its own gpg.conf file, in ~/.caff/gnupghome/. You may want to customize it, or just symlink your main one to it. Partly because I missed exactly what was happening at first, I instead imported keys to my normal keyring, and just pointed caff to that keyring. I used -R to prevent it from fetching keys, and --key-file ~/.gnupg/pubring.gpg to pull from my normal keyring. This probably made things more difficult than needed.

One thing that took me a moment was how to look up a fingerprint. For example, if my key fingerprint is 5150 9442 00FE 3099 4CA8 D2EA E639 859C 2BE0 2E05, how do I look that up? It turns out to be simple: take the last eight characters (2BE02E05), prepend 0x, and search.

So my workflow was:

gpg2 --search-keys 0x2be02e05 # and import
caff -R --key-file ~/.gnupg/pubring.gpg 0x2be02e05 # and follow steps

Of course, be sure that the fingerprint matches, and that you’ve validated the person’s identity in real life before signing. Once you run caff, it will have you sign the key and email it to each address on file.

Other stuff