I helped recover a Gluster setup that had gone bad today, and wanted to write up what I did because there’s precious little information out there on what’s going on. Note that I don’t consider myself a Gluster expert by any stretch.
The problem
The ticket actually came to me as an Apache permissions error:
[Thu Mar 03 07:37:58 2016] [error] [client 192.168.1.1] (5)Input/output error: file permissions deny server access: /var/www/html/customer/header.jpg
(Full disclosure: I’ve mucked with all the logs here to not reveal any customer data.)
We suspected that it might have something to do with Gluster, which turned out to be correct. This setup is a pair of servers running Gluster 3.0.
We looked at Gluster’s logs, where we saw a ton of stuff like this:
[2016-03-03 15:54:53] W [fuse-bridge.c:862:fuse_fd_cbk] glusterfs-fuse: 97931: OPEN() /customer/banner.png => -1 (Input/output error)
[2016-03-03 16:00:26] E [afr-self-heal-metadata.c:566:afr_sh_metadata_fix] replicate0: Unable to self-heal permissions/ownership of '/customer/style.css' (possible split-brain). Please fix the file on all backend volumes
Those are separate errors for separate files, but they point to the same underlying brokenness.
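If you want a rough list of every file the volume is complaining about, grepping the client log works well enough. (The log location and the sed expression here are guesses based on the messages above; adjust to taste.)

# Pull the affected paths out of the 'possible split-brain' messages
grep "possible split-brain" /var/log/glusterfs/*.log \
    | sed "s/.*of '\([^']*\)'.*/\1/" | sort -u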
The cause
For reasons I haven’t yet identified (I’m arbitrarily blaming a Gluster bug), Gluster got into a split-brain situation on the metadata of those files. Debugging it was a bit of an adventure, because there’s little information out there on how to proceed.
getfattr
After a lot of digging and hair-pulling, I eventually came across this exchange on gluster-users that addressed an issue that looked like ours. The files appeared identical and had the same permissions, but Gluster thought they mismatched.
Gluster stores some information in extended attributes, or xattrs, on each file. This happens on the brick, not on the mounted Gluster filesystem. You can examine them with the getfattr tool. Some of the attributes are named trusted.afr.<brickname>, one for each host. As Jeff explains in that gluster-users post:
The [trusted.afr] values are arrays of 32-bit counters for how many updates we believe are still pending at each brick.
In that example, their values were:
[root@ca1.sg1 /]# getfattr -m . -d -e hex /data/gluster/lfd/techstudiolfc/pub
getfattr: Removing leading '/' from absolute path names
# file: data/gluster/lfd/techstudiolfc/pub
trusted.afr.shared-client-0=0x000000000000000000000000
trusted.afr.shared-client-1=0x000000000000001d00000000
trusted.gfid=0x3700ee06f8f74ebc853ee8277c107ec2

[root@ca2.sg1 /]# getfattr -m . -d -e hex /data/gluster/lfd/techstudiolfc/pub
getfattr: Removing leading '/' from absolute path names
# file: data/gluster/lfd/techstudiolfc/pub
trusted.afr.shared-client-0=0x000000000000000300000000
trusted.afr.shared-client-1=0x000000000000000000000000
trusted.gfid=0x3700ee06f8f74ebc853ee8277c107ec2
Note that their values disagree. One sees shared-client-1 with a “1d” value in the middle; the other sees shared-client-0 with a “03” in the middle. Jeff explains:
Here we see that ca1 (presumably corresponding to client-0) has a count of 0x1d for client-1 (presumably corresponding to ca2). In other words, ca1 saw 29 updates that it doesn’t know completed at ca2. At the same time, ca2 saw 3 operations that it doesn’t know completed at ca1. When there seem to be updates that need to be propagated in both directions, we don’t know which ones should supersede which others, so we call it split brain and decline to do anything lest we cause data loss.
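If I’m reading the format right, each of those hex values packs three 32-bit counters: pending data, metadata, and entry operations, in that order. That would make the 0x1d above 29 pending metadata updates, which lines up with the permissions/ownership errors in the log. Here’s a quick sketch to pull the counters apart (the decode_afr helper is mine, not a Gluster tool):

# Split a trusted.afr value into its three 32-bit counters.
# Order assumed: data, metadata, entry pending operations.
decode_afr() {
    local hex=${1#0x}
    printf 'data=%d metadata=%d entry=%d\n' "0x${hex:0:8}" "0x${hex:8:8}" "0x${hex:16:8}"
}

decode_afr 0x000000000000001d00000000
# data=0 metadata=29 entry=0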
Red Hat has a knowledgebase article on this, though it’s behind a paywall.
If you run getfattr and have no output, you’re probably running it on the shared Gluster filesystem, not on the local machine’s brick. Run it against the brick.
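In our case that was the difference between the Apache document root (served off the FUSE mount, assuming your layout looks like ours) and the brick path, something like:

# No useful output here: this is the mounted Gluster filesystem
getfattr -m . -d -e hex /var/www/html/customer/header.jpg

# The trusted.afr.* attributes live on the brick itself
getfattr -m . -d -e hex /mnt/brick1315013/customer/header.jpg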
Fixing it
Don’t just skip here; this isn’t a copy-and-paste fix. :)
To fix this, you want to remove the offending xattrs from the brick on one of the split-brained nodes, and then stat the file to get it to self-heal automatically.
The attributes to remove are the trusted.afr.whatever values. I unset all of them, one per brick. But remember: only do this on one node! Don’t run it on both!
In our case, we had one node that looked like this:
trusted.afr.remote1315012=0x000000000000000000000000
trusted.afr.remote1315013=0x000000010000000100000000
And the other looked like this:
trusted.afr.remote1315012=0x000000010000000100000000
trusted.afr.remote1315013=0x000000000000000000000000
(Note that the same two values appear on both hosts, but not under the same keys. One has the 1s on remote1315013, and the other has them on remote1315012.)
Since it’s not like one is ‘right’ and the other is ‘wrong’, on one of the two nodes I unset both xattrs using setfattr:
setfattr -x trusted.afr.remote1315012 /mnt/brick1315013/customer/header.jpg
setfattr -x trusted.afr.remote1315013 /mnt/brick1315013/customer/header.jpg
I ran the getfattr command again to make sure the attributes had disappeared. (Remember: this is on the brick, not the presented Gluster filesystem.)
Then, simply stat the file on the mounted Gluster filesystem on that node, and it should automatically correct the missing attributes, filling them in from the other node. You can verify again with getfattr against the brick.
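Concretely, that last step looked something like this (same paths as above):

# On the node where the xattrs were removed, stat the file on the mount to trigger the heal
stat /var/www/html/customer/header.jpg

# Then check the brick again; the trusted.afr.* attributes should be back, all zeroes
getfattr -m . -d -e hex /mnt/brick1315013/customer/header.jpg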
If this happens for a bunch of files, you can simply script it.
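Here’s a rough sketch of what that script might look like, assuming a list of affected paths in split-brain.txt (as they appear in the log, relative to the volume root), the brick and mount paths from above, and the same two trusted.afr keys. As before, only ever run it on one node:

#!/bin/bash
# Clear the split-brain changelog xattrs on this node's brick, then trigger self-heal.
# BRICK, MOUNT, and split-brain.txt are assumptions; adjust them for your setup.
BRICK=/mnt/brick1315013
MOUNT=/var/www/html

while read -r f; do
    setfattr -x trusted.afr.remote1315012 "$BRICK/$f"
    setfattr -x trusted.afr.remote1315013 "$BRICK/$f"
    stat "$MOUNT/$f" > /dev/null    # stat on the mount kicks off the heal
done < split-brain.txt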