Fixing Gluster’s replicate0: Unable to self-heal permissions/ownership error

Today I helped recover a Gluster setup that had gone bad, and I wanted to write up what I did because there’s precious little information out there on what’s going on. Note that I don’t consider myself a Gluster expert by any stretch.

The problem

The ticket actually came to me as an Apache permissions error:

[Thu Mar 03 07:37:58 2016] [error] [client 192.168.1.1] (5)Input/output error: file permissions deny server access: /var/www/html/customer/header.jpg

(Full disclosure: I’ve mucked with all the logs here to not reveal any customer data.)

We suspected that it might have something to do with Gluster, which turned out to be correct. This setup is a pair of servers running Gluster 3.0.

We looked at Gluster’s logs, where we saw a ton of stuff like this:

[2016-03-03 15:54:53] W [fuse-bridge.c:862:fuse_fd_cbk] glusterfs-fuse: 97931: OPEN() /customer/banner.png => -1 (Input/output error)
[2016-03-03 16:00:26] E [afr-self-heal-metadata.c:566:afr_sh_metadata_fix] replicate0: Unable to self-heal permissions/ownership of '/customer/style.css' (possible split-brain). Please fix the file on all backend volumes

Those are separate errors for separate files, but both share a good degree of brokenness.

The cause

For reasons I haven’t yet identified (I’m arbitrarily blaming a Gluster bug), Gluster got into a split-brain situation on the metadata of those files. Debugging this was a bit of an adventure, because there’s little information out there on how to proceed.

getfattr

After a lot of digging and hair-pulling, I eventually came across this exchange on gluster-users that addresses an issue that looked like ours. The files looked identical and had the same permissions, but Gluster thought they mismatched.

Gluster stores some information in extended attributes, or xattrs, on each file. This happens on the brick, not on the mounted Gluster filesystem. You can examine that with the getfattr tool. Some of the attributes are named trusted.afr.<brickname> for each host. As Jeff explains in that gluster-users post:

The [trusted.afr] values are arrays of 32-bit counters for how many updates we believe are still pending at each brick.

In that example, their values were:

[root at ca1.sg1 /]# getfattr -m . -d -e hex /data/gluster/lfd/techstudiolfc/pub
getfattr: Removing leading '/' from absolute path names
# file: data/gluster/lfd/techstudiolfc/pub
trusted.afr.shared-client-0=0x000000000000000000000000
trusted.afr.shared-client-1=0x000000000000001d00000000
trusted.gfid=0x3700ee06f8f74ebc853ee8277c107ec2

[root at ca2.sg1 /]# getfattr -m . -d -e hex /data/gluster/lfd/techstudiolfc/pub
getfattr: Removing leading '/' from absolute path names
# file: data/gluster/lfd/techstudiolfc/pub
trusted.afr.shared-client-0=0x000000000000000300000000
trusted.afr.shared-client-1=0x000000000000000000000000
trusted.gfid=0x3700ee06f8f74ebc853ee8277c107ec2

Note that their values disagree. One sees shared-client-1 with a “1d” value in the middle; the other sees shared-client-0 with a “03” in the middle. Jeff explains:

Here we see that ca1 (presumably corresponding to client-0) has a count of 0x1d for client-1 (presumably corresponding to ca2). In other words, ca1 saw 29 updates that it doesn’t know completed at ca2. At the same time, ca2 saw 3 operations that it doesn’t know completed at ca1. When there seem to be updates that need to be propagated in both directions, we don’t know which ones should superseded which others, so we call it split brain and decline to do anything lest we cause data loss.

Red Hat has a knowledgebase article on this, though it’s behind a paywall.

If you run getfattr and get no output, you’re probably running it on the shared Gluster filesystem, not on the local machine’s brick. Run it against the brick.
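
To make Jeff’s arithmetic concrete, here is a minimal shell sketch that splits one trusted.afr value into its three 32-bit counters. The function name decode_afr is mine, and the labels data, metadata and entry reflect my understanding of the order in which AFR stores the counters; neither comes from the post quoted above, so treat both as assumptions.

```shell
# Hypothetical helper: decode one trusted.afr.* value as printed by
# `getfattr -e hex` into three 32-bit pending counters.
# ASSUMPTION: the counters are ordered data, metadata, entry.
decode_afr() {
    dc_hex=${1#0x}                       # drop the leading 0x
    if [ "${#dc_hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#dc_hex}" >&2
        return 1
    fi
    case $dc_hex in
        *[!0-9a-fA-F]*) echo "not a hex value: $1" >&2; return 1 ;;
    esac
    # Each counter is 8 hex digits (32 bits).
    dc_data=$(printf '%s' "$dc_hex" | cut -c1-8)
    dc_meta=$(printf '%s' "$dc_hex" | cut -c9-16)
    dc_entry=$(printf '%s' "$dc_hex" | cut -c17-24)
    echo "data=$((0x$dc_data)) metadata=$((0x$dc_meta)) entry=$((0x$dc_entry))"
}

decode_afr 0x000000000000001d00000000   # prints: data=0 metadata=29 entry=0
```

Feeding it ca1’s value for shared-client-1 gives a middle counter of 29, the 0x1d that Jeff reads as 29 updates ca1 doesn’t know completed at ca2.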

Fixing it

Don’t just skip here; this isn’t a copy-and-paste fix. :)

To fix this, you want to remove the offending xattrs from the brick on one of the split-brain nodes, and then stat the file to get it to automatically self-heal.

Use the trusted.afr.whatever values. I unset all of them, one per brick, but remember: only do this on one node! Don’t run it on both!
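
If you would rather not type the attribute names by hand, a small filter can pull them out of a getfattr dump. This is only a sketch: list_afr_attrs is a hypothetical helper name, and it assumes the name=value output format shown in the gluster-users example above.

```shell
# Hypothetical helper: read `getfattr -m . -d -e hex <file-on-brick>`
# output on stdin and print only the trusted.afr.* attribute names,
# one per line. Other attributes (such as trusted.gfid) are ignored.
list_afr_attrs() {
    sed -n 's/^\(trusted\.afr\.[^=]*\)=.*$/\1/p'
}
```

You would pipe the getfattr output for a file on the brick into list_afr_attrs and review the names it prints before unsetting anything.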

In our case, we had one node that looked like this:

trusted.afr.remote1315012=0x000000000000000000000000
trusted.afr.remote1315013=0x000000010000000100000000

And the other looked like this:

trusted.afr.remote1315012=0x000000010000000100000000
trusted.afr.remote1315013=0x000000000000000000000000

(Note here that the same two values appear on both hosts, but not for the same keys. One has the 1’s on remote1315013, and the other has them on remote1315012.)
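
That mirrored pattern is what Jeff’s explanation calls split brain: each node holds a non-zero count against the other. A rough sketch of that check follows. The names has_pending and classify are hypothetical, and which key belongs to which node is an assumption, just as it is in Jeff’s “presumably”.

```shell
# Succeeds if the trusted.afr value has any non-zero counter,
# i.e. this node believes the other side still has updates pending.
has_pending() {
    case ${1#0x} in
        *[!0]*) return 0 ;;   # at least one digit other than 0
        *)      return 1 ;;
    esac
}

# $1 = value node A holds about node B
# $2 = value node B holds about node A
classify() {
    if has_pending "$1" && has_pending "$2"; then
        echo "split-brain"    # each node blames the other
    elif has_pending "$1"; then
        echo "a-blames-b"
    elif has_pending "$2"; then
        echo "b-blames-a"
    else
        echo "clean"
    fi
}
```

With our values, the first node’s remote1315013 entry and the second node’s remote1315012 entry are both 0x000000010000000100000000, so classify reports split-brain.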

Since neither one is ‘right’ and the other ‘wrong’, I picked one of the two nodes and unset both xattrs there, using setfattr:

setfattr -x trusted.afr.remote1315012 /mnt/brick1315013/customer/header.jpg
setfattr -x trusted.afr.remote1315013 /mnt/brick1315013/customer/header.jpg

I ran the getfattr command again to make sure the attributes had disappeared. (Remember: this is on the brick, not the presented Gluster filesystem.)

Then, simply stat the file on the mounted Gluster filesystem on that node, and it should automatically correct the missing attributes, filling them in from the other node. You can verify again with getfattr against the brick.

If this happens for a bunch of files, you can simply script it.
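
As a starting point, here is a dry-run sketch that only prints the commands for each file, so you can review them before running anything. BRICK and the attribute names come from the example above. MOUNT is my guess at the mount point based on the Apache path, and plan_heal and plan_heal_all are hypothetical names. Adjust all of them for your own layout, and run the result on one node only.

```shell
# ASSUMPTIONS: adjust these three values for your own setup.
BRICK=/mnt/brick1315013
MOUNT=/var/www/html
AFR_ATTRS="trusted.afr.remote1315012 trusted.afr.remote1315013"

# Print (do not execute) the repair commands for one file.
# $1 = file path relative to the brick root.
# Paths are single-quoted, so a path containing ' would need extra care.
plan_heal() {
    for ph_attr in $AFR_ATTRS; do        # intentionally unquoted: split on spaces
        printf "setfattr -x %s '%s'\n" "$ph_attr" "$BRICK/$1"
    done
    # The stat goes through the Gluster mount, not the brick.
    printf "stat '%s'\n" "$MOUNT/$1"
}

# Read relative paths on stdin, one per line, and plan each one.
plan_heal_all() {
    while IFS= read -r pha_rel; do
        if [ -n "$pha_rel" ]; then
            plan_heal "$pha_rel"
        fi
    done
}
```

Feed plan_heal_all a list of affected paths (for example, ones pulled from the Gluster log), read through what it prints, and only then run those commands on the node you chose.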

One thought on “Fixing Gluster’s replicate0: Unable to self-heal permissions/ownership error”

  1. What version of Gluster are you running? If I understand it correctly, recent versions of Gluster no longer use stat() to regenerate gfids, it’s just part of the internal self-heal process?

    Brett
