ext4: xattr inode deduplication

Ext4 now supports xattr values that are up to 64k in size (vfs limit).
Large xattr values are stored in external inodes each one holding a
single value. Once written the data blocks of these inodes are immutable.

The real world use cases are expected to have a lot of value duplication
such as inherited acls etc. To reduce data duplication on disk, this patch
implements a deduplicator that allows sharing of xattr inodes.

The deduplication is based on an in-memory hash lookup that is a best
effort sharing scheme. When a xattr inode is read from disk (i.e.
getxattr() call), its crc32c hash is added to a hash table. Before
creating a new xattr inode for a value being set, the hash table is
checked to see if an existing inode holds an identical value. If such an
inode is found, the ref count on that inode is incremented. On value
removal the ref count is decremented and if it reaches zero the inode is
deleted.

The quota charging for such inodes is manually managed. Every reference
holder is charged the full size as if there was no sharing happening.
This is consistent with how xattr blocks are also charged.

[ Fixed up journal credits calculation to handle inline data and the
  rare case where an shared xattr block can get freed when two thread
  race on breaking the xattr block sharing. --tytso ]

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This commit is contained in:
Tahsin Erdogan
2017-06-22 11:44:55 -04:00
committed by Theodore Ts'o
parent 30a7eb970c
commit dec214d00e
7 changed files with 867 additions and 299 deletions

View File

@@ -13,10 +13,11 @@
* mb_cache_entry_delete()).
*
* Ext2 and ext4 use this cache for deduplication of extended attribute blocks.
* They use hash of a block contents as a key and block number as a value.
* That's why keys need not be unique (different xattr blocks may end up having
* the same hash). However block number always uniquely identifies a cache
* entry.
* Ext4 also uses it for deduplication of xattr values stored in inodes.
* They use hash of data as a key and provide a value that may represent a
* block or inode number. That's why keys need not be unique (hash of different
* data may be the same). However user provided value always uniquely
* identifies a cache entry.
*
* We provide functions for creation and removal of entries, search by key,
* and a special "delete entry with given key-value pair" operation. Fixed