Merge tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
 "This contains two new features:

   - Stack file operations: this allows removal of several hacks from
     the VFS, proper interaction of read-only open files with copy-up,
     possibility to implement fs modifying ioctls properly, and others.

   - Metadata only copy-up: when file is on lower layer and only
     metadata is modified (except size) then only copy up the metadata
     and continue to use the data from the lower file"

* tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (66 commits)
  ovl: Enable metadata only feature
  ovl: Do not do metacopy only for ioctl modifying file attr
  ovl: Do not do metadata only copy-up for truncate operation
  ovl: add helper to force data copy-up
  ovl: Check redirect on index as well
  ovl: Set redirect on upper inode when it is linked
  ovl: Set redirect on metacopy files upon rename
  ovl: Do not set dentry type ORIGIN for broken hardlinks
  ovl: Add an inode flag OVL_CONST_INO
  ovl: Treat metacopy dentries as type OVL_PATH_MERGE
  ovl: Check redirects for metacopy files
  ovl: Move some dir related ovl_lookup_single() code in else block
  ovl: Do not expose metacopy only dentry from d_real()
  ovl: Open file with data except for the case of fsync
  ovl: Add helper ovl_inode_realdata()
  ovl: Store lower data inode in ovl_inode
  ovl: Fix ovl_getattr() to get number of blocks from lower
  ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
  ovl: Copy up meta inode data from lowest data inode
  ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
  ...
@@ -21,8 +21,7 @@ prototypes:
 char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen);
 struct vfsmount *(*d_automount)(struct path *path);
 int (*d_manage)(const struct path *, bool);
-struct dentry *(*d_real)(struct dentry *, const struct inode *,
-unsigned int, unsigned int);
+struct dentry *(*d_real)(struct dentry *, const struct inode *);

 locking rules:
 rename_lock ->d_lock may block rcu-walk

@@ -10,10 +10,6 @@ union-filesystems). An overlay-filesystem tries to present a
 filesystem which is the result over overlaying one filesystem on top
 of the other.

-The result will inevitably fail to look exactly like a normal
-filesystem for various technical reasons. The expectation is that
-many use cases will be able to ignore these differences.
-

 Overlay objects
 ---------------
@@ -266,6 +262,30 @@ rightmost one and going left. In the above example lower1 will be the
 top, lower2 the middle and lower3 the bottom layer.


+Metadata only copy up
+--------------------
+
+When metadata only copy up feature is enabled, overlayfs will only copy
+up metadata (as opposed to whole file), when a metadata specific operation
+like chown/chmod is performed. Full file will be copied up later when
+file is opened for WRITE operation.
+
+In other words, this is delayed data copy up operation and data is copied
+up when there is a need to actually modify data.
+
+There are multiple ways to enable/disable this feature. A config option
+CONFIG_OVERLAY_FS_METACOPY can be set/unset to enable/disable this feature
+by default. Or one can enable/disable it at module load time with module
+parameter metacopy=on/off. Lastly, there is also a per mount option
+metacopy=on/off to enable/disable this feature per mount.
+
+Do not use metacopy=on with untrusted upper/lower directories. Otherwise
+it is possible that an attacker can create a handcrafted file with
+appropriate REDIRECT and METACOPY xattrs, and gain access to file on lower
+pointed by REDIRECT. This should not be possible on local system as setting
+"trusted." xattrs will require CAP_SYS_ADMIN. But it should be possible
+for untrusted layers like from a pen drive.
+
 Sharing and copying layers
 --------------------------

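The per mount option described in the hunk above could be exercised from C roughly as follows. This is a minimal sketch, not part of the commit: the helper names build_overlay_opts() and mount_overlay() and any paths passed to them are hypothetical, and the mount(2) call itself needs CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: format an overlayfs option string into buf.
 * Returns 0 on success, -1 if buf is too small for the full string.
 */
int build_overlay_opts(char *buf, size_t len, const char *lower,
		       const char *upper, const char *work, int metacopy)
{
	int n;

	assert(buf && lower && upper && work);
	n = snprintf(buf, len,
		     "lowerdir=%s,upperdir=%s,workdir=%s,metacopy=%s",
		     lower, upper, work, metacopy ? "on" : "off");
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical wrapper around mount(2) for an overlay mount.
 * Returns 0 on success, -1 with errno set on failure.
 */
int mount_overlay(const char *target, const char *opts)
{
	return mount("overlay", target, "overlay", 0, opts);
}
```

Passing metacopy as 0 produces metacopy=off, which overrides a module or config default of "on" for that one mount. Per the warning in the hunk, metacopy=on is only appropriate when both layers are trusted.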
@@ -284,7 +304,7 @@ though it will not result in a crash or deadlock.
 Mounting an overlay using an upper layer path, where the upper layer path
 was previously used by another mounted overlay in combination with a
 different lower layer path, is allowed, unless the "inodes index" feature
-is enabled.
+or "metadata only copy up" feature is enabled.

 With the "inodes index" feature, on the first time mount, an NFS file
 handle of the lower layer root directory, along with the UUID of the lower
@@ -297,6 +317,10 @@ lower root origin, mount will fail with ESTALE. An overlayfs mount with
 does not support NFS export, lower filesystem does not have a valid UUID or
 if the upper filesystem does not support extended attributes.

+For "metadata only copy up" feature there is no verification mechanism at
+mount time. So if same upper is mounted with different set of lower, mount
+probably will succeed but expect the unexpected later on. So don't do it.
+
 It is quite a common practice to copy overlay layers to a different
 directory tree on the same or different underlying filesystem, and even
 to a different machine. With the "inodes index" feature, trying to mount
@@ -306,27 +330,40 @@ the copied layers will fail the verification of the lower root file handle.
 Non-standard behavior
 ---------------------

-The copy_up operation essentially creates a new, identical file and
-moves it over to the old name. Any open files referring to this inode
-will access the old data.
+Overlayfs can now act as a POSIX compliant filesystem with the following
+features turned on:

-The new file may be on a different filesystem, so both st_dev and st_ino
-of the real file may change. The values of st_dev and st_ino returned by
-stat(2) on an overlay object are often not the same as the real file
-stat(2) values to prevent the values from changing on copy_up.
+1) "redirect_dir"

-Unless "xino" feature is enabled, when overlay layers are not all on the
-same underlying filesystem, the value of st_dev may be different for two
-non-directory objects in the same overlay filesystem and the value of
-st_ino for directory objects may be non persistent and could change even
-while the overlay filesystem is still mounted.
+Enabled with the mount option or module option: "redirect_dir=on" or with
+the kernel config option CONFIG_OVERLAY_FS_REDIRECT_DIR=y.

-Unless "inode index" feature is enabled, if a file with multiple hard
-links is copied up, then this will "break" the link. Changes will not be
-propagated to other names referring to the same inode.
+If this feature is disabled, then rename(2) on a lower or merged directory
+will fail with EXDEV ("Invalid cross-device link").

-Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged
-directory will fail with EXDEV.
+2) "inode index"
+
+Enabled with the mount option or module option "index=on" or with the
+kernel config option CONFIG_OVERLAY_FS_INDEX=y.
+
+If this feature is disabled and a file with multiple hard links is copied
+up, then this will "break" the link. Changes will not be propagated to
+other names referring to the same inode.
+
+3) "xino"
+
+Enabled with the mount option "xino=auto" or "xino=on", with the module
+option "xino_auto=on" or with the kernel config option
+CONFIG_OVERLAY_FS_XINO_AUTO=y. Also implicitly enabled by using the same
+underlying filesystem for all layers making up the overlay.
+
+If this feature is disabled or the underlying filesystem doesn't have
+enough free bits in the inode number, then overlayfs will not be able to
+guarantee that the values of st_ino and st_dev returned by stat(2) and the
+value of d_ino returned by readdir(3) will act like on a normal filesystem.
+E.g. the value of st_dev may be different for two objects in the same
+overlay filesystem and the value of st_ino for directory objects may not be
+persistent and could change even while the overlay filesystem is mounted.


 Changes to underlying filesystems

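The st_dev caveat in the "xino" text above can be observed from user space with stat(2). The following is a small sketch for illustration only; the function name same_st_dev() is hypothetical and the paths to compare are up to the caller.

```c
#include <assert.h>
#include <sys/stat.h>

/*
 * Hypothetical check: compare the st_dev values that stat(2) reports
 * for two paths. Returns 1 if they match, 0 if they differ and -1 if
 * either stat(2) call fails.
 */
int same_st_dev(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a && b);
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev;
}
```

Called on two objects inside one overlay mount, a return value of 0 would correspond to the non-uniform st_dev behavior that the documentation attributes to a disabled or unavailable "xino" feature.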
@@ -989,8 +989,7 @@ struct dentry_operations {
 char *(*d_dname)(struct dentry *, char *, int);
 struct vfsmount *(*d_automount)(struct path *);
 int (*d_manage)(const struct path *, bool);
-struct dentry *(*d_real)(struct dentry *, const struct inode *,
-unsigned int, unsigned int);
+struct dentry *(*d_real)(struct dentry *, const struct inode *);
 };

 d_revalidate: called when the VFS needs to revalidate a dentry. This
@@ -1124,22 +1123,15 @@ struct dentry_operations {
 dentry being transited from.

 d_real: overlay/union type filesystems implement this method to return one of
-the underlying dentries hidden by the overlay. It is used in three
+the underlying dentries hidden by the overlay. It is used in two
 different modes:

-Called from open it may need to copy-up the file depending on the
-supplied open flags. This mode is selected with a non-zero flags
-argument. In this mode the d_real method can return an error.
-
 Called from file_dentry() it returns the real dentry matching the inode
 argument. The real dentry may be from a lower layer already copied up,
 but still referenced from the file. This mode is selected with a
-non-NULL inode argument. This will always succeed.
+non-NULL inode argument.

-With NULL inode and zero flags the topmost real underlying dentry is
-returned. This will always succeed.
-
-This method is never called with both non-NULL inode and non-zero flags.
+With NULL inode the topmost real underlying dentry is returned.

 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries. Child dentries are basically like files in a
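The two remaining d_real modes described in the last hunk can be modelled in user space as below. This is a toy sketch under stated assumptions, not the overlayfs implementation: struct inode and struct dentry are stand-ins for the kernel types, the upper and lower fields and the name toy_d_real() are hypothetical, and the fallback for an unmatched inode is a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; not the real definitions. */
struct inode {
	int ino;
};

struct dentry {
	struct inode *d_inode;
	struct dentry *upper;	/* hypothetical: real upper dentry or NULL */
	struct dentry *lower;	/* hypothetical: real lower dentry or NULL */
};

/*
 * Model of the two-argument d_real:
 *  - NULL inode: return the topmost real underlying dentry.
 *  - non-NULL inode: return the real dentry whose inode matches, which
 *    may be the lower one even after copy-up.
 */
struct dentry *toy_d_real(struct dentry *dentry, const struct inode *inode)
{
	struct dentry *top;

	assert(dentry != NULL);
	top = dentry->upper ? dentry->upper : dentry->lower;
	if (!inode)
		return top;
	if (dentry->upper && dentry->upper->d_inode == inode)
		return dentry->upper;
	if (dentry->lower && dentry->lower->d_inode == inode)
		return dentry->lower;
	/* No match: return the overlay dentry itself (sketch-only choice). */
	return dentry;
}
```

With the open flags argument gone, the copy-up-on-open mode no longer exists in this interface, so the model needs no error return path.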