Merge branch 'locks' of git://linux-nfs.org/~bfields/linux
* 'locks' of git://linux-nfs.org/~bfields/linux:
  nfsd: remove IS_ISMNDLCK macro
  Rework /proc/locks via seq_files and seq_list helpers
  fs/locks.c: use list_for_each_entry() instead of list_for_each()
  NFS: clean up explicit check for mandatory locks
  AFS: clean up explicit check for mandatory locks
  9PFS: clean up explicit check for mandatory locks
  GFS2: clean up explicit check for mandatory locks
  Cleanup macros for distinguishing mandatory locks
  Documentation: move locks.txt in filesystems/
  locks: add warning about mandatory locking races
  Documentation: move mandatory locking documentation to filesystems/
  locks: Fix potential OOPS in generic_setlease()
  Use list_first_entry in locks_wake_up_blocks
  locks: fix flock_lock_file() comment
  Memory shortage can result in inconsistent flocks state
  locks: kill redundant local variable
  locks: reverse order of posix_locks_conflict() arguments
@@ -52,6 +52,10 @@ isofs.txt
- info and mount options for the ISO 9660 (CDROM) filesystem.
jfs.txt
- info and mount options for the JFS filesystem.
locks.txt
- info on file locking implementations, flock() vs. fcntl(), etc.
mandatory-locking.txt
- info on the Linux implementation of Sys V mandatory file locking.
ncpfs.txt
- info on Novell Netware(tm) filesystem using NCP protocol.
ntfs.txt
67
Documentation/filesystems/locks.txt
Normal file
@@ -0,0 +1,67 @@
File Locking Release Notes

Andy Walker <andy@lysaker.kvaerner.no>

12 May 1997

1. What's New?
--------------

1.1 Broken Flock Emulation
--------------------------

The old flock(2) emulation in the kernel was swapped for proper BSD
compatible flock(2) support in the 1.3.x series of kernels. With the
release of the 2.1.x kernel series, support for the old emulation has
been totally removed, so that we don't need to carry this baggage
forever.

This should not cause problems for anybody, since everybody using a
2.1.x kernel should have updated their C library to a suitable version
anyway (see the file "Documentation/Changes".)

1.2 Allow Mixed Locks Again
---------------------------

1.2.1 Typical Problems - Sendmail
---------------------------------
Because sendmail was unable to use the old flock() emulation, many sendmail
installations use fcntl() instead of flock(). This is true of Slackware 3.0
for example. This gave rise to some other subtle problems if sendmail was
configured to rebuild the alias file. Sendmail tried to lock the aliases.dir
file with fcntl() at the same time as the GDBM routines tried to lock this
file with flock(). With pre 1.3.96 kernels this could result in deadlocks that,
over time, or under a very heavy mail load, would eventually cause the kernel
to lock solid with deadlocked processes.

1.2.2 The Solution
------------------
The solution I have chosen, after much experimentation and discussion,
is to make flock() and fcntl() locks oblivious to each other. Both can
exist, and neither will have any effect on the other.

I wanted the two lock styles to be cooperative, but there were so many
race and deadlock conditions that the current solution was the only
practical one. It puts us in the same position as, for example, SunOS
4.1.x and several other commercial Unices. The only OS's that support
cooperative flock()/fcntl() are those that emulate flock() using
fcntl(), with all the problems that implies.

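The independence of the two lock styles described in 1.2.2 can be observed from user space. The following is a minimal sketch, not code from this merge: it assumes a local Linux filesystem, and the function name fcntl_lock_while_flocked() is hypothetical. A child process takes an exclusive flock() lock, and the parent then asks for an fcntl() write lock on the same file.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 0 if an fcntl() write lock was granted
 * while another process held an exclusive flock() lock on the same file,
 * 1 if the fcntl() lock was refused, and -1 on a setup error.
 */
int fcntl_lock_while_flocked(const char *path)
{
    int ready[2], done[2];
    char c = 'e';
    int result = -1;
    pid_t pid;

    if (pipe(ready) < 0)
        return -1;
    if (pipe(done) < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(ready[0]); close(ready[1]);
        close(done[0]);  close(done[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: take the flock() lock, report, then hold it until
         * the parent closes its end of the "done" pipe. */
        int fd;

        close(ready[0]);
        close(done[1]);
        fd = open(path, O_RDWR);
        c = (fd >= 0 && flock(fd, LOCK_EX) == 0) ? 'k' : 'e';
        if (write(ready[1], &c, 1) != 1)
            _exit(1);
        if (read(done[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: wait until the child holds the flock() lock. */
    close(ready[1]);
    close(done[0]);
    if (read(ready[0], &c, 1) == 1 && c == 'k') {
        int fd = open(path, O_RDWR);

        if (fd >= 0) {
            struct flock fl = {
                .l_type = F_WRLCK,
                .l_whence = SEEK_SET,
                .l_start = 0,
                .l_len = 0,        /* 0 means "to end of file" */
            };

            result = (fcntl(fd, F_SETLK, &fl) == 0) ? 0 : 1;
            close(fd);             /* drops the fcntl() lock again */
        }
    }

    close(done[1]);                /* lets the child exit */
    close(ready[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

On a system that emulates flock() using fcntl(), the same call would be expected to return 1, since the two locks would then conflict.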
1.3 Mandatory Locking As A Mount Option
---------------------------------------

Mandatory locking, as described in
'Documentation/filesystems/mandatory-locking.txt', was prior to this release
a general configuration option that was valid for all mounted filesystems.
This had a number of inherent dangers, not the least of which was the
ability to freeze an NFS server by asking it to read a file for which a
mandatory lock existed.

From this release of the kernel, mandatory locking can be turned on and off
on a per-filesystem basis, using the mount options 'mand' and 'nomand'.
The default is to disallow mandatory locking. The intention is that
mandatory locking only be enabled on a local filesystem as the specific need
arises.

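A program that depends on the 'mand' mount option can check for it before relying on mandatory locks. The sketch below is an illustration only and is not part of this merge: the function name fs_allows_mandatory_locks() is hypothetical, and the fallback value of ST_MANDLOCK is an assumption taken from the Linux mount flag of the same meaning.

```c
#include <sys/statvfs.h>

/*
 * glibc only exposes ST_MANDLOCK when _GNU_SOURCE is defined. Fall back
 * to the Linux value (assumed here to be 64, the same bit as MS_MANDLOCK)
 * when the header does not provide it.
 */
#ifndef ST_MANDLOCK
#define ST_MANDLOCK 64
#endif

/*
 * Hypothetical helper. Returns 1 if the filesystem holding 'path' reports
 * the mandatory locking flag, 0 if it does not, and -1 if statvfs() fails.
 */
int fs_allows_mandatory_locks(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) < 0)
        return -1;
    return (sv.f_flag & ST_MANDLOCK) ? 1 : 0;
}
```

Since the default is to disallow mandatory locking, a result of 0 is the expected answer for most mounts.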
171
Documentation/filesystems/mandatory-locking.txt
Normal file
@@ -0,0 +1,171 @@
Mandatory File Locking For The Linux Operating System

Andy Walker <andy@lysaker.kvaerner.no>

15 April 1996
(Updated September 2007)

0. Why you should avoid mandatory locking
-----------------------------------------

The Linux implementation is prey to a number of difficult-to-fix race
conditions which in practice make it not dependable:

  - The write system call checks for a mandatory lock only once
    at its start. It is therefore possible for a lock request to
    be granted after this check but before the data is modified.
    A process may then see file data change even while a mandatory
    lock was held.
  - Similarly, an exclusive lock may be granted on a file after
    the kernel has decided to proceed with a read, but before the
    read has actually completed, and the reading process may see
    the file data in a state which should not have been visible
    to it.
  - Similar races make the claimed mutual exclusion between lock
    and mmap similarly unreliable.

1. What is mandatory locking?
------------------------------

Mandatory locking is kernel enforced file locking, as opposed to the more usual
cooperative file locking used to guarantee sequential access to files among
processes. File locks are applied using the flock() and fcntl() system calls
(and the lockf() library routine which is a wrapper around fcntl().) It is
normally a process' responsibility to check for locks on a file it wishes to
update, before applying its own lock, updating the file and unlocking it again.
The most commonly used example of this (and in the case of sendmail, the most
troublesome) is access to a user's mailbox. The mail user agent and the mail
transfer agent must guard against updating the mailbox at the same time, and
prevent reading the mailbox while it is being updated.

In a perfect world all processes would use and honour a cooperative, or
"advisory" locking scheme. However, the world isn't perfect, and there's
a lot of poorly written code out there.

In trying to address this problem, the designers of System V UNIX came up
with a "mandatory" locking scheme, whereby the operating system kernel would
block attempts by a process to write to a file that another process holds a
"read" -or- "shared" lock on, and block attempts to both read and write to a
file that a process holds a "write" -or- "exclusive" lock on.

The System V mandatory locking scheme was intended to have as little impact as
possible on existing user code. The scheme is based on marking individual files
as candidates for mandatory locking, and using the existing fcntl()/lockf()
interface for applying locks just as if they were normal, advisory locks.

Note 1: In saying "file" in the paragraphs above I am actually not telling
the whole truth. System V locking is based on fcntl(). The granularity of
fcntl() is such that it allows the locking of byte ranges in files, in addition
to entire files, so the mandatory locking rules also have byte level
granularity.

Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
borrowing the fcntl() locking scheme from System V. The mandatory locking
scheme is defined by the System V Interface Definition (SVID) Version 3.

2. Marking a file for mandatory locking
---------------------------------------

A file is marked as a candidate for mandatory locking by setting the group-id
bit in its file mode but removing the group-execute bit. This is an otherwise
meaningless combination, and was chosen by the System V implementors so as not
to break existing user programs.

Note that the group-id bit is usually automatically cleared by the kernel when
a setgid file is written to. This is a security measure. The kernel has been
modified to recognize the special case of a mandatory lock candidate and to
refrain from clearing this bit. Similarly the kernel has been modified not
to run mandatory lock candidates with setgid privileges.

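The mode bit combination from section 2 can be written down directly in C. This is a minimal user-space sketch under stated assumptions: the helper names is_mandatory_candidate() and mark_mandatory_candidate() are hypothetical and are not the macros this merge cleans up in the kernel.

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: true when 'mode' has the group-id bit set and the
 * group-execute bit cleared, the combination described in section 2.
 */
int is_mandatory_candidate(mode_t mode)
{
    return (mode & (S_ISGID | S_IXGRP)) == S_ISGID;
}

/*
 * Hypothetical helper: mark 'path' as a mandatory lock candidate by
 * setting setgid and removing group-execute, keeping the other
 * permission bits. Returns 0 on success, -1 on error (errno is set).
 */
int mark_mandatory_candidate(const char *path)
{
    struct stat st;
    mode_t mode;

    if (stat(path, &st) < 0)
        return -1;
    mode = (st.st_mode & 07777 & ~S_IXGRP) | S_ISGID;
    return chmod(path, mode);
}
```

From a shell, the equivalent of mark_mandatory_candidate() would be "chmod g+s,g-x file".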
3. Available implementations
----------------------------

I have considered the implementations of mandatory locking available with
SunOS 4.1.x, Solaris 2.x and HP-UX 9.x.

Generally I have tried to make the most sense out of the behaviour exhibited
by these three reference systems. There are many anomalies.

All the reference systems reject all calls to open() for a file on which
another process has outstanding mandatory locks. This is in direct
contravention of SVID 3, which states that only calls to open() with the
O_TRUNC flag set should be rejected. The Linux implementation follows the SVID
definition, which is the "Right Thing", since only calls with O_TRUNC can
modify the contents of the file.

HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not
just mandatory locks. That would appear to contravene POSIX.1.

mmap() is another interesting case. All the operating systems mentioned
prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX
also disallows advisory locks for such a file. SVID actually specifies the
paranoid HP-UX behaviour.

In my opinion only MAP_SHARED mappings should be immune from locking, and then
only from mandatory locks - that is what is currently implemented.

SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for
mandatory locks, so reads and writes to locked files always block when they
should return EAGAIN.

I'm afraid that this is such an esoteric area that the semantics described
below are just as valid as any others, so long as the main points seem to
agree.

4. Semantics
------------

1. Mandatory locks can only be applied via the fcntl()/lockf() locking
   interface - in other words the System V/POSIX interface. BSD style
   locks using flock() never result in a mandatory lock.

2. If a process has locked a region of a file with a mandatory read lock, then
   other processes are permitted to read from that region. If any of these
   processes attempts to write to the region it will block until the lock is
   released, unless the process has opened the file with the O_NONBLOCK
   flag in which case the system call will return immediately with the error
   status EAGAIN.

3. If a process has locked a region of a file with a mandatory write lock, all
   attempts to read or write to that region block until the lock is released,
   unless a process has opened the file with the O_NONBLOCK flag in which case
   the system call will return immediately with the error status EAGAIN.

4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has
   any mandatory locks owned by other processes will be rejected with the
   error status EAGAIN.

5. Attempts to apply a mandatory lock to a file that is memory mapped and
   shared (via mmap() with MAP_SHARED) will be rejected with the error status
   EAGAIN.

6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED)
   that has any mandatory locks in effect will be rejected with the error status
   EAGAIN.

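Rules 1 to 3 can be illustrated from the caller's side. The following is a minimal sketch, not code from this merge, and the helper names lock_region() and write_unless_locked() are hypothetical. The lock is taken through fcntl(), and a writer that opened the file with O_NONBLOCK treats EAGAIN as "the region is locked". Whether the lock is actually enforced depends on the file mode and mount option described elsewhere in this document; where it is not enforced, the lock is advisory and the write simply proceeds.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: apply (F_RDLCK, F_WRLCK) or release (F_UNLCK) a
 * lock on 'len' bytes starting at 'start' without blocking.
 * Returns 0 on success, -1 on error (errno is set).
 */
int lock_region(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = start,
        .l_len = len,
    };

    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Hypothetical helper: write 'len' bytes (len > 0) to a descriptor opened
 * with O_NONBLOCK. Returns the number of bytes written, 0 if the write
 * failed with EAGAIN (for example because of a conflicting mandatory
 * lock), or -1 on any other error.
 */
ssize_t write_unless_locked(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0 && errno == EAGAIN)
        return 0;
    return n;
}
```

A process never conflicts with its own fcntl() locks, so the EAGAIN case can only be triggered by a lock held by a different process.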
5. Which system calls are affected?
-----------------------------------

Those which modify a file's contents, not just the inode. That gives read(),
write(), readv(), writev(), open(), creat(), mmap(), truncate() and
ftruncate(). truncate() and ftruncate() are considered to be "write" actions
for the purposes of mandatory locking.

The affected region is usually defined as stretching from the current position
for the total number of bytes read or written. For the truncate calls it is
defined as the bytes of a file removed or added (we must also consider bytes
added, as a lock can specify just "the whole file", rather than a specific
range of bytes.)

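The truncate rule above can be read as a small range computation. This sketch is one interpretation of the text, not the kernel's code; the struct byte_range type and the function truncate_affected_range() are hypothetical names.

```c
#include <sys/types.h>

/* Hypothetical type: a byte range given as a start offset and a length. */
struct byte_range {
    off_t start;
    off_t len;
};

/*
 * Hypothetical helper: the bytes a truncate call removes (when the file
 * shrinks) or adds (when it grows). In both cases the range starts at the
 * smaller of the two sizes and covers the difference between them.
 */
struct byte_range truncate_affected_range(off_t old_size, off_t new_size)
{
    struct byte_range r;

    if (new_size < old_size) {
        r.start = new_size;             /* bytes removed */
        r.len = old_size - new_size;
    } else {
        r.start = old_size;             /* bytes added */
        r.len = new_size - old_size;
    }
    return r;
}
```

For example, shrinking a 100 byte file to 40 bytes and growing a 40 byte file to 100 bytes both touch the 60 bytes starting at offset 40, so a lock anywhere in that range is relevant to either call.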
Note 3: I may have overlooked some system calls that need mandatory lock
checking in my eagerness to get this code out the door. Please let me know, or
better still fix the system calls yourself and submit a patch to me or Linus.

6. Warning!
-----------

Not even root can override a mandatory lock, so runaway processes can wreak
havoc if they lock crucial files. The way around it is to change the file
permissions (remove the setgid bit) before trying to read or write to it.
Of course, that might be a bit tricky if the system is hung :-(