android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Frederic Weisbecker	8050318598	kill-the-bkl/reiserfs: panic in case of lock imbalance Until now, trying to unlock the reiserfs write lock whereas the current task doesn't hold it lead to a simple warning. We should actually warn and panic in this case to avoid the user datas to reach an unstable state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Laurent Riffard <laurent.riffard@free.fr>	2009-09-14 07:18:30 +02:00
Frederic Weisbecker	7e94277050	kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write() reiserfs_commit_write() is always called with the write lock held. Thus the current calls to reiserfs_write_lock() in this function are acquiring the lock recursively. We can safely drop them. This also solves further assumptions for this lock to be really released while calling reiserfs_write_unlock(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Laurent Riffard <laurent.riffard@free.fr>	2009-09-14 07:18:29 +02:00
Frederic Weisbecker	b10ab4c337	kill-the-bkl/reiserfs: fix recursive reiserfs lock in reiserfs_mkdir() reiserfs_mkdir() acquires the reiserfs lock, assuming it has been called from the dir inodes callbacks, without the lock held. But it can also be called from other internal sites such as reiserfs_xattr_init() which already holds the lock. This recursive locking leads to further wrong assumptions. For example, later calls to reiserfs_mutex_lock_safe() won't actually unlock the reiserfs lock the time we acquire a given mutex, creating unexpected lock inversions. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Laurent Riffard <laurent.riffard@free.fr>	2009-09-14 07:18:27 +02:00
Frederic Weisbecker	ae635c0bbd	kill-the-bkl/reiserfs: fix "reiserfs lock" / "inode mutex" lock inversion dependency reiserfs_xattr_init is called with the reiserfs write lock held, but if the ".reiserfs_priv" entry is not created, we take the superblock root directory inode mutex until .reiserfs_priv is created. This creates a lock dependency inversion against other sites such as reiserfs_file_release() which takes an inode mutex and the reiserfs lock after. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Laurent Riffard <laurent.riffard@free.fr>	2009-09-14 07:18:26 +02:00
Frederic Weisbecker	08f14fc896	kill-the-bkl/reiserfs: move the concurrent tree accesses checks per superblock When do_balance() balances the tree, a trick is performed to provide the ability for other tree writers/readers to check whether do_balance() is executing concurrently (requires CONFIG_REISERFS_CHECK). This is done to protect concurrent accesses to the tree. The trick is the following: When do_balance is called, a unique global variable called cur_tb takes a pointer to the current tree to be rebalanced. Once do_balance finishes its work, cur_tb takes the NULL value. Then, concurrent tree readers/writers just have to check the value of cur_tb to ensure do_balance isn't executing concurrently. If it is, then it proves that schedule() occured on do_balance(), which then relaxed the bkl that protected the tree. Now that the bkl has be turned into a mutex, this check is still fine even though do_balance() becomes preemptible: the write lock will not be automatically released on schedule(), so the tree is still protected. But this is only fine if we have a single reiserfs mountpoint. Indeed, because the bkl is a global lock, it didn't allowed concurrent executions between a tree reader/writer in a mount point and a do_balance() on another tree from another mountpoint. So assuming all these readers/writers weren't supposed to be reentrant, the current check now sometimes detect false positives with the current per-superblock mutex which allows this reentrancy. This patch keeps the concurrent tree accesses check but moves it per superblock, so that only trees from a same mount point are checked to be not accessed concurrently. [ Impact: fix spurious panic while running several reiserfs mount-points ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:25 +02:00
Frederic Weisbecker	c72e05756b	kill-the-bkl/reiserfs: acquire the inode mutex safely While searching a pathname, an inode mutex can be acquired in do_lookup() which calls reiserfs_lookup() which in turn acquires the write lock. On the other side reiserfs_fill_super() can acquire the write_lock and then call reiserfs_lookup_privroot() which can acquire an inode mutex (the root of the mount point). So we theoretically risk an AB - BA lock inversion that could lead to a deadlock. As for other lock dependencies found since the bkl to mutex conversion, the fix is to use reiserfs_mutex_lock_safe() which drops the lock dependency to the write lock. [ Impact: fix a possible deadlock with reiserfs ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:24 +02:00
Frederic Weisbecker	2ac626955e	kill-the-bkl/reiserfs: unlock only when needed in search_by_key search_by_key() is the site which most requires the lock. This is mostly because it is a very central function and also because it releases/reaqcuires the write lock at least once each time it is called. Such release/reacquire creates a lot of contention in this place and also opens more the window which let another thread changing the tree. When it happens, the current path searching over the tree must be retried from the beggining (the root) which is a wasteful and time consuming recovery. This patch factorizes two release/reacquire sequences: - reading leaf nodes blocks - reading current block The latter immediately follows the former. The whole sequence is safe as a single unlocked section because we check just after if the tree has changed during these operations. Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:22 +02:00
Frederic Weisbecker	c63e3c0b24	kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe reiserfs_mutex_lock_safe() is a hack to avoid any dependency between an internal reiserfs mutex and the write lock, it has been proposed to follow the old bkl logic. The code does the following: while (!mutex_trylock(m)) { reiserfs_write_unlock(s); schedule(); reiserfs_write_lock(s); } It then imitate the implicit behaviour of the lock when it was a Bkl and hadn't such dependency: mutex_lock(m) { if (fastpath) let's go else { wait_for_mutex() { schedule() { unlock_kernel() reacquire_lock_kernel() } } } } The problem is that by using such explicit schedule(), we don't benefit of the adaptive mutex spinning on owner. The logic in use now is: reiserfs_write_unlock(s); mutex_lock(m); // -> possible adaptive spinning reiserfs_write_lock(s); [ Impact: restore the use of adaptive spinning mutexes in reiserfs ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:21 +02:00
Frederic Weisbecker	d6f5b0aa08	kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end() reiserfs_write_end() is a hot path in reiserfs. We have two wasteful write lock lock/release inside that can be gathered without changing the code logic. This patch factorizes them out in a single protected section, reducing the number of contentions inside. [ Impact: reduce lock contention in a reiserfs hotpath ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:20 +02:00
Frederic Weisbecker	09eb47a7c5	kill-the-bkl/reiserfs: reduce number of contentions in search_by_key() search_by_key() is a central function in reiserfs which searches the patch in the fs tree from the root to a node given its key. It is the function that is most requesting the write lock because it's a path very often used. Also we forget to release the lock while reading the next tree node, making us holding the lock in a wasteful way. Then we release the lock while reading the current node and its childs, all-in-one. It should be safe because we have a reference to these blocks and even if we read a block that will be concurrently changed, we have an fs_changed check later that will make us retry the path from the root. [ Impact: release the write lock while unused in a hot path ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:19 +02:00
Frederic Weisbecker	b1c839bb2d	kill-the-bkl/reiserfs: don't hold the write recursively in reiserfs_lookup() The write lock can be acquired recursively in reiserfs_lookup(). But we may want to really release the lock before possible rescheduling from a reiserfs_lookup() callee. Hence we want to only acquire the lock once (ie: not recursively). [ Impact: prevent from possible false unreleased write lock on sleeping ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:17 +02:00
Frederic Weisbecker	26931309a4	kill-the-bkl/reiserfs: lock only once on reiserfs_get_block() reiserfs_get_block() is one of these sites where the write lock might be acquired recursively. It's a particular problem because this function is called very often. It's a hot spot which needs to reschedule() periodically while converting direct items to indirect ones because it can take some time. Then if we are applying the write lock release/reacquire pattern on schedule() here, it may not produce the desired effect since we may have locked in more than one depth. The solution is to use reiserfs_write_lock_once() which won't try to reacquire the lock recursively. Then the lock will be really released before schedule(). Also, we only release the lock if TIF_NEED_RESCHED is set to not create wasteful numerous contentions. [ Impact: fix a too long holded lock case in reiserfs_get_block() ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:16 +02:00
Frederic Weisbecker	6e3647acb4	kill-the-BKL/reiserfs: release the write lock on flush_commit_list() flush_commit_list() uses ll_rw_block() to commit the pending log blocks. ll_rw_block() might sleep, and the bkl was released at this point. Then we can also relax the write lock at this point. [ Impact: release the reiserfs write lock when it is not needed ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:13 +02:00
Frederic Weisbecker	4c5eface5d	kill-the-BKL/reiserfs: release the write lock inside reiserfs_read_bitmap_block() reiserfs_read_bitmap_block() uses sb_bread() to read the bitmap block. This helper might sleep. Then, when the bkl was used, it was released at this point. We can then relax the write lock too here. [ Impact: release the reiserfs write lock when it is not needed ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:11 +02:00
Frederic Weisbecker	148d3504c1	kill-the-BKL/reiserfs: release the write lock inside get_neighbors() get_neighbors() is used to get the left and/or right blocks against a given one in order to balance a tree. sb_bread() is used to read the buffer of these neighors blocks and while it waits for this operation, it might sleep. The bkl was released at this point, and then we can also release the write lock before calling sb_bread(). This is safe because if the filesystem is changed after this lock release, the function returns REPEAT_SEARCH (aka SCHEDULE_OCCURRED in the function header comments) in order to repeat the neighbhor research. [ Impact: release the reiserfs write lock when it is not needed ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:10 +02:00
Frederic Weisbecker	5e69e3a449	kill-the-BKL/reiserfs: release write lock while rescheduling on prepare_for_delete_or_cut() prepare_for_delete_or_cut() can process several types of items, including indirect items, ie: items which contain no file data but pointers to unformatted nodes scattering the datas of a file. In this case it has to zero out these pointers to block numbers of unformatted nodes and release the bitmap from these block numbers. It can take some time, so a rescheduling() is performed between each block processed. We can safely release the write lock while rescheduling(), like the bkl did, because the code checks just after if the item has moved after sleeping. [ Impact: release the reiserfs write lock when it is not needed ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:09 +02:00
Frederic Weisbecker	e6950a4da3	kill-the-BKL/reiserfs: release the write lock before rescheduling on do_journal_end() When do_journal_end() copies data to the journal blocks buffers in memory, it reschedules if needed between each block copied and dirtyfied. We can also release the write lock at this rescheduling stage, like did the bkl implicitly. [ Impact: release the reiserfs write lock when it is not needed ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:08 +02:00
Frederic Weisbecker	dc8f6d8936	kill-the-BKL/reiserfs: only acquire the write lock once in reiserfs_dirty_inode Impact: fix a deadlock reiserfs_dirty_inode() is the super_operations::dirty_inode() callback of reiserfs. It can be called from different contexts where the write lock can be already held. But this function also grab the write lock (possibly recursively). Subsequent release of the lock before sleep will actually not release the lock if the caller of mark_inode_dirty() (which in turn calls reiserfs_dirty_inode()) already owns the lock. A typical case: reiserfs_write_end() { acquire_write_lock() mark_inode_dirty() { reiserfs_dirty_inode() { reacquire_write_lock() { journal_begin() { do_journal_begin_r() { /* * fail to release, still * one depth of lock */ release_write_lock() reiserfs_wait_on_write_block() { wait_event() The event is usually provided by something which needs the write lock but it hasn't been released. We use reiserfs_write_lock_once() here to ensure we only grab the write lock in one level. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@texware.it> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> LKML-Reference: <1239680065-25013-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-14 07:18:04 +02:00
Frederic Weisbecker	22c963addc	kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file Impact: fix a deadlock reiserfs_truncate_file() can be called from multiple context where the write lock can be already hold or not. This function also acquire (possibly recursively) the write lock. Subsequent releases before sleeping will not actually release the lock because we may be in more than one lock depth degree. A typical case is: reiserfs_file_release { acquire_the_lock() reiserfs_truncate_file() reacquire_the_lock() journal_begin() { do_journal_begin_r() { reiserfs_wait_on_write_block() { /* * Not released because still one * depth owned */ release_lock() wait_for_event() At this stage the event never happen because the one which provides it needs the write lock. We use reiserfs_write_lock_once() here to ensure that we don't acquire the write lock recursively. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@texware.it> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> LKML-Reference: <1239680065-25013-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-14 07:18:03 +02:00
Frederic Weisbecker	daf88c8983	kill-the-BKL/reiserfs: provide a tool to lock only once the write lock Sometimes we don't want to recursively hold the per superblock write lock because we want to be sure it is actually released when we come to sleep. This patch introduces the necessary tools for that. reiserfs_write_lock_once() does the same job than reiserfs_write_lock() except that it won't try to acquire recursively the lock if the current task already owns it. Also the lock_depth before the call of this function is returned. reiserfs_write_unlock_once() unlock only if reiserfs_write_lock_once() returned a depth equal to -1, ie: only if it actually locked. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@texware.it> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> LKML-Reference: <1239680065-25013-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-14 07:18:02 +02:00
Frederic Weisbecker	a412f9efdd	reiserfs, kill-the-BKL: fix unsafe j_flush_mutex lock Impact: fix a deadlock The j_flush_mutex is acquired safely in journal.c: if we can't take it, we free the reiserfs per superblock lock and wait a bit. But we have a remaining place in kupdate_transactions() where j_flush_mutex is still acquired traditionnaly. Thus the following scenario (warned by lockdep) can happen: A B mutex_lock(&write_lock) mutex_lock(&write_lock) mutex_lock(&j_flush_mutex) mutex_lock(&j_flush_mutex) //block mutex_unlock(&write_lock) sleep... mutex_lock(&write_lock) //deadlock Fix this by using reiserfs_mutex_lock_safe() in kupdate_transactions(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@texware.it> Cc: Jeff Mahoney <jeffm@suse.com> LKML-Reference: <1239660635-12940-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-14 07:18:01 +02:00
Frederic Weisbecker	8ebc423238	reiserfs: kill-the-BKL This patch is an attempt to remove the Bkl based locking scheme from reiserfs and is intended. It is a bit inspired from an old attempt by Peter Zijlstra: http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html The bkl is heavily used in this filesystem to prevent from concurrent write accesses on the filesystem. Reiserfs makes a deep use of the specific properties of the Bkl: - It can be acqquired recursively by a same task - It is released on the schedule() calls and reacquired when schedule() returns The two properties above are a roadmap for the reiserfs write locking so it's very hard to simply replace it with a common mutex. - We need a recursive-able locking unless we want to restructure several blocks of the code. - We need to identify the sites where the bkl was implictly relaxed (schedule, wait, sync, etc...) so that we can in turn release and reacquire our new lock explicitly. Such implicit releases of the lock are often required to let other resources producer/consumer do their job or we can suffer unexpected starvations or deadlocks. So the new lock that replaces the bkl here is a per superblock mutex with a specific property: it can be acquired recursively by a same task, like the bkl. For such purpose, we integrate a lock owner and a lock depth field on the superblock information structure. The first axis on this patch is to turn reiserfs_write_(un)lock() function into a wrapper to manage this mutex. Also some explicit calls to lock_kernel() have been converted to reiserfs_write_lock() helpers. The second axis is to find the important blocking sites (schedule...(), wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit release of the write lock on these locations before blocking. Then we can safely wait for those who can give us resources or those who need some. Typically this is a fight between the current writer, the reiserfs workqueue (aka the async commiter) and the pdflush threads. The third axis is a consequence of the second. The write lock is usually on top of a lock dependency chain which can include the journal lock, the flush lock or the commit lock. So it's dangerous to release and trying to reacquire the write lock while we still hold other locks. This is fine with the bkl: T1 T2 lock_kernel() mutex_lock(A) unlock_kernel() // do something lock_kernel() mutex_lock(A) -> already locked by T1 schedule() (and then unlock_kernel()) lock_kernel() mutex_unlock(A) .... This is not fine with a mutex: T1 T2 mutex_lock(write) mutex_lock(A) mutex_unlock(write) // do something mutex_lock(write) mutex_lock(A) -> already locked by T1 schedule() mutex_lock(write) -> already locked by T2 deadlock The solution in this patch is to provide a helper which releases the write lock and sleep a bit if we can't lock a mutex that depend on it. It's another simulation of the bkl behaviour. The last axis is to locate the fs callbacks that are called with the bkl held, according to Documentation/filesystem/Locking. Those are: - reiserfs_remount - reiserfs_fill_super - reiserfs_put_super Reiserfs didn't need to explicitly lock because of the context of these callbacks. But now we must take care of that with the new locking. After this patch, reiserfs suffers from a slight performance regression (for now). On UP, a high volume write with dd reports an average of 27 MB/s instead of 30 MB/s without the patch applied. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Bron Gondwana <brong@fastmail.fm> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-14 07:17:59 +02:00
Benny Halevy	4be36ca0ce	nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-09-13 15:57:39 -04:00
Linus Torvalds	86d710146f	Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (87 commits) NFSv4: Disallow 'mount -t nfs4 -overs=2' and 'mount -t nfs4 -overs=3' NFS: Allow the "nfs" file system type to support NFSv4 NFS: Move details of nfs4_get_sb() to a helper NFS: Refactor NFSv4 text-based mount option validation NFS: Mount option parser should detect missing "port=" NFS: out of date comment regarding O_EXCL above nfs3_proc_create() NFS: Handle a zero-length auth flavor list SUNRPC: Ensure that sunrpc gets initialised before nfs, lockd, etc... nfs: fix compile error in rpc_pipefs.h nfs: Remove reference to generic_osync_inode from a comment SUNRPC: cache must take a reference to the cache detail's module on open() NFS: Use the DNS resolver in the mount code. NFS: Add a dns resolver for use with NFSv4 referrals and migration SUNRPC: Fix a typo in cache_pipefs_files nfs: nfs4xdr: optimize low level decoding nfs: nfs4xdr: get rid of READ_BUF nfs: nfs4xdr: simplify decode_exchange_id by reusing decode_opaque_inline nfs: nfs4xdr: get rid of COPYMEM nfs: nfs4xdr: introduce decode_sessionid helper nfs: nfs4xdr: introduce decode_verifier helper ...	2009-09-11 16:39:11 -07:00
Chris Mason	83ebade34b	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable	2009-09-11 19:07:25 -04:00
Theodore Ts'o	7ad9bb651f	ext4: Fix initalization of s_flex_groups The s_flex_groups array should have been initialized using atomic_add to sum up the free counts from the block groups that make up a flex_bg. By using atomic_set, the value of the s_flex_groups array was set to the values of the last block group in the flex_bg. The impact of this bug is that the block and inode allocation algorithms might not pick the best flex_bg for new allocation. Thanks to Damien Guibouret for pointing out this problem! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-11 16:51:28 -04:00
Linus Torvalds	774a694f8c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits) sched: Fix sched::sched_stat_wait tracepoint field sched: Disable NEW_FAIR_SLEEPERS for now sched: Keep kthreads at default priority sched: Re-tune the scheduler latency defaults to decrease worst-case latencies sched: Turn off child_runs_first sched: Ensure that a child can't gain time over it's parent after fork() sched: enable SD_WAKE_IDLE sched: Deal with low-load in wake_affine() sched: Remove short cut from select_task_rq_fair() sched: Turn on SD_BALANCE_NEWIDLE sched: Clean up topology.h sched: Fix dynamic power-balancing crash sched: Remove reciprocal for cpu_power sched: Try to deal with low capacity, fix update_sd_power_savings_stats() sched: Try to deal with low capacity sched: Scale down cpu_power due to RT tasks sched: Implement dynamic cpu_power sched: Add smt_gain sched: Update the cpu_power sum during load-balance sched: Add SD_PREFER_SIBLING ...	2009-09-11 13:23:18 -07:00
Trond Myklebust	ab3bbaa8b2	Merge branch 'nfs-for-2.6.32'	2009-09-11 14:59:37 -04:00
Chris Mason	93c82d5750	Btrfs: zero page past end of inline file items When btrfs_get_extent is reading inline file items for readpage, it needs to copy the inline extent into the page. If the inline extent doesn't cover all of the page, that means there is a hole in the file, or that our file is smaller than one page. readpage does zeroing for the case where the file is smaller than one page, but nobody is currently zeroing for the case where there is a hole after the inline item. This commit changes btrfs_get_extent to zero fill the page past the end of the inline item. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:08 -04:00
Chris Mason	50a9b214bc	Btrfs: fix btrfs page_mkwrite to return locked page This closes a whole where the page may be written before the page_mkwrite caller has a chance to dirty it (thanks to Nick Piggin) Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:08 -04:00
Chris Mason	a1ed835e1a	Btrfs: Fix extent replacment race Data COW means that whenever we write to a file, we replace any old extent pointers with new ones. There was a window where a readpage might find the old extent pointers on disk and cache them in the extent_map tree in ram in the middle of a given write replacing them. Even though both the readpage and the write had their respective bytes in the file locked, the extent readpage inserts may cover more bytes than it had locked down. This commit closes the race by keeping the new extent pinned in the extent map tree until after the on-disk btree is properly setup with the new extent pointers. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:07 -04:00
Chris Mason	8b62b72b26	Btrfs: Use PagePrivate2 to track pages in the data=ordered code. Btrfs writes go through delalloc to the data=ordered code. This makes sure that all of the data is on disk before the metadata that references it. The tracking means that we have to make sure each page in an extent is fully written before we add that extent into the on-disk btree. This was done in the past by setting the EXTENT_ORDERED bit for the range of an extent when it was added to the data=ordered code, and then clearing the EXTENT_ORDERED bit in the extent state tree as each page finished IO. One of the reasons we had to do this was because sometimes pages are magically dirtied without page_mkwrite being called. The EXTENT_ORDERED bit is checked at writepage time, and if it isn't there, our page become dirty without going through the proper path. These bit operations make for a number of rbtree searches for each page, and can cause considerable lock contention. This commit switches from the EXTENT_ORDERED bit to use PagePrivate2. As pages go into the ordered code, PagePrivate2 is set on each one. This is a cheap operation because we already have all the pages locked and ready to go. As IO finishes, the PagePrivate2 bit is cleared and the ordered accoutning is updated for each page. At writepage time, if the PagePrivate2 bit is missing, we go into the writepage fixup code to handle improperly dirtied pages. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:07 -04:00
Chris Mason	9655d2982b	Btrfs: use a cached state for extent state operations during delalloc This changes the btrfs code to find delalloc ranges in the extent state tree to use the new state caching code from set/test bit. It reduces one of the biggest causes of rbtree searches in the writeback path. test_range_bit is also modified to take the cached state as a starting point while searching. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:07 -04:00
Chris Mason	d5550c6315	Btrfs: don't lock bits in the extent tree during writepage At writepage time, we have the page locked and we have the extent_map entry for this extent pinned in the extent_map tree. So, the page can't go away and its mapping can't change. There is no need for the extra extent_state lock bits during writepage. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:06 -04:00
Chris Mason	2c64c53d8d	Btrfs: cache values for locking extents Many of the btrfs extent state tree users follow the same pattern. They lock an extent range in the tree, do some operation and then unlock. This translates to at least 2 rbtree searches, and maybe more if they are doing operations on the extent state tree. A locked extent in the tree isn't going to be merged or changed, and so we can safely return the extent state structure as a cached handle. This changes set_extent_bit to give back a cached handle, and also changes both set_extent_bit and clear_extent_bit to use the cached handle if it is available. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:06 -04:00
Chris Mason	1edbb734b4	Btrfs: reduce CPU usage in the extent_state tree Btrfs is currently mirroring some of the page state bits into its extent state tree. The goal behind this was to use it in supporting blocksizes other than the page size. But, we don't currently support that, and we're using quite a lot of CPU on the rb tree and its spin lock. This commit starts a series of cleanups to reduce the amount of work done in the extent state tree as part of each IO. This commit: * Adds the ability to lock an extent in the state tree and also set other bits. The idea is to do locking and delalloc in one call * Removes the EXTENT_WRITEBACK and EXTENT_DIRTY bits. Btrfs is using a combination of the page bits and the ordered write code for this instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:06 -04:00
Chris Mason	e48c465bb3	Btrfs: Fix new state initialization order As the extent state tree is manipulated, there are call backs that are used to take extra actions when different state bits are set or cleared. One example of this is a counter for the total number of delayed allocation bytes in a single inode and in the whole FS. When new states are inserted, this callback is being done before we properly setup the new state. This hasn't caused problems before because the lock bit was always done first, and the existing call backs don't care about the lock bit. This patch makes sure the state is properly setup before using the callback, which is important for later optimizations that do more work without using the lock bit. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:05 -04:00
Chris Mason	890871be85	Btrfs: switch extent_map to a rw lock There are two main users of the extent_map tree. The first is regular file inodes, where it is evenly spread between readers and writers. The second is the chunk allocation tree, which maps blocks from logical addresses to phyiscal ones, and it is 99.99% reads. The mapping tree is a point of lock contention during heavy IO workloads, so this commit switches things to a rw lock. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:05 -04:00
Chris Mason	57fd5a5ff8	Btrfs: tweak congestion backoff The btrfs io submission thread tries to back off congested devices in favor of rotating off to another disk. But, it tries to make sure it submits at least some IO before rotating on (the others may be congested too), and so it has a magic number of requests it tries to write before it hops. This makes the magic number smaller. Testing shows that we're spending too much time on congested devices and leaving the other devices idle. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:05 -04:00
Chris Mason	a97adc9fff	Btrfs: use larger nr_to_write for larger extents When btrfs fills a large delayed allocation extent, it is a good idea to try and convince the write_cache_pages caller to go ahead and write a good chunk of that extent. The extra IO is basically free because we know it is contiguous. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:04 -04:00
Chris Mason	4f878e8475	Btrfs: reduce worker thread spin_lock_irq hold times This changes the btrfs worker threads to batch work items into a local list. It allows us to pull work items in large chunks and significantly reduces the number of times we need to take the worker thread spinlock. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:04 -04:00
Chris Mason	4e3f9c5042	Btrfs: keep irqs on more often in the worker threads The btrfs worker thread spinlock was being used both for the queueing of IO and for the processing of ordered events. The ordered events never happen from end_io handlers, and so they don't need to use the _irq version of spinlocks. This adds a dedicated lock to the ordered lists so they don't have to run with irqs off. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:04 -04:00
Chris Mason	40431d6c12	Btrfs: optimize set extent bit The Btrfs set_extent_bit call currently searches the rbtree every time it needs to find more extent_state objects to fill the requested operation. This adds a simple test with rb_next to see if the next object in the tree was adjacent to the one we just found. If so, we skip the search and just use the next object. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:31:03 -04:00
Chris Mason	9042846bc7	Btrfs: Allow worker threads to exit when idle The Btrfs worker threads don't currently die off after they have been idle for a while, leading to a lot of threads sitting around doing nothing for each mount. Also, they are unable to start atomically (from end_io hanlders). This commit reworks the worker threads so they can be started from end_io handlers (just setting a flag that asks for a thread to be added at a later date) and so they can exit if they have been idle for a long time. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-09-11 13:30:56 -04:00
Linus Torvalds	a9c86d4259	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (377 commits) ASoC: au1x: PSC-AC97 bugfixes ALSA: dummy - Increase MAX_PCM_SUBSTREAMS to 128 ALSA: dummy - Add debug proc file ALSA: Add const prefix to proc helper functions ALSA: Re-export snd_pcm_format_name() function ALSA: hda - Use auto model for HP laptops with ALC268 codec ALSA: cs46xx - Fix minimum period size ASoC: Fix WM835x Out4 capture enumeration ALSA: Remove unneeded ifdef from sound/core.h ALSA: Remove struct snd_monitor_file from public sound/core.h ASoC: Remove unuused hw_read_t sound: oxygen: work around MCE when changing volume ALSA: dummy - Fake buffer allocations ALSA: hda/realtek: Added support for CLEVO M540R subsystem, 6 channel + digital ASoC: fix pxa2xx-ac97.c breakage ALSA: dummy - Fix the timer calculation in systimer mode ALSA: dummy - Add more description ALSA: dummy - Better jiffies handling ALSA: dummy - Support high-res timer mode ALSA: Release v1.0.21 ...	2009-09-11 09:19:35 -07:00
Linus Torvalds	a12e4d304c	Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block * 'writeback' of git://git.kernel.dk/linux-2.6-block: writeback: check for registered bdi in flusher add and inode dirty writeback: add name to backing_dev_info writeback: add some debug inode list counters to bdi stats writeback: get rid of pdflush completely writeback: switch to per-bdi threads for flushing data writeback: move dirty inodes from super_block to backing_dev_info writeback: get rid of generic_sync_sb_inodes() export	2009-09-11 09:17:05 -07:00
Linus Torvalds	f6f7919086	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (57 commits) binfmt_elf: fix PT_INTERP bss handling TPM: Fixup boot probe timeout for tpm_tis driver sysfs: Add labeling support for sysfs LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information. VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx. KEYS: Add missing linux/tracehook.h #inclusions KEYS: Fix default security_session_to_parent() Security/SELinux: includecheck fix kernel/sysctl.c KEYS: security_cred_alloc_blank() should return int under all circumstances IMA: open new file for read KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] KEYS: Extend TIF_NOTIFY_RESUME to (almost) all architectures [try #6] KEYS: Do some whitespace cleanups [try #6] KEYS: Make /proc/keys use keyid not numread as file position [try #6] KEYS: Add garbage collection for dead, revoked and expired keys. [try #6] KEYS: Flag dead keys to induce EKEYREVOKED [try #6] KEYS: Allow keyctl_revoke() on keys that have SETATTR but not WRITE perm [try #6] KEYS: Deal with dead-type keys appropriately [try #6] CRED: Add some configurable debugging [try #6] selinux: Support for the new TUN LSM hooks ...	2009-09-11 08:55:49 -07:00
Miklos Szeredi	723590ed52	splice: update mtime and atime on files Splice should update the modification and access times on regular files just like read and write. Not updating mtime will confuse backup tools, etc... This patch only adds the time updates for regular files. For pipes and other special files that splice touches the need for updating the times is less clear. Let's discuss and fix that separately. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:34:33 +02:00
Jens Axboe	1f98a13f62	bio: first step in sanitizing the bio->bi_rw flag testing Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:33:31 +02:00
Jens Axboe	500b067c5e	writeback: check for registered bdi in flusher add and inode dirty Also a debugging aid. We want to catch dirty inodes being added to backing devices that don't do writeback. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 09:20:26 +02:00

... 16 17 18 19 20 ...

16084 Commits