Report separate components (anon, file, and shmem) for PSS in
smaps_rollup.
This helps understand and tune the memory manager behavior in consumer
devices, particularly mobile devices. Many of them (e.g. chromebooks and
Android-based devices) use zram for anon memory, and perform disk reads
for discarded file pages. The difference in latency is large (e.g.
reading a single page from SSD is 30 times slower than decompressing a
zram page on one popular device), thus it is useful to know how much of
the PSS is anon vs. file.
All the information is already present in /proc/pid/smaps, but much more
expensive to obtain because of the large size of that procfs entry.
This patch also removes a small code duplication in smaps_account, which
would have gotten worse otherwise.
Also updated Documentation/filesystems/proc.txt (the smaps section was a
bit stale, and I added a smaps_rollup section) and
Documentation/ABI/testing/procfs-smaps_rollup.
[semenzato@chromium.org: v5]
Link: http://lkml.kernel.org/r/20190626234333.44608-1-semenzato@chromium.org
Link: http://lkml.kernel.org/r/20190626180429.174569-1-semenzato@chromium.org
Signed-off-by: Luigi Semenzato <semenzato@chromium.org>
Acked-by: Yu Zhao <yuzhao@chromium.org>
Cc: Sonny Rao <sonnyrao@chromium.org>
Cc: Yu Zhao <yuzhao@chromium.org>
Cc: Brian Geffon <bgeffon@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit d46eb14b73 ("fs: fsnotify: account fsnotify metadata to
kmemcg") added remote memcg charging for fanotify and inotify event
objects. The aim was to charge the memory to the listener who is
interested in the events but without triggering the OOM killer.
Otherwise there would be security concerns for the listener.
At the time, oom-kill trigger was not in the charging path. A parallel
work added the oom-kill back to charging path i.e. commit 29ef680ae7
("memcg, oom: move out_of_memory back to the charge path"). So to not
trigger oom-killer in the remote memcg, explicitly add
__GFP_RETRY_MAYFAIL to the fanotigy and inotify event allocations.
Link: http://lkml.kernel.org/r/20190514212259.156585-2-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ocfs2 file system uses locking_state file under debugfs to dump each
ocfs2 file system's dlm lock resources, but the dlm lock resources in
memory are becoming more and more after the files were touched by the
user. it will become a bit difficult to analyze these dlm lock resource
records in locking_state file by the upper scripts, though some files
are not active for now, which were accessed long time ago.
Then, I'd like to add last pr/ex unlock times in locking_state file for
each dlm lock resource record, the the upper scripts can use last unlock
time to filter inactive dlm lock resource record.
Link: http://lkml.kernel.org/r/20190611015414.27754-1-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct dlm_migratable_lockres
{
...
struct dlm_migratable_lock ml[0]; // 16 bytes each, begins at byte 112
};
Make use of the struct_size() helper instead of an open-coded version in
order to avoid any potential type mistakes.
So, replace the following form:
sizeof(struct dlm_migratable_lockres) + (mres->num_locks * sizeof(struct dlm_migratable_lock))
with:
struct_size(mres, ml, mres->num_locks)
Notice that, in this case, variable sz is not necessary, hence it is
removed.
This code was detected with the help of Coccinelle.
Link: http://lkml.kernel.org/r/20190605204926.GA24467@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NFSoRDMA client updates for 5.3
New features:
- Add a way to place MRs back on the free list
- Reduce context switching
- Add new trace events
Bugfixes and cleanups:
- Fix a BUG when tracing is enabled with NFSv4.1
- Fix a use-after-free in rpcrdma_post_recvs
- Replace use of xdr_stream_pos in rpcrdma_marshal_req
- Fix occasional transport deadlock
- Fix show_nfs_errors macros, other tracing improvements
- Remove RPCRDMA_REQ_F_PENDING and fr_state
- Various simplifications and refactors
Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In
preparation of using vmalloc() for large report buffer and zone array
allocations used by this function, remove its "gfp_t gfp_mask" argument
and rely on the caller context to use memalloc_noio_save/restore() where
necessary (block layer zone revalidation and dm-zoned I/O error path).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull char / misc driver updates from Greg KH:
"Here is the "large" pull request for char and misc and other assorted
smaller driver subsystems for 5.3-rc1.
It seems that this tree is becoming the funnel point of lots of
smaller driver subsystems, which is fine for me, but that's why it is
getting larger over time and does not just contain stuff under
drivers/char/ and drivers/misc.
Lots of small updates all over the place here from different driver
subsystems:
- habana driver updates
- coresight driver updates
- documentation file movements and updates
- Android binder fixes and updates
- extcon driver updates
- google firmware driver updates
- fsi driver updates
- smaller misc and char driver updates
- soundwire driver updates
- nvmem driver updates
- w1 driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (188 commits)
coresight: Do not default to CPU0 for missing CPU phandle
dt-bindings: coresight: Change CPU phandle to required property
ocxl: Allow contexts to be attached with a NULL mm
fsi: sbefifo: Don't fail operations when in SBE IPL state
coresight: tmc: Smatch: Fix potential NULL pointer dereference
coresight: etm3x: Smatch: Fix potential NULL pointer dereference
coresight: Potential uninitialized variable in probe()
coresight: etb10: Do not call smp_processor_id from preemptible
coresight: tmc-etf: Do not call smp_processor_id from preemptible
coresight: tmc-etr: alloc_perf_buf: Do not call smp_processor_id from preemptible
coresight: tmc-etr: Do not call smp_processor_id() from preemptible
docs: misc-devices: convert files without extension to ReST
fpga: dfl: fme: align PR buffer size per PR datawidth
fpga: dfl: fme: remove copy_to_user() in ioctl for PR
fpga: dfl-fme-mgr: fix FME_PR_INTFC_ID register address.
intel_th: msu: Start read iterator from a non-empty window
intel_th: msu: Split sgt array and pointer in multiwindow mode
intel_th: msu: Support multipage blocks
intel_th: pci: Add Ice Lake NNPI support
intel_th: msu: Fix single mode with disabled IOMMU
...
Pull pstore updates from Kees Cook:
- Improve backward compatibility with older Chromebooks (Douglas
Anderson)
- Refactor debugfs initialization (Greg KH)
- Fix double-free in pstore_mkfile() failure path (Norbert Manthey)
* tag 'pstore-v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore: Fix double-free in pstore_mkfile() failure path
pstore: no need to check return value of debugfs_create functions
pstore/ram: Improve backward compatibility with older Chromebooks
Pull networking updates from David Miller:
"Some highlights from this development cycle:
1) Big refactoring of ipv6 route and neigh handling to support
nexthop objects configurable as units from userspace. From David
Ahern.
2) Convert explored_states in BPF verifier into a hash table,
significantly decreased state held for programs with bpf2bpf
calls, from Alexei Starovoitov.
3) Implement bpf_send_signal() helper, from Yonghong Song.
4) Various classifier enhancements to mvpp2 driver, from Maxime
Chevallier.
5) Add aRFS support to hns3 driver, from Jian Shen.
6) Fix use after free in inet frags by allocating fqdirs dynamically
and reworking how rhashtable dismantle occurs, from Eric Dumazet.
7) Add act_ctinfo packet classifier action, from Kevin
Darbyshire-Bryant.
8) Add TFO key backup infrastructure, from Jason Baron.
9) Remove several old and unused ISDN drivers, from Arnd Bergmann.
10) Add devlink notifications for flash update status to mlxsw driver,
from Jiri Pirko.
11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski.
12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes.
13) Various enhancements to ipv6 flow label handling, from Eric
Dumazet and Willem de Bruijn.
14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van
der Merwe, and others.
15) Various improvements to axienet driver including converting it to
phylink, from Robert Hancock.
16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean.
17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana
Radulescu.
18) Add devlink health reporting to mlx5, from Moshe Shemesh.
19) Convert stmmac over to phylink, from Jose Abreu.
20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from
Shalom Toledo.
21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera.
22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel.
23) Track spill/fill of constants in BPF verifier, from Alexei
Starovoitov.
24) Support bounded loops in BPF, from Alexei Starovoitov.
25) Various page_pool API fixes and improvements, from Jesper Dangaard
Brouer.
26) Just like ipv4, support ref-countless ipv6 route handling. From
Wei Wang.
27) Support VLAN offloading in aquantia driver, from Igor Russkikh.
28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy.
29) Add flower GRE encap/decap support to nfp driver, from Pieter
Jansen van Vuuren.
30) Protect against stack overflow when using act_mirred, from John
Hurley.
31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen.
32) Use page_pool API in netsec driver, Ilias Apalodimas.
33) Add Google gve network driver, from Catherine Sullivan.
34) More indirect call avoidance, from Paolo Abeni.
35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan.
36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek.
37) Add MPLS manipulation actions to TC, from John Hurley.
38) Add sending a packet to connection tracking from TC actions, and
then allow flower classifier matching on conntrack state. From
Paul Blakey.
39) Netfilter hw offload support, from Pablo Neira Ayuso"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits)
net/mlx5e: Return in default case statement in tx_post_resync_params
mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync().
net: dsa: add support for BRIDGE_MROUTER attribute
pkt_sched: Include const.h
net: netsec: remove static declaration for netsec_set_tx_de()
net: netsec: remove superfluous if statement
netfilter: nf_tables: add hardware offload support
net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
net: flow_offload: add flow_block_cb_is_busy() and use it
net: sched: remove tcf block API
drivers: net: use flow block API
net: sched: use flow block API
net: flow_offload: add flow_block_cb_{priv, incref, decref}()
net: flow_offload: add list handling functions
net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
net: flow_offload: add flow_block_cb_setup_simple()
net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
...
The variable buffer_index is being initialized however this is never
read and later it is being reassigned to a new value. The initialization
is redundant and hence can be removed.
Addresses-Coverity: ("Unused Value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David Teigland <teigland@redhat.com>
If the DLM lowcomms stack is shut down before any DLM
traffic can be generated, flush_workqueue() and
destroy_workqueue() can be called on empty send and/or recv
workqueues.
Insert guard conditionals to only call flush_workqueue()
and destroy_workqueue() on workqueues that are not NULL.
Signed-off-by: David Windsor <dwindsor@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Pull m68nommu updates from Greg Ungerer:
"A series of cleanups for the FLAT format binary loader, binfmt_flat,
from Christoph.
The end goal is to support no-MMU on RISC-V, and the last patch
enables that"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
riscv: add binfmt_flat support
binfmt_flat: don't offset the data start
binfmt_flat: move the MAX_SHARED_LIBS definition to binfmt_flat.c
binfmt_flat: remove the persistent argument from flat_get_addr_from_rp
binfmt_flat: provide an asm-generic/flat.h
binfmt_flat: make support for old format binaries optional
binfmt_flat: add a ARCH_HAS_BINFMT_FLAT option
binfmt_flat: add endianess annotations
binfmt_flat: use fixed size type for the on-disk format
binfmt_flat: consolidate two version of flat_v2_reloc_t
binfmt_flat: remove the unused OLD_FLAT_FLAG_RAM definition
binfmt_flat: remove the uapi <linux/flat.h> header
binfmt_flat: replace flat_argvp_envp_on_stack with a Kconfig variable
binfmt_flat: remove flat_old_ram_flag
binfmt_flat: provide a default version of flat_get_relocate_addr
binfmt_flat: remove flat_set_persistent
binfmt_flat: remove flat_reloc_valid
Pull nfsd updates from Bruce Fields:
"Highlights:
- Add a new /proc/fs/nfsd/clients/ directory which exposes some
long-requested information about NFSv4 clients (like open files)
and allows forced revocation of client state.
- Replace the global duplicate reply cache by a cache per network
namespace; previously, a request in one network namespace could
incorrectly match an entry from another, though we haven't seen
this in production. This is the last remaining container bug that
I'm aware of; at this point you should be able to run separate
nfsd's in each network namespace, each with their own set of
exports, and everything should work.
- Cleanup and modify lock code to show the pid of lockd as the owner
of NLM locks. This is the correct version of the bugfix originally
attempted in b8eee0e90f ("lockd: Show pid of lockd for remote
locks")"
* tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux: (34 commits)
nfsd: Make __get_nfsdfs_client() static
nfsd: Make two functions static
nfsd: Fix misuse of strlcpy
sunrpc/cache: remove the exporting of cache_seq_next
nfsd: decode implementation id
nfsd: create xdr_netobj_dup helper
nfsd: allow forced expiration of NFSv4 clients
nfsd: create get_nfsdfs_clp helper
nfsd4: show layout stateids
nfsd: show lock and deleg stateids
nfsd4: add file to display list of client's opens
nfsd: add more information to client info file
nfsd: escape high characters in binary data
nfsd: copy client's address including port number to cl_addr
nfsd4: add a client info file
nfsd: make client/ directory names small ints
nfsd: add nfsd/clients directory
nfsd4: use reference count to free client
nfsd: rename cl_refcount
nfsd: persist nfsd filesystem across mounts
...
Pull gfs2 updates from Andreas Gruenbacher:
"Some relatively minor changes for gfs2:
- An initial batch of obvious cleanups and fixes from Bob's recovery
patch queue.
- Two iomap conversion patches and some cleanups from Christoph
Hellwig.
- A cosmetic cleanup from Kefeng Wang (Huawei).
- Another minor fix and cleanup by me"
* tag 'gfs2-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Remove unused gfs2_iomap_alloc argument
gfs2: don't use buffer_heads in gfs2_allocate_page_backing
gfs2: use iomap_bmap instead of generic_block_bmap
gfs2: mark stuffed_readpage static
gfs2: merge gfs2_writepage_common into gfs2_writepage
gfs2: merge gfs2_writeback_aops and gfs2_ordered_aops
gfs2: remove the unused gfs2_stuffed_write_end function
gfs2: use page_offset in gfs2_page_mkwrite
gfs2: replace more printk with calls to fs_info and friends
gfs2: dump fsid when dumping glock problems
gfs2: simplify gfs2_freeze by removing case
gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN
gfs2: Warn when a journal replay overwrites a rgrp with buffers
gfs2: log which portion of the journal is replayed
gfs2: eliminate tr_num_revoke_rm
gfs2: kthread and remount improvements
gfs2: Use IS_ERR_OR_NULL
gfs2: Clean up freeing struct gfs2_sbd
Pull ext4 updates from Ted Ts'o:
"Many bug fixes and cleanups, and an optimization for case-insensitive
lookups"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix coverity warning on error path of filename setup
ext4: replace ktype default_attrs with default_groups
ext4: rename htree_inline_dir_to_tree() to ext4_inlinedir_to_tree()
ext4: refactor initialize_dirent_tail()
ext4: rename "dirent_csum" functions to use "dirblock"
ext4: allow directory holes
jbd2: drop declaration of journal_sync_buffer()
ext4: use jbd2_inode dirty range scoping
jbd2: introduce jbd2_inode dirty range scoping
mm: add filemap_fdatawait_range_keep_errors()
ext4: remove redundant assignment to node
ext4: optimize case-insensitive lookups
ext4: make __ext4_get_inode_loc plug
ext4: clean up kerneldoc warnigns when building with W=1
ext4: only set project inherit bit for directory
ext4: enforce the immutable flag on open files
ext4: don't allow any modifications to an immutable file
jbd2: fix typo in comment of journal_submit_inode_data_buffers
jbd2: fix some print format mistakes
ext4: gracefully handle ext4_break_layouts() failure during truncate
Pull afs updates from David Howells:
"A set of minor changes for AFS:
- Remove an unnecessary check in afs_unlink()
- Add a tracepoint for tracking callback management
- Add a tracepoint for afs_server object usage
- Use struct_size()
- Add mappings for AFS UAE abort codes to Linux error codes, using
symbolic names rather than hex numbers in the .c file"
* tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Add support for the UAE error table
fs/afs: use struct_size() in kzalloc()
afs: Trace afs_server usage
afs: Add some callback management tracepoints
afs: afs_unlink() doesn't need to check dentry->d_inode
Pull fscrypt updates from Eric Biggers:
- Preparations for supporting encryption on ext4 filesystems where the
filesystem block size is smaller than PAGE_SIZE.
- Don't allow setting encryption policies on dead directories.
- Various cleanups.
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: document testing with xfstests
fscrypt: remove selection of CONFIG_CRYPTO_SHA256
fscrypt: remove unnecessary includes of ratelimit.h
fscrypt: don't set policy for a dead directory
ext4: encrypt only up to last block in ext4_bio_write_page()
ext4: decrypt only the needed block in __ext4_block_zero_page_range()
ext4: decrypt only the needed blocks in ext4_block_write_begin()
ext4: clear BH_Uptodate flag on decryption error
fscrypt: decrypt only the needed blocks in __fscrypt_decrypt_bio()
fscrypt: support decrypting multiple filesystem blocks per page
fscrypt: introduce fscrypt_decrypt_block_inplace()
fscrypt: handle blocksize < PAGE_SIZE in fscrypt_zeroout_range()
fscrypt: support encrypting multiple filesystem blocks per page
fscrypt: introduce fscrypt_encrypt_block_inplace()
fscrypt: clean up some BUG_ON()s in block encryption/decryption
fscrypt: rename fscrypt_do_page_crypto() to fscrypt_crypt_block()
fscrypt: remove the "write" part of struct fscrypt_ctx
fscrypt: simplify bounce page handling
Pull copy_file_range updates from Darrick Wong:
"This fixes numerous parameter checking problems and inconsistent
behaviors in the new(ish) copy_file_range system call.
Now the system call will actually check its range parameters
correctly; refuse to copy into files for which the caller does not
have sufficient privileges; update mtime and strip setuid like file
writes are supposed to do; and allows copying up to the EOF of the
source file instead of failing the call like we used to.
Summary:
- Create a generic copy_file_range handler and make individual
filesystems responsible for calling it (i.e. no more assuming that
do_splice_direct will work or is appropriate)
- Refactor copy_file_range and remap_range parameter checking where
they are the same
- Install missing copy_file_range parameter checking(!)
- Remove suid/sgid and update mtime like any other file write
- Change the behavior so that a copy range crossing the source file's
eof will result in a short copy to the source file's eof instead of
EINVAL
- Permit filesystems to decide if they want to handle
cross-superblock copy_file_range in their local handlers"
* tag 'copy-file-range-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fuse: copy_file_range needs to strip setuid bits and update timestamps
vfs: allow copy_file_range to copy across devices
xfs: use file_modified() helper
vfs: introduce file_modified() helper
vfs: add missing checks to copy_file_range
vfs: remove redundant checks from generic_remap_checks()
vfs: introduce generic_file_rw_checks()
vfs: no fallback for ->copy_file_range
vfs: introduce generic_copy_file_range()
Pull iomap updates from Darrick Wong:
"There are a few fixes for gfs2 but otherwise it's pretty quiet so far.
- Only mark inode dirty at the end of writing to a file (instead of
once for every page written).
- Fix for an accounting error in the page_done callback"
* tag 'iomap-5.3-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: fix page_done callback for short writes
fs: fold __generic_write_end back into generic_write_end
iomap: don't mark the inode dirty in iomap_write_end
Pull ext2, udf and quota updates from Jan Kara:
- some ext2 fixes and cleanups
- a fix of udf bug when extending files
- a fix of quota Q_XGETQSTAT[V] handling
* tag 'for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
ext2: Use kmemdup rather than duplicating its implementation
quota: honor quota type in Q_XGETQSTAT[V] calls
ext2: Always brelse bh on failure in ext2_iget()
ext2: add missing brelse() in ext2_iget()
ext2: Fix a typo in ext2_getattr argument
ext2: fix a typo in comment
ext2: add missing brelse() in ext2_new_inode()
ext2: optimize ext2_xattr_get()
ext2: introduce new helper for xattr entry comparison
ext2: merge xattr next entry check to ext2_xattr_entry_valid()
ext2: code cleanup for ext2_preread_inode()
ext2: code cleanup by using test_opt() and clear_opt()
doc: ext2: update description of quota options for ext2
ext2: Strengthen xattr block checks
ext2: Merge loops in ext2_xattr_set()
ext2: introduce helper for xattr entry validation
ext2: introduce helper for xattr header validation
quota: add dqi_dirty_list description to comment of Dquot List Management
Pull fsnotify updates from Jan Kara:
"This contains cleanups of the fsnotify name removal hook and also a
patch to disable fanotify permission events for 'proc' filesystem"
* tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: get rid of fsnotify_nameremove()
fsnotify: move fsnotify_nameremove() hook out of d_delete()
configfs: call fsnotify_rmdir() hook
debugfs: call fsnotify_{unlink,rmdir}() hooks
debugfs: simplify __debugfs_remove_file()
devpts: call fsnotify_unlink() hook
tracefs: call fsnotify_{unlink,rmdir}() hooks
rpc_pipefs: call fsnotify_{unlink,rmdir}() hooks
btrfs: call fsnotify_rmdir() hook
fsnotify: add empty fsnotify_{unlink,rmdir}() hooks
fanotify: Disallow permission events for proc filesystem
Pull file locking updates from Jeff Layton:
"Just a couple of small lease-related patches this cycle.
One from Ira to add a new tracepoint that fires during lease conflict
checks, and another patch from Amir to reduce false positives when
checking for lease conflicts"
* tag 'locks-v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: eliminate false positive conflicts for write lease
locks: Add trace_leases_conflict
As Park Ju Hyung suggested:
"I'd like to suggest to write down an actual version of f2fs-tools
here as we've seen older versions of fsck doing even more damage
and the users might not have the latest f2fs-tools installed."
This patch give a more detailed info of how we fix such corruption
to user to avoid damageable repair with low version fsck.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
blkoff_off might over 512 due to fs corrupt or security
vulnerability. That should be checked before being using.
Use ENTRIES_IN_SUM to protect invalid value in cur_data_blkoff.
Signed-off-by: Ocean Chen <oceanchen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In umount, we give an constand time to handle pending discard, previously,
in __issue_discard_cmd() we missed to check timeout condition in loop,
result in delaying long time, fix it.
Signed-off-by: Heng Xiao <heng.xiao@unisoc.com>
[Chao Yu: add commit message]
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
RHBZ: 1672539
In smb2_query_symlink(), if we are parsing the error buffer but it is not something
we recognize as a symlink we should return -EINVAL and not -ENOENT.
I.e. the entry does exist, it is just not something we recognize.
Additionally, add check to verify that that the errortag and the reparsetag all make sense.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
We need to chain the earlier bios to the later ones, so that
submit_bio_wait waits on the bio that all the completions are
dispatched to.
Fixes: 6ad5b3255b ("xfs: use bios directly to read and write the log recovery buffers")
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When writeback IOs are bounced through async layers, the IOs should
only be accounted against the wbc from the original bdi writeback to
avoid confusing cgroup inode ownership arbitration. Add
wbc->no_cgroup_owner to allow disabling wbc cgroup owner accounting.
This will be used make btrfs compression work well with cgroup IO
control.
v2: Renamed from no_wbc_acct to no_cgroup_owner and added comment as
per Jan.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
wbc_account_io() does a very specific job - try to see which cgroup is
actually dirtying an inode and transfer its ownership to the majority
dirtier if needed. The name is too generic and confusing. Let's
rename it to something more specific.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
btrfs is going to use css_put() and wbc helpers to improve cgroup
writeback support. Add dummy css_get() definition and export wbc
helpers to prepare for module and !CONFIG_CGROUP builds.
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, running into a shrink list that contains dentries from different
filesystems can cause several unpleasant things for shrink_dcache_parent()
and for umount(2).
The first problem is that there's a window during shrink_dentry_list() between
__dentry_kill() takes a victim out and dropping reference to its parent. During
that window the parent looks like a genuine busy dentry. shrink_dcache_parent()
(or, worse yet, shrink_dcache_for_umount()) coming at that time will see no
eviction candidates and no indication that it needs to wait for some
shrink_dentry_list() to proceed further.
That applies for any shrink list that might intersect with the subtree we are
trying to shrink; the only reason it does not blow on umount(2) in the mainline
is that we unregister the memory shrinker before hitting shrink_dcache_for_umount().
Another problem happens if something in a mixed-filesystem shrink list gets
be stuck in e.g. iput(), getting umount of unrelated fs to spin waiting for
the stuck shrinker to get around to our dentries.
Solution:
1) have shrink_dentry_list() decrement the parent's refcount and
make sure it's on a shrink list (ours unless it already had been on some
other) before calling __dentry_kill(). That eliminates the window when
shrink_dcache_parent() would've blown past the entire subtree without
noticing anything with zero refcount not on shrink lists.
2) when shrink_dcache_parent() has found no eviction candidates,
but some dentries are still sitting on shrink lists, rather than
repeating the scan in hope that shrinkers have progressed, scan looking
for something on shrink lists with zero refcount. If such a thing is
found, grab rcu_read_lock() and stop the scan, with caller locking
it for eviction, dropping out of RCU and doing __dentry_kill(), with
the same treatment for parent as shrink_dentry_list() would do.
Note that right now mixed-filesystem shrink lists do not occur, so this
is not a mainline bug. Howevere, there's a bunch of uses for such
beasts (e.g. the "try and evict everything we can out of given page"
patches; there are potential uses in mount-related code, considerably
simplifying the life in fs/namespace.c, etc.)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In some cases, using the 'truncate' command to extend a UDF file results
in a mismatch between the length of the file's extents (specifically, due
to incorrect length of the final NOT_ALLOCATED extent) and the information
(file) length. The discrepancy can prevent other operating systems
(i.e., Windows 10) from opening the file.
Two particular errors have been observed when extending a file:
1. The final extent is larger than it should be, having been rounded up
to a multiple of the block size.
B. The final extent is not shorter than it should be, due to not having
been updated when the file's information length was increased.
[JK: simplified udf_do_extend_final_block(), fixed up some types]
Fixes: 2c948b3f86 ("udf: Avoid IO in udf_clear_inode")
CC: stable@vger.kernel.org
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Link: https://lore.kernel.org/r/1561948775-5878-1-git-send-email-steve@digidescorp.com
Signed-off-by: Jan Kara <jack@suse.cz>
Fix sparse warning:
fs/nfsd/nfsctl.c:1221:22: warning:
symbol '__get_nfsdfs_client' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>