Commit Graph

68698 Commits

Author SHA1 Message Date
Fengnan Chang
ed493d61fe FROMGIT: f2fs: fix to use WHINT_MODE
Since active_logs can be set to 2 or 4 or NR_CURSEG_PERSIST_TYPE(6),
it cannot be set to NR_CURSEG_TYPE(8).
That is, whint_mode is always off.

Therefore, the condition is changed from NR_CURSEG_TYPE to
NR_CURSEG_PERSIST_TYPE.

Bug: 202812742
Cc: Chao Yu <chao@kernel.org>
Fixes: d0b9e42ab6 (f2fs: introduce inmem curseg)
Reported-by: tanghuan <tanghuan@vivo.com>
Signed-off-by: Keoseong Park <keosung.park@samsung.com>
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 011e0868e0cf1237675b22e36fffa958fb08f46e
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/ dev)

Change-Id: Ic2a2b055e1f37f6263d8243527e83fe6bd4f7070
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
2021-10-14 04:22:27 +00:00
Gao Xiang
dcd77f0b74 UPSTREAM: erofs: fix 1 lcluster-sized pcluster for big pcluster
If the 1st NONHEAD lcluster of a pcluster isn't CBLKCNT lcluster type
rather than a HEAD or PLAIN type instead, which means its pclustersize
_must_ be 1 lcluster (since its uncompressed size < 2 lclusters),
as illustrated below:

       HEAD     HEAD / PLAIN    lcluster type
   ____________ ____________
  |_:__________|_________:__|   file data (uncompressed)
   .                .
  .____________.
  |____________|                pcluster data (compressed)

Such on-disk case was explained before [1] but missed to be handled
properly in the runtime implementation.

It can be observed if manually generating 1 lcluster-sized pcluster
with 2 lclusters (thus CBLKCNT doesn't exist.) Let's fix it now.

[1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org

Link: https://lore.kernel.org/r/20210510064715.29123-1-xiang@kernel.org
Fixes: cec6e93beadf ("erofs: support parsing big pcluster compress indexes")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <xiang@kernel.org>

Bug: 201372112
Change-Id: I7e46baa993790f8908287ac36e19d32e536116db
(cherry picked from commit 0852b6ca941ef3ff75076e85738877bd3271e1cd)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:33:46 +00:00
Gao Xiang
e085d3f0d0 UPSTREAM: erofs: enable big pcluster feature
Enable COMPR_CFGS and BIG_PCLUSTER since the implementations are
all settled properly.

Link: https://lore.kernel.org/r/20210407043927.10623-11-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I85a9045c146b6877eb371904e82d0481e21bbb75
(cherry picked from commit 8e6c8fa9f2e95c88a642521a5da19a8e31748846)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:33:20 +00:00
Gao Xiang
ed0607cc52 UPSTREAM: erofs: support decompress big pcluster for lz4 backend
Prior to big pcluster, there was only one compressed page so it'd
easy to map this. However, when big pcluster is enabled, more work
needs to be done to handle multiple compressed pages. In detail,

 - (maptype 0) if there is only one compressed page + no need
   to copy inplace I/O, just map it directly what we did before;

 - (maptype 1) if there are more compressed pages + no need to
   copy inplace I/O, vmap such compressed pages instead;

 - (maptype 2) if inplace I/O needs to be copied, use per-CPU
   buffers for decompression then.

Another thing is how to detect inplace decompression is feasable or
not (it's still quite easy for non big pclusters), apart from the
inplace margin calculation, inplace I/O page reusing order is also
needed to be considered for each compressed page. Currently, if the
compressed page is the xth page, it shouldn't be reused as [0 ...
nrpages_out - nrpages_in + x], otherwise a full copy will be triggered.

Although there are some extra optimization ideas for this, I'd like
to make big pcluster work correctly first and obviously it can be
further optimized later since it has nothing with the on-disk format
at all.

Link: https://lore.kernel.org/r/20210407043927.10623-10-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I8e8b90bd0a401850ea81d895dc55e5d8a2772ed7
(cherry picked from commit 598162d050801e556750defff4ddab499e5d76ed)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:32:20 +00:00
Gao Xiang
d34cb6cdc0 UPSTREAM: erofs: support parsing big pcluster compact indexes
Different from non-compact indexes, several lclusters are packed
as the compact form at once and an unique base blkaddr is stored for
each pack, so each lcluster index would take less space on avarage
(e.g. 2 bytes for COMPACT_2B.) btw, that is also why BIG_PCLUSTER
switch should be consistent for compact head0/1.

Prior to big pcluster, the size of all pclusters was 1 lcluster.
Therefore, when a new HEAD lcluster was scanned, blkaddr would be
bumped by 1 lcluster. However, that way doesn't work anymore for
big pcluster since we actually don't know the compressed size of
pclusters in advance (before reading CBLKCNT lcluster).

So, instead, let blkaddr of each pack be the first pcluster blkaddr
with a valid CBLKCNT, in detail,

 1) if CBLKCNT starts at the pack, this first valid pcluster is
    itself, e.g.
  _____________________________________________________________
 |_CBLKCNT0_|_NONHEAD_| .. |_HEAD_|_CBLKCNT1_| ... |_HEAD_| ...
 ^ = blkaddr base          ^ += CBLKCNT0           ^ += CBLKCNT1

 2) if CBLKCNT doesn't start at the pack, the first valid pcluster
    is the next pcluster, e.g.
  _________________________________________________________
 | NONHEAD_| .. |_HEAD_|_CBLKCNT0_| ... |_HEAD_|_HEAD_| ...
                ^ = blkaddr base        ^ += CBLKCNT0
                                               ^ += 1

When a CBLKCNT is found, blkaddr will be increased by CBLKCNT
lclusters, or a new HEAD is found immediately, bump blkaddr by 1
instead (see the picture above.)

Also noted if CBLKCNT is the end of the pack, instead of storing
delta1 (distance of the next HEAD lcluster) as normal NONHEADs,
it still uses the compressed block count (delta0) since delta1
can be calculated indirectly but the block count can't.

Adjust decoding logic to fit big pcluster compact indexes as well.

Link: https://lore.kernel.org/r/20210407043927.10623-9-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I09d6bb4c9d390dc6169cb9dd4efbc7fd6b3be5d0
(cherry picked from commit b86269f43892316ef5a177d7180d09d101a46f22)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:31:56 +00:00
Gao Xiang
051d76b899 UPSTREAM: erofs: support parsing big pcluster compress indexes
When INCOMPAT_BIG_PCLUSTER sb feature is enabled, legacy compress indexes
will also have the same on-disk header compact indexes to keep per-file
configurations instead of leaving it zeroed.

If ADVISE_BIG_PCLUSTER is set for a file, CBLKCNT will be loaded for each
pcluster in this file by parsing 1st non-head lcluster.

Link: https://lore.kernel.org/r/20210407043927.10623-8-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I97b8d1cc54e057789d22f1cdfafc21ed6a69b149
(cherry picked from commit cec6e93beadfd145758af2c0854fcc2abb8170cb)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:31:46 +00:00
Gao Xiang
d149931601 UPSTREAM: erofs: adjust per-CPU buffers according to max_pclusterblks
Adjust per-CPU buffers on demand since big pcluster definition is
available. Also, bail out unsupported pcluster size according to
Z_EROFS_PCLUSTER_MAX_SIZE.

Link: https://lore.kernel.org/r/20210407043927.10623-7-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I86665be0519f328614d93aa24cb06043655576b9
(cherry picked from commit 4fea63f7d76e425965033938bab6488e48579e3f)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:31:21 +00:00
Gao Xiang
95a1d5df84 UPSTREAM: erofs: add big physical cluster definition
Big pcluster indicates the size of compressed data for each physical
pcluster is no longer fixed as block size, but could be more than 1
block (more accurately, 1 logical pcluster)

When big pcluster feature is enabled for head0/1, delta0 of the 1st
non-head lcluster index will keep block count of this pcluster in
lcluster size instead of 1. Or, the compressed size of pcluster
should be 1 lcluster if pcluster has no non-head lcluster index.

Also note that BIG_PCLUSTER feature reuses COMPR_CFGS feature since
it depends on COMPR_CFGS and will be released together.

Link: https://lore.kernel.org/r/20210407043927.10623-6-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: Ibd33f2764f4420f370f9293ed325efdba9ea70a7
(cherry picked from commit 5404c33010cb8ee063c05376d4a2eba129872281)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:30:24 +00:00
Gao Xiang
8043aaed1d UPSTREAM: erofs: fix up inplace I/O pointer for big pcluster
When picking up inplace I/O pages, it should be traversed in reverse
order in aligned with the traversal order of file-backed online pages.
Also, index should be updated together when preloading compressed pages.

Previously, only page-sized pclustersize was supported so no problem
at all. Also rename `compressedpages' to `icpage_ptr' to reflect its
functionality.

Link: https://lore.kernel.org/r/20210407043927.10623-5-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I752769d004c18f74636bcfdea767572daa6f7072
(cherry picked from commit 81382f5f5cb0c9c5694c19d36460f757a8c96841)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:29:52 +00:00
Gao Xiang
6ad2f8f169 UPSTREAM: erofs: introduce physical cluster slab pools
Since multiple pcluster sizes could be used at once, the number of
compressed pages will become a variable factor. It's necessary to
introduce slab pools rather than a single slab cache now.

This limits the pclustersize to 1M (Z_EROFS_PCLUSTER_MAX_SIZE), and
get rid of the obsolete EROFS_FS_CLUSTER_PAGE_LIMIT, which has no
use now.

Link: https://lore.kernel.org/r/20210407043927.10623-4-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I821f3cf35820dbb320eeedc3ae934fd1d455dfd7
(cherry picked from commit 9f6cc76e6ff0631a99cd94eab8af137057633a52)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:29:22 +00:00
Gao Xiang
432f58b100 UPSTREAM: erofs: introduce multipage per-CPU buffers
To deal the with the cases which inplace decompression is infeasible
for some inplace I/O. Per-CPU buffers was introduced to get rid of page
allocation latency and thrash for low-latency decompression algorithms
such as lz4.

For the big pcluster feature, introduce multipage per-CPU buffers to
keep such inplace I/O pclusters temporarily as well but note that
per-CPU pages are just consecutive virtually.

When a new big pcluster fs is mounted, its max pclustersize will be
read and per-CPU buffers can be growed if needed. Shrinking adjustable
per-CPU buffers is more complex (because we don't know if such size
is still be used), so currently just release them all when unloading.

Link: https://lore.kernel.org/r/20210409190630.19569-1-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I5b3ab69ef58aaea635911b0c08cceeb4b38122da
(cherry picked from commit 524887347fcb67faa0a63dd3c4c02ab48d4968d4)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:28:24 +00:00
Vladimir Zapolskiy
571c9a0bd3 UPSTREAM: erofs: remove a void EROFS_VERSION macro set in Makefile
Since commit 4f761fa253 ("erofs: rename errln/infoln/debugln to
erofs_{err, info, dbg}") the defined macro EROFS_VERSION has no affect,
therefore removing it from the Makefile is a non-functional change.

Link: https://lore.kernel.org/r/20201030122839.25431-1-vladimir@tuxera.com
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I06df2883e85d4bfdc08bccb7aa7c014570cd156b
(cherry picked from commit a426ce9d6751cc8e709f031fa546900e4239f125)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:27:55 +00:00
Gao Xiang
431d73396d UPSTREAM: erofs: reserve physical_clusterbits[]
Formal big pcluster design is actually more powerful / flexable than
the previous thought whose pclustersize was fixed as power-of-2 blocks,
which was obviously inefficient and space-wasting. Instead, pclustersize
can now be set independently for each pcluster, so various pcluster
sizes can also be used together in one file if mkfs wants (for example,
according to data type and/or compression ratio).

Let's get rid of previous physical_clusterbits[] setting (also notice
that corresponding on-disk fields are still 0 for now). Therefore,
head1/2 can be used for at most 2 different algorithms in one file and
again pclustersize is now independent of these.

Link: https://lore.kernel.org/r/20210407043927.10623-2-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I7db236f06b202949b882a366b347f0c4e9dc0c3e
(cherry picked from commit 54e0b6c873dcbd02b9b479c893f6fba8fcbc6a9c)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:27:22 +00:00
Ruiqi Gong
89dbc6246a UPSTREAM: erofs: Clean up spelling mistakes found in fs/erofs
zmap.c: s/correspoinding/corresponding
zdata.c: s/endding/ending

Link: https://lore.kernel.org/r/20210331093920.31923-1-gongruiqi1@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ruiqi Gong <gongruiqi1@huawei.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: Ie848ca314462a8093fe94405b260c12b23b663b9
(cherry picked from commit fe6adcce7e297fcb49f40c62df42334690c3f848)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:26:25 +00:00
Gao Xiang
ac1f14e9d5 UPSTREAM: erofs: add on-disk compression configurations
Add a bitmap for available compression algorithms and a variable-sized
on-disk table for compression options in preparation for upcoming big
pcluster and LZMA algorithm, which follows the end of super block.

To parse the compression options, the bitmap is scanned one by one.
For each available algorithm, there is data followed by 2-byte `length'
correspondingly (it's enough for most cases, or entire fs blocks should
be used.)

With such available algorithm bitmap, kernel itself can also refuse to
mount such filesystem if any unsupported compression algorithm exists.

Note that COMPR_CFGS feature will be enabled with BIG_PCLUSTER.

Link: https://lore.kernel.org/r/20210329100012.12980-1-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I9f9c69d5cd05da8c5f17e3650511e83c5f89200e
(cherry picked from commit 14373711dd54be8a84e2f4f624bc58787f80cfbd)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:26:06 +00:00
Gao Xiang
cd21e62366 UPSTREAM: erofs: introduce on-disk lz4 fs configurations
Introduce z_erofs_lz4_cfgs to store all lz4 configurations.
Currently it's only max_distance, but will be used for new
features later.

Link: https://lore.kernel.org/r/20210329012308.28743-4-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: Ib9873b530bdc82f8cc5e79ac85ede737d883ff12
(cherry picked from commit 46249cded18ac0c4ffb7b177219510a133a51c00)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:25:23 +00:00
Gao Xiang
e17fd2ac9d UPSTREAM: erofs: introduce erofs_sb_has_xxx() helpers
Introduce erofs_sb_has_xxx() to make long checks short, especially
for later big pcluster & LZMA features.

Link: https://lore.kernel.org/r/20210329012308.28743-2-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I145fa9c670284b609b59a246f918fb09dc562356
(cherry picked from commit de06a6a375414be03ce5b1054f2d836591923a1d)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:24:39 +00:00
Yue Hu
ba1a3d1fb2 UPSTREAM: erofs: don't use erofs_map_blocks() any more
Currently, erofs_map_blocks() will be called only from
erofs_{bmap, read_raw_page} which are all for uncompressed files.
So, the compression branch in erofs_map_blocks() is pointless. Let's
remove it and use erofs_map_blocks_flatmode() directly. Also update
related comments.

Link: https://lore.kernel.org/r/20210325071008.573-1-zbestahu@gmail.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: Ic3362c62d7c917f05f1cd11120ee6280a22221b8
(cherry picked from commit 8137824eddd2e790c61c70c20d70a087faca95fa)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:23:24 +00:00
Gao Xiang
384b2cdaf8 UPSTREAM: erofs: complete a missing case for inplace I/O
Add a missing case which could cause unnecessary page allocation but
not directly use inplace I/O instead, which increases runtime extra
memory footprint.

The detail is, considering an online file-backed page, the right half
of the page is chosen to be cached (e.g. the end page of a readahead
request) and some of its data doesn't exist in managed cache, so the
pcluster will be definitely kept in the submission chain. (IOWs, it
cannot be decompressed without I/O, e.g., due to the bypass queue).

Currently, DELAYEDALLOC/TRYALLOC cases can be downgraded as NOINPLACE,
and stop online pages from inplace I/O. After this patch, unneeded page
allocations won't be observed in pickup_page_for_submission() then.

Link: https://lore.kernel.org/r/20210321183227.5182-1-hsiangkao@aol.com
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>

Bug: 201372112
Change-Id: I639279e87b749b8625d2a8e77055a9fc52073542
(cherry picked from commit 0b964600d3aae56ff9d5bdd710a79f39a44c572c)
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
2021-09-28 08:21:51 +00:00
Minchan Kim
a9ac6ae90e BACKPORT: UPSTREAM: mm: fs: invalidate bh_lrus for only cold path
kernel test robot reported the regression of fio.write_iops[1] with [2].

Since lru_add_drain is called frequently, invalidate bh_lrus there could
increase bh_lrus cache miss ratio, which needs more IO in the end.

This patch moves the bh_lrus invalidation from the hot path( e.g.,
zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all,
lru_cache_disable).

"Xing, Zhengjun" confirmed

: I test the patch, the regression reduced to -2.9%.

[1] https://lore.kernel.org/lkml/20210520083144.GD14190@xsang-OptiPlex-9020/
[2] 8cc621d2f45d, mm: fs: invalidate BH LRU during page migration

Bug: 194673488
Link: https://lkml.kernel.org/r/20210907212347.1977686-1-minchan@kernel.org
(cherry picked from commit 243418e3925d5b5b0657ae54c322d43035e97eed)
[Chris: resolved conflicts due to Minchan's AOSP LRU commits]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Tested-by: "Xing, Zhengjun" <zhengjun.xing@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Change-Id: Icc5e456b058df516480b4378853464d6d7b43505
2021-09-27 17:48:37 -07:00
Alessio Balsini
cfc0a49c73 ANDROID: fs/fuse: Keep FUSE file times consistent with lower file
When FUSE passthrough is used, the lower file system file is manipulated
directly, but neither mtime, atime or ctime of the referencing FUSE file
is updated.

Fix by updating the file times when passthrough operations are
performed.

Bug: 200779468
Reported-by: Fengnan Chang <changfengnan@vivo.com>
Reported-by: Ed Tsai <ed.tsai@mediatek.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I35b72196b2cc1d79a9f62ddb32e2cfa934c3b6d3
2021-09-24 13:30:08 +00:00
Biao Li
4652709913 ANDROID: fuse: Allocate zeroed memory for canonical path
The page used to contain the fuse_dentry_canonical_path to be handled in
fuse_dev_do_write is allocated using __get_free_pages(GFP_KERNEL).
The returned page may contain undefined data, that by chance may be
considered as a valid path name that is not in the cache. In that case,
if the FUSE daemon mistakenly doesn't fill the canonical path buffer,
the FUSE driver may fall into two blocking

  request_wait_answer(fuse_dev_write->kern_path->fuse_lookup_name)

causing a deadlock condition.

The stack is as follows:
find            S    0 20511  20117 0x00000000
Call trace:
[<ffffff8008085e78>] __switch_to+0xb8/0xd4
[<ffffff8008a0cac4>] __schedule+0x458/0x714
[<ffffff8008a0ce0c>] schedule+0x8c/0xa8
[<ffffff800833865c>] request_wait_answer+0x74/0x220
[<ffffff8008339f70>] __fuse_request_send+0x8c/0xa0
[<ffffff8008339fe4>] fuse_request_send+0x60/0x6c
[<ffffff800833c1a8>] fuse_dentry_canonical_path+0xb8/0x104
[<ffffff800820b14c>] do_sys_open+0x1b4/0x260
[<ffffff800820b27c>] SyS_openat+0x3c/0x4c
[<ffffff8008083540>] el0_svc_naked+0x34/0x38
mount.ntfs-3g   S    0  5845      1 0x00000000
Call trace:
[<ffffff8008085e78>] __switch_to+0xb8/0xd4
[<ffffff8008a0cac4>] __schedule+0x458/0x714
[<ffffff8008a0ce0c>] schedule+0x8c/0xa8
[<ffffff800833865c>] request_wait_answer+0x74/0x220
[<ffffff8008339f70>] __fuse_request_send+0x8c/0xa0
[<ffffff8008339fe4>] fuse_request_send+0x60/0x6c
[<ffffff800833bdb0>] fuse_simple_request+0x128/0x16c
[<ffffff800833dddc>] fuse_lookup_name+0x104/0x1b0
[<ffffff800833dee4>] fuse_lookup+0x5c/0x11c
[<ffffff800821861c>] lookup_slow+0xfc/0x174
[<ffffff800821b474>] walk_component+0xf0/0x290
[<ffffff800821bbac>] path_lookupat+0xa0/0x128
[<ffffff800821c7f4>] filename_lookup+0x84/0x124
[<ffffff800821c8d8>] kern_path+0x44/0x54
[<ffffff800833b0c8>] fuse_dev_do_write+0x828/0xa0c
[<ffffff800833b610>] fuse_dev_write+0x90/0xb4
[<ffffff800820b770>] do_iter_readv_writev+0xf4/0x13c
[<ffffff800820cc88>] do_readv_writev+0xec/0x220
[<ffffff800820d05c>] vfs_writev+0x60/0x74
[<ffffff800820d0ec>] do_writev+0x7c/0x100
[<ffffff800820e348>] SyS_writev+0x38/0x48
[<ffffff8008083540>] el0_svc_naked+0x34/0x38

Fix by ensuring that the page allocated for the canonical path is zeroed.

Bug: 194856119
Bug: 196051870
Fixes: 24ab59f6bb42 ("ANDROID: fuse: Add support for d_canonical_path")
Signed-off-by: Biao Li <libiao@allwinnertech.com>
Signed-off-by: Shuosheng Huang <huangshuosheng@allwinnertech.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I400815dc1049d90c308f5cf87ce60de97ff82131
2021-09-17 09:18:17 +01:00
Jaegeuk Kim
96db9b84a6 FROMGIT: f2fs: should use GFP_NOFS for directory inodes
We use inline_dentry which requires to allocate dentry page when adding a link.
If we allow to reclaim memory from filesystem, we do down_read(&sbi->cp_rwsem)
twice by f2fs_lock_op(). I think this should be okay, but how about stopping
the lockdep complaint [1]?

f2fs_create()
 - f2fs_lock_op()
 - f2fs_do_add_link()
  - __f2fs_find_entry
   - f2fs_get_read_data_page()
   -> kswapd
    - shrink_node
     - f2fs_evict_inode
      - f2fs_lock_op()

[1]

fs_reclaim
){+.+.}-{0:0}
:
kswapd0:        lock_acquire+0x114/0x394
kswapd0:        __fs_reclaim_acquire+0x40/0x50
kswapd0:        prepare_alloc_pages+0x94/0x1ec
kswapd0:        __alloc_pages_nodemask+0x78/0x1b0
kswapd0:        pagecache_get_page+0x2e0/0x57c
kswapd0:        f2fs_get_read_data_page+0xc0/0x394
kswapd0:        f2fs_find_data_page+0xa4/0x23c
kswapd0:        find_in_level+0x1a8/0x36c
kswapd0:        __f2fs_find_entry+0x70/0x100
kswapd0:        f2fs_do_add_link+0x84/0x1ec
kswapd0:        f2fs_mkdir+0xe4/0x1e4
kswapd0:        vfs_mkdir+0x110/0x1c0
kswapd0:        do_mkdirat+0xa4/0x160
kswapd0:        __arm64_sys_mkdirat+0x24/0x34
kswapd0:        el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8
kswapd0:        do_el0_svc+0x28/0xa0
kswapd0:        el0_svc+0x24/0x38
kswapd0:        el0_sync_handler+0x88/0xec
kswapd0:        el0_sync+0x1c0/0x200
kswapd0:
-> #1
(
&sbi->cp_rwsem
){++++}-{3:3}
:
kswapd0:        lock_acquire+0x114/0x394
kswapd0:        down_read+0x7c/0x98
kswapd0:        f2fs_do_truncate_blocks+0x78/0x3dc
kswapd0:        f2fs_truncate+0xc8/0x128
kswapd0:        f2fs_evict_inode+0x2b8/0x8b8
kswapd0:        evict+0xd4/0x2f8
kswapd0:        iput+0x1c0/0x258
kswapd0:        do_unlinkat+0x170/0x2a0
kswapd0:        __arm64_sys_unlinkat+0x4c/0x68
kswapd0:        el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8
kswapd0:        do_el0_svc+0x28/0xa0
kswapd0:        el0_svc+0x24/0x38
kswapd0:        el0_sync_handler+0x88/0xec
kswapd0:        el0_sync+0x1c0/0x200

Bug: 198991863
Cc: stable@vger.kernel.org
Fixes: bdbc90fa55 ("f2fs: don't put dentry page in pagecache into highmem")
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Light Hsieh <light.hsieh@mediatek.com>
Tested-by: Light Hsieh <light.hsieh@mediatek.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 92d602bc7177325e7453189a22e0c8764ed3453e
 https://https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Change-Id: Ie25d567390ea5185955356b330e04ebfa5c8836f
2021-09-16 08:46:29 -07:00
Jaegeuk Kim
90c60a51f5 UPSTREAM: f2fs: guarantee to write dirty data when enabling checkpoint back
We must flush all the dirty data when enabling checkpoint back. Let's guarantee
that first by adding a retry logic on sync_inodes_sb(). In addition to that,
this patch adds to flush data in fsync when checkpoint is disabled, which can
mitigate the sync_inodes_sb() failures in advance.

Bug: 194449609
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit dddd3d65293a52c2c3850c19b1e5115712e534d8)
Change-Id: I5bbef7386ddbb44fd925262fb68a8ef0a4960993
2021-09-09 10:41:52 +00:00
Linus Torvalds
3de34cc5ea UPSTREAM: pipe: make pipe writes always wake up readers
Since commit 1b6b26ae70 ("pipe: fix and clarify pipe write wakeup
logic") we have sanitized the pipe write logic, and would only try to
wake up readers if they needed it.

In particular, if the pipe already had data in it before the write,
there was no point in trying to wake up a reader, since any existing
readers must have been aware of the pre-existing data already.  Doing
extraneous wakeups will only cause potential thundering herd problems.

However, it turns out that some Android libraries have misused the EPOLL
interface, and expected "edge triggered" be to "any new write will
trigger it".  Even if there was no edge in sight.

Quoting Sandeep Patil:
 "The commit 1b6b26ae70 ('pipe: fix and clarify pipe write wakeup
  logic') changed pipe write logic to wakeup readers only if the pipe
  was empty at the time of write. However, there are libraries that
  relied upon the older behavior for notification scheme similar to
  what's described in [1]

  One such library 'realm-core'[2] is used by numerous Android
  applications. The library uses a similar notification mechanism as GNU
  Make but it never drains the pipe until it is full. When Android moved
  to v5.10 kernel, all applications using this library stopped working.

  The library has since been fixed[3] but it will be a while before all
  applications incorporate the updated library"

Our regression rule for the kernel is that if applications break from
new behavior, it's a regression, even if it was because the application
did something patently wrong.  Also note the original report [4] by
Michal Kerrisk about a test for this epoll behavior - but at that point
we didn't know of any actual broken use case.

So add the extraneous wakeup, to approximate the old behavior.

[ I say "approximate", because the exact old behavior was to do a wakeup
  not for each write(), but for each pipe buffer chunk that was filled
  in. The behavior introduced by this change is not that - this is just
  "every write will cause a wakeup, whether necessary or not", which
  seems to be sufficient for the broken library use. ]

It's worth noting that this adds the extraneous wakeup only for the
write side, while the read side still considers the "edge" to be purely
about reading enough from the pipe to allow further writes.

See commit f467a6a664 ("pipe: fix and clarify pipe read wakeup logic")
for the pipe read case, which remains that "only wake up if the pipe was
full, and we read something from it".

Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1]
Link: https://github.com/realm/realm-core [2]
Link: https://github.com/realm/realm-core/issues/4666 [3]
Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4]
Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/

Bug: 193851993
Reported-by: Sandeep Patil <sspatil@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3a34b13a88caeb2800ab44a4918f230041b37dd9)
Signed-off-by: Sandeep Patil <sspatil@android.com>
Change-Id: Idcf3e8faa31bff47ada4b815237a355e0757b964
2021-08-03 21:20:32 +00:00
Sandeep Patil
6b7e007164 ANDROID: Revert "ANDROID: fs: pipe: wakeup readers on small writes even if pipe had data"
This reverts commit
76879a1964 ("ANDROID: fs: pipe: wakeup readers on small writes even if pipe had data")
to replace that with the bug fix that landed upstream at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3a34b13a88caeb2800ab44a4918f230041b37dd9

Bug: 193851993
Test: Build and boot cuttlefish.

Signed-off-by: Sandeep Patil <sspatil@android.com>
Change-Id: Ic4f5e2cc516b4ea68ae7d63225d1529217990431
2021-08-03 21:20:13 +00:00
Kalesh Singh
36fbb55631 FROMGIT: procfs: prevent unpriveleged processes accessing fdinfo dir
The file permissions on the fdinfo dir from were changed from
S_IRUSR|S_IXUSR to S_IRUGO|S_IXUGO, and a PTRACE_MODE_READ check was added
for opening the fdinfo files [1].  However, the ptrace permission check
was not added to the directory, allowing anyone to get the open FD numbers
by reading the fdinfo directory.

Add the missing ptrace permission check for opening the fdinfo directory.

[1] https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com

Link: https://lkml.kernel.org/r/20210713162008.1056986-1-kaleshsingh@google.com
Fixes: 7bc3fa0172a4 ("procfs: allow reading fdinfo with PTRACE_MODE_READ")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Brown <broonie@kernel.org>

Bug: 151772539
Change-Id: I274b30aa0a5ce8412eae7161d31c6ee955035da9
(cherry picked from commit fc73829fa54b0c7af32d6da7c972eb3390957da4
 git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master)
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
2021-07-29 15:10:21 -04:00
Jaegeuk Kim
fc2d64ec5d FROMGIT: f2fs: don't sleep while grabing nat_tree_lock
This tries to fix priority inversion in the below condition resulting in
long checkpoint delay.

f2fs_get_node_info()
 - nat_tree_lock
  -> sleep to grab journal_rwsem by contention

                                     checkpoint
                                     - waiting for nat_tree_lock

In order to let checkpoint go, let's release nat_tree_lock, if there's a
journal_rwsem contention.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Bug: 191987855
(cherry picked from commit 2eeb0dce728a7eac3e4dfe355d98af40d61f7a26
 git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Change-Id: I97ac4f9d3bde399ab4f17f5b3a6e949ae9b79f0f
Signed-off-by: Daeho Jeong <daehojeong@google.com>
2021-07-29 01:29:53 +00:00
Sandeep Patil
76879a1964 ANDROID: fs: pipe: wakeup readers on small writes even if pipe had data
commit '1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup logic")'
change `pipe_write()` wakeup logic to wakeup readers only if the pipe
was empty.

This meant that applications that are not draining the pipe
before each write were exposed to unexpected timeouts / hangs in
epoll_wait() waiting for data in a pipe using EPOLLIN | EPOLLET flags.

This behaviour can be easily tested with android12-5.4 kernel where
the test that uses pipes for notifications in this way works while it
fails 100% with android12-5.10.

This change restores the old behavior to wakeup all pipe_readers if any
new data is written to the pipe.

Bug: 193851993
Bug: 193846582

Change-Id: If0c5a844091ccf16d5236bd072326325d4d5447a
Signed-off-by: Sandeep Patil <sspatil@google.com>
2021-07-27 17:35:49 +00:00
Jaegeuk Kim
e33cf9dd43 FROMGIT: f2fs: let's keep writing IOs on SBI_NEED_FSCK
SBI_NEED_FSCK is an indicator that fsck.f2fs needs to be triggered, so it
is not fully critical to stop any IO writes. So, let's allow to write data
instead of reporting EIO forever given SBI_NEED_FSCK, but do keep OPU.

Bug: 193659742
Fixes: 955772787667 ("f2fs: drop inplace IO if fs status is abnormal")
Cc: <stable@kernel.org> # v5.13+
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 1ffc8f5f7751f91fe6af527d426a723231b741a6
 git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Change-Id: I9585358c1cee864064d9d210ee643f49ff1e6749
2021-07-22 23:53:01 +00:00
Matthias Maennich
d0a88ae479 ANDROID: Enable GKI Dr. No Enforcement
This effectively locks down OWNERS approval to a small group to guard
the code base against unintentional breakages.

Bug: 194314089
Signed-off-by: Matthias Maennich <maennich@google.com>
Change-Id: Ifd1ea97639a622320ea83f901f6451e2e52b38d4
2021-07-21 20:51:47 +01:00
Daeho Jeong
11cec52238 FROMGIT: f2fs: add sysfs nodes to get GC info for each GC mode
Added gc_reclaimed_segments and gc_segment_mode sysfs nodes.
1) "gc_reclaimed_segments" shows how many segments have been
reclaimed by GC during a specific GC mode.
2) "gc_segment_mode" is used to control for which gc mode
the "gc_reclaimed_segments" node shows.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Bug: 182708936
(cherry picked from commit 07c6b5933ebf58b6132aea9f3e72a62486882bfb
 git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev)
Change-Id: Ie8c2ccf5d36ce2f388c98b77e22848f3ff6645c3
Signed-off-by: Daeho Jeong <daehojeong@google.com>
2021-07-16 00:21:51 +00:00
Prasad Sodagudi
9b136eab76 ANDROID: pstore/ram: Add backward compatibility for ramoops reserved region
Some of the platforms might be still expecting dedicated memory region
for ramoops node. So add logic to detect the start and size of the
ramoops memory region by looking up reserved memory region with
of_reserved_mem_lookup() when platform_get_resource() failed.

Bug: 191636717
Change-Id: Idc479b45fb3f637f7235efd6eabac62059d5e92b
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2021-07-15 18:27:15 +00:00
Kalesh Singh
1f0c32a667 UPSTREAM: procfs/dmabuf: add inode number to /proc/*/fdinfo
And 'ino' field to /proc/<pid>/fdinfo/<FD> and
/proc/<pid>/task/<tid>/fdinfo/<FD>.

The inode numbers can be used to uniquely identify DMA buffers in user
space and avoids a dependency on /proc/<pid>/fd/* when accounting
per-process DMA buffer sizes.

Link: https://lkml.kernel.org/r/20210308170651.919148-2-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3845f256a8b527127bfbd4ced21e93d9e89aa6d7)

Bug: 159126739
Bug: 167141117
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Id07beac3edcc95c0b42805e24e5486965acbb46e
2021-07-12 22:38:22 +00:00
Kalesh Singh
0c8c125f57 UPSTREAM: procfs: allow reading fdinfo with PTRACE_MODE_READ
Android captures per-process system memory state when certain low memory
events (e.g a foreground app kill) occur, to identify potential memory
hoggers.  In order to measure how much memory a process actually consumes,
it is necessary to include the DMA buffer sizes for that process in the
memory accounting.  Since the handle to DMA buffers are raw FDs, it is
important to be able to identify which processes have FD references to a
DMA buffer.

Currently, DMA buffer FDs can be accounted using /proc/<pid>/fd/* and
/proc/<pid>/fdinfo -- both are only readable by the process owner, as
follows:

  1. Do a readlink on each FD.
  2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD.
  3. stat the file to get the dmabuf inode number.
  4. Read/ proc/<pid>/fdinfo/<fd>, to get the DMA buffer size.

Accessing other processes' fdinfo requires root privileges.  This limits
the use of the interface to debugging environments and is not suitable for
production builds.  Granting root privileges even to a system process
increases the attack surface and is highly undesirable.

Since fdinfo doesn't permit reading process memory and manipulating
process state, allow accessing fdinfo under PTRACE_MODE_READ_FSCRED.

Link: https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Jann Horn <jannh@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: James Morris <jamorris@linux.microsoft.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 7bc3fa0172a423afb34e6df7a3998e5f23b1a94a)

Bug: 159126739
Bug: 167141117
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I842b689670f731138592f45c7124ef446d9aa59a
2021-07-12 22:38:16 +00:00
Kalesh Singh
2e0476a465 Revert "FROMLIST: procfs: Allow reading fdinfo with PTRACE_MODE_READ"
Revert submission 1578844

Reason for revert: Will be replaced by upstream version
Reverted Changes:
Ic9c551998:FROMLIST: BACKPORT: procfs/dmabuf: Add inode numbe...
I41407760c:FROMLIST: procfs: Allow reading fdinfo with PTRACE...

Bug: 159126739
Bug: 167141117
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Iede4e9a65f87dad1ca0c6ecdeb42de47af4b37c8
2021-07-12 22:38:09 +00:00
Kalesh Singh
5ded961aa2 Revert "FROMLIST: BACKPORT: procfs/dmabuf: Add inode number to /..."
Revert submission 1578844

Reason for revert: Will be replaced by upstream version
Reverted Changes:
Ic9c551998:FROMLIST: BACKPORT: procfs/dmabuf: Add inode numbe...
I41407760c:FROMLIST: procfs: Allow reading fdinfo with PTRACE...

Bug: 159126739
Bug: 167141117
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: If02a6dc9a525193f286a138791de49085cd91972
2021-07-12 22:38:01 +00:00
Jaegeuk Kim
3ee5565017 UPSTREAM: f2fs: initialize page->private when using for our internal use
We need to guarantee it's initially zero. Otherwise, it'll hurt entire flag
operations.

Bug: 192173168
Fixes: b763f3bedc2d ("f2fs: restructure f2fs page.private layout")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
(cherry picked from commit c9ebd3df43c0 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master)
Change-Id: I02211d4bd2b1cb526e5fbcd55673043e3082d011
2021-07-12 21:01:34 +00:00
Kuan-Ying Lee
a7a3b31d58 ANDROID: syscall_check: add vendor hook for open syscall
Through this vendor hook, we can get the timing to check
current running task for the validation of its credential
and open operation.

Bug: 191291287

Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Change-Id: Ia644ceb02dbc230ee1d25cad3630c2c3f908e41a
2021-07-09 13:48:44 +00:00
Isaac J. Manjarres
bd2ca0ba5b FROMLIST: pstore/ram: Rework logic for detecting ramoops reserved memory region
The reserved memory region for ramoops is assumed to be at a fixed
and known location when read from the devicetree. This is not desirable
in environments where it is preferred for the region to be dynamically
allocated at runtime, as opposed to it being fixed at compile time.

Change the logic for detecting the start and size of the ramoops
memory region by looking up the reserved memory region instead of
using platform_get_resource(), which assumes that the location
of the memory is known ahead of time.

Bug: 191636717
Link: https://lore.kernel.org/patchwork/patch/1451704/
Change-Id: I24066de9f4fe1f1575cb1bbb1687c37a2b1938a4
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2021-07-08 18:16:31 +00:00
Jaegeuk Kim
8102df91f2 Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-5.10.y' into android12-5.10
* aosp/upstream-f2fs-stable-linux-5.10.y:
  Revert "f2fs: avoid attaching SB_ACTIVE flag during mount/remount"
  f2fs: remove false alarm on iget failure during GC
  f2fs: enable extent cache for compression files in read-only
  f2fs: fix to avoid adding tab before doc section
  f2fs: introduce f2fs_casefolded_name slab cache
  f2fs: swap: support migrating swapfile in aligned write mode
  f2fs: swap: remove dead codes
  f2fs: compress: add compress_inode to cache compressed blocks
  f2fs: clean up /sys/fs/f2fs/<disk>/features
  f2fs: add pin_file in feature list
  f2fs: Advertise encrypted casefolding in sysfs
  f2fs: Show casefolding support only when supported
  f2fs: support RO feature
  f2fs: logging neatening

Bug: 186107892
Bug: 190759634
Bug: 190517210
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: I8eb93d8a43304b98166676da52a9c2434b15b942
2021-06-23 19:54:41 -07:00
Jaegeuk Kim
bb51a33182 Revert "f2fs: avoid attaching SB_ACTIVE flag during mount/remount"
This reverts commit 42bbf0bcc2.
2021-06-23 01:42:05 -07:00
Jaegeuk Kim
c81ac64da1 f2fs: remove false alarm on iget failure during GC
This patch removes setting SBI_NEED_FSCK when GC gets an error on f2fs_iget,
since f2fs_iget can give ENOMEM and others by race condition.
If we set this critical fsck flag, we'll get EIO during fsync via the below
code path.

In f2fs_inplace_write_data(),

	if (is_sbi_flag_set(sbi, SBI_NEED_FSCK) || f2fs_cp_error(sbi)) {
		err = -EIO;
		goto drop_bio;
	}

Fixes: 9557727876674 ("f2fs: drop inplace IO if fs status is abnormal")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-23 01:41:57 -07:00
Daeho Jeong
cdeff03989 f2fs: enable extent cache for compression files in read-only
Let's allow extent cache for RO partition.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-21 07:24:26 -07:00
Chao Yu
44e0be85eb f2fs: introduce f2fs_casefolded_name slab cache
Add a slab cache: "f2fs_casefolded_name" for memory allocation
of casefold name.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-06-21 07:24:24 -07:00
Greg Kroah-Hartman
9e08e97ec6 Merge 5.10.43 into android12-5.10
Changes in 5.10.43
	btrfs: tree-checker: do not error out if extent ref hash doesn't match
	net: usb: cdc_ncm: don't spew notifications
	hwmon: (dell-smm-hwmon) Fix index values
	hwmon: (pmbus/isl68137) remove READ_TEMPERATURE_3 for RAA228228
	netfilter: conntrack: unregister ipv4 sockopts on error unwind
	efi/fdt: fix panic when no valid fdt found
	efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
	efi/libstub: prevent read overflow in find_file_option()
	efi: cper: fix snprintf() use in cper_dimm_err_location()
	vfio/pci: Fix error return code in vfio_ecap_init()
	vfio/pci: zap_vma_ptes() needs MMU
	samples: vfio-mdev: fix error handing in mdpy_fb_probe()
	vfio/platform: fix module_put call in error flow
	ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
	HID: logitech-hidpp: initialize level variable
	HID: pidff: fix error return code in hid_pidff_init()
	HID: i2c-hid: fix format string mismatch
	devlink: Correct VIRTUAL port to not have phys_port attributes
	net/sched: act_ct: Offload connections with commit action
	net/sched: act_ct: Fix ct template allocation for zone 0
	mptcp: always parse mptcp options for MPC reqsk
	nvme-rdma: fix in-casule data send for chained sgls
	ACPICA: Clean up context mutex during object deletion
	perf probe: Fix NULL pointer dereference in convert_variable_location()
	net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANs
	net: sock: fix in-kernel mark setting
	net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
	net/tls: Fix use-after-free after the TLS device goes down and up
	net/mlx5e: Fix incompatible casting
	net/mlx5: Check firmware sync reset requested is set before trying to abort it
	net/mlx5e: Check for needed capability for cvlan matching
	net/mlx5: DR, Create multi-destination flow table with level less than 64
	nvmet: fix freeing unallocated p2pmem
	netfilter: nft_ct: skip expectations for confirmed conntrack
	netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
	drm/i915/selftests: Fix return value check in live_breadcrumbs_smoketest()
	bpf: Simplify cases in bpf_base_func_proto
	bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks
	ieee802154: fix error return code in ieee802154_add_iface()
	ieee802154: fix error return code in ieee802154_llsec_getparams()
	igb: add correct exception tracing for XDP
	ixgbevf: add correct exception tracing for XDP
	cxgb4: fix regression with HASH tc prio value update
	ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
	ice: Fix allowing VF to request more/less queues via virtchnl
	ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
	ice: handle the VF VSI rebuild failure
	ice: report supported and advertised autoneg using PHY capabilities
	ice: Allow all LLDP packets from PF to Tx
	i2c: qcom-geni: Add shutdown callback for i2c
	cxgb4: avoid link re-train during TC-MQPRIO configuration
	i40e: optimize for XDP_REDIRECT in xsk path
	i40e: add correct exception tracing for XDP
	ice: simplify ice_run_xdp
	ice: optimize for XDP_REDIRECT in xsk path
	ice: add correct exception tracing for XDP
	ixgbe: optimize for XDP_REDIRECT in xsk path
	ixgbe: add correct exception tracing for XDP
	arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent
	optee: use export_uuid() to copy client UUID
	bus: ti-sysc: Fix am335x resume hang for usb otg module
	arm64: dts: ls1028a: fix memory node
	arm64: dts: zii-ultra: fix 12V_MAIN voltage
	arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
	ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
	ARM: dts: imx7d-pico: Fix the 'tuning-step' property
	ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
	bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
	tipc: add extack messages for bearer/media failure
	tipc: fix unique bearer names sanity check
	serial: stm32: fix threaded interrupt handling
	riscv: vdso: fix and clean-up Makefile
	io_uring: fix link timeout refs
	io_uring: use better types for cflags
	drm/amdgpu/vcn3: add cancel_delayed_work_sync before power gate
	drm/amdgpu/jpeg2.5: add cancel_delayed_work_sync before power gate
	drm/amdgpu/jpeg3: add cancel_delayed_work_sync before power gate
	Bluetooth: fix the erroneous flush_work() order
	Bluetooth: use correct lock to prevent UAF of hdev object
	wireguard: do not use -O3
	wireguard: peer: allocate in kmem_cache
	wireguard: use synchronize_net rather than synchronize_rcu
	wireguard: selftests: remove old conntrack kconfig value
	wireguard: selftests: make sure rp_filter is disabled on vethc
	wireguard: allowedips: initialize list head in selftest
	wireguard: allowedips: remove nodes in O(1)
	wireguard: allowedips: allocate nodes in kmem_cache
	wireguard: allowedips: free empty intermediate nodes when removing single node
	net: caif: added cfserl_release function
	net: caif: add proper error handling
	net: caif: fix memory leak in caif_device_notify
	net: caif: fix memory leak in cfusbl_device_notify
	HID: i2c-hid: Skip ELAN power-on command after reset
	HID: magicmouse: fix NULL-deref on disconnect
	HID: multitouch: require Finger field to mark Win8 reports as MT
	gfs2: fix scheduling while atomic bug in glocks
	ALSA: timer: Fix master timer notification
	ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
	ALSA: hda: update the power_state during the direct-complete
	ARM: dts: imx6dl-yapp4: Fix RGMII connection to QCA8334 switch
	ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
	ext4: fix memory leak in ext4_fill_super
	ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
	ext4: fix fast commit alignment issues
	ext4: fix memory leak in ext4_mb_init_backend on error path.
	ext4: fix accessing uninit percpu counter variable with fast_commit
	usb: dwc2: Fix build in periphal-only mode
	pid: take a reference when initializing `cad_pid`
	ocfs2: fix data corruption by fallocate
	mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
	mm/page_alloc: fix counting of free pages after take off from buddy
	x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
	x86/sev: Check SME/SEV support in CPUID first
	nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
	drm/amdgpu: Don't query CE and UE errors
	drm/amdgpu: make sure we unpin the UVD BO
	x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
	powerpc/kprobes: Fix validation of prefixed instructions across page boundary
	btrfs: mark ordered extent and inode with error if we fail to finish
	btrfs: fix error handling in btrfs_del_csums
	btrfs: return errors from btrfs_del_csums in cleanup_ref_head
	btrfs: fixup error handling in fixup_inode_link_counts
	btrfs: abort in rename_exchange if we fail to insert the second ref
	btrfs: fix deadlock when cloning inline extents and low on available space
	mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
	drm/msm/dpu: always use mdp device to scale bandwidth
	btrfs: fix unmountable seed device after fstrim
	KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
	KVM: arm64: Fix debug register indexing
	x86/kvm: Teardown PV features on boot CPU as well
	x86/kvm: Disable kvmclock on all CPUs on shutdown
	x86/kvm: Disable all PV features on crash
	lib/lz4: explicitly support in-place decompression
	i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops
	netfilter: nf_tables: missing error reporting for not selected expressions
	xen-netback: take a reference to the RX task thread
	neighbour: allow NUD_NOARP entries to be forced GCed
	Linux 5.10.43

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8d7ec0878193e4e454076809b7fb71fcc4e3d810
2021-06-12 14:48:14 +02:00
Anand Jain
fe910d20e2 btrfs: fix unmountable seed device after fstrim
commit 5e753a817b2d5991dfe8a801b7b1e8e79a1c5a20 upstream.

The following test case reproduces an issue of wrongly freeing in-use
blocks on the readonly seed device when fstrim is called on the rw sprout
device. As shown below.

Create a seed device and add a sprout device to it:

  $ mkfs.btrfs -fq -dsingle -msingle /dev/loop0
  $ btrfstune -S 1 /dev/loop0
  $ mount /dev/loop0 /btrfs
  $ btrfs dev add -f /dev/loop1 /btrfs
  BTRFS info (device loop0): relocating block group 290455552 flags system
  BTRFS info (device loop0): relocating block group 1048576 flags system
  BTRFS info (device loop0): disk added /dev/loop1
  $ umount /btrfs

Mount the sprout device and run fstrim:

  $ mount /dev/loop1 /btrfs
  $ fstrim /btrfs
  $ umount /btrfs

Now try to mount the seed device, and it fails:

  $ mount /dev/loop0 /btrfs
  mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.

Block 5292032 is missing on the readonly seed device:

 $ dmesg -kt | tail
 <snip>
 BTRFS error (device loop0): bad tree block start, want 5292032 have 0
 BTRFS warning (device loop0): couldn't read-tree root
 BTRFS error (device loop0): open_ctree failed

>From the dump-tree of the seed device (taken before the fstrim). Block
5292032 belonged to the block group starting at 5242880:

  $ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP
  <snip>
  item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24
  	block group used 114688 chunk_objectid 256 flags METADATA
  <snip>

>From the dump-tree of the sprout device (taken before the fstrim).
fstrim used block-group 5242880 to find the related free space to free:

  $ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP
  <snip>
  item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24
  	block group used 32768 chunk_objectid 256 flags METADATA
  <snip>

BPF kernel tracing the fstrim command finds the missing block 5292032
within the range of the discarded blocks as below:

  kprobe:btrfs_discard_extent {
  	printf("freeing start %llu end %llu num_bytes %llu:\n",
  		arg1, arg1+arg2, arg2);
  }

  freeing start 5259264 end 5406720 num_bytes 147456
  <snip>

Fix this by avoiding the discard command to the readonly seed device.

Reported-by: Chris Murphy <lists@colorremedies.com>
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Filipe Manana
baa6763123 btrfs: fix deadlock when cloning inline extents and low on available space
commit 76a6d5cd74479e7ec8a7f9a29bce63d5549b6b2e upstream.

There are a few cases where cloning an inline extent requires copying data
into a page of the destination inode. For these cases we are allocating
the required data and metadata space while holding a leaf locked. This can
result in a deadlock when we are low on available space because allocating
the space may flush delalloc and two deadlock scenarios can happen:

1) When starting writeback for an inode with a very small dirty range that
   fits in an inline extent, we deadlock during the writeback when trying
   to insert the inline extent, at cow_file_range_inline(), if the extent
   is going to be located in the leaf for which we are already holding a
   read lock;

2) After successfully starting writeback, for non-inline extent cases,
   the async reclaim thread will hang waiting for an ordered extent to
   complete if the ordered extent completion needs to modify the leaf
   for which the clone task is holding a read lock (for adding or
   replacing file extent items). So the cloning task will wait forever
   on the async reclaim thread to make progress, which in turn is
   waiting for the ordered extent completion which in turn is waiting
   to acquire a write lock on the same leaf.

So fix this by making sure we release the path (and therefore the leaf)
every time we need to copy the inline extent's data into a page of the
destination inode, as by that time we do not need to have the leaf locked.

Fixes: 05a5a7621c ("Btrfs: implement full reflink support for inline extents")
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Josef Bacik
0df50d47d1 btrfs: abort in rename_exchange if we fail to insert the second ref
commit dc09ef3562726cd520c8338c1640872a60187af5 upstream.

Error injection stress uncovered a problem where we'd leave a dangling
inode ref if we failed during a rename_exchange.  This happens because
we insert the inode ref for one side of the rename, and then for the
other side.  If this second inode ref insert fails we'll leave the first
one dangling and leave a corrupt file system behind.  Fix this by
aborting if we did the insert for the first inode ref.

CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00
Josef Bacik
48568f3944 btrfs: fixup error handling in fixup_inode_link_counts
commit 011b28acf940eb61c000059dd9e2cfcbf52ed96b upstream.

This function has the following pattern

	while (1) {
		ret = whatever();
		if (ret)
			goto out;
	}
	ret = 0
out:
	return ret;

However several places in this while loop we simply break; when there's
a problem, thus clearing the return value, and in one case we do a
return -EIO, and leak the memory for the path.

Fix this by re-arranging the loop to deal with ret == 1 coming from
btrfs_search_slot, and then simply delete the

	ret = 0;
out:

bit so everybody can break if there is an error, which will allow for
proper error handling to occur.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10 13:39:28 +02:00