Sometimes alloc_contig_range fails at test_pages_isolated.
Report the failed pages to page_pinner so that they can be
tracked and the failure investigated.
Bug: 192475091
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ifcb913faa87a131915efd72848e6ca59c15b75b4
Make the function name clearer about what it does.
Bug: 192475091
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I6adabc0df6a54cf24d8287bf0f22cf7dcdc7ad03
CMA allocation can fail because of a temporary page refcount
increase taken through the get_page API as well as through
get_user_pages and friends.
However, since get_page is one of the hottest functions, hooking
it to capture a callstack on every call is impractical for
performance reasons. Furthermore, get_page calls can be nested
multiple times, so the limited space of page_pinner could not
hold all of the pin sites.
Thus, the approach here is to track put_page call sites rather
than get_page call sites, starting once the VM finds that
migration of the page has failed.
It is based on two assumptions:
1. Since the refcount is temporary, it should be released soon,
before the dmesg log buffer overflows.
2. A developer can find the matching get_page by reviewing the
put_page call site.
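The idea of recording only put_page call sites after a migration failure can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: struct model_page, model_get_page, model_migration_failed, model_put_page and the put_log array are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_page {
	int refcount;
	bool migration_failed;	/* set once migration of this page failed */
};

#define MAX_LOGGED 16
static const char *put_log[MAX_LOGGED];	/* call sites recorded on put */
static size_t put_log_len;

/* Hot path: taking a reference records nothing. */
static void model_get_page(struct model_page *p)
{
	p->refcount++;
}

/* Called by the modelled migration code when the page could not be moved. */
static void model_migration_failed(struct model_page *p)
{
	p->migration_failed = true;
}

/* Dropping a reference logs the caller only for pages already flagged. */
static void model_put_page(struct model_page *p, const char *callsite)
{
	assert(p->refcount > 0);
	if (p->migration_failed && put_log_len < MAX_LOGGED)
		put_log[put_log_len++] = callsite;
	p->refcount--;
}
```

In this model a put that happens before the failure is not logged, while the put that finally drops the temporary reference is, which is the call site a developer would review to find the matching get.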
By default, it is enabled. To disable it:
echo 0 > $debugfs/page_pinner/failure_tracking
You can capture the tracking using:
cat $debugfs/page_pinner/alloc_contig_failed
note: the example below is artificial:
Page pinned ts 386067292 us count 0
PFN 10162530 Block 9924 type Isolate Flags 0x800000000008000c(uptodate|dirty|swapbacked)
__page_pinner_migration_failed+0x30/0x104
putback_lru_page+0x90/0xac
putback_movable_pages+0xc4/0x204
__alloc_contig_migrate_range+0x290/0x31c
alloc_contig_range+0x114/0x2bc
cma_alloc+0x2d8/0x698
cma_alloc_write+0x58/0xb8
simple_attr_write+0xd4/0x124
debugfs_attr_write+0x50/0xd8
full_proxy_write+0x70/0xf8
vfs_write+0x168/0x3a8
ksys_write+0x7c/0xec
__arm64_sys_write+0x20/0x30
el0_svc_common+0xa4/0x180
do_el0_svc+0x28/0x88
el0_svc+0x14/0x24
Page pinned ts 385867394 us count 0
PFN 10162530 Block 9924 type Isolate Flags 0x800000000008000c(uptodate|dirty|swapbacked)
__page_pinner_migration_failed+0x30/0x104
__alloc_contig_migrate_range+0x200/0x31c
alloc_contig_range+0x114/0x2bc
cma_alloc+0x2d8/0x698
cma_alloc_write+0x58/0xb8
simple_attr_write+0xd4/0x124
debugfs_attr_write+0x50/0xd8
full_proxy_write+0x70/0xf8
vfs_write+0x168/0x3a8
ksys_write+0x7c/0xec
__arm64_sys_write+0x20/0x30
el0_svc_common+0xa4/0x180
do_el0_svc+0x28/0x88
el0_svc+0x14/0x24
el0_sync_handler+0x88/0xec
el0_sync+0x198/0x1c0
Bug: 183414571
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ie79902c18390eb9f320d823839bb9d9a7fdcdb31
For CMA allocation, it is critical to be able to migrate a page,
but sometimes migration fails. One reason is that a driver holds
a page refcount for a long time, so the VM cannot migrate the
page at that time.
The concern is that there is no effective way to find out who
holds the refcount of the page. This patch introduces a feature
that tracks a page's pinner. Any get_page site can pin a page
for a long time, but the cost of tracking all of them would be
significant since get_page is the most frequent kernel operation.
Furthermore, the page may be a kernel page rather than a user
page, in which case it is unrelated to the page migration
failure. So this patch tracks only get_user_pages/follow_page
with FOLL_GET/FOLL_PIN and friends, because they are very
common APIs for pinning user pages, which can cause migration
failure, and they are called less frequently than get_page. The
runtime cost should therefore be modest while still covering
many cases effectively.
This patch also introduces the put_user_page API. It marks
"the pinner releases the page from now on" while releasing the
page refcount. Thus, any user of get_user_pages/follow_page(FOLL_GET)
must use put_user_page as the counterpart of those functions.
Otherwise, page_pinner will report them as long-term pinners,
which is a false positive, but stability is not affected.
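The pairing rule above can be modelled in userspace as follows. This is a simplified sketch, not the kernel code: struct pin_state and the model_* functions are hypothetical, and checking the duration on demand via model_looks_longterm is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-page pinner state; not a kernel structure. */
struct pin_state {
	int refcount;
	bool attributed;	/* a pinner is currently recorded */
	uint64_t pinned_at_us;	/* timestamp taken when the pin was recorded */
};

/* Models get_user_pages/follow_page(FOLL_GET): take a ref and attribute it. */
static void model_pin(struct pin_state *s, uint64_t now_us)
{
	s->refcount++;
	s->attributed = true;
	s->pinned_at_us = now_us;
}

/* Models put_user_page: drop the ref and clear the attribution. */
static void model_put_user_page(struct pin_state *s)
{
	assert(s->refcount > 0);
	s->attributed = false;
	s->refcount--;
}

/* Models a plain put_page: the ref is dropped, the attribution stays. */
static void model_plain_put(struct pin_state *s)
{
	assert(s->refcount > 0);
	s->refcount--;
}

/* A page still attributed for longer than the threshold looks long-term. */
static bool model_looks_longterm(const struct pin_state *s, uint64_t now_us,
				 uint64_t threshold_us)
{
	return s->attributed && now_us - s->pinned_at_us > threshold_us;
}
```

A page released with model_plain_put keeps its attribution and eventually exceeds the threshold even though the reference is gone, which corresponds to the false positive described above.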
* $debugfs/page_pinner/threshold
It sets the threshold (in microseconds) above which a pin is
flagged as long-term. It is configurable (the default is
300000us). Writing a new value to the threshold clears the old
data.
* $debugfs/page_pinner/longterm_pinner
It shows call sites where the duration of pinning was greater than
the threshold. Internally, it uses a static array of 4096
elements and overwrites old entries once it overflows, so
some information may be lost.
example)
Page pinned ts 76953865787 us count 1
PFN 9856945 Block 9625 type Movable Flags 0x8000000000080014(uptodate|lru|swapbacked)
__set_page_pinner+0x34/0xcc
try_grab_page+0x19c/0x1a0
follow_page_pte+0x1c0/0x33c
follow_page_mask+0xc0/0xc8
__get_user_pages+0x178/0x414
__gup_longterm_locked+0x80/0x148
internal_get_user_pages_fast+0x140/0x174
pin_user_pages_fast+0x24/0x40
CCC
BBB
AAA
__arm64_sys_ioctl+0x94/0xd0
el0_svc_common+0xa4/0x180
do_el0_svc+0x28/0x88
el0_svc+0x14/0x24
Note: page_pinner does not guarantee that attributing and
unattributing are atomic when they happen at the same time. It is
best effort only, so false positives can happen.
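The threshold and the fixed 4096-entry array described for the debugfs files above can be modelled in userspace like this. It is a minimal sketch, not the kernel implementation: the lt_* names and struct lt_record are hypothetical, and treating the array as a ring that overwrites the oldest slot first is an assumption. Only the capacity, the default threshold, the "greater than" comparison and the clearing on threshold write come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LT_CAPACITY 4096u		/* array size stated in the text */
#define DEFAULT_THRESHOLD_US 300000u	/* default threshold stated in the text */

/* Hypothetical record layout; the real records also carry a stack trace. */
struct lt_record {
	uint64_t pfn;
	uint64_t duration_us;
};

static struct lt_record lt_buf[LT_CAPACITY];
static size_t lt_next;		/* slot the next record is written to */
static size_t lt_total;		/* records stored since the last clear */
static uint64_t lt_threshold_us = DEFAULT_THRESHOLD_US;

/* Store a record only if the pin lasted longer than the threshold. */
static int lt_report(uint64_t pfn, uint64_t duration_us)
{
	if (duration_us <= lt_threshold_us)
		return 0;
	assert(lt_next < LT_CAPACITY);
	lt_buf[lt_next] = (struct lt_record){ pfn, duration_us };
	lt_next = (lt_next + 1) % LT_CAPACITY;	/* wrap: old entries are lost */
	lt_total++;
	return 1;
}

/* Writing a new threshold drops everything recorded so far. */
static void lt_set_threshold(uint64_t threshold_us)
{
	lt_threshold_us = threshold_us;
	lt_next = 0;
	lt_total = 0;
}
```

Once more than 4096 records have been stored, the write index wraps and the earliest records are overwritten, which is why the longterm_pinner output may be incomplete on a busy system.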
Bug: 183414571
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ife37ec360eef993d390b9c131732218a4dfd2f04