android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Minchan Kim	dba79c3af3	ANDROID: mm: page_pinner: report test_page_isolation_failure	2021-07-12 13:57:40 -07:00
    Sometimes alloc_contig_range fails at test_pages_isolated. Report the
    failed page to page_pinner so it can be tracked and investigated.

    Bug: 192475091
    Signed-off-by: Minchan Kim <minchan@google.com>
    Change-Id: Ifcb913faa87a131915efd72848e6ca59c15b75b4

Minchan Kim	9f47e5fdda	ANDROID: mm: page_pinner: change function names	2021-07-12 13:57:39 -07:00
    Make the function names state more clearly what they do.

    Bug: 192475091
    Signed-off-by: Minchan Kim <minchan@google.com>
    Change-Id: I6adabc0df6a54cf24d8287bf0f22cf7dcdc7ad03

Minchan Kim	ddc4a48797	ANDROID: mm: page_pinner: introduce failure_tracking feature	2021-04-30 09:13:34 -07:00
    CMA allocation can fail because of temporary page refcount increments
    taken through the get_page API as well as the get_user_pages family.
    However, since get_page is one of the hottest functions, hooking it to
    capture a call stack every time is too expensive. Furthermore, get_page
    calls can be nested many times, so page_pinner's limited space could not
    hold every pin site.

    Thus, the approach here is to track put_page call sites, rather than
    get_page call sites, once the VM finds that a page migration has failed.
    It rests on two assumptions:
    1. Since the refcount is temporary, it is likely to be released soon,
       before the dmesg log buffer overflows.
    2. A developer can find the matching get_page by reviewing the put_page
       site.

    It is enabled by default. To disable it:
      echo 0 > $debugfs/page_pinner/failure_tracking

    You can capture the tracking results with:
      cat $debugfs/page_pinner/alloc_contig_failed

    Note: the example below is artificial:

      Page pinned ts 386067292 us count 0
      PFN 10162530 Block 9924 type Isolate Flags 0x800000000008000c(uptodate|dirty|swapbacked)
       __page_pinner_migration_failed+0x30/0x104
       putback_lru_page+0x90/0xac
       putback_movable_pages+0xc4/0x204
       __alloc_contig_migrate_range+0x290/0x31c
       alloc_contig_range+0x114/0x2bc
       cma_alloc+0x2d8/0x698
       cma_alloc_write+0x58/0xb8
       simple_attr_write+0xd4/0x124
       debugfs_attr_write+0x50/0xd8
       full_proxy_write+0x70/0xf8
       vfs_write+0x168/0x3a8
       ksys_write+0x7c/0xec
       __arm64_sys_write+0x20/0x30
       el0_svc_common+0xa4/0x180
       do_el0_svc+0x28/0x88
       el0_svc+0x14/0x24

      Page pinned ts 385867394 us count 0
      PFN 10162530 Block 9924 type Isolate Flags 0x800000000008000c(uptodate|dirty|swapbacked)
       __page_pinner_migration_failed+0x30/0x104
       __alloc_contig_migrate_range+0x200/0x31c
       alloc_contig_range+0x114/0x2bc
       cma_alloc+0x2d8/0x698
       cma_alloc_write+0x58/0xb8
       simple_attr_write+0xd4/0x124
       debugfs_attr_write+0x50/0xd8
       full_proxy_write+0x70/0xf8
       vfs_write+0x168/0x3a8
       ksys_write+0x7c/0xec
       __arm64_sys_write+0x20/0x30
       el0_svc_common+0xa4/0x180
       do_el0_svc+0x28/0x88
       el0_svc+0x14/0x24
       el0_sync_handler+0x88/0xec
       el0_sync+0x198/0x1c0

    Bug: 183414571
    Signed-off-by: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Minchan Kim <minchan@google.com>
    Change-Id: Ie79902c18390eb9f320d823839bb9d9a7fdcdb31

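The put_page-side tracking described in this commit can be modeled in user space to show the idea. This is a hypothetical sketch, not page_pinner's code: `struct sketch_page`, `sketch_migration_failed()`, `sketch_put_page()`, `sketch_count_puts_from()`, `hypothetical_driver_release()` and the 16-entry record array are invented for illustration. The global `failure_tracking` flag stands in for the debugfs knob, and `__func__` stands in for the call stack the kernel captures. Only pages flagged after a failed migration pay the logging cost when a reference is dropped, which is why get_page itself stays untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FAILURE_RECORDS 16 /* hypothetical capacity for this sketch */

/* Hypothetical stand-in for struct page: only the fields the sketch needs. */
struct sketch_page {
    unsigned long pfn;
    int refcount;
    bool migration_failed; /* set once migrating this page has failed */
};

struct put_record {
    unsigned long pfn;
    const char *put_site; /* stands in for a captured call stack */
};

/* Enabled by default, as the commit states; models the debugfs toggle. */
static bool failure_tracking = true;
static struct put_record failure_records[MAX_FAILURE_RECORDS];
static size_t nr_failure_records;

/* Called from the migration path when a page could not be migrated. */
static void sketch_migration_failed(struct sketch_page *page)
{
    if (failure_tracking)
        page->migration_failed = true;
}

/* Drop one reference; log the releasing call site only for flagged pages. */
static void sketch_put_page_at(struct sketch_page *page, const char *site)
{
    assert(page->refcount > 0);
    if (page->migration_failed && nr_failure_records < MAX_FAILURE_RECORDS) {
        failure_records[nr_failure_records].pfn = page->pfn;
        failure_records[nr_failure_records].put_site = site;
        nr_failure_records++;
    }
    page->refcount--;
}

/* __func__ expands in the caller, naming the function that dropped the ref. */
#define sketch_put_page(page) sketch_put_page_at((page), __func__)

/* Count logged releases from one call site, e.g. to group the results. */
static size_t sketch_count_puts_from(const char *site)
{
    size_t i, n = 0;

    for (i = 0; i < nr_failure_records; i++)
        if (strcmp(failure_records[i].put_site, site) == 0)
            n++;
    return n;
}

/* Hypothetical driver that earlier took a temporary reference with get_page. */
static void hypothetical_driver_release(struct sketch_page *page)
{
    sketch_put_page(page);
}
```

In the kernel the flag is flipped through `$debugfs/page_pinner/failure_tracking` and the results are read from `alloc_contig_failed`; in this model a caller would inspect `failure_records` directly. The review step from assumption 2 then amounts to looking at each logged put site and finding the get_page it pairs with.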
Minchan Kim	6e12c5b7d4	ANDROID: mm: introduce page_pinner	2021-04-30 09:13:34 -07:00
    For CMA allocation, migrating pages is critical, but migration sometimes
    fails. One reason is that a driver holds a page refcount for a long time,
    so the VM cannot migrate the page at that moment. The problem is that
    there is no effective way to find out who holds the page's refcount.

    This patch introduces a feature to track a page's pinner. Any get_page
    site could pin a page for a long time, but tracking them all would be
    costly, since get_page is the most frequent kernel operation.
    Furthermore, the page may be a kernel page rather than a user page, and
    so unrelated to the page migration failure.

    So this patch tracks only get_user_pages/follow_page (and friends) with
    FOLL_GET or FOLL_PIN. These are the most common APIs for pinning user
    pages, which can cause migration failures, and they are called far less
    often than get_page, so the runtime cost should stay small while still
    covering many cases.

    This patch also introduces the put_user_page API. Its purpose is to
    record "the pinner releases the page from now on" when it drops the page
    refcount. Thus, any user of get_user_pages/follow_page(FOLL_GET) must pair
    those calls with put_user_page. Otherwise, page_pinner will report them
    as long-term pinners (a false positive), but stability is not affected.

    * $debugfs/page_pinner/threshold
      The threshold (in microseconds) for flagging long-term pinning. It is
      configurable (default 300000 us). Writing a new value to threshold
      clears the old data.

    * $debugfs/page_pinner/longterm_pinner
      Shows the call sites where the pinning duration exceeded the
      threshold. Internally, it uses a static array of 4096 elements and
      overwrites old entries once it overflows, so some information can be
      lost.

      Example:

      Page pinned ts 76953865787 us count 1
      PFN 9856945 Block 9625 type Movable Flags 0x8000000000080014(uptodate|lru|swapbacked)
       __set_page_pinner+0x34/0xcc
       try_grab_page+0x19c/0x1a0
       follow_page_pte+0x1c0/0x33c
       follow_page_mask+0xc0/0xc8
       __get_user_pages+0x178/0x414
       __gup_longterm_locked+0x80/0x148
       internal_get_user_pages_fast+0x140/0x174
       pin_user_pages_fast+0x24/0x40
       CCC
       BBB
       AAA
       __arm64_sys_ioctl+0x94/0xd0
       el0_svc_common+0xa4/0x180
       do_el0_svc+0x28/0x88
       el0_svc+0x14/0x24

    Note: page_pinner does not guarantee that attributing and unattributing
    are atomic when they happen at the same time. It is best effort, so
    false positives can occur.

    Bug: 183414571
    Signed-off-by: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Minchan Kim <minchan@google.com>
    Change-Id: Ife37ec360eef993d390b9c131732218a4dfd2f04
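The threshold and longterm_pinner behavior described in this commit can be sketched as a fixed-size ring buffer. This is a user-space model under stated assumptions, not the kernel implementation. The names `set_threshold`, `record_unpin` and `longterm_get` are hypothetical, and the `pin_site` string stands in for the saved call stack. The 4096-slot capacity, the 300000 us default, the strict "greater than the threshold" test, and clearing the data on a threshold change all follow the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LONGTERM_SLOTS 4096 /* capacity stated in the commit message */

struct longterm_record {
    unsigned long pfn;
    uint64_t pinned_us;   /* how long the page stayed pinned */
    const char *pin_site; /* hypothetical stand-in for a saved call stack */
};

static struct longterm_record longterm[LONGTERM_SLOTS];
static size_t lt_head;                 /* next slot to overwrite */
static size_t lt_count;                /* valid entries, capped at LONGTERM_SLOTS */
static uint64_t threshold_us = 300000; /* default from the commit message */

/* Writing a new threshold discards everything recorded so far. */
static void set_threshold(uint64_t us)
{
    threshold_us = us;
    memset(longterm, 0, sizeof(longterm));
    lt_head = 0;
    lt_count = 0;
}

/* Called when the pinner releases the page (the put_user_page side). */
static void record_unpin(unsigned long pfn, uint64_t pin_ts_us,
                         uint64_t unpin_ts_us, const char *pin_site)
{
    uint64_t duration;

    assert(unpin_ts_us >= pin_ts_us);
    duration = unpin_ts_us - pin_ts_us;
    if (duration <= threshold_us)
        return; /* short pin: not reported */

    /* Once full, this silently overwrites the oldest entry. */
    longterm[lt_head] = (struct longterm_record){ pfn, duration, pin_site };
    lt_head = (lt_head + 1) % LONGTERM_SLOTS;
    if (lt_count < LONGTERM_SLOTS)
        lt_count++;
}

/* Index 0 is the oldest surviving record; returns NULL past the end. */
static const struct longterm_record *longterm_get(size_t i)
{
    size_t oldest;

    if (i >= lt_count)
        return NULL;
    oldest = (lt_head + LONGTERM_SLOTS - lt_count) % LONGTERM_SLOTS;
    return &longterm[(oldest + i) % LONGTERM_SLOTS];
}
```

A fixed array keeps memory use bounded no matter how many long-term pins occur. The cost is the information loss the commit warns about: after 4096 reports, each new one evicts the oldest.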

4 Commits