android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Greg Kroah-Hartman	b129c98dc6	Merge 5.10.17 into android12-5.10 Changes in 5.10.17 objtool: Fix seg fault with Clang non-section symbols Revert "dts: phy: add GPIO number and active state used for phy reset" gpio: mxs: GPIO_MXS should not default to y unconditionally gpio: ep93xx: fix BUG_ON port F usage gpio: ep93xx: Fix single irqchip with multi gpiochips tracing: Do not count ftrace events in top level enable output tracing: Check length before giving out the filter buffer drm/i915: Fix overlay frontbuffer tracking arm/xen: Don't probe xenbus as part of an early initcall cgroup: fix psi monitor for root cgroup Revert "drm/amd/display: Update NV1x SR latency values" drm/i915/tgl+: Make sure TypeC FIA is powered up when initializing it drm/dp_mst: Don't report ports connected if nothing is attached to them dmaengine: move channel device_node deletion to driver tmpfs: disallow CONFIG_TMPFS_INODE64 on s390 tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha soc: ti: omap-prm: Fix boot time errors for rst_map_012 bits 0 and 1 arm64: dts: rockchip: Fix PCIe DT properties on rk3399 arm64: dts: qcom: sdm845: Reserve LPASS clocks in gcc ARM: OMAP2+: Fix suspcious RCU usage splats for omap_enter_idle_coupled arm64: dts: rockchip: remove interrupt-names property from rk3399 vdec node platform/x86: hp-wmi: Disable tablet-mode reporting by default arm64: dts: rockchip: Disable display for NanoPi R2S ovl: perform vfs_getxattr() with mounter creds cap: fix conversions on getxattr ovl: skip getxattr of security labels scsi: lpfc: Fix EEH encountering oops with NVMe traffic x86/split_lock: Enable the split lock feature on another Alder Lake CPU nvme-pci: ignore the subsysem NQN on Phison E16 drm/amd/display: Fix DPCD translation for LTTPR AUX_RD_INTERVAL drm/amd/display: Add more Clock Sources to DCN2.1 drm/amd/display: Release DSC before acquiring drm/amd/display: Fix dc_sink kref count in emulated_link_detect drm/amd/display: Free atomic state after drm_atomic_commit drm/amd/display: Decrement refcount of dc_sink before reassignment riscv: virt_addr_valid must check the address belongs to linear mapping bfq-iosched: Revert "bfq: Fix computation of shallow depth" ARM: dts: lpc32xx: Revert set default clock rate of HCLK PLL kallsyms: fix nonconverging kallsyms table with lld ARM: ensure the signal page contains defined contents ARM: kexec: fix oops after TLB are invalidated ubsan: implement __ubsan_handle_alignment_assumption Revert "lib: Restrict cpumask_local_spread to houskeeping CPUs" x86/efi: Remove EFI PGD build time checks lkdtm: don't move ctors to .rodata KVM: x86: cleanup CR3 reserved bits checks cgroup-v1: add disabled controller check in cgroup1_parse_param() dmaengine: idxd: fix misc interrupt completion ath9k: fix build error with LEDS_CLASS=m mt76: dma: fix a possible memory leak in mt76_add_fragment() drm/vc4: hvs: Fix buffer overflow with the dlist handling dmaengine: idxd: check device state before issue command bpf: Unbreak BPF_PROG_TYPE_KPROBE when kprobe is called via do_int3 bpf: Check for integer overflow when using roundup_pow_of_two() netfilter: xt_recent: Fix attempt to update deleted entry selftests: netfilter: fix current year netfilter: nftables: fix possible UAF over chains from packet path in netns netfilter: flowtable: fix tcp and udp header checksum update xen/netback: avoid race in xenvif_rx_ring_slots_available() net: hdlc_x25: Return meaningful error code in x25_open net: ipa: set error code in gsi_channel_setup() hv_netvsc: Reset the RSC count if NVSP_STAT_FAIL in netvsc_receive() net: enetc: initialize the RFS and RSS memories selftests: txtimestamp: fix compilation issue net: stmmac: set TxQ mode back to DCB after disabling CBS ibmvnic: Clear failover_pending if unable to schedule netfilter: conntrack: skip identical origin tuple in same zone only scsi: scsi_debug: Fix a memory leak x86/build: Disable CET instrumentation in the kernel for 32-bit too net: dsa: felix: implement port flushing on .phylink_mac_link_down net: hns3: add a check for queue_id in hclge_reset_vf_queue() net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx() net: hns3: add a check for index in hclge_get_rss_key() firmware_loader: align .builtin_fw to 8 drm/sun4i: tcon: set sync polarity for tcon1 channel drm/sun4i: dw-hdmi: always set clock rate drm/sun4i: Fix H6 HDMI PHY configuration drm/sun4i: dw-hdmi: Fix max. frequency for H6 clk: sunxi-ng: mp: fix parent rate change flag check i2c: stm32f7: fix configuration of the digital filter h8300: fix PREEMPTION build, TI_PRE_COUNT undefined scripts: set proper OpenSSL include dir also for sign-file x86/pci: Create PCI/MSI irqdomain after x86_init.pci.arch_init() arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page rxrpc: Fix clearance of Tx/Rx ring when releasing a call udp: fix skb_copy_and_csum_datagram with odd segment sizes net: dsa: call teardown method on probe failure cpufreq: ACPI: Extend frequency tables to cover boost frequencies cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there net: gro: do not keep too many GRO packets in napi->rx_list net: fix iteration for sctp transport seq_files net/vmw_vsock: fix NULL pointer dereference net/vmw_vsock: improve locking in vsock_connect_timeout() net: watchdog: hold device global xmit lock during tx disable bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT vsock/virtio: update credit only if socket is not closed vsock: fix locking in vsock_shutdown() net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS net/qrtr: restrict user-controlled length in qrtr_tun_write_iter() ovl: expand warning in ovl_d_real() kcov, usb: only collect coverage from __usb_hcd_giveback_urb in softirq Linux 5.10.17 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Id0300681f52b51d3f466f1e66ec3a6c25f65f4d3	2021-02-18 11:21:01 +01:00
Chen Zhou	3e53d64e9a	cgroup-v1: add disabled controller check in cgroup1_parse_param() [ Upstream commit 61e960b07b637f0295308ad91268501d744c21b5 ] When mounting a cgroup hierarchy with disabled controller in cgroup v1, all available controllers will be attached. For example, boot with cgroup_no_v1=cpu or cgroup_disable=cpu, and then mount with "mount -t cgroup -ocpu cpu /sys/fs/cgroup/cpu", then all enabled controllers will be attached except cpu. Fix this by adding disabled controller check in cgroup1_parse_param(). If the specified controller is disabled, just return error with information "Disabled controller xx" rather than attaching all the other enabled controllers. Fixes: `f5dfb5315d` ("cgroup: take options parsing into ->parse_monolithic()") Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Reviewed-by: Zefan Li <lizefan.x@bytedance.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-02-17 11:02:25 +01:00
Greg Kroah-Hartman	9cf2ceaffd	Merge 5.10.5 into android12-5.10 Changes in 5.10.5 net/sched: sch_taprio: reset child qdiscs before freeing them mptcp: fix security context on server socket ethtool: fix error paths in ethnl_set_channels() ethtool: fix string set id check md/raid10: initialize r10_bio->read_slot before use. drm/amd/display: Add get_dig_frontend implementation for DCEx io_uring: close a small race gap for files cancel jffs2: Allow setting rp_size to zero during remounting jffs2: Fix NULL pointer dereference in rp_size fs option parsing spi: dw-bt1: Fix undefined devm_mux_control_get symbol opp: fix memory leak in _allocate_opp_table opp: Call the missing clk_put() on error scsi: block: Fix a race in the runtime power management code mm/hugetlb: fix deadlock in hugetlb_cow error path mm: memmap defer init doesn't work as expected lib/zlib: fix inflating zlib streams on s390 io_uring: don't assume mm is constant across submits io_uring: use bottom half safe lock for fixed file data io_uring: add a helper for setting a ref node io_uring: fix io_sqe_files_unregister() hangs uapi: move constants from <linux/kernel.h> to <linux/const.h> tools headers UAPI: Sync linux/const.h with the kernel headers cgroup: Fix memory leak when parsing multiple source parameters zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c scsi: cxgb4i: Fix TLS dependency Bluetooth: hci_h5: close serdev device and free hu in h5_close fbcon: Disable accelerated scrolling reiserfs: add check for an invalid ih_entry_count misc: vmw_vmci: fix kernel info-leak by initializing dbells in vmci_ctx_get_chkpt_doorbells() media: gp8psk: initialize stats at power control logic f2fs: fix shift-out-of-bounds in sanity_check_raw_super() ALSA: seq: Use bool for snd_seq_queue internal flags ALSA: rawmidi: Access runtime->avail always in spinlock bfs: don't use WARNING: string when it's just info. ext4: check for invalid block size early when mounting a file system fcntl: Fix potential deadlock in send_sig{io, urg}() io_uring: check kthread stopped flag when sq thread is unparked rtc: sun6i: Fix memleak in sun6i_rtc_clk_init module: set MODULE_STATE_GOING state when a module fails to load quota: Don't overflow quota file offsets rtc: pl031: fix resource leak in pl031_probe powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe() i3c master: fix missing destroy_workqueue() on error in i3c_master_register NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode f2fs: avoid race condition for shrinker count f2fs: fix race of pending_pages in decompression module: delay kobject uevent until after module init call powerpc/64: irq replay remove decrementer overflow check fs/namespace.c: WARN if mnt_count has become negative watchdog: rti-wdt: fix reference leak in rti_wdt_probe um: random: Register random as hwrng-core device um: ubd: Submit all data segments atomically NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow ceph: fix inode refcount leak when ceph_fill_inode on non-I_NEW inode fails drm/amd/display: updated wm table for Renoir tick/sched: Remove bogus boot "safety" check s390: always clear kernel stack backchain before calling functions io_uring: remove racy overflow list fast checks ALSA: pcm: Clear the full allocated memory at hw_params dm verity: skip verity work if I/O error when system is shutting down ext4: avoid s_mb_prefetch to be zero in individual scenarios device-dax: Fix range release Linux 5.10.5 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I2b481bfac06bafdef2cf3cc1ac2c2a4ddf9913dc	2021-01-10 12:19:03 +01:00
Qinglang Miao	bf81221a40	cgroup: Fix memory leak when parsing multiple source parameters commit 2d18e54dd8662442ef5898c6bdadeaf90b3cebbc upstream. A memory leak is found in cgroup1_parse_param() when multiple source parameters overwrite fc->source in the fs_context struct without free. unreferenced object 0xffff888100d930e0 (size 16): comm "mount", pid 520, jiffies 4303326831 (age 152.783s) hex dump (first 16 bytes): 74 65 73 74 6c 65 61 6b 00 00 00 00 00 00 00 00 testleak........ backtrace: [<000000003e5023ec>] kmemdup_nul+0x2d/0xa0 [<00000000377dbdaa>] vfs_parse_fs_string+0xc0/0x150 [<00000000cb2b4882>] generic_parse_monolithic+0x15a/0x1d0 [<000000000f750198>] path_mount+0xee1/0x1820 [<0000000004756de2>] do_mount+0xea/0x100 [<0000000094cafb0a>] __x64_sys_mount+0x14b/0x1f0 Fix this bug by permitting a single source parameter and rejecting with an error all subsequent ones. Fixes: `8d2451f499` ("cgroup1: switch to option-by-option parsing") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Reviewed-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-06 14:56:51 +01:00
Greg Kroah-Hartman	34ed0e2946	Merge `5364abc579` ("Merge tag 'arc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc") into android-mainline Steps along the 5.7-rc1 merge. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ib9f87147ac3d81985496818b0c61bdd086140eed	2020-04-08 09:25:42 +02:00
Greg Kroah-Hartman	ae56fd997e	Merge 5.6-rc6 into android-mainline Linux 5.6-rc6 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6c2d7aff44ad5a9b75030b72d34ca5dbd5ad3ceb	2020-03-16 08:09:43 +01:00
Tejun Heo	e7b20d9796	cgroup: Restructure release_agent_path handling cgrp->root->release_agent_path is protected by both cgroup_mutex and release_agent_path_lock and readers can hold either one. The dual-locking scheme was introduced while breaking a locking dependency issue around cgroup_mutex but doesn't make sense anymore given that the only remaining reader which uses cgroup_mutex is cgroup1_releaes_agent(). This patch updates cgroup1_release_agent() to use release_agent_path_lock so that release_agent_path is always protected only by release_agent_path_lock. While at it, convert strlen() based empty string checks to direct tests on the first character as suggested by Linus. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2020-03-12 16:44:35 -04:00
Tycho Andersen	2e5383d790	cgroup1: don't call release_agent when it is "" Older (and maybe current) versions of systemd set release_agent to "" when shutting down, but do not set notify_on_release to 0. Since `64e90a8acb` ("Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()"), we filter out such calls when the user mode helper path is "". However, when used in conjunction with an actual (i.e. non "") STATIC_USERMODEHELPER, the path is never "", so the real usermode helper will be called with argv[0] == "". Let's avoid this by not invoking the release_agent when it is "". Signed-off-by: Tycho Andersen <tycho@tycho.ws> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-03-04 11:53:33 -05:00
Vasily Averin	db8dd96972	cgroup-v1: cgroup_pidlist_next should update position index if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. # mount \| grep cgroup # dd if=/mnt/cgroup.procs bs=1 # normal output ... 1294 1295 1296 1304 1382 584+0 records in 584+0 records out 584 bytes copied dd: /mnt/cgroup.procs: cannot skip to specified offset 83 <<< generates end of last line 1383 <<< ... and whole last line once again 0+1 records in 0+1 records out 8 bytes copied dd: /mnt/cgroup.procs: cannot skip to specified offset 1386 <<< generates last line anyway 0+1 records in 0+1 records out 5 bytes copied https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2020-02-12 16:53:35 -05:00
Greg Kroah-Hartman	aa601dde64	Merge `c9d35ee049` ("Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs") into android-mainline Tiny steps to deal with merge issues in sdcardfs due to fs param passing api changes. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I03ba8763e8cc324c25fb6316c363b59957103474	2020-02-10 08:39:09 -08:00
Al Viro	58c025f0e8	cgroup1: switch to use of errorfc() et.al. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-02-07 14:48:43 -05:00
Al Viro	d7167b1499	fs_parse: fold fs_parameter_desc/fs_parameter_spec The former contains nothing but a pointer to an array of the latter... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-02-07 14:48:37 -05:00
Eric Sandeen	96cafb9ccb	fs_parser: remove fs_parameter_description name field Unused now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-02-07 14:48:36 -05:00
Al Viro	fbc2d1686d	get rid of cg_invalf() pointless alias for invalf()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-02-07 14:48:31 -05:00
Dmitry Torokhov	a88f616760	CHROMIUM: cgroups: relax permissions on moving tasks between cgroups Android expects system_server to be able to move tasks between different cgroups/cpusets, but does not want to be running as root. Let's relax permission check so that processes can move other tasks if they have CAP_SYS_NICE in the affected task's user namespace. BUG=b:31790445,chromium:647994 Bug: 147109865 TEST=Boot android container, examine logcat Signed-off-by: Dmitry Torokhov <dtor@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/394927 Reviewed-by: Ricky Zhou <rickyz@chromium.org> [AmitP: Refactored original changes to align with upstream commit `201af4c0fa` ("cgroup: move cgroup files under kernel/cgroup/")] Change-Id: Ia919c66ab6ed6a6daf7c4cf67feb38b13b1ad09b Signed-off-by: Amit Pundir <amit.pundir@linaro.org> (cherry picked from commit ec54762b84a1d06de188bc846655305d3f7acf75)	2020-01-07 01:56:09 +00:00
Michal Koutný	9a3284fad4	cgroup: Optimize single thread migration There are reports of users who use thread migrations between cgroups and they report performance drop after `d59cfc09c3` ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem"). The effect is pronounced on machines with more CPUs. The migration is affected by forking noise happening in the background, after the mentioned commit a migrating thread must wait for all (forking) processes on the system, not only of its threadgroup. There are several places that need to synchronize with migration: a) do_exit, b) de_thread, c) copy_process, d) cgroup_update_dfl_csses, e) parallel migration (cgroup_{proc,thread}s_write). In the case of self-migrating thread, we relax the synchronization on cgroup_threadgroup_rwsem to avoid the cost of waiting. d) and e) are excluded with cgroup_mutex, c) does not matter in case of single thread migration and the executing thread cannot exec(2) or exit(2) while it is writing into cgroup.threads. In case of do_exit because of signal delivery, we either exit before the migration or finish the migration (of not yet PF_EXITING thread) and die afterwards. This patch handles only the case of self-migration by writing "0" into cgroup.threads. For simplicity, we always take cgroup_threadgroup_rwsem with numeric PIDs. This change improves migration dependent workload performance similar to per-signal_struct state. Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-10-07 07:11:53 -07:00
Marc Koderer	653a23ca7e	Use kvmalloc in cgroups-v1 Instead of using its own logic for k-/vmalloc rely on kvmalloc which is actually doing quite the same. Signed-off-by: Marc Koderer <marc@koderer.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-08-07 11:37:58 -07:00
Thomas Gleixner	457c899653	treewide: Add SPDX license identifier for missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-21 10:50:45 +02:00
Roman Gushchin	aade7f9efb	cgroup: implement __cgroup_task_count() helper The helper is identical to the existing cgroup_task_count() except it doesn't take the css_set_lock by itself, assuming that the caller does. Also, move cgroup_task_count() implementation into kernel/cgroup/cgroup.c, as there is nothing specific to cgroup v1. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kernel-team@fb.com	2019-04-19 11:26:48 -07:00
David Howells	06a2ae56b5	vfs: Add some logging to the core users of the fs_context log Add some logging to the core users of the fs_context log so that information can be extracted from them as to the reason for failure. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-02-28 03:29:38 -05:00
Al Viro	cca8f32714	cgroup: store a reference to cgroup_ns into cgroup_fs_context ... and trim cgroup_do_mount() arguments (renaming it to cgroup_do_get_tree()) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-02-28 03:29:34 -05:00
Al Viro	6678889f07	cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-02-28 03:29:33 -05:00
Al Viro	71d883c37e	cgroup_do_mount(): massage calling conventions pass it fs_context instead of fs_type/flags/root triple, have it return int instead of dentry and make it deal with setting fc->root. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-02-28 03:29:33 -05:00
Al Viro	cf6299b1d0	cgroup: stash cgroup_root reference into cgroup_fs_context Note that this reference is NOT contributing to refcount of cgroup_root in question and is valid only until cgroup_do_mount() returns. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-02-28 03:29:32 -05:00
Al Viro	8d2451f499	cgroup1: switch to option-by-option parsing [dhowells should be the author - it's carved out of his patch] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-02-28 03:29:31 -05:00
Al Viro	f5dfb5315d	cgroup: take options parsing into ->parse_monolithic() Store the results in cgroup_fs_context. There's a nasty twist caused by the enabling/disabling subsystems - we can't do the checks sensitive to that until cgroup_mutex gets grabbed. Frankly, these checks are complete bullshit (e.g. all,none combination is accepted if all subsystems are disabled; so's cpusets,none and all,cpusets when cpusets is disabled, etc.), but touching that would be a userland-visible behaviour change ;-/ So we do parsing in ->parse_monolithic() and have the consistency checks done in check_cgroupfs_options(), with the latter called (on already parsed options) from cgroup1_get_tree() and cgroup1_reconfigure(). Freeing the strdup'ed strings is done from fs_context destructor, which somewhat simplifies the life for cgroup1_{get_tree,reconfigure}(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-02-28 03:29:30 -05:00
Al Viro	7feeef5869	cgroup: fold cgroup1_mount() into cgroup1_get_tree() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-02-28 03:29:30 -05:00
Al Viro	90129625d9	cgroup: start switching to fs_context Unfortunately, cgroup is tangled into kernfs infrastructure. To avoid converting all kernfs-based filesystems at once, we need to untangle the remount part of things, instead of having it go through kernfs_sop_remount_fs(). Fortunately, it's not hard to do. This commit just gets cgroup/cgroup1 to use fs_context to deliver options on mount and remount paths. Parsing those is going to be done in the next commits; for now we do pretty much what legacy case does. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-02-28 03:29:29 -05:00
Al Viro	35ac118424	cgroup: saner refcounting for cgroup_root * make the reference from superblock to cgroup_root counting - do cgroup_put() in cgroup_kill_sb() whether we'd done percpu_ref_kill() or not; matching grab is done when we allocate a new root. That gives the same refcounting rules for all callers of cgroup_do_mount() - a reference to cgroup_root has been grabbed by caller and it either is transferred to new superblock or dropped. * have cgroup_kill_sb() treat an already killed refcount as "just don't bother killing it, then". * after successful cgroup_do_mount() have cgroup1_mount() recheck if we'd raced with mount/umount from somebody else and cgroup_root got killed. In that case we drop the superblock and bugger off with -ERESTARTSYS, same as if we'd found it in the list already dying. * don't bother with delayed initialization of refcount - it's unreliable and not needed. No need to prevent attempts to bump the refcount if we find cgroup_root of another mount in progress - sget will reuse an existing superblock just fine and if the other sb manages to die before we get there, we'll catch that immediately after cgroup_do_mount(). * don't bother with kernfs_pin_sb() - no need for doing that either. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-01-17 11:53:02 -05:00
Tejun Heo	3fc9c12d27	cgroup: Add named hierarchy disabling to cgroup_no_v1 boot param It can be useful to inhibit all cgroup1 hierarchies especially during transition and for debugging. cgroup_no_v1 can block hierarchies with controllers which leaves out the named hierarchies. Expand it to cover the named hierarchies so that "cgroup_no_v1=all,named" disables all cgroup1 hierarchies. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Marcin Pawlowski <mpawlowski@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2018-12-28 10:34:12 -08:00
Steven Rostedt (VMware)	e4f8d81c73	cgroup/tracing: Move taking of spin lock out of trace event handlers It is unwise to take spin locks from the handlers of trace events. Mainly, because they can introduce lockups, because it introduces locks in places that are normally not tested. Worse yet, because trace events are tucked away in the include/trace/events/ directory, locks that are taken there are forgotten about. As a general rule, I tell people never to take any locks in a trace event handler. Several cgroup trace event handlers call cgroup_path() which eventually takes the kernfs_rename_lock spinlock. This injects the spinlock in the code without people realizing it. It also can cause issues for the PREEMPT_RT patch, as the spinlock becomes a mutex, and the trace event handlers are called with preemption disabled. By moving the calculation of the cgroup_path() out of the trace event handlers and into a macro (surrounded by a trace_cgroup_##type##_enabled()), then we could place the cgroup_path into a string, and pass that to the trace event. Not only does this remove the taking of the spinlock out of the trace event handler, but it also means that the cgroup_path() only needs to be called once (it is currently called twice, once to get the length to reserver the buffer for, and once again to get the path itself. Now it only needs to be done once. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2018-07-11 10:48:47 -07:00
Kees Cook	42bc47b353	treewide: Use array_size() in vmalloc() The vmalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vmalloc(a * b) with: vmalloc(array_size(a, b)) as well as handling cases of: vmalloc(a * b * c) with: vmalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vmalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| vmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| vmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| vmalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| vmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| vmalloc( - sizeof(u8) * COUNT + COUNT , ...) \| vmalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| vmalloc( - sizeof(char) * COUNT + COUNT , ...) \| vmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vmalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vmalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| vmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| vmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| vmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| vmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| vmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| vmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vmalloc(C1 * C2 * C3, ...) \| vmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vmalloc(C1 * C2, ...) \| vmalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Kees Cook	6da2ec5605	treewide: kmalloc() -> kmalloc_array() The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(u8) * COUNT + COUNT , ...) \| kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| kmalloc( - sizeof(char) * COUNT + COUNT , ...) \| kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) \| kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) \| kmalloc(sizeof(TYPE) * C2, ...) \| kmalloc(C1 * C2 * C3, ...) \| kmalloc(C1 * C2, ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) \| - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) \| - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Christoph Hellwig	3f3942aca6	proc: introduce proc_create_single{,_data} Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-16 07:23:35 +02:00
Prateek Sood	116d2f7496	cgroup: Fix deadlock in cpu hotplug path Deadlock during cgroup migration from cpu hotplug path when a task T is being moved from source to destination cgroup. kworker/0:0 cpuset_hotplug_workfn() cpuset_hotplug_update_tasks() hotplug_update_tasks_legacy() remove_tasks_in_empty_cpuset() cgroup_transfer_tasks() // stuck in iterator loop cgroup_migrate() cgroup_migrate_add_task() In cgroup_migrate_add_task() it checks for PF_EXITING flag of task T. Task T will not migrate to destination cgroup. css_task_iter_start() will keep pointing to task T in loop waiting for task T cg_list node to be removed. Task T do_exit() exit_signals() // sets PF_EXITING exit_task_namespaces() switch_task_namespaces() free_nsproxy() put_mnt_ns() drop_collected_mounts() namespace_unlock() synchronize_rcu() _synchronize_rcu_expedited() schedule_work() // on cpu0 low priority worker pool wait_event() // waiting for work item to execute Task T inserted a work item in the worklist of cpu0 low priority worker pool. It is waiting for expedited grace period work item to execute. This work item will only be executed once kworker/0:0 complete execution of cpuset_hotplug_workfn(). kworker/0:0 ==> Task T ==>kworker/0:0 In case of PF_EXITING task being migrated from source to destination cgroup, migrate next available task in source cgroup. Signed-off-by: Prateek Sood <prsood@codeaurora.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-12-19 05:38:47 -08:00
Waiman Long	e1cba4b85d	cgroup: Add mount flag to enable cpuset to use v2 behavior in v1 cgroup A new mount option "cpuset_v2_mode" is added to the v1 cgroupfs filesystem to enable cpuset controller to use v2 behavior in a v1 cgroup. This mount option applies only to cpuset controller and have no effect on other controllers. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-08-18 08:24:21 -07:00
Tejun Heo	8cfd8147df	cgroup: implement cgroup v2 thread support This patch implements cgroup v2 thread support. The goal of the thread mode is supporting hierarchical accounting and control at thread granularity while staying inside the resource domain model which allows coordination across different resource controllers and handling of anonymous resource consumptions. A cgroup is always created as a domain and can be made threaded by writing to the "cgroup.type" file. When a cgroup becomes threaded, it becomes a member of a threaded subtree which is anchored at the closest ancestor which isn't threaded. The threads of the processes which are in a threaded subtree can be placed anywhere without being restricted by process granularity or no-internal-process constraint. Note that the threads aren't allowed to escape to a different threaded subtree. To be used inside a threaded subtree, a controller should explicitly support threaded mode and be able to handle internal competition in the way which is appropriate for the resource. The root of a threaded subtree, the nearest ancestor which isn't threaded, is called the threaded domain and serves as the resource domain for the whole subtree. This is the last cgroup where domain controllers are operational and where all the domain-level resource consumptions in the subtree are accounted. This allows threaded controllers to operate at thread granularity when requested while staying inside the scope of system-level resource distribution. As the root cgroup is exempt from the no-internal-process constraint, it can serve as both a threaded domain and a parent to normal cgroups, so, unlike non-root cgroups, the root cgroup can have both domain and threaded children. Internally, in a threaded subtree, each css_set has its ->dom_cset pointing to a matching css_set which belongs to the threaded domain. This ensures that thread root level cgroup_subsys_state for all threaded controllers are readily accessible for domain-level operations. This patch enables threaded mode for the pids and perf_events controllers. Neither has to worry about domain-level resource consumptions and it's enough to simply set the flag. For more details on the interface and behavior of the thread mode, please refer to the section 2-2-2 in Documentation/cgroup-v2.txt added by this patch. v5: - Dropped silly no-op ->dom_cgrp init from cgroup_create(). Spotted by Waiman. - Documentation updated as suggested by Waiman. - cgroup.type content slightly reformatted. - Mark the debug controller threaded. v4: - Updated to the general idea of marking specific cgroups domain/threaded as suggested by PeterZ. v3: - Dropped "join" and always make mixed children join the parent's threaded subtree. v2: - After discussions with Waiman, support for mixed thread mode is added. This should address the issue that Peter pointed out where any nesting should be avoided for thread subtrees while coexisting with other domain cgroups. - Enabling / disabling thread mode now piggy backs on the existing control mask update mechanism. - Bug fixes and cleanup. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org>	2017-07-21 11:14:51 -04:00
Tejun Heo	bc2fb7ed08	cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS css_task_iter currently always walks all tasks. With the scheduled cgroup v2 thread support, the iterator would need to handle multiple types of iteration. As a preparation, add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS. If the flag is not specified, it walks all tasks as before. When asserted, the iterator only walks the group leaders. For now, the only user of the flag is cgroup v2 "cgroup.procs" file which no longer needs to skip non-leader tasks in cgroup_procs_next(). Note that cgroup v1 "cgroup.procs" can't use the group leader walk as v1 "cgroup.procs" doesn't mean "list all thread group leaders in the cgroup" but "list all thread group id's with any threads in the cgroup". While at it, update cgroup_procs_show() to use task_pid_vnr() instead of task_tgid_vnr(). As the iteration guarantees that the function only sees group leaders, this doesn't change the output and will allow sharing the function for thread iteration. Signed-off-by: Tejun Heo <tj@kernel.org>	2017-07-21 11:14:51 -04:00
Tejun Heo	715c809d9a	cgroup: reorganize cgroup.procs / task write path Currently, writes "cgroup.procs" and "cgroup.tasks" files are all handled by __cgroup_procs_write() on both v1 and v2. This patch reoragnizes the write path so that there are common helper functions that different write paths use. While this somewhat increases LOC, the different paths are no longer intertwined and each path has more flexibility to implement different behaviors which will be necessary for the planned v2 thread support. v3: - Restructured so that cgroup_procs_write_permission() takes @src_cgrp and @dst_cgrp. v2: - Rolled in Waiman's task reference count fix. - Updated on top of nsdelegate changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Waiman Long <longman@redhat.com>	2017-07-21 11:14:50 -04:00
Waiman Long	a28f8f5e99	cgroup: Move debug cgroup to its own file The debug cgroup currently resides within cgroup-v1.c and is enabled only for v1 cgroup. To enable the debug cgroup also for v2, it makes sense to put the code into its own file as it will no longer be v1 specific. There is no change to the debug cgroup specific code. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-06-14 16:01:21 -04:00
Waiman Long	73a7242a06	cgroup: Keep accurate count of tasks in each css_set The reference count in the css_set data structure was used as a proxy of the number of tasks attached to that css_set. However, that count is actually not an accurate measure especially with thread mode support. So a new variable nr_tasks is added to the css_set to keep track of the actual task count. This new variable is protected by the css_set_lock. Functions that require the actual task count are updated to use the new variable. tj: s/task_count/nr_tasks/ for consistency with cgroup_root->nr_cgrps. Refreshed on top of cgroup/for-v4.13 which dropped on css_set_populated() -> nr_tasks conversion. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-06-14 16:01:21 -04:00
Linus Torvalds	9410091dd5	Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing major. Two notable fixes are Li's second stab at fixing the long-standing race condition in the mount path and suppression of spurious warning from cgroup_get(). All other changes are trivial" * 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: mark cgroup_get() with __maybe_unused cgroup: avoid attaching a cgroup root to two different superblocks, take 2 cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc() cgroup: move cgroup_subsys_state parent field for cache locality cpuset: Remove cpuset_update_active_cpus()'s parameter. cgroup: switch to BUG_ON() cgroup: drop duplicate header nsproxy.h kernel: convert css_set.refcount from atomic_t to refcount_t kernel: convert cgroup_namespace.count from atomic_t to refcount_t	2017-05-01 13:52:24 -07:00
Zefan Li	9732adc5d6	cgroup: avoid attaching a cgroup root to two different superblocks, take 2 Commit `bfb0b80db5` ("cgroup: avoid attaching a cgroup root to two different superblocks") is broken. Now we try to fix the race by delaying the initialization of cgroup root refcnt until a superblock has been allocated. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Andrei Vagin <avagin@virtuozzo.com> Tested-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-04-28 18:04:54 -04:00
Tejun Heo	330c418638	Revert "cgroup: avoid attaching a cgroup root to two different superblocks" This reverts commit `bfb0b80db5`. Andrei reports CRIU test hangs with the patch applied. The bug fixed by the patch isn't too likely to trigger in actual uses. Revert the patch for now. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andrei Vagin <avagin@virtuozzo.com> Link: http://lkml.kernel.org/r/20170414232737.GC20350@outlook.office365.com	2017-04-16 23:17:37 +09:00
Zefan Li	bfb0b80db5	cgroup: avoid attaching a cgroup root to two different superblocks Run this: touch file0 for ((; ;)) { mount -t cpuset xxx file0 } And this concurrently: touch file1 for ((; ;)) { mount -t cpuset xxx file1 } We'll trigger a warning like this: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0 percpu_ref_kill_and_confirm called more than once on css_release! CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 Call Trace: dump_stack+0x63/0x84 __warn+0xd1/0xf0 warn_slowpath_fmt+0x5f/0x80 percpu_ref_kill_and_confirm+0x92/0xb0 cgroup_kill_sb+0x95/0xb0 deactivate_locked_super+0x43/0x70 deactivate_super+0x46/0x60 ... ---[ end trace a79f61c2a2633700 ]--- Here's a race: Thread A Thread B cgroup1_mount() # alloc a new cgroup root cgroup_setup_root() cgroup1_mount() # no sb yet, returns NULL kernfs_pin_sb() # but succeeds in getting the refcnt, # so re-use cgroup root percpu_ref_tryget_live() # alloc sb with cgroup root cgroup_do_mount() cgroup_kill_sb() # alloc another sb with same root cgroup_do_mount() cgroup_kill_sb() We end up using the same cgroup root for two different superblocks, so percpu_ref_kill() will be called twice on the same root when the two superblocks are destroyed. We should fix to make sure the superblock pinning is really successful. Cc: stable@vger.kernel.org # 3.16+ Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-04-11 09:00:57 +09:00
Elena Reshetova	4b9502e63b	kernel: convert css_set.refcount from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-03-08 17:46:03 -05:00
Kees Cook	b6a6759daf	cgroups: censor kernel pointer in debug files As found in grsecurity, this avoids exposing a kernel pointer through the cgroup debug entries. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-03-06 15:16:03 -05:00
Ingo Molnar	50ff9d1300	sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h> It's not used by any of the scheduler methods, but <linux/sched/task_stack.h> needs it to pick up STACK_END_MAGIC. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-03 01:45:39 +01:00
Ingo Molnar	56cd697366	sched/headers: Move the task_lock()/unlock() APIs to <linux/sched/task.h> The task_lock()/task_unlock() APIs are not realated to core scheduling, they are task lifetime APIs, i.e. they belong into <linux/sched/task.h>. Move them. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-03 01:45:22 +01:00
Ingo Molnar	c3edc4010e	sched/headers: Move task_struct::signal and task_struct::sighand types and accessors into <linux/sched/signal.h> task_struct::signal and task_struct::sighand are pointers, which would normally make it straightforward to not define those types in sched.h. That is not so, because the types are accompanied by a myriad of APIs (macros and inline functions) that dereference them. Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>. With this change sched.h does not know about 'struct signal' and 'struct sighand' anymore, trying to put accessors into sched.h as a test fails the following way: ./include/linux/sched.h: In function ‘test_signal_types’: ./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’ ^ This reduces the size and complexity of sched.h significantly. Update all headers and .c code that relied on getting the signal handling functionality from <linux/sched.h> to include <linux/sched/signal.h>. The list of affected files in the preparatory patch was partly generated by grepping for the APIs, and partly by doing coverage build testing, both all[yes\|mod\|def\|no]config builds on 64-bit and 32-bit x86, and an array of cross-architecture builds. Nevertheless some (trivial) build breakage is still expected related to rare Kconfig combinations and in-flight patches to various kernel code, but most of it should be handled by this patch. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-03 01:43:37 +01:00

1 2

57 Commits