android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Tejun Heo	b1e3aeb11c	cpuset: Minor cgroup2 interface updates * Rename the partition file from "cpuset.sched.partition" to "cpuset.cpus.partition". * When writing to the partition file, drop "0" and "1" and only accept "member" and "root". Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Waiman Long <longman@redhat.com>	2018-11-13 12:09:48 -08:00
Lance Roy	04547728b7	locking/mutex: Replace spin_is_locked() with lockdep lockdep_assert_held() is better suited to checking locking requirements, since it only checks if the current thread holds the lock regardless of whether someone else does. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Lance Roy <ldr709@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 09:06:22 -08:00
Paul E. McKenney	5f1a6ef374	rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs() Subtracting INT_MIN can be interpreted as unconditional signed integer overflow, which according to the C standard is undefined behavior. Therefore, kernel build arguments notwithstanding, it would be good to future-proof the code. This commit therefore substitutes INT_MAX for INT_MIN in order to avoid undefined behavior. While in the neighborhood, this commit also creates some meaningful names for INT_MAX and friends in order to improve readability, as suggested by Joel Fernandes. Reported-by: Ran Rozenstein <ranro@mellanox.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 09:03:59 -08:00
Paul E. McKenney	117f683c6e	rcu: Replace this_cpu_ptr() with __this_cpu_read() Because __this_cpu_read() can be lighter weight than equivalent uses of this_cpu_ptr(), this commit replaces the latter with the former. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 09:03:59 -08:00
Paul E. McKenney	05f415715c	rcu: Speed up expedited GPs when interrupting RCU reader In PREEMPT kernels, an expedited grace period might send an IPI to a CPU that is executing an RCU read-side critical section. In that case, it would be nice if the rcu_read_unlock() directly interacted with the RCU core code to immediately report the quiescent state. And this does happen in the case where the reader has been preempted. But it would also be a nice performance optimization if immediate reporting also happened in the preemption-free case. This commit therefore adds an ->exp_hint field to the task_struct structure's ->rcu_read_unlock_special field. The IPI handler sets this hint when it has interrupted an RCU read-side critical section, and this causes the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(), which, if preemption is enabled, reports the quiescent state immediately. If preemption is disabled, then the report is required to be deferred until preemption (or bottom halves or interrupts or whatever) is re-enabled. Because this is a hint, it does nothing for more complicated cases. For example, if the IPI interrupts an RCU reader, but interrupts are disabled across the rcu_read_unlock(), but another rcu_read_lock() is executed before interrupts are re-enabled, the hint will already have been cleared. If you do crazy things like this, reporting will be deferred until some later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar. Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>	2018-11-12 09:03:59 -08:00
Paul E. McKenney	0a89e5a402	rcu: Trace end of grace period before end of grace period Currently, rcu_gp_cleanup() traces the end of the old grace period after the old grace period has officially ended. This might make intuitive sense, but it also makes for confusing event-trace output because the "end" trace displays not the old but instead the new grace-period number. This commit therefore traces the end of an old grace period just before that grace period officially ends. Reported-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 09:03:59 -08:00
Zhouyi Zhou	2320bda26d	rcu: Adjust the comment of function rcu_is_watching Because RCU avoids interrupting idle CPUs, rcu_is_watching() is used to test whether or not it is currently legal to run RCU read-side critical sections on this CPU. However, the first sentence and last sentences of current comment for rcu_is_watching have opposite meaning of what is expected. This commit therefore fixes this header comment. Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 09:03:59 -08:00
Paul E. McKenney	c669c014d1	rcu: Add jiffies-since-GP-activity to show_rcu_gp_kthreads() This commit adds a printout of the number of jiffies since the last time that the RCU grace-period kthread did any processing. This can be useful when tracking down forward-progress issues. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 09:03:59 -08:00
Paul E. McKenney	691960197e	rcu: Add state name to show_rcu_gp_kthreads() output This commit adds the name of the RCU grace-period state to the show_rcu_gp_kthreads() output in order to ease debugging. This commit also moves gp_state_getname() up in the code so that show_rcu_gp_kthreads() can use it. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 09:03:59 -08:00
Paul E. McKenney	791416c471	rcu: Parameterize rcu_check_gp_start_stall() In order to debug forward-progress stalls, it is necessary to check for excessively delayed grace-period starts. This is currently done for RCU CPU stall warnings by rcu_check_gp_start_stall(), which checks to see if the start of a requested grace period has been delayed by an RCU CPU stall warning period. Because rcutorture will need to check for the time consumed by an RCU forward-progress delay, this commit promotes gpssdelay from a local variable to a formal parameter. It is not necessary to export rcu_check_gp_start_stall() because rcutorture will access it via a wrapper function. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 09:03:59 -08:00
Paul E. McKenney	b3c1d9ec7c	rcu: Avoid double multiply by HZ The rcu_check_gp_start_stall() function multiplies the return value from rcu_jiffies_till_stall_check() by HZ, but the units are already in jiffies. This commit therefore avoids the need for introduction of a jiffies-squared unit by removing the extraneous multiplication. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 09:03:59 -08:00
Paul E. McKenney	f0ad56e876	rcu: Eliminate BUG_ON() for kernel/rcu/update.c The update.c file has a number of calls to BUG_ON(), which panics the kernel, which is not a good strategy for devices (like embedded) that don't have a way to capture console output. This commit therefore converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE(). Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 08:15:59 -08:00
Paul E. McKenney	9213784b48	rcu: Eliminate BUG_ON() for kernel/rcu/tree_plugin.h The tree_plugin.h file has a number of calls to BUG_ON(), which panics the kernel, which is not a good strategy for devices (like embedded) that don't have a way to capture console output. This commit therefore converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE(). Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Fix typo: s/rcuo/rcub/. ]	2018-11-12 08:15:16 -08:00
Jan Kara	f905c2fc39	audit: Use 'mark' name for fsnotify_mark variables Variables pointing to fsnotify_mark are sometimes called 'entry' and sometimes 'mark'. Use 'mark' in all places. Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> [PM: minor merge fuzz due to updated patches previously in the series] Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:55:16 -05:00
Jan Kara	83d23bc8ae	audit: Replace chunk attached to mark instead of replacing mark Audit tree code currently associates new fsnotify mark with each new chunk. As chunk attached to an inode is replaced when new tag is added / removed, we also need to remove old fsnotify mark and add a new one on such occasion. This is cumbersome and makes locking rules somewhat difficult to follow. Fix these problems by allocating fsnotify mark independently of chunk and keeping it all the time while there is some chunk attached to an inode. Also add documentation about the locking rules so that things are easier to follow. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> [PM: minor merge fuzz due to updated patches previously in the series] Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:55:16 -05:00
Jan Kara	8432c70062	audit: Simplify locking around untag_chunk() untag_chunk() has to be called with hash_lock, it drops it and reacquires it when returning. The unlocking of hash_lock is thus hidden from the callers of untag_chunk() with is rather error prone. Reorganize the code so that untag_chunk() is called without hash_lock, only with mark reference preventing the chunk from going away. Since this requires some more code in the caller of untag_chunk() to assure forward progress, factor out loop pruning tree from all chunks into a common helper function. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:56 -05:00
Jan Kara	c22fcde775	audit: Drop all unused chunk nodes during deletion When deleting chunk from a tree, drop all unused nodes in a chunk instead of just the one used by the tree. This gets rid of possibly lingering unused nodes (created due to fallback path in untag_chunk()) and also removes some special cases and will allow us to simplify locking in untag_chunk(). Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:49 -05:00
Jan Kara	49a4ee7d98	audit: Guarantee forward progress of chunk untagging When removing chunk from a tree, we do shrink the chunk. This can fail for various reasons (due to races, ENOMEM, etc.) and in some cases we just bail from untag_chunk() relying on someone else to cleanup. Although this currently works, later we will need to add new failure situation which would break. Also this simplifies the code and will allow us to make locking around untag_chunk() less awkward. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:49 -05:00
Jan Kara	5f5161300d	audit: Allocate fsnotify mark independently of chunk Allocate fsnotify mark independently instead of embedding it inside chunk. This will allow us to just replace chunk attached to mark when growing / shrinking chunk instead of replacing mark attached to inode which is a more complex operation. Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:49 -05:00
Jan Kara	a8375713fb	audit: Provide helper for dropping mark's chunk reference Provide a helper function audit_mark_put_chunk() for dropping mark's reference (which has to happen only after RCU grace period expires). Currently that happens only from a single place but in later patches we introduce more callers. Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:49 -05:00
Jan Kara	8cd0feb523	audit: Remove pointless check in insert_hash() The audit_tree_group->mark_mutex is held all the time while we create the fsnotify mark, add it to the inode, and insert chunk into the hash. Hence mark cannot get detached during this time and so the check whether the mark is attached in insert_hash() is pointless. Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:48 -05:00
Jan Kara	d31b326d3c	audit: Factor out chunk replacement code Chunk replacement code is very similar for the cases where we grow or shrink chunk. Factor the code out into a common helper function. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:48 -05:00
Jan Kara	1635e57223	audit: Make hash table insertion safe against concurrent lookups Currently, the audit tree code does not make sure that when a chunk is inserted into the hash table, it is fully initialized. So in theory a user of RCU lookup could see uninitialized structure in the hash table and crash. Add appropriate barriers between initialization of the structure and its insertion into hash table. Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:48 -05:00
Jan Kara	8d20d6e930	audit: Embed key into chunk Currently chunk hash key (which is in fact pointer to the inode) is derived as chunk->mark.conn->obj. It is tricky to make this dereference reliable for hash table lookups only under RCU as mark can get detached from the connector and connector gets freed independently of the running lookup. Thus there is a possible use after free / NULL ptr dereference issue: CPU1 CPU2 untag_chunk() ... audit_tree_lookup() list_for_each_entry_rcu(p, list, hash) { list_del_rcu(&chunk->hash); fsnotify_destroy_mark(entry); fsnotify_put_mark(entry) chunk_to_key(p) if (!chunk->mark.connector) ... hlist_del_init_rcu(&mark->obj_list); if (hlist_empty(&conn->list)) { inode = fsnotify_detach_connector_from_object(conn); mark->connector = NULL; ... frees connector from workqueue chunk->mark.connector->obj This race is probably impossible to hit in practice as the race window on CPU1 is very narrow and CPU2 has a lot of code to execute. Still it's better to have this fixed. Since the inode the chunk is attached to is constant during chunk's lifetime it is easy to cache the key in the chunk itself and thus avoid these issues. Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:48 -05:00
Jan Kara	b1e4603b92	audit: Fix possible tagging failures Audit tree code is replacing marks attached to inodes in non-atomic way. Thus fsnotify_find_mark() in tag_chunk() may find a mark that belongs to a chunk that is no longer valid one and will soon be destroyed. Tags added to such chunk will be simply lost. Fix the problem by making sure old mark is marked as going away (through fsnotify_detach_mark()) before dropping mark_mutex and thus in an atomic way wrt tag_chunk(). Note that this does not fix the problem completely as if tag_chunk() finds a mark that is going away, it fails with -ENOENT. But at least the failure is not silent and currently there's no way to search for another fsnotify mark attached to the inode. We'll fix this problem in later patch. Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:48 -05:00
Jan Kara	a5789b07b3	audit: Fix possible spurious -ENOSPC error When an inode is tagged with a tree, tag_chunk() checks whether there is audit_tree_group mark attached to the inode and adds one if not. However nothing protects another tag_chunk() to add the mark between we've checked and try to add the fsnotify mark thus resulting in an error from fsnotify_add_mark() and consequently an ENOSPC error from tag_chunk(). Fix the problem by holding mark_mutex over the whole check-insert code sequence. Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:48 -05:00
Jan Kara	9f16d2e624	audit_tree: Remove mark->lock locking Currently, audit_tree code uses mark->lock to protect against detaching of mark from an inode. In most places it however also uses mark->group->mark_mutex (as we need to atomically replace attached marks) and this provides protection against mark detaching as well. So just remove protection with mark->lock from audit tree code and replace it with mark->group->mark_mutex protection in all the places. It simplifies the code and gets rid of some ugly catches like calling fsnotify_add_mark_locked() with mark->lock held (which cannot sleep only because we hold a reference to another mark attached to the same inode). Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>	2018-11-12 09:54:29 -05:00
Viresh Kumar	3e18450108	sched/core: Clean up the #ifdef block in add_nr_running() There is no point in keeping the conditional statement of the #if block outside of the #ifdef block, while all of its body is contained within the #ifdef block. Move the conditional statement under the #ifdef block as well. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/78cbd78a615d6f9fdcd3327f1ead68470f92593e.1541482935.git.viresh.kumar@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-11-12 11:18:06 +01:00
Muchun Song	ed8885a144	sched/fair: Make some variables static The variables are local to the source and do not need to be in global scope, so make them static. Signed-off-by: Muchun Song <smuchun@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181110075202.61172-1-smuchun@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-11-12 06:18:15 +01:00
Viresh Kumar	1da1843f9f	sched/core: Create task_has_idle_policy() helper We already have task_has_rt_policy() and task_has_dl_policy() helpers, create task_has_idle_policy() as well and update sched core to start using it. While at it, use task_has_dl_policy() at one more place. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/ce3915d5b490fc81af926a3b6bfb775e7188e005.1541416894.git.viresh.kumar@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-11-12 06:17:52 +01:00
Patrick Bellasi	b5c0ce7bd1	sched/fair: Add lsub_positive() and use it consistently The following pattern: var -= min_t(typeof(var), var, val); is used multiple times in fair.c. The existing sub_positive() already captures that pattern, but it also adds an explicit load-store to properly support lockless observations. In other cases the pattern above is used to update local, and/or not concurrently accessed, variables. Let's add a simpler version of sub_positive(), targeted at local variables updates, which gives the same readability benefits at calling sites, without enforcing {READ,WRITE}_ONCE() barriers. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Steve Muckle <smuckle@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/lkml/20181031184527.GA3178@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-11-12 06:17:52 +01:00
Patrick Bellasi	92a801e5d5	sched/fair: Mask UTIL_AVG_UNCHANGED usages The _task_util_est() is mainly used to add/remove the task contribution to/from the rq's estimated utilization at task enqueue/dequeue time. In both cases we ensure the UTIL_AVG_UNCHANGED flag is set to keep consistency between enqueue and dequeue time while still being transparent to update_load_avg calls which will eventually reset the flag. Let's move the flag forcing within _task_util_est() itself so that we can simplify calling code by hiding that estimated utilization implementation detail into one of its internal functions. This will affect also the "public" API task_util_est() but we know that the flag will (eventually) impact just on the LSB of the estimated utilization, thus it's certainly acceptable. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Steve Muckle <smuckle@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/20181105145400.935-3-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-11-12 06:17:52 +01:00
Ingo Molnar	59e1678c29	Merge branch 'sched/urgent' into sched/core, to pick up dependent fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-11-12 05:15:33 +01:00
Patrick Bellasi	c469933e77	sched/fair: Fix cpu_util_wake() for 'execl' type workloads A ~10% regression has been reported for UnixBench's execl throughput test by Aaron Lu and Ye Xiaolong: https://lkml.org/lkml/2018/10/30/765 That test is pretty simple, it does a "recursive" execve() syscall on the same binary. Starting from the syscall, this sequence is possible: do_execve() do_execveat_common() __do_execve_file() sched_exec() select_task_rq_fair() <==\| Task already enqueued find_idlest_cpu() find_idlest_group() capacity_spare_wake() <==\| Functions not called from cpu_util_wake() \| the wakeup path which means we can end up calling cpu_util_wake() not only from the "wakeup path", as its name would suggest. Indeed, the task doing an execve() syscall is already enqueued on the CPU we want to get the cpu_util_wake() for. The estimated utilization for a CPU computed in cpu_util_wake() was written under the assumption that function can be called only from the wakeup path. If instead the task is already enqueued, we end up with a utilization which does not remove the current task's contribution from the estimated utilization of the CPU. This will wrongly assume a reduced spare capacity on the current CPU and increase the chances to migrate the task on execve. The regression is tracked down to: commit `d519329f72` ("sched/fair: Update util_est only on util_avg updates") because in that patch we turn on by default the UTIL_EST sched feature. However, the real issue is introduced by: commit `f9be3e5961` ("sched/fair: Use util_est in LB and WU paths") Let's fix this by ensuring to always discount the task estimated utilization from the CPU's estimated utilization when the task is also the current one. The same benchmark of the bug report, executed on a dual socket 40 CPUs Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz machine, reports these "Execl Throughput" figures (higher the better): mainline : 48136.5 lps mainline+fix : 55376.5 lps which correspond to a 15% speedup. Moreover, since {cpu_util,capacity_spare}_wake() are not really only used from the wakeup path, let's remove this ambiguity by using a better matching name: {cpu_util,capacity_spare}_without(). Since we are at that, let's also improve the existing documentation. Reported-by: Aaron Lu <aaron.lu@intel.com> Reported-by: Ye Xiaolong <xiaolong.ye@intel.com> Tested-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Steve Muckle <smuckle@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Fixes: `f9be3e5961` (sched/fair: Use util_est in LB and WU paths) Link: https://lore.kernel.org/lkml/20181025093100.GB13236@e110439-lin/ Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-11-12 05:00:46 +01:00
Linus Torvalds	08b5278650	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "Just the removal of a redundant call into the sched deadline overrun check" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Remove useless call to check_dl_overrun()	2018-11-11 16:37:41 -06:00
Linus Torvalds	024d4d4c0c	Merge branch 'sched/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "Two small scheduler fixes: - Take hotplug lock in sched_init_smp(). Technically not really required, but lockdep will complain other. - Trivial comment fix in sched/fair" * 'sched/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix a comment in task_numa_fault() sched/core: Take the hotplug lock in sched_init_smp()	2018-11-11 16:33:00 -06:00
Linus Torvalds	0b002cdd50	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: "A couple of fixlets for the core: - Kernel doc function documentation fixes - Missing prototypes for weak watchdog functions" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: resource/docs: Complete kernel-doc style function documentation watchdog/core: Add missing prototypes for weak functions resource/docs: Fix new kernel-doc warnings	2018-11-11 16:14:05 -06:00
Paul E. McKenney	9cac83a57e	rcu: Stop expedited grace periods from relying on stop-machine The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption to prevent the cpu_online_mask from changing. However, this relies on the stop-machine mechanism in the CPU-hotplug offline code, which is not desirable (it would be good to someday remove the stop-machine mechanism). This commit therefore instead uses the relevant leaf rcu_node structure's ->ffmask, which has a bit set for all CPUs that are fully functional. A given CPU's bit is cleared very early during offline processing by rcutree_offline_cpu() and set very late during online processing by rcutree_online_cpu(). Therefore, if a CPU's bit is set in this mask, and preemption is disabled, we have to be before the synchronize_sched() in the CPU-hotplug offline code, which means that the CPU is guaranteed to be workqueue-ready throughout the duration of the enclosing preempt_disable() region of code. This also has the side-effect of using WORK_CPU_UNBOUND if all the CPUs for this leaf rcu_node structure are offline, which is an acceptable difference in behavior. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-11-11 11:23:01 -08:00
Andrey Ignatov	46f53a65d2	bpf: Allow narrow loads with offset > 0 Currently BPF verifier allows narrow loads for a context field only with offset zero. E.g. if there is a __u32 field then only the following loads are permitted: * off=0, size=1 (narrow); * off=0, size=2 (narrow); * off=0, size=4 (full). On the other hand LLVM can generate a load with offset different than zero that make sense from program logic point of view, but verifier doesn't accept it. E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code: #define DST_IP4 0xC0A801FEU /* 192.168.1.254 / ... if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) && where ctx is struct bpf_sock_addr. Some versions of LLVM can produce the following byte code for it: 8: 71 12 07 00 00 00 00 00 r2 = (u8 )(r1 + 7) 9: 67 02 00 00 18 00 00 00 r2 <<= 24 10: 18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00 r3 = 4261412864 ll 12: 5d 32 07 00 00 00 00 00 if r2 != r3 goto +7 <LBB0_6> where `(u8 )(r1 + 7)` means narrow load for ctx->user_ip4 with size=1 and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently rejected by verifier. Verifier code that rejects such loads is in bpf_ctx_narrow_access_ok() what means any is_valid_access implementation, that uses the function, works this way, e.g. bpf_skb_is_valid_access() for __sk_buff or sock_addr_is_valid_access() for bpf_sock_addr. The patch makes such loads supported. Offset can be in [0; size_default) but has to be multiple of load size. E.g. for __u32 field the following loads are supported now: off=0, size=1 (narrow); * off=1, size=1 (narrow); * off=2, size=1 (narrow); * off=3, size=1 (narrow); * off=0, size=2 (narrow); * off=2, size=2 (narrow); * off=0, size=4 (full). Reported-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 22:29:59 -08:00
Quentin Monnet	16a8cb5cff	bpf: do not pass netdev to translate() and prepare() offload callbacks The kernel functions to prepare verifier and translate for offloaded program retrieve "offload" from "prog", and "netdev" from "offload". Then both "prog" and "netdev" are passed to the callbacks. Simplify this by letting the drivers retrieve the net device themselves from the offload object attached to prog - if they need it at all. There is currently no need to pass the netdev as an argument to those functions. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	a40a26322a	bpf: pass prog instead of env to bpf_prog_offload_verifier_prep() Function bpf_prog_offload_verifier_prep(), called from the kernel BPF verifier to run a driver-specific callback for preparing for the verification step for offloaded programs, takes a pointer to a struct bpf_verifier_env object. However, no driver callback needs the whole structure at this time: the two drivers supporting this, nfp and netdevsim, only need a pointer to the struct bpf_prog instance held by env. Update the callback accordingly, on kernel side and in these two drivers. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	eb9119471e	bpf: pass destroy() as a callback and remove its ndo_bpf subcommand As part of the transition from ndo_bpf() to callbacks attached to struct bpf_offload_dev for some of the eBPF offload operations, move the functions related to program destruction to the struct and remove the subcommand that was used to call them through the NDO. Remove function __bpf_offload_ndo(), which is no longer used. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	b07ade27e9	bpf: pass translate() as a callback and remove its ndo_bpf subcommand As part of the transition from ndo_bpf() to callbacks attached to struct bpf_offload_dev for some of the eBPF offload operations, move the functions related to code translation to the struct and remove the subcommand that was used to call them through the NDO. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	00db12c3d1	bpf: call verifier_prep from its callback in struct bpf_offload_dev In a way similar to the change previously brought to the verify_insn hook and to the finalize callback, switch to the newly added ops in struct bpf_prog_offload for calling the functions used to prepare driver verifiers. Since the dev_ops pointer in struct bpf_prog_offload is no longer used by any callback, we can now remove it from struct bpf_prog_offload. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	6dc18fa6f4	bpf: call finalize() from its callback in struct bpf_offload_dev In a way similar to the change previously brought to the verify_insn hook, switch to the newly added ops in struct bpf_prog_offload for calling the functions used to perform final verification steps for offloaded programs. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:53 -08:00
Quentin Monnet	341b3e7b7b	bpf: call verify_insn from its callback in struct bpf_offload_dev We intend to remove the dev_ops in struct bpf_prog_offload, and to only keep the ops in struct bpf_offload_dev instead, which is accessible from more locations for passing function pointers. But dev_ops is used for calling the verify_insn hook. Switch to the newly added ops in struct bpf_prog_offload instead. To avoid table lookups for each eBPF instruction to verify, we remember the offdev attached to a netdev and modify bpf_offload_find_netdev() to avoid performing more than once a lookup for a given offload object. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:53 -08:00
Quentin Monnet	1385d755cf	bpf: pass a struct with offload callbacks to bpf_offload_dev_create() For passing device functions for offloaded eBPF programs, there used to be no place where to store the pointer without making the non-offloaded programs pay a memory price. As a consequence, three functions were called with ndo_bpf() through specific commands. Now that we have struct bpf_offload_dev, and since none of those operations rely on RTNL, we can turn these three commands into hooks inside the struct bpf_prog_offload_ops, and pass them as part of bpf_offload_dev_create(). This commit effectively passes a pointer to the struct to bpf_offload_dev_create(). We temporarily have two struct bpf_prog_offload_ops instances, one under offdev->ops and one under offload->dev_ops. The next patches will make the transition towards the former, so that offload->dev_ops can be removed, and callbacks relying on ndo_bpf() added to offdev->ops as well. While at it, rename "nfp_bpf_analyzer_ops" as "nfp_bpf_dev_ops" (and similarly for netdevsim). Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:53 -08:00
Linus Torvalds	1de4f2ef21	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fixes from Eric Biederman: "I believe all of these are simple obviously correct bug fixes. These fall into two groups: - Fixing the implementation of MNT_LOCKED which prevents lesser privileged users from seeing unders mounts created by more privileged users. - Fixing the extended uid and group mapping in user namespaces. As well as ensuring the code looks correct I have spot tested these changes as well and in my testing the fixes are working" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: mount: Prevent MNT_DETACH from disconnecting locked mounts mount: Don't allow copying MNT_UNBINDABLE\|MNT_LOCKED mounts mount: Retest MNT_LOCKED in do_umount userns: also map extents in the reverse map to kernel IDs	2018-11-10 13:27:58 -06:00
Jiong Wang	e647815a4d	bpf: let verifier to calculate and record max_pkt_offset In check_packet_access, update max_pkt_offset after the offset has passed __check_packet_access. It should be safe to use u32 for max_pkt_offset as explained in code comment. Also, when there is tail call, the max_pkt_offset of the called program is unknown, so conservatively set max_pkt_offset to MAX_PACKET_OFF for such case. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-09 09:16:31 +01:00
Paul E. McKenney	0607ba8403	srcu: Prevent __call_srcu() counter wrap with read-side critical section Ever since `cdf7abc461` ("srcu: Allow use of Tiny/Tree SRCU from both process and interrupt context"), it has been permissible to use SRCU read-side critical sections in interrupt context. This allows __call_srcu() to use SRCU read-side critical sections to prevent a new SRCU grace period from ending before the call to either srcu_funnel_gp_start() or srcu_funnel_exp_start completes, thus preventing SRCU grace-period counter overflow during that time. Note that this does not permit removal of the counter-wrap checks in srcu_gp_end(). These check are necessary to handle the case where a given CPU does not interact at all with SRCU for an extended time period. This commit therefore adds an SRCU read-side critical section to __call_srcu() in order to prevent grace period counter wrap during the funnel-locking process. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-11-08 21:54:14 -08:00

... 17 18 19 20 21 ...

29864 Commits