android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Dominik Brodowski	a87d35d87a	net: socket: add __sys_bind() helper; remove in-kernel call to syscall Using the net-internal helper __sys_bind() allows us to avoid the internal calls to the sys_bind() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:15:07 +02:00
Dominik Brodowski	9d6a15c3f2	net: socket: add __sys_socket() helper; remove in-kernel call to syscall Using the net-internal helper __sys_socket() allows us to avoid the internal calls to the sys_socket() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:15:06 +02:00
Dominik Brodowski	4541e80560	net: socket: add __sys_accept4() helper; remove in-kernel call to syscall Using the net-internal helper __sys_accept4() allows us to avoid the internal calls to the sys_accept4() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:15:06 +02:00
Dominik Brodowski	211b634b7f	net: socket: add __sys_sendto() helper; remove in-kernel call to syscall Using the net-internal helper __sys_sendto() allows us to avoid the internal calls to the sys_sendto() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:15:05 +02:00
Dominik Brodowski	7a09e1eb9c	net: socket: add __sys_recvfrom() helper; remove in-kernel call to syscall Using the net-internal helper __sys_recvfrom() allows us to avoid the internal calls to the sys_recvfrom() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:15:04 +02:00
Dominik Brodowski	2de0db992d	mm: use do_futex() instead of sys_futex() in mm_release() sys_futex() is a wrapper to do_futex() which does not modify any values here: - uaddr, val and val3 are kept the same - op is masked with FUTEX_CMD_MASK, but is always set to FUTEX_WAKE. Therefore, val2 is always 0. - as utime is set to NULL, *timeout is NULL This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:15:02 +02:00
Dominik Brodowski	d53238cd51	kernel: open-code sys_rt_sigpending() in sys_sigpending() A similar but not fully equivalent code path is already open-coded three times (in sys_rt_sigpending and in the two compat stubs), so do it a fourth time here. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:15:00 +02:00
Linus Torvalds	486adcea4a	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel side changes were: - Modernize the kprobe and uprobe creation/destruction tooling ABIs: The existing text based APIs (kprobe_events and uprobe_events in tracefs), are naive, limited ABIs in that they require user-space to clean up after themselves, which is both difficult and fragile if the tool is buggy or exits unexpectedly. In other words they are not really suited for modern, robust tooling. So introduce a modern, file descriptor based ABI that does not have these limitations: introduce the 'perf_kprobe' and 'perf_uprobe' PMUs and extend the perf_event_open() syscall to create events with a kprobe/uprobe attached to them. These [k,u]probe are associated with this file descriptor, so they are not available in tracefs. (Song Liu) - Intel Cannon Lake CPU support (Harry Pan) - Intel PT cleanups (Alexander Shishkin) - Improve the performance of pinned/flexible event groups by using RB trees (Alexey Budankov) - Add PERF_EVENT_IOC_MODIFY_ATTRIBUTES which allows the modification of hardware breakpoints, which new ABI variant massively speeds up existing tooling that uses hardware breakpoints to instrument (and debug) memory usage. (Milind Chabbi, Jiri Olsa) - Various Intel PEBS handling fixes and improvements, and other Intel PMU improvements (Kan Liang) - Various perf core improvements and optimizations (Peter Zijlstra) - ... misc cleanups, fixes and updates. There's over 200 tooling commits, here's an (imperfect) list of highlights: - 'perf annotate' improvements: * Recognize and handle jumps to other functions as calls, which improves the navigation along jumps and back. (Arnaldo Carvalho de Melo) * Add the 'P' hotkey in TUI annotation to dump annotation output into a file, to ease e-mail reporting of annotation details. (Arnaldo Carvalho de Melo) * Add an IPC/cycles column to the TUI (Jin Yao) * Improve s390 assembly annotation (Thomas Richter) * Refactor the output formatting logic to better separate it into interactive and non-interactive features and add the --stdio2 output variant to demonstrate this. (Arnaldo Carvalho de Melo) - 'perf script' improvements: * Add Python 3 support (Jaroslav Škarvada) * Add --show-round-event (Jiri Olsa) - 'perf c2c' improvements: * Add NUMA analysis support (Jiri Olsa) - 'perf trace' improvements: * Improve PowerPC support (Ravi Bangoria) - 'perf inject' improvements: * Integrate ARM CoreSight traces (Robert Walker) - 'perf stat' improvements: * Add the --interval-count option (yuzhoujian) * Add the --timeout option (yuzhoujian) - 'perf sched' improvements (Changbin Du) - Vendor events improvements : * Add IBM s390 vendor events (Thomas Richter) * Add and improve arm64 vendor events (John Garry, Ganapatrao Kulkarni) * Update POWER9 vendor events (Sukadev Bhattiprolu) - Intel PT tooling improvements (Adrian Hunter) - PMU handling improvements (Agustin Vega-Frias) - Record machine topology in perf.data (Jiri Olsa) - Various overwrite related cleanups (Kan Liang) - Add arm64 dwarf post unwind support (Kim Phillips, Jean Pihet) - ... and lots of other changes, cleanups and fixes, see the shortlog and Git history for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits) perf/x86/intel: Enable C-state residency events for Cannon Lake perf/x86/intel: Add Cannon Lake support for RAPL profiling perf/x86/pt, coresight: Clean up address filter structure perf vendor events s390: Add JSON files for IBM z14 perf vendor events s390: Add JSON files for IBM z13 perf vendor events s390: Add JSON files for IBM zEC12 zBC12 perf vendor events s390: Add JSON files for IBM z196 perf vendor events s390: Add JSON files for IBM z10EC z10BC perf mmap: Be consistent when checking for an unmaped ring buffer perf mmap: Fix accessing unmapped mmap in perf_mmap__read_done() perf build: Fix check-headers.sh opts assignment perf/x86: Update rdpmc_always_available static key to the modern API perf annotate: Use absolute addresses to calculate jump target offsets perf annotate: Defer searching for comma in raw line till it is needed perf annotate: Support jumping from one function to another perf annotate: Add "_local" to jump/offset validation routines perf python: Reference Py_None before returning it perf annotate: Mark jumps to outher functions with the call arrow perf annotate: Pass function descriptor to its instruction parsing routines perf annotate: No need to calculate notes->start twice ...	2018-04-02 11:06:34 -07:00
Takashi Iwai	903d271a3f	Merge tag 'asoc-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Updates for v4.17 This is a very big release for ASoC. Not much change in the core but there s the transition of all the individual drivers over to components which is intended to support further core work. The goal is to make it easier to do further core work by removing the need to special case all the different driver classes in the core, many of the devices end up being used in multiple roles in modern systems. We also have quite a lot of new drivers added this month of all kinds, quite a few for simple devices but also some more advanced ones with more substantial code. - The biggest thing is the huge series from Morimoto-san which converted everything over to components. This is a huge change by code volume but was fairly mechanical - Many fixes for some of the Realtek based Baytrail systems covering both the CODECs and the CPUs, contributed by Hans de Goode. - Lots of cleanups for Samsung based Odroid systems from Sylwester Nawrocki. - The Freescale SSI driver also got a lot of cleanups from Nicolin Chen. - The Blackfin drivers have been removed as part of the removal of the architecture. - New drivers for AKM AK4458 and AK5558, several AMD based machines, several Intel based machines, Maxim MAX9759, Motorola CPCAP, Socionext Uniphier SoCs, and TI PCM1789 and TDA7419	2018-04-02 19:51:39 +02:00
Linus Torvalds	701f3b3149	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in the locking subsystem in this cycle were: - Add the Linux Kernel Memory Consistency Model (LKMM) subsystem, which is an an array of tools in tools/memory-model/ that formally describe the Linux memory coherency model (a.k.a. Documentation/memory-barriers.txt), and also produce 'litmus tests' in form of kernel code which can be directly executed and tested. Here's a high level background article about an earlier version of this work on LWN.net: https://lwn.net/Articles/718628/ The design principles: "There is reason to believe that Documentation/memory-barriers.txt could use some help, and a major purpose of this patch is to provide that help in the form of a design-time tool that can produce all valid executions of a small fragment of concurrent Linux-kernel code, which is called a "litmus test". This tool's functionality is roughly similar to a full state-space search. Please note that this is a design-time tool, not useful for regression testing. However, we hope that the underlying Linux-kernel memory model will be incorporated into other tools capable of analyzing large bodies of code for regression-testing purposes." [...] "A second tool is klitmus7, which converts litmus tests to loadable kernel modules for direct testing. As with herd7, the klitmus7 code is freely available from http://diy.inria.fr/sources/index.html (and via "git" at https://github.com/herd/herdtools7)" [...] Credits go to: "This patch was the result of a most excellent collaboration founded by Jade Alglave and also including Alan Stern, Andrea Parri, and Luc Maranget." ... and to the gents listed in the MAINTAINERS entry: LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM) M: Alan Stern <stern@rowland.harvard.edu> M: Andrea Parri <parri.andrea@gmail.com> M: Will Deacon <will.deacon@arm.com> M: Peter Zijlstra <peterz@infradead.org> M: Boqun Feng <boqun.feng@gmail.com> M: Nicholas Piggin <npiggin@gmail.com> M: David Howells <dhowells@redhat.com> M: Jade Alglave <j.alglave@ucl.ac.uk> M: Luc Maranget <luc.maranget@inria.fr> M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> The LKMM project already found several bugs in Linux locking primitives and improved the understanding and the documentation of the Linux memory model all around. - Add KASAN instrumentation to atomic APIs (Dmitry Vyukov) - Add RWSEM API debugging and reorganize the lock debugging Kconfig (Waiman Long) - ... misc cleanups and other smaller changes" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) locking/Kconfig: Restructure the lock debugging menu locking/Kconfig: Add LOCK_DEBUGGING_SUPPORT to make it more readable locking/rwsem: Add DEBUG_RWSEMS to look for lock/unlock mismatches lockdep: Make the lock debug output more useful locking/rtmutex: Handle non enqueued waiters gracefully in remove_waiter() locking/atomic, asm-generic, x86: Add comments for atomic instrumentation locking/atomic, asm-generic: Add KASAN instrumentation to atomic operations locking/atomic/x86: Switch atomic.h to use atomic-instrumented.h locking/atomic, asm-generic: Add asm-generic/atomic-instrumented.h locking/xchg/alpha: Remove superfluous memory barriers from the _local() variants tools/memory-model: Finish the removal of rb-dep, smp_read_barrier_depends(), and lockless_dereference() tools/memory-model: Add documentation of new litmus test tools/memory-model: Remove mention of docker/gentoo image locking/memory-barriers: De-emphasize smp_read_barrier_depends() some more locking/lockdep: Show unadorned pointers mutex: Drop linkage.h from mutex.h tools/memory-model: Remove rb-dep, smp_read_barrier_depends, and lockless_dereference tools/memory-model: Convert underscores to hyphens tools/memory-model: Add a S lock-based external-view litmus test tools/memory-model: Add required herd7 version to README file ...	2018-04-02 10:27:16 -07:00
Linus Torvalds	8747a29173	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU subsystem changes in this cycle were: - Miscellaneous fixes, perhaps most notably removing obsolete code whose only purpose in life was to gather information for the now-removed RCU debugfs facility. Other notable changes include removing NO_HZ_FULL_ALL in favor of the nohz_full kernel boot parameter, minor optimizations for expedited grace periods, some added tracing, creating an RCU-specific workqueue using Tejun's new WQ_MEM_RECLAIM flag, and several cleanups to code and comments. - SRCU cleanups and optimizations. - Torture-test updates, perhaps most notably the adding of ARMv8 support, but also including numerous cleanups and usability fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) rcu: Create RCU-specific workqueues with rescuers torture: Provide more sensible nreader/nwriter defaults for rcuperf torture: Grace periods do not piggyback off of themselves torture: Adjust rcuperf trace processing to allow for workqueues torture: Default jitter off when running rcuperf torture: Specify qemu memory size with --memory argument rcutorture: Add basic ARM64 support to run scripts rcutorture: Update kvm.sh header comment rcutorture: Record which grace-period primitives are tested rcutorture: Re-enable testing of dynamic expediting rcutorture: Avoid fake-writer use of undefined primitives rcutorture: Abstract function and module names rcutorture: Replace multi-instance kzalloc() with kcalloc() rcu: Remove SRCU throttling srcu: Remove dead code in srcu_gp_end() srcu: Reduce scans of srcu_data in counter wrap check srcu: Prevent sdp->srcu_gp_seq_needed_exp counter wrap srcu: Abstract function name rcu: Make expedited RCU CPU selection avoid unnecessary stores rcu: Trace expedited GP delays due to transitioning CPUs ...	2018-04-02 09:59:09 -07:00
Linus Torvalds	cc67ccecd3	Merge branch 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull header file cleanup from Ingo Molnar: "Reduce <linux/interrupt.h> dependencies: a single change that drops two #includes from this frequently used kernel header" * 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: headers: Drop two #included headers from <linux/interrupt.h>	2018-04-02 09:27:29 -07:00
Linus Torvalds	320b164abb	Merge tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "Cannonlake and Vega12 support are probably the two major things. This pull lacks nouveau, Ben had some unforseen leave and a few other blockers so we'll see how things look or maybe leave it for this merge window. core: - Device links to handle sound/gpu pm dependency - Color encoding/range properties - Plane clipping into plane check helper - Backlight helpers - DP TP4 + HBR3 helper support amdgpu: - Vega12 support - Enable DC by default on all supported GPUs - Powerplay restructuring and cleanup - DC bandwidth calc updates - DC backlight on pre-DCE11 - TTM backing store dropping support - SR-IOV fixes - Adding "wattman" like functionality - DC crc support - Improved DC dual-link handling amdkfd: - GPUVM support for dGPU - KFD events for dGPU - Enable PCIe atomics for dGPUs - HSA process eviction support - Live-lock fixes for process eviction - VM page table allocation fix for large-bar systems panel: - Raydium RM68200 - AUO G104SN02 V2 - KEO TX31D200VM0BAA - ARM Versatile panels i915: - Cannonlake support enabled - AUX-F port support added - Icelake base enabling until internal milestone of forcewake support - Query uAPI interface (used for GPU topology information currently) - Compressed framebuffer support for sprites - kmem cache shrinking when GPU is idle - Avoid boosting GPU when waited item is being processed already - Avoid retraining LSPCON link unnecessarily - Decrease request signaling latency - Deprecation of I915_SET_COLORKEY_NONE - Kerneldoc and compiler warning cleanup for upcoming CI enforcements - Full range ycbcr toggling - HDCP support i915/gvt: - Big refactor for shadow ppgtt - KBL context save/restore via LRI cmd (Weinan) - Properly unmap dma for guest page (Changbin) vmwgfx: - Lots of various improvements etnaviv: - Use the drm gpu scheduler - prep work for GC7000L support vc4: - fix alpha blending - Expose perf counters to userspace pl111: - Bandwidth checking/limiting - Versatile panel support sun4i: - A83T HDMI support - A80 support - YUV plane support - H3/H5 HDMI support omapdrm: - HPD support for DVI connector - remove lots of static variables msm: - DSI updates from 10nm / SDM845 - fix for race condition with a3xx/a4xx fence completion irq - some refactoring/prep work for eventual a6xx support (ie. when we have a userspace) - a5xx debugfs enhancements - some mdp5 fixes/cleanups to prepare for eventually merging writeback - support (ie. when we have a userspace) tegra: - mmap() fixes for fbdev devices - Overlay plane for hw cursor fix - dma-buf cache maintenance support mali-dp: - YUV->RGB conversion support rockchip: - rk3399/chromebook fixes and improvements rcar-du: - LVDS support move to drm bridge - DT bindings for R8A77995 - Driver/DT support for R8A77970 tilcdc: - DRM panel support" * tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linux: (1646 commits) drm/i915: Fix hibernation with ACPI S0 target state drm/i915/execlists: Use a locked clear_bit() for synchronisation with interrupt drm/i915: Specify which engines to reset following semaphore/event lockups drm/i915/dp: Write to SET_POWER dpcd to enable MST hub. drm/amdkfd: Use ordered workqueue to restore processes drm/amdgpu: Fix acquiring VM on large-BAR systems drm/amd/pp: clean header file hwmgr.h drm/amd/pp: use mlck_table.count for array loop index limit drm: Fix uabi regression by allowing garbage mode->type from userspace drm/amdgpu: Add an ATPX quirk for hybrid laptop drm/amdgpu: fix spelling mistake: "asssert" -> "assert" drm/amd/pp: Add new asic support in pp_psm.c drm/amd/pp: Clean up powerplay code on Vega12 drm/amd/pp: Add smu irq handlers for legacy asics drm/amd/pp: Fix set wrong temperature range on smu7 drm/amdgpu: Don't change preferred domian when fallback GTT v5 drm/vmwgfx: Bump version patchlevel and date drm/vmwgfx: use monotonic event timestamps drm/vmwgfx: Unpin the screen object backup buffer when not used drm/vmwgfx: Stricter count of legacy surface device resources ...	2018-04-02 07:59:23 -07:00
Mark Brown	70a3550b8c	Merge remote-tracking branches 'spi/topic/atmel', 'spi/topic/bcm-qspi', 'spi/topic/bcm2835aux', 'spi/topic/dw' and 'spi/topic/gpio' into spi-next	2018-04-02 15:56:34 +01:00
Viresh Kumar	8ea229511e	thermal: Add cooling device's statistics in sysfs This extends the sysfs interface for thermal cooling devices and exposes some pretty useful statistics. These statistics have proven to be quite useful specially while doing benchmarks related to the task scheduler, where we want to make sure that nothing has disrupted the test, specially the cooling device which may have put constraints on the CPUs. The information exposed here tells us to what extent the CPUs were constrained by the thermal framework. The write-only "reset" file is used to reset the statistics. The read-only "time_in_state_ms" file shows the time (in msec) spent by the device in the respective cooling states, and it prints one line per cooling state. The read-only "total_trans" file shows single positive integer value showing the total number of cooling state transitions the device has gone through since the time the cooling device is registered or the time when statistics were reset last. The read-only "trans_table" file shows a two dimensional matrix, where an entry <i,j> (row i, column j) represents the number of transitions from State_i to State_j. This is how the directory structure looks like for a single cooling device: $ ls -R /sys/class/thermal/cooling_device0/ /sys/class/thermal/cooling_device0/: cur_state max_state power stats subsystem type uevent /sys/class/thermal/cooling_device0/power: autosuspend_delay_ms runtime_active_time runtime_suspended_time control runtime_status /sys/class/thermal/cooling_device0/stats: reset time_in_state_ms total_trans trans_table This is tested on ARM 64-bit Hisilicon hikey620 board running Ubuntu and ARM 64-bit Hisilicon hikey960 board running Android. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2018-04-02 21:49:01 +08:00
Luis Henriques	fb18a57568	ceph: quota: add initial infrastructure to support cephfs quotas This patch adds the infrastructure required to support cephfs quotas as it is currently implemented in the ceph fuse client. Cephfs quotas can be set on any directory, and can restrict the number of bytes or the number of files stored beneath that point in the directory hierarchy. Quotas are set using the extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', and can be removed by setting these attributes to '0'. Link: http://tracker.ceph.com/issues/22372 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-04-02 11:17:51 +02:00
Rafael J. Wysocki	103cf0e579	Merge branches 'pm-cpuidle' and 'pm-tools' * pm-cpuidle: cpuidle: poll_state: Avoid invoking local_clock() too often PM: cpuidle/suspend: Add s2idle usage and time state attributes cpuidle: Enable coupled cpuidle support on Exynos3250 platform cpuidle: poll_state: Add time limit to poll_idle() ARM: cpuidle: Drop memory allocation error message from arm_idle_init_cpu() * pm-tools: pm-graph: AnalyzeSuspend v5.0 pm-graph: AnalyzeBoot v2.2 pm-graph: config files and installer	2018-04-02 11:01:02 +02:00
Rafael J. Wysocki	a9f36db7ec	Merge branch 'pm-cpufreq' * pm-cpufreq: (38 commits) cpufreq: CPPC: Use transition_delay_us depending transition_latency cpufreq: tegra186: Don't validate the frequency table twice cpufreq: speedstep: Don't validate the frequency table twice cpufreq: sparc: Don't validate the frequency table twice cpufreq: sh: Don't validate the frequency table twice cpufreq: sfi: Don't validate the frequency table twice cpufreq: scpi: Don't validate the frequency table twice cpufreq: sc520: Don't validate the frequency table twice cpufreq: s3c24xx: Don't validate the frequency table twice cpufreq: qoirq: Don't validate the frequency table twice cpufreq: pxa: Don't validate the frequency table twice cpufreq: ppc_cbe: Don't validate the frequency table twice cpufreq: powernow: Don't validate the frequency table twice cpufreq: p4-clockmod: Don't validate the frequency table twice cpufreq: mediatek: Don't validate the frequency table twice cpufreq: longhaul: Don't validate the frequency table twice cpufreq: ia64-acpi: Don't validate the frequency table twice cpufreq: elanfreq: Don't validate the frequency table twice cpufreq: e_powersaver: Don't validate the frequency table twice cpufreq: cpufreq-dt: Don't validate the frequency table twice ...	2018-04-02 11:00:46 +02:00
Rafael J. Wysocki	e3a495c4ee	Merge branches 'pm-core', 'pm-sleep' and 'acpi-pm' * pm-core: driver core: Introduce device links reference counting PM / wakeirq: Add wakeup name to dedicated wake irqs * pm-sleep: PM / hibernate: Change message when writing to /sys/power/resume PM / hibernate: Make passing hibernate offsets more friendly PCMCIA / PM: Avoid noirq suspend aborts during suspend-to-idle * acpi-pm: ACPI / PM: Fix keyboard wakeup from suspend-to-idle on ASUS UX331UA ACPI / PM: Allow deeper wakeup power states with no _SxD nor _SxW ACPI / PM: Reduce LPI constraints logging noise ACPI / PM: Do not reconfigure GPEs for suspend-to-idle	2018-04-02 11:00:28 +02:00
Rafael J. Wysocki	0c9ed61bdd	Merge branches 'acpi-battery', 'acpi-doc' and 'acpi-pmic' * acpi-battery: Revert "ACPI: battery: Add the ThinkPad "Not Charging" quirk" ACPI: battery: do not export degraded capacity values over 100 ACPI: battery: make function __battery_hook_unregister() static ACPI: battery: Add the ThinkPad "Not Charging" quirk thinkpad_acpi: Add support for battery thresholds power: add to_power_supply macro to the API battery: Add the battery hooking API * acpi-doc: ACPI: sysfs: Update device object sysfs documentation * acpi-pmic: ACPI / PMIC: Replace license boilerplate with SPDX license identifier	2018-04-02 10:58:13 +02:00
Chengguang Xu	bb48bd4dc4	ceph: optimize memory usage In current code, regular file and directory use same struct ceph_file_info to store fs specific data so the struct has to include some fields which are only used for directory (e.g., readdir related info), when having plenty of regular files, it will lead to memory waste. This patch introduces dedicated ceph_dir_file_info cache for readdir related thins. So that regular file does not include those unused fields anymore. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-04-02 10:12:49 +02:00
Ilya Dryomov	08c1ac508b	libceph, ceph: move ceph_calc_file_object_mapping() to striper.c ceph_calc_file_object_mapping() has nothing to do with osdmaps. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-04-02 10:12:43 +02:00
Ilya Dryomov	ed0811d2d2	libceph: striping framework implementation Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-04-02 10:12:42 +02:00
Ilya Dryomov	b9e281c2b3	libceph: introduce BVECS data type In preparation for rbd "fancy" striping, introduce ceph_bvec_iter for working with bio_vec array data buffers. The wrappers are trivial, but make it look similar to ceph_bio_iter. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-04-02 10:12:39 +02:00
Ilya Dryomov	5359a17d27	libceph, rbd: new bio handling code (aka don't clone bios) The reason we clone bios is to be able to give each object request (and consequently each ceph_osd_data/ceph_msg_data item) its own pointer to a (list of) bio(s). The messenger then initializes its cursor with cloned bio's ->bi_iter, so it knows where to start reading from/writing to. That's all the cloned bios are used for: to determine each object request's starting position in the provided data buffer. Introduce ceph_bio_iter to do exactly that -- store position within bio list (i.e. pointer to bio) + position within that bio (i.e. bvec_iter). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-04-02 10:12:38 +02:00
Ilya Dryomov	dccbf08005	libceph, ceph: change ceph_calc_file_object_mapping() signature - make it void - xlen (object extent length) out parameter should be u32 because only a single stripe unit is mapped at a time Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>	2018-04-02 10:12:38 +02:00
David S. Miller	c0b458a946	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes: 1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE 2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames. 3) In 'net-next' params->hard_mtu is added. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-01 19:49:34 -04:00
Atul Gupta	e0be6bea25	ethtool: enable Inline TLS in HW Ethtool option enables TLS record offload on HW, user configures the feature for netdev capable of Inline TLS. This allows user to define custom sk_prot for Inline TLS sock Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-31 23:37:32 -04:00
David S. Miller	d4069fe6fc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-03-31 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Add raw BPF tracepoint API in order to have a BPF program type that can access kernel internal arguments of the tracepoints in their raw form similar to kprobes based BPF programs. This infrastructure also adds a new BPF_RAW_TRACEPOINT_OPEN command to BPF syscall which returns an anon-inode backed fd for the tracepoint object that allows for automatic detach of the BPF program resp. unregistering of the tracepoint probe on fd release, from Alexei. 2) Add new BPF cgroup hooks at bind() and connect() entry in order to allow BPF programs to reject, inspect or modify user space passed struct sockaddr, and as well a hook at post bind time once the port has been allocated. They are used in FB's container management engine for implementing policy, replacing fragile LD_PRELOAD wrapper intercepting bind() and connect() calls that only works in limited scenarios like glibc based apps but not for other runtimes in containerized applications, from Andrey. 3) BPF_F_INGRESS flag support has been added to sockmap programs for their redirect helper call bringing it in line with cls_bpf based programs. Support is added for both variants of sockmap programs, meaning for tx ULP hooks as well as recv skb hooks, from John. 4) Various improvements on BPF side for the nfp driver, besides others this work adds BPF map update and delete helper call support from the datapath, JITing of 32 and 64 bit XADD instructions as well as offload support of bpf_get_prandom_u32() call. Initial implementation of nfp packet cache has been tackled that optimizes memory access (see merge commit for further details), from Jakub and Jiong. 5) Removal of struct bpf_verifier_env argument from the print_bpf_insn() API has been done in order to prepare to use print_bpf_insn() soon out of perf tool directly. This makes the print_bpf_insn() API more generic and pushes the env into private data. bpftool is adjusted as well with the print_bpf_insn() argument removal, from Jiri. 6) Couple of cleanups and prep work for the upcoming BTF (BPF Type Format). The latter will reuse the current BPF verifier log as well, thus bpf_verifier_log() is further generalized, from Martin. 7) For bpf_getsockopt() and bpf_setsockopt() helpers, IPv4 IP_TOS read and write support has been added in similar fashion to existing IPv6 IPV6_TCLASS socket option we already have, from Nikita. 8) Fixes in recent sockmap scatterlist API usage, which did not use sg_init_table() for initialization thus triggering a BUG_ON() in scatterlist API when CONFIG_DEBUG_SG was enabled. This adds and uses a small helper sg_init_marker() to properly handle the affected cases, from Prashant. 9) Let the BPF core follow IDR code convention and therefore use the idr_preload() and idr_preload_end() helpers, which would also help idr_alloc_cyclic() under GFP_ATOMIC to better succeed under memory pressure, from Shaohua. 10) Last but not least, a spelling fix in an error message for the BPF cookie UID helper under BPF sample code, from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-31 23:33:04 -04:00
Eric Dumazet	bf66337140	inet: frags: get rid of ipfrag_skb_cb/FRAG_CB ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately this integer is currently in a different cache line than skb->next, meaning that we use two cache lines per skb when finding the insertion point. By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields in a single cache line and save precious memory bandwidth. Note that after the fast path added by Changli Gao in commit `d6bebca92c` ("fragment: add fast path for in-order fragments") this change wont help the fast path, since we still need to access prev->len (2nd cache line), but will show great benefits when slow path is entered, since we perform a linear scan of a potentially long list. Also, note that this potential long list is an attack vector, we might consider also using an rb-tree there eventually. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-31 23:25:40 -04:00
Eric Dumazet	e5d672a078	rhashtable: reorganize struct rhashtable layout While under frags DDOS I noticed unfortunate false sharing between @nelems and @params.automatic_shrinking Move @nelems at the end of struct rhashtable so that first cache line is shared between all cpus, because almost never dirtied. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-31 23:25:39 -04:00
Mathieu Malaterre	619e6f340c	PCI/IOV: Add missing prototypes for powerpc pcibios interfaces Add missing prototypes for: resource_size_t pcibios_default_alignment(void); int pcibios_sriov_enable(struct pci_dev pdev, u16 num_vfs); int pcibios_sriov_disable(struct pci_dev pdev); This fixes the following warnings treated as errors when using W=1: arch/powerpc/kernel/pci-common.c:236:17: error: no previous prototype for ‘pcibios_default_alignment’ [-Werror=missing-prototypes] arch/powerpc/kernel/pci-common.c:253:5: error: no previous prototype for ‘pcibios_sriov_enable’ [-Werror=missing-prototypes] arch/powerpc/kernel/pci-common.c:261:5: error: no previous prototype for ‘pcibios_sriov_disable’ [-Werror=missing-prototypes] Also, commit `978d2d6831` ("PCI: Add pcibios_iov_resource_alignment() interface") added a new function but the prototype was located in the main header instead of the CONFIG_PCI_IOV specific section. Move this function next to the newly added ones. Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>	2018-03-31 15:32:46 -05:00
Ingo Molnar	169310f71f	Merge branch 'linus' into locking/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-31 07:30:17 +02:00
Linus Torvalds	a44406ec3d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix RCU locking in xfrm_local_error(), from Taehee Yoo. 2) Fix return value assignments and thus error checking in iwl_mvm_start_ap_ibss(), from Johannes Berg. 3) Don't count header length twice in vti4, from Stefano Brivio. 4) Fix deadlock in rt6_age_examine_exception, from Eric Dumazet. 5) Fix out-of-bounds access in nf_sk_lookup_slow{v4,v6}() from Subash Abhinov. 6) Check nladdr size in netlink_connect(), from Alexander Potapenko. 7) VF representor SQ numbers are 32 not 16 bits, in mlx5 driver, from Or Gerlitz. 8) Out of bounds read in skb_network_protocol(), from Eric Dumazet. 9) r8169 driver sets driver data pointer after register_netdev() which is too late. Fix from Heiner Kallweit. 10) Fix memory leak in mlx4 driver, from Moshe Shemesh. 11) The multi-VLAN decap fix added a regression when dealing with device that lack a MAC header, such as tun. Fix from Toshiaki Makita. 12) Fix integer overflow in dynamic interrupt coalescing code. From Tal Gilboa. 13) Use after free in vrf code, from David Ahern. 14) IPV6 route leak between VRFs fix, also from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits) net: mvneta: fix enable of all initialized RXQs net/ipv6: Fix route leaking between VRFs vrf: Fix use after free and double free in vrf_finish_output ipv6: sr: fix seg6 encap performances with TSO enabled net/dim: Fix int overflow vlan: Fix vlan insertion for packets without ethernet header net: Fix untag for vlan packets without ethernet header atm: iphase: fix spelling mistake: "Receiverd" -> "Received" vhost: validate log when IOTLB is enabled qede: Do not drop rx-checksum invalidated packets. hv_netvsc: enable multicast if necessary ip_tunnel: Resolve ipsec merge conflict properly. lan78xx: Crash in lan78xx_writ_reg (Workqueue: events lan78xx_deferred_multicast_write) qede: Fix barrier usage after tx doorbell write. vhost: correctly remove wait queue during poll failure net/mlx4_core: Fix memory leak while delete slave's resources net/mlx4_en: Fix mixed PFC and Global pause user control requests net/smc: use announced length in sock_recvmsg() llc: properly handle dev_queue_xmit() return value strparser: Fix sign of err codes ...	2018-03-30 18:47:28 -10:00
Masahiro Yamada	a95b37e20d	kbuild: get <linux/compiler_types.h> out of <linux/kconfig.h> Since commit `28128c61e0` ("kconfig.h: Include compiler types to avoid missed struct attributes"), <linux/kconfig.h> pulls in kernel-space headers to unrelated places. Commit `0f9da844d8` ("MIPS: boot: Define __ASSEMBLY__ for its.S build") suppress the build error by defining __ASSEMBLY__, but ITS (i.e. DTS) is not assembly, and should not include <linux/compiler_types.h> in the first place. Looking at arch/s390/tools/Makefile, host programs gen_facilities and gen_opcode_table now pull in <linux/compiler_types.h> as well. The motivation for that commit was to define necessary attributes before any struct is defined. Obviously, this happens only in C. It is enough to include <linux/compiler_types.h> only when compiling C files, and only when compiling kernel space. Move the include to c_flags. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>	2018-03-31 12:22:38 +09:00
Sargun Dhillon	df0ce17331	security: convert security hooks to use hlist This changes security_hook_heads to use hlist_heads instead of the circular doubly-linked list heads. This should cut down the size of the struct by about half. In addition, it allows mutation of the hooks at the tail of the callback list without having to modify the head. The longer-term purpose of this is to enable making the heads read only. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Reviewed-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: James Morris <james.morris@microsoft.com>	2018-03-31 13:18:27 +11:00
Andrey Ignatov	aac3fc320d	bpf: Post-hooks for sys_bind "Post-hooks" are hooks that are called right before returning from sys_bind. At this time IP and port are already allocated and no further changes to `struct sock` can happen before returning from sys_bind but BPF program has a chance to inspect the socket and change sys_bind result. Specifically it can e.g. inspect what port was allocated and if it doesn't satisfy some policy, BPF program can force sys_bind to fail and return EPERM to user. Another example of usage is recording the IP:port pair to some map to use it in later calls to sys_connect. E.g. if some TCP server inside cgroup was bound to some IP:port_n, it can be recorded to a map. And later when some TCP client inside same cgroup is trying to connect to 127.0.0.1:port_n, BPF hook for sys_connect can override the destination and connect application to IP:port_n instead of 127.0.0.1:port_n. That helps forcing all applications inside a cgroup to use desired IP and not break those applications if they e.g. use localhost to communicate between each other. == Implementation details == Post-hooks are implemented as two new attach types `BPF_CGROUP_INET4_POST_BIND` and `BPF_CGROUP_INET6_POST_BIND` for existing prog type `BPF_PROG_TYPE_CGROUP_SOCK`. Separate attach types for IPv4 and IPv6 are introduced to avoid access to IPv6 field in `struct sock` from `inet_bind()` and to IPv4 field from `inet6_bind()` since those fields might not make sense in such cases. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-31 02:16:26 +02:00
Andrey Ignatov	d74bad4e74	bpf: Hooks for sys_connect == The problem == See description of the problem in the initial patch of this patch set. == The solution == The patch provides much more reliable in-kernel solution for the 2nd part of the problem: making outgoing connecttion from desired IP. It adds new attach types `BPF_CGROUP_INET4_CONNECT` and `BPF_CGROUP_INET6_CONNECT` for program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both source and destination of a connection at connect(2) time. Local end of connection can be bound to desired IP using newly introduced BPF-helper `bpf_bind()`. It allows to bind to only IP though, and doesn't support binding to port, i.e. leverages `IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this: * looking for a free port is expensive and can affect performance significantly; * there is no use-case for port. As for remote end (`struct sockaddr *` passed by user), both parts of it can be overridden, remote IP and remote port. It's useful if an application inside cgroup wants to connect to another application inside same cgroup or to itself, but knows nothing about IP assigned to the cgroup. Support is added for IPv4 and IPv6, for TCP and UDP. IPv4 and IPv6 have separate attach types for same reason as sys_bind hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound. == Implementation notes == The patch introduces new field in `struct proto`: `pre_connect` that is a pointer to a function with same signature as `connect` but is called before it. The reason is in some cases BPF hooks should be called way before control is passed to `sk->sk_prot->connect`. Specifically `inet_dgram_connect` autobinds socket before calling `sk->sk_prot->connect` and there is no way to call `bpf_bind()` from hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since it'd cause double-bind. On the other hand `proto.pre_connect` provides a flexible way to add BPF hooks for connect only for necessary `proto` and call them at desired time before `connect`. Since `bpf_bind()` is allowed to bind only to IP and autobind in `inet_dgram_connect` binds only port there is no chance of double-bind. bpf_bind() sets `force_bind_address_no_port` to bind to only IP despite of value of `bind_address_no_port` socket field. bpf_bind() sets `with_lock` to `false` when calling to __inet_bind() and __inet6_bind() since all call-sites, where bpf_bind() is called, already hold socket lock. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-31 02:15:54 +02:00
Andrey Ignatov	4fbac77d2d	bpf: Hooks for sys_bind == The problem == There is a use-case when all processes inside a cgroup should use one single IP address on a host that has multiple IP configured. Those processes should use the IP for both ingress and egress, for TCP and UDP traffic. So TCP/UDP servers should be bound to that IP to accept incoming connections on it, and TCP/UDP clients should make outgoing connections from that IP. It should not require changing application code since it's often not possible. Currently it's solved by intercepting glibc wrappers around syscalls such as `bind(2)` and `connect(2)`. It's done by a shared library that is preloaded for every process in a cgroup so that whenever TCP/UDP server calls `bind(2)`, the library replaces IP in sockaddr before passing arguments to syscall. When application calls `connect(2)` the library transparently binds the local end of connection to that IP (`bind(2)` with `IP_BIND_ADDRESS_NO_PORT` to avoid performance penalty). Shared library approach is fragile though, e.g.: * some applications clear env vars (incl. `LD_PRELOAD`); * `/etc/ld.so.preload` doesn't help since some applications are linked with option `-z nodefaultlib`; * other applications don't use glibc and there is nothing to intercept. == The solution == The patch provides much more reliable in-kernel solution for the 1st part of the problem: binding TCP/UDP servers on desired IP. It does not depend on application environment and implementation details (whether glibc is used or not). It adds new eBPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` and attach types `BPF_CGROUP_INET4_BIND` and `BPF_CGROUP_INET6_BIND` (similar to already existing `BPF_CGROUP_INET_SOCK_CREATE`). The new program type is intended to be used with sockets (`struct sock`) in a cgroup and provided by user `struct sockaddr`. Pointers to both of them are parts of the context passed to programs of newly added types. The new attach types provides hooks in `bind(2)` system call for both IPv4 and IPv6 so that one can write a program to override IP addresses and ports user program tries to bind to and apply such a program for whole cgroup. == Implementation notes == [1] Separate attach types for `AF_INET` and `AF_INET6` are added intentionally to prevent reading/writing to offsets that don't make sense for corresponding socket family. E.g. if user passes `sockaddr_in` it doesn't make sense to read from / write to `user_ip6[]` context fields. [2] The write access to `struct bpf_sock_addr_kern` is implemented using special field as an additional "register". There are just two registers in `sock_addr_convert_ctx_access`: `src` with value to write and `dst` with pointer to context that can't be changed not to break later instructions. But the fields, allowed to write to, are not available directly and to access them address of corresponding pointer has to be loaded first. To get additional register the 1st not used by `src` and `dst` one is taken, its content is saved to `bpf_sock_addr_kern.tmp_reg`, then the register is used to load address of pointer field, and finally the register's content is restored from the temporary field after writing `src` value. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-31 02:15:18 +02:00
Andrey Ignatov	5e43f899b0	bpf: Check attach type at prog load time == The problem == There are use-cases when a program of some type can be attached to multiple attach points and those attach points must have different permissions to access context or to call helpers. E.g. context structure may have fields for both IPv4 and IPv6 but it doesn't make sense to read from / write to IPv6 field when attach point is somewhere in IPv4 stack. Same applies to BPF-helpers: it may make sense to call some helper from some attach point, but not from other for same prog type. == The solution == Introduce `expected_attach_type` field in in `struct bpf_attr` for `BPF_PROG_LOAD` command. If scenario described in "The problem" section is the case for some prog type, the field will be checked twice: 1) At load time prog type is checked to see if attach type for it must be known to validate program permissions correctly. Prog will be rejected with EINVAL if it's the case and `expected_attach_type` is not specified or has invalid value. 2) At attach time `attach_type` is compared with `expected_attach_type`, if prog type requires to have one, and, if they differ, attach will be rejected with EINVAL. The `expected_attach_type` is now available as part of `struct bpf_prog` in both `bpf_verifier_ops->is_valid_access()` and `bpf_verifier_ops->get_func_proto()` () and can be used to check context accesses and calls to helpers correspondingly. Initially the idea was discussed by Alexei Starovoitov <ast@fb.com> and Daniel Borkmann <daniel@iogearbox.net> here: https://marc.info/?l=linux-netdev&m=152107378717201&w=2 Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-31 02:14:44 +02:00
Tariq Toukan	619a8f2a42	net/mlx5e: Use linear SKB in Striding RQ Current Striding RQ HW feature utilizes the RX buffers so that there is no wasted room between the strides. This maximises the memory utilization. This prevents the use of build_skb() (which requires headroom and tailroom), and demands to memcpy the packets headers into the skb linear part. In this patch, whenever a set of conditions holds, we apply an RQ configuration that allows combining the use of linear SKB on top of a Striding RQ. To use build_skb() with Striding RQ, the following must hold: 1. packet does not cross a page boundary. 2. there is enough headroom and tailroom surrounding the packet. We can satisfy 1 and 2 by configuring: stride size = MTU + headroom + tailoom. This is possible only when: a. (MTU - headroom - tailoom) does not exceed PAGE_SIZE. b. HW LRO is turned off. Using linear SKB has many advantages: - Saves a memcpy of the headers. - No page-boundary checks in datapath. - No filler CQEs. - Significantly smaller CQ. - SKB data continuously resides in linear part, and not split to small amount (linear part) and large amount (fragment). This saves datapath cycles in driver and improves utilization of SKB fragments in GRO. - The fragments of a resulting GRO SKB follow the IP forwarding assumption of equal-size fragments. Some implementation details: HW writes the packets to the beginning of a stride, i.e. does not keep headroom. To overcome this we make sure we can extend backwards and use the last bytes of stride i-1. Extra care is needed for stride 0 as it has no preceding stride. We make sure headroom bytes are available by shifting the buffer pointer passed to HW by headroom bytes. This configuration now becomes default, whenever capable. Of course, this implies turning LRO off. Performance testing: ConnectX-5, single core, single RX ring, default MTU. UDP packet rate, early drop in TC layer: -------------------------------------------- \| pkt size \| before \| after \| ratio \| -------------------------------------------- \| 1500byte \| 4.65 Mpps \| 5.96 Mpps \| 1.28x \| \| 500byte \| 5.23 Mpps \| 5.97 Mpps \| 1.14x \| \| 64byte \| 5.94 Mpps \| 5.96 Mpps \| 1.00x \| -------------------------------------------- TCP streams: ~20% gain Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-03-30 16:54:49 -07:00
Saeed Mahameed	b2d3907c23	net/mlx5: Eliminate query xsrq dead code 1. This function is not used anywhere in mlx5 driver 2. It has a memcpy statement that makes no sense and produces build warning with gcc8 drivers/net/ethernet/mellanox/mlx5/core/transobj.c: In function 'mlx5_core_query_xsrq': drivers/net/ethernet/mellanox/mlx5/core/transobj.c:347:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict] Fixes: `01949d0109` ("net/mlx5_core: Enable XRCs and SRQs when using ISSI > 0") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-03-30 16:16:17 -07:00
Bjørn Mork	ad32eb2df8	PCI: Always define the of_node helpers Simply move these inline functions outside the ifdef instead of duplicating them as stubs in the !OF case. The struct device of_node field does not depend on OF. This also fixes the missing stubbed pci_bus_to_OF_node(). Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>	2018-03-30 17:45:57 -05:00
Bjorn Helgaas	842b447f00	PCI/portdrv: Encapsulate pcie_ports_auto inside the port driver "pcie_ports_auto" is only used inside the PCIe port driver itself, so move it from include/linux/pci.h to portdrv.h so it's not visible to the whole kernel. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2018-03-30 17:26:58 -05:00
Bjorn Helgaas	02bfeb4842	PCI/portdrv: Simplify PCIe feature permission checking Some PCIe features (AER, DPC, hotplug, PME) can be managed by either the platform firmware or the OS, so the host bridge driver may have to request permission from the platform before using them. On ACPI systems, this is done by negotiate_os_control() in acpi_pci_root_add(). The PCIe port driver later uses pcie_port_platform_notify() and pcie_port_acpi_setup() to figure out whether it can use these features. But all we need is a single bit for each service, so these interfaces are needlessly complicated. Simplify this by adding bits in the struct pci_host_bridge to show when the OS has permission to use each feature: + unsigned int native_aer:1; /* OS may use PCIe AER / + unsigned int native_hotplug:1; / OS may use PCIe hotplug / + unsigned int native_pme:1; / OS may use PCIe PME */ These are set when we create a host bridge, and the host bridge driver can clear the bits corresponding to any feature the platform doesn't want us to use. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-03-30 17:26:54 -05:00
Michael Ellerman	f437c51748	Merge branch 'topic/paca' into next Bring in yet another series that touches KVM code, and might need to be merged into the kvm-ppc branch to resolve conflicts. This required some changes in pnv_power9_force_smt4_catch/release() due to the paca array becomming an array of pointers.	2018-03-31 09:09:36 +11:00
Prashant Bhole	f385178679	lib/scatterlist: add sg_init_marker() helper sg_init_marker initializes sg_magic in the sg table and calls sg_mark_end() on the last entry of the table. This can be useful to avoid memset in sg_init_table() when scatterlist is already zeroed out For example: when scatterlist is embedded inside other struct and that container struct is zeroed out Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-30 22:50:15 +02:00
Dan Williams	f44c77630d	fs, dax: prepare for dax-specific address_space_operations In preparation for the dax implementation to start associating dax pages to inodes via page->mapping, we need to provide a 'struct address_space_operations' instance for dax. Define some generic VFS aops helpers for dax. These noop implementations are there in the dax case to prevent the VFS from falling back to operations with page-cache assumptions, dax_writeback_mapping_range() may not be referenced in the FS_DAX=n case. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Matthew Wilcox <mawilcox@microsoft.com> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-03-30 11:34:55 -07:00
Andy Shevchenko	9def051018	crypto: Deduplicate le32_to_cpu_array() and cpu_to_le32_array() Deduplicate le32_to_cpu_array() and cpu_to_le32_array() by moving them to the generic header. No functional change implied. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-03-31 01:33:09 +08:00
Tal Gilboa	f97c3dc3c0	net/dim: Fix int overflow When calculating difference between samples, the values are multiplied by 100. Large values may cause int overflow when multiplied (usually on first iteration). Fixed by forcing 100 to be of type unsigned long. Fixes: `4c4dbb4a73` ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux") Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-30 12:56:22 -04:00

... 5 6 7 8 9 ...

60631 Commits