Commit Graph

53479 Commits

Author SHA1 Message Date
Chanwoo Choi
35872fdcbf extcon: Rename the extcon_set/get_state() to maintain the function naming pattern
This patch just renames the existing extcon_get/set_cable_state_()
as following because of maintaining the function naming pattern
like as extcon APIs for property.
- extcon_set_cable_state_() -> extcon_set_state()
- extcon_get_cable_state_() -> extcon_get_state()

But, this patch remains the old extcon_set/get_cable_state_() functions
to prevent the build break. After altering new APIs, remove the old APIs.

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Tested-by: Chris Zhong <zyw@rock-chips.com>
Tested-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
2016-09-10 16:48:53 +05:30
Chanwoo Choi
ceaa98f442 extcon: Add the support for the capability of each property
This patch adds the support of the property capability setting. This function
decides the supported properties of each external connector on extcon provider
driver.

Ths list of new extcon APIs to get/set the capability of property as following:
- int extcon_get_property_capability(struct extcon_dev *edev,
					unsigned int id, unsigned int prop);
- int extcon_set_property_capability(struct extcon_dev *edev,
					unsigned int id, unsigned int prop);

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Tested-by: Chris Zhong <zyw@rock-chips.com>
Tested-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
2016-09-10 16:48:52 +05:30
Chanwoo Choi
067c1652e7 extcon: Add the support for extcon property according to extcon type
This patch support the extcon property for the external connector
because each external connector might have the property according to
the H/W design and the specific characteristics.

- EXTCON_PROP_USB_[property name]
- EXTCON_PROP_CHG_[property name]
- EXTCON_PROP_JACK_[property name]
- EXTCON_PROP_DISP_[property name]

Add the new extcon APIs to get/set the property value as following:
- int extcon_get_property(struct extcon_dev *edev, unsigned int id,
			unsigned int prop,
			union extcon_property_value *prop_val)
- int extcon_set_property(struct extcon_dev *edev, unsigned int id,
			unsigned int prop,
			union extcon_property_value prop_val)

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Tested-by: Chris Zhong <zyw@rock-chips.com>
Tested-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
2016-09-10 16:48:51 +05:30
Chanwoo Choi
505cf01f98 extcon: Add the extcon_type to gather each connector into five category
This patch adds the new extcon type to group the each connecotr
into following five category. This type would be used to handle
the connectors as a group unit instead of a connector unit.
- EXTCON_TYPE_USB  : USB connector
- EXTCON_TYPE_CHG  : Charger connector
- EXTCON_TYPE_JACK : Jack connector
- EXTCON_TYPE_DISP : Display connector
- EXTCON_TYPE_MISC : Miscellaneous connector

Also, each external connector is possible to belong to one more extcon type.
In caes of EXTCON_CHG_USB_SDP, it have the EXTCON_TYPE_CHG and EXTCON_TYPE_USB.

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Tested-by: Chris Zhong <zyw@rock-chips.com>
Tested-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
2016-09-10 16:48:51 +05:30
Chanwoo Choi
84c48dc559 extcon: Block the bit masking operation for cable state except for extcon core
This patch restrict the usage of extcon_update_state() in the extcon
core because the extcon_update_state() use the bit masking to change
the state of external connector. When this function is used in device drivers,
it may occur the probelm with the handling mistake of bit masking.

Also, this patch removes the extcon_get/set_state() functions because these
functions use the bit masking which is reluctant way. Instead, extcon
provides the extcon_set/get_cable_state_() functions.

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2016-09-10 16:48:49 +05:30
Chanwoo Choi
cc60211237 extcon: adc-jack: Remove the usage of extcon_set_state()
This patch removes the usage of extcon_set_state() because it uses the bit
masking to change the state of external connectors. The extcon framework
should handle the state by extcon_set/get_cable_state_() with extcon id.

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2016-09-10 16:48:46 +05:30
Eric Biggers
ba63f23d69 fscrypto: require write access to mount to set encryption policy
Since setting an encryption policy requires writing metadata to the
filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
Otherwise, a user could cause a write to a frozen or readonly
filesystem.  This was handled correctly by f2fs but not by ext4.  Make
fscrypt_process_policy() handle it rather than relying on the filesystem
to get it right.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-10 01:18:57 -04:00
Daniel Borkmann
f3694e0012 bpf: add BPF_CALL_x macros for declaring helpers
This work adds BPF_CALL_<n>() macros and converts all the eBPF helper functions
to use them, in a similar fashion like we do with SYSCALL_DEFINE<n>() macros
that are used today. Motivation for this is to hide all the register handling
and all necessary casts from the user, so that it is done automatically in the
background when adding a BPF_CALL_<n>() call.

This makes current helpers easier to review, eases to write future helpers,
avoids getting the casting mess wrong, and allows for extending all helpers at
once (f.e. build time checks, etc). It also helps detecting more easily in
code reviews that unused registers are not instrumented in the code by accident,
breaking compatibility with existing programs.

BPF_CALL_<n>() internals are quite similar to SYSCALL_DEFINE<n>() ones with some
fundamental differences, for example, for generating the actual helper function
that carries all u64 regs, we need to fill unused regs, so that we always end up
with 5 u64 regs as an argument.

I reviewed several 0-5 generated BPF_CALL_<n>() variants of the .i results and
they look all as expected. No sparse issue spotted. We let this also sit for a
few days with Fengguang's kbuild test robot, and there were no issues seen. On
s390, it barked on the "uses dynamic stack allocation" notice, which is an old
one from bpf_perf_event_output{,_tp}() reappearing here due to the conversion
to the call wrapper, just telling that the perf raw record/frag sits on stack
(gcc with s390's -mwarn-dynamicstack), but that's all. Did various runtime tests
and they were fine as well. All eBPF helpers are now converted to use these
macros, getting rid of a good chunk of all the raw castings.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 19:36:04 -07:00
Daniel Borkmann
f035a51536 bpf: add BPF_SIZEOF and BPF_FIELD_SIZEOF macros
Add BPF_SIZEOF() and BPF_FIELD_SIZEOF() macros to improve the code a bit
which otherwise often result in overly long bytes_to_bpf_size(sizeof())
and bytes_to_bpf_size(FIELD_SIZEOF()) lines. So place them into a macro
helper instead. Moreover, we currently have a BUILD_BUG_ON(BPF_FIELD_SIZEOF())
check in convert_bpf_extensions(), but we should rather make that generic
as well and add a BUILD_BUG_ON() test in all BPF_SIZEOF()/BPF_FIELD_SIZEOF()
users to detect any rewriter size issues at compile time. Note, there are
currently none, but we want to assert that it stays this way.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 19:36:04 -07:00
Ard Biesheuvel
dce48e351c efi: Replace runtime services spinlock with semaphore
The purpose of the efi_runtime_lock is to prevent concurrent calls into
the firmware. There is no need to use spinlocks here, as long as we ensure
that runtime service invocations from an atomic context (i.e., EFI pstore)
cannot block.

So use a semaphore instead, and use down_trylock() in the nonblocking case.
We don't use a mutex here because the mutex_trylock() function must not
be called from interrupt context, whereas the down_trylock() can.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Sylvain Chouleur <sylvain.chouleur@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:43 +01:00
Sylvain Chouleur
21b3ddd39f efi: Don't use spinlocks for efi vars
All efivars operations are protected by a spinlock which prevents
interruptions and preemption. This is too restricted, we just need a
lock preventing concurrency.
The idea is to use a semaphore of count 1 and to have two ways of
locking, depending on the context:
- In interrupt context, we call down_trylock(), if it fails we return
  an error
- In normal context, we call down_interruptible()

We don't use a mutex here because the mutex_trylock() function must not
be called from interrupt context, whereas the down_trylock() can.

Signed-off-by: Sylvain Chouleur <sylvain.chouleur@intel.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Sylvain Chouleur <sylvain.chouleur@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:42 +01:00
Sylvain Chouleur
217b27d467 efi: Use a file local lock for efivars
This patch replaces the spinlock in the efivars struct with a single lock
for the whole vars.c file.  The goal of this lock is to protect concurrent
calls to efi variable services, registering and unregistering. This allows
us to register new efivars operations without having in-progress call.

Signed-off-by: Sylvain Chouleur <sylvain.chouleur@intel.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Sylvain Chouleur <sylvain.chouleur@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:41 +01:00
Matt Fleming
31ce8cc681 efi/runtime-map: Use efi.memmap directly instead of a copy
Now that efi.memmap is available all of the time there's no need to
allocate and build a separate copy of the EFI memory map.

Furthermore, efi.memmap contains boot services regions but only those
regions that have been reserved via efi_mem_reserve(). Using
efi.memmap allows us to pass boot services across kexec reboot so that
the ESRT and BGRT drivers will now work.

Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:36 +01:00
Matt Fleming
816e76129e efi: Allow drivers to reserve boot services forever
Today, it is not possible for drivers to reserve EFI boot services for
access after efi_free_boot_services() has been called on x86. For
ARM/arm64 it can be done simply by calling memblock_reserve().

Having this ability for all three architectures is desirable for a
couple of reasons,

  1) It saves drivers copying data out of those regions
  2) kexec reboot can now make use of things like ESRT

Instead of using the standard memblock_reserve() which is insufficient
to reserve the region on x86 (see efi_reserve_boot_services()), a new
API is introduced in this patch; efi_mem_reserve().

efi.memmap now always represents which EFI memory regions are
available. On x86 the EFI boot services regions that have not been
reserved via efi_mem_reserve() will be removed from efi.memmap during
efi_free_boot_services().

This has implications for kexec, since it is not possible for a newly
kexec'd kernel to access the same boot services regions that the
initial boot kernel had access to unless they are reserved by every
kexec kernel in the chain.

Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:08:34 +01:00
Matt Fleming
c45f4da33a efi: Add efi_memmap_install() for installing new EFI memory maps
While efi_memmap_init_{early,late}() exist for architecture code to
install memory maps from firmware data and for the virtual memory
regions respectively, drivers don't care which stage of the boot we're
at and just want to swap the existing memmap for a modified one.

efi_memmap_install() abstracts the details of how the new memory map
should be mapped and the existing one unmapped.

Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:07:47 +01:00
Matt Fleming
60863c0d1a efi: Split out EFI memory map functions into new file
Also move the functions from the EFI fake mem driver since future
patches will require access to the memmap insertion code even if
CONFIG_EFI_FAKE_MEM isn't enabled.

This will be useful when we need to build custom EFI memory maps to
allow drivers to mark regions as reserved.

Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:07:46 +01:00
Matt Fleming
dca0f971ea efi: Add efi_memmap_init_late() for permanent EFI memmap
Drivers need a way to access the EFI memory map at runtime. ARM and
arm64 currently provide this by remapping the EFI memory map into the
vmalloc space before setting up the EFI virtual mappings.

x86 does not provide this functionality which has resulted in the code
in efi_mem_desc_lookup() where it will manually map individual EFI
memmap entries if the memmap has already been torn down on x86,

  /*
   * If a driver calls this after efi_free_boot_services,
   * ->map will be NULL, and the target may also not be mapped.
   * So just always get our own virtual map on the CPU.
   *
   */
  md = early_memremap(p, sizeof (*md));

There isn't a good reason for not providing a permanent EFI memory map
for runtime queries, especially since the EFI regions are not mapped
into the standard kernel page tables.

Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:07:43 +01:00
Matt Fleming
9479c7cebf efi: Refactor efi_memmap_init_early() into arch-neutral code
Every EFI architecture apart from ia64 needs to setup the EFI memory
map at efi.memmap, and the code for doing that is essentially the same
across all implementations. Therefore, it makes sense to factor this
out into the common code under drivers/firmware/efi/.

The only slight variation is the data structure out of which we pull
the initial memory map information, such as physical address, memory
descriptor size and version, etc. We can address this by passing a
generic data structure (struct efi_memory_map_data) as the argument to
efi_memmap_init_early() which contains the minimum info required for
initialising the memory map.

In the process, this patch also fixes a few undesirable implementation
differences:

 - ARM and arm64 were failing to clear the EFI_MEMMAP bit when
   unmapping the early EFI memory map. EFI_MEMMAP indicates whether
   the EFI memory map is mapped (not the regions contained within) and
   can be traversed.  It's more correct to set the bit as soon as we
   memremap() the passed in EFI memmap.

 - Rename efi_unmmap_memmap() to efi_memmap_unmap() to adhere to the
   regular naming scheme.

This patch also uses a read-write mapping for the memory map instead
of the read-only mapping currently used on ARM and arm64. x86 needs
the ability to update the memory map in-place when assigning virtual
addresses to regions (efi_map_region()) and tagging regions when
reserving boot services (efi_reserve_boot_services()).

There's no way for the generic fake_mem code to know which mapping to
use without introducing some arch-specific constant/hook, so just use
read-write since read-only is of dubious value for the EFI memory map.

Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09 16:06:38 +01:00
Robert Jarzmik
283e4a8299 [media] media: platform: pxa_camera: make a standalone v4l2 device
This patch removes the soc_camera API dependency from pxa_camera.
In the current status :
 - all previously captures are working the same on pxa270
 - the s_crop() call was removed, judged not working
   (see what happens soc_camera_s_crop() when get_crop() == NULL)
 - if the pixel clock is provided by then sensor, ie. not MCLK, the dual
   stage change is not handled yet.
   => there is no in-tree user of this, so I'll let it that way

 - the MCLK is not yet finished, it's as in the legacy way,
   ie. activated at video device opening and closed at video device
   closing.
   In a subsequence patch pxa_camera_mclk_ops should be used, and
   platform data MCLK ignored. It will be the sensor's duty to request
   the clock and enable it, which will end in pxa_camera_mclk_ops.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-09-09 10:52:31 -03:00
Mark Rutland
48538b5863 drivers/perf: arm_pmu: expose a cpumask in sysfs
In systems with heterogeneous CPUs, there are multiple logical CPU PMUs,
each of which covers a subset of CPUs in the system. In some cases
userspace needs to know which CPUs a given logical PMU covers, so we'd
like to expose a cpumask under sysfs, similar to what is done for uncore
PMUs.

Unfortunately, prior to commit 00e727bb38 ("perf stat: Balance
opening and reading events"), perf stat only correctly handled a cpumask
holding a single CPU, and only when profiling in system-wide mode. In
other cases, the presence of a cpumask file could cause perf stat to
behave erratically.

Thus, exposing a cpumask file would break older perf binaries in cases
where they would otherwise work.

To avoid this issue while still providing userspace with the information
it needs, this patch exposes a differently-named file (cpus) under
sysfs. New tools can look for this and operate correctly, while older
tools will not be adversely affected by its presence.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-09 14:51:51 +01:00
Mark Rutland
86cdd72af9 drivers/perf: arm_pmu: add common attr group fields
In preparation for adding common attribute groups, add an array of
attribute group pointers to arm_pmu, which will be used if the
backend hasn't already set pmu::attr_groups.

Subsequent patches will move backends over to using these, before adding
common fields.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-09 14:51:51 +01:00
Dave Hansen
acd547b298 x86/pkeys: Default to a restrictive init PKRU
PKRU is the register that lets you disallow writes or all access to a given
protection key.

The XSAVE hardware defines an "init state" of 0 for PKRU: its most
permissive state, allowing access/writes to everything.  Since we start off
all new processes with the init state, we start all processes off with the
most permissive possible PKRU.

This is unfortunate.  If a thread is clone()'d [1] before a program has
time to set PKRU to a restrictive value, that thread will be able to write
to all data, no matter what pkey is set on it.  This weakens any integrity
guarantees that we want pkeys to provide.

To fix this, we define a very restrictive PKRU to override the
XSAVE-provided value when we create a new FPU context.  We choose a value
that only allows access to pkey 0, which is as restrictive as we can
practically make it.

This does not cause any practical problems with applications using
protection keys because we require them to specify initial permissions for
each key when it is allocated, which override the restrictive default.

In the end, this ensures that threads which do not know how to manage their
own pkey rights can not do damage to data which is pkey-protected.

I would have thought this was a pretty contrived scenario, except that I
heard a bug report from an MPX user who was creating threads in some very
early code before main().  It may be crazy, but folks evidently _do_ it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: mgorman@techsingularity.net
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163021.F3C25D4A@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-09 13:02:28 +02:00
Dave Hansen
a60f7b69d9 generic syscalls: Wire up memory protection keys syscalls
These new syscalls are implemented as generic code, so enable them for
architectures like arm64 which use the generic syscall table.

According to Arnd:

  Even if the support is x86 specific for the forseeable future, it may be
  good to reserve the number just in case.  The other architecture specific
  syscall lists are usually left to the individual arch maintainers, most a
  lot of the newer architectures share this table.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: mgorman@techsingularity.net
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163018.505A6875@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-09 13:02:27 +02:00
Dave Hansen
e8c24d3a23 x86/pkeys: Allocation/free syscalls
This patch adds two new system calls:

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
	int pkey_free(int pkey);

These implement an "allocator" for the protection keys
themselves, which can be thought of as analogous to the allocator
that the kernel has for file descriptors.  The kernel tracks
which numbers are in use, and only allows operations on keys that
are valid.  A key which was not obtained by pkey_alloc() may not,
for instance, be passed to pkey_mprotect().

These system calls are also very important given the kernel's use
of pkeys to implement execute-only support.  These help ensure
that userspace can never assume that it has control of a key
unless it first asks the kernel.  The kernel does not promise to
preserve PKRU (right register) contents except for allocated
pkeys.

The 'init_access_rights' argument to pkey_alloc() specifies the
rights that will be established for the returned pkey.  For
instance:

	pkey = pkey_alloc(flags, PKEY_DENY_WRITE);

will allocate 'pkey', but also sets the bits in PKRU[1] such that
writing to 'pkey' is already denied.

The kernel does not prevent pkey_free() from successfully freeing
in-use pkeys (those still assigned to a memory range by
pkey_mprotect()).  It would be expensive to implement the checks
for this, so we instead say, "Just don't do it" since sane
software will never do it anyway.

Any piece of userspace calling pkey_alloc() needs to be prepared
for it to fail.  Why?  pkey_alloc() returns the same error code
(ENOSPC) when there are no pkeys and when pkeys are unsupported.
They can be unsupported for a whole host of reasons, so apps must
be prepared for this.  Also, libraries or LD_PRELOADs might steal
keys before an application gets access to them.

This allocation mechanism could be implemented in userspace.
Even if we did it in userspace, we would still need additional
user/kernel interfaces to tell userspace which keys are being
used by the kernel internally (such as for execute-only
mappings).  Having the kernel provide this facility completely
removes the need for these additional interfaces, or having an
implementation of this in userspace at all.

Note that we have to make changes to all of the architectures
that do not use mman-common.h because we use the new
PKEY_DENY_ACCESS/WRITE macros in arch-independent code.

1. PKRU is the Protection Key Rights User register.  It is a
   usermode-accessible register that controls whether writes
   and/or access to each individual pkey is allowed or denied.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-09 13:02:27 +02:00
Dave Hansen
a8502b67d7 x86/pkeys: Make mprotect_key() mask off additional vm_flags
Today, mprotect() takes 4 bits of data: PROT_READ/WRITE/EXEC/NONE.
Three of those bits: READ/WRITE/EXEC get translated directly in to
vma->vm_flags by calc_vm_prot_bits().  If a bit is unset in
mprotect()'s 'prot' argument then it must be cleared in vma->vm_flags
during the mprotect() call.

We do this clearing today by first calculating the VMA flags we
want set, then clearing the ones we do not want to inherit from
the original VMA:

	vm_flags = calc_vm_prot_bits(prot, key);
	...
	newflags = vm_flags;
	newflags |= (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC));

However, we *also* want to mask off the original VMA's vm_flags in
which we store the protection key.

To do that, this patch adds a new macro:

	ARCH_VM_PKEY_FLAGS

which allows the architecture to specify additional bits that it would
like cleared.  We use that to ensure that the VM_PKEY_BIT* bits get
cleared.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163013.E48D6981@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-09 13:02:26 +02:00
Dave Hansen
7d06d9c9bd mm: Implement new pkey_mprotect() system call
pkey_mprotect() is just like mprotect, except it also takes a
protection key as an argument.  On systems that do not support
protection keys, it still works, but requires that key=0.
Otherwise it does exactly what mprotect does.

I expect it to get used like this, if you want to guarantee that
any mapping you create can *never* be accessed without the right
protection keys set up.

	int real_prot = PROT_READ|PROT_WRITE;
	pkey = pkey_alloc(0, PKEY_DENY_ACCESS);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);

This way, there is *no* window where the mapping is accessible
since it was always either PROT_NONE or had a protection key set
that denied all access.

We settled on 'unsigned long' for the type of the key here.  We
only need 4 bits on x86 today, but I figured that other
architectures might need some more space.

Semantically, we have a bit of a problem if we combine this
syscall with our previously-introduced execute-only support:
What do we do when we mix execute-only pkey use with
pkey_mprotect() use?  For instance:

	pkey_mprotect(ptr, PAGE_SIZE, PROT_WRITE, 6); // set pkey=6
	mprotect(ptr, PAGE_SIZE, PROT_EXEC);  // set pkey=X_ONLY_PKEY?
	mprotect(ptr, PAGE_SIZE, PROT_WRITE); // is pkey=6 again?

To solve that, we make the plain-mprotect()-initiated execute-only
support only apply to VMAs that have the default protection key (0)
set on them.

Proposed semantics:
1. protection key 0 is special and represents the default,
   "unassigned" protection key.  It is always allocated.
2. mprotect() never affects a mapping's pkey_mprotect()-assigned
   protection key. A protection key of 0 (even if set explicitly)
   represents an unassigned protection key.
   2a. mprotect(PROT_EXEC) on a mapping with an assigned protection
       key may or may not result in a mapping with execute-only
       properties.  pkey_mprotect() plus pkey_set() on all threads
       should be used to _guarantee_ execute-only semantics if this
       is not a strong enough semantic.
3. mprotect(PROT_EXEC) may result in an "execute-only" mapping. The
   kernel will internally attempt to allocate and dedicate a
   protection key for the purpose of execute-only mappings.  This
   may not be possible in cases where there are no free protection
   keys available.  It can also happen, of course, in situations
   where there is no hardware support for protection keys.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163012.3DDD36C4@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-09 13:02:26 +02:00
Arend Van Spriel
634faf3686 brcmfmac: add support for bcm4339 chip with modalias sdio:c00v02D0d4339
The driver already supports the bcm4339 chipset but only for the variant
that shares the same modalias as the bcm4335, ie. sdio:c00v02D0d4335.
It turns out that there are also bcm4339 devices out there that have a
more distiguishable modalias sdio:c00v02D0d4339.

Reported-by: Steve deRosier <derosier@gmail.com>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:12:14 +03:00
Jakub Kicinski
3e9b3112ec add basic register-field manipulation macros
Common approach to accessing register fields is to define
structures or sets of macros containing mask and shift pair.
Operations on the register are then performed as follows:

 field = (reg >> shift) & mask;

 reg &= ~(mask << shift);
 reg |= (field & mask) << shift;

Defining shift and mask separately is tedious.  Ivo van Doorn
came up with an idea of computing them at compilation time
based on a single shifted mask (later refined by Felix) which
can be used like this:

 #define REG_FIELD 0x000ff000

 field = FIELD_GET(REG_FIELD, reg);

 reg &= ~REG_FIELD;
 reg |= FIELD_PREP(REG_FIELD, field);

FIELD_{GET,PREP} macros take care of finding out what the
appropriate shift is based on compilation time ffs operation.

GENMASK can be used to define registers (which is usually
less error-prone and easier to match with datasheets).

This approach is the most convenient I've seen so to limit code
multiplication let's move the macros to a global header file.
Attempts to use static inlines instead of macros failed due
to false positive triggering of BUILD_BUG_ON()s, especially with
GCC < 6.0.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:09:24 +03:00
Nicholas Piggin
b67067f117 kbuild: allow archs to select link dead code/data elimination
Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to
select to build with -ffunction-sections, -fdata-sections, and link
with --gc-sections. It requires some work (documented) to ensure all
unreferenced entrypoints are live, and requires toolchain and build
verification, so it is made a per-arch option for now.

On a random powerpc64le build, this yelds a significant size saving,
it boots and runs fine, but there is a lot I haven't tested as yet, so
these savings may be reduced if there are bugs in the link.

    text      data        bss        dec   filename
11169741   1180744    1923176	14273661   vmlinux
10445269   1004127    1919707	13369103   vmlinux.dce

~700K text, ~170K data, 6% removed from kernel image size.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-09-09 10:47:00 +02:00
Bjorn Andersson
4b83c52a21 rpmsg: Allow callback to return errors
Some rpmsg backends support holding on to and redelivering messages upon
failed handling of them, so provide a way for the callback to report and
error and allow the backends to handle this.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-09-08 22:15:25 -07:00
Bjorn Andersson
e88dae5da4 rpmsg: Move virtio specifics from public header
Move virtio rpmsg implementation details from the public header file to
the virtio rpmsg implementation.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-09-08 22:15:25 -07:00
Bjorn Andersson
3bf950ff23 rpmsg: virtio: Hide vrp pointer from the public API
Create a container struct virtio_rpmsg_channel around the rpmsg_channel
to keep virtio backend information separate from the rpmsg and public
API. This makes the public structures independant of virtio.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-09-08 22:15:24 -07:00
Bjorn Andersson
fade037e0f rpmsg: Hide rpmsg indirection tables
Move the device and endpoint indirection tables to the rpmsg internal
header file, to hide them from the public API.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-09-08 22:15:24 -07:00
Bjorn Andersson
c9bd6f4220 rpmsg: Move endpoint related interface to rpmsg core
Move the rpmsg_send() and rpmsg_destroy_ept() interface to the rpmsg
core, so that we eventually can hide the rpmsg_endpoint ops from the
public API.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-09-08 22:15:22 -07:00
Bjorn Andersson
8a228ecfe0 rpmsg: Indirection table for rpmsg_endpoint operations
Add indirection table for rpmsg_endpoint related operations and move
virtio implementation behind this, this finishes of the decoupling of
the virtio implementation from the public API.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-09-08 22:15:21 -07:00
Bjorn Andersson
36b72c7dca rpmsg: Introduce indirection table for rpmsg_device operations
To allow for multiple backend implementations add an indireection table
for rpmsg_device related operations and move the virtio implementation
behind this table.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-09-08 22:15:20 -07:00
Bjorn Andersson
92e1de51bf rpmsg: Clean up rpmsg device vs channel naming
The rpmsg device representing struct is called rpmsg_channel and the
variable name used throughout is rpdev, with the communication happening
on endpoints it's clearer to just call this a "device" in a public API.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-09-08 22:15:20 -07:00
Bjorn Andersson
2b263d2408 rpmsg: Make rpmsg_create_ept() take channel_info struct
As we introduce support for additional rpmsg backends, some of these
only supports point-to-point "links" represented by a name. By making
rpmsg_create_ept() take a channel_info struct we allow for these
backends to either be passed a source address, a destination address or
a name identifier.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-09-08 22:15:19 -07:00
Bjorn Andersson
2a48d7322d rpmsg: rpmsg_send() operations takes rpmsg_endpoint
The rpmsg_send() operations has been taking a rpmsg_device, but this
forces users of secondary rpmsg_endpoints to use the rpmsg_sendto()
interface - by extracting source and destination from the given data
structures. If we instead pass the rpmsg_endpoint to these functions a
service can use rpmsg_sendto() to respond to messages, even on secondary
endpoints.

In addition this would allow us to support operations on multiple
channels in future backends that does not support off-channel
operations.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2016-09-08 22:15:19 -07:00
Guenter Roeck
f9f7bb3a0e hwmon: (core) Add basic pwm attribute support to new API
Add basic pwm attribute support (no auto attributes) to new API.

Reviewed-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-09-08 21:34:15 -07:00
Guenter Roeck
8faee73f92 hwmon: (core) Add fan attribute support to new API
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-09-08 21:34:15 -07:00
Guenter Roeck
6bfcca44a6 hwmon: (core) Add energy and humidity attribute support to new API
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-09-08 21:34:15 -07:00
Guenter Roeck
b308f5c744 hwmon: (core) Add power attribute support to new API
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-09-08 21:34:15 -07:00
Guenter Roeck
9b26947ce5 hwmon: (core) Add current attribute support to new API
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-09-08 21:34:15 -07:00
Guenter Roeck
00d616cf87 hwmon: (core) Add voltage attribute support to new API
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-09-08 21:34:15 -07:00
Guenter Roeck
d560168b5d hwmon: (core) New hwmon registration API
Up to now, each hwmon driver has to implement its own sysfs attributes.
This requires a lot of template code, and distracts from the driver's core
function to read and write chip registers.

To be able to reduce driver complexity, move sensor attribute handling
and thermal zone registration into hwmon core. By using the new API,
driver code and data size is typically reduced by 20-70%, depending
on driver complexity and the number of sysfs attributes supported.

With this patch, the new API only supports thermal sensors. Support for
other sensor types will be added with subsequent patches.

Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-09-08 21:34:15 -07:00
Yaogong Wang
9f5afeae51 tcp: use an RB tree for ooo receive queue
Over the years, TCP BDP has increased by several orders of magnitude,
and some people are considering to reach the 2 Gbytes limit.

Even with current window scale limit of 14, ~1 Gbytes maps to ~740,000
MSS.

In presence of packet losses (or reorders), TCP stores incoming packets
into an out of order queue, and number of skbs sitting there waiting for
the missing packets to be received can be in the 10^5 range.

Most packets are appended to the tail of this queue, and when
packets can finally be transferred to receive queue, we scan the queue
from its head.

However, in presence of heavy losses, we might have to find an arbitrary
point in this queue, involving a linear scan for every incoming packet,
throwing away cpu caches.

This patch converts it to a RB tree, to get bounded latencies.

Yaogong wrote a preliminary patch about 2 years ago.
Eric did the rebase, added ofo_last_skb cache, polishing and tests.

Tested with network dropping between 1 and 10 % packets, with good
success (about 30 % increase of throughput in stress tests)

Next step would be to also use an RB tree for the write queue at sender
side ;)

Signed-off-by: Yaogong Wang <wygivan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-By: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:25:58 -07:00
Eric Garver
fe19c4f971 vlan: Check for vlan ethernet types for 8021.q or 802.1ad
This is to simplify using double tagged vlans. This function allows all
valid vlan ethertypes to be checked in a single function call.
Also replace some instances that check for both ETH_P_8021Q and
ETH_P_8021AD.

Patch based on one originally by Thomas F Herbert.

Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:10:28 -07:00
Bodong Wang
e7e31ca43d net/mlx5e: Move an_disable_cap bit to a new position
Previous an_disable_cap position bit31 is deprecated to be use in driver
with newer firmware.  New firmware will advertise the same capability
in bit29.

Old capability didn't allow setting more than one protocol for a
specific speed when autoneg is off, while newer firmware will allow
this and it is indicated in the new capability location.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:28 -07:00
Lorenzo Colitti
d545caca82 net: inet: diag: expose the socket mark to privileged processes.
This adds the capability for a process that has CAP_NET_ADMIN on
a socket to see the socket mark in socket dumps.

Commit a52e95abf7 ("net: diag: allow socket bytecode filters to
match socket marks") recently gave privileged processes the
ability to filter socket dumps based on mark. This patch is
complementary: it ensures that the mark is also passed to
userspace in the socket's netlink attributes.  It is useful for
tools like ss which display information about sockets.

Tested: https://android-review.googlesource.com/270210
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:13:09 -07:00