It is easy to miss already defined region types. Let's re-arrange
the definitions a bit and add more comments to make it hopefully
a bit clearer.
No functional change.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add an enum_fmt format flag to specifically tag coded formats where
dynamic resolution switching is supported by the device.
This is useful for some codec drivers that can support dynamic
resolution switching for one or more of their listed coded formats. It
allows userspace to know whether it should extract the video parameters
itself, or if it can rely on the device to send V4L2_EVENT_SOURCE_CHANGE
when such changes are detected.
Signed-off-by: Maxime Jourdan <mjourdan@baylibre.com>
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Reviewed-by: Alexandre Courbot <acourbot@chromium.org>
Acked-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Add an enum_fmt format flag to specifically tag coded formats where
full bytestream parsing is supported by the device.
Some stateful decoders are capable of fully parsing a bytestream,
but others require that userspace pre-parses the bytestream into
frames or fields (see the corresponding pixelformat descriptions
for details).
If this flag is set, then this pre-parsing step is not required
(but still possible, of course).
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Reviewed-by: Alexandre Courbot <acourbot@chromium.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Pull networking fixes from David Miller:
1) Fix jmp to 1st instruction in x64 JIT, from Alexei Starovoitov.
2) Severl kTLS fixes in mlx5 driver, from Tariq Toukan.
3) Fix severe performance regression due to lack of SKB coalescing of
fragments during local delivery, from Guillaume Nault.
4) Error path memory leak in sch_taprio, from Ivan Khoronzhuk.
5) Fix batched events in skbedit packet action, from Roman Mashak.
6) Propagate VLAN TX offload to hw_enc_features in bond and team
drivers, from Yue Haibing.
7) RXRPC local endpoint refcounting fix and read after free in
rxrpc_queue_local(), from David Howells.
8) Fix endian bug in ibmveth multicast list handling, from Thomas
Falcon.
9) Oops, make nlmsg_parse() wrap around the correct function,
__nlmsg_parse not __nla_parse(). Fix from David Ahern.
10) Memleak in sctp_scend_reset_streams(), fro Zheng Bin.
11) Fix memory leak in cxgb4, from Wenwen Wang.
12) Yet another race in AF_PACKET, from Eric Dumazet.
13) Fix false detection of retransmit failures in tipc, from Tuong
Lien.
14) Use after free in ravb_tstamp_skb, from Tho Vu.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
ravb: Fix use-after-free ravb_tstamp_skb
netfilter: nf_tables: map basechain priority to hardware priority
net: sched: use major priority number as hardware priority
wimax/i2400m: fix a memory leak bug
net: cavium: fix driver name
ibmvnic: Unmap DMA address of TX descriptor buffers after use
bnxt_en: Fix to include flow direction in L2 key
bnxt_en: Use correct src_fid to determine direction of the flow
bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command
bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
bnxt_en: Improve RX doorbell sequence.
bnxt_en: Fix VNIC clearing logic for 57500 chips.
net: kalmia: fix memory leaks
cx82310_eth: fix a memory leak bug
bnx2x: Fix VF's VLAN reconfiguration in reload.
Bluetooth: Add debug setting for changing minimum encryption key size
tipc: fix false detection of retransmit failures
lan78xx: Fix memory leaks
MAINTAINERS: r8169: Update path to the driver
MAINTAINERS: PHY LIBRARY: Update files in the record
...
When running a 64-bit kernel with a 32-bit iptables binary, the size of
the xt_nfacct_match_info struct diverges.
kernel: sizeof(struct xt_nfacct_match_info) : 40
iptables: sizeof(struct xt_nfacct_match_info)) : 36
Trying to append nfacct related rules results in an unhelpful message.
Although it is suggested to look for more information in dmesg, nothing
can be found there.
# iptables -A <chain> -m nfacct --nfacct-name <acct-object>
iptables: Invalid argument. Run `dmesg' for more information.
This patch fixes the memory misalignment by enforcing 8-byte alignment
within the struct's first revision. This solution is often used in many
other uapi netfilter headers.
Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add new helper bpf_sk_storage_clone which optionally clones sk storage
and call it from sk_clone_lock.
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This commit adds support for a new flag called need_wakeup in the
AF_XDP Tx and fill rings. When this flag is set, it means that the
application has to explicitly wake up the kernel Rx (for the bit in
the fill ring) or kernel Tx (for bit in the Tx ring) processing by
issuing a syscall. Poll() can wake up both depending on the flags
submitted and sendto() will wake up tx processing only.
The main reason for introducing this new flag is to be able to
efficiently support the case when application and driver is executing
on the same core. Previously, the driver was just busy-spinning on the
fill ring if it ran out of buffers in the HW and there were none on
the fill ring. This approach works when the application is running on
another core as it can replenish the fill ring while the driver is
busy-spinning. Though, this is a lousy approach if both of them are
running on the same core as the probability of the fill ring getting
more entries when the driver is busy-spinning is zero. With this new
feature the driver now sets the need_wakeup flag and returns to the
application. The application can then replenish the fill queue and
then explicitly wake up the Rx processing in the kernel using the
syscall poll(). For Tx, the flag is only set to one if the driver has
no outstanding Tx completion interrupts. If it has some, the flag is
zero as it will be woken up by a completion interrupt anyway.
As a nice side effect, this new flag also improves the performance of
the case where application and driver are running on two different
cores as it reduces the number of syscalls to the kernel. The kernel
tells user space if it needs to be woken up by a syscall, and this
eliminates many of the syscalls.
This flag needs some simple driver support. If the driver does not
support this, the Rx flag is always zero and the Tx flag is always
one. This makes any application relying on this feature default to the
old behaviour of not requiring any syscalls in the Rx path and always
having to call sendto() in the Tx path.
For backwards compatibility reasons, this feature has to be explicitly
turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend
that you always turn it on as it so far always have had a positive
performance impact.
The name and inspiration of the flag has been taken from io_uring by
Jens Axboe. Details about this feature in io_uring can be found in
http://kernel.dk/io_uring.pdf, section 8.3.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add the basic packet trap infrastructure that allows device drivers to
register their supported packet traps and trap groups with devlink.
Each driver is expected to provide basic information about each
supported trap, such as name and ID, but also the supported metadata
types that will accompany each packet trapped via the trap. The
currently supported metadata type is just the input port, but more will
be added in the future. For example, output port and traffic class.
Trap groups allow users to set the action of all member traps. In
addition, users can retrieve per-group statistics in case per-trap
statistics are too narrow. In the future, the trap group object can be
extended with more attributes, such as policer settings which will limit
the amount of traffic generated by member traps towards the CPU.
Beside registering their packet traps with devlink, drivers are also
expected to report trapped packets to devlink along with relevant
metadata. devlink will maintain packets and bytes statistics for each
packet trap and will potentially report the trapped packet with its
metadata to user space via drop monitor netlink channel.
The interface towards the drivers is simple and allows devlink to set
the action of the trap. Currently, only two actions are supported:
'trap' and 'drop'. When set to 'trap', the device is expected to provide
the sole copy of the packet to the driver which will pass it to devlink.
When set to 'drop', the device is expected to drop the packet and not
send a copy to the driver. In the future, more actions can be added,
such as 'mirror'.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drop monitor has start and stop commands, but so far these were only
used to start and stop monitoring of software drops.
Now that drop monitor can also monitor hardware drops, we should allow
the user to control these as well.
Do that by adding SW and HW flags to these commands. If no flag is
specified, then only start / stop monitoring software drops. This is
done in order to maintain backward-compatibility with existing user
space applications.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In summary alert mode a notification is sent with a list of recent drop
reasons and a count of how many packets were dropped due to this reason.
To avoid expensive operations in the context in which packets are
dropped, each CPU holds an array whose number of entries is the maximum
number of drop reasons that can be encoded in the netlink notification.
Each entry stores the drop reason and a count. When a packet is dropped
the array is traversed and a new entry is created or the count of an
existing entry is incremented.
Later, in process context, the array is replaced with a newly allocated
copy and the old array is encoded in a netlink notification. To avoid
breaking user space, the notification includes the ancillary header,
which is 'struct net_dm_alert_msg' with number of entries set to '0'.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a similar fashion to software drops, extend drop monitor to send
netlink events when packets are dropped by the underlying hardware.
The main difference is that instead of encoding the program counter (PC)
from which kfree_skb() was called in the netlink message, we encode the
hardware trap name. The two are mostly equivalent since they should both
help the user understand why the packet was dropped.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can-next 2019-08-14
this is a pull request for net-next/master consisting of 41 patches.
The first two patches are for the kvaser_pciefd driver: Christer Beskow
removes unnecessary code in the kvaser_pciefd_pwm_stop() function,
YueHaibing removes the unused including of <linux/version.h>.
In the next patch YueHaibing also removes the unused including of
<linux/version.h> in the f81601 driver.
In the ti_hecc driver the next 6 patches are by me and fix checkpatch
warnings. YueHaibing's patch removes an unused variable in the
ti_hecc_mailbox_read() function.
The next 6 patches all target the xilinx_can driver. Anssi Hannula's
patch fixes a chip start failure with an invalid bus. The patch by
Venkatesh Yadav Abbarapu skips an error message in case of a deferred
probe. The 3 patches by Appana Durga Kedareswara rao fix the RX and TX
path for CAN-FD frames. Srinivas Neeli's patch fixes the bit timing
calculations for CAN-FD.
The next 12 patches are by me and several checkpatch warnings in the
af_can, raw and bcm components.
Thomas Gleixner provides a patch for the bcm, which switches the timer
to HRTIMER_MODE_SOFT and removes the hrtimer_tasklet.
Then 6 more patches by me for the gw component, which fix checkpatch
warnings, followed by 2 patches by Oliver Hartkopp to add CAN-FD
support.
The vcan driver gets 3 patches by me, fixing checkpatch warnings.
And finally a patch by Andre Hartmann to fix typos in CAN's netlink
header.
====================
Support monitoring and detecting the SD-FEC error events
through IRQ and poll file operation.
The SD-FEC device can detect one-error or multi-error events.
An error triggers an interrupt which creates and run the ONE_SHOT
IRQ thread.
The ONE_SHOT IRQ thread detects type of error and pass that
information to the poll function.
The file_operation callback poll(), collects the events and
updates the statistics accordingly.
The function poll blocks() on waiting queue which can be
unblocked by ONE_SHOT IRQ handling thread.
Support SD-FEC interrupt set ioctl callback.
The SD-FEC can detect two type of errors: coding errors (ECC) and
a data interface errors (TLAST).
The errors are events which can trigger an IRQ if enabled.
The driver can monitor and detect these errors through IRQ.
Also the driver updates the statistical data.
Tested-by: Dragan Cvetic <dragan.cvetic@xilinx.com>
Signed-off-by: Derek Kiernan <derek.kiernan@xilinx.com>
Signed-off-by: Dragan Cvetic <dragan.cvetic@xilinx.com>
Link: https://lore.kernel.org/r/1564216438-322406-6-git-send-email-dragan.cvetic@xilinx.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With softpin we allow the userspace to take control over the GPU virtual
address space. The new capability is relected by a bump of the minor DRM
version. There are a few restrictions for userspace to take into
account:
1. The kernel reserves a bit of the address space to implement zero page
faulting and mapping of the kernel internal ring buffer. Userspace can
query the kernel for the first usable GPU VM address via
ETNAVIV_PARAM_SOFTPIN_START_ADDR.
2. We only allow softpin on GPUs, which implement proper process
separation via PPAS. If softpin is not available the softpin start
address will be set to ~0.
3. Softpin is all or nothing. A submit using softpin must not use any
address fixups via relocs.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
There are two netfilter userspace headers which contain deprecation
warnings. While these headers are not used within the kernel, they are
compiled stand-alone for header-testing.
Pablo informs me that userspace iptables still refer to these headers,
and the intention was to use xt_LOG.h instead and remove these, but
userspace was never updated.
Remove the warnings.
Fixes: 2a475c409f ("kbuild: remove all netfilter headers from header-test blacklist.")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull rdma fixes from Doug Ledford:
"Fairly small pull request for -rc3. I'm out of town the rest of this
week, so I made sure to clean out as much as possible from patchworks
in enough time for 0-day to chew through it (Yay! for 0-day being back
online! :-)). Jason might send through any emergency stuff that could
pop up, otherwise I'm back next week.
The only real thing of note is the siw ABI change. Since we just
merged siw *this* release, there are no prior kernel releases to
maintain kernel ABI with. I told Bernard that if there is anything
else about the siw ABI he thinks he might want to change before it
goes set in stone, he should get it in ASAP. The siw module was around
for several years outside the kernel tree, and it had to be revamped
considerably for inclusion upstream, so we are making no attempts to
be backward compatible with the out of tree version. Once 5.3 is
actually released, we will have our baseline ABI to maintain.
Summary:
- Fix a memory registration release flow issue that was causing a
WARN_ON (mlx5)
- If the counters for a port aren't allocated, then we can't do
operations on the non-existent counters (core)
- Check the right variable for error code result (mlx5)
- Fix a use after free issue (mlx5)
- Fix an off by one memory leak (siw)
- Actually return an error code on error (core)
- Allow siw to be built on 32bit arches (siw, ABI change, but OK
since siw was just merged this merge window and there is no prior
released kernel to maintain compatibility with and we also updated
the rdma-core user space package to match)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/siw: Change CQ flags from 64->32 bits
RDMA/core: Fix error code in stat_get_doit_qp()
RDMA/siw: Fix a memory leak in siw_init_cpulist()
IB/mlx5: Fix use-after-free error while accessing ev_file pointer
IB/mlx5: Check the correct variable in error handling code
RDMA/counter: Prevent QP counter binding if counters unsupported
IB/mlx5: Fix implicit MR release flow
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Rename mss field to mss_option field in synproxy, from Fernando Mancera.
2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce.
3) More strict validation of IPVS sysctl values, from Junwei Hu.
4) Remove unnecessary spaces after on the right hand side of assignments,
from yangxingwu.
5) Add offload support for bitwise operation.
6) Extend the nft_offload_reg structure to store immediate date.
7) Collapse several ip_set header files into ip_set.h, from
Jeremy Sowden.
8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y,
from Jeremy Sowden.
9) Fix several sparse warnings due to missing prototypes, from
Valdis Kletnieks.
10) Use static lock initialiser to ensure connlabel spinlock is
initialized on boot time to fix sched/act_ct.c, patch
from Florian Westphal.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net-next* tree.
There is a small merge conflict in libbpf (Cc Andrii so he's in the loop
as well):
for (i = 1; i <= btf__get_nr_types(btf); i++) {
t = (struct btf_type *)btf__type_by_id(btf, i);
if (!has_datasec && btf_is_var(t)) {
/* replace VAR with INT */
t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
<<<<<<< HEAD
/*
* using size = 1 is the safest choice, 4 will be too
* big and cause kernel BTF validation failure if
* original variable took less than 4 bytes
*/
t->size = 1;
*(int *)(t+1) = BTF_INT_ENC(0, 0, 8);
} else if (!has_datasec && kind == BTF_KIND_DATASEC) {
=======
t->size = sizeof(int);
*(int *)(t + 1) = BTF_INT_ENC(0, 0, 32);
} else if (!has_datasec && btf_is_datasec(t)) {
>>>>>>> 72ef80b5ee
/* replace DATASEC with STRUCT */
Conflict is between the two commits 1d4126c4e1 ("libbpf: sanitize VAR to
conservative 1-byte INT") and b03bc6853c ("libbpf: convert libbpf code to
use new btf helpers"), so we need to pick the sanitation fixup as well as
use the new btf_is_datasec() helper and the whitespace cleanup. Looks like
the following:
[...]
if (!has_datasec && btf_is_var(t)) {
/* replace VAR with INT */
t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
/*
* using size = 1 is the safest choice, 4 will be too
* big and cause kernel BTF validation failure if
* original variable took less than 4 bytes
*/
t->size = 1;
*(int *)(t + 1) = BTF_INT_ENC(0, 0, 8);
} else if (!has_datasec && btf_is_datasec(t)) {
/* replace DATASEC with STRUCT */
[...]
The main changes are:
1) Addition of core parts of compile once - run everywhere (co-re) effort,
that is, relocation of fields offsets in libbpf as well as exposure of
kernel's own BTF via sysfs and loading through libbpf, from Andrii.
More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2
and http://vger.kernel.org/lpc-bpf2018.html#session-2
2) Enable passing input flags to the BPF flow dissector to customize parsing
and allowing it to stop early similar to the C based one, from Stanislav.
3) Add a BPF helper function that allows generating SYN cookies from XDP and
tc BPF, from Petar.
4) Add devmap hash-based map type for more flexibility in device lookup for
redirects, from Toke.
5) Improvements to XDP forwarding sample code now utilizing recently enabled
devmap lookups, from Jesper.
6) Add support for reporting the effective cgroup progs in bpftool, from Jakub
and Takshak.
7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter.
8) Fix AF_XDP umem pages mapping for 32 bit architectures, from Ivan.
9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei.
10) Add perf event output helper also for other skb-based program types, from Allan.
11) Fix a co-re related compilation error in selftests, from Yonghong.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
This patch changes the driver/user shared (mmapped) CQ notification
flags field from unsigned 64-bits size to unsigned 32-bits size. This
enables building siw on 32-bit architectures.
This patch changes the siw-abi, but as siw was only just merged in
this merge window cycle, there are no released kernels with the prior
abi. We are making no attempt to be binary compatible with siw user
space libraries prior to the merge of siw into the upstream kernel,
only moving forward with upstream kernels and upstream rdma-core
provided siw libraries are we guaranteeing compatibility.
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190809151816.13018-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch fixes some documentation typos in struct can_bittiming_const.
Signed-off-by: Andre Hartmann <aha_1980@gmx.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Introduce CAN FD support which needs an extension of the netlink API to
pass CAN FD type content to the kernel which has a different size to
Classic CAN. Additionally the struct canfd_frame has a new 'flags' element
that can now be modified with can-gw.
The new CGW_FLAGS_CAN_FD option flag defines whether the routing job
handles Classic CAN or CAN FD frames. This setting is very strict at
reception time and enables the new possibilities, e.g. CGW_FDMOD_* and
modifying the flags element of struct canfd_frame, only when
CGW_FLAGS_CAN_FD is set.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
To prepare the CAN FD support this patch implements the first adaptions in
data structures for CAN FD without changing the current functionality.
Additionally some code at the end of this patch is moved or indented to
simplify the review of the next implementation step.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Add #defines only for the Data Link Feature and Physical Layer 16.0 GT/s
features as defined in PCIe spec r4.0, sec 7.7.4 for Data Link Feature and
sec 7.7.5 for Physical Layer 16.0 GT/s.
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
A number of netfilter header-files used declarations and definitions
from other headers without including them. Added include directives to
make those declarations and definitions available.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add SHA-512 support to fs-verity. This is primarily a demonstration of
the trivial changes needed to support a new hash algorithm in fs-verity;
most users will still use SHA-256, due to the smaller space required to
store the hashes. But some users may prefer SHA-512.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
When CONFIG_UAPI_HEADER_TEST=y, exported headers are compile-tested to make
sure they can be included from user-space.
Currently, scsi_bsg_fc.h, scsi_netlink.h, and scsi_netlink_fc.h are
excluded from the test coverage. To make them join the compile-test, we
need to fix the build errors attached below.
For a case like this, we decided to use __u{8,16,32,64} variable types in
this discussion:
https://lkml.org/lkml/2019/6/5/18
Build log:
CC usr/include/scsi/scsi_netlink_fc.h.s
CC usr/include/scsi/scsi_netlink.h.s
CC usr/include/scsi/scsi_bsg_fc.h.s
In file included from ./usr/include/scsi/scsi_netlink_fc.h:10:0,
from <command-line>:32:
./usr/include/scsi/scsi_netlink.h:29:2: error: unknown type name uint8_t
uint8_t version;
^~~~~~~
./usr/include/scsi/scsi_netlink.h:30:2: error: unknown type name uint8_t
uint8_t transport;
^~~~~~~
./usr/include/scsi/scsi_netlink.h:31:2: error: unknown type name uint16_t
uint16_t magic;
^~~~~~~~
./usr/include/scsi/scsi_netlink.h:32:2: error: unknown type name uint16_t
uint16_t msgtype;
^~~~~~~~
CC usr/include/rdma/vmw_pvrdma-abi.h.s
./usr/include/scsi/scsi_netlink.h:33:2: error: unknown type name uint16_t
uint16_t msglen;
^~~~~~~~
./usr/include/scsi/scsi_netlink.h:34:33: error: uint64_t undeclared here (not in a function); did you mean __uint128_t ?
} __attribute__((aligned(sizeof(uint64_t))));
^~~~~~~~
__uint128_t
./usr/include/scsi/scsi_netlink.h:78:2: error: expected specifier-qualifier-list before uint64_t
uint64_t vendor_id;
^~~~~~~~
In file included from <command-line>:32:0:
./usr/include/scsi/scsi_netlink_fc.h:46:2: error: expected specifier-qualifier-list before uint64_t
uint64_t seconds;
^~~~~~~~
make[2]: *** [scripts/Makefile.build;302: usr/include/scsi/scsi_netlink_fc.h.s] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from <command-line>:32:0:
./usr/include/scsi/scsi_netlink.h:29:2: error: unknown type name uint8_t
uint8_t version;
^~~~~~~
./usr/include/scsi/scsi_netlink.h:30:2: error: unknown type name uint8_t
uint8_t transport;
^~~~~~~
./usr/include/scsi/scsi_netlink.h:31:2: error: unknown type name uint16_t
uint16_t magic;
^~~~~~~~
./usr/include/scsi/scsi_netlink.h:32:2: error: unknown type name uint16_t
uint16_t msgtype;
^~~~~~~~
./usr/include/scsi/scsi_netlink.h:33:2: error: unknown type name uint16_t
uint16_t msglen;
^~~~~~~~
./usr/include/scsi/scsi_netlink.h:34:33: error: uint64_t undeclared here (not in a function); did you mean __uint128_t ?
} __attribute__((aligned(sizeof(uint64_t))));
^~~~~~~~
__uint128_t
./usr/include/scsi/scsi_netlink.h:78:2: error: expected specifier-qualifier-list before uint64_t
uint64_t vendor_id;
^~~~~~~~
make[2]: *** [scripts/Makefile.build;302: usr/include/scsi/scsi_netlink.h.s] Error 1
In file included from <command-line>:32:0:
./usr/include/scsi/scsi_bsg_fc.h:69:2: error: unknown type name uint8_t
uint8_t reserved;
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:72:2: error: unknown type name uint8_t
uint8_t port_id[3];
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:90:2: error: unknown type name uint8_t
uint8_t reserved;
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:93:2: error: unknown type name uint8_t
uint8_t port_id[3];
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:114:2: error: unknown type name uint8_t
uint8_t command_code;
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:117:2: error: unknown type name uint8_t
uint8_t port_id[3];
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:154:2: error: unknown type name uint32_t
uint32_t status; /* See FC_CTELS_STATUS_xxx */
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:158:3: error: unknown type name uint8_t
uint8_t action; /* fragment_id for CT REJECT */
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:159:3: error: unknown type name uint8_t
uint8_t reason_code;
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:160:3: error: unknown type name uint8_t
uint8_t reason_explanation;
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:161:3: error: unknown type name uint8_t
uint8_t vendor_unique;
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:177:2: error: unknown type name uint8_t
uint8_t reserved;
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:180:2: error: unknown type name uint8_t
uint8_t port_id[3];
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:185:2: error: unknown type name uint32_t
uint32_t preamble_word0; /* revision & IN_ID */
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:186:2: error: unknown type name uint32_t
uint32_t preamble_word1; /* GS_Type, GS_SubType, Options, Rsvd */
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:187:2: error: unknown type name uint32_t
uint32_t preamble_word2; /* Cmd Code, Max Size */
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:207:2: error: unknown type name uint64_t
uint64_t vendor_id;
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:210:2: error: unknown type name uint32_t
uint32_t vendor_cmd[0];
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:217:2: error: unknown type name uint32_t
uint32_t vendor_rsp[0];
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:236:2: error: unknown type name uint8_t
uint8_t els_code;
^~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:254:2: error: unknown type name uint32_t
uint32_t preamble_word0; /* revision & IN_ID */
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:255:2: error: unknown type name uint32_t
uint32_t preamble_word1; /* GS_Type, GS_SubType, Options, Rsvd */
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:256:2: error: unknown type name uint32_t
uint32_t preamble_word2; /* Cmd Code, Max Size */
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:268:2: error: unknown type name uint32_t
uint32_t msgcode;
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:292:2: error: unknown type name uint32_t
uint32_t result;
^~~~~~~~
./usr/include/scsi/scsi_bsg_fc.h:295:2: error: unknown type name uint32_t
uint32_t reply_payload_rcv_len;
^~~~~~~~
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a root-only variant of the FS_IOC_REMOVE_ENCRYPTION_KEY ioctl which
removes all users' claims of the key, not just the current user's claim.
I.e., it always removes the key itself, no matter how many users have
added it.
This is useful for forcing a directory to be locked, without having to
figure out which user ID(s) the key was added under. This is planned to
be used by a command like 'sudo fscrypt lock DIR --all-users' in the
fscrypt userspace tool (http://github.com/google/fscrypt).
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Allow the FS_IOC_ADD_ENCRYPTION_KEY and FS_IOC_REMOVE_ENCRYPTION_KEY
ioctls to be used by non-root users to add and remove encryption keys
from the filesystem-level crypto keyrings, subject to limitations.
Motivation: while privileged fscrypt key management is sufficient for
some users (e.g. Android and Chromium OS, where a privileged process
manages all keys), the old API by design also allows non-root users to
set up and use encrypted directories, and we don't want to regress on
that. Especially, we don't want to force users to continue using the
old API, running into the visibility mismatch between files and keyrings
and being unable to "lock" encrypted directories.
Intuitively, the ioctls have to be privileged since they manipulate
filesystem-level state. However, it's actually safe to make them
unprivileged if we very carefully enforce some specific limitations.
First, each key must be identified by a cryptographic hash so that a
user can't add the wrong key for another user's files. For v2
encryption policies, we use the key_identifier for this. v1 policies
don't have this, so managing keys for them remains privileged.
Second, each key a user adds is charged to their quota for the keyrings
service. Thus, a user can't exhaust memory by adding a huge number of
keys. By default each non-root user is allowed up to 200 keys; this can
be changed using the existing sysctl 'kernel.keys.maxkeys'.
Third, if multiple users add the same key, we keep track of those users
of the key (of which there remains a single copy), and won't really
remove the key, i.e. "lock" the encrypted files, until all those users
have removed it. This prevents denial of service attacks that would be
possible under simpler schemes, such allowing the first user who added a
key to remove it -- since that could be a malicious user who has
compromised the key. Of course, encryption keys should be kept secret,
but the idea is that using encryption should never be *less* secure than
not using encryption, even if your key was compromised.
We tolerate that a user will be unable to really remove a key, i.e.
unable to "lock" their encrypted files, if another user has added the
same key. But in a sense, this is actually a good thing because it will
avoid providing a false notion of security where a key appears to have
been removed when actually it's still in memory, available to any
attacker who compromises the operating system kernel.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Add a new fscrypt policy version, "v2". It has the following changes
from the original policy version, which we call "v1" (*):
- Master keys (the user-provided encryption keys) are only ever used as
input to HKDF-SHA512. This is more flexible and less error-prone, and
it avoids the quirks and limitations of the AES-128-ECB based KDF.
Three classes of cryptographically isolated subkeys are defined:
- Per-file keys, like used in v1 policies except for the new KDF.
- Per-mode keys. These implement the semantics of the DIRECT_KEY
flag, which for v1 policies made the master key be used directly.
These are also planned to be used for inline encryption when
support for it is added.
- Key identifiers (see below).
- Each master key is identified by a 16-byte master_key_identifier,
which is derived from the key itself using HKDF-SHA512. This prevents
users from associating the wrong key with an encrypted file or
directory. This was easily possible with v1 policies, which
identified the key by an arbitrary 8-byte master_key_descriptor.
- The key must be provided in the filesystem-level keyring, not in a
process-subscribed keyring.
The following UAPI additions are made:
- The existing ioctl FS_IOC_SET_ENCRYPTION_POLICY can now be passed a
fscrypt_policy_v2 to set a v2 encryption policy. It's disambiguated
from fscrypt_policy/fscrypt_policy_v1 by the version code prefix.
- A new ioctl FS_IOC_GET_ENCRYPTION_POLICY_EX is added. It allows
getting the v1 or v2 encryption policy of an encrypted file or
directory. The existing FS_IOC_GET_ENCRYPTION_POLICY ioctl could not
be used because it did not have a way for userspace to indicate which
policy structure is expected. The new ioctl includes a size field, so
it is extensible to future fscrypt policy versions.
- The ioctls FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY,
and FS_IOC_GET_ENCRYPTION_KEY_STATUS now support managing keys for v2
encryption policies. Such keys are kept logically separate from keys
for v1 encryption policies, and are identified by 'identifier' rather
than by 'descriptor'. The 'identifier' need not be provided when
adding a key, since the kernel will calculate it anyway.
This patch temporarily keeps adding/removing v2 policy keys behind the
same permission check done for adding/removing v1 policy keys:
capable(CAP_SYS_ADMIN). However, the next patch will carefully take
advantage of the cryptographically secure master_key_identifier to allow
non-root users to add/remove v2 policy keys, thus providing a full
replacement for v1 policies.
(*) Actually, in the API fscrypt_policy::version is 0 while on-disk
fscrypt_context::format is 1. But I believe it makes the most sense
to advance both to '2' to have them be in sync, and to consider the
numbering to start at 1 except for the API quirk.
Reviewed-by: Paul Crowley <paulcrowley@google.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Add a new fscrypt ioctl, FS_IOC_GET_ENCRYPTION_KEY_STATUS. Given a key
specified by 'struct fscrypt_key_specifier' (the same way a key is
specified for the other fscrypt key management ioctls), it returns
status information in a 'struct fscrypt_get_key_status_arg'.
The main motivation for this is that applications need to be able to
check whether an encrypted directory is "unlocked" or not, so that they
can add the key if it is not, and avoid adding the key (which may
involve prompting the user for a passphrase) if it already is.
It's possible to use some workarounds such as checking whether opening a
regular file fails with ENOKEY, or checking whether the filenames "look
like gibberish" or not. However, no workaround is usable in all cases.
Like the other key management ioctls, the keyrings syscalls may seem at
first to be a good fit for this. Unfortunately, they are not. Even if
we exposed the keyring ID of the ->s_master_keys keyring and gave
everyone Search permission on it (note: currently the keyrings
permission system would also allow everyone to "invalidate" the keyring
too), the fscrypt keys have an additional state that doesn't map cleanly
to the keyrings API: the secret can be removed, but we can be still
tracking the files that were using the key, and the removal can be
re-attempted or the secret added again.
After later patches, some applications will also need a way to determine
whether a key was added by the current user vs. by some other user.
Reserved fields are included in fscrypt_get_key_status_arg for this and
other future extensions.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Add a new fscrypt ioctl, FS_IOC_REMOVE_ENCRYPTION_KEY. This ioctl
removes an encryption key that was added by FS_IOC_ADD_ENCRYPTION_KEY.
It wipes the secret key itself, then "locks" the encrypted files and
directories that had been unlocked using that key -- implemented by
evicting the relevant dentries and inodes from the VFS caches.
The problem this solves is that many fscrypt users want the ability to
remove encryption keys, causing the corresponding encrypted directories
to appear "locked" (presented in ciphertext form) again. Moreover,
users want removing an encryption key to *really* remove it, in the
sense that the removed keys cannot be recovered even if kernel memory is
compromised, e.g. by the exploit of a kernel security vulnerability or
by a physical attack. This is desirable after a user logs out of the
system, for example. In many cases users even already assume this to be
the case and are surprised to hear when it's not.
It is not sufficient to simply unlink the master key from the keyring
(or to revoke or invalidate it), since the actual encryption transform
objects are still pinned in memory by their inodes. Therefore, to
really remove a key we must also evict the relevant inodes.
Currently one workaround is to run 'sync && echo 2 >
/proc/sys/vm/drop_caches'. But, that evicts all unused inodes in the
system rather than just the inodes associated with the key being
removed, causing severe performance problems. Moreover, it requires
root privileges, so regular users can't "lock" their encrypted files.
Another workaround, used in Chromium OS kernels, is to add a new
VFS-level ioctl FS_IOC_DROP_CACHE which is a more restricted version of
drop_caches that operates on a single super_block. It does:
shrink_dcache_sb(sb);
invalidate_inodes(sb, false);
But it's still a hack. Yet, the major users of filesystem encryption
want this feature badly enough that they are actually using these hacks.
To properly solve the problem, start maintaining a list of the inodes
which have been "unlocked" using each master key. Originally this
wasn't possible because the kernel didn't keep track of in-use master
keys at all. But, with the ->s_master_keys keyring it is now possible.
Then, add an ioctl FS_IOC_REMOVE_ENCRYPTION_KEY. It finds the specified
master key in ->s_master_keys, then wipes the secret key itself, which
prevents any additional inodes from being unlocked with the key. Then,
it syncs the filesystem and evicts the inodes in the key's list. The
normal inode eviction code will free and wipe the per-file keys (in
->i_crypt_info). Note that freeing ->i_crypt_info without evicting the
inodes was also considered, but would have been racy.
Some inodes may still be in use when a master key is removed, and we
can't simply revoke random file descriptors, mmap's, etc. Thus, the
ioctl simply skips in-use inodes, and returns -EBUSY to indicate that
some inodes weren't evicted. The master key *secret* is still removed,
but the fscrypt_master_key struct remains to keep track of the remaining
inodes. Userspace can then retry the ioctl to evict the remaining
inodes. Alternatively, if userspace adds the key again, the refreshed
secret will be associated with the existing list of inodes so they
remain correctly tracked for future key removals.
The ioctl doesn't wipe pagecache pages. Thus, we tolerate that after a
kernel compromise some portions of plaintext file contents may still be
recoverable from memory. This can be solved by enabling page poisoning
system-wide, which security conscious users may choose to do. But it's
very difficult to solve otherwise, e.g. note that plaintext file
contents may have been read in other places than pagecache pages.
Like FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY is
initially restricted to privileged users only. This is sufficient for
some use cases, but not all. A later patch will relax this restriction,
but it will require introducing key hashes, among other changes.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY. This ioctl adds an
encryption key to the filesystem's fscrypt keyring ->s_master_keys,
making any files encrypted with that key appear "unlocked".
Why we need this
~~~~~~~~~~~~~~~~
The main problem is that the "locked/unlocked" (ciphertext/plaintext)
status of encrypted files is global, but the fscrypt keys are not.
fscrypt only looks for keys in the keyring(s) the process accessing the
filesystem is subscribed to: the thread keyring, process keyring, and
session keyring, where the session keyring may contain the user keyring.
Therefore, userspace has to put fscrypt keys in the keyrings for
individual users or sessions. But this means that when a process with a
different keyring tries to access encrypted files, whether they appear
"unlocked" or not is nondeterministic. This is because it depends on
whether the files are currently present in the inode cache.
Fixing this by consistently providing each process its own view of the
filesystem depending on whether it has the key or not isn't feasible due
to how the VFS caches work. Furthermore, while sometimes users expect
this behavior, it is misguided for two reasons. First, it would be an
OS-level access control mechanism largely redundant with existing access
control mechanisms such as UNIX file permissions, ACLs, LSMs, etc.
Encryption is actually for protecting the data at rest.
Second, almost all users of fscrypt actually do need the keys to be
global. The largest users of fscrypt, Android and Chromium OS, achieve
this by having PID 1 create a "session keyring" that is inherited by
every process. This works, but it isn't scalable because it prevents
session keyrings from being used for any other purpose.
On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't
similarly abuse the session keyring, so to make 'sudo' work on all
systems it has to link all the user keyrings into root's user keyring
[2]. This is ugly and raises security concerns. Moreover it can't make
the keys available to system services, such as sshd trying to access the
user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to
read certificates from the user's home directory (see [5]); or to Docker
containers (see [6], [7]).
By having an API to add a key to the *filesystem* we'll be able to fix
the above bugs, remove userspace workarounds, and clearly express the
intended semantics: the locked/unlocked status of an encrypted directory
is global, and encryption is orthogonal to OS-level access control.
Why not use the add_key() syscall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We use an ioctl for this API rather than the existing add_key() system
call because the ioctl gives us the flexibility needed to implement
fscrypt-specific semantics that will be introduced in later patches:
- Supporting key removal with the semantics such that the secret is
removed immediately and any unused inodes using the key are evicted;
also, the eviction of any in-use inodes can be retried.
- Calculating a key-dependent cryptographic identifier and returning it
to userspace.
- Allowing keys to be added and removed by non-root users, but only keys
for v2 encryption policies; and to prevent denial-of-service attacks,
users can only remove keys they themselves have added, and a key is
only really removed after all users who added it have removed it.
Trying to shoehorn these semantics into the keyrings syscalls would be
very difficult, whereas the ioctls make things much easier.
However, to reuse code the implementation still uses the keyrings
service internally. Thus we get lockless RCU-mode key lookups without
having to re-implement it, and the keys automatically show up in
/proc/keys for debugging purposes.
References:
[1] https://github.com/google/fscrypt
[2] https://goo.gl/55cCrI#heading=h.vf09isp98isb
[3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939
[4] https://github.com/google/fscrypt/issues/116
[5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715
[6] https://github.com/google/fscrypt/issues/128
[7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Update fs/crypto/ to use the new names for the UAPI constants rather
than the old names, then make the old definitions conditional on
!__KERNEL__.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Prefix all filesystem encryption UAPI constants except the ioctl numbers
with "FSCRYPT_" rather than with "FS_". This namespaces the constants
more appropriately and makes it clear that they are related specifically
to the filesystem encryption feature, and to the 'fscrypt_*' structures.
With some of the old names like "FS_POLICY_FLAGS_VALID", it was not
immediately clear that the constant had anything to do with encryption.
This is also useful because we'll be adding more encryption-related
constants, e.g. for the policy version, and we'd otherwise have to
choose whether to use unclear names like FS_POLICY_V1 or inconsistent
names like FS_ENCRYPTION_POLICY_V1.
For source compatibility with existing userspace programs, keep the old
names defined as aliases to the new names.
Finally, as long as new names are being defined anyway, I skipped
defining new names for the fscrypt mode numbers that aren't actually
used: INVALID (0), AES_256_GCM (2), AES_256_CBC (3), SPECK128_256_XTS
(7), and SPECK128_256_CTS (8).
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
More fscrypt definitions are being added, and we shouldn't use a
disproportionate amount of space in <linux/fs.h> for fscrypt stuff.
So move the fscrypt definitions to a new header <linux/fscrypt.h>.
For source compatibility with existing userspace programs, <linux/fs.h>
still includes the new header.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
The midgard/bifrost GPUs need to allocate GPU heap memory which is
allocated on GPU page faults and not pinned in memory. The vendor driver
calls this functionality GROW_ON_GPF.
This implementation assumes that BOs allocated with the
PANFROST_BO_NOEXEC flag are never mmapped or exported. Both of those may
actually work, but I'm unsure if there's some interaction there. It
would cause the whole object to be pinned in memory which would defeat
the point of this.
On faults, we map in 2MB at a time in order to utilize huge pages (if
enabled). Currently, once we've mapped pages in, they are only unmapped
if the BO is freed. Once we add shrinker support, we can unmap pages
with the shrinker.
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190808222200.13176-9-robh@kernel.org
Executable buffers have an alignment restriction that they can't cross
16MB boundary as the GPU program counter is 24-bits. This restriction is
currently not handled and we just get lucky. As current userspace
assumes all BOs are executable, that has to remain the default. So add a
new PANFROST_BO_NOEXEC flag to allow userspace to indicate which BOs are
not executable.
There is also a restriction that executable buffers cannot start or end
on a 4GB boundary. This is mostly avoided as there is only 4GB of space
currently and the beginning is already blocked out for NULL ptr
detection. Add support to handle this restriction fully regardless of
the current constraints.
For existing userspace, all created BOs remain executable, but the GPU
VA alignment will be increased to the size of the BO. This shouldn't
matter as there is plenty of GPU VA space.
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190808222200.13176-6-robh@kernel.org