mce: Uncorrected hardware memory error in user-access at af34214200
{1}[Hardware Error]: It has been corrected by h/w and requires no further action
mce: [Hardware Error]: Machine check events logged
{1}[Hardware Error]: event severity: corrected
Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
[..]
Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
mce: Memory error not recovered
In contrast to typical memory, dev_pagemap pages may be dax mapped. With
dax there is no possibility to map in another page dynamically since dax
establishes 1:1 physical address to file offset associations. Also
dev_pagemap pages associated with NVDIMM / persistent memory devices can
internal remap/repair addresses with poison. While memory_failure()
assumes that it can discard typical poisoned pages and keep them
unmapped indefinitely, dev_pagemap pages may be returned to service
after the error is cleared.
Teach memory_failure() to detect and handle MEMORY_DEVICE_HOST
dev_pagemap pages that have poison consumed by userspace. Mark the
memory as UC instead of unmapping it completely to allow ongoing access
via the device driver (nd_pmem). Later, nd_pmem will grow support for
marking the page back to WB when the error is cleared.
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
In preparation for implementing support for memory poison (media error)
handling via dax mappings, implement a lock_page() equivalent. Poison
error handling requires rmap and needs guarantees that the page->mapping
association is maintained / valid (inode not freed) for the duration of
the lookup.
In the device-dax case it is sufficient to simply hold a dev_pagemap
reference. In the filesystem-dax case we need to use the entry lock.
Export the entry lock via dax_lock_mapping_entry() that uses
rcu_read_lock() to protect against the inode being freed, and
revalidates the page->mapping association under xa_lock().
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
allows to set a backup port to be used for known unicast traffic if the
port has gone carrier down. The backup pointer is rcu protected and set
only under RTNL, a counter is maintained so when deleting a port we know
how many other ports reference it as a backup and we remove it from all.
Also the pointer is in the first cache line which is hot at the time of
the check and thus in the common case we only add one more test.
The backup port will be used only for the non-flooding case since
it's a part of the bridge and the flooded packets will be forwarded to it
anyway. To remove the forwarding just send a 0/non-existing backup port.
This is used to avoid numerous scalability problems when using MLAG most
notably if we have thousands of fdbs one would need to change all of them
on port carrier going down which takes too long and causes a storm of fdb
notifications (and again when the port comes back up). In a Multi-chassis
Link Aggregation setup usually hosts are connected to two different
switches which act as a single logical switch. Those switches usually have
a control and backup link between them called peerlink which might be used
for communication in case a host loses connectivity to one of them.
We need a fast way to failover in case a host port goes down and currently
none of the solutions (like bond) cannot fulfill the requirements because
the participating ports are actually the "master" devices and must have the
same peerlink as their backup interface and at the same time all of them
must participate in the bridge device. As Roopa noted it's normal practice
in routing called fast re-route where a precalculated backup path is used
when the main one is down.
Another use case of this is with EVPN, having a single vxlan device which
is backup of every port. Due to the nature of master devices it's not
currently possible to use one device as a backup for many and still have
all of them participate in the bridge (which is master itself).
More detailed information about MLAG is available at the link below.
https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
Further explanation and a diagram by Roopa:
Two switches acting in a MLAG pair are connected by the peerlink
interface which is a bridge port.
the config on one of the switches looks like the below. The other
switch also has a similar config.
eth0 is connected to one port on the server. And the server is
connected to both switches.
br0 -- team0---eth0
|
-- switch-peerlink
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need a backmerge to get DP_DPCD_REV_14 before we push other
i915 changes to dinq that could break compilation.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
As Dave Chinner points out, we don't have a proper documentation for the
ktime_get() family of interfaces, making it rather unclear which of the
over 30 (!) interfaces one should actually use in a driver or elsewhere
in the kernel.
I wrote up an explanation from how I personally see the interfaces,
documenting what each of the functions do and hopefully making it a bit
clearer which should be used where.
This is the first time I tried writing .rst format documentation, so
in addition to any mistakes in the content, I probably also introduce
nonstandard formatting ;-)
I first tried to add an extra section to
Documentation/timers/timekeeping.txt, but this is currently not included
in the generated API, and it seems useful to have the API docs as part
of what gets generated in
https://www.kernel.org/doc/html/latest/core-api/index.html#core-utilities
instead, so I started a new file there.
I also considered adding the documentation inline in the
include/linux/timekeeping.h header, but couldn't figure out how to do
that in a way that would result both in helpful inline comments as
well as readable html output, so I settled for the latter, with
a small note pointing to it from the header.
Cc: Dave Chinner <david@fromorbit.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The fwnode graph API is preferred over the ACPI graph API. Therefore
make the ACPI graph API private, and use it as a back-end for the
fwnode graph API only.
Unused functionality is removed while the functionality actually used
remains the same.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Convert all users of struct acpi_reference_args to more generic
fwnode_reference_args. This will
1) avoid an ACPI specific references to device nodes with integer
arguments as well as
2) allow making references to nodes other than device nodes in ACPI.
As a by-product, convert the fwnode interger arguments to u64. The
arguments were 64-bit integers on ACPI but the fwnode arguments were
just 32-bit.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The function compare_device_paths from wacom_sys.c is generic
and useful for other drivers. Move the function to hid-core and
rename it as hid_compare_device_paths.
Signed-off-by: Daniel M. Lambea <dmlambea@gmail.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Added some feature ids present in nvme-cli but not kernel.
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The Aspeed AST2x00 can contain a ColdFire v1 coprocessor which
is currently unused on OpenPower systems.
This adds an alternative to the fsi-master-gpio driver that
uses that coprocessor instead of bit banging from the ARM
core itself. The end result is about 4 times faster.
The firmware for the coprocessor and its source code can be
found at https://github.com/ozbenh/cf-fsi and is system specific.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are still quite a few cases where a device might want
to get to a different node of the device-tree, obtain the
resources and map them.
We have of_iomap() and of_io_request_and_map() but they both
have shortcomings, such as not returning the size of the
resource found (which can be useful) and not being "managed".
This adds a devm_of_iomap() that provides all of these and
should probably replace uses of the above in most drivers.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Pull vfs fixes from Al Viro:
"Fix several places that screw up cleanups after failures halfway
through opening a file (one open-coding filp_clone_open() and getting
it wrong, two misusing alloc_file()). That part is -stable fodder from
the 'work.open' branch.
And Christoph's regression fix for uapi breakage in aio series;
include/uapi/linux/aio_abi.h shouldn't be pulling in the kernel
definition of sigset_t, the reason for doing so in the first place had
been bogus - there's no need to expose struct __aio_sigset in
aio_abi.h at all"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: don't expose __aio_sigset in uapi
ocxlflash_getfile(): fix double-iput() on alloc_file() failures
cxl_getfile(): fix double-iput() on alloc_file() failures
drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()
kernel_wait4() expects a userland address for status - it's only
rusage that goes as a kernel one (and needs a copyout afterwards)
[ Also, fix the prototype of kernel_wait4() to have that __user
annotation - Linus ]
Fixes: 92ebce5ac5 ("osf_wait4: switch to kernel_wait4()")
Cc: stable@kernel.org # v4.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we have offload replay infrastructure added by
commit 326367427c ("net: sched: call reoffload op on block callback reg")
and flows are guaranteed to be removed correctly, we can revert
commit 951a8ee6de ("nfp: reject binding to shared blocks").
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Example setup:
host: ip -6 addr add dev eth1 2001:db8:104::4
where eth1 is enslaved to a VRF
switch: ip -6 ro add 2001:db8:104::4/128 dev br1
where br1 only has an LLA
ping6 2001:db8:104::4
ssh 2001:db8:104::4
(NOTE: UDP works fine if the PKTINFO has the address set to the global
address and ifindex is set to the index of eth1 with a destination an
LLA).
For ICMP, icmp6_iif needs to be updated to check if skb->dev is an
L3 master. If it is then return the ifindex from rt6i_idev similar
to what is done for loopback.
For TCP, restore the original tcp_v6_iif definition which is needed in
most places and add a new tcp_v6_iif_l3_slave that considers the
l3_slave variability. This latter check is only needed for socket
lookups.
Fixes: 9ff7438460 ("net: vrf: Handle ipv6 multicast and link-local addresses")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following warning:
net/ipv4/bpfilter/sockopt.c:28:5: error: symbol 'bpfilter_ip_set_sockopt' redeclared with different type
net/ipv4/bpfilter/sockopt.c:34:5: error: symbol 'bpfilter_ip_get_sockopt' redeclared with different type
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like vm_area_dup(), it initializes the anon_vma_chain head, and the
basic mm pointer.
The rest of the fields end up being different for different users,
although the plan is to also initialize the 'vm_ops' field to a dummy
entry.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reset controller changes for v4.19
This adds new drivers and bindings for the SDM845 AOSS (always on
subsystem) reset controller and for the Uniphier USB3 core reset.
SPI controller resets are added to the Uniphier reset driver.
* tag 'reset-for-4.19' of git://git.pengutronix.de/git/pza/linux:
reset: uniphier: add reset control support for SPI
reset: uniphier: add USB3 core reset control
dt-bindings: reset: uniphier: add USB3 core reset support
reset: simple: export reset_simple_ops to be referred from modules
reset: qcom: AOSS (always on subsystem) reset controller
dt-bindings: reset: Add AOSS reset bindings for SDM845 SoCs
Signed-off-by: Olof Johansson <olof@lixom.net>
i.MX device tree changes with clock dependency:
- Add clock for i.MX6UL GPIO blocks
* tag 'imx-dt-clkdep-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx6ul: add GPIO clocks
clk: imx6ul: remove clks_init_on array
clk: imx6ul: add GPIO clock gates
dt-bindings: clock: imx6ul: Do not change the clock definition order
Signed-off-by: Olof Johansson <olof@lixom.net>
i.MX SoC update for 4.19:
- A series from Anson Huang to add power management for i.MX6SLL,
including standby and mem mode suspend, cpuidle support, and bus
clock auto gating function, etc.
- A couple of fix-ups on i.MX6SLL cpuidle random build issues.
- A couple of cleanups on stale EPIT timer initialization and RNGA
platform device registration function.
- Configure i.MX51 SoC M4IF to avoid visual artifacts during video
playback.
- Set up i.MX51 and i.MX53 DBGEN bit of ARM_GPC register, so that
clocks within the debug system can be activated.
- Add a Cortex-M4 platform support which will be useful for running
a Linux instance on Cortex-M4 core integrated in i.MX7D SoC.
- Flag of_iomap failure in imx_aips_allow_unprivileged_access()
function by giving a warning in there.
* tag 'imx-soc-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: mx5: Set the DBGEN bit in ARM_GPC register
ARM: imx51: Configure M4IF to avoid visual artifacts
ARM: imx: call imx6sx_cpuidle_init() conditionally for 6sll
ARM: imx: fix i.MX6SLL build
ARM: imx: flag failure of of_iomap
ARM: i.MX31: remove rnga registration as a platform device
ARM: imx: Provide support for NXP i.MX7D Cortex-M4
ARM: imx: enable bus auto clock gating function for i.mx6sll
ARM: imx: remove i.MX6SLL support in i.MX6SL cpu idle driver
ARM: imx: add cpu idle support for i.MX6SLL
ARM: imx: add L2 page power control for GPC
ARM: imx: add mem mode suspend for i.MX6SLL
ARM: imx: add standby mode suspend for i.MX6SLL
ARM: imx: remove inexistant EPIT timer init
Signed-off-by: Olof Johansson <olof@lixom.net>
AT91 SoC for 4.19:
- New low power mode for sama5d2: ULP1
* tag 'at91-ab-4.19-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
ARM: at91: pm: configure wakeup sources for ULP1 mode
ARM: at91: pm: add PMC fast startup registers defines
ARM: at91: pm: Add ULP1 mode support
ARM: at91: pm: Use ULP0 naming instead of slow clock
MAINTAINERS: Remove the AT91 clk driver entry
Signed-off-by: Olof Johansson <olof@lixom.net>
The vm_area_struct is one of the most fundamental memory management
objects, but the management of it is entirely open-coded evertwhere,
ranging from allocation and freeing (using kmem_cache_[z]alloc and
kmem_cache_free) to initializing all the fields.
We want to unify this in order to end up having some unified
initialization of the vmas, and the first step to this is to at least
have basic allocation functions.
Right now those functions are literally just wrappers around the
kmem_cache_*() calls. This is a purely mechanical conversion:
# new vma:
kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()
# copy old vma
kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)
# free vma
kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)
to the point where the old vma passed in to the vm_area_dup() function
isn't even used yet (because I've left all the old manual initialization
alone).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a dummy qcom_scm_assign_mem() to enable building drivers when
CONFIG_COMPILE_TEST=y && CONFIG_QCOM_SCM=n.
All other qcom_scm_* functions already have a dummy version.
Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Platform drivers need make a lot of resource state requests at the same
time, say, at the start or end of an usecase. It can be quite
inefficient to send each request separately. Instead they can give the
RPMH library a batch of requests to be sent and wait on the whole
transaction to be complete.
rpmh_write_batch() is a blocking call that can be used to send multiple
RPMH command sets. Each RPMH command set is set asynchronously and the
API blocks until all the command sets are complete and receive their
tx_done callbacks.
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Platform drivers that want to send a request but do not want to block
until the RPMH request completes have now a new API -
rpmh_write_async().
The API allocates memory and send the requests and returns the control
back to the platform driver. The tx_done callback from the controller is
handled in the context of the controller's thread and frees the
allocated memory. This API allows RPMH requests from atomic contexts as
well.
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Active state requests are sent immediately to the RSC controller, while
sleep and wake state requests are cached in this driver to avoid taxing
the RSC controller repeatedly. The cached values will be sent to the
controller when the rpmh_flush() is called.
Generally, flushing is a system PM activity and may be called from the
system PM drivers when the system is entering suspend or deeper sleep
modes during cpuidle.
Also allow invalidating the cached requests, so they may be re-populated
again.
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
[rplsssn: remove unneeded semicolon, address line over 80chars error]
Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
Reviewed-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Sending RPMH requests and waiting for response from the controller
through a callback is common functionality across all platform drivers.
To simplify drivers, add a library functions to create RPMH client and
send resource state requests.
rpmh_write() is a synchronous blocking call that can be used to send
active state requests.
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
Add controller driver for QCOM SoCs that have hardware based shared
resource management. The hardware IP known as RSC (Resource State
Coordinator) houses multiple Direct Resource Voter (DRV) for different
execution levels. A DRV is a unique voter on the state of a shared
resource. A Trigger Control Set (TCS) is a bunch of slots that can house
multiple resource state requests, that when triggered will issue those
requests through an internal bus to the Resource Power Manager Hardened
(RPMH) blocks. These hardware blocks are capable of adjusting clocks,
voltages, etc. The resource state request from a DRV are aggregated
along with state requests from other processors in the SoC and the
aggregate value is applied on the resource.
Some important aspects of the RPMH communication -
- Requests are <addr, value> with some header information
- Multiple requests (upto 16) may be sent through a TCS, at a time
- Requests in a TCS are sent in sequence
- Requests may be fire-n-forget or completion (response expected)
- Multiple TCS from the same DRV may be triggered simultaneously
- Cannot send a request if another request for the same addr is in
progress from the same DRV
- When all the requests from a TCS are complete, an IRQ is raised
- The IRQ handler needs to clear the TCS before it is available for
reuse
- TCS configuration is specific to a DRV
- Platform drivers may use DRV from different RSCs to make requests
Resource state requests made when CPUs are active are called 'active'
state requests. Requests made when all the CPUs are powered down (idle
state) are called 'sleep' state requests. They are matched by a
corresponding 'wake' state requests which puts the resources back in to
previously requested active state before resuming any CPU. TCSes are
dedicated for each type of requests. Active mode TCSes (AMC) are used to
send requests immediately to the resource, while control TCS are used to
provide specific information to the controller. Sleep and Wake TCS send
sleep and wake requests, after and before the system halt respectively.
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
LLCC (Last Level Cache Controller) provides additional cache memory
in the system. LLCC is partitioned into multiple slices and each
slice gets its own priority, size, ID and other config parameters.
LLCC driver programs these parameters for each slice. Clients that
are assigned to use LLCC need to get information such size & ID of the
slice they get and activate or deactivate the slice as needed. LLCC driver
provides API for the clients to perform these operations.
Signed-off-by: Channagoud Kadabi <ckadabi@codeaurora.org>
Signed-off-by: Rishabh Bhatnagar <rishabhb@codeaurora.org>
Reviewed-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
This passes the information we already have at the call sight into
do_send_sig_info. Ultimately allowing for better handling of signals
sent to a group of processes during fork.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This passes the information we already have at the call sight
into group_send_sig_info. Ultimatelly allowing for to better handle
signals sent to a group of processes.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-07-18
The following series provides fixes to mlx5 core and net device driver.
Please pull and let me know if there's any problem.
For -stable v4.7
net/mlx5e: Don't allow aRFS for encapsulated packets
net/mlx5e: Fix quota counting in aRFS expire flow
For -stable v4.15
net/mlx5e: Only allow offloading decap egress (egdev) flows
net/mlx5e: Refine ets validation function
net/mlx5: Adjust clock overflow work period
For -stable v4.17
net/mlx5: E-Switch, UBSAN fix undefined behavior in mlx5_eswitch_mode
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the code more maintainable by performing more of the signal
related work in send_sigqueue.
A quick inspection of do_timer_create will show that this code path
does not lookup a thread group by a thread's pid. Making it safe
to find the task pointed to by it_pid with "pid_task(it_pid, type)";
This supports the changes needed in fork to tell if a signal was sent
to a single process or a group of processes.
Having the pid to task transition in signal.c will also make it easier
to sort out races with de_thread and and the thread group leader
exiting when it comes time to address that.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Everywhere except in the pid array we distinguish between a tasks pid and
a tasks tgid (thread group id). Even in the enumeration we want that
distinction sometimes so we have added __PIDTYPE_TGID. With leader_pid
we almost have an implementation of PIDTYPE_TGID in struct signal_struct.
Add PIDTYPE_TGID as a first class member of the pid_type enumeration and
into the pids array. Then remove the __PIDTYPE_TGID special case and the
leader_pid in signal_struct.
The net size increase is just an extra pointer added to struct pid and
an extra pair of pointers of an hlist_node added to task_struct.
The effect on code maintenance is the removal of a number of special
cases today and the potential to remove many more special cases as
PIDTYPE_TGID gets used to it's fullest. The long term potential
is allowing zombie thread group leaders to exit, which will remove
a lot more special cases in the code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
To access these fields the code always has to go to group leader so
going to signal struct is no loss and is actually a fundamental simplification.
This saves a little bit of memory by only allocating the pid pointer array
once instead of once for every thread, and even better this removes a
few potential races caused by the fact that group_leader can be changed
by de_thread, while signal_struct can not.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The cost is the the same and this removes the need
to worry about complications that come from de_thread
and group_leader changing.
__task_pid_nr_ns has been updated to take advantage of this change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
For the arm64 RAS Extension, user space can inject a virtual-SError
with specified ESR. So user space needs to know whether KVM support
to inject such SError, this interface adds this query for this capability.
KVM will check whether system support RAS Extension, if supported, KVM
returns true to user space, otherwise returns false.
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[expanded documentation wording]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Simply letting IGROUPR be writable from userspace would break
migration from old kernels to newer kernels, because old kernels
incorrectly report interrupt groups as group 1. This would not be a big
problem if userspace wrote GICD_IIDR as read from the kernel, because we
could detect the incompatibility and return an error to userspace.
Unfortunately, this is not the case with current userspace
implementations and simply letting IGROUPR be writable from userspace for
an emulated GICv2 silently breaks migration and causes the destination
VM to no longer run after migration.
We now encourage userspace to write the read and expected value of
GICD_IIDR as the first part of a GIC register restore, and if we observe
a write to GICD_IIDR we know that userspace has been updated and has had
a chance to cope with older kernels (VGICv2 IIDR.Revision == 0)
incorrectly reporting interrupts as group 1, and therefore we now allow
groups to be user writable.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now when we have a group configuration on the struct IRQ, use this state
when populating the LR and signaling interrupts as either group 0 or
group 1 to the VM. Depending on the model of the emulated GIC, and the
guest's configuration of the VMCR, interrupts may be signaled as IRQs or
FIQs to the VM.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In preparation for proper group 0 and group 1 support in the vgic, we
add a field in the struct irq to store the group of all interrupts.
We initialize the group to group 0 when emulating GICv2 and to group 1
when emulating GICv3, just like we treat them today. LPIs are always
group 1. We also continue to ignore writes from the guest, preserving
existing functionality, for now.
Finally, we also add this field to the vgic debug logic to show the
group for all interrupts.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
As we are about to tweak implementation aspects of the VGIC emulation,
while still preserving some level of backwards compatibility support,
add a field to keep track of the implementation revision field which is
reported to the VM and to userspace.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Instead of hardcoding the shifts and masks in the GICD_IIDR register
emulation, let's add the definition of these fields to the GIC header
files and use them.
This will make things more obvious when we're going to bump the revision
in the IIDR when we'll make guest-visible changes to the implementation.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add new channel type support for phase.
This channel may be used by Time-of-flight sensors to express the
phase difference between emitted and received signals. Those sensor
will then use the phase shift of return signals to approximate the
distance to objects.
Signed-off-by: Mathieu Othacehe <m.othacehe@gmail.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
This preserves whatever was written even if we can't currently display the
given glyph. Mouse paste won't corrupt any character of wcwidth() == 1
anymore.
Note that for now uniscr doesn't get allocated until something reads
/dev/vcsuN for that console, making this code dormant for most users.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>