There is an include loop between netdevice.h, dsa.h, devlink.h because
of NETDEV_ALIGN, making it impossible to use devlink structures in
dsa.h.
Break this loop by taking dsa.h out of netdevice.h, adding a forward
declaration of dsa_switch_tree and a declaration of the
netdev_set_default_ethtool_ops() function, which is all that netdevice.h
requires.
No longer having dsa.h in netdevice.h means the includes in dsa.h no
longer get included indirectly. This breaks a few other files which depend
on these includes. Add them directly to the affected files.
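As a minimal standalone illustration of the technique (not the literal
patch; the struct name below is a hypothetical stand-in), a member that is
only a pointer compiles with just a forward declaration, so the full
header is not needed:

  struct dsa_switch_tree;                   /* forward declaration */

  struct net_device_example {               /* stand-in for struct net_device */
          struct dsa_switch_tree *dsa_ptr;  /* pointee size not required here */
  };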
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor inet6_netconf_notify_devconf to take the event as an input arg.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor inet_netconf_notify_devconf to take the event as an input arg.
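For illustration, the resulting call sites would look roughly like this
(argument order and names are assumptions, not a quote of the patch):

  /* callers now pass the rtnetlink event explicitly */
  inet_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_FORWARDING,
                              dev->ifindex, &in_dev->cnf);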
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the NETDEV_RESEND_IGMP event, similar to how
it works for IPv4.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the standalone SMD implementation, as we have transitioned the
client drivers to use the RPMSG-based one.
Also remove all dependencies on QCOM_SMD from Kconfig files, in order to
keep them selectable in the absence of the removed symbol.
Acked-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
By moving these client drivers to use RPMSG instead of the direct SMD
API we can reuse them on top of the newly added GLINK wire-protocol
support found in the 820 and 835 Qualcomm platforms.
As the new (RPMSG-based) and old SMD implementations are mutually
exclusive we have to change all client drivers in one commit, to make
sure we have a working system before and after this transition.
Acked-by: Andy Gross <andy.gross@linaro.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Laight noticed that the support for MSG_MORE with
datamsg->force_delay didn't really work as we expected: the first msg with
MSG_MORE set would always block the following chunks' dequeuing.
This patch rewrites it by saving the MSG_MORE flag into the asoc, as
David Laight suggested.
asoc->force_delay is used to save the MSG_MORE flag before a msg is sent.
No chunks in the queue are sent out while asoc->force_delay is set by a
msg with the MSG_MORE flag, until a new msg without MSG_MORE clears
asoc->force_delay.
Note that this change does not affect flushes generated by other
triggers, like asoc->state != ESTABLISHED, queue size > pmtu, etc.
v1->v2:
Do not clear asoc->force_delay after sending a msg with the MSG_MORE flag.
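A rough sketch of the resulting behaviour (other_flush_trigger below is a
hypothetical placeholder, not a name from the diff):

  /* sendmsg path: remember whether the latest msg had MSG_MORE set */
  asoc->force_delay = !!(msg_flags & MSG_MORE);

  /* flush path: hold data back while force_delay is set, unless another
   * trigger (state != ESTABLISHED, queue size > pmtu, ...) forces the flush
   */
  if (asoc->force_delay && !other_flush_trigger)
          return SCTP_XMIT_DELAY;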
Fixes: 4ea0c32f5f ("sctp: add support for MSG_MORE")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: David Laight <david.laight@aculab.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pipeline debug is used to export the pipeline abstractions for the
main objects - tables, headers and entries. The only set operation
supported is changing the counter parameter on a specific table.
The basic structures:
Header - can represent a real protocol header or internal metadata.
         Generic protocol headers like IPv4 can be shared between
         drivers. Each driver can add local headers.
Field - part of a header. Can represent a protocol field or a specific
        ASIC metadata field. Hardware-specific metadata fields can be
        mapped to different resources; for example, switch ASIC ports
        can have an internal number which from the system's point of
        view is mapped to a netdevice ifindex.
Match - represents a specific match rule. Can describe a match on a
        specific field or header. The header index should be specified
        as well in order to support several header instances of the
        same type (tunneling).
Action - represents a specific action rule. Actions can describe
         operations on specific field values, for example set,
         increment, etc., and header operations like add and delete.
Value - represents a value which can be associated with a specific
        match or action.
Table - represents a hardware block which can be described with
        match/action behavior. The match/action can be done on the
        packet's data or on internal metadata gathered along the
        packet's traversal through the pipeline, which is vendor
        specific and should be exported in order to provide an
        understanding of the ASIC's behavior.
Entry - represents a single record in a specific table. The entry is
        identified by a specific combination of values for match/action.
Prior to accessing the tables/entries the drivers provide the header/field
database, which is exported from the driver to user space. The database
is split between shared headers and driver-unique headers.
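As a driver-side illustration of the header/field abstractions above (a
sketch only; the struct and enum names are assumptions based on this
description, not necessarily the exact API):

  static struct devlink_dpipe_field my_meta_fields[] = {
          {
                  .name = "erif_port",    /* internal ASIC port number */
                  .id = 0,
                  .bitwidth = 32,
                  /* user space can map this to a netdevice ifindex */
                  .mapping_type = DEVLINK_DPIPE_FIELD_MAPPING_TYPE_IFINDEX,
          },
  };

  static struct devlink_dpipe_header my_meta_header = {
          .name = "my_metadata",
          .id = 0,
          .fields = my_meta_fields,
          .fields_count = ARRAY_SIZE(my_meta_fields),
          .global = false,                /* driver-local, not shared */
  };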
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current way to implement an always-on PM domain consists of returning
-EBUSY from the ->power_off() callback. This is a bit different compared to
using the always-on genpd governor, which prevents the PM domain from being
powered off via runtime suspend, but not via system suspend.
The approach to return -EBUSY from the ->power_off() callback to support
always on PM domains in genpd is suboptimal. That is because it requires
genpd to follow the regular execution path of the power off sequence, which
ends by invoking the ->power_off() callback.
To enable genpd to abort the power off sequence early for always-on PM
domains, it needs static information about these configurations. Therefore,
let's add a new genpd configuration flag, GENPD_FLAG_ALWAYS_ON.
Users of the new GENPD_FLAG_ALWAYS_ON flag are required by genpd to make
sure the PM domain is powered on before calling pm_genpd_init(). Moreover,
users don't need to implement the ->power_off() callback, as genpd never
invokes it.
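A minimal usage sketch under these rules (the domain name is made up):

  static struct generic_pm_domain always_on_pd = {
          .name = "always_on_pd",
          .flags = GENPD_FLAG_ALWAYS_ON,
          /* no ->power_off() callback: genpd never invokes it here */
  };

  /* the domain must already be powered on, hence is_off = false */
  pm_genpd_init(&always_on_pd, NULL, false);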
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Similarly to OF endpoints, endpoint-type nodes can also be supported on
ACPI. In order to make it possible for drivers to ignore which firmware
interface is underneath, add a fwnode_endpoint type and a function to
parse them.
On ACPI, find the child node index instead of relying on the "endpoint"
property.
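A hedged usage sketch (assuming the parser is fwnode_graph_parse_endpoint()
and the new type carries port/id fields):

  struct fwnode_endpoint fwep;

  /* works the same way whether ep_fwnode is backed by DT or ACPI */
  if (!fwnode_graph_parse_endpoint(ep_fwnode, &fwep))
          pr_info("endpoint %u on port %u\n", fwep.id, fwep.port);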
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A function to obtain the fwnode associated with a struct device is useful
for drivers that use the fwnode property API: it allows them to remain
unaware of the underlying firmware implementation.
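For example (assuming the new helper is dev_fwnode(); a sketch only):

  /* driver code stays firmware-agnostic */
  struct fwnode_handle *fwnode = dev_fwnode(dev);
  u32 val;

  fwnode_property_read_u32(fwnode, "some-property", &val);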
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
of_fwnode_handle() returns the struct fwnode_handle of a struct
device_node. This may be used with the fwnode property API.
Use a macro instead of a function in order to support both const and
non-const arguments.
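Usage sketch (the property name is illustrative):

  /* hand a DT node to the firmware-agnostic property API */
  struct device_node *np = dev->of_node;

  fwnode_property_present(of_fwnode_handle(np), "some-property");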
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
fwnode_handle_get() is used to obtain a reference to a fwnode_handle
container, in this case the OF-specific struct device_node.
This complements fwnode_handle_put(), which is already implemented.
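Illustrative pairing with the existing put helper (a sketch, not from the
diff):

  fwnode_handle_get(fwnode);      /* take a reference while the node is used */
  /* ... */
  fwnode_handle_put(fwnode);      /* drop it when done */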
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This follows the DT implementation of the of_graph_* APIs, but we call
them fwnode_graph_* instead. For DT nodes the existing of_graph_*
implementation is used; for ACPI we use the new ACPI graph implementation.
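A hedged usage sketch mirroring a typical of_graph_* loop (the function
names follow the of_graph_* analogues and are assumptions):

  struct fwnode_handle *ep = NULL;

  while ((ep = fwnode_graph_get_next_endpoint(dev_fwnode(dev), ep))) {
          struct fwnode_handle *remote;

          remote = fwnode_graph_get_remote_port_parent(ep);
          /* ... look up / probe the remote device ... */
          fwnode_handle_put(remote);
  }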
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
DT has had the concept of remote endpoints for some time already. It makes
it possible to reference another firmware node through a property called
remote-endpoint. This is already used by some subsystems, like v4l2, for
parsing hardware properties related to cameras.
This patch adds ACPI support for remote endpoints utilizing _DSD
hierarchical data extensions.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that we have the means to enumerate all children of any fwnode, even
in ACPI, we can implement fwnode_get_named_child_node(). This is similar
to device_get_named_child_node(), with the exception that it can be called
on any fwnode handle. Make device_get_named_child_node() call this new
function directly.
This is useful in cases where we need to be able to find child nodes which
are not direct descendants of the parent device.
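For example (a sketch; the child node names are hypothetical):

  /* descend two levels below the device, which the old helper could not do */
  struct fwnode_handle *ports, *port0;

  ports = fwnode_get_named_child_node(dev_fwnode(dev), "ports");
  port0 = fwnode_get_named_child_node(ports, "port0");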
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The ACPI _DSD hierarchical data extension makes it possible to have
hierarchies deeper than one level, in a similar way to what DT allows.
These "subsubnodes" have not been accessible because the device property
implementation only provides device_get_next_child_node(), which is
limited to direct descendants of a device.
We need this ability in order to support things like remote endpoints
currently supported in DT with the of_graph_* APIs.
Modify acpi_get_next_subnode() to accept a fwnode handle instead and
update callers accordingly. Also add a new function,
fwnode_get_next_child_node(), that works directly with fwnodes, and modify
device_get_next_child_node() to call it directly. While there, add a macro,
fwnode_for_each_child_node(), analogous to the current
device_for_each_child_node() but working with fwnodes instead of devices.
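Illustrative use of the new iterator (a sketch):

  struct fwnode_handle *child;

  fwnode_for_each_child_node(fwnode, child) {
          /* visits ACPI data subnodes as well as DT children */
  }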
Link: http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.pdf
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that ACPI has support for returning the parent firmware node for both
types of nodes, we can expose this to others as well. This adds a new
function, fwnode_get_parent(), that can be used on DT and ACPI nodes to
retrieve the parent firmware node.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sometimes it is useful to be able to navigate the firmware node hierarchy
upwards toward parent nodes. ACPI device nodes are pretty much already
supported because ACPICA provides acpi_get_parent(). ACPI data nodes,
however, are all below the same parent ACPI device. Their hierarchy is
created by "linking" them to each other using references in the value
field.
Add a parent pointer to each data node while we create them, so it is
easy to navigate the hierarchy backwards. We use this parent pointer in a
new function, acpi_node_get_parent(), that is able to extract the parent
of both ACPI firmware node types.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
blkg_conf_prep() currently calls blkg_lookup_create() while holding
request queue spinlock. This means allocating memory for struct
blkcg_gq has to be made non-blocking. This causes occasional -ENOMEM
failures in call paths like below:
pcpu_alloc+0x68f/0x710
__alloc_percpu_gfp+0xd/0x10
__percpu_counter_init+0x55/0xc0
cfq_pd_alloc+0x3b2/0x4e0
blkg_alloc+0x187/0x230
blkg_create+0x489/0x670
blkg_lookup_create+0x9a/0x230
blkg_conf_prep+0x1fb/0x240
__cfqg_set_weight_device.isra.105+0x5c/0x180
cfq_set_weight_on_dfl+0x69/0xc0
cgroup_file_write+0x39/0x1c0
kernfs_fop_write+0x13f/0x1d0
__vfs_write+0x23/0x120
vfs_write+0xc2/0x1f0
SyS_write+0x44/0xb0
entry_SYSCALL_64_fastpath+0x18/0xad
In the code path above, the percpu allocator cannot call vmalloc() because
the queue spinlock is held.
A failure in this call path gives grief to tools which are trying to
configure io weights. We see occasional failures shortly after reboots
even when the system is not under any memory pressure. Machines with a
lot of CPUs are more vulnerable to this condition.
Update blkg_create() function to temporarily drop the rcu and queue
locks when it is allowed by gfp mask.
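A rough sketch of the idea (not the literal diff; the re-validation needed
after re-acquiring the locks is omitted):

  if (gfpflags_allow_blocking(gfp)) {
          spin_unlock_irq(q->queue_lock);
          rcu_read_unlock();

          new_blkg = blkg_alloc(blkcg, q, gfp);   /* may now sleep / vmalloc */

          rcu_read_lock();
          spin_lock_irq(q->queue_lock);
          /* re-check anything that may have changed while unlocked */
  }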
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Some of the Broadcom iProc SoCs have FlexRM ring manager
which provides a ring-based programming interface to various
offload engines (e.g. RAID, Crypto, etc).
This patch adds a common mailbox driver for Broadcom FlexRM
ring manager which can be shared by various offload engine
drivers (implemented as mailbox clients).
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Pramod KUMAR <pramod.kumar@broadcom.com>
Signed-off-by: Anup Patel <anup.patel@broadcom.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
This driver is now used only on platforms which support device tree, so
it is safe to remove legacy platform data based initialization code.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
For plat-samsung:
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
The user configures a latency target, but the latency threshold for each
request size isn't fixed. For an SSD, the IO latency highly depends on
request size. To calculate latency thresholds, we sample some data, e.g.
the average latency for request sizes 4k, 8k, 16k, 32k .. 1M. The latency
threshold of each request size will be the sampled latency (I'll call it
the base latency) plus the latency target. For example, if the base
latency for request size 4k is 80us and the user configures a latency
target of 60us, the 4k latency threshold will be 80 + 60 = 140us.
To sample data, we calculate the order base 2 of the rounded-up IO
sectors. If the IO size is bigger than 1M, it is accounted as 1M. Since
the calculation rounds up, the base latency will be slightly smaller than
the actual value. Also, if there isn't any IO dispatched for a specific
IO size, we use the base latency of a smaller IO size for it.
But we shouldn't sample data at just any time. The base latency is
supposed to be the latency when the disk isn't congested, because we use
the latency threshold to schedule IOs between cgroups. If the disk is
congested, the latency is higher and using it for scheduling is
meaningless. Hence we only do the sampling when block throttling is in
the LOW limit, on the assumption that the disk isn't congested in that
state. If the assumption isn't true, e.g. the low limit is too high, the
calculated latency threshold will be higher too.
Hard disks are completely different: latency depends on spindle seeks
rather than request size. Currently this feature is SSD only; we could
probably use a fixed threshold like 4ms for hard disks though.
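A sketch of the bucketing and threshold computation described above (the
names and exact clamping are assumptions, not the patch itself):

  #define LATENCY_BUCKETS 9       /* 4k, 8k, ..., 1M */

  static unsigned int latency_bucket(sector_t sectors)
  {
          unsigned int order = order_base_2(sectors);     /* rounds up */

          /* 4k == 8 sectors == order 3 maps to bucket 0; >= 1M uses the last */
          if (order < 3)
                  order = 3;
          return min_t(unsigned int, order - 3, LATENCY_BUCKETS - 1);
  }

  /* threshold = sampled base latency for this size + configured target */
  threshold = avg_latency[latency_bucket(bio_sectors(bio))] + latency_target;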
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently there is no way to know the request size when the request
finishes. The next patch will need this info. We could add an extra field
to record the size, but blk_issue_stat has enough spare space to record
it, so this patch just overloads blk_issue_stat. With this, we will have
49 bits to track time, which is still a very long time.
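Conceptually (a sketch; the constants and helpers are assumptions, not the
kernel's actual layout), the high bits of the 64-bit issue value carry a
size bucket while the low 49 bits carry the time:

  #define ISSUE_TIME_BITS 49
  #define ISSUE_TIME_MASK ((1ULL << ISSUE_TIME_BITS) - 1)

  static inline u64 pack_issue_stat(u64 time_ns, unsigned int size_bucket)
  {
          return (time_ns & ISSUE_TIME_MASK) |
                 ((u64)size_bucket << ISSUE_TIME_BITS);
  }

  static inline u64 issue_stat_time(u64 stat)
  {
          return stat & ISSUE_TIME_MASK;
  }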
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
A cgroup gets assigned a low limit, but the cgroup could never dispatch
enough IO to cross the low limit. In such a case, the queue state machine
will remain in the LIMIT_LOW state and all other cgroups will be throttled
according to the low limit. This is unfair to the other cgroups. We should
treat such a cgroup as idle and upgrade the state machine out of the
LIMIT_LOW state.
We also have downgrade logic. If the state machine upgrades because a
cgroup is idle (real idle), the state machine would soon downgrade again
because the cgroup is below its low limit. This isn't what we want. A more
complicated case is a cgroup that isn't idle while the queue is in
LIMIT_LOW. But when the queue gets upgraded out of LIMIT_LOW, other
cgroups could dispatch more IO and this cgroup can't dispatch enough IO,
so the cgroup falls below its low limit and looks idle (fake idle). In
this case, the queue should downgrade soon. The key to determining whether
we should downgrade is to detect whether the cgroup is truly idle.
Unfortunately it's very hard to determine if a cgroup is really idle. This
patch uses the 'think time check' idea from CFQ for that purpose. Please
note, the idea doesn't work for all workloads. For example, a workload
with IO depth 8 has 100% disk utilization, hence its think time is 0 and
it doesn't look idle. But the workload could run at higher bandwidth with
IO depth 16; compared to IO depth 16, the IO depth 8 workload is idle. We
use the idea only to roughly determine whether a cgroup is idle.
We treat a cgroup as idle if its think time is above a threshold (by
default 1ms for SSD and 100ms for HD). The idea is that think time above
the threshold will start to harm performance. HD is much slower, so a
longer think time is ok.
This patch (and the later patches) uses 'unsigned long' to track time. We
convert 'ns' to 'us' with 'ns >> 10'. This is fast but loses precision,
which should not be a big deal.
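A hedged sketch of the resulting idle check (the field names are
assumptions):

  static bool throtl_tg_is_idle(struct throtl_grp *tg)
  {
          /*
           * avg_idletime: averaged gap between a cgroup's IO completion and
           * its next submission, in us (ns >> 10). Above the threshold
           * (default ~1ms SSD / ~100ms HD) the cgroup is treated as idle.
           */
          return tg->avg_idletime > tg->idletime_threshold;
  }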
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Add new KVM_CAP_MIPS_VZ and KVM_CAP_MIPS_TE capabilities, and in order
to allow MIPS KVM to support VZ without confusing old users (which
expect the trap & emulate implementation), define and start checking
KVM_CREATE_VM type codes.
The codes available are:
- KVM_VM_MIPS_TE = 0
This is the current value expected from the user, and will create a
VM using trap & emulate in user mode, confined to the user mode
address space. This may in future become unavailable if the kernel is
only configured to support VZ, in which case the EINVAL error will be
returned and KVM_CAP_MIPS_TE won't be available even though
KVM_CAP_MIPS_VZ is.
- KVM_VM_MIPS_VZ = 1
This can be provided when the KVM_CAP_MIPS_VZ capability is available
to create a VM using VZ, with a fully virtualized guest virtual
address space. If VZ support is unavailable in the kernel, the EINVAL
error will be returned (although old kernels without the
KVM_CAP_MIPS_VZ capability may well succeed and create a trap &
emulate VM).
This is designed to allow the desired implementation (T&E vs VZ) to be
potentially chosen at runtime rather than being fixed in the kernel
configuration.
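From user space the selection would look roughly like this (a sketch,
error handling trimmed):

  int kvm = open("/dev/kvm", O_RDWR);
  int vm;

  if (ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_MIPS_VZ) > 0)
          vm = ioctl(kvm, KVM_CREATE_VM, KVM_VM_MIPS_VZ);
  else
          vm = ioctl(kvm, KVM_CREATE_VM, KVM_VM_MIPS_TE); /* type 0, old behaviour */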
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-doc@vger.kernel.org
HW drivers will use the header-type and command fields from the extended
keys, and some fields (e.g. mask, val, offset) from the legacy keys.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add the definitions related to creation/deletion of a modify header
context and the modify header steering action, which are used for HW
packet header modify (re-write) as part of steering. Also add the
modify header id into two intermediate structs and set it in the FTE.
Note that as the push/pop vlan steering actions are emulated by the
eswitch management code, we're not breaking any compatibility while
changing their values to make room for the modify header action, which
is not emulated and whose value is part of the FW API. The new bit
values for the emulated actions are at the end of the possible range.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
There are a bunch of places in the code where the intermediate struct
that keeps the elements related to flow actions is initialized with
the same default values. Put that into a small DECLARE-type helper.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Units of the MOTU FireWire series can transfer messages to a registered
address. These messages convey the status of internal clock
synchronization just after streams are started.
When the synchronization is stable, the message is 0x01ffffff; otherwise
it's 0x05ffffff.
This commit adds functionality for user space applications to receive the
content of these messages.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This commit adds a hwdep interface, like the ones the other sound drivers
for units on the IEEE 1394 bus have.
This interface is designed for mixer/control applications. By using this
interface, an application can get information about the FireWire node,
lock/unlock kernel streaming, and get notifications when kernel streaming
starts/stops.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Add a calc_num_ports callback to the generic driver and verify that the
device has the required endpoints there instead of in core.
Note that the generic driver num_ports field was never used.
Signed-off-by: Johan Hovold <johan@kernel.org>
Allow subdrivers to modify the port-endpoint mapping by passing the
endpoint descriptors to calc_num_ports.
The callback can now also be used to verify that the required endpoints
exist and abort probing otherwise.
This will allow us to get rid of a few hacks in subdrivers that are
already modifying the port-endpoint mapping (or aborting probe due to
missing endpoints), but only after the port structures have been setup.
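A hedged sketch of what a subdriver callback could look like under the new
scheme (the endpoint-struct field names are assumptions based on this
description):

  static int mydev_calc_num_ports(struct usb_serial *serial,
                                  struct usb_serial_endpoints *epds)
  {
          /* abort probe if the required endpoints are missing */
          if (epds->num_bulk_in < 1 || epds->num_bulk_out < 1)
                  return -ENODEV;

          /* e.g. one port per bulk-out endpoint */
          return epds->num_bulk_out;
  }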
Signed-off-by: Johan Hovold <johan@kernel.org>
This fixes compilation for files that try to include the cpcap header in
alphabetically sorted #include lists.
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The code introduced by commit 0c8893c9095d ("clockevents: Add a
clkevt-of mechanism like clksrc-of") refers to __clkevt_of_table,
which doesn't exist in the vmlinux. As a result the kernel build
fails with the error: "clkevt-probe.c:63: undefined reference to
`__clkevt_of_table'"
Fixes: 0c8893c9095d ("clockevents: Add a clkevt-of mechanism like clksrc-of")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
This patch fixes syntax errors introduced by commit 0c8893c9095d
("clockevents: Add a clkevt-of mechanism like clksrc-of").
Fixes: 0c8893c9095d ("clockevents: Add a clkevt-of mechanism like clksrc-of")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>