Pablo Neira Ayuso says:
====================
Netfilter/IPVS/OVS updates for net-next
The following patchset contains Netfilter/IPVS fixes and OVS NAT
support, more specifically this batch is composed of:
1) Fix a crash in ipset when performing a parallel flush/dump with
set:list type, from Jozsef Kadlecsik.
2) Make sure NFACCT_FILTER_* netlink attributes are in place before
accessing them, from Phil Turnbull.
3) Check return error code from ip_vs_fill_iph_skb_off() in IPVS SIP
helper, from Arnd Bergmann.
4) Add workaround to IPVS to reschedule existing connections to a new
destination server by dropping the packet and waiting for retransmission
of the TCP SYN packet, from Julian Anastasov.
5) Allow connection rescheduling in IPVS when in CLOSE state, also
from Julian.
6) Fix wrong offset of SIP Call-ID in IPVS helper, from Marco Angaroni.
7) Validate IPSET_ATTR_ETHER netlink attribute length, from Jozsef.
8) Check match/targetinfo netlink attribute size in nft_compat,
patch from Florian Westphal.
9) Check for integer overflow on 32-bit systems in x_tables, from
Florian Westphal.
Several patches from Jarno Rajahalme to prepare the introduction of
NAT support to OVS based on the Netfilter infrastructure:
10) Schedule IP_CT_NEW_REPLY definition for removal in
nf_conntrack_common.h.
11) Simplify checksumming recalculation in nf_nat.
12) Add comments to the openvswitch conntrack code, from Jarno.
13) Update the CT state key only after successful nf_conntrack_in()
invocation.
14) Find existing conntrack entry after upcall.
15) Handle NF_REPEAT case due to templates in nf_conntrack_in().
16) Call the conntrack helper functions once the conntrack has been
confirmed.
17) And finally, add the NAT interface to OVS.
The batch closes with:
18) Cleanup to use spin_unlock_wait() instead of
spin_lock()/spin_unlock(), from Nicholas Mc Guire.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull RAS updates from Ingo Molnar:
"Various RAS updates:
- AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable
MCA) error decoding support (Aravind Gopalakrishnan)
- x86 memcpy_mcsafe() support, to enable smart(er) hardware error
recovery in NVDIMM drivers, based on an extension of the x86
exception handling code. (Tony Luck)"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
EDAC/sb_edac: Fix computation of channel address
x86/mm, x86/mce: Add memcpy_mcsafe()
x86/mce/AMD: Document some functionality
x86/mce: Clarify comments regarding deferred error
x86/mce/AMD: Fix logic to obtain block address
x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
x86/mce: Move MCx_CONFIG MSR definitions
x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries
x86/mm: Expand the exception table logic to allow new handling options
x86/mce/AMD: Set MCAX Enable bit
x86/mce/AMD: Carve out threshold block preparation
x86/mce/AMD: Fix LVT offset configuration for thresholding
x86/mce/AMD: Reduce number of blocks scanned per bank
x86/mce/AMD: Do not perform shared bank check for future processors
x86/mce: Fix order of AMD MCE init function call
There is no way to detect the scsi_target_id for any given SAS remote
port, so add a new sysfs attribute 'scsi_target_id'.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some UFS devices (and maybe hosts) have issues if LCC is
enabled. So we set PA_Local_TX_LCC_Enable to 0
before link startup, which makes sure that both host
and device TX LCC are disabled once link startup is
completed.
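A minimal sketch of the idea, assuming the driver's usual DME accessor and a
PA_LOCAL_TX_LCC_ENABLE definition for the UniPro attribute named above; the
function itself is illustrative:

    #include "ufshcd.h"     /* ufshcd_dme_set(), UIC_ARG_MIB() */
    #include "unipro.h"     /* PA_LOCAL_TX_LCC_ENABLE (assumed) */

    /* Called just before DME_LINKSTARTUP so that both host and device
     * finish link startup with TX LCC disabled. */
    static int ufshcd_disable_tx_lcc_sketch(struct ufs_hba *hba)
    {
            return ufshcd_dme_set(hba,
                            UIC_ARG_MIB(PA_LOCAL_TX_LCC_ENABLE), 0);
    }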
Reviewed-by: Gilad Broner <gbroner@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We put the UFS device in sleep state and the UFS link in hibern8 state during
runtime suspend. After this we put all the UFS rails in low power
modes immediately, but it seems some devices may still draw more than
sleep current from the UFS rails (especially from the VCCQ rail) for at
least 500us.
To avoid this situation, this change adds a 2ms delay before putting
these UFS rails in LPM mode.
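A hedged sketch of where the delay sits in the suspend path; the helper below
is purely illustrative and only documents the ordering:

    #include <linux/delay.h>

    /* Runtime-suspend path (illustrative): device already in sleep, link
     * already in hibern8.  Let the rails settle before they enter LPM,
     * since some devices keep drawing above-sleep current for ~500us. */
    static void ufs_rails_settle_before_lpm(void)
    {
            usleep_range(2000, 2100);       /* at least 2ms */
            /* ...now VCC/VCCQ/VCCQ2 can be switched to low power mode... */
    }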
Reviewed-by: Gilad Broner <gbroner@codeaurora.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, when we try to put the link in the off/disabled state during
suspend, it seems the link is not kept in low power mode.
This patch fixes the issue by putting the link in hibern8 first
(so the device also puts the link in low power mode) and then stopping the
host controller.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Optimal values of local UniPro parameters like PA_Hibern8Time &
PA_TActivate can help reduce the hibern8 exit latency. If both host and
device support UniPro ver 1.6 or later, these parameters are
tuned automatically during link startup itself. But if either host or
device doesn't support UniPro ver 1.6 or later, we have to tune them
manually. To keep the manual tuning logic simple, we only do
manual tuning if the local UniPro version doesn't support ver 1.6 or later.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We are seeing that some devices raise the urgent bkops exception
event even when the BKOPS status doesn't indicate performance impacted or
critical. Handle these devices by determining their urgent bkops status
at runtime.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Query commands have a 100ms timeout and may time out if they are
issued in parallel with ongoing read/write SCSI commands. This change
adds a retry (max: 10) in case a command times out.
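A hedged sketch of the retry shape; the helper and the retry limit below are
illustrative rather than the driver's actual names:

    #include <linux/errno.h>
    #include <linux/types.h>

    #define QUERY_REQ_RETRIES_SKETCH    10  /* "max: 10" from the text */

    /* Retry a query-style operation only when it times out; success or
     * any other error ends the loop immediately. */
    static int query_with_retries(int (*send_query)(void *arg), void *arg)
    {
            int err = -ETIMEDOUT;
            int retries;

            for (retries = 0; retries < QUERY_REQ_RETRIES_SKETCH; retries++) {
                    err = send_query(arg);
                    if (err != -ETIMEDOUT)
                            break;
            }
            return err;
    }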
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some vendors' UFS devices send back-to-back NACs for the DL data frames,
causing the host controller to raise the DFES error status. Sometimes
such UFS devices send back-to-back NACs without waiting for a new
retransmitted DL frame from the host, and in such cases it is
possible that the host UniPro goes into a bad state without raising the DFES
error interrupt. If this happens, all the pending commands would
time out only after the respective SW command timeout expires (which is
generally quite long).
This change works around such device behaviour as follows:
- As soon as SW sees the DL NAC error, it schedules the error
handler.
- The error handler sleeps for 50ms to see if there are any fatal errors
raised by the UFS controller.
- If there are fatal errors, then SW does normal error recovery.
- If there are no fatal errors, then SW sends the NOP command to the
device to check if the link is alive.
- If the NOP command times out, SW does normal error recovery.
- If the NOP command succeeds, skip the error handling.
If the DL NAC error is seen multiple times with some vendor's UFS devices,
then enable this quirk to initiate quick error recovery and also
silence related error logs to reduce spamming of the kernel logs.
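A hedged sketch of that flow; the hba type is the driver's, but the hooks in
the ops structure are placeholders standing in for the real routines:

    #include <linux/delay.h>
    #include <linux/types.h>

    struct ufs_hba;

    /* Placeholder hooks standing in for the real driver routines. */
    struct dl_nac_quirk_ops {
            bool (*has_fatal_errors)(struct ufs_hba *hba);
            void (*normal_error_recovery)(struct ufs_hba *hba);
            int  (*send_nop_out)(struct ufs_hba *hba);   /* 0 on success */
    };

    static void dl_nac_quirk_handler_sketch(struct ufs_hba *hba,
                                            const struct dl_nac_quirk_ops *ops)
    {
            /* 1. Give the controller 50ms to latch any fatal error. */
            msleep(50);

            /* 2. Fatal errors present: do normal error recovery. */
            if (ops->has_fatal_errors(hba)) {
                    ops->normal_error_recovery(hba);
                    return;
            }

            /* 3. No fatal errors: probe the link with a NOP command. */
            if (ops->send_nop_out(hba)) {
                    ops->normal_error_recovery(hba);     /* NOP timed out */
                    return;
            }

            /* 4. NOP succeeded: link is alive, skip the error handling. */
    }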
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The UFS driver's error handler forcefully tries to clear all the pending
requests. For each pending request in the queue, it waits 1 second for it
to get cleared. If we have multiple requests in the queue, it's
possible that we might end up waiting that many seconds before
resetting the host. But note that resetting the host would anyway clear
all the pending requests from the hardware. Hence this change skips
the forceful clearing of the pending requests if we are anyway going to
reset the host (for fatal errors).
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some UFS devices don't require the VCCQ rail for device operations, hence
this change adds support to recognize such devices and remove the vote for
the unused VCCQ rail.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently we use the host quirks mechanism in order to
handle both device and host controller quirks.
In order to support a variety of UFS devices, we should separate
the handling of device quirks from the host controller's.
Reviewed-by: Gilad Broner <gbroner@codeaurora.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Raviv Shvili <rshvili@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sometimes, due to HW issues, it takes some time for a
host controller register to update. In order to verify that the register
has been updated, polling is done until its value is set.
In addition, the functions ufshcd_hba_stop() and
ufshcd_wait_for_register() were updated with an additional input
parameter indicating whether the delay between reads is
done by sleeping or by spinning the CPU.
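A hedged sketch of such a polling helper (not the driver's actual signature);
the final flag mirrors the new input parameter described above:

    #include <linux/delay.h>
    #include <linux/errno.h>
    #include <linux/io.h>
    #include <linux/jiffies.h>
    #include <linux/types.h>

    /* Poll a memory-mapped register until (reg & mask) == expected or the
     * timeout expires.  'can_sleep' chooses whether the interval between
     * reads is spent sleeping or spinning the CPU. */
    static int wait_for_register_sketch(void __iomem *reg, u32 mask, u32 expected,
                                        unsigned long interval_us,
                                        unsigned long timeout_ms, bool can_sleep)
    {
            unsigned long deadline = jiffies + msecs_to_jiffies(timeout_ms);

            while ((readl(reg) & mask) != expected) {
                    if (time_after(jiffies, deadline))
                            return ((readl(reg) & mask) == expected) ?
                                    0 : -ETIMEDOUT;
                    if (can_sleep)
                            usleep_range(interval_us, interval_us + 50);
                    else
                            udelay(interval_us);
            }
            return 0;
    }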
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Raviv Shvili <rshvili@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A race condition exists between request requeueing and scsi layer
error handling:
When the UFS driver's queuecommand returns a busy status for a request,
it will be requeued and its tag will be freed and set to -1.
At the same time it is possible that the request will time out and the
SCSI layer will start error handling for it. The SCSI layer reuses
the request and its tag to send error-related commands to the device,
however its tag is no longer valid.
As this request was never really sent to the device, there is no
point in starting error handling with the device.
Implement the SCSI error handling timeout callback and bypass SCSI
error handling for requests that were not actually sent to the device.
For such requests, simply reset the block layer timer. Otherwise, let the
SCSI layer perform the usual error handling.
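A hedged sketch of such an .eh_timed_out handler; the tag check is
illustrative (a real driver would consult its own outstanding-request
bookkeeping), and the BLK_EH_* names follow the block layer of this era:

    #include <linux/blkdev.h>
    #include <scsi/scsi_cmnd.h>

    /* Placeholder check: was this command actually issued to the device? */
    static bool cmd_was_issued_to_device(struct scsi_cmnd *scmd)
    {
            return scmd->request && scmd->request->tag >= 0;
    }

    static enum blk_eh_timer_return
    ufs_eh_timed_out_sketch(struct scsi_cmnd *scmd)
    {
            /* Never issued to the device: just restart the block layer
             * timer instead of letting SCSI error handling escalate. */
            if (!cmd_was_issued_to_device(scmd))
                    return BLK_EH_RESET_TIMER;

            return BLK_EH_NOT_HANDLED;      /* usual SCSI error handling */
    }

Such a handler would be wired up via the .eh_timed_out member of the host's
scsi_host_template.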
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When control reaches the Linux UFS driver during UFS boot mode, the UFS host
controller interrupt status/enable registers may have leftover
settings.
In order to avoid any spurious interrupts due to these leftovers,
it's important to clear these interrupt status/enable registers before
enabling UFS interrupt handling.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Different platforms may have a different number of lanes
for the UFS link.
Add a device tree parameter specifying how many lanes
should be configured for the UFS link.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull perf updates from Ingo Molnar:
"Main kernel side changes:
- Big reorganization of the x86 perf support code. The old code grew
organically deep inside arch/x86/kernel/cpu/perf* and its naming
became somewhat messy.
The new location is under arch/x86/events/, using the following
cleaner hierarchy of source code files:
perf/x86: Move perf_event.c .................. => x86/events/core.c
perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c
perf/x86: Move perf_event_intel_uncore_snb.c => x86/events/intel/uncore_snb.c
perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c
(Borislav Petkov)
- Update various x86 PMU constraint and hw support details (Stephane
Eranian)
- Optimize kprobes for BPF execution (Martin KaFai Lau)
- Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
Gleixner)
- Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)
- Various fixes and smaller cleanups.
There are lots of perf tooling updates as well. A few highlights:
perf report/top:
- Hierarchy histogram mode for 'perf top' and 'perf report',
showing multiple levels, one per --sort entry: (Namhyung Kim)
On a mostly idle system:
# perf top --hierarchy -s comm,dso
Then expand some levels and use 'P' to take a snapshot:
# cat perf.hist.0
- 92.32% perf
58.20% perf
22.29% libc-2.22.so
5.97% [kernel]
4.18% libelf-0.165.so
1.69% [unknown]
- 4.71% qemu-system-x86
3.10% [kernel]
1.60% qemu-system-x86_64 (deleted)
+ 2.97% swapper
#
- Add 'L' hotkey to dynamically set the percent threshold for
histogram entries and callchains, i.e. dynamically do what the
--percent-limit command line option to 'top' and 'report' does.
(Namhyung Kim)
perf mem:
- Allow specifying events via -e in 'perf mem record', also listing
what events can be specified via 'perf mem record -e list' (Jiri
Olsa)
perf record:
- Add 'perf record' --all-user/--all-kernel options, so that one
can tell that all the events in the command line should be
restricted to the user or kernel levels (Jiri Olsa), i.e.:
perf record -e cycles:u,instructions:u
is equivalent to:
perf record --all-user -e cycles,instructions
- Make 'perf record' collect CPU cache info in the perf.data file header:
$ perf record usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
$ perf report --header-only -I | tail -10 | head -8
# CPU cache info:
# L1 Data 32K [0-1]
# L1 Instruction 32K [0-1]
# L1 Data 32K [2-3]
# L1 Instruction 32K [2-3]
# L2 Unified 256K [0-1]
# L2 Unified 256K [2-3]
# L3 Unified 4096K [0-3]
Will be used in 'perf c2c' and eventually in 'perf diff' to
allow, for instance, running the same workload on multiple
machines and then, when using 'diff', showing the hardware difference.
(Jiri Olsa)
- Improved support for Java, using the JVMTI agent library to do
jitdumps that then will be inserted in synthesized
PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
ELF files stored in ~/.debug and keyed with build-ids, to allow
symbol resolution and even annotation with source line info, see
the changeset comments to see how to use it (Stephane Eranian)
perf script/trace:
- Decode data_src values (e.g. perf.data files generated by 'perf
mem record') in 'perf script': (Jiri Olsa)
# perf script
perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No <SNIP>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Improve support to 'data_src', 'weight' and 'addr' fields in
'perf script' (Jiri Olsa)
- Handle empty print fmts in 'perf script -s' i.e. when running
python or perl scripts (Taeung Song)
perf stat:
- 'perf stat' now shows shadow metrics (insn per cycle, etc) in
interval mode too. E.g:
# perf stat -I 1000 -e instructions,cycles sleep 1
# time counts unit events
1.000215928 519,620 instructions # 0.69 insn per cycle
1.000215928 752,003 cycles
<SNIP>
- Port 'perf kvm stat' to PowerPC (Hemant Kumar)
- Implement CSV metrics output in 'perf stat' (Andi Kleen)
perf BPF support:
- Support converting data from bpf events in 'perf data' (Wang Nan)
- Print bpf-output events in 'perf script': (Wang Nan).
# perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
# perf script
usleep 4882 21384.532523: evt: ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a
0008: 42 50 46 20 65 76 65 6e BPF even
0010: 74 21 00 00 t!..
BPF string: "Raise a BPF event!"
#
- Add API to set values of map entries in a BPF object, be it
individual map slots or ranges (Wang Nan)
- Introduce support for the 'bpf-output' event (Wang Nan)
- Add glue to read perf events in a BPF program (Wang Nan)
- Improve support for bpf-output events in 'perf trace' (Wang Nan)
... and tons of other changes as well - see the shortlog and git log
for details!"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
perf stat: Add --metric-only support for -A
perf stat: Implement --metric-only mode
perf stat: Document CSV format in manpage
perf hists browser: Check sort keys before hot key actions
perf hists browser: Allow thread filtering for comm sort key
perf tools: Add sort__has_comm variable
perf tools: Recalc total periods using top-level entries in hierarchy
perf tools: Remove nr_sort_keys field
perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
perf tools: Remove hist_entry->fmt field
perf tools: Fix command line filters in hierarchy mode
perf tools: Add more sort entry check functions
perf tools: Fix hist_entry__filter() for hierarchy
perf jitdump: Build only on supported archs
tools lib traceevent: Add '~' operation within arg_num_eval()
perf tools: Omit unnecessary cast in perf_pmu__parse_scale
perf tools: Pass perf_hpp_list all the way through setup_sort_list
perf tools: Fix perf script python database export crash
perf jitdump: DWARF is also needed
perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
...
xfs_dir2_node_trim_free can return without setting the rvalp argument
pointer. Initialize it to 0 at the beginning of the function and
only update it to 1 if we succeeded in trimming a freespace block.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
__xfs_trans_roll() can return without setting the
*committed argument; this was a problem for xfs_bmap_finish():
    int committed;      /* xact committed or not */
    ...
    error = __xfs_trans_roll(tp, ip, &committed);
    if (error) {
        ...
    if (committed) {
and we tested an uninitialized "committed" variable on the
error path. No caller is preserving "committed" state across
calls to __xfs_trans_roll(), so just initialize committed inside
the function to avoid future errors like this.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs_bmap_del_extent() handles extent removal from the in-core and
on-disk extent lists. When removing a delalloc range, it updates the
indirect block reservation appropriately based on the removal. It
currently enforces that the new indirect block reservation is less than
or equal to the original. This is normally the case in all situations
except for in certain cases when the removed range creates a hole in a
single delalloc extent, thus splitting a single delalloc extent in two.
It is possible with small enough extents to split an indlen==1 extent
into two slightly smaller extents. This leaves one extent with 0
indirect blocks and leads to assert failures in other areas (e.g.,
xfs_bunmapi() if the extent happens to be removed).
Update the indlen distribution code to steal blocks from the deleted
extent, if necessary, to satisfy the worst case total indirect
reservation for the new extents. This is safe as the caller does not
update the fdblocks counters until the extent is removed. Blocks stolen
in this manner simply remain accounted as allocated, having ownership
transferred from the data extent to an indirect reservation.
As a precaution, fall back to the original reservation algorithm if the
new indlen requirement is not met and warn if we end up with extents
without any reservation at all to detect this more easily in the future.
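A hedged sketch of the distribution-plus-steal idea in isolation; the
worst-case formula and all names here are illustrative, not the actual xfs
helpers:

    #include <stdint.h>

    /* Stand-in for the worst-case indirect-block requirement of an
     * extent of 'len' blocks (illustrative ratio only). */
    static uint64_t worst_indlen(uint64_t len)
    {
            return (len + 63) / 64;
    }

    /*
     * Split the original reservation 'ores' across the two extents left
     * by punching a hole, stealing up to 'avail' blocks from the freed
     * range if their combined worst case exceeds the original
     * reservation.  Stolen blocks stay accounted as allocated, so the
     * caller must not return them to fdblocks.
     */
    static void split_indlen_sketch(uint64_t len1, uint64_t len2,
                                    uint64_t ores, uint64_t avail,
                                    uint64_t *indlen1, uint64_t *indlen2,
                                    uint64_t *stolen)
    {
            uint64_t need1 = worst_indlen(len1);
            uint64_t need2 = worst_indlen(len2);
            uint64_t need = need1 + need2;

            *stolen = 0;
            if (need > ores) {
                    *stolen = (need - ores <= avail) ? need - ores : avail;
                    ores += *stolen;
            }

            if (need > ores) {      /* still short: scale shares down */
                    need1 = need1 * ores / need;
                    need2 = ores - need1;
            }

            *indlen1 = need1;
            *indlen2 = need2;
    }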
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The delayed allocation indirect reservation splitting code is not
sufficient in some cases where a delalloc extent is split in two. In
preparation for enhancements to this code, refactor the current indlen
distribution algorithm into a new helper function.
[dchinner: rename temp, temp2 variables]
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs_bunmapi() currently updates the fdblocks counter, unreserves quota,
etc. before the extent is deleted by xfs_bmap_del_extent(). The function
has problems dividing up the indirect reserved blocks for scenarios
where a single delalloc extent is split in two. Particularly, there
aren't always enough blocks reserved for multiple extents in a single
extent reservation.
The solution to this problem is to allow the extent removal code to
steal from the deleted extent to meet indirect reservation requirements.
Move the block of code in xfs_bunmapi() that updates the fdblocks counter
to after the call to xfs_bmap_del_extent() to allow the codepath to
update the extent record before the free blocks are accounted. Also,
reshuffle the code slightly so the delalloc accounting occurs near the
xfs_bmap_del_extent() call to provide context for the comments.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Add a DEBUG mode-only sysfs knob to enable forced buffered write
failure. An additional side effect of this mode is brute force killing
of delayed allocation blocks in the range of the write. The latter is
the prime motivation behind this patch, as userspace test
infrastructure requires a reliable mechanism to create and split
delalloc extents without causing extent conversion.
Certain fallocate operations (i.e., zero range) were used for this in
the past, but the implementations have changed such that delalloc
extents are flushed and converted to real blocks, rendering the test
useless.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The spin_lock()/spin_unlock() pair is synchronizing on
nf_conntrack_locks_all_lock, which is equivalent to
spin_unlock_wait(), but the latter should be more efficient.
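A hedged before/after sketch of the substitution (a local lock stands in for
nf_conntrack_locks_all_lock):

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(example_lock);

    /* Before: take and immediately drop the lock purely to synchronize
     * with any concurrent holder. */
    static void sync_via_lock_unlock(void)
    {
            spin_lock(&example_lock);
            spin_unlock(&example_lock);
    }

    /* After: spin until the lock is observed unlocked, without the cost
     * of acquiring it. */
    static void sync_via_unlock_wait(void)
    {
            spin_unlock_wait(&example_lock);
    }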
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull read-only kernel memory updates from Ingo Molnar:
"This tree adds two (security related) enhancements to the kernel's
handling of read-only kernel memory:
- extend read-only kernel memory to a new class of formerly writable
kernel data: 'post-init read-only memory' via the __ro_after_init
attribute, and mark the ARM and x86 vDSO as such read-only memory.
This kind of attribute can be used for data that requires a once
per bootup initialization sequence, but is otherwise never modified
after that point.
This feature was based on the work by PaX Team and Brad Spengler.
(by Kees Cook, the ARM vDSO bits by David Brown.)
- make CONFIG_DEBUG_RODATA always enabled on x86 and remove the
Kconfig option. This simplifies the kernel and also signals that
read-only memory is the default model and a first-class citizen.
(Kees Cook)"
* 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ARM/vdso: Mark the vDSO code read-only after init
x86/vdso: Mark the vDSO code read-only after init
lkdtm: Verify that '__ro_after_init' works correctly
arch: Introduce post-init read-only memory
x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
asm-generic: Consolidate mark_rodata_ro()
Userspace tools are not aware of how to convert the enums provided by
the tracepoints to their corresponding strings.
Adding TRACE_DEFINE_ENUM() macros makes the enums available to
userspace, letting the tools know what those enum values represent.
In particular, for thermal zone trip types what we obtained before was
something like:
kworker/1:1-460 [001] 320.372732: thermal_zone_trip: thermal_zone=soc
id=0 trip=1 trip_type=1
Unfortunately, userspace tools do not know how to convert enum values to
strings and as a consequence they can only forward the enum value to the
output. By using TRACE_DEFINE_ENUM() macros for thermal traces we get the
following trace line:
kworker/1:1-460 [001] 320.372732: thermal_zone_trip: thermal_zone=soc
id=0 trip=1 trip_type=PASSIVE
Userspace tools are now able to better understand the meaning of the trip_type
and provide the user with more readable information.
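A hedged sketch of the pattern, following the usual trace-event header
layout; the exact hunks in the thermal trace header may differ:

    #include <linux/thermal.h>       /* THERMAL_TRIP_* enum values */
    #include <linux/tracepoint.h>    /* TRACE_DEFINE_ENUM()        */

    /* Export the enum values so that userspace tools can map e.g.
     * trip_type=1 back to "PASSIVE" when printing the event. */
    TRACE_DEFINE_ENUM(THERMAL_TRIP_CRITICAL);
    TRACE_DEFINE_ENUM(THERMAL_TRIP_HOT);
    TRACE_DEFINE_ENUM(THERMAL_TRIP_PASSIVE);
    TRACE_DEFINE_ENUM(THERMAL_TRIP_ACTIVE);

    /* Used in the event's TP_printk() instead of printing the raw int. */
    #define show_trip_type(type)                            \
            __print_symbolic(type,                          \
                    { THERMAL_TRIP_CRITICAL, "CRITICAL" },  \
                    { THERMAL_TRIP_HOT,      "HOT"      },  \
                    { THERMAL_TRIP_PASSIVE,  "PASSIVE"  },  \
                    { THERMAL_TRIP_ACTIVE,   "ACTIVE"   })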
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Enable the temperature reporting device of the Skylake Platform Controller Hub.
The register map is the same as the Wildcat Point thermal device currently
implemented in this driver.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
This pull request covers what's left for 4.6. Notably, it includes a
significant 3D performance improvement and a fix to HDMI hotplug
detection for the Pi2/3.
* tag 'drm-vc4-next-2016-03-14' of github.com:anholt/linux:
drm/vc4: Recognize a more specific compatible string for V3D.
dt-bindings: Add binding docs for V3D.
drm/vc4: Return -EFAULT on copy_from_user() failure
drm/vc4: Respect GPIO_ACTIVE_LOW on HDMI HPD if set in the devicetree.
drm/vc4: Let gpiolib know that we're OK with sleeping for HPD.
drm/vc4: improve throughput by pipelining binning and rendering jobs
On loaded TCP servers, looking at millions of sockets can hold the
cpu for many seconds, if the lookup condition is very narrow.
(e.g.: ss dst 1.2.3.4)
Better to add a cond_resched() to allow other processes to access
the cpu.
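A hedged sketch of the shape of the fix (a generic walker, not the actual
inet_diag code): yield between buckets so a narrow filter can't monopolize
the CPU.

    #include <linux/sched.h>
    #include <linux/types.h>

    /* With a narrow filter almost every entry is skipped, so nothing
     * else bounds how long the walk holds the CPU. */
    static void dump_matching_entries(unsigned int nr_buckets,
                                      bool (*match)(unsigned int bucket),
                                      void (*dump)(unsigned int bucket))
    {
            unsigned int bucket;

            for (bucket = 0; bucket < nr_buckets; bucket++) {
                    cond_resched();         /* let other tasks run */
                    if (match(bucket))
                            dump(bucket);
            }
    }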
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The smc91x driver defines a macro that compares its argument to
itself, apparently to get a true result while using its argument
to avoid a warning about unused local variables.
Unfortunately, this triggers a warning with gcc-6, as the comparison
is obviously useless:
drivers/net/ethernet/smsc/smc91x.c: In function 'smc_hardware_send_pkt':
drivers/net/ethernet/smsc/smc91x.c:563:14: error: self-comparison always evaluates to true [-Werror=tautological-compare]
if (!smc_special_trylock(&lp->lock, flags)) {
This replaces the macro with another one that behaves similarly,
with a cast to (void) to ensure the argument is used, and using
a literal 'true' as its value.
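A hedged sketch of the two macro shapes described (the real smc91x
definitions differ in detail):

    #include <linux/stddef.h>       /* true */

    /* Old shape: "use" the flags argument by comparing it with itself;
     * gcc-6 flags this as a tautological self-comparison. */
    #define smc_special_trylock_old(lock, flags)    ((flags) == (flags))

    /* New shape: cast the argument to (void) so it still counts as used,
     * and evaluate to a literal true. */
    #define smc_special_trylock_new(lock, flags)    ((void)(flags), true)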
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull dma_*_writecombine rename from Ingo Molnar:
"Rename dma_*_writecombine() to dma_*_wc()
This is a tree-wide API rename, to move the dma_*() write-combining
APIs closer in name to their usual API families. (The old API names
are kept as compatibility wrappers to not introduce extra breakage.)
The patch was Coccinelle generated"
* 'mm-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dma, mm/pat: Rename dma_*_writecombine() to dma_*_wc()
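For illustration, one call site spelled the old and the new way (both
allocate write-combining DMA memory; the old name remains available as a
compatibility wrapper):

    #include <linux/dma-mapping.h>
    #include <linux/gfp.h>

    static void *alloc_wc_buffer(struct device *dev, size_t size,
                                 dma_addr_t *dma_handle)
    {
            /* Old spelling, now a compatibility wrapper:
             *     dma_alloc_writecombine(dev, size, dma_handle, GFP_KERNEL);
             * New spelling after the tree-wide rename: */
            return dma_alloc_wc(dev, size, dma_handle, GFP_KERNEL);
    }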
The compiler store-fusion example in memory-barriers.txt uses a C
comment to represent arbitrary code that does not update a given
variable. Unfortunately, someone could reasonably interpret the
comment as instead referring to the following line of code. This
commit therefore replaces the comment with a string that more
clearly represents the arbitrary code.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The "transitivity" section mentions cumulativity in a potentially
confusing way. Contrary to the current wording, cumulativity is
not transitivity, but rather a hardware discipline that can be used
to implement transitivity on ARM and PowerPC CPUs. This commit
therefore deletes the mention of cumulativity.
Reported-by: Luc Maranget <luc.maranget@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The memory-barriers.txt discussion of local transitivity and
release-acquire chains leaves out discussion of the outcome of
the read from "u". This commit therefore adds an outcome showing
that you can get a "1" from this read even if the release-acquire
pairs don't line up.
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The introduction of smp_load_acquire() and smp_store_release() had
the side effect of introducing a weaker notion of transitivity:
The transitivity of full smp_mb() barriers is global, but that
of smp_store_release()/smp_load_acquire() chains is local. This
commit therefore introduces the notion of local transitivity and
gives an example.
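A hedged fragment in the style of memory-barriers.txt illustrating local
transitivity (not the exact example added by this commit): CPUs linked into
the release-acquire chain are guaranteed to see CPU 0's store, while CPUs
outside the chain get no such guarantee from these primitives alone.

    /* x, y, z are ints, all initially zero. */

    /* CPU 0 */
    WRITE_ONCE(x, 1);
    smp_store_release(&y, 1);

    /* CPU 1 */
    r1 = smp_load_acquire(&y);
    smp_store_release(&z, 1);

    /* CPU 2 */
    r2 = smp_load_acquire(&z);
    r3 = READ_ONCE(x);

    /* If r1 == 1 and r2 == 1, then r3 is guaranteed to be 1: the
     * release-acquire chain is transitive for its own participants. */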
Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The current memory-barriers.txt does not address the possibility of
a write to a dereferenced pointer. This should be rare, but when it
happens, we need that write -not- to be clobbered by the initialization.
This commit therefore adds an example showing a data dependency ordering
a later data-dependent write.
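A hedged fragment of the pattern (not the exact example added to the
document), assuming the usual RCU publish/subscribe primitives:

    /* Publisher */
    p = kmalloc(sizeof(*p), GFP_KERNEL);
    p->val = 0;                        /* initialization                */
    rcu_assign_pointer(gp, p);         /* publish with release          */

    /* Reader that also writes through the pointer */
    q = rcu_dereference(gp);
    if (q)
            WRITE_ONCE(q->val, 1);     /* the data dependency orders    */
                                       /* this write after the load of  */
                                       /* gp, so it cannot be clobbered */
                                       /* by the initialization above.  */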
Reported-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Commit #1ebee8017d84 (rcu: Eliminate array-index-based RCU primitives)
eliminated the primitives supporting RCU-protected array indexes, but
failed to update Documentation/memory-barriers.txt accordingly. This
commit therefore removes the discussion of RCU-protected array indexes.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit fixes a couple of "Compiler Barrier" section references to
be "COMPILER BARRIER". This makes it easier to find the section in
the usual text editors.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The summary of the "CONTROL DEPENDENCIES" section incorrectly states that
barrier() may be used to prevent compiler reordering when more than one
leg of the control-dependent "if" statement starts with identical stores.
This is incorrect at high optimization levels. This commit therefore
updates the summary to match the detailed description.
Reported-by: Jianyu Zhan <nasa4836@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull locking changes from Ingo Molnar:
"Various updates:
- Futex scalability improvements: remove page lock use for shared
futex get_futex_key(), which speeds up 'perf bench futex hash'
benchmarks by over 40% on a 60-core Westmere. This makes anon-mem
shared futexes perform close to private futexes. (Mel Gorman)
- lockdep hash collision detection and fix (Alfredo Alvarez
Fernandez)
- lockdep testing enhancements (Alfredo Alvarez Fernandez)
- robustify lockdep init by using hlists (Andrew Morton, Andrey
Ryabinin)
- mutex and csd_lock micro-optimizations (Davidlohr Bueso)
- small x86 barriers tweaks (Michael S Tsirkin)
- qspinlock updates (Waiman Long)"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait()
locking/csd_lock: Explicitly inline csd_lock*() helpers
futex: Replace barrier() in unqueue_me() with READ_ONCE()
locking/lockdep: Detect chain_key collisions
locking/lockdep: Prevent chain_key collisions
tools/lib/lockdep: Fix link creation warning
tools/lib/lockdep: Add tests for AA and ABBA locking
tools/lib/lockdep: Add userspace version of READ_ONCE()
tools/lib/lockdep: Fix the build on recent kernels
locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
locking/mutex: Allow next waiter lockless wakeup
locking/pvqspinlock: Enable slowpath locking count tracking
locking/qspinlock: Use smp_cond_acquire() in pending code
locking/pvqspinlock: Move lock stealing count tracking code into pv_queued_spin_steal_lock()
locking/mcs: Fix mcs_spin_lock() ordering
futex: Remove requirement for lock_page() in get_futex_key()
futex: Rename barrier references in ordering guarantees
locking/atomics: Update comment about READ_ONCE() and structures
locking/lockdep: Eliminate lockdep_init()
locking/lockdep: Convert hash tables to hlists
...
Extend OVS conntrack interface to cover NAT. New nested
OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action.
A bare OVS_CT_ATTR_NAT only mangles existing and expected connections.
If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested
attributes, new (non-committed/non-confirmed) connections are mangled
according to the rest of the nested attributes.
The corresponding OVS userspace patch series includes test cases (in
tests/system-traffic.at) that also serve as example uses.
This work builds on a branch by Thomas Graf at
https://github.com/tgraf/ovs/tree/nat.
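A hedged sketch of the attribute nesting when a CT action carrying source NAT
is serialized; the min-address attribute and the surrounding action handling
are illustrative:

    #include <linux/errno.h>
    #include <linux/openvswitch.h>
    #include <net/netlink.h>

    /* Nest an OVS_CT_ATTR_NAT block inside an already-open CT action.
     * With OVS_NAT_ATTR_SRC present, new connections get source-NATted;
     * a bare OVS_CT_ATTR_NAT would only mangle established and expected
     * connections, as described above. */
    static int put_ct_nat_sketch(struct sk_buff *skb, __be32 snat_min_addr)
    {
            struct nlattr *start = nla_nest_start(skb, OVS_CT_ATTR_NAT);

            if (!start)
                    return -EMSGSIZE;

            if (nla_put_flag(skb, OVS_NAT_ATTR_SRC) ||
                nla_put_be32(skb, OVS_NAT_ATTR_IP_MIN, snat_min_addr)) {
                    nla_nest_cancel(skb, start);
                    return -EMSGSIZE;
            }

            nla_nest_end(skb, start);
            return 0;
    }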
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>