Commit Graph

53479 Commits

Author SHA1 Message Date
Nicolas Pitre
7e7ec6a934 elf_fdpic_transfer_args_to_stack(): make it generic
This copying of arguments and environment is common to both NOMMU
binary formats we support. Let's make the elf_fdpic version available
to the flat format as well.

While at it, improve the code a bit not to copy below the actual
data area.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-25 16:51:49 +10:00
Yotam Gigi
b87f7936a9 net/sched: Add match-all classifier hw offloading.
Following the work that have been done on offloading classifiers like u32
and flower, now the match-all classifier hw offloading is possible. if
the interface supports tc offloading.

To control the offloading, two tc flags have been introduced: skip_sw and
skip_hw. Typical usage:

tc filter add dev eth25 parent ffff: 	\
	matchall skip_sw		\
	action mirred egress mirror	\
	dev eth27

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:59 -07:00
David S. Miller
c42d7121fb Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
they are:

1) Count pre-established connections as active in "least connection"
   schedulers such that pre-established connections to avoid overloading
   backend servers on peak demands, from Michal Kubecek via Simon Horman.

2) Address a race condition when resizing the conntrack table by caching
   the bucket size when fulling iterating over the hashtable in these
   three possible scenarios: 1) dump via /proc/net/nf_conntrack,
   2) unlinking userspace helper and 3) unlinking custom conntrack timeout.
   From Liping Zhang.

3) Revisit early_drop() path to perform lockless traversal on conntrack
   eviction under stress, use del_timer() as synchronization point to
   avoid two CPUs evicting the same entry, from Florian Westphal.

4) Move NAT hlist_head to nf_conn object, this simplifies the existing
   NAT extension and it doesn't increase size since recent patches to
   align nf_conn, from Florian.

5) Use rhashtable for the by-source NAT hashtable, also from Florian.

6) Don't allow --physdev-is-out from OUTPUT chain, just like
   --physdev-out is not either, from Hangbin Liu.

7) Automagically set on nf_conntrack counters if the user tries to
   match ct bytes/packets from nftables, from Liping Zhang.

8) Remove possible_net_t fields in nf_tables set objects since we just
   simply pass the net pointer to the backend set type implementations.

9) Fix possible off-by-one in h323, from Toby DiPasquale.

10) early_drop() may be called from ctnetlink patch, so we must hold
    rcu read size lock from them too, this amends Florian's patch #3
    coming in this batch, from Liping Zhang.

11) Use binary search to validate jump offset in x_tables, this
    addresses the O(n!) validation that was introduced recently
    resolve security issues with unpriviledge namespaces, from Florian.

12) Fix reference leak to connlabel in error path of nft_ct, from Zhang.

13) Three updates for nft_log: Fix log prefix leak in error path. Bail
    out on loglevel larger than debug in nft_log and set on the new
    NF_LOG_F_COPY_LEN flag when snaplen is specified. Again from Zhang.

14) Allow to filter rule dumps in nf_tables based on table and chain
    names.

15) Simplify connlabel to always use 128 bits to store labels and
    get rid of unused function in xt_connlabel, from Florian.

16) Replace set_expect_timeout() by mod_timer() from the h323 conntrack
    helper, by Gao Feng.

17) Put back x_tables module reference in nft_compat on error, from
    Liping Zhang.

18) Add a reference count to the x_tables extensions cache in
    nft_compat, so we can remove them when unused and avoid a crash
    if the extensions are rmmod, again from Zhang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 22:02:36 -07:00
Linus Torvalds
dd95069545 Merge tag 'hwmon-for-linus-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:

 - New drivers for FTS BMC "Teutates", TI INA3221, and Sensirion SHT3x.

 - Added support for Microchip MCP9808 and TI TMP461.

 - Cleanup and minor fixes in various drivers.

* tag 'hwmon-for-linus-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (37 commits)
  Documentation: dtb: xgene: Add hwmon dts binding documentation
  hwmon: (ftsteutates) Remove unused including <linux/version.h>
  hwmon: (adt7411) set bit 3 in CFG1 register
  hwmon: Add driver for FTS BMC chip "Teutates"
  hwmon: (sht3x) add humidity heater element control
  hwmon: (jc42) Add support for generic JC-42.4 devicetree binding
  dt/bindings: Add bindings for JC-42.4 compatible temperature sensors
  hwmon: (tmp102) Convert to use regmap, and drop local cache
  hwmon: (tmp102) Rework chip configuration
  hwmon: (tmp102) Improve handling of initial read delay
  hwmon: (lm90) Drop unnecessary else statements
  hwmon: (lm90) Use bool for valid flag
  hwmon: (lm90) Read limit registers only once
  hwmon: (lm90) Simplify read functions
  hwmon: (lm90) Use devm_hwmon_device_register_with_groups
  hwmon: (lm90) Use devm_add_action for cleanup
  hwmon: (lm75) Convert to use regmap
  hwmon: (lm75) Add update_interval attribute
  hwmon: (lm75) Drop lm75_read_value and lm75_write_value
  hwmon: (lm75) Handle cleanup with devm_add_action
  ...
2016-07-24 21:10:30 -07:00
Linus Torvalds
b7545b79a1 Merge tag 'usb-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB updates from Greg KH:
 "Here's the big USB driver update for 4.8-rc1.  Lots of the normal
  stuff in here, musb, gadget, xhci, and other updates and fixes.  All
  of the details are in the shortlog.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (169 commits)
  cdc-acm: beautify probe()
  cdc-wdm: use the common CDC parser
  cdc-acm: cleanup error handling
  cdc-acm: use the common parser
  usbnet: move the CDC parser into USB core
  usb: musb: sunxi: Simplify dr_mode handling
  usb: musb: sunxi: make unexported symbols static
  usb: musb: cppi41: add dma channel tracepoints
  usb: musb: cppi41: move struct cppi41_dma_channel to header
  usb: musb: cleanup cppi_dma header
  usb: musb: gadget: add usb-request tracepoints
  usb: musb: host: add urb tracepoints
  usb: musb: add tracepoints to dump interrupt events
  usb: musb: add tracepoints for register access
  usb: musb: dsps: use musb register read/write wrappers instead
  usb: musb: switch dev_dbg to tracepoints
  usb: musb: add tracepoints support for debugging
  usb: quirks: Add no-lpm quirk for Elan
  phy: rcar-gen3-usb2: fix mutex_lock calling in interrupt
  phy: rockhip-usb: use devm_add_action_or_reset()
  ...
2016-07-24 17:22:18 -07:00
Linus Torvalds
721413aff2 Merge tag 'tty-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
 "Here is the big tty and serial driver update for 4.8-rc1.

  Lots of good cleanups from Jiri on a number of vt and other tty
  related things, and the normal driver updates.  Full details are in
  the shortlog.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (90 commits)
  tty/serial: atmel: enforce tasklet init and termination sequences
  serial: sh-sci: Stop transfers in sci_shutdown()
  serial: 8250_ingenic: drop #if conditional surrounding earlycon code
  serial: 8250_mtk: drop !defined(MODULE) conditional
  serial: 8250_uniphier: drop !defined(MODULE) conditional
  earlycon: mark earlycon code as __used iif the caller is built-in
  tty/serial/8250: use mctrl_gpio helpers
  serial: mctrl_gpio: enable API usage only for initialized mctrl_gpios struct
  serial: mctrl_gpio: add modem control read routine
  tty/serial/8250: make UART_MCR register access consistent
  serial: 8250_mid: Read RX buffer on RX DMA timeout for DNV
  serial: 8250_dma: Export serial8250_rx_dma_flush()
  dmaengine: hsu: Export hsu_dma_get_status()
  tty: serial: 8250: add CON_CONSDEV to flags
  tty: serial: samsung: add byte-order aware bit functions
  tty: serial: samsung: fixup accessors for endian
  serial: sirf: make fifo functions static
  serial: mps2-uart: make driver explicitly non-modular
  serial: mvebu-uart: free the IRQ in ->shutdown()
  serial/bcm63xx_uart: use correct alias naming
  ...
2016-07-24 17:14:37 -07:00
Linus Torvalds
25a0dc4be8 Merge tag 'staging-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver updates from Greg KH:
 "Here is the big Staging and IIO driver update for 4.8-rc1.

  We ended up adding more code than removing, again, but it's not all
  that bad.  Lots of cleanups all over the staging tree, and new IIO
  drivers, full details in the shortlog.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (417 commits)
  drivers:iio:accel:mma8452: removed unwanted return statements
  drivers:iio:accel:mma8452: added cleanup provision in case of failure.
  iio: Add iio.git tree to MAINTAINERS
  iio:st_pressure: clean useless static channel initializers
  iio:st_pressure:lps22hb: temperature support
  iio:st_pressure:lps22hb: open drain support
  iio:st_pressure: temperature triggered buffering
  iio:st_pressure: document sampling gains
  iio:st_pressure: align storagebits on power of 2
  iio:st_sensors: align on storagebits boundaries
  staging:iio:lis3l02dq drop separate driver
  iio: accel: st_accel: Add lis3l02dq support
  iio: adc: add missing of_node references to iio_dev
  iio: adc: ti-ads1015: add indio_dev->dev.of_node reference
  iio: potentiometer: Fix typo in Kconfig
  iio: potentiometer: mcp4531: Add device tree binding
  iio: potentiometer: mcp4531: Add device tree binding documentation
  iio: potentiometer: mcp4531: Add support for MCP454x, MCP456x, MCP464x and MCP466x
  iio:imu:mpu6050: icm20608 initial support
  iio: adc: max1363: Add device tree binding
  ...
2016-07-24 16:55:23 -07:00
Linus Torvalds
9d0be76f52 Merge tag 'char-misc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
 "Here is the big char/misc driver update for 4.8-rc1.

  Not a lot of stuff, but it's all over the place, full details are in
  the shortlog.  All of these have been in linux-next with no reported
  issues for a while"

* tag 'char-misc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (49 commits)
  lkdtm: silence warnings about function declarations
  lkdtm: hide unused functions
  intel_th: pci: Add Kaby Lake PCH-H support
  intel_th: Fix a deadlock in modprobing
  dsp56k: prevent a harmless underflow
  chardev: add missing line break in pr_warn
  lkdtm: use struct arrays instead of enums
  lkdtm: move jprobe entry points to start of source
  lkdtm: reorganize module paramaters
  lkdtm: rename globals for clarity
  lkdtm: rename "count" to "crash_count"
  lkdtm: remove intentional off-by-one array access
  lkdtm: split remaining logic bug tests to separate file
  lkdtm: split heap corruption tests to separate file
  lkdtm: split memory permissions tests to separate file
  lkdtm: split usercopy tests to separate file
  lkdtm: drop "alloc_size" parameter
  lkdtm: add usercopy test for blocking kernel text
  extcon: adc-jack: add suspend/resume support
  extcon: add missing of_node_put after calling of_parse_phandle
  ...
2016-07-24 16:26:26 -07:00
Linus Torvalds
b403f23044 Merge tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Bob Peterson:
 "We've got ten patches this time, half of which are related to a
  plethora of nasty outcomes when inodes are transitioned from the
  unlinked state to the free state.  Small file systems are particularly
  vulnerable to these problems, and it can manifest as mainly hangs, but
  also file system corruption.  The patches have been tested for
  literally many weeks, with a very gruelling test, so I have a high
  level of confidence.

   - Andreas Gruenbacher wrote a series of five patches for various
     lockups during the transition of inodes from unlinked to free.

     The main patch is titled "Fix gfs2_lookup_by_inum lock inversion"
     and the other four are support and cleanup patches related to that.

   - Ben Marzinski contributed two patches with regard to a recreatable
     problem when gfs2 tries to write a page to a file that is being
     truncated, resulting in a BUG() in gfs2_remove_from_journal.

     Note that Ben had to export vfs function __block_write_full_page to
     get this to work properly.  It's been posted a long time and he
     talked to various VFS people about it, and nobody seemed to mind.

   - I contributed 3 patches:
       o The first one fixes a memory corruptor: a race in which one
         process can overwrite the gl_object pointer set by another
         process, causing kernel panic and other symptoms.
       o The second patch fixes another race that resulted in a
         false-positive BUG_ON.  This occurred when resource group
         reservations were freed by one process while another process
         was trying to grab a new reservation in the same resource
         group.
       o The third patch fixes a problem with doing journal replay when
         the journals are not all the same size"

* tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes
  GFS2: Check rs_free with rd_rsspin protection
  gfs2: writeout truncated pages
  fs: export __block_write_full_page
  gfs2: Lock holder cleanup
  gfs2: Large-filesystem fix for 32-bit systems
  gfs2: Get rid of gfs2_ilookup
  gfs2: Fix gfs2_lookup_by_inum lock inversion
  gfs2: Initialize iopen glock holder for new inodes
  GFS2: don't set rgrp gl_object until it's inserted into rgrp tree
2016-07-24 16:07:52 -07:00
Trond Myklebust
1592c4d62a Merge branch 'nfs-rdma' 2016-07-24 17:09:02 -04:00
Trond Myklebust
362745268c Merge branch 'writeback' 2016-07-24 17:08:31 -04:00
Trond Myklebust
7f94ed2495 Merge branch 'sunrpc' 2016-07-24 17:08:31 -04:00
Mark Brown
9a4506b60d Merge remote-tracking branches 'spi/topic/pxa2xx', 'spi/topic/rockchip', 'spi/topic/s3c64xx', 'spi/topic/sh' and 'spi/topic/sh-msiof' into spi-next 2016-07-24 22:08:25 +01:00
Mark Brown
e350817b7c Merge remote-tracking branches 'spi/topic/flash-dma', 'spi/topic/imx', 'spi/topic/loopback', 'spi/topic/maintainers' and 'spi/topic/mpc52xx-psc' into spi-next 2016-07-24 22:08:20 +01:00
Mark Brown
3ceeda1cbe Merge remote-tracking branches 'asoc/topic/cs53l30', 'asoc/topic/cygnus', 'asoc/topic/da7219' and 'asoc/topic/davinci' into asoc-next 2016-07-24 22:07:31 +01:00
Mark Brown
72a04d6b60 Merge remote-tracking branches 'asoc/topic/adau', 'asoc/topic/adau7002', 'asoc/topic/adsp', 'asoc/topic/ak4613' and 'asoc/topic/ak4642' into asoc-next 2016-07-24 22:07:24 +01:00
Miklos Szeredi
285b102d3b vfs: new d_init method
Allow filesystem to initialize dentry at allocation time.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-24 16:36:29 -04:00
Dan Williams
0606263f24 Merge branch 'for-4.8/libnvdimm' into libnvdimm-for-next 2016-07-24 08:05:44 -07:00
David S. Miller
de0ba9a0d8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just several instances of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 00:53:32 -04:00
Vishal Verma
37b137ff8c nfit, libnvdimm: allow an ARS scrub to be triggered on demand
Normally, an ARS (Address Range Scrub) only happens at
boot/initialization time. There can however arise situations where a
bus-wide rescan is needed - notably, in the case of discovering a latent
media error, we should do a full rescan to figure out what other sectors
are bad, and thus potentially avoid triggering an mce on them in the
future. Also provide a sysfs trigger to start a bus-wide scrub.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-23 21:51:42 -07:00
Linus Torvalds
107df03203 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix memory leak in nftables, from Liping Zhang.

 2) Need to check result of vlan_insert_tag() in batman-adv otherwise we
    risk NULL skb derefs, from Sven Eckelmann.

 3) Check for dev_alloc_skb() failures in cfg80211, from Gregory
    Greenman.

 4) Handle properly when we have ppp_unregister_channel() happening in
    parallel with ppp_connect_channel(), from WANG Cong.

 5) Fix DCCP deadlock, from Eric Dumazet.

 6) Bail out properly in UDP if sk_filter() truncates the packet to be
    smaller than even the space that the protocol headers need.  From
    Michal Kubecek.

 7) Similarly for rose, dccp, and sctp, from Willem de Bruijn.

 8) Make TCP challenge ACKs less predictable, from Eric Dumazet.

 9) Fix infinite loop in bgmac_dma_tx_add() from Florian Fainelli.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
  packet: propagate sock_cmsg_send() error
  net/mlx5e: Fix del vxlan port command buffer memset
  packet: fix second argument of sock_tx_timestamp()
  net: switchdev: change ageing_time type to clock_t
  Update maintainer for EHEA driver.
  net/mlx4_en: Add resilience in low memory systems
  net/mlx4_en: Move filters cleanup to a proper location
  sctp: load transport header after sk_filter
  net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int
  net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata
  net: nb8800: Fix SKB leak in nb8800_receive()
  et131x: Fix logical vs bitwise check in et131x_tx_timeout()
  vlan: use a valid default mtu value for vlan over macsec
  net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
  mlxsw: spectrum: Prevent invalid ingress buffer mapping
  mlxsw: spectrum: Prevent overwrite of DCB capability fields
  mlxsw: spectrum: Don't emit errors when PFC is disabled
  mlxsw: spectrum: Indicate support for autonegotiation
  mlxsw: spectrum: Force link training according to admin state
  r8152: add MODULE_VERSION
  ...
2016-07-23 15:44:31 +09:00
Andrey Ryabinin
3cb9185c67 radix-tree: fix radix_tree_iter_retry() for tagged iterators.
radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot()
leading to crash:

  RIP: radix_tree_next_slot include/linux/radix-tree.h:473
    find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
  ....
  Call Trace:
    pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
    mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
    ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
    do_writepages+0x97/0x100 mm/page-writeback.c:2364
    __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
    filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
    ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
    vfs_fsync_range+0x10a/0x250 fs/sync.c:195
    vfs_fsync fs/sync.c:209
    do_fsync+0x42/0x70 fs/sync.c:219
    SYSC_fdatasync fs/sync.c:232
    SyS_fdatasync+0x19/0x20 fs/sync.c:230
    entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207

We must reset iterator's tags to bail out from radix_tree_next_slot()
and go to the slow-path in radix_tree_next_chunk().

Fixes: 46437f9a55 ("radix-tree: fix race in gang lookup")
Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-23 10:25:54 +09:00
Johannes Weiner
73f576c04b mm: memcontrol: fix cgroup creation failure after many small jobs
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears.  At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild.  Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limitations of outstanding CSSs.  Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.

Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.

Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later.  They pose no hurdle.

Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages.  And those
references are under the user's control, so they are manageable.

This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that.  This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.

This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:

  set -e
  mkdir -p pages
  for x in `seq 128000`; do
    [ $((x % 1000)) -eq 0 ] && echo $x
    mkdir /cgroup/foo
    echo $$ >/cgroup/foo/cgroup.procs
    echo trex >pages/$x
    echo $$ >/cgroup/cgroup.procs
    rmdir /cgroup/foo
  done

When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:

  [root@ham ~]# ./cssidstress.sh
  [...]
  65000
  mkdir: cannot create directory '/cgroup/foo': No space left on device

After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.

[hannes@cmpxchg.org: init the IDR]
  Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
Fixes: b2052564e6 ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: John Garcia <john.garcia@mesosphere.io>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Nikolay Borisov <kernel@kyup.com>
Cc: <stable@vger.kernel.org>	[3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-23 10:25:54 +09:00
Sebastian Andrzej Siewior
cd894f1497 leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
There is no need the ledtriger to be called *that* early in the hotplug
process (+ with disabled interrupts). As explained by Jacek Anaszewski [0]
there is no need for it.
Therefore this patch moves it to the ONLINE/PREPARE_DOWN level using the
dynamic registration for the id.

[0] https://lkml.kernel.org/r/578C92BC.2070603@samsung.com

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jacek Anaszewski <j.anaszewski@samsung.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: rt@linutronix.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-leds@vger.kernel.org
Link: http://lkml.kernel.org/r/1469028295-14702-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-07-22 21:53:18 +02:00
Sebastian Andrzej Siewior
bdab88e006 powerpc/numa: Convert to hotplug state machine
Install the callbacks via the state machine. On the boot cpu the callback is
invoked manually because cpuhp is not up yet and everything must be
preinitialized before additional CPUs are up.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
Cc: Anton Blanchard <anton@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160718140727.GA13132@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-07-22 21:53:17 +02:00
Radim Krčmář
912902ce78 Merge tag 'kvm-arm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next
KVM/ARM changes for Linux 4.8

- GICv3 ITS emulation
- Simpler idmap management that fixes potential TLB conflicts
- Honor the kernel protection in HYP mode
- Removal of the old vgic implementation
2016-07-22 20:27:26 +02:00
Eric Auger
995a0ee980 KVM: arm/arm64: Enable MSI routing
Up to now, only irqchip routing entries could be set. This patch
adds the capability to insert MSI routing entries.

For ARM64, let's also increase KVM_MAX_IRQ_ROUTES to 4096: this
include SPI irqchip routes plus MSI routes. In the future this
might be extended.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-22 18:52:03 +01:00
Eric Auger
d9565a7399 KVM: Move kvm_setup_default/empty_irq_routing declaration in arch specific header
kvm_setup_default_irq_routing and kvm_setup_empty_irq_routing are
not used by generic code. So let's move the declarations in x86 irq.h
header instead of kvm_host.h.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-22 18:52:00 +01:00
Eric Auger
0455e72c9a KVM: Add devid in kvm_kernel_irq_routing_entry
Enhance kvm_kernel_irq_routing_entry to transport the device id
field, devid. A new flags field makes possible to indicate the
devid is valid. Those additions are used for ARM GICv3 ITS MSI
injection. The original struct msi_msg msi field is replaced by
a new custom structure that embeds the new fields.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-22 18:51:56 +01:00
Jean Delvare
1ab0a1192d i2c: i2c-smbus: drop useless stubs
Drivers which use the SMBus extensions select I2C_SMBUS, so the
stubs are not needed.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-22 09:07:02 +02:00
Dan Williams
bc9775d869 libnvdimm: move ->module to struct nvdimm_bus_descriptor
Let the provider module be explicitly passed in rather than implicitly
assumed by the module that calls nvdimm_bus_register().  This is in
preparation for unifying the nfit and nfit_test driver teardown paths.

Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-21 20:03:19 -07:00
Sudeep Holla
220276e09b cpuidle: introduce CPU_PM_CPU_IDLE_ENTER macro for ARM{32, 64}
The function arm_enter_idle_state is exactly the same in both generic
ARM{32,64} CPUIdle driver and will be the same even on ARM64 backend
for ACPI processor idle driver. So we can unify it and move it to a
common place by introducing CPU_PM_CPU_IDLE_ENTER macro that can be
used in all places avoiding duplication.

This is in preparation of reuse of the generic cpuidle entry function
for ACPI LPI support on ARM64.

Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21 23:29:38 +02:00
Sudeep Holla
a36a7fecfe ACPI / processor_idle: Add support for Low Power Idle(LPI) states
ACPI 6.0 introduced an optional object _LPI that provides an alternate
method to describe Low Power Idle states. It defines the local power
states for each node in a hierarchical processor topology. The OSPM can
use _LPI object to select a local power state for each level of processor
hierarchy in the system. They used to produce a composite power state
request that is presented to the platform by the OSPM.

Since multiple processors affect the idle state for any non-leaf hierarchy
node, coordination of idle state requests between the processors is
required. ACPI supports two different coordination schemes: Platform
coordinated and  OS initiated.

This patch adds initial support for Platform coordination scheme of LPI.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21 23:25:58 +02:00
Christoph Hellwig
4ef33685aa PCI: Spread interrupt vectors in pci_alloc_irq_vectors()
Set the affinity_mask in the PCI device before allocating vectors so that
the affinity can be propagated through the MSI descriptor structures to the
core IRQ code.  To facilitate this, new __pci_enable_msi_range() and
__pci_enable_msix_range() helpers are factored out of their not prefixed
variants which assigning the new IRQ affinity mask in the PCI device so
that the low-level interrupt code can perform the interrupt affinity
assignment and do node-local allocations.

A new PCI_IRQ_NOAFFINITY flag is added to pci_alloc_irq_vectors() so that
this function can also be used by drivers that don't wish to use the
automatic affinity assignment.

[bhelgaas: omit "else" after "return" consistently]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alexander Gordeev <agordeev@redhat.com>
2016-07-21 15:57:03 -05:00
Christoph Hellwig
aff171641d PCI: Provide sensible IRQ vector alloc/free routines
Add a function to allocate and free a range of interrupt vectors, using
MSI-X, MSI or legacy vectors (in that order) based on the capabilities of
the underlying device and PCIe complex.

Additionally a new helper is provided to get the Linux IRQ number for given
device-relative vector so that the drivers don't need to allocate their own
arrays to keep track of the vectors for the multi vector MSI-X case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alexander Gordeev <agordeev@redhat.com>
2016-07-21 15:50:07 -05:00
Steve Muckle
e3c0623608 cpufreq: add cpufreq_driver_resolve_freq()
Cpufreq governors may need to know what a particular target frequency
maps to in the driver without necessarily wanting to set the frequency.
Support this operation via a new cpufreq API,
cpufreq_driver_resolve_freq(). This API returns the lowest driver
frequency equal or greater than the target frequency
(CPUFREQ_RELATION_L), subject to any policy (min/max) or driver
limitations. The mapping is also cached in the policy so that a
subsequent fast_switch operation can avoid repeating the same lookup.

The API will call a new cpufreq driver callback, resolve_freq(), if it
has been registered by the driver. Otherwise the frequency is resolved
via cpufreq_frequency_table_target(). Rather than require ->target()
style drivers to provide a resolve_freq() callback it is left to the
caller to ensure that the driver implements this callback if necessary
to use cpufreq_driver_resolve_freq().

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21 14:46:08 +02:00
Toshi Kani
545ed20e6d dm: add infrastructure for DAX support
Change mapped device to implement direct_access function,
dm_blk_direct_access(), which calls a target direct_access function.
'struct target_type' is extended to have target direct_access interface.
This function limits direct accessible size to the dm_target's limit
with max_io_len().

Add dm_table_supports_dax() to iterate all targets and associated block
devices to check for DAX support.  To add DAX support to a DM target the
target must only implement the direct_access function.

Add a new dm type, DM_TYPE_DAX_BIO_BASED, which indicates that mapped
device supports DAX and is bio based.  This new type is used to assure
that all target devices have DAX support and remain that way after
QUEUE_FLAG_DAX is set in mapped device.

At initial table load, QUEUE_FLAG_DAX is set to mapped device when setting
DM_TYPE_DAX_BIO_BASED to the type.  Any subsequent table load to the
mapped device must have the same type, or else it fails per the check in
table_load().

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20 23:49:49 -04:00
Mike Snitzer
e9ccb945c4 Merge remote-tracking branch 'jens/for-4.8/core' into dm-4.8
DM's DAX support depends on block core's newly added QUEUE_FLAG_DAX.
2016-07-20 23:48:25 -04:00
Damien Le Moal
17007f3994 block: Fix front merge check
For a front merge, the maximum number of sectors of the
request must be checked against the front merge BIO sector,
not the current sector of the request.

Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 21:40:47 -06:00
Tahsin Erdogan
72ef799b3f block: do not merge requests without consulting with io scheduler
Before merging a bio into an existing request, io scheduler is called to
get its approval first. However, the requests that come from a plug
flush may get merged by block layer without consulting with io
scheduler.

In case of CFQ, this can cause fairness problems. For instance, if a
request gets merged into a low weight cgroup's request, high weight cgroup
now will depend on low weight cgroup to get scheduled. If high weigt cgroup
needs that io request to complete before submitting more requests, then it
will also lose its timeslice.

Following script demonstrates the problem. Group g1 has a low weight, g2
and g3 have equal high weights but g2's requests are adjacent to g1's
requests so they are subject to merging. Due to these merges, g2 gets
poor disk time allocation.

cat > cfq-merge-repro.sh << "EOF"
#!/bin/bash
set -e

IO_ROOT=/mnt-cgroup/io

mkdir -p $IO_ROOT

if ! mount | grep -qw $IO_ROOT; then
  mount -t cgroup none -oblkio $IO_ROOT
fi

cd $IO_ROOT

for i in g1 g2 g3; do
  if [ -d $i ]; then
    rmdir $i
  fi
done

mkdir g1 && echo 10 > g1/blkio.weight
mkdir g2 && echo 495 > g2/blkio.weight
mkdir g3 && echo 495 > g3/blkio.weight

RUNTIME=10

(echo $BASHPID > g1/cgroup.procs &&
 fio --readonly --name name1 --filename /dev/sdb \
     --rw read --size 64k --bs 64k --time_based \
     --runtime=$RUNTIME --offset=0k &> /dev/null)&

(echo $BASHPID > g2/cgroup.procs &&
 fio --readonly --name name1 --filename /dev/sdb \
     --rw read --size 64k --bs 64k --time_based \
     --runtime=$RUNTIME --offset=64k &> /dev/null)&

(echo $BASHPID > g3/cgroup.procs &&
 fio --readonly --name name1 --filename /dev/sdb \
     --rw read --size 64k --bs 64k --time_based \
     --runtime=$RUNTIME --offset=256k &> /dev/null)&

sleep $((RUNTIME+1))

for i in g1 g2 g3; do
  echo ---- $i ----
  cat $i/blkio.time
done

EOF
# ./cfq-merge-repro.sh
---- g1 ----
8:16 162
---- g2 ----
8:16 165
---- g3 ----
8:16 686

After applying the patch:

# ./cfq-merge-repro.sh
---- g1 ----
8:16 90
---- g2 ----
8:16 445
---- g3 ----
8:16 471

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 21:35:12 -06:00
Al Viro
9aba36dea5 qstr constify instances in fs/dcache.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-20 23:30:06 -04:00
Al Viro
beffb8feb6 qstr: constify instances in nfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-20 23:30:06 -04:00
Al Viro
4f3ccd7657 qstr: constify dentry_init_security
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-20 23:30:06 -04:00
Toshi Kani
163d4baaeb block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
Currently, presence of direct_access() in block_device_operations
indicates support of DAX on its block device.  Because
block_device_operations is instantiated with 'const', this DAX
capablity may not be enabled conditinally.

In preparation for supporting DAX to device-mapper devices, add
QUEUE_FLAG_DAX to request_queue flags to advertise their DAX
support.  This will allow to set the DAX capability based on how
mapped device is composed.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <linux-s390@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 21:01:01 -06:00
Christoph Hellwig
4613c5f1df scsi/osd: open code blk_make_request
I wish the OSD code could simply use blk_rq_map_* helpers like
everyone else, but the complex nature of deciding if we have
DATA IN and/or DATA OUT buffers might make this impossible
(at least for a mere human like me).

But using blk_rq_append_bio at least allows sharing the setup code
between request with or without dat a buffers, and given that this
is the last user of blk_make_request it allows getting rid of that
somewhat awkward interface.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 17:38:35 -06:00
Christoph Hellwig
98d61d5b1a block: simplify and export blk_rq_append_bio
The target SCSI passthrough backend is much better served with the low-level
blk_rq_append_bio construct then the helpers built on top of it, so export it.

Also use the opportunity to remove the pointless request_queue argument and
make the code flow a little more readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 17:38:32 -06:00
Christoph Hellwig
c0acf12a50 block: shrink bio size again
The recent ops split grew the bio by adding the new ioprio field.
Shrink it again by using a 16-bit field for the bi_flags value and
filling the holes near the beginning of the structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 17:37:04 -06:00
Christoph Hellwig
ed996a52c8 block: simplify and cleanup bvec pool handling
Instead of a flag and an index just make sure an index of 0 means
no need to free the bvec array.  Also move the constants related
to the bvec pools together and use a consistent naming scheme for
them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 17:37:02 -06:00
Christoph Hellwig
70246286e9 block: get rid of bio_rw and READA
These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces.  For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense.  Any check for READA is replaced with an
explicit check for REQ_RAHEAD.  Also remove the READA alias for
REQ_RAHEAD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 17:37:01 -06:00
Christoph Hellwig
e950fdf71c block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
Currently blkdev_issue_zeroout cascades down from discards (if the driver
guarantees that discards zero data), to WRITE SAME and then to a loop
writing zeroes.  Unfortunately we ignore run-time EOPNOTSUPP errors in the
block layer blkdev_issue_discard helper to work around DM volumes that
may have mixed discard support underneath.

This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to
blkdev_issue_discard that indicates we are called for zeroing operation.
This allows both to ignore the EOPNOTSUPP hack and actually consolidating
the discard_zeroes_data check into the function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 17:35:20 -06:00