Commit Graph

753602 Commits

Author SHA1 Message Date
Dan Carpenter
c37a3c9477 xen/acpi: off by one in read_acpi_id()
If acpi_id is == nr_acpi_bits, then we access one element beyond the end
of the acpi_psd[] array or we set one bit beyond the end of the bit map
when we do __set_bit(acpi_id, acpi_id_present);

Fixes: 59a5680291 ("xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-03-30 11:30:13 -04:00
David S. Miller
b9a1260154 Merge branch 'Close-race-between-un-register_netdevice_notifier-and-pernet_operations'
Kirill Tkhai says:

====================
Close race between {un, }register_netdevice_notifier and pernet_operations

the problem is {,un}register_netdevice_notifier() do not take
pernet_ops_rwsem, and they don't see network namespaces, being
initialized in setup_net() and cleanup_net(), since at this
time net is not hashed to net_namespace_list.

This may lead to imbalance, when a notifier is called at time of
setup_net()/net is alive, but it's not called at time of cleanup_net(),
for the devices, hashed to the net, and vise versa. See (3/3) for
the scheme of imbalance.

This patchset fixes the problem by acquiring pernet_ops_rwsem
at the time of {,un}register_netdevice_notifier() (3/3).
(1-2/3) are preparations in xfrm and netfilter subsystems.

The problem was introduced a long ago, but backporting won't be easy,
since every previous kernel version may have changes in netdevice
notifiers, and they all need review and testing. Otherwise, there
may be more pernet_operations, which register or unregister
netdevice notifiers, and that leads to deadlock (which is was fixed
in 1-2/3). This patchset is for net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:59:46 -04:00
Kirill Tkhai
328fbe747a net: Close race between {un, }register_netdevice_notifier() and setup_net()/cleanup_net()
{un,}register_netdevice_notifier() iterate over all net namespaces
hashed to net_namespace_list. But pernet_operations register and
unregister netdevices in unhashed net namespace, and they are not
seen for netdevice notifiers. This results in asymmetry:

1)Race with register_netdevice_notifier()
  pernet_operations::init(net)	...
   register_netdevice()		...
    call_netdevice_notifiers()  ...
      ... nb is not called ...
  ...				register_netdevice_notifier(nb) -> net skipped
  ...				...
  list_add_tail(&net->list, ..) ...

  Then, userspace stops using net, and it's destructed:

  pernet_operations::exit(net)
   unregister_netdevice()
    call_netdevice_notifiers()
      ... nb is called ...

This always happens with net::loopback_dev, but it may be not the only device.

2)Race with unregister_netdevice_notifier()
  pernet_operations::init(net)
   register_netdevice()
    call_netdevice_notifiers()
      ... nb is called ...

  Then, userspace stops using net, and it's destructed:

  list_del_rcu(&net->list)	...
  pernet_operations::exit(net)  unregister_netdevice_notifier(nb) -> net skipped
   dev_change_net_namespace()	...
    call_netdevice_notifiers()
      ... nb is not called ...
   unregister_netdevice()
    call_netdevice_notifiers()
      ... nb is not called ...

This race is more danger, since dev_change_net_namespace() moves real
network devices, which use not trivial netdevice notifiers, and if this
will happen, the system will be left in unpredictable state.

The patch closes the race. During the testing I found two places,
where register_netdevice_notifier() is called from pernet init/exit
methods (which led to deadlock) and fixed them (see previous patches).

The review moved me to one more unusual registration place:
raw_init() (can driver). It may be a reason of problems,
if someone creates in-kernel CAN_RAW sockets, since they
will be destroyed in exit method and raw_release()
will call unregister_netdevice_notifier(). But grep over
kernel tree does not show, someone creates such sockets
from kernel space.

Theoretically, there can be more places like this, and which are
hidden from review, but we found them on the first bumping there
(since there is no a race, it will be 100% reproducible).

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:59:35 -04:00
Kirill Tkhai
9e2f6c5d78 netfilter: Rework xt_TEE netdevice notifier
Register netdevice notifier for every iptable entry
is not good, since this breaks modularity, and
the hidden synchronization is based on rtnl_lock().

This patch reworks the synchronization via new lock,
while the rest of logic remains as it was before.
This is required for the next patch.

Tested via:

while :; do
	unshare -n iptables -t mangle -A OUTPUT -j TEE --gateway 1.1.1.2 --oif lo;
done

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:59:23 -04:00
Kirill Tkhai
e9a441b6e7 xfrm: Register xfrm_dev_notifier in appropriate place
Currently, driver registers it from pernet_operations::init method,
and this breaks modularity, because initialization of net namespace
and netdevice notifiers are orthogonal actions. We don't have
per-namespace netdevice notifiers; all of them are global for all
devices in all namespaces.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:59:23 -04:00
davidwang
e3a2d2be51 hwmon: (via-cputemp) support new centaur CPUs
New centaur CPUs (Familiy == 7) also support this cpu temperature sensor.

Signed-off-by: David Wang <davidwang@zhaoxin.com>
[groeck: Dropped changelog, updated subject]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-03-30 07:53:57 -07:00
David S. Miller
caeeeda344 Merge branch 'Implement-of_get_nvmem_mac_address-helper'
Mike Looijmans says:

====================
of_net: Implement of_get_nvmem_mac_address helper

Posted this as a small set now, with an (optional) second patch that shows
how the changes work and what I've used to test the code on a Topic Miami board.
I've taken the liberty to add appropriate "Acked" and "Review" tags.

v4: Replaced "6" with ETH_ALEN

v3: Add patch that implements mac in nvmem for the Cadence MACB controller
    Remove the integrated of_get_mac_address call

v2: Use of_nvmem_cell_get to avoid needing the assiciated device
    Use void* instead of char*
    Add devicetree binding doc
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:40:19 -04:00
Mike Looijmans
aa076e3d22 net: macb: Try to retrieve MAC addess from nvmem provider
Call of_get_nvmem_mac_address() to fetch the MAC address from an nvmem
cell, if one is provided in the device tree. This allows the address to
be stored in an I2C EEPROM device for example.

Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:40:18 -04:00
Mike Looijmans
9217e566bd of_net: Implement of_get_nvmem_mac_address helper
It's common practice to store MAC addresses for network interfaces into
nvmem devices. However the code to actually do this in the kernel lacks,
so this patch adds of_get_nvmem_mac_address() for drivers to obtain the
address from an nvmem cell provider.

This is particulary useful on devices where the ethernet interface cannot
be configured by the bootloader, for example because it's in an FPGA.

Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:40:18 -04:00
David S. Miller
64e828dfc4 Merge branch 'nfp-flower-handle-MTU-changes'
Jakub Kicinski says:

====================
nfp: flower: handle MTU changes

This set improves MTU handling for flower offload.  The max MTU is
correctly capped and physical port MTU is communicated to the FW
(and indirectly HW).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:18:55 -04:00
John Hurley
29a5dcae27 nfp: flower: offload phys port MTU change
Trigger a port mod message to request an MTU change on the NIC when any
physical port representor is assigned a new MTU value. The driver waits
10 msec for an ack that the FW has set the MTU. If no ack is received the
request is rejected and an appropriate warning flagged.

Rather than maintain an MTU queue per repr, one is maintained per app.
Because the MTU ndo is protected by the rtnl lock, there can never be
contention here. Portmod messages from the NIC are also protected by
rtnl so we first check if the portmod is an ack and, if so, handle outside
rtnl and the cmsg work queue.

Acks are detected by the marking of a bit in a portmod response. They are
then verfied by checking the port number and MTU value expected by the
app. If the expected MTU is 0 then no acks are currently expected.

Also, ensure that the packet headroom reserved by the flower firmware is
considered when accepting an MTU change on any repr.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:18:55 -04:00
John Hurley
167cebeffa nfp: modify app MTU setting callbacks
Rename the 'change_mtu' app callback to 'check_mtu'. This is called
whenever an MTU change is requested on a netdev. It can reject the
change but is not responsible for implementing it.

Introduce a new 'repr_change_mtu' app callback that is hit when the MTU
of a repr is to be changed. This is responsible for performing the MTU
change and verifying it.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:18:54 -04:00
David S. Miller
44465c47a4 Merge branch 'phylink-API-changes'
Florian Fainelli says:

====================
phylink: API changes

This patch series contains two API changes to PHYLINK which will later be used
by DSA to migrate to PHYLINK. Because these are API changes that impact other
outstanding work (e.g: MVPP2) I would rather get them included sooner to minimize
conflicts.

Thank you!

Changes in v2:

- added missing documentation to mac_link_{up,down} that the interface
  must be configured in mac_config()

- added Russell's, Andrew's and my tags
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:11:07 -04:00
Russell King
e679c9c1db sfp/phylink: move module EEPROM ethtool access into netdev core ethtool
Provide a pointer to the SFP bus in struct net_device, so that the
ethtool module EEPROM methods can access the SFP directly, rather
than needing every user to provide a hook for it.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:11:06 -04:00
Florian Fainelli
c6ab3008b6 net: phy: phylink: Provide PHY interface to mac_link_{up, down}
In preparation for having DSA transition entirely to PHYLINK, we need to pass a
PHY interface type to the mac_link_{up,down} callbacks because we may have to
make decisions on that (e.g: turn on/off RGMII interfaces etc.). We do not pass
an entire phylink_link_state because not all parameters (pause, duplex etc.) are
defined when the link is down, only link and interface are.

Update mvneta accordingly since it currently implements phylink_mac_ops.

Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:11:06 -04:00
Colin Ian King
a9645b273e atm: iphase: fix spelling mistake: "Receiverd" -> "Received"
Trivial fix to spelling mistake in message text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:10:04 -04:00
Ronak Doshi
2166dc9571 MAINTAINERS: update vmxnet3 driver maintainer
Shrikrishna Khare would no longer maintain the vmxnet3 driver. Taking
over the role of vmxnet3 maintainer.

Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:05:32 -04:00
David S. Miller
95e623fd98 Merge branch 'net-Broadcom-drivers-coalescing-fixes'
Florian Fainelli says:

====================
net: Broadcom drivers coalescing fixes

Following Tal's review of the adaptive RX/TX coalescing feature added to the
SYSTEMPORT and GENET driver a number of things showed up:

- adaptive TX coalescing is not actually a good idea with the current way
  the estimator will program the ring, this results in a higher CPU load, NAPI
  on TX already does a reasonably good job at maintaining the interrupt count low

- both SYSTEMPORT and GENET would suffer from the same issues while configuring
  coalescing parameters where the values would just not be applied correctly
  based on user settings, so we fix that too

Tal, thanks again for your feedback, I would appreciate if you could review that
the new behavior appears to be implemented correctly.

Thanks!

Changes in v2:

- added Tal's reviewed-by to the first patch
- split DIM initialization from coalescing parameters initialization
- avoid duplicating the same code in bcmgenet_set_coalesce() when configuring RX rings
- fixed the condition where default DIM parameters would be applied when
  adaptive RX coalescing would be enabled, do this only if it was disabled before
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:03:37 -04:00
Florian Fainelli
5e6ce1f1a4 net: bcmgenet: Fix coalescing settings handling
There were a number of issues with setting the RX coalescing parameters:

- we would not be preserving values that would have been configured
  across close/open calls, instead we would always reset to no timeout
  and 1 interrupt per packet, this would also prevent DIM from setting its
  default usec/pkts values

- when adaptive RX would be turned on, we woud not be fetching the
  default parameters, we would stay with no timeout/1 packet per interrupt
  until the estimator kicks in and changes that

- finally disabling adaptive RX coalescing while providing parameters
  would not be honored, and we would stay with whatever DIM had previously
  determined instead of the user requested parameters

Fixes: 9f4ca05827 ("net: bcmgenet: Add support for adaptive RX coalescing")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:03:36 -04:00
Florian Fainelli
a8cdfbdf88 net: systemport: Fix coalescing settings handling
There were a number of issues with setting the RX coalescing parameters:

- we would not be preserving values that would have been configured
  across close/open calls, instead we would always reset to no timeout
  and 1 interrupt per packet, this would also prevent DIM from setting its
  default usec/pkts values

- when adaptive RX would be turned on, we woud not be fetching the
  default parameters, we would stay with no timeout/1 packet per
  interrupt until the estimator kicks in and changes that

- finally disabling adaptive RX coalescing while providing parameters
  would not be honored, and we would stay with whatever DIM had
  previously determined instead of the user requested parameters

Fixes: b6e0e87542 ("net: systemport: Implement adaptive interrupt coalescing")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:03:36 -04:00
Florian Fainelli
fd41f2bfb7 net: systemport: Remove adaptive TX coalescing
Adaptive TX coalescing is not currently giving us any advantages and
ends up making the CPU spin more frequently until TX completion. Deny
and disable adaptive TX coalescing for now and rely on static
configuration, we can always add it back later.

Reviewed-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 10:03:35 -04:00
Gal Pressman
9daae9bd47 net: Call add/kill vid ndo on vlan filter feature toggling
NETIF_F_HW_VLAN_[CS]TAG_FILTER features require more than just a bit
flip in dev->features in order to keep the driver in a consistent state.
These features notify the driver of each added/removed vlan, but toggling
of vlan-filter does not notify the driver accordingly for each of the
existing vlans.

This patch implements a similar solution to NETIF_F_RX_UDP_TUNNEL_PORT
behavior (which notifies the driver about UDP ports in the same manner
that vids are reported).

Each toggling of the features propagates to the 8021q module, which
iterates over the vlans and call add/kill ndo accordingly.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 09:58:59 -04:00
Wei Yongjun
004c3cf1a1 cxgb4: fix error return code in adap_init0()
Fix to return a negative error code from the hash filter init error
handling case instead of 0, as done elsewhere in this function.

Fixes: 5c31254e35 ("cxgb4: initialize hash-filter configuration")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 09:44:29 -04:00
Aneesh Kumar K.V
872a100a49 powerpc/mm/hash: Don't memset pgd table if not needed
We need to zero-out pgd table only if we share the slab cache with
pud/pmd level caches. With the support of 4PB, we don't share the slab
cache anymore. Instead of removing the code completely hide it within
an #ifdef. We don't need to do this with any other page table level,
because they all allocate table of double the size and we take of
initializing the first half corrrectly during page table zap.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Consolidate multiple #if / #ifdef into one]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:39 +11:00
Aneesh Kumar K.V
c2b4d8b741 powerpc/mm/hash64: Increase the VA range
This patch increases the max virtual (effective) address value to 4PB.
With 4K page size config we continue to limit ourself to 64TB.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Keep the H_PGTABLE_RANGE test, update it to work]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:38 +11:00
Aneesh Kumar K.V
f384796c40 powerpc/mm: Add support for handling > 512TB address in SLB miss
For addresses above 512TB we allocate additional mmu contexts. To make
it all easy, addresses above 512TB are handled with IR/DR=1 and with
stack frame setup.

The mmu_context_t is also updated to track the new extended_ids. To
support upto 4PB we need a total 8 contexts.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Minor formatting tweaks and comment wording, switch BUG to WARN
      in get_ea_context().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:38 +11:00
Aneesh Kumar K.V
0dea04b288 powerpc/mm/slice: Consolidate return path in slice_get_unmapped_area()
In a following patch, on finding a free area we will need to do
allocatinon of extra contexts as needed. Consolidating the return path
for slice_get_unmapped_area() will make that easier.

Split into a separate patch to make review easy.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:37 +11:00
Aneesh Kumar K.V
1a2f778970 powerpc/mm/keys: Move pte bits to correct headers
Memory keys are supported only with hash translation mode. Instead of
using #ifdef in generic code move the key related pte bits to
respective headers

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:36 +11:00
Frederic Barrat
16b19f1a03 powerpc/xive: Fix wrong xmon output caused by typo
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:36 +11:00
Aaro Koskinen
e283655b5a drivers: macintosh: rack-meter: really fix bogus memsets
We should zero an array using sizeof instead of number of elements.

Fixes the following compiler (GCC 7.3.0) warnings:

drivers/macintosh/rack-meter.c: In function 'rackmeter_do_pause':
drivers/macintosh/rack-meter.c:157:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
drivers/macintosh/rack-meter.c:158:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]

Fixes: 4f7bef7a9f ("drivers: macintosh: rack-meter: fix bogus memsets")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:35 +11:00
Nicholas Piggin
0bfdf59890 powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
asm/barrier.h is not always included after asm/synch.h, which meant
it was missing __SUBARCH_HAS_LWSYNC, so in some files smp_wmb() would
be eieio when it should be lwsync. kernel/time/hrtimer.c is one case.

__SUBARCH_HAS_LWSYNC is only used in one place, so just fold it in
to where it's used. Previously with my small simulator config, 377
instances of eieio in the tree. After this patch there are 55.

Fixes: 46d075be58 ("powerpc: Optimise smp_wmb")
Cc: stable@vger.kernel.org # v2.6.29+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:34 +11:00
Wei Yongjun
9a2c1d31e6 powerpc/4xx: Fix error return code in ppc4xx_msi_probe()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
[mpe: Add missing ';' to make it compile]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:34 +11:00
Ram Pai
f208638680 powerpc/mm: Fix thread_pkey_regs_init()
thread_pkey_regs_init() initializes the pkey related registers
instead of initializing the fields in the task structures.  Fortunately
those key related registers are re-set to zero when the task
gets scheduled on the cpu. However its good to fix this glaringly
visible error.

Fixes: 06bb53b338 ("powerpc: store and restore the pkey state across context switches")
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:33 +11:00
Naveen N. Rao
e6e133c47e powerpc/kprobes: Fix call trace due to incorrect preempt count
Michael Ellerman reported the following call trace when running
ftracetest:

  BUG: using __this_cpu_write() in preemptible [00000000] code: ftracetest/6178
  caller is opt_pre_handler+0xc4/0x110
  CPU: 1 PID: 6178 Comm: ftracetest Not tainted 4.15.0-rc7-gcc6x-gb2cd1df #1
  Call Trace:
  [c0000000f9ec39c0] [c000000000ac4304] dump_stack+0xb4/0x100 (unreliable)
  [c0000000f9ec3a00] [c00000000061159c] check_preemption_disabled+0x15c/0x170
  [c0000000f9ec3a90] [c000000000217e84] opt_pre_handler+0xc4/0x110
  [c0000000f9ec3af0] [c00000000004cf68] optimized_callback+0x148/0x170
  [c0000000f9ec3b40] [c00000000004d954] optinsn_slot+0xec/0x10000
  [c0000000f9ec3e30] [c00000000004bae0] kretprobe_trampoline+0x0/0x10

This is showing up since OPTPROBES is now enabled with CONFIG_PREEMPT.

trampoline_probe_handler() considers itself to be a special kprobe
handler for kretprobes. In doing so, it expects to be called from
kprobe_handler() on a trap, and re-enables preemption before returning a
non-zero return value so as to suppress any subsequent processing of the
trap by the kprobe_handler().

However, with optprobes, we don't deal with special handlers (we ignore
the return code) and just try to re-enable preemption causing the above
trace.

To address this, modify trampoline_probe_handler() to not be special.
The only additional processing done in kprobe_handler() is to emulate
the instruction (in this case, a 'nop'). We adjust the value of
regs->nip for the purpose and delegate the job of re-enabling
preemption and resetting current kprobe to the probe handlers
(kprobe_handler() or optimized_callback()).

Fixes: 8a2d71a3f2 ("powerpc/kprobes: Disable preemption before invoking probe handler for optprobes")
Cc: stable@vger.kernel.org # v4.15+
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:33 +11:00
Finn Thain
3a52f6f980 macintosh/adb: Use C99 initializers for struct adb_driver instances
No change to object files.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:32 +11:00
Nicholas Piggin
741de61766 powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
opal_nvram_write currently just assumes success if it encounters an
error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
on other errors instead.

Fixes: 628daa8d5a ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:32 +11:00
Mauricio Faria de Oliveira
0f9bdfe3c7 powerpc/pseries: Fix clearing of security feature flags
The H_CPU_BEHAV_* flags should be checked for in the 'behaviour' field
of 'struct h_cpu_char_result' -- 'character' is for H_CPU_CHAR_*
flags.

Found by playing around with QEMU's implementation of the hypercall:

  H_CPU_CHAR=0xf000000000000000
  H_CPU_BEHAV=0x0000000000000000

  This clears H_CPU_BEHAV_FAVOUR_SECURITY and H_CPU_BEHAV_L1D_FLUSH_PR
  so pseries_setup_rfi_flush() disables 'rfi_flush'; and it also
  clears H_CPU_CHAR_L1D_THREAD_PRIV flag. So there is no RFI flush
  mitigation at all for cpu_show_meltdown() to report; but currently
  it does:

  Original kernel:

    # cat /sys/devices/system/cpu/vulnerabilities/meltdown
    Mitigation: RFI Flush

  Patched kernel:

    # cat /sys/devices/system/cpu/vulnerabilities/meltdown
    Not affected

  H_CPU_CHAR=0x0000000000000000
  H_CPU_BEHAV=0xf000000000000000

  This sets H_CPU_BEHAV_BNDS_CHK_SPEC_BAR so cpu_show_spectre_v1() should
  report vulnerable; but currently it doesn't:

  Original kernel:

    # cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
    Not affected

  Patched kernel:

    # cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
    Vulnerable

Brown-paper-bag-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: f636c14790 ("powerpc/pseries: Set or clear security feature flags")
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:10:31 +11:00
Nicholas Piggin
29ab6c4708 powerpc/mm: Pass node id into create_section_mapping
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move __map_kernel_page_nid() inside #ifdef SPARSEMEM_VMEMMAP]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:07:10 +11:00
Nicholas Piggin
2ad452ffaa powerpc/64s/radix: Allocate kernel page tables node-local if possible
Try to allocate kernel page tables for direct mapping and vmemmap
according to the node of the memory they will map. The node is not
available for the linear map in early boot, so use range allocation
to allocate the page tables from the region they map, which is
effectively node-local.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix build error in radix__create_section_mapping()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:07:09 +11:00
Nicholas Piggin
0633dafcf8 powerpc/64s/radix: Split early page table mapping to its own function
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:07:09 +11:00
Nicholas Piggin
f3865f9a71 powerpc/64: Allocate per-cpu stacks node-local if possible
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:07:08 +11:00
Nicholas Piggin
4890aea65a powerpc/64: Allocate pacas per node
Per-node allocations are possible on 64s with radix that does
not have the bolted SLB limitation.

Hash would be able to do the same if all CPUs had the bottom of
their node-local memory bolted as well. This is left as an
exercise for the reader.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add dummy definition of boot_cpuid for !SMP]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31 00:06:44 +11:00
Nicholas Piggin
59f577743d powerpc/64: Defer paca allocation until memory topology is discovered
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Rename the dummy allocate_pacas() to fix 32-bit build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:28 +11:00
Nicholas Piggin
9f593f131e powerpc/setup: Add cpu_to_phys_id array
Build an array that finds hardware CPU number from logical CPU
number in firmware CPU discovery. Use that rather than setting
paca of other CPUs directly, to begin with. Subsequent patch will
not have pacas allocated at this point.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix SMP=n build by adding #ifdef in arch_match_cpu_phys_id()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:27 +11:00
Nicholas Piggin
c0abd0c745 powerpc/64: move default SPR recording
Move this into the early setup code, and don't iterate over CPU masks.
We don't want to call into sysfs so early from setup, and a future patch
won't initialize CPU masks by the time this is called.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold in incremental fix from Nick for DSCR handling]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:26 +11:00
Nicholas Piggin
9bd9be006c powerpc/mm/numa: move numa topology discovery earlier
Split sparsemem initialisation from basic numa topology discovery.
Move the parsing earlier in boot, before pacas are allocated.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:26 +11:00
Nicholas Piggin
b575454fa3 mm: make memblock_alloc_base_nid() non-static
This will be used by powerpc to allocate per-cpu stacks and other
data structures node-local where possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop stray change to memblock_alloc_range() as noticed by akpm]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:25 +11:00
Nicholas Piggin
384e806784 powerpc/64s: Allocate slb_shadow structures individually
slb_shadow structures are avoided for radix environment.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:24 +11:00
Nicholas Piggin
499dcd4137 powerpc/64s: Allocate LPPACAs individually
We no longer allocate lppacas in an array, so this patch removes the
1kB static alignment for the structure, and enforces the PAPR
alignment requirements at allocation time. We can not reduce the 1kB
allocation size however, due to existing KVM hypervisors.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:24 +11:00
Nicholas Piggin
d2e60075a3 powerpc/64: Use array of paca pointers and allocate pacas individually
Change the paca array into an array of pointers to pacas. Allocate
pacas individually.

This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front then freeing unused.

This is slightly more overhead (one additional indirection) for cross
CPU paca references, but those aren't too common.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:23 +11:00