Commit Graph

1881 Commits

Author SHA1 Message Date
Konstantin Weitz
b31288fa83 s390/kvm: support collaborative memory management
This patch enables Collaborative Memory Management (CMM) for kvm
on s390. CMM allows the guest to inform the host about page usage
(see arch/s390/mm/cmm.c). The host uses this information to avoid
swapping in unused pages in the page fault handler. Further, a CPU
provided list of unused invalid pages is processed to reclaim swap
space of not yet accessed unused pages.

[ Martin Schwidefsky: patch reordering and cleanup ]

Signed-off-by: Konstantin Weitz <konstantin.weitz@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:19 +01:00
Martin Schwidefsky
53e857f308 s390/mm,tlb: race of lazy TLB flush vs. recreation of TLB entries
Git commit 050eef364a "[S390] fix tlb flushing vs. concurrent
/proc accesses" introduced the attach counter to avoid using the
mm_users value to decide between IPTE for every PTE and lazy TLB
flushing with IDTE. That fixed the problem with mm_users but it
introduced another subtle race, fortunately one that is very hard
to hit.
The background is the requirement of the architecture that a valid
PTE may not be changed while it can be used concurrently by another
cpu. The decision between IPTE and lazy TLB flushing needs to be
done while the PTE is still valid. Now if the virtual cpu is
temporarily stopped after the decision to use lazy TLB flushing but
before the invalid bit of the PTE has been set, another cpu can attach
the mm, find that flush_mm is set, do the IDTE, return to userspace,
and recreate a TLB that uses the PTE in question. When the first,
stopped cpu continues it will change the PTE while it is attached on
another cpu. The first cpu will do another IDTE shortly after the
modification of the PTE which makes the race window quite short.

To fix this race the CPU that wants to attach the address space of a
user space thread needs to wait for the end of the PTE modification.
The number of concurrent TLB flushers for an mm is tracked in the
upper 16 bits of the attach_count and finish_arch_post_lock_switch
is used to wait for the end of the flush operation if required.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:18 +01:00
Heiko Carstens
ca04ddbf53 s390/setup: get rid of MACHINE_HAS_MVCOS machine flag
MACHINE_HAS_MVCOS is used exactly once when the machine is brought up.
There is no need to cache the flag in the machine_flags.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:15 +01:00
Heiko Carstens
211deca6bf s390/uaccess: consistent types
The types 'size_t' and 'unsigned long' have been used randomly for the
uaccess functions. This looks rather confusing.
So let's change all functions to use unsigned long instead and get rid
of size_t in order to have a consistent interface.

The only exception is strncpy_from_user() which uses 'long' since it
may return a signed value (-EFAULT).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:15 +01:00
Heiko Carstens
4f41c2b456 s390/uaccess: get rid of indirect function calls
There are only two uaccess variants on s390 left: the version that is used
if the mvcos instruction is available, and the page table walk variant.
So there is no need for expensive indirect function calls.

By default the mvcos variant will be called. If the mvcos instruction is not
available it will call the page table walk variant.

For minimal performance impact the "if (mvcos_is_available)" is implemented
with a jump label, which will be a six byte nop on machines with mvcos.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:14 +01:00
Heiko Carstens
cfa785e623 s390/uaccess: normalize order of parameters of indirect uaccess function calls
For some unknown reason the indirect uaccess functions on s390 implement a
different parameter order than what is usual.

e.g.:

unsigned long copy_to_user(void *to, const void *from, unsigned long n);
vs.
size_t (*copy_to_user)(size_t n, void __user * to, const void *from);

Let's get rid of this confusing parameter reordering.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:13 +01:00
Sebastian Ott
1e53209605 s390/cio: reorder initialization of ccw consoles
Drivers for ccw consoles use ccw_device_probe_console to receive
an initialized ccw device which is already enabled for interrupts.
After that the device driver does the initialization of its private
data. This can race with unsolicited interrupts which can happen
once the device is enabled for interrupts.

Split ccw_device_probe_console into ccw_device_create_console and
ccw_device_enable_console and reorder the initialization of the ccw
console drivers.

While at it mark these functions as __init.

Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:12 +01:00
Sebastian Ott
2253e8d792 s390/cio: fix driver callback initialization for ccw consoles
ccw consoles are in use before they can be properly registered with
the driver core. For devices which are in use by a device driver we
rely on the ccw_device's pointer to the driver callbacks to be valid.
For ccw consoles this pointer is NULL until they are registered later
during boot and we dereferenced this pointer. This worked by
chance on 64 bit builds (cdev->drv was NULL but the optional callback
cdev->drv->path_event was also NULL by coincidence) and was unnoticed
until we received reports about boot failures on 31 bit systems.
Fix it by initializing the driver pointer for ccw consoles.

Cc: <stable@vger.kernel.org> # 3.10+
Reported-by: Mike Frysinger <vapier@gentoo.org>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21 08:50:11 +01:00
Jiri Kosina
d4263348f7 Merge branch 'master' into for-next 2014-02-20 14:54:28 +01:00
Masanari Iida
e227867f12 treewide: Fix typo in Documentation/DocBook
This patch fix spelling typo in Documentation/DocBook.
It is because .html and .xml files are generated by make htmldocs,
I have to fix a typo within the source files.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-02-19 14:58:17 +01:00
Tim Chen
ddf1d169c0 locking/mcs: Allow architecture specific asm files to be used for contended case
This patch allows each architecture to add its specific assembly optimized
arch_mcs_spin_lock_contended and arch_mcs_spinlock_uncontended for
MCS lock and unlock functions.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: AswinChandramouleeswaran <aswin@hp.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rik vanRiel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: MichelLespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Figo.zhang" <figo1802@gmail.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390347382.3138.67.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 21:18:52 +01:00
Tim Chen
b119fa61d4 locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
We perform a clean up of the Kbuid files in each architecture.
We order the files in each Kbuild in alphabetical order
by running the below script.

for i in arch/*/include/asm/Kbuild
do
        cat $i | gawk '/^generic-y/ {
                i = 3;
                do {
                        for (; i <= NF; i++) {
                                if ($i == "\\") {
                                        getline;
                                        i = 1;
                                        continue;
                                }
                                if ($i != "")
                                        hdr[$i] = $i;
                        }
                        break;
                } while (1);
                next;
        }
        // {
                print $0;
        }
        END {
                n = asort(hdr);
                for (i = 1; i <= n; i++)
                        print "generic-y += " hdr[i];
        }' > ${i}.sorted;
        mv ${i}.sorted $i;
done

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
Cc: AswinChandramouleeswaran <aswin@hp.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Figo.zhang" <figo1802@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: George Spelvin <linux@horizon.com>
Cc: MichelLespinasse <walken@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
[ Fixed build bug. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 21:17:50 +01:00
Tejun Heo
0b60f9ead5 s390: use device_remove_file_self() instead of device_schedule_callback()
driver-core now supports synchrnous self-deletion of attributes and
the asynchrnous removal mechanism is scheduled for removal.  Use it
instead of device_schedule_callback().

* Conversions in arch/s390/pci/pci_sysfs.c and
  drivers/s390/block/dcssblk.c are straightforward.

* drivers/s390/cio/ccwgroup.c is a bit more tricky because
  ccwgroup_notifier() was (ab)using device_schedule_callback() to
  purely obtain a process context to kick off ungroup operation which
  may block from a notifier callback.

  Rename ccwgroup_ungroup_callback() to ccwgroup_ungroup() and make it
  take ccwgroup_device * instead.  The new function is now called
  directly from ccwgroup_ungroup_store().

  ccwgroup_notifier() chain is updated to explicitly bounce through
  ccwgroup_device->ungroup_work.  This also removes possible failure
  from memory pressure.

Only compile-tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: linux-s390@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 15:42:41 -08:00
Paolo Bonzini
f244d910ea Merge tag 'kvm-s390-20140130' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
Two new features are added by this patch set:
- The floating interrupt controller (flic) that allows us to inject,
  clear and inspect non-vcpu local interrupts. This also gives us an
  opportunity to fix deficiencies in our existing interrupt definitions.
- Support for asynchronous page faults via the pfault mechanism. Testing
  show significant guest performance improvements under host swap.
2014-02-04 04:23:37 +01:00
Linus Torvalds
e2a0f813e0 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
 "Second batch of KVM updates.  Some minor x86 fixes, two s390 guest
  features that need some handling in the host, and all the PPC changes.

  The PPC changes include support for little-endian guests and
  enablement for new POWER8 features"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits)
  x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101
  x86, kvm: cache the base of the KVM cpuid leaves
  kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef
  KVM: PPC: Book3S PR: Cope with doorbell interrupts
  KVM: PPC: Book3S HV: Add software abort codes for transactional memory
  KVM: PPC: Book3S HV: Add new state for transactional memory
  powerpc/Kconfig: Make TM select VSX and VMX
  KVM: PPC: Book3S HV: Basic little-endian guest support
  KVM: PPC: Book3S HV: Add support for DABRX register on POWER7
  KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells
  KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8
  KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs
  KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap
  KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8
  KVM: PPC: Book3S HV: Add handler for HV facility unavailable
  KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8
  KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs
  KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers
  KVM: PPC: Book3S HV: Don't set DABR on POWER8
  kvm/ppc: IRQ disabling cleanup
  ...
2014-01-31 08:37:32 -08:00
Dominik Dingel
536336c216 KVM: async_pf: Exploit one reg interface for pfault
To enable pfault after live migration we need to expose pfault_token,
pfault_select and pfault_compare, as one reg registers to userspace.

So that qemu is able to transfer this between the source and the target.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-01-30 13:11:05 +01:00
Dominik Dingel
3c038e6be0 KVM: async_pf: Async page fault support on s390
This patch enables async page faults for s390 kvm guests.
It provides the userspace API to enable and disable_wait this feature.
The disable_wait will enforce that the feature is off by waiting on it.
Also it includes the diagnose code, called by the guest to enable async page faults.

The async page faults will use an already existing guest interface for this
purpose, as described in "CP Programming Services (SC24-6084)".

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-01-30 13:11:02 +01:00
Dominik Dingel
24eb3a824c KVM: s390: Add FAULT_FLAG_RETRY_NOWAIT for guest fault
In the case of a fault, we will retry to exit sie64 but with gmap fault
indication for this thread set. This makes it possible to handle async
page faults.

Based on a patch from Martin Schwidefsky.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-01-30 12:50:39 +01:00
Jens Freimann
a91b8ebe86 KVM: s390: limit floating irqs
Userspace can flood the kernel with interrupts as of now, so let's
limit the number of pending floating interrupts injected via either
the floating interrupt controller or the KVM_S390_INTERRUPT ioctl.

We can have up to 4*64k pending subchannels + 8 adapter interrupts,
as well as up to ASYNC_PF_PER_VCPU*KVM_MAX_VCPUS pfault done interrupts.
There are also sclp and machine checks. This gives us
(4*65536+8+64*64+1+1) = 266250 interrupts.

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-01-30 10:25:23 +01:00
Jens Freimann
c05c4186bb KVM: s390: add floating irq controller
This patch adds a floating irq controller as a kvm_device.
It will be necessary for migration of floating interrupts as well
as for hardening the reset code by allowing user space to explicitly
remove all pending floating interrupts.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-01-30 10:25:20 +01:00
Jens Freimann
81aa8efe01 KVM: s390: add and extend interrupt information data structs
With the currently available struct kvm_s390_interrupt it is not possible to
inject every kind of interrupt as defined in the z/Architecture. Add
additional interruption parameters to the structures and move it to kvm.h

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-01-29 22:00:41 +01:00
Linus Torvalds
e770d73ceb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 patches from Martin Schwidefsky:
 "A new binary interface to be able to query and modify the LPAR
  scheduler weight and cap settings.  Some improvements for the hvc
  terminal over iucv and a couple of bux fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/hypfs: add interface for diagnose 0x304
  s390: wire up sys_sched_setattr/sys_sched_getattr
  s390/uapi: fix struct statfs64 definition
  s390/uaccess: remove dead extern declarations, make functions static
  s390/uaccess: test if current->mm is set before walking page tables
  s390/zfcpdump: make zfcpdump depend on 64BIT
  s390/32bit: fix cmpxchg64
  s390/xpram: don't modify module parameters
  s390/zcrypt: remove zcrypt kmsg documentation again
  s390/hvc_iucv: Automatically assign free HVC terminal devices
  s390/hvc_iucv: Display connection details through device attributes
  s390/hvc_iucv: fix sparse warning
  s390/vmur: Link parent CCW device during UR device creation
2014-01-28 09:02:24 -08:00
Linus Torvalds
4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
Martin Schwidefsky
07be038209 s390/hypfs: add interface for diagnose 0x304
To provide access to the set-partition-resource-parameter interface
to user space add a new attribute to hypfs/debugfs:
 * s390_hypsfs/diag_304
The data for the query-partition-resource-parameters command can
be access by a read on the attribute. All other diagnose 0x304
requests need to be submitted via ioctl with CAP_SYS_ADMIN rights.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-24 09:40:59 +01:00
Paolo Bonzini
c760f5e29d Merge tag 'kvm-s390-20140117' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-queue
This deals with 2 guest features that need enablement in the kvm host:
- transactional execution
- lpp sampling support

In addition there is also a fix to the virtio-ccw guest driver. This will
enable future features
2014-01-23 11:38:13 +01:00
Linus Torvalds
7ebd3faa9b Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "First round of KVM updates for 3.14; PPC parts will come next week.

  Nothing major here, just bugfixes all over the place.  The most
  interesting part is the ARM guys' virtualized interrupt controller
  overhaul, which lets userspace get/set the state and thus enables
  migration of ARM VMs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits)
  kvm: make KVM_MMU_AUDIT help text more readable
  KVM: s390: Fix memory access error detection
  KVM: nVMX: Update guest activity state field on L2 exits
  KVM: nVMX: Fix nested_run_pending on activity state HLT
  KVM: nVMX: Clean up handling of VMX-related MSRs
  KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
  KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
  KVM: nVMX: Leave VMX mode on clearing of feature control MSR
  KVM: VMX: Fix DR6 update on #DB exception
  KVM: SVM: Fix reading of DR6
  KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
  add support for Hyper-V reference time counter
  KVM: remove useless write to vcpu->hv_clock.tsc_timestamp
  KVM: x86: fix tsc catchup issue with tsc scaling
  KVM: x86: limit PIT timer frequency
  KVM: x86: handle invalid root_hpa everywhere
  kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub
  kvm: vfio: silence GCC warning
  KVM: ARM: Remove duplicate include
  arm/arm64: KVM: relax the requirements of VMA alignment for THP
  ...
2014-01-22 21:40:43 -08:00
Heiko Carstens
fded4329da s390: wire up sys_sched_setattr/sys_sched_getattr
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-22 14:02:19 +01:00
Heiko Carstens
4e078146df s390/uapi: fix struct statfs64 definition
With b8668fd0a7 "s390/uapi: change struct statfs[64] member types
to unsigned values" the size of a couple of struct statfs64 member got
incorrectly changed from 64 to 32 bit for 32 bit builds.

Fix this by changing the type of couple of struct statfs64 members from
unsigned long to unsigned long long.
The definition of struct compat_statfs64 was correct however.

Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-22 14:02:18 +01:00
Heiko Carstens
38ea1f358b s390/32bit: fix cmpxchg64
Fix broken inline assembly contraints for cmpxchg64 on 32bit.

Fixes this crash:

specification exception: 0006 [#1] SMP
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0 #4
task: 005a16c8 ti: 00592000 task.ti: 00592000
Krnl PSW : 070ce000 8029abd6 (lockref_get+0x3e/0x9c)
...
Krnl Code: 8029abcc: a71a0001           ahi     %r1,1
           8029abd0: 1852               lr      %r5,%r2
          #8029abd2: bb40f064           cds     %r4,%r0,100(%r15)
          >8029abd6: 1943               cr      %r4,%r3
           8029abd8: 1815               lr      %r1,%r5
Call Trace:
([<0000000078e01870>] 0x78e01870)
 [<000000000021105a>] sysfs_mount+0xd2/0x1c8
 [<00000000001b551e>] mount_fs+0x3a/0x134
 [<00000000001ce768>] vfs_kern_mount+0x44/0x11c
 [<00000000001ce864>] kern_mount_data+0x24/0x3c
 [<00000000005cc4b8>] sysfs_init+0x74/0xd4
 [<00000000005cb5b4>] mnt_init+0xe0/0x1fc
 [<00000000005cb16a>] vfs_caches_init+0xb6/0x14c
 [<00000000005be794>] start_kernel+0x318/0x33c
 [<000000000010001c>] _stext+0x1c/0x80

Reported-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-22 14:02:15 +01:00
Linus Torvalds
6ffbe7d1fa Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking changes from Ingo Molnar:
 - futex performance increases: larger hashes, smarter wakeups
 - mutex debugging improvements
 - lots of SMP ordering documentation updates
 - introduce the smp_load_acquire(), smp_store_release() primitives.
   (There are WIP patches that make use of them - not yet merged)
 - lockdep micro-optimizations
 - lockdep improvement: better cover IRQ contexts
 - liblockdep at last. We'll continue to monitor how useful this is

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  futexes: Fix futex_hashsize initialization
  arch: Re-sort some Kbuild files to hopefully help avoid some conflicts
  futexes: Avoid taking the hb->lock if there's nothing to wake up
  futexes: Document multiprocessor ordering guarantees
  futexes: Increase hash table size for better performance
  futexes: Clean up various details
  arch: Introduce smp_load_acquire(), smp_store_release()
  arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h
  arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h
  locking/doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
  mutexes: Give more informative mutex warning in the !lock->owner case
  powerpc: Full barrier for smp_mb__after_unlock_lock()
  rcu: Apply smp_mb__after_unlock_lock() to preserve grace periods
  Documentation/memory-barriers.txt: Downgrade UNLOCK+BLOCK
  locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier
  Documentation/memory-barriers.txt: Document ACCESS_ONCE()
  Documentation/memory-barriers.txt: Prohibit speculative writes
  Documentation/memory-barriers.txt: Add long atomic examples to memory-barriers.txt
  Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt
  Revert "smp/cpumask: Make CONFIG_CPUMASK_OFFSTACK=y usable without debug dependency"
  ...
2014-01-20 10:23:08 -08:00
Linus Torvalds
f479c01c8e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "The bulk of the s390 updates for v3.14.

  New features are the perf support for the CPU-Measurement Sample
  Facility and the EP11 support for the crypto cards.  And the normal
  cleanups and bug-fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits)
  s390/cpum_sf: fix printk format warnings
  s390: Fix misspellings using 'codespell' tool
  s390/qdio: bridgeport support - CHSC part
  s390: delete new instances of __cpuinit usage
  s390/compat: fix PSW32_USER_BITS definition
  s390/zcrypt: add support for EP11 coprocessor cards
  s390/mm: optimize randomize_et_dyn for !PF_RANDOMIZE
  s390: use IS_ENABLED to check if a CONFIG is set to y or m
  s390/cio: use device_lock to synchronize calls to the ccwgroup driver
  s390/cio: use device_lock to synchronize calls to the ccw driver
  s390/cio: fix unlocked access of online member
  s390/cpum_sf: Add flag to process full SDBs only
  s390/cpum_sf: Add raw data sampling to support the diagnostic-sampling function
  s390/cpum_sf: Filter perf events based event->attr.exclude_* settings
  s390/cpum_sf: Detect KVM guest samples
  s390/cpum_sf: Add helper to read TOD from trailer entries
  s390/cpum_sf: Atomically reset trailer entry fields of sample-data-blocks
  s390/cpum_sf: Dynamically extend the sampling buffer if overflows occur
  s390/pci: reenable per default
  s390/pci/dma: fix accounting of allocated_pages
  ...
2014-01-20 09:23:31 -08:00
Michal Sekletar
ea02f9411d net: introduce SO_BPF_EXTENSIONS
For user space packet capturing libraries such as libpcap, there's
currently only one way to check which BPF extensions are supported
by the kernel, that is, commit aa1113d9f8 ("net: filter: return
-EINVAL if BPF_S_ANC* operation is not supported"). For querying all
extensions at once this might be rather inconvenient.

Therefore, this patch introduces a new option which can be used as
an argument for getsockopt(), and allows one to obtain information
about which BPF extensions are supported by the current kernel.

As David Miller suggests, we do not need to define any bits right
now and status quo can just return 0 in order to state that this
versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
additions to BPF extensions need to add their bits to the
bpf_tell_extensions() function, as documented in the comment.

Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 19:08:58 -08:00
Michael Mueller
7feb6bb8e6 KVM: s390: enable Transactional Execution
This patch enables transactional execution for KVM guests
on s390 systems zec12 or later.

We rework the allocation of the page containing the sie_block
to also back the Interception Transaction Diagnostic Block.
If available the TE facilities will be enabled.

Setting bit 73 and 50 in vfacilities bitmask reveals the HW
facilities Transactional Memory and Constraint Transactional
Memory respectively to the KVM guest.

Furthermore, the patch restores the Program-Interruption TDB
from the Interception TDB in case a program interception has
occurred and the ITDB has a valid format.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-01-17 13:12:01 +01:00
Hendrik Brueckner
b4a960159e s390: Fix misspellings using 'codespell' tool
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-16 16:40:13 +01:00
Eugene Crosser
59b55a4df2 s390/qdio: bridgeport support - CHSC part
Introduce function for the "Perform network-subchannel operation"
CHSC command with operation code "bridgeport information",
and bit definitions for "characteristics" pertaning to this command.

Signed-off-by: Eugene Crosser <eugene.crosser@ru.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 14:48:01 -08:00
David S. Miller
0a379e21c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-01-14 14:42:42 -08:00
Eugene Crosser
1c59a861d6 s390/qdio: bridgeport support - CHSC part
Introduce function for the "Perform network-subchannel operation"
CHSC command with operation code "bridgeport information",
and bit definitions for "characteristics" pertaning to this command.

Signed-off-by: Eugene Crosser <eugene.crosser@ru.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-14 15:16:09 +01:00
Heiko Carstens
075dfd8210 s390/compat: fix PSW32_USER_BITS definition
PSW32_USER_BITS should define the primary address space for user space
instead of the home address space.
Symptom of this bug is that gdb doesn't work in compat mode.

The bug was introduced with e258d719ff "s390/uaccess: always run the kernel
in home space" and f26946d7ec "s390/compat: make psw32_user_bits a constant
value again".

Cc: stable@vger.kernel.org # v3.13+
Reported-by: Andreas Arnez <arnez@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-13 16:50:21 +01:00
Ingo Molnar
1c62448e39 Merge tag 'v3.13-rc8' into core/locking
Refresh the tree with the latest fixes, before applying new changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 11:44:41 +01:00
Peter Zijlstra
47933ad41a arch: Introduce smp_load_acquire(), smp_store_release()
A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads.  Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.

This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation.  The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specifed store against any prior reads or
writes.  These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.

In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively resulted in substantial
improvements in readability.  It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits.  It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.

[Changelog by PaulMck]

Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 10:37:17 +01:00
David S. Miller
143c905494 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c
	drivers/net/macvtap.c

Both minor merge hassles, simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:42:06 -05:00
Ingo Tuchscherer
91f3e3eaba s390/zcrypt: add support for EP11 coprocessor cards
This feature extends the generic cryptographic device driver (zcrypt)
with a new capability to service EP11 requests for the Crypto Express4S
card in EP11 (Enterprise PKCS#11 mode) coprocessor mode.

Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-18 17:37:15 +01:00
Heiko Carstens
d80512f874 s390/smp: improve setup of possible cpu mask
Since under z/VM we cannot have more than 64 cpus, make sure the
cpu_possible_mask does not contain more bits.
This avoids wasting memory for dynamic per-cpu allocations if
CONFIG_NR_CPUS is larger than 64.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-18 17:35:18 +01:00
David S. Miller
e3fec2f74f lib: Add missing arch generic-y entries for asm-generic/hash.h
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 21:26:19 -05:00
Hendrik Brueckner
d7528862cf s390/cpum_sf: Add flag to process full SDBs only
Add the PERF_CPUM_SF_FULL_BLOCKS flag to process only sample-data-blocks that
have the block-full-indicator bit set.  Sample-data-blocks that are partially
filled are discarded.  Use this flag if the sampling buffer is likely to be
shared among perf events that use different sampling modes.  In such
environments, flushing sample-data-blocks that are not completely filled, might
cause invalid-data-formats.

Setting PERF_CPUM_SF_FULL_BLOCKS prevents potentially invalid sampling data to
be processed but, in contrast, also discards valid samples in partially filled
sample-data-blocks.  Note that sample-data-blocks might not become full for
small sampling frequencies or for workload that is scheduled for tiny intervals.

To sample with the PERF_CPUM_SF_FULL_BLOCKS flag, set the perf->attr.config1
to 0x0004.  For example:

	perf record -e cpum_sf/config=0xB000,config1=0x0004/

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16 14:38:01 +01:00
Hendrik Brueckner
7e75fc3ff4 s390/cpum_sf: Add raw data sampling to support the diagnostic-sampling function
Also support the diagnostic-sampling function in addition to the basic-sampling
function.  Diagnostic-sampling data entries contain hardware model specific
sampling data and additional programs are required to analyze the data.

To deliver diagnostic-sampling, as well, as basis-sampling data entries to user
space, introduce support for sampling "raw data".  If this particular perf
sampling type (PERF_SAMPLE_RAW) is used, sampling data entries are copied
to user space.  External programs can then analyze these data.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16 14:38:00 +01:00
Hendrik Brueckner
443e802bab s390/cpum_sf: Detect KVM guest samples
The host-program-parameter (hpp) value of basic sample-data-entries designates
a SIE control block that is set by the LPP instruction in sie64a().
Non-zero values indicate guest samples, a value of zero indicates a host sample.

For perf samples, host and guest samples are distinguished using particular
PERF_MISC_* flags.  The perf layer calls perf_misc_flags() to set the flags
based on the pt_regs content.  For each sample-data-entry, the cpum_sf PMU
creates a pt_regs structure with the sample-data information.  An additional
flag structure is added to easily distinguish between host and guest samples.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16 14:37:59 +01:00
Hendrik Brueckner
443d4beb82 s390/cpum_sf: Add helper to read TOD from trailer entries
The trailer entry contains a timestamp of the time when the sample-data-block
became full.  The timestamp specifies a TOD (time-of-day) value in either the
STCK or STCKE format.

Provide a helper function to return the TOD value depending on the setting of
time format indicator.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16 14:37:58 +01:00
Hendrik Brueckner
fcc77f5073 s390/cpum_sf: Atomically reset trailer entry fields of sample-data-blocks
Ensure to reset the sample-data-block full indicator and the overflow counter
at the same time.  This must be done atomically because the sampling hardware
is still active while full sample-data-block is processed.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16 14:37:57 +01:00
Hendrik Brueckner
69f239ed33 s390/cpum_sf: Dynamically extend the sampling buffer if overflows occur
Improve the sampling buffer allocation and add a function to reallocate and
increase the sampling buffer structure.  The number of allocated buffer elements
(sample-data-blocks) are accounted.  You can control the minimum and maximum
number these sample-data-blocks through the cpum_sfb_size kernel parameter.

The number hardware sample overflows (if any) are also accounted and stored
per perf event.  During the PMU disable/enable calls, the accumulated overflow
counter is analyzed and, if necessary, the sampling buffer is dynamically
increased.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-12-16 14:37:57 +01:00