Commit Graph

23030 Commits

Author SHA1 Message Date
Namhyung Kim
4b94f5b7b4 tracing: Add hist trigger 'log2' modifier
Allow users to have numeric fields displayed as log2 values in case
value range is very wide by appending '.log2' to field names.

For example,

  # echo 'hist:key=bytes_req' > kmalloc/trigger
  # cat kmalloc/hist

  { bytes_req:        504 } hitcount:          1
  { bytes_req:         11 } hitcount:          1
  { bytes_req:        104 } hitcount:          1
  { bytes_req:         48 } hitcount:          1
  { bytes_req:       2048 } hitcount:          1
  { bytes_req:       4096 } hitcount:          1
  { bytes_req:        240 } hitcount:          1
  { bytes_req:        392 } hitcount:          1
  { bytes_req:         13 } hitcount:          1
  { bytes_req:         28 } hitcount:          1
  { bytes_req:         12 } hitcount:          1
  { bytes_req:         64 } hitcount:          2
  { bytes_req:        128 } hitcount:          2
  { bytes_req:         32 } hitcount:          2
  { bytes_req:          8 } hitcount:         11
  { bytes_req:         10 } hitcount:         13
  { bytes_req:         24 } hitcount:         25
  { bytes_req:        160 } hitcount:         29
  { bytes_req:         16 } hitcount:         33
  { bytes_req:         80 } hitcount:         36

When using '.log2' modifier, the output looks like:

  # echo 'hist:key=bytes_req.log2' > kmalloc/trigger
  # cat kmalloc/hist

  { bytes_req: ~ 2^12 } hitcount:          1
  { bytes_req: ~ 2^11 } hitcount:          1
  { bytes_req: ~ 2^9  } hitcount:          2
  { bytes_req: ~ 2^6  } hitcount:          3
  { bytes_req: ~ 2^3  } hitcount:         13
  { bytes_req: ~ 2^5  } hitcount:         19
  { bytes_req: ~ 2^8  } hitcount:         49
  { bytes_req: ~ 2^7  } hitcount:         57
  { bytes_req: ~ 2^4  } hitcount:         74

Link: http://lkml.kernel.org/r/7ff396b246c6a881f46b979735fddf05a0d6c71a.1457029949.git.tom.zanussi@linux.intel.com

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 18:56:03 -04:00
Tom Zanussi
5463bfda32 tracing: Add support for named hist triggers
Allow users to define 'named' hist triggers.  All triggers created
with the same 'name=xxx' option will update the same shared histogram
data.

This expands the hist trigger syntax from this:

    # echo hist:keys=xxx ... [ if filter] > event/trigger

to this:

    # echo hist:name=xxx:keys=xxx ... [ if filter] > event/trigger

Named histograms must use a 'compatible' set of keys and values, which
means each event added to a set of named triggers must have the same
names and types.

Reading the 'hist' file of any of the participating events will
produce the same output as any other participating event, which is to
be expected since they share the same data.

Link: http://lkml.kernel.org/r/1dbc84ee3322a75daaf5b3ef1d0cc0a2fb682fc7.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 18:56:01 -04:00
Tom Zanussi
db1388b4ff tracing: Add support for named triggers
Named triggers are sets of triggers that share a common set of trigger
data.  An example of functionality that could benefit from this type
of capability would be a set of inlined probes that would each
contribute event counts, for example, to a shared counter data
structure.

The first named trigger registered with a given name owns the common
trigger data that the others subsequently registered with the same
name will reference.  The functions defined here allow users to add,
delete, and find named triggers.

It also adds functions to pause and unpause named triggers; since
named triggers act upon common data, they should also be paused and
unpaused as a group.

Link: http://lkml.kernel.org/r/c09ff648360f65b10a3e321eddafe18060b4a04f.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 18:56:00 -04:00
Tom Zanussi
52a7f16ded tracing: Add support for multiple hist triggers per event
Allow users to define any number of hist triggers per trace event.
Any number of hist triggers may be added for a given event, which may
differ by key, value, or filter.

Reading the event's 'hist' file will display the output of all the
hist triggers defined on an event concatenated in the order they were
defined.

Link: http://lkml.kernel.org/r/48a0c8dd34c344571de880fb35e211c6d9a28961.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 18:55:59 -04:00
Tom Zanussi
d0bad49bb0 tracing: Add enable_hist/disable_hist triggers
Similar to enable_event/disable_event triggers, these triggers enable
and disable the aggregation of events into maps rather than enabling
and disabling their writing into the trace buffer.

They can be used to automatically start and stop hist triggers based
on a matching filter condition.

If there's a paused hist trigger on system:event, the following would
start it when the filter condition was hit:

  # echo enable_hist:system:event [ if filter] > event/trigger

And the following would disable a running system:event hist trigger:

  # echo disable_hist:system:event [ if filter] > event/trigger

See Documentation/trace/events.txt for real examples.

Link: http://lkml.kernel.org/r/f812f086e52c8b7c8ad5443487375e03c96a601f.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 18:55:57 -04:00
Tom Zanussi
6a475cb17f tracing: Remove restriction on string position in hist trigger keys
If we assume the maximum size for a string field, we don't have to
worry about its position.  Since we only allow two keys in a compound
key and having more than one string key in a given compound key
doesn't make much sense anyway, trading a bit of extra space instead
of introducing an arbitrary restriction makes more sense.

We also need to use the event field size for static strings when
copying the contents, otherwise we get random garbage in the key.

Also, cast string return values to avoid warnings on 32-bit compiles.

Finally, rearrange the code without changing any functionality by
moving the compound key updating code into a separate function.

Link: http://lkml.kernel.org/r/8976e1ab04b66bc2700ad1ed0768a2de85ac1983.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 18:55:56 -04:00
Namhyung Kim
79e577cbce tracing: Support string type key properly
The string in a trace event is usually recorded as dynamic array which
is variable length.  But current hist code only support fixed length
array so it cannot support most strings.

This patch fixes it by checking filter_type of the field and get
proper pointer with it.  With this, it can get a histogram of exec()
based on filenames like below:

  # cd /sys/kernel/tracing/events/sched/sched_process_exec
  # cat 'hist:key=filename' > trigger
  # ps
   PID TTY       TIME CMD
     1 ?     00:00:00 init
    29 ?     00:00:00 sh
    38 ?     00:00:00 ps
  # ls
  enable  filter  format  hist  id  trigger
  # cat hist
  # trigger info: hist:keys=filename:vals=hitcount:sort=hitcount:size=2048 [active]

  { filename: /usr/bin/ps                         } hitcount:          1
  { filename: /usr/bin/ls                         } hitcount:          1
  { filename: /usr/bin/cat                        } hitcount:          1

  Totals:
      Hits: 3
      Entries: 3
      Dropped: 0

Link: http://lkml.kernel.org/r/610180d6df0cfdf11ee205452f3b241dea657233.1457029949.git.tom.zanussi@linux.intel.com

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ Added (unsigned long) typecast to fix compile warning ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 18:55:00 -04:00
Tom Zanussi
69a0200c2e tracing: Add hist trigger support for stacktraces as keys
It's often useful to be able to use a stacktrace as a hash key, for
keeping a count of the number of times a particular call path resulted
in a trace event, for instance.  Add a special key named 'stacktrace'
which can be used as key in a 'keys=' param for this purpose:

    # echo hist:keys=stacktrace ... \
               [ if filter] > event/trigger

Link: http://lkml.kernel.org/r/87515e90b3785232a874a12156174635a348edb1.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:19:01 -04:00
Tom Zanussi
316961988b tracing: Add hist trigger 'syscall' modifier
Allow users to have syscall id fields displayed as syscall names in
the output by appending '.syscall' to field names:

   # echo hist:keys=aaa.syscall ... \
              [ if filter] > event/trigger

Link: http://lkml.kernel.org/r/2bab1e59933d76a14b545bd2e02f80b8b08ac4d3.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:18:04 -04:00
Tom Zanussi
6b4827ad02 tracing: Add hist trigger 'execname' modifier
Allow users to have common_pid field values displayed as program names
in the output by appending '.execname' to a common_pid field name:

   # echo hist:keys=common_pid.execname ... \
              [ if filter] > event/trigger

Link: http://lkml.kernel.org/r/e172e81f10f5b8d1f08450e3763c850f39fbf698.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:17:56 -04:00
Tom Zanussi
c6afad49d1 tracing: Add hist trigger 'sym' and 'sym-offset' modifiers
Allow users to have address fields displayed as symbols in the output
by appending '.sym' or 'sym-offset' to field names:

   # echo hist:keys=aaa.sym,bbb.sym-offset ... \
              [ if filter] > event/trigger

Link: http://lkml.kernel.org/r/87d4935821491c0275513f0fbfb9bab8d3d3f079.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:17:51 -04:00
Tom Zanussi
0c4a6b4666 tracing: Add hist trigger 'hex' modifier for displaying numeric fields
Allow users to have numeric fields displayed as hex values in the
output by appending '.hex' to field names:

   # echo hist:keys=aaa,bbb.hex:vals=ccc.hex ... \
              [ if filter] > event/trigger

Link: http://lkml.kernel.org/r/67bd431edda2af5798d7694818f7e8d71b6b3463.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:17:43 -04:00
Tom Zanussi
e86ae9baac tracing: Add hist trigger support for clearing a trace
Allow users to append 'clear' to an existing trigger in order to have
the hash table cleared.

This expands the hist trigger syntax from this:
    # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont \
           [ if filter] >> event/trigger

to this:

    # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont/clear \
          [ if filter] >> event/trigger

Link: http://lkml.kernel.org/r/ae15dd0d9b2f7af07a37c1ff682063e2dbcdf160.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:17:35 -04:00
Tom Zanussi
83e99914c9 tracing: Add hist trigger support for pausing and continuing a trace
Allow users to append 'pause' or 'continue' to an existing trigger in
order to have it paused or to have a paused trace continue.

This expands the hist trigger syntax from this:
    # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \
          [ if filter] >> event/trigger

to this:

    # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause or cont \
          [ if filter] >> event/trigger

Link: http://lkml.kernel.org/r/b672a92c14702cb924cdf6fc27ea1809bed04907.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:17:29 -04:00
Tom Zanussi
e62347d245 tracing: Add hist trigger support for user-defined sorting ('sort=' param)
Allow users to specify keys and/or values to sort on.  With this
addition, keys and values specified using the 'keys=' and 'vals='
keywords can be used to sort the hist trigger output via a new 'sort='
keyword.  If multiple sort keys are specified, the output will be
sorted using the second key as a secondary sort key, etc.  The default
sort order is ascending; if the user wants a different sort order,
'.descending' can be appended to the specific sort key.  Before this
addition, output was always sorted by 'hitcount' in ascending order.

This expands the hist trigger syntax from this:

    # echo hist:keys=xxx:vals=yyy \
          [ if filter] > event/trigger

to this:

    # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \
          [ if filter] > event/trigger

Link: http://lkml.kernel.org/r/b30a41db66ba486979c4f987aff5fab500ea53b3.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:17:19 -04:00
Tom Zanussi
76a3b0c8ac tracing: Add hist trigger support for compound keys
Allow users to specify multiple trace event fields to use in keys by
allowing multiple fields in the 'keys=' keyword.  With this addition,
any unique combination of any of the fields named in the 'keys'
keyword will result in a new entry being added to the hash table.

Link: http://lkml.kernel.org/r/0cfa24e6ac3b0dcece7737d94aa1f322ae3afc4b.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:16:33 -04:00
Tom Zanussi
f2606835d7 tracing: Add hist trigger support for multiple values ('vals=' param)
Allow users to specify trace event fields to use in aggregated sums
via a new 'vals=' keyword.  Before this addition, the only aggregated
sum supported was the implied value 'hitcount'.  With this addition,
'hitcount' is also supported as an explicit value field, as is any
numeric trace event field.

This expands the hist trigger syntax from this:

  # echo hist:keys=xxx [ if filter] > event/trigger

to this:

  # echo hist:keys=xxx:vals=yyy [ if filter] > event/trigger

Link: http://lkml.kernel.org/r/2a5d1adb5ba6c65d7bb2148e379f2fed47f29a68.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:16:23 -04:00
Tom Zanussi
7ef224d1d0 tracing: Add 'hist' event trigger command
'hist' triggers allow users to continually aggregate trace events,
which can then be viewed afterwards by simply reading a 'hist' file
containing the aggregation in a human-readable format.

The basic idea is very simple and boils down to a mechanism whereby
trace events, rather than being exhaustively dumped in raw form and
viewed directly, are automatically 'compressed' into meaningful tables
completely defined by the user.

This is done strictly via single-line command-line commands and
without the aid of any kind of programming language or interpreter.

A surprising number of typical use cases can be accomplished by users
via this simple mechanism.  In fact, a large number of the tasks that
users typically do using the more complicated script-based tracing
tools, at least during the initial stages of an investigation, can be
accomplished by simply specifying a set of keys and values to be used
in the creation of a hash table.

The Linux kernel trace event subsystem happens to provide an extensive
list of keys and values ready-made for such a purpose in the form of
the event format files associated with each trace event.  By simply
consulting the format file for field names of interest and by plugging
them into the hist trigger command, users can create an endless number
of useful aggregations to help with investigating various properties
of the system.  See Documentation/trace/events.txt for examples.

hist triggers are implemented on top of the existing event trigger
infrastructure, and as such are consistent with the existing triggers
from a user's perspective as well.

The basic syntax follows the existing trigger syntax.  Users start an
aggregation by writing a 'hist' trigger to the event of interest's
trigger file:

  # echo hist:keys=xxx [ if filter] > event/trigger

Once a hist trigger has been set up, by default it continually
aggregates every matching event into a hash table using the event key
and a value field named 'hitcount'.

To view the aggregation at any point in time, simply read the 'hist'
file in the same directory as the 'trigger' file:

  # cat event/hist

The detailed syntax provides additional options for user control, and
is described exhaustively in Documentation/trace/events.txt and in the
virtual tracing/README file in the tracing subsystem.

Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:16:14 -04:00
Tom Zanussi
3b772b96b8 tracing: Update some tracing_map constants and comments
Make it clear exactly how many keys and values are supported through
better defines, and add 1 to the vals count, since normally clients
want support for at least a hitcount and two other values.

Also, note the error return value for tracing_map_add_key/val_field()
in the comments.

Link: http://lkml.kernel.org/r/6696fa02ebc716aa344c27a571a2afaa25e5b4d4.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:16:06 -04:00
Steven Rostedt (Red Hat)
8d44f2f34f tracing: Fix TRACING_MAP Kconfig
The config option for TRACING_MAP has "default n", which is not needed
because the default of configs is 'n'.

Also, since the TRACING_MAP has no config prompt, there's no reason to
include "If in doubt, say N" in the help text.

Fixed a typo in the comments of tracing_map.h.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:15:54 -04:00
Tom Zanussi
08d43a5fa0 tracing: Add lock-free tracing_map
Add tracing_map, a special-purpose lock-free map for tracing.

tracing_map is designed to aggregate or 'sum' one or more values
associated with a specific object of type tracing_map_elt, which
is associated by the map to a given key.

It provides various hooks allowing per-tracer customization and is
separated out into a separate file in order to allow it to be shared
between multiple tracers, but isn't meant to be generally used outside
of that context.

The tracing_map implementation was inspired by lock-free map
algorithms originated by Dr. Cliff Click:

 http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable
 http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf

Link: http://lkml.kernel.org/r/b43d68d1add33582a396f553c8ef705a33a6a748.1449767187.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:04:59 -04:00
Steven Rostedt
c37775d578 tracing: Add infrastructure to allow set_event_pid to follow children
Add the infrastructure needed to have the PIDs in set_event_pid to
automatically add PIDs of the children of the tasks that have their PIDs in
set_event_pid. This will also remove PIDs from set_event_pid when a task
exits

This is implemented by adding hooks into the fork and exit tracepoints. On
fork, the PIDs are added to the list, and on exit, they are removed.

Add a new option called event_fork that when set, PIDs in set_event_pid will
automatically get their children PIDs added when they fork, as well as any
task that exits will have its PID removed from set_event_pid.

This works for instances as well.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 10:28:28 -04:00
Steven Rostedt
f4d34a87e9 tracing: Use pid bitmap instead of a pid array for set_event_pid
In order to add the ability to let tasks that are filtered by the events
have their children also be traced on fork (and then not traced on exit),
convert the array into a pid bitmask. Most of the time the number of pids is
only 32768 pids or a 4k bitmask, which is the same size as the default list
currently is, and that list could grow if more pids are listed.

This also greatly simplifies the code.

Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 10:28:27 -04:00
Steven Rostedt
9ebc57cfaa tracing: Rename check_ignore_pid() to ignore_this_task()
The name "check_ignore_pid" is confusing in trying to figure out if the pid
should be ignored or not. Rename it to "ignore_this_task" which is pretty
straight forward, as a task (not a pid) is passed in, and should if true
should be ignored.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 10:28:26 -04:00
Davidlohr Bueso
6687659568 locking/pvqspinlock: Fix division by zero in qstat_read()
While playing with the qstat statistics (in <debugfs>/qlockstat/) I ran into
the following splat on a VM when opening pv_hash_hops:

  divide error: 0000 [#1] SMP
  ...
  RIP: 0010:[<ffffffff810b61fe>]  [<ffffffff810b61fe>] qstat_read+0x12e/0x1e0
  ...
  Call Trace:
    [<ffffffff811cad7c>] ? mem_cgroup_commit_charge+0x6c/0xd0
    [<ffffffff8119750c>] ? page_add_new_anon_rmap+0x8c/0xd0
    [<ffffffff8118d3b9>] ? handle_mm_fault+0x1439/0x1b40
    [<ffffffff811937a9>] ? do_mmap+0x449/0x550
    [<ffffffff811d3de3>] ? __vfs_read+0x23/0xd0
    [<ffffffff811d4ab2>] ? rw_verify_area+0x52/0xd0
    [<ffffffff811d4bb1>] ? vfs_read+0x81/0x120
    [<ffffffff811d5f12>] ? SyS_read+0x42/0xa0
    [<ffffffff815720f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8

Fix this by verifying that qstat_pv_kick_unlock is in fact non-zero,
similarly to what the qstat_pv_latency_wake case does, as if nothing
else, this can come from resetting the statistics, thus having 0 kicks
should be quite valid in this context.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Waiman Long <Waiman.Long@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: waiman.long@hpe.com
Link: http://lkml.kernel.org/r/1460961103-24953-1-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:49:19 +02:00
Arnd Bergmann
266a0a790f bpf: avoid warning for wrong pointer cast
Two new functions in bpf contain a cast from a 'u64' to a
pointer. This works on 64-bit architectures but causes a warning
on all 32-bit architectures:

kernel/trace/bpf_trace.c: In function 'bpf_perf_event_output_tp':
kernel/trace/bpf_trace.c:350:13: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  u64 ctx = *(long *)r1;

This changes the cast to first convert the u64 argument into a uintptr_t,
which is guaranteed to be the same size as a pointer.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 9940d67c93 ("bpf: support bpf_get_stackid() and bpf_perf_event_output() in tracepoint programs")
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-18 20:58:55 -04:00
Michael Ellerman
8404410b29 Merge branch 'topic/livepatch' into next
Merge the support for live patching on ppc64le using mprofile-kernel.
This branch has also been merged into the livepatching tree for v4.7.
2016-04-18 20:45:32 +10:00
Linus Torvalds
ac82a57aff Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixlet from Ingo Molnar:
 "Fixes a build warning on certain Kconfig combinations"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Fix print_collision() unused warning
2016-04-16 15:43:19 -07:00
Jiri Kosina
4d4fb97a62 Merge branch 'topic/livepatch' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into for-4.7/livepatching-ppc64le
Pull livepatching support for ppc64 architecture from Michael Ellerman.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-15 11:42:51 +02:00
Daniel Borkmann
074f528eed bpf: convert relevant helper args to ARG_PTR_TO_RAW_STACK
This patch converts all helpers that can use ARG_PTR_TO_RAW_STACK as argument
type. For tc programs this is bpf_skb_load_bytes(), bpf_skb_get_tunnel_key(),
bpf_skb_get_tunnel_opt(). For tracing, this optimizes bpf_get_current_comm()
and bpf_probe_read(). The check in bpf_skb_load_bytes() for MAX_BPF_STACK can
also be removed since the verifier already makes sure we stay within bounds
on stack buffers.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 21:40:41 -04:00
Daniel Borkmann
435faee1aa bpf, verifier: add ARG_PTR_TO_RAW_STACK type
When passing buffers from eBPF stack space into a helper function, we have
ARG_PTR_TO_STACK argument type for helpers available. The verifier makes sure
that such buffers are initialized, within boundaries, etc.

However, the downside with this is that we have a couple of helper functions
such as bpf_skb_load_bytes() that fill out the passed buffer in the expected
success case anyway, so zero initializing them prior to the helper call is
unneeded/wasted instructions in the eBPF program that can be avoided.

Therefore, add a new helper function argument type called ARG_PTR_TO_RAW_STACK.
The idea is to skip the STACK_MISC check in check_stack_boundary() and color
the related stack slots as STACK_MISC after we checked all call arguments.

Helper functions using ARG_PTR_TO_RAW_STACK must make sure that every path of
the helper function will fill the provided buffer area, so that we cannot leak
any uninitialized stack memory. This f.e. means that error paths need to
memset() the buffers, but the expected fast-path doesn't have to do this
anymore.

Since there's no such helper needing more than at most one ARG_PTR_TO_RAW_STACK
argument, we can keep it simple and don't need to check for multiple areas.
Should in future such a use-case really appear, we have check_raw_mode() that
will make sure we implement support for it first.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 21:40:41 -04:00
Daniel Borkmann
33ff9823c5 bpf, verifier: add bpf_call_arg_meta for passing meta data
Currently, when the verifier checks calls in check_call() function, we
call check_func_arg() for all 5 arguments e.g. to make sure expected types
are correct. In some cases, we collect meta data (here: map pointer) to
perform additional checks such as checking stack boundary on key/value
sizes for subsequent arguments. As we're going to extend the meta data,
add a generic struct bpf_call_arg_meta that we can use for passing into
check_func_arg().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 21:40:41 -04:00
Linus Torvalds
51d7b12041 /proc/iomem: only expose physical resource addresses to privileged users
In commit c4004b02f8 ("x86: remove the kernel code/data/bss resources
from /proc/iomem") I was hoping to remove the phyiscal kernel address
data from /proc/iomem entirely, but that had to be reverted because some
system programs actually use it.

This limits all the detailed resource information to properly
credentialed users instead.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-14 12:56:09 -07:00
Michael Ellerman
28e7cbd3e0 livepatch: Allow architectures to specify an alternate ftrace location
When livepatch tries to patch a function it takes the function address
and asks ftrace to install the livepatch handler at that location.
ftrace will look for an mcount call site at that exact address.

On powerpc the mcount location is not the first instruction of the
function, and in fact it's not at a constant offset from the start of
the function. To accommodate this add a hook which arch code can
override to customise the behaviour.

Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-14 15:47:05 +10:00
Michael Ellerman
04cf31a759 ftrace: Make ftrace_location_range() global
In order to support live patching on powerpc we would like to call
ftrace_location_range(), so make it global.

Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-14 15:47:05 +10:00
Alexei Starovoitov
d82bccc690 bpf/verifier: reject invalid LD_ABS | BPF_DW instruction
verifier must check for reserved size bits in instruction opcode and
reject BPF_LD | BPF_ABS | BPF_DW and BPF_LD | BPF_IND | BPF_DW instructions,
otherwise interpreter will WARN_RATELIMIT on them during execution.

Fixes: ddd872bc30 ("bpf: verifier: add checks for BPF_ABS | BPF_IND instructions")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 01:31:50 -04:00
Anton Blanchard
bd92883051 sched/cpuacct: Check for NULL when using task_pt_regs()
task_pt_regs() can return NULL for kernel threads, so add a check.
This fixes an oops at boot on ppc64.

Reported-and-Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: htejun@gmail.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: tj@kernel.org
Cc: yangds.fnst@cn.fujitsu.com
Link: http://lkml.kernel.org/r/20160406215950.04bc3f0b@kryten
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 13:22:37 +02:00
Daniel Lezcano
2c923e94cd sched/clock: Make local_clock()/cpu_clock() inline
The local_clock/cpu_clock functions were changed to prevent a double
identical test with sched_clock_cpu() when HAVE_UNSTABLE_SCHED_CLOCK
is set. That resulted in one line functions.

As these functions are in all the cases one line functions and in the
hot path, it is useful to specify them as static inline in order to
give a strong hint to the compiler.

After verification, it appears the compiler does not inline them
without this hint. Change those functions to static inline.

sched_clock_cpu() is called via the inlined local_clock()/cpu_clock()
functions from sched.h. So any module code including sched.h will
reference sched_clock_cpu(). Thus it must be exported with the
EXPORT_SYMBOL_GPL macro.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460385514-14700-2-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 12:25:22 +02:00
Daniel Lezcano
c78b17e28c sched/clock: Remove pointless test in cpu_clock/local_clock
In case the HAVE_UNSTABLE_SCHED_CLOCK config is set, the cpu_clock() version
checks if sched_clock_stable() is not set and calls sched_clock_cpu(),
otherwise it calls sched_clock().

sched_clock_cpu() checks also if sched_clock_stable() is set and, if true,
calls sched_clock().

sched_clock() will be called in sched_clock_cpu() if sched_clock_stable() is
true.

Remove the duplicate test by directly calling sched_clock_cpu() and let the
static key act in this function instead.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460385514-14700-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 12:25:22 +02:00
Rabin Vincent
fb90a6e93c sched/debug: Don't dump sched debug info in SysRq-W
sysrq_sched_debug_show() can dump a lot of information.  Don't print out
all that if we're just trying to get a list of blocked tasks (SysRq-W).
The information is still accessible with SysRq-T.

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:23:21 +02:00
Michal Hocko
d47996082f locking/rwsem: Introduce basis for down_write_killable()
Introduce a generic implementation necessary for down_write_killable().

This is a trivial extension of the already existing down_write() call
which can be interrupted by SIGKILL.  This patch doesn't provide
down_write_killable() yet because arches have to provide the necessary
pieces before.

rwsem_down_write_failed() which is a generic slow path for the
write lock is extended to take a task state and renamed to
__rwsem_down_write_failed_common(). The return value is either a valid
semaphore pointer or ERR_PTR(-EINTR).

rwsem_down_write_failed_killable() is exported as a new way to wait for
the lock and be killable.

For rwsem-spinlock implementation the current __down_write() it updated
in a similar way as __rwsem_down_write_failed_common() except it doesn't
need new exports just visible __down_write_killable().

Architectures which are not using the generic rwsem implementation are
supposed to provide their __down_write_killable() implementation and
use rwsem_down_write_failed_killable() for the slow path.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-7-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:42:20 +02:00
Michal Hocko
f8e04d8545 locking/rwsem: Get rid of __down_write_nested()
This is no longer used anywhere and all callers (__down_write()) use
0 as a subclass. Ditch __down_write_nested() to make the code easier
to follow.

This shouldn't introduce any functional change.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:42:16 +02:00
Denys Vlasenko
c003ed9289 locking/lockdep: Deinline register_lock_class(), save 2328 bytes
This function compiles to 1328 bytes of machine code. Three callsites.

Registering a new lock class is definitely not *that* time-critical to inline it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1460141926-13069-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:06:13 +02:00
Ingo Molnar
889fac6d67 Merge tag 'v4.6-rc3' into perf/core, to refresh the tree
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 08:57:03 +02:00
Davidlohr Bueso
c1c33b92db locking/locktorture: Fix NULL pointer dereference for cleanup paths
It has been found that paths that invoke cleanups through
lock_torture_cleanup() can trigger NULL pointer dereferencing
bugs during the statistics printing phase. This is mainly
because we should not be calling into statistics before we are
sure things have been set up correctly.

Specifically, early checks (and the need for handling this in
the cleanup call) only include parameter checks and basic
statistics allocation. Once we start write/read kthreads
we then consider the test as started. As such, update the function
in question to check for cxt.lwsa writer stats, if not set,
we either have a bogus parameter or -ENOMEM situation and
therefore only need to deal with general torture calls.

Reported-and-tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bobby.prani@gmail.com
Cc: dhowells@redhat.com
Cc: dipankar@in.ibm.com
Cc: dvhart@linux.intel.com
Cc: edumazet@google.com
Cc: fweisbec@gmail.com
Cc: jiangshanlai@gmail.com
Cc: josh@joshtriplett.org
Cc: mathieu.desnoyers@efficios.com
Cc: oleg@redhat.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1460476038-27060-2-git-send-email-paulmck@linux.vnet.ibm.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 08:52:23 +02:00
Davidlohr Bueso
1f19093189 locking/locktorture: Fix deboosting NULL pointer dereference
For the case of rtmutex torturing we will randomly call into the
boost() handler, including upon module exiting when the tasks are
deboosted before stopping. In such cases the task may or may not have
already been boosted, and therefore the NULL being explicitly passed
can occur anywhere. Currently we only assume that the task will is
at a higher prio, and in consequence, dereference a NULL pointer.

This patch fixes the case of a rmmod locktorture exploding while
pounding on the rtmutex lock (partial trace):

 task: ffff88081026cf80 ti: ffff880816120000 task.ti: ffff880816120000
 RSP: 0018:ffff880816123eb0  EFLAGS: 00010206
 RAX: ffff88081026cf80 RBX: ffff880816bfa630 RCX: 0000000000160d1b
 RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
 RBP: ffff88081026cf80 R08: 000000000000001f R09: ffff88017c20ca80
 R10: 0000000000000000 R11: 000000000048c316 R12: ffffffffa05d1840
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88203f880000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000008 CR3: 0000000001c0a000 CR4: 00000000000406e0
 Stack:
  ffffffffa05d141d ffff880816bfa630 ffffffffa05d1922 ffff88081e70c2c0
  ffff880816bfa630 ffffffff81095fed 0000000000000000 ffffffff8107bf60
  ffff880816bfa630 ffffffff00000000 ffff880800000000 ffff880816123f08
 Call Trace:
  [<ffffffff81095fed>] kthread+0xbd/0xe0
  [<ffffffff815cf40f>] ret_from_fork+0x3f/0x70

This patch ensures that if the random state pointer is not NULL and current
is not boosted, then do nothing.

 RIP: 0010:[<ffffffffa05c6185>]  [<ffffffffa05c6185>] torture_random+0x5/0x60 [torture]
  [<ffffffffa05d141d>] torture_rtmutex_boost+0x1d/0x90 [locktorture]
  [<ffffffffa05d1922>] lock_torture_writer+0xe2/0x170 [locktorture]

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bobby.prani@gmail.com
Cc: dhowells@redhat.com
Cc: dipankar@in.ibm.com
Cc: dvhart@linux.intel.com
Cc: edumazet@google.com
Cc: fweisbec@gmail.com
Cc: jiangshanlai@gmail.com
Cc: josh@joshtriplett.org
Cc: mathieu.desnoyers@efficios.com
Cc: oleg@redhat.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1460476038-27060-1-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 08:52:23 +02:00
David Howells
a511e1af8b KEYS: Move the point of trust determination to __key_link()
Move the point at which a key is determined to be trustworthy to
__key_link() so that we use the contents of the keyring being linked in to
to determine whether the key being linked in is trusted or not.

What is 'trusted' then becomes a matter of what's in the keyring.

Currently, the test is done when the key is parsed, but given that at that
point we can only sensibly refer to the contents of the system trusted
keyring, we can only use that as the basis for working out the
trustworthiness of a new key.

With this change, a trusted keyring is a set of keys that once the
trusted-only flag is set cannot be added to except by verification through
one of the contained keys.

Further, adding a key into a trusted keyring, whilst it might grant
trustworthiness in the context of that keyring, does not automatically
grant trustworthiness in the context of a second keyring to which it could
be secondarily linked.

To accomplish this, the authentication data associated with the key source
must now be retained.  For an X.509 cert, this means the contents of the
AuthorityKeyIdentifier and the signature data.


If system keyrings are disabled then restrict_link_by_builtin_trusted()
resolves to restrict_link_reject().  The integrity digital signature code
still works correctly with this as it was previously using
KEY_FLAG_TRUSTED_ONLY, which doesn't permit anything to be added if there
is no system keyring against which trust can be determined.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-04-11 22:43:43 +01:00
Alexei Starovoitov
4923ec0b10 bpf: simplify verifier register state assignments
verifier is using the following structure to track the state of registers:
struct reg_state {
    enum bpf_reg_type type;
    union {
        int imm;
        struct bpf_map *map_ptr;
    };
};
and later on in states_equal() does memcmp(&old->regs[i], &cur->regs[i],..)
to find equivalent states.
Throughout the code of verifier there are assignements to 'imm' and 'map_ptr'
fields and it's not obvious that most of the assignments into 'imm' don't
need to clear extra 4 bytes (like mark_reg_unknown_value() does) to make sure
that memcmp doesn't go over junk left from 'map_ptr' assignment.

Simplify the code by converting 'int' into 'long'

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-10 22:43:18 -04:00
Al Viro
fc64005c93 don't bother with ->d_inode->i_sb - it's always equal to ->d_sb
... and neither can ever be NULL

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10 17:11:51 -04:00
David S. Miller
ae95d71261 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00