=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
..   Author:   Steven Rostedt <[email protected]>
..  License:   The GNU Free Documentation License, Version 1.2
..               (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function can have other use cases, such as
live kernel patching or security monitoring. This document describes
how to use ftrace to implement your own function callbacks.

The ftrace context
==================

.. warning::

  The ability to add a callback to almost any function within the
  kernel comes with risks. A callback can be called from any context
  (normal, softirq, irq, and NMI). Callbacks can also be called just before
  going to idle, during CPU bring up and takedown, or going to user space.
  This requires extra care about what can be done inside a callback. A callback
  can be called outside the protective scope of RCU.

There are helper functions to protect against recursion and to make sure
RCU is watching. These are explained below.

The ftrace_ops structure
========================

To register a function callback, an ftrace_ops is required. This structure
is used to tell ftrace what function should be called as the callback
as well as what protections the callback will perform and not require
ftrace to handle.

Only one field needs to be set when registering an ftrace_ops with ftrace:

.. code-block:: c

 struct ftrace_ops ops = {
       .func      = my_callback_func,
       .flags     = MY_FTRACE_FLAGS,
       .private   = any_private_data_structure,
 };

Both .flags and .private are optional. Only .func is required.

To enable tracing call::

    register_ftrace_function(&ops);

To disable tracing call::

    unregister_ftrace_function(&ops);

The above is defined by including the header::

    #include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called is dependent upon architecture and scheduling
of services. The callback itself will have to handle any synchronization if it
must begin at an exact moment.

unregister_ftrace_function() guarantees that the callback is no longer
being called by functions after unregister_ftrace_function() returns.
Note that to provide this guarantee, unregister_ftrace_function()
may take some time to finish.

The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

   void callback_func(unsigned long ip, unsigned long parent_ip,
                      struct ftrace_ops *op, struct pt_regs *regs);

@ip
        This is the instruction pointer of the function that is being traced
        (where the fentry or mcount is within the function).

@parent_ip
        This is the instruction pointer of the function that called the
        function being traced (where the call of the function occurred).

@op
        This is a pointer to the ftrace_ops that was used to register the
        callback. This can be used to pass data to the callback via the
        private pointer.

@regs
        If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
        flags are set in the ftrace_ops structure, then this will be pointing
        to the pt_regs structure like it would be if a breakpoint was placed
        at the start of the function where ftrace was tracing. Otherwise it
        either contains garbage, or NULL.

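
The use of the private pointer described for @op can be sketched as follows.
This is a user-space model, not kernel code: the ``struct ftrace_ops`` below is
a stand-in that carries only the three fields shown in this document, and
``struct hit_stats`` and ``count_hits()`` are hypothetical names introduced
for illustration.

```c
#include <stddef.h>

/*
 * Stand-ins so the sketch builds in user space. In the kernel these types
 * come from <linux/ftrace.h>; only the fields named in this document are
 * modelled here.
 */
struct pt_regs;
struct ftrace_ops;

typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
                              struct ftrace_ops *op, struct pt_regs *regs);

struct ftrace_ops {
        ftrace_func_t func;
        unsigned long flags;
        void *private;
};

/* Hypothetical per-ops state, reached through op->private. */
struct hit_stats {
        unsigned long hits;
        unsigned long last_ip;
        unsigned long last_parent_ip;
};

/*
 * Callback with the prototype shown above. It records which traced
 * function fired and who called it. A real callback can run on many CPUs
 * at once and would need atomic or per-CPU counters; plain increments
 * keep the model short.
 */
void count_hits(unsigned long ip, unsigned long parent_ip,
                struct ftrace_ops *op, struct pt_regs *regs)
{
        struct hit_stats *stats = op->private;

        (void)regs;     /* no SAVE_REGS flag requested, so leave it alone */
        if (!stats)
                return;

        stats->hits++;
        stats->last_ip = ip;
        stats->last_parent_ip = parent_ip;
}
```

In this model the data structure is attached by setting ``.private`` in the
ops initializer next to ``.func``, in the same way as the ftrace_ops example
earlier in this document.
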
Protect your callback
=====================

As functions can be called from anywhere, and it is possible that a function
called by a callback may also be traced, and call that same callback,
recursion protection must be used. There are two helper functions that
can help in this regard. If you start your code with:

.. code-block:: c

   int bit;

   bit = ftrace_test_recursion_trylock(ip, parent_ip);
   if (bit < 0)
           return;

and end it with:

.. code-block:: c

   ftrace_test_recursion_unlock(bit);

The code in between will be safe to use, even if it ends up calling a
function that the callback is tracing. Note, on success,
ftrace_test_recursion_trylock() will disable preemption, and
ftrace_test_recursion_unlock() will enable it again (if it was previously
enabled). The instruction pointer (ip) and its parent (parent_ip) are passed to
ftrace_test_recursion_trylock() to record where the recursion happened
(if CONFIG_FTRACE_RECORD_RECURSION is set).

Alternatively, if the FTRACE_OPS_FL_RECURSION flag is set on the ftrace_ops
(as explained below), then a helper trampoline will be used to test
for recursion for the callback and no recursion test needs to be done.
But this is at the expense of slightly more overhead from an extra
function call.

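
The effect of the trylock/unlock pair can be illustrated with a small
user-space model. It is only a sketch of the idea under simplifying
assumptions: it tracks a single context with one static bit, whereas the
kernel helpers distinguish contexts and also disable preemption. All names
starting with ``model_`` are hypothetical and are not part of the ftrace API.

```c
/* One recursion bit for a single context. User-space model only. */
static unsigned long recursion_bits;

#define MODEL_CTX_BIT 0

/* Returns the bit taken on success, or -1 if we are already inside. */
int model_recursion_trylock(void)
{
        if (recursion_bits & (1UL << MODEL_CTX_BIT))
                return -1;
        recursion_bits |= 1UL << MODEL_CTX_BIT;
        return MODEL_CTX_BIT;
}

void model_recursion_unlock(int bit)
{
        recursion_bits &= ~(1UL << bit);
}

unsigned long callback_entries; /* times the callback was entered */
unsigned long callback_runs;    /* times the guarded body executed */

void traced_helper(void);

void model_callback(void)
{
        int bit;

        callback_entries++;

        bit = model_recursion_trylock();
        if (bit < 0)
                return;         /* recursion detected: bail out early */

        callback_runs++;
        traced_helper();        /* a helper that is itself "traced" */

        model_recursion_unlock(bit);
}

/* Stands in for a traced function whose entry hook fires the callback. */
void traced_helper(void)
{
        model_callback();
}
```

One call to ``model_callback()`` enters the callback twice (once directly and
once through ``traced_helper()``), but the guarded body runs only once, so the
nested entry cannot recurse further.
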
If your callback accesses any data or critical section that requires RCU
protection, it is best to make sure that RCU is "watching", otherwise
that data or critical section will not be protected as expected. In this
case add:

.. code-block:: c

   if (!rcu_is_watching())
           return;

Alternatively, if the FTRACE_OPS_FL_RCU flag is set on the ftrace_ops
(as explained below), then a helper trampoline will be used to test
for rcu_is_watching for the callback and no other test needs to be done.
But this is at the expense of slightly more overhead from an extra
function call.

The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
        If the callback requires reading or modifying the pt_regs
        passed to the callback, then it must set this flag. Registering
        an ftrace_ops with this flag set on an architecture that does not
        support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
        Similar to SAVE_REGS but the registering of an
        ftrace_ops on an architecture that does not support passing of regs
        will not fail with this flag set. But the callback must check if
        regs is NULL or not to determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION
        By default, it is expected that the callback can handle recursion.
        But if the callback is not that worried about overhead, then
        setting this bit will add the recursion protection around the
        callback by calling a helper function that will do the recursion
        protection and only call the callback if it did not recurse.

        Note, if this flag is not set, and recursion does occur, it could
        cause the system to crash, and possibly reboot via a triple fault.

        Note, if this flag is set, then the callback will always be called
        with preemption disabled. If it is not set, then it is possible
        (but not guaranteed) that the callback will be called in
        preemptible context.

FTRACE_OPS_FL_IPMODIFY
        Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack"
        the traced function (have another function called instead of the
        traced function), it requires setting this flag. This is what live
        kernel patching uses. Without this flag the pt_regs->ip can not be
        modified.

        Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
        registered to any given function at a time.

FTRACE_OPS_FL_RCU
        If this is set, then the callback will only be called by functions
        where RCU is "watching". This is required if the callback function
        performs any rcu_read_lock() operation.

        RCU stops watching when the system goes idle, the time when a CPU
        is taken down and comes back online, and when entering from kernel
        to user space and back to kernel space. During these transitions,
        a callback may be executed and RCU synchronization will not protect
        it.

FTRACE_OPS_FL_PERMANENT
        If this is set on any ftrace ops, then the tracing cannot be disabled
        by writing 0 to the proc sysctl ftrace_enabled. Equally, a callback
        with the flag set cannot be registered if ftrace_enabled is 0.

        Livepatch uses it not to lose the function redirection, so the system
        stays protected.

Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. The filters are added by name, or ip if it is known.

.. code-block:: c

   int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                         int len, int reset);

@ops
        The ops to set the filter with

@buf
        The string that holds the function filter text.

@len
        The length of the string.

@reset
        Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and reset is set, all functions will be enabled for tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.rst`.

To just trace the schedule function:

.. code-block:: c

   ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

   ret = ftrace_set_filter(&ops, NULL, 0, 1);

Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

   ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If the
two lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

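
The precedence rules above can be written down as a small decision function.
This is a user-space model of the described behaviour, not the kernel's
implementation: it matches exact names only (no globs), and
``struct model_lists`` and ``model_calls_callback()`` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two lists attached to one ops. */
struct model_lists {
        const char *const *filter;      /* NULL-terminated; empty = all functions */
        const char *const *notrace;     /* NULL-terminated; empty = exclude nothing */
};

static bool list_has(const char *const *list, const char *name)
{
        for (; list && *list; list++)
                if (strcmp(*list, name) == 0)
                        return true;
        return false;
}

static bool list_empty(const char *const *list)
{
        return !list || !*list;
}

/* Would the callback be called when @func is entered? */
bool model_calls_callback(const struct model_lists *l, const char *func)
{
        if (list_has(l->notrace, func))
                return false;           /* "notrace" takes precedence */
        if (list_empty(l->filter))
                return true;            /* no filter: every function qualifies */
        return list_has(l->filter, func);
}
```

With identical non-empty filter and notrace lists, this function returns false
for every name, which matches the case described above where the callback is
not called by any function.
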
.. code-block:: c

   int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                          int len, int reset);

This takes the same parameters as ftrace_set_filter() but will add the
functions it finds to not be traced. This is a separate list from the
filter list, and this function does not modify the filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

Clearing the "notrace" list is the same as clearing the filter list:

.. code-block:: c

   ret = ftrace_set_notrace(&ops, NULL, 0, 1);

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, @reset is non-zero, and @buf contains a glob that
matches functions, the switch will happen during the time of
the ftrace_set_filter() call. At no time will all functions call the callback.

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

   register_ftrace_function(&ops);

   msleep(10);

   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

   register_ftrace_function(&ops);

   msleep(10);

   ftrace_set_filter(&ops, NULL, 0, 1);

   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

The latter has a short window in which all functions call the callback,
between the time of the reset and the time the new filter is set.

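
The difference between the two sequences can be made visible with a
user-space model of the filter list. This is a sketch under stated
assumptions: it stores exact names only, treats an empty list as "trace
everything" as described for a NULL @buf with @reset set, and uses the
hypothetical names ``struct model_filter``, ``model_set_filter()`` and
``model_traces()``, none of which exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_FUNCS 8

/* Hypothetical model of one ops' filter list (exact names, no globs). */
struct model_filter {
        const char *names[MODEL_MAX_FUNCS];
        int count;                      /* 0 means every function is traced */
};

/* Follows the @buf/@reset semantics described for ftrace_set_filter(). */
int model_set_filter(struct model_filter *f, const char *name, int reset)
{
        if (reset)
                f->count = 0;           /* drop the current filter set */
        if (!name)
                return 0;               /* NULL name: nothing to add */
        if (f->count == MODEL_MAX_FUNCS)
                return -1;              /* model limit, not a kernel limit */
        f->names[f->count++] = name;
        return 0;
}

/* Would @func call the callback with the current filter? */
bool model_traces(const struct model_filter *f, const char *func)
{
        int i;

        if (f->count == 0)
                return true;
        for (i = 0; i < f->count; i++)
                if (strcmp(f->names[i], func) == 0)
                        return true;
        return false;
}
```

In this model, replacing the filter in one call with @reset set moves directly
from one function set to the other. Splitting it into a reset call followed by
an add call leaves an intermediate state in which ``model_traces()`` is true
for every function name, which corresponds to the window described above.
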