123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258 |
- CONTROL DEPENDENCIES
- ====================
- A major difficulty with control dependencies is that current compilers
- do not support them. One purpose of this document is therefore to
- help you prevent your compiler from breaking your code. However,
- control dependencies also pose other challenges, which leads to the
- second purpose of this document, namely to help you to avoid breaking
- your own code, even in the absence of help from your compiler.
- One such challenge is that control dependencies order only later stores.
- Therefore, a load-load control dependency will not preserve ordering
- unless a read memory barrier is provided. Consider the following code:
- q = READ_ONCE(a);
- if (q)
- p = READ_ONCE(b);
- This is not guaranteed to provide any ordering because some types of CPUs
- are permitted to predict the result of the load from "b". This prediction
- can cause other CPUs to see this load as having happened before the load
- from "a". This means that an explicit read barrier is required, for example
- as follows:
- q = READ_ONCE(a);
- if (q) {
- smp_rmb();
- p = READ_ONCE(b);
- }
- However, stores are not speculated. This means that ordering is
- (usually) guaranteed for load-store control dependencies, as in the
- following example:
- q = READ_ONCE(a);
- if (q)
- WRITE_ONCE(b, 1);
- Control dependencies can pair with each other and with other types
- of ordering. But please note that neither the READ_ONCE() nor the
- WRITE_ONCE() are optional. Without the READ_ONCE(), the compiler might
- fuse the load from "a" with other loads. Without the WRITE_ONCE(),
- the compiler might fuse the store to "b" with other stores. Worse yet,
- the compiler might convert the store into a load and a check followed
- by a store, and this compiler-generated load would not be ordered by
- the control dependency.
- Furthermore, if the compiler is able to prove that the value of variable
- "a" is always non-zero, it would be well within its rights to optimize
- the original example by eliminating the "if" statement as follows:
- q = a;
- b = 1; /* BUG: Compiler and CPU can both reorder!!! */
- So don't leave out either the READ_ONCE() or the WRITE_ONCE().
- In particular, although READ_ONCE() does force the compiler to emit a
- load, it does *not* force the compiler to actually use the loaded value.
- It is tempting to try use control dependencies to enforce ordering on
- identical stores on both branches of the "if" statement as follows:
- q = READ_ONCE(a);
- if (q) {
- barrier();
- WRITE_ONCE(b, 1);
- do_something();
- } else {
- barrier();
- WRITE_ONCE(b, 1);
- do_something_else();
- }
- Unfortunately, current compilers will transform this as follows at high
- optimization levels:
- q = READ_ONCE(a);
- barrier();
- WRITE_ONCE(b, 1); /* BUG: No ordering vs. load from a!!! */
- if (q) {
- /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
- do_something();
- } else {
- /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
- do_something_else();
- }
- Now there is no conditional between the load from "a" and the store to
- "b", which means that the CPU is within its rights to reorder them: The
- conditional is absolutely required, and must be present in the final
- assembly code, after all of the compiler and link-time optimizations
- have been applied. Therefore, if you need ordering in this example,
- you must use explicit memory ordering, for example, smp_store_release():
- q = READ_ONCE(a);
- if (q) {
- smp_store_release(&b, 1);
- do_something();
- } else {
- smp_store_release(&b, 1);
- do_something_else();
- }
- Without explicit memory ordering, control-dependency-based ordering is
- guaranteed only when the stores differ, for example:
- q = READ_ONCE(a);
- if (q) {
- WRITE_ONCE(b, 1);
- do_something();
- } else {
- WRITE_ONCE(b, 2);
- do_something_else();
- }
- The initial READ_ONCE() is still required to prevent the compiler from
- knowing too much about the value of "a".
- But please note that you need to be careful what you do with the local
- variable "q", otherwise the compiler might be able to guess the value
- and again remove the conditional branch that is absolutely required to
- preserve ordering. For example:
- q = READ_ONCE(a);
- if (q % MAX) {
- WRITE_ONCE(b, 1);
- do_something();
- } else {
- WRITE_ONCE(b, 2);
- do_something_else();
- }
- If MAX is compile-time defined to be 1, then the compiler knows that
- (q % MAX) must be equal to zero, regardless of the value of "q".
- The compiler is therefore within its rights to transform the above code
- into the following:
- q = READ_ONCE(a);
- WRITE_ONCE(b, 2);
- do_something_else();
- Given this transformation, the CPU is not required to respect the ordering
- between the load from variable "a" and the store to variable "b". It is
- tempting to add a barrier(), but this does not help. The conditional
- is gone, and the barrier won't bring it back. Therefore, if you need
- to relying on control dependencies to produce this ordering, you should
- make sure that MAX is greater than one, perhaps as follows:
- q = READ_ONCE(a);
- BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
- if (q % MAX) {
- WRITE_ONCE(b, 1);
- do_something();
- } else {
- WRITE_ONCE(b, 2);
- do_something_else();
- }
- Please note once again that each leg of the "if" statement absolutely
- must store different values to "b". As in previous examples, if the two
- values were identical, the compiler could pull this store outside of the
- "if" statement, destroying the control dependency's ordering properties.
- You must also be careful avoid relying too much on boolean short-circuit
- evaluation. Consider this example:
- q = READ_ONCE(a);
- if (q || 1 > 0)
- WRITE_ONCE(b, 1);
- Because the first condition cannot fault and the second condition is
- always true, the compiler can transform this example as follows, again
- destroying the control dependency's ordering:
- q = READ_ONCE(a);
- WRITE_ONCE(b, 1);
- This is yet another example showing the importance of preventing the
- compiler from out-guessing your code. Again, although READ_ONCE() really
- does force the compiler to emit code for a given load, the compiler is
- within its rights to discard the loaded value.
- In addition, control dependencies apply only to the then-clause and
- else-clause of the "if" statement in question. In particular, they do
- not necessarily order the code following the entire "if" statement:
- q = READ_ONCE(a);
- if (q) {
- WRITE_ONCE(b, 1);
- } else {
- WRITE_ONCE(b, 2);
- }
- WRITE_ONCE(c, 1); /* BUG: No ordering against the read from "a". */
- It is tempting to argue that there in fact is ordering because the
- compiler cannot reorder volatile accesses and also cannot reorder
- the writes to "b" with the condition. Unfortunately for this line
- of reasoning, the compiler might compile the two writes to "b" as
- conditional-move instructions, as in this fanciful pseudo-assembly
- language:
- ld r1,a
- cmp r1,$0
- cmov,ne r4,$1
- cmov,eq r4,$2
- st r4,b
- st $1,c
- The control dependencies would then extend only to the pair of cmov
- instructions and the store depending on them. This means that a weakly
- ordered CPU would have no dependency of any sort between the load from
- "a" and the store to "c". In short, control dependencies provide ordering
- only to the stores in the then-clause and else-clause of the "if" statement
- in question (including functions invoked by those two clauses), and not
- to code following that "if" statement.
- In summary:
- (*) Control dependencies can order prior loads against later stores.
- However, they do *not* guarantee any other sort of ordering:
- Not prior loads against later loads, nor prior stores against
- later anything. If you need these other forms of ordering, use
- smp_load_acquire(), smp_store_release(), or, in the case of prior
- stores and later loads, smp_mb().
- (*) If both legs of the "if" statement contain identical stores to
- the same variable, then you must explicitly order those stores,
- either by preceding both of them with smp_mb() or by using
- smp_store_release(). Please note that it is *not* sufficient to use
- barrier() at beginning and end of each leg of the "if" statement
- because, as shown by the example above, optimizing compilers can
- destroy the control dependency while respecting the letter of the
- barrier() law.
- (*) Control dependencies require at least one run-time conditional
- between the prior load and the subsequent store, and this
- conditional must involve the prior load. If the compiler is able
- to optimize the conditional away, it will have also optimized
- away the ordering. Careful use of READ_ONCE() and WRITE_ONCE()
- can help to preserve the needed conditional.
- (*) Control dependencies require that the compiler avoid reordering the
- dependency into nonexistence. Careful use of READ_ONCE() or
- atomic{,64}_read() can help to preserve your control dependency.
- (*) Control dependencies apply only to the then-clause and else-clause
- of the "if" statement containing the control dependency, including
- any functions that these two clauses call. Control dependencies
- do *not* apply to code beyond the end of that "if" statement.
- (*) Control dependencies pair normally with other types of barriers.
- (*) Control dependencies do *not* provide multicopy atomicity. If you
- need all the CPUs to agree on the ordering of a given store against
- all other accesses, use smp_mb().
- (*) Compilers do not understand control dependencies. It is therefore
- your job to ensure that they do not break your code.
|