=====================================
FUJITSU FR-V KERNEL ATOMIC OPERATIONS
=====================================

On the FR-V CPUs, the only atomic Read-Modify-Write operation is the SWAP/SWAPI instruction.
Unfortunately, SWAP alone cannot be used to implement the following operations:

 (*) Atomic add to memory

 (*) Atomic subtract from memory

 (*) Atomic bit modification (set, clear or invert)

 (*) Atomic compare and exchange

On CPUs like this, the standard way of emulating these operations in uniprocessor mode is to
disable interrupts. On the FR-V CPUs, however, modifying the PSR takes a lot of clock cycles, and
it has to be done twice (once to disable interrupts and once to restore them). This means the CPU
runs for a relatively long time with interrupts disabled, which can significantly increase
interrupt latency.

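The interrupt-disabling approach described above can be sketched in portable C. This is only a
user-space model, not kernel code: irqs_enabled, psr_writes, irq_save_and_disable(), irq_restore()
and add_return_irq_off() are all hypothetical names standing in for the PSR and the operations on
it. The point it illustrates is that every emulated atomic operation pays for two PSR
modifications.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the PSR interrupt-enable state (not real FR-V state). */
static bool irqs_enabled = true;
static unsigned int psr_writes;		/* counts the costly PSR modifications */

/* Model of "save the interrupt state, then disable interrupts". */
static bool irq_save_and_disable(void)
{
	bool was_enabled = irqs_enabled;

	irqs_enabled = false;
	psr_writes++;
	return was_enabled;
}

/* Model of "restore the saved interrupt state". */
static void irq_restore(bool was_enabled)
{
	irqs_enabled = was_enabled;
	psr_writes++;
}

/* Emulated atomic add: safe on a uniprocessor only because nothing can interrupt it. */
static int add_return_irq_off(int i, int *counter)
{
	bool flags = irq_save_and_disable();	/* first PSR write */
	int val;

	assert(!irqs_enabled);
	val = *counter + i;
	*counter = val;
	irq_restore(flags);			/* second PSR write */
	return val;
}
```

In this model a single call to add_return_irq_off() leaves psr_writes at 2, which is the cost the
algorithm in the next section is designed to avoid.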

=============
NEW ALGORITHM
=============

To get around this, the following algorithm has been implemented. It operates in a way similar to
the LL/SC instruction pairs supported on a number of platforms.

 (*) The CCCR.CC3 condition code is reserved within the kernel to act as an atomic modify abort
     flag.

 (*) In the exception prologues run on kernel->kernel entry, CCCR.CC3 is set to 0 (Undefined
     state).

 (*) All atomic operations can then be broken down into the following algorithm:

     (1) Set ICC3.Z to true and set CC3 to True (ORCC/CKEQ/ORCR).

     (2) Load the value currently in the memory to be modified into a register.

     (3) Make changes to the value.

     (4) If CC3 is still True, simultaneously and atomically (by VLIW packing):

         (a) Store the modified value back to memory.

         (b) Set ICC3.Z to false (CORCC on GR29 is sufficient for this - GR29 holds the current
             task pointer in the kernel, and so is guaranteed to be non-zero).

     (5) If ICC3.Z is still true, go back to step (1).

This works in a non-SMP environment because any interrupt or other exception that happens between
steps (1) and (4) will set CC3 to the Undefined state. That aborts the store in (4a) and leaves
the Z flag in ICC3 set, which causes step (5) to loop back to step (1).

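Steps (1) to (5) can be modelled in plain C to make the control flow easier to follow. This is a
software simulation only, under the following assumptions: cc3 and icc3_z are ordinary variables
standing in for CCCR.CC3 and ICC3.Z, pending_exceptions is a hypothetical knob for injecting
exceptions, and maybe_take_exception() plays the part of the kernel->kernel exception prologue.
On real hardware steps (4a) and (4b) are a single VLIW bundle; here nothing can interrupt the C
code, so they are simply written one after the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Software stand-ins for the condition codes; not real FR-V state. */
static bool cc3;		/* models CCCR.CC3: true = True, false = Undefined */
static bool icc3_z;		/* models ICC3.Z */

/* Hypothetical test knobs: exceptions still to be injected, and loop passes made. */
static int pending_exceptions;
static int attempts;

/* Models the exception prologue on kernel->kernel entry, which clears CC3. */
static void maybe_take_exception(void)
{
	if (pending_exceptions > 0) {
		pending_exceptions--;
		cc3 = false;
	}
}

static int sim_atomic_add_return(int i, int *mem)
{
	int val;

	assert(mem != NULL);
	do {
		attempts++;
		icc3_z = true;			/* step (1) */
		cc3 = true;
		val = *mem;			/* step (2) */
		maybe_take_exception();		/* an exception may land here */
		val += i;			/* step (3) */
		if (cc3) {			/* step (4) */
			*mem = val;		/* (4a) */
			icc3_z = false;		/* (4b) */
		}
	} while (icc3_z);			/* step (5) */
	return val;
}
```

With pending_exceptions set to 2 before the call, the loop makes three passes: the first two are
aborted and only the third performs the store.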

This algorithm suffers from two problems:

 (1) The condition CCCR.CC3 is cleared unconditionally by an exception, irrespective of whether or
     not any changes were made to the target memory location during that exception.

 (2) The branch from step (5) back to step (1) may have to happen more than once before the store
     manages to take place. In theory, this loop could cycle forever if interrupts keep arriving
     fast enough, but this is unlikely.

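For comparison with problem (1), a compare-and-exchange retry loop repeats only when the value in
memory no longer matches what was loaded. The sketch below uses the portable C11 <stdatomic.h>
interface rather than anything FR-V specific, and cas_add_return() is a hypothetical name. Note
that atomic_compare_exchange_weak() is itself permitted to fail spuriously, so it too may retry
without the value having changed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Portable analogue of an atomic add that returns the new value. */
static int cas_add_return(int i, atomic_int *v)
{
	int old;
	int next;

	assert(v != NULL);
	old = atomic_load(v);
	do {
		/* On failure, 'old' has been refreshed with the current value. */
		next = old + i;
	} while (!atomic_compare_exchange_weak(v, &old, next));
	return next;
}
```

Like the CC3 scheme, this loop has no fixed upper bound on the number of retries.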

=======
EXAMPLE
=======

Taking an example from include/asm-frv/atomic.h:

    static inline int atomic_add_return(int i, atomic_t *v)
    {
        unsigned long val;

        asm("0:                                     \n"

It starts by setting ICC3.Z to true for later use, and copies that condition into CC7 as the first
half of putting CC3 into the True state.

            "   orcc     gr0,gr0,gr0,icc3           \n"    <-- (1)
            "   ckeq     icc3,cc7                   \n"    <-- (1)

Then it does the load. Note that the final phase of step (1) is done at the same time as the
load. The VLIW packing ensures they are done simultaneously. The ".p" on the load must not be
removed without swapping the order of these two instructions.

            "   ld.p     %M0,%1                     \n"    <-- (2)
            "   orcr     cc7,cc7,cc3                \n"    <-- (1)

Then the proposed modification is generated. Note that the old value can be retained if required
(such as in test_and_set_bit()).

            "   add%I2   %1,%2,%1                   \n"    <-- (3)

Then it attempts to store the value back, contingent on no exception having cleared CC3 since it
was set to True.

            "   cst.p    %1,%M0          ,cc3,#1    \n"    <-- (4a)

It simultaneously records the success or failure of the store in ICC3.Z.

            "   corcc    gr29,gr29,gr0   ,cc3,#1    \n"    <-- (4b)

The branch is then taken if the operation was aborted.

            "   beq      icc3,#0,0b                 \n"    <-- (5)
            : "+U"(v->counter), "=&r"(val)
            : "NPr"(i)
            : "memory", "cc7", "cc3", "icc3"
            );

        return val;
    }

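The remark about retaining the old value can be illustrated with a C model of a test-and-set
operation built on the same pattern. This is a simulation under stated assumptions, not the
kernel's implementation: tsb_cc3 stands in for CCCR.CC3, tsb_pending_exceptions is a hypothetical
knob for injecting exceptions, and sim_test_and_set_bit() is an illustrative name. The loaded
value is kept in its own variable so that the previous state of the bit can be returned after the
store succeeds.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

static bool tsb_cc3;			/* models CCCR.CC3 */
static int tsb_pending_exceptions;	/* hypothetical exception injection knob */

/* Sets bit 'nr' in *word and returns 1 if it was already set, 0 otherwise. */
static int sim_test_and_set_bit(unsigned int nr, unsigned long *word)
{
	unsigned long mask, old, updated;
	bool z;

	assert(word != NULL);
	assert(nr < sizeof(*word) * CHAR_BIT);
	mask = 1UL << nr;
	do {
		z = true;			/* step (1) */
		tsb_cc3 = true;
		old = *word;			/* step (2) */
		if (tsb_pending_exceptions > 0) {
			/* a simulated exception clears the abort flag */
			tsb_pending_exceptions--;
			tsb_cc3 = false;
		}
		updated = old | mask;		/* step (3): 'old' is left intact */
		if (tsb_cc3) {			/* step (4) */
			*word = updated;	/* (4a) */
			z = false;		/* (4b) */
		}
	} while (z);				/* step (5) */
	return (old & mask) != 0;
}
```

Because each retry reloads the word, the value returned always reflects the state of the bit
immediately before the store that finally succeeded.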

=============
CONFIGURATION
=============

The atomic ops implementation can be made inline or out-of-line by changing the
CONFIG_FRV_OUTOFLINE_ATOMIC_OPS configuration variable. Making it out-of-line has a number of
advantages:

 - The resulting kernel image may be smaller
 - Debugging is easier, as atomic ops can simply be stepped over and breakpoints can be set on them

Keeping it inline also has a number of advantages:

 - The resulting kernel may be faster
   - no out-of-line function calls need to be made
   - the compiler doesn't have half its registers clobbered by making a call

The out-of-line implementations live in arch/frv/lib/atomic-ops.S.

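As a rough illustration of how a configuration variable like this can switch between the two
forms, the sketch below selects an inline or an out-of-line definition with the preprocessor. The
layout is an assumption and is not taken from the kernel headers: demo_atomic_t and
demo_atomic_add_return() are hypothetical names, and the function body is an ordinary non-atomic
placeholder standing in for the real assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's atomic_t. */
typedef struct {
	int counter;
} demo_atomic_t;

#ifdef CONFIG_FRV_OUTOFLINE_ATOMIC_OPS

/* Out-of-line: callers see only this declaration and emit a function call. */
int demo_atomic_add_return(int i, demo_atomic_t *v);

/* A single shared copy, which can be stepped over or breakpointed as a unit. */
int demo_atomic_add_return(int i, demo_atomic_t *v)
{
	assert(v != NULL);
	v->counter += i;	/* placeholder body, not actually atomic */
	return v->counter;
}

#else

/* Inline: the body is expanded at each call site, avoiding the call overhead. */
static inline int demo_atomic_add_return(int i, demo_atomic_t *v)
{
	assert(v != NULL);
	v->counter += i;	/* placeholder body, not actually atomic */
	return v->counter;
}

#endif
```

Building with -DCONFIG_FRV_OUTOFLINE_ATOMIC_OPS selects the out-of-line variant in this sketch;
callers are written identically in both cases.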