Source at commit b386be689295730688885552666ea40b2e639b14 created 11 years 11 months ago. By Maarten ter Huurne, Revert "MIPS: JZ4740: reset: Initialize hibernate wakeup counters." | |
---|---|
1 | Lightweight PI-futexes |
2 | ---------------------- |
3 | |
4 | We are calling them lightweight for 3 reasons: |
5 | |
6 | - in the user-space fastpath a PI-enabled futex involves no kernel work |
7 | (or any other PI complexity) at all. No registration, no extra kernel |
8 | calls - just pure fast atomic ops in userspace. |
9 | |
10 | - even in the slowpath, the system call and scheduling pattern is very |
11 | similar to normal futexes. |
12 | |
13 | - the in-kernel PI implementation is streamlined around the mutex |
14 | abstraction, with strict rules that keep the implementation |
15 | relatively simple: only a single owner may own a lock (i.e. no |
16 | read-write lock support), only the owner may unlock a lock, no |
17 | recursive locking, etc. |
18 | |
19 | Priority Inheritance - why? |
20 | --------------------------- |
21 | |
22 | The short reply: user-space PI helps achieve or improve determinism for |
23 | user-space applications. In the best case, it can help achieve |
24 | determinism and well-bounded latencies. Even in the worst case, PI will |
25 | improve the statistical distribution of locking-related application |
26 | delays. |
27 | |
28 | The longer reply: |
29 | ----------------- |
30 | |
31 | Firstly, sharing locks between multiple tasks is a common programming |
32 | technique that often cannot be replaced with lockless algorithms. As we |
33 | can see in the kernel [which is quite a complex program in itself], |
34 | lockless structures are the exception rather than the norm - the current |
35 | ratio of lockless vs. locky code for shared data structures is somewhere |
36 | between 1:10 and 1:100. Lockless is hard, and the complexity of lockless |
37 | algorithms often endangers the ability to do robust reviews of said code. |
38 | I.e. critical RT apps often choose lock structures to protect critical |
39 | data structures, instead of lockless algorithms. Furthermore, there are |
40 | cases (like shared hardware, or other resource limits) where lockless |
41 | access is mathematically impossible. |
42 | |
43 | Media players (such as Jack) are an example of reasonable application |
44 | design with multiple tasks (with multiple priority levels) sharing |
45 | short-held locks: for example, a highprio audio playback thread is |
46 | combined with medium-prio construct-audio-data threads and low-prio |
47 | display-colory-stuff threads. Add video and decoding to the mix and |
48 | we've got even more priority levels. |
49 | |
50 | So once we accept that synchronization objects (locks) are an |
51 | unavoidable fact of life, and once we accept that multi-task userspace |
52 | apps have a very fair expectation of being able to use locks, we've got |
53 | to think about how to offer the option of a deterministic locking |
54 | implementation to user-space. |
55 | |
56 | Most of the technical counter-arguments against doing priority |
57 | inheritance only apply to kernel-space locks. But user-space locks are |
58 | different: there we cannot disable interrupts or make the task |
59 | non-preemptible in a critical section, so the 'use spinlocks' argument |
60 | does not apply (user-space spinlocks have the same priority inversion |
61 | problems as other user-space locking constructs). Fact is, pretty much |
62 | the only technique that currently enables good determinism for userspace |
63 | locks (such as futex-based pthread mutexes) is priority inheritance: |
64 | |
65 | Currently (without PI), if a high-prio and a low-prio task share a lock |
66 | [this is a quite common scenario for most non-trivial RT applications], |
67 | even if all critical sections are coded carefully to be deterministic |
68 | (i.e. all critical sections are short in duration and only execute a |
69 | limited number of instructions), the kernel cannot guarantee any |
70 | deterministic execution of the high-prio task: any medium-priority task |
71 | could preempt the low-prio task while it holds the shared lock and |
72 | executes the critical section, and could delay it indefinitely. |
73 | |
74 | Implementation: |
75 | --------------- |
76 | |
77 | As mentioned before, the userspace fastpath of PI-enabled pthread |
78 | mutexes involves no kernel work at all - they behave quite similarly to |
79 | normal futex-based locks: a 0 value means unlocked, and a value==TID |
80 | means locked. (This is the same method as used by list-based robust |
81 | futexes.) Userspace uses atomic ops to lock/unlock these mutexes without |
82 | entering the kernel. |
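The fastpath described above can be sketched with C11 atomics. This is a minimal illustration, not the glibc implementation: the helper names pi_trylock_fast() and pi_unlock_fast() are hypothetical, and the caller is assumed to pass in its own TID. A false return is the point at which real code would fall back to the kernel slowpath.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: try to take the lock without entering the kernel
 * by atomically changing the futex word from 0 (unlocked) to our TID. */
static bool pi_trylock_fast(_Atomic uint32_t *futex_word, uint32_t tid)
{
    uint32_t expected = 0;

    assert(tid != 0); /* 0 is reserved for "unlocked" */
    return atomic_compare_exchange_strong(futex_word, &expected, tid);
}

/* Hypothetical helper: try to release the lock without entering the
 * kernel by atomically changing the futex word from our TID back to 0.
 * This fails if the word holds anything other than exactly our TID,
 * for example when extra bits have been set in it. */
static bool pi_unlock_fast(_Atomic uint32_t *futex_word, uint32_t tid)
{
    uint32_t expected = tid;

    return atomic_compare_exchange_strong(futex_word, &expected, 0);
}
```

In the uncontended case both helpers succeed and no system call is made.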
83 | |
84 | To handle the slowpath, we have added two new futex ops: |
85 | |
86 | FUTEX_LOCK_PI |
87 | FUTEX_UNLOCK_PI |
88 | |
89 | If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to |
90 | TID fails], then FUTEX_LOCK_PI is called. The kernel does all the |
91 | remaining work: if there is no futex-queue attached to the futex address |
92 | yet, then the code looks up the task that owns the futex [it has put its |
93 | own TID into the futex value], and attaches a 'PI state' structure to |
94 | the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware, |
95 | kernel-based synchronization object. The 'other' task is made the owner |
96 | of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the |
97 | futex value. Then this task tries to lock the rt-mutex, on which it |
98 | blocks. Once it returns, it has the mutex acquired, and it sets the |
99 | futex value to its own TID and returns. Userspace has no other work to |
100 | perform - it now owns the lock, and the futex value contains |
101 | FUTEX_WAITERS|TID. |
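To make the FUTEX_WAITERS|TID encoding concrete, here is a small sketch of how such a futex value could be composed and decoded. The macro and function names are hypothetical; the numeric values are assumed to mirror FUTEX_WAITERS and FUTEX_TID_MASK from <linux/futex.h> and are defined locally only to keep the example portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins (hypothetical names) for FUTEX_WAITERS and
 * FUTEX_TID_MASK as defined in <linux/futex.h>. */
#define PI_WAITERS_BIT 0x80000000u
#define PI_TID_MASK    0x3fffffffu

/* Build a futex value from an owner TID and a "has waiters" flag. */
static uint32_t pi_make_value(uint32_t tid, bool waiters)
{
    assert((tid & ~PI_TID_MASK) == 0); /* TID must fit in the TID field */
    return tid | (waiters ? PI_WAITERS_BIT : 0u);
}

/* Extract the owner TID; 0 means no owner is recorded. */
static uint32_t pi_owner_tid(uint32_t value)
{
    return value & PI_TID_MASK;
}

/* True if waiters are queued, i.e. the unlock fastpath must not be used. */
static bool pi_has_waiters(uint32_t value)
{
    return (value & PI_WAITERS_BIT) != 0;
}
```

Because the waiters bit is part of the same word as the TID, a plain TID -> 0 compare-and-exchange in userspace fails whenever waiters are queued.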
102 | |
103 | If the unlock side fastpath succeeds, [i.e. userspace manages to do a |
104 | TID -> 0 atomic transition of the futex value], then no kernel work is |
105 | triggered. |
106 | |
107 | If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), |
108 | then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on |
109 | behalf of userspace - and it also unlocks the attached |
110 | pi_state->rt_mutex and thus wakes up any potential waiters. |
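Putting the lock and unlock paths together, a Linux-only sketch might look like the following. It assumes the raw futex system call is reached through syscall(2), since glibc provides no futex() wrapper; pi_lock(), pi_unlock() and current_tid() are hypothetical names, and error handling (for example EDEADLK or EOWNERDEAD) is left to the caller.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: kernel thread ID of the calling thread. */
static uint32_t current_tid(void)
{
    return (uint32_t)syscall(SYS_gettid);
}

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
static int pi_lock(_Atomic uint32_t *word)
{
    uint32_t expected = 0;

    /* Fastpath: 0 -> TID, no kernel involvement. */
    if (atomic_compare_exchange_strong(word, &expected, current_tid()))
        return 0;

    /* Slowpath: the kernel queues us and boosts the owner as needed.
     * A NULL timeout means "wait until the lock is acquired". */
    return (int)syscall(SYS_futex, word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
static int pi_unlock(_Atomic uint32_t *word)
{
    uint32_t expected = current_tid();

    /* Fastpath: TID -> 0 succeeds only if no waiters bit is set. */
    if (atomic_compare_exchange_strong(word, &expected, 0))
        return 0;

    /* Slowpath: let the kernel hand the lock to the top waiter. */
    return (int)syscall(SYS_futex, word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

With a single thread the slowpath is never taken: pi_lock() leaves the thread's TID in the word and pi_unlock() resets it to 0.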
111 | |
112 | Note that under this approach, contrary to previous PI-futex approaches, |
113 | there is no prior 'registration' of a PI-futex [which is not quite |
114 | possible anyway, due to existing ABI properties of pthread mutexes]. |
115 | |
116 | Also, under this scheme, 'robustness' and 'PI' are two orthogonal |
117 | properties of futexes, and all four combinations are possible: futex, |
118 | robust-futex, PI-futex, robust+PI-futex. |
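From an application's point of view, these four combinations are usually selected through POSIX mutex attributes rather than raw futex calls. The sketch below assumes a C library that maps PTHREAD_PRIO_INHERIT and PTHREAD_MUTEX_ROBUST onto PI and robust futexes (glibc on Linux does this); init_mutex() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: initialize a mutex with any combination of the
 * "robust" and "PI" properties. Returns 0 or a pthread error number. */
static int init_mutex(pthread_mutex_t *m, bool robust, bool pi)
{
    pthread_mutexattr_t attr;
    int err;

    assert(m != NULL);

    err = pthread_mutexattr_init(&attr);
    if (err)
        return err;

    if (pi)
        err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (!err && robust)
        err = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (!err)
        err = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return err;
}
```

Calling init_mutex(&m, false, false) yields a normal mutex, while init_mutex(&m, true, true) requests both properties at once; the two flags are independent of each other.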
119 | |
120 | More details about priority inheritance can be found in |
121 | Documentation/rt-mutex.txt. |
122 |