REDUCING OS JITTER DUE TO PER-CPU KTHREADS

This document lists per-CPU kthreads in the Linux kernel and presents
options to control their OS jitter. Note that non-per-CPU kthreads are
not listed here. To reduce OS jitter from non-per-CPU kthreads, bind
them to a "housekeeping" CPU dedicated to such work.


REFERENCES

o   Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs.

o   Documentation/cgroups: Using cgroups to bind tasks to sets of CPUs.

o   man taskset: Using the taskset command to bind tasks to sets
    of CPUs.

o   man sched_setaffinity: Using the sched_setaffinity() system
    call to bind tasks to sets of CPUs.

o   /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state,
    writing "0" to offline and "1" to online.

o   In order to locate kernel-generated OS jitter on CPU N:

        cd /sys/kernel/debug/tracing
        echo 1 > max_graph_depth # Increase the "1" for more detail
        echo function_graph > current_tracer
        # run workload
        cat per_cpu/cpuN/trace
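The tracing recipe above can be wrapped in a shell function that sets up the tracer, runs the workload, and switches the tracer back off in one step. This is a minimal sketch, assuming root privileges, debugfs mounted at /sys/kernel/debug, and a kernel that provides the function_graph tracer; the function name trace_cpu_jitter and the TRACING_DIR override are hypothetical. It reads the per-CPU trace before switching back to the nop tracer, because changing tracers may reset the trace buffer.

```sh
#!/bin/sh
# Hypothetical helper automating the tracing recipe above.
# TRACING_DIR is an assumed override; it defaults to the debugfs path
# used in this document.
TRACING_DIR=${TRACING_DIR:-/sys/kernel/debug/tracing}

# Usage: trace_cpu_jitter CPU DEPTH COMMAND [ARGS...]
# Prints CPU's trace and returns the workload's exit status.
trace_cpu_jitter() {
    cpu=$1
    depth=$2
    shift 2
    t=$TRACING_DIR
    if [ ! -d "$t/per_cpu/cpu$cpu" ]; then
        echo "no per-CPU trace buffer for CPU $cpu" >&2
        return 1
    fi
    echo "$depth" > "$t/max_graph_depth" || return 1
    echo > "$t/trace"                      # clear any old trace data
    echo function_graph > "$t/current_tracer" || return 1
    echo 1 > "$t/tracing_on"               # make sure recording is enabled
    "$@"                                   # run the workload
    rc=$?
    echo 0 > "$t/tracing_on"               # freeze the buffer before reading
    cat "$t/per_cpu/cpu$cpu/trace"
    echo nop > "$t/current_tracer"         # read first: switching may reset it
    echo 1 > "$t/tracing_on"
    return "$rc"
}
```

For example, "trace_cpu_jitter 3 2 ./my_workload > cpu3.trace", where ./my_workload stands in for your own program.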


KTHREADS

Name: ehca_comp/%u
Purpose: Periodically process Infiniband-related work.
To reduce its OS jitter, do any of the following:
1.  Don't use eHCA Infiniband hardware, instead choosing hardware
    that does not require per-CPU kthreads. This will prevent these
    kthreads from being created in the first place. (This will
    work for most people, as this hardware, though important, is
    relatively old and is produced in relatively low unit volumes.)
2.  Do all eHCA-Infiniband-related work on other CPUs, including
    interrupts.
3.  Rework the eHCA driver so that its per-CPU kthreads are
    provisioned only on selected CPUs.

Name: irq/%d-%s
Purpose: Handle threaded interrupts.
To reduce its OS jitter, do the following:
1.  Use irq affinity to force the irq threads to execute on
    some other CPU.
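A minimal sketch of item 1, assuming root privileges and a kernel that provides /proc/irq/N/smp_affinity_list, which accepts a CPU list such as "0-1" rather than a hex mask. The helper name move_irqs and the PROC_DIR override are hypothetical. Interrupts that cannot be moved, such as some per-CPU interrupts, reject the write and are only counted. If an irqbalance daemon is running, it may later move interrupts back, so it should be stopped or configured to avoid the de-jittered CPUs.

```sh
#!/bin/sh
# Hypothetical helper: steer every movable IRQ (and with it the
# corresponding irq/%d-%s thread) onto a list of housekeeping CPUs.
# PROC_DIR is an assumed override so the logic can be exercised on a copy.
PROC_DIR=${PROC_DIR:-/proc}

# Usage: move_irqs HOUSEKEEPING_CPU_LIST     (for example: move_irqs 0-1)
# Prints the number of IRQs whose affinity could not be changed.
move_irqs() {
    cpus=$1
    failed=0
    for f in "$PROC_DIR"/irq/[0-9]*/smp_affinity_list; do
        [ -e "$f" ] || continue            # glob matched nothing
        # Some IRQs reject affinity changes; count them and carry on.
        if ! { echo "$cpus" > "$f"; } 2>/dev/null; then
            failed=$((failed + 1))
        fi
    done
    echo "$failed"
}
```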

Name: kcmtpd_ctr_%d
Purpose: Handle Bluetooth work.
To reduce its OS jitter, do one of the following:
1.  Don't use Bluetooth, in which case these kthreads won't be
    created in the first place.
2.  Use irq affinity to force Bluetooth-related interrupts to
    occur on some other CPU and furthermore initiate all
    Bluetooth activity on some other CPU.

Name: ksoftirqd/%u
Purpose: Execute softirq handlers when threaded or when under heavy load.
To reduce its OS jitter, each softirq vector must be handled
separately as follows:
TIMER_SOFTIRQ: Do all of the following:
1.  To the extent possible, keep the CPU out of the kernel when it
    is non-idle, for example, by avoiding system calls and by forcing
    both kernel threads and interrupts to execute elsewhere.
2.  Build with CONFIG_HOTPLUG_CPU=y. After boot completes, force
    the CPU offline, then bring it back online. This forces
    recurring timers to migrate elsewhere. If you are concerned
    with multiple CPUs, force them all offline before bringing the
    first one back online. Once you have onlined the CPUs in question,
    do not offline any other CPUs, because doing so could force the
    timer back onto one of the CPUs in question.
NET_TX_SOFTIRQ and NET_RX_SOFTIRQ: Do all of the following:
1.  Force networking interrupts onto other CPUs.
2.  Initiate any network I/O on other CPUs.
3.  Once your application has started, prevent CPU-hotplug operations
    from being initiated from tasks that might run on the CPU to
    be de-jittered. (It is OK to force this CPU offline and then
    bring it back online before you start your application.)
BLOCK_SOFTIRQ: Do all of the following:
1.  Force block-device interrupts onto some other CPU.
2.  Initiate any block I/O on other CPUs.
3.  Once your application has started, prevent CPU-hotplug operations
    from being initiated from tasks that might run on the CPU to
    be de-jittered. (It is OK to force this CPU offline and then
    bring it back online before you start your application.)
BLOCK_IOPOLL_SOFTIRQ: Do all of the following:
1.  Force block-device interrupts onto some other CPU.
2.  Initiate any block I/O and block-I/O polling on other CPUs.
3.  Once your application has started, prevent CPU-hotplug operations
    from being initiated from tasks that might run on the CPU to
    be de-jittered. (It is OK to force this CPU offline and then
    bring it back online before you start your application.)
TASKLET_SOFTIRQ: Do one or more of the following:
1.  Avoid use of drivers that use tasklets. (Such drivers will contain
    calls to things like tasklet_schedule().)
2.  Convert all drivers that you must use from tasklets to workqueues.
3.  Force interrupts for drivers using tasklets onto other CPUs,
    and also do I/O involving these drivers on other CPUs.
SCHED_SOFTIRQ: Do all of the following:
1.  Avoid sending scheduler IPIs to the CPU to be de-jittered,
    for example, by ensuring that at most one runnable kthread is
    present on that CPU. If a thread that expects to run on the
    de-jittered CPU awakens, the scheduler will send an IPI that can
    result in a subsequent SCHED_SOFTIRQ.
2.  Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y,
    CONFIG_NO_HZ_FULL=y, and, in addition, ensure that the CPU
    to be de-jittered is marked as an adaptive-ticks CPU using the
    "nohz_full=" boot parameter. This reduces the number of
    scheduler-clock interrupts that the de-jittered CPU receives,
    minimizing its chances of being selected to do the load-balancing
    work that runs in SCHED_SOFTIRQ context.
3.  To the extent possible, keep the CPU out of the kernel when it
    is non-idle, for example, by avoiding system calls and by
    forcing both kernel threads and interrupts to execute elsewhere.
    This further reduces the number of scheduler-clock interrupts
    received by the de-jittered CPU.
HRTIMER_SOFTIRQ: Do all of the following:
1.  To the extent possible, keep the CPU out of the kernel when it
    is non-idle. For example, avoid system calls and force both
    kernel threads and interrupts to execute elsewhere.
2.  Build with CONFIG_HOTPLUG_CPU=y. Once boot completes, force the
    CPU offline, then bring it back online. This forces recurring
    timers to migrate elsewhere. If you are concerned with multiple
    CPUs, force them all offline before bringing the first one
    back online. Once you have onlined the CPUs in question, do not
    offline any other CPUs, because doing so could force the timer
    back onto one of the CPUs in question.
RCU_SOFTIRQ: Do at least one of the following:
1.  Offload callbacks and keep the CPU in either dyntick-idle or
    adaptive-ticks state by doing all of the following:
    a.  Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y,
        CONFIG_NO_HZ_FULL=y, and, in addition, ensure that the CPU
        to be de-jittered is marked as an adaptive-ticks CPU using
        the "nohz_full=" boot parameter. Bind the rcuo kthreads
        to housekeeping CPUs, which can tolerate OS jitter.
    b.  To the extent possible, keep the CPU out of the kernel
        when it is non-idle, for example, by avoiding system
        calls and by forcing both kernel threads and interrupts
        to execute elsewhere.
2.  Enable RCU to do its processing remotely via dyntick-idle by
    doing all of the following:
    a.  Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
    b.  Ensure that the CPU goes idle frequently, allowing other
        CPUs to detect that it has passed through an RCU quiescent
        state. If the kernel is built with CONFIG_NO_HZ_FULL=y,
        userspace execution also allows other CPUs to detect that
        the CPU in question has passed through a quiescent state.
    c.  To the extent possible, keep the CPU out of the kernel
        when it is non-idle, for example, by avoiding system
        calls and by forcing both kernel threads and interrupts
        to execute elsewhere.
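The offline-then-online cycle recommended for TIMER_SOFTIRQ and HRTIMER_SOFTIRQ can be scripted as below. This is a sketch assuming CONFIG_HOTPLUG_CPU=y, root privileges, and that it runs after boot but before the latency-sensitive application starts; cycle_cpus and the SYS_CPU_DIR override are hypothetical names. All listed CPUs are taken offline before any of them is brought back, following the multiple-CPU advice above. Running the script itself on a housekeeping CPU (for example, "taskset -c 0 sh cycle.sh") keeps the hotplug operations from being initiated on the CPUs being de-jittered.

```sh
#!/bin/sh
# Hypothetical helper for the CPU offline/online cycle described above.
# SYS_CPU_DIR is an assumed override for testing.
SYS_CPU_DIR=${SYS_CPU_DIR:-/sys/devices/system/cpu}

# Usage: cycle_cpus CPU...      (for example: cycle_cpus 2 3)
cycle_cpus() {
    # Refuse to start unless every CPU has a writable hotplug control
    # (CPUs that cannot be hotplugged usually lack the "online" file).
    for cpu in "$@"; do
        if [ ! -w "$SYS_CPU_DIR/cpu$cpu/online" ]; then
            echo "cpu$cpu cannot be hotplugged" >&2
            return 1
        fi
    done
    # Offline all of them first, so recurring timers cannot migrate
    # from one de-jittered CPU onto another.
    for cpu in "$@"; do
        echo 0 > "$SYS_CPU_DIR/cpu$cpu/online" || return 1
    done
    # Then bring them all back online.
    for cpu in "$@"; do
        echo 1 > "$SYS_CPU_DIR/cpu$cpu/online" || return 1
    done
}
```

If a write fails partway through, some CPUs may be left offline, so it is worth checking /sys/devices/system/cpu/online afterwards.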

Name: kworker/%u:%d%s (cpu, id, priority)
Purpose: Execute workqueue requests.
To reduce its OS jitter, do any of the following:
1.  Run your workload at a real-time priority, which will allow
    preempting the kworker daemons.
2.  Do any of the following needed to avoid jitter that your
    application cannot tolerate:
    a.  Build your kernel with CONFIG_SLUB=y rather than
        CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
        use of each CPU's workqueues to run its cache_reap()
        function.
    b.  Avoid using oprofile, thus avoiding OS jitter from
        wq_sync_buffer().
    c.  Limit your CPU frequency so that a CPU-frequency
        governor is not required, possibly enlisting the aid of
        special heatsinks or other cooling technologies. If done
        correctly, and if your CPU architecture permits, you should
        be able to build your kernel with CONFIG_CPU_FREQ=n to
        avoid the CPU-frequency governor periodically running
        on each CPU, including cs_dbs_timer() and od_dbs_timer().
        WARNING: Please check your CPU specifications to
        make sure that this is safe on your particular system.
    d.  It is not possible to entirely get rid of OS jitter
        from vmstat_update() on CONFIG_SMP=y systems, but you
        can decrease its frequency by writing a large value to
        /proc/sys/vm/stat_interval. The default value is HZ,
        for an interval of one second. Of course, larger values
        will make your virtual-memory statistics update more
        slowly. Alternatively, you can run your workload at
        a real-time priority, thus preempting vmstat_update().
    e.  If running on high-end powerpc servers, build with
        CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS
        daemon from running on each CPU every second or so.
        (This will require editing Kconfig files and will defeat
        this platform's RAS functionality.) This avoids jitter
        due to the rtas_event_scan() function.
        WARNING: Please check your CPU specifications to
        make sure that this is safe on your particular system.
    f.  If running on the Cell Processor, build your kernel with
        CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
        spu_gov_work().
        WARNING: Please check your CPU specifications to
        make sure that this is safe on your particular system.
    g.  If running on PowerMAC, build your kernel with
        CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
        avoiding OS jitter from rackmeter_do_timer().
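Item 2d can be applied with a small helper that validates the new interval before writing it. This is a sketch: set_stat_interval and the PROC_DIR override are hypothetical, root privileges are needed for the real file, and the value written is assumed to be interpreted in seconds; confirm this against your kernel's sysctl documentation before relying on it.

```sh
#!/bin/sh
# Hypothetical helper for item 2d: lengthen the vmstat_update() interval.
# PROC_DIR is an assumed override for testing.
PROC_DIR=${PROC_DIR:-/proc}

# Usage: set_stat_interval SECONDS
# Prints the old and new values on success.
set_stat_interval() {
    case $1 in
        ''|*[!0-9]*)
            echo "usage: set_stat_interval SECONDS" >&2
            return 1 ;;
    esac
    if [ "$1" -le 0 ]; then
        echo "stat_interval must be greater than zero" >&2
        return 1
    fi
    f="$PROC_DIR/sys/vm/stat_interval"
    old=$(cat "$f") || return 1
    echo "$1" > "$f" || return 1
    echo "stat_interval: $old -> $(cat "$f")"
}
```

Item 1 and the real-time alternative in item 2d can be combined with this by starting the workload under chrt(1) from util-linux, for example "chrt --fifo 50 ./my_workload", where the priority 50 and ./my_workload are placeholders.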

Name: rcuc/%u
Purpose: Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
To reduce its OS jitter, do at least one of the following:
1.  Build the kernel with CONFIG_PREEMPT=n. This prevents these
    kthreads from being created in the first place, and also obviates
    the need for RCU priority boosting. This approach is feasible
    for workloads that do not require high degrees of responsiveness.
2.  Build the kernel with CONFIG_RCU_BOOST=n. This prevents these
    kthreads from being created in the first place. This approach
    is feasible only if your workload never requires RCU priority
    boosting, for example, if you ensure frequent idle time on all
    CPUs that might execute within the kernel.
3.  Build with CONFIG_RCU_NOCB_CPU=y and CONFIG_RCU_NOCB_CPU_ALL=y,
    which offloads all RCU callbacks to kthreads that can be moved
    off CPUs susceptible to OS jitter. This approach prevents the
    rcuc/%u kthreads from having any work to do, so that they are
    never awakened.
4.  Ensure that the CPU never enters the kernel, and, in particular,
    avoid initiating any CPU hotplug operations on this CPU. This is
    another way of preventing any callbacks from being queued on the
    CPU, again preventing the rcuc/%u kthreads from having any work
    to do.
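Before relying on the build options named in this document, it can help to confirm what the kernel was actually built and booted with. The sketch below checks a kernel configuration file for the callback-offloading and adaptive-ticks options, and checks a kernel command line for "nohz_full=". The function names are hypothetical, and the config file location is an assumption: many distributions install it as /boot/config-$(uname -r), while kernels built with CONFIG_IKCONFIG_PROC expose a compressed copy at /proc/config.gz.

```sh
#!/bin/sh
# Hypothetical helpers for checking a kernel build and boot line.

# Usage: missing_nocb_options CONFIG_FILE
# Prints each missing option on its own line; returns 0 only if all
# of them are set to "y".
missing_nocb_options() {
    cfg=$1
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 2
    fi
    missing=0
    for opt in CONFIG_RCU_NOCB_CPU CONFIG_RCU_NOCB_CPU_ALL CONFIG_NO_HZ_FULL; do
        # -x: the whole line must be exactly OPTION=y
        if ! grep -qx "$opt=y" "$cfg"; then
            echo "$opt"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: has_nohz_full [CMDLINE_FILE]     (defaults to /proc/cmdline)
# Succeeds if the command line contains a nohz_full= parameter.
has_nohz_full() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -q '^nohz_full='
}
```

For a kernel that only provides /proc/config.gz, decompress it first, for example with "zcat /proc/config.gz > running.config", and pass that file.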

Name: rcuob/%d, rcuop/%d, and rcuos/%d
Purpose: Offload RCU callbacks from the corresponding CPU.
To reduce its OS jitter, do at least one of the following:
1.  Use affinity, cgroups, or another mechanism to force these
    kthreads to execute on some other CPU.
2.  Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
    kthreads from being created in the first place. However, please
    note that this will not eliminate OS jitter, but will instead
    shift it to RCU_SOFTIRQ.
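Item 1 can be carried out with taskset(1). The sketch below, assuming root privileges and the kthread names listed above, finds these kthreads by reading /proc/PID/comm and pins each one to a housekeeping CPU list. The function names and the PROC_DIR override are hypothetical; cgroup cpusets remain an alternative mechanism.

```sh
#!/bin/sh
# Hypothetical helpers for pinning rcuob/rcuop/rcuos kthreads.
# PROC_DIR is an assumed override for testing the scan.
PROC_DIR=${PROC_DIR:-/proc}

# Usage: find_rcuo_pids        (prints one PID per line)
find_rcuo_pids() {
    for d in "$PROC_DIR"/[0-9]*; do
        [ -r "$d/comm" ] || continue
        # A task may exit between the glob and the read; ignore errors.
        case $(cat "$d/comm" 2>/dev/null) in
            rcuob/*|rcuop/*|rcuos/*) echo "${d##*/}" ;;
        esac
    done
}

# Usage: pin_rcuo_kthreads HOUSEKEEPING_CPU_LIST    (for example: 0-1)
pin_rcuo_kthreads() {
    for pid in $(find_rcuo_pids); do
        taskset -pc "$1" "$pid" > /dev/null ||
            echo "could not pin PID $pid" >&2
    done
}
```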

Name: watchdog/%u
Purpose: Detect software lockups on each CPU.
To reduce its OS jitter, do at least one of the following:
1.  Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
    kthreads from being created in the first place.
2.  Echo a zero to /proc/sys/kernel/watchdog to disable the
    watchdog timer.
3.  Echo a large number into /proc/sys/kernel/watchdog_thresh in
    order to reduce the frequency of OS jitter due to the watchdog
    timer down to a level that is acceptable for your workload.
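Items 2 and 3 amount to writing a sysctl file. A minimal sketch, assuming root privileges; tune_watchdog and the PROC_DIR override are hypothetical names. The threshold is expressed in seconds, and the kernel enforces an upper bound on it, so check the accepted range for your kernel before choosing a large value.

```sh
#!/bin/sh
# Hypothetical helper for items 2 and 3. PROC_DIR is an assumed override
# for testing; writing the real files requires root.
PROC_DIR=${PROC_DIR:-/proc}

# Usage: tune_watchdog off        writes 0 to kernel/watchdog
#        tune_watchdog SECONDS    writes SECONDS to kernel/watchdog_thresh
tune_watchdog() {
    k="$PROC_DIR/sys/kernel"
    case $1 in
        off)
            echo 0 > "$k/watchdog" ;;
        ''|*[!0-9]*)
            echo "usage: tune_watchdog off|SECONDS" >&2
            return 1 ;;
        *)
            echo "$1" > "$k/watchdog_thresh" ;;
    esac
}
```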