1 | Runtime Power Management Framework for I/O Devices |
2 | |
3 | (C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc. |
4 | (C) 2010 Alan Stern <stern@rowland.harvard.edu> |
5 | |
6 | 1. Introduction |
7 | |
8 | Support for runtime power management (runtime PM) of I/O devices is provided |
9 | at the power management core (PM core) level by means of: |
10 | |
11 | * The power management workqueue pm_wq in which bus types and device drivers can |
12 | put their PM-related work items. It is strongly recommended that pm_wq be |
13 | used for queuing all work items related to runtime PM, because this allows |
14 | them to be synchronized with system-wide power transitions (suspend to RAM, |
15 | hibernation and resume from system sleep states). pm_wq is declared in |
16 | include/linux/pm_runtime.h and defined in kernel/power/main.c. |
17 | |
18 | * A number of runtime PM fields in the 'power' member of 'struct device' (which |
19 | is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can |
20 | be used for synchronizing runtime PM operations with one another. |
21 | |
22 | * Three device runtime PM callbacks in 'struct dev_pm_ops' (defined in |
23 | include/linux/pm.h). |
24 | |
25 | * A set of helper functions defined in drivers/base/power/runtime.c that can be |
26 | used for carrying out runtime PM operations in such a way that the |
27 | synchronization between them is taken care of by the PM core. Bus types and |
28 | device drivers are encouraged to use these functions. |
29 | |
30 | The runtime PM callbacks present in 'struct dev_pm_ops', the device runtime PM |
31 | fields of 'struct dev_pm_info' and the core helper functions provided for |
32 | runtime PM are described below. |
33 | |
34 | 2. Device Runtime PM Callbacks |
35 | |
36 | There are three device runtime PM callbacks defined in 'struct dev_pm_ops': |
37 | |
38 | struct dev_pm_ops { |
39 | ... |
40 | int (*runtime_suspend)(struct device *dev); |
41 | int (*runtime_resume)(struct device *dev); |
42 | int (*runtime_idle)(struct device *dev); |
43 | ... |
44 | }; |
45 | |
46 | The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks are |
47 | executed by the PM core for either the device type, or the class (if the device |
48 | type's struct dev_pm_ops object does not exist), or the bus type (if the |
49 | device type's and class' struct dev_pm_ops objects do not exist) of the given |
50 | device (this allows device types to override callbacks provided by bus types or |
51 | classes if necessary). The bus type, device type and class callbacks are |
52 | referred to as subsystem-level callbacks in what follows. |
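
The lookup order described above can be illustrated with a small userspace
model. This is only a sketch: 'struct model_pm_ops', 'struct model_dev' and
model_pick_ops() are hypothetical names invented for the illustration and are
not part of the kernel; the real selection is done inside the PM core.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for 'struct dev_pm_ops'; only one callback is modelled. */
struct model_pm_ops {
	int (*runtime_suspend)(void *dev);
};

/* Toy device: each level may or may not provide an ops object. */
struct model_dev {
	const struct model_pm_ops *type_ops;	/* device type */
	const struct model_pm_ops *class_ops;	/* class */
	const struct model_pm_ops *bus_ops;	/* bus type */
};

/*
 * Pick the ops object in the order given in the text: device type first,
 * then class, then bus type.  Returns NULL if none of them exists.
 */
static const struct model_pm_ops *model_pick_ops(const struct model_dev *d)
{
	assert(d != NULL);

	if (d->type_ops)
		return d->type_ops;
	if (d->class_ops)
		return d->class_ops;
	return d->bus_ops;
}
```

In this model a device type that supplies an ops object always wins, which is
how a device type can override what its class or bus type provides.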
53 | |
54 | By default, the callbacks are always invoked in process context with interrupts |
55 | enabled. However, subsystems can use the pm_runtime_irq_safe() helper function |
56 | to tell the PM core that a device's ->runtime_suspend() and ->runtime_resume() |
57 | callbacks should be invoked in atomic context with interrupts disabled. |
58 | This implies that these callback routines must not block or sleep, but it also |
59 | means that the synchronous helper functions listed at the end of Section 4 can |
60 | be used within an interrupt handler or in an atomic context. |
61 | |
62 | The subsystem-level suspend callback is _entirely_ _responsible_ for handling |
63 | the suspend of the device as appropriate, which may, but need not include |
64 | executing the device driver's own ->runtime_suspend() callback (from the |
65 | PM core's point of view it is not necessary to implement a ->runtime_suspend() |
66 | callback in a device driver as long as the subsystem-level suspend callback |
67 | knows what to do to handle the device). |
68 | |
69 | * Once the subsystem-level suspend callback has completed successfully |
70 | for the given device, the PM core regards the device as suspended, which need |
71 | not mean that the device has been put into a low power state. It is |
72 | supposed to mean, however, that the device will not process data and will |
73 | not communicate with the CPU(s) and RAM until the subsystem-level resume |
74 | callback is executed for it. The runtime PM status of a device after |
75 | successful execution of the subsystem-level suspend callback is 'suspended'. |
76 | |
77 | * If the subsystem-level suspend callback returns -EBUSY or -EAGAIN, |
78 | the device's runtime PM status is 'active', which means that the device |
79 | _must_ be fully operational afterwards. |
80 | |
81 | * If the subsystem-level suspend callback returns an error code different |
82 | from -EBUSY or -EAGAIN, the PM core regards this as a fatal error and will |
83 | refuse to run the helper functions described in Section 4 for the device, |
84 | until the status of it is directly set either to 'active', or to 'suspended' |
85 | (the PM core provides special helper functions for this purpose). |
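
The three outcomes listed above can be summarized by the following userspace
sketch. It is a toy model, not kernel code: 'struct model_dev',
model_suspend_done() and model_helpers_allowed() are hypothetical names, and
leaving the status untouched in the fatal-error case is a simplification made
for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

struct model_dev {
	enum model_status status;
	int runtime_error;	/* nonzero: the helpers refuse to run */
};

/* Apply the return value 'ret' of the subsystem-level suspend callback. */
static void model_suspend_done(struct model_dev *d, int ret)
{
	/* The suspend callback is only ever run for 'active' devices. */
	assert(d->status == MODEL_ACTIVE);

	if (ret == 0) {
		d->status = MODEL_SUSPENDED;
	} else if (ret == -EBUSY || ret == -EAGAIN) {
		/* Not an error: the device simply stays 'active'. */
		d->status = MODEL_ACTIVE;
	} else {
		/* Fatal: remember the code until the status is set directly. */
		d->runtime_error = ret;
	}
}

/* The helpers of Section 4 only work while no fatal error is recorded. */
static bool model_helpers_allowed(const struct model_dev *d)
{
	return d->runtime_error == 0;
}
```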
86 | |
87 | In particular, if the driver requires remote wake-up capability (i.e. hardware |
88 | mechanism allowing the device to request a change of its power state, such as |
89 | PCI PME) for proper functioning and device_run_wake() returns 'false' for the |
90 | device, then ->runtime_suspend() should return -EBUSY. On the other hand, if |
91 | device_run_wake() returns 'true' for the device and the device is put into a low |
92 | power state during the execution of the subsystem-level suspend callback, it is |
93 | expected that remote wake-up will be enabled for the device. Generally, remote |
94 | wake-up should be enabled for all input devices put into a low power state at |
95 | run time. |
96 | |
97 | The subsystem-level resume callback is _entirely_ _responsible_ for handling the |
98 | resume of the device as appropriate, which may, but need not include executing |
99 | the device driver's own ->runtime_resume() callback (from the PM core's point of |
100 | view it is not necessary to implement a ->runtime_resume() callback in a device |
101 | driver as long as the subsystem-level resume callback knows what to do to handle |
102 | the device). |
103 | |
104 | * Once the subsystem-level resume callback has completed successfully, the PM |
105 | core regards the device as fully operational, which means that the device |
106 | _must_ be able to complete I/O operations as needed. The runtime PM status |
107 | of the device is then 'active'. |
108 | |
109 | * If the subsystem-level resume callback returns an error code, the PM core |
110 | regards this as a fatal error and will refuse to run the helper functions |
111 | described in Section 4 for the device, until its status is directly set |
112 | either to 'active' or to 'suspended' (the PM core provides special helper |
113 | functions for this purpose). |
114 | |
115 | The subsystem-level idle callback is executed by the PM core whenever the device |
116 | appears to be idle, which is indicated to the PM core by two counters, the |
117 | device's usage counter and the counter of 'active' children of the device. |
118 | |
119 | * If any of these counters is decreased using a helper function provided by |
120 | the PM core and it turns out to be equal to zero, the other counter is |
121 | checked. If that counter also is equal to zero, the PM core executes the |
122 | subsystem-level idle callback with the device as an argument. |
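
The two-counter check above can be modelled in userspace as follows. This is
only a sketch with hypothetical names ('struct model_dev', model_put_usage(),
model_child_inactive()); it ignores locking, the 'power.ignore_children' flag
and everything else the PM core does around the check.

```c
#include <assert.h>
#include <stdbool.h>

struct model_dev {
	int usage_count;	/* models the device's usage counter */
	int child_count;	/* models the counter of 'active' children */
	int idle_calls;		/* how many times the idle callback "ran" */
};

/* "Run" the idle callback only if both counters are zero. */
static bool model_check_idle(struct model_dev *d)
{
	if (d->usage_count != 0 || d->child_count != 0)
		return false;
	d->idle_calls++;
	return true;
}

/* Drop one usage reference; reaching zero triggers the check above. */
static bool model_put_usage(struct model_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count == 0)
		return model_check_idle(d);
	return false;
}

/* Record that one 'active' child is no longer active. */
static bool model_child_inactive(struct model_dev *d)
{
	assert(d->child_count > 0);
	if (--d->child_count == 0)
		return model_check_idle(d);
	return false;
}
```

With a usage counter of 2 and one 'active' child, dropping both usage
references does not trigger the idle callback in this model; it only runs once
the child count reaches zero as well.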
123 | |
124 | The action performed by a subsystem-level idle callback is totally dependent on |
125 | the subsystem in question, but the expected and recommended action is to check |
126 | if the device can be suspended (i.e. if all of the conditions necessary for |
127 | suspending the device are satisfied) and to queue up a suspend request for the |
128 | device in that case. The value returned by this callback is ignored by the PM |
129 | core. |
130 | |
131 | The helper functions provided by the PM core, described in Section 4, guarantee |
132 | that the following constraints are met with respect to the bus type's runtime |
133 | PM callbacks: |
134 | |
135 | (1) The callbacks are mutually exclusive (e.g. it is forbidden to execute |
136 | ->runtime_suspend() in parallel with ->runtime_resume() or with another |
137 | instance of ->runtime_suspend() for the same device) with the exception that |
138 | ->runtime_suspend() or ->runtime_resume() can be executed in parallel with |
139 | ->runtime_idle() (although ->runtime_idle() will not be started while any |
140 | of the other callbacks is being executed for the same device). |
141 | |
142 | (2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active' |
143 | devices (i.e. the PM core will only execute ->runtime_idle() or |
144 | ->runtime_suspend() for the devices the runtime PM status of which is |
145 | 'active'). |
146 | |
147 | (3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device |
148 | the usage counter of which is equal to zero _and_ either the counter of |
149 | 'active' children of which is equal to zero, or the 'power.ignore_children' |
150 | flag of which is set. |
151 | |
152 | (4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the |
153 | PM core will only execute ->runtime_resume() for the devices the runtime |
154 | PM status of which is 'suspended'). |
155 | |
156 | Additionally, the helper functions provided by the PM core obey the following |
157 | rules: |
158 | |
159 | * If ->runtime_suspend() is about to be executed or there's a pending request |
160 | to execute it, ->runtime_idle() will not be executed for the same device. |
161 | |
162 | * A request to execute or to schedule the execution of ->runtime_suspend() |
163 | will cancel any pending requests to execute ->runtime_idle() for the same |
164 | device. |
165 | |
166 | * If ->runtime_resume() is about to be executed or there's a pending request |
167 | to execute it, the other callbacks will not be executed for the same device. |
168 | |
169 | * A request to execute ->runtime_resume() will cancel any pending or |
170 | scheduled requests to execute the other callbacks for the same device, |
171 | except for scheduled autosuspends. |
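
The priority between pending requests implied by the rules above can be
sketched as a tiny userspace model. All names here ('enum model_req',
'struct model_dev', model_submit()) are hypothetical, the model keeps at most
one pending request per device, and it deliberately leaves out callbacks that
are already running as well as the exception for scheduled autosuspends.

```c
#include <assert.h>
#include <stdbool.h>

enum model_req {
	MODEL_REQ_NONE,
	MODEL_REQ_IDLE,
	MODEL_REQ_SUSPEND,
	MODEL_REQ_RESUME
};

struct model_dev {
	enum model_req pending;	/* at most one pending request in this model */
};

/*
 * Try to record a new request.  Returns true if 'req' is now the pending
 * request, false if it was refused because of what is already pending.
 */
static bool model_submit(struct model_dev *d, enum model_req req)
{
	assert(req != MODEL_REQ_NONE);

	switch (req) {
	case MODEL_REQ_RESUME:
		/* A resume request replaces a pending idle or suspend request. */
		break;
	case MODEL_REQ_SUSPEND:
		/* Refused while a resume is pending; replaces a pending idle. */
		if (d->pending == MODEL_REQ_RESUME)
			return false;
		break;
	default:
		/* Idle: refused while a suspend or a resume is pending. */
		if (d->pending == MODEL_REQ_RESUME ||
		    d->pending == MODEL_REQ_SUSPEND)
			return false;
		break;
	}
	d->pending = req;
	return true;
}
```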
172 | |
173 | 3. Runtime PM Device Fields |
174 | |
175 | The following device runtime PM fields are present in 'struct dev_pm_info', as |
176 | defined in include/linux/pm.h: |
177 | |
178 | struct timer_list suspend_timer; |
179 | - timer used for scheduling (delayed) suspend and autosuspend requests |
180 | |
181 | unsigned long timer_expires; |
182 | - timer expiration time, in jiffies (if this is different from zero, the |
183 | timer is running and will expire at that time, otherwise the timer is not |
184 | running) |
185 | |
186 | struct work_struct work; |
187 | - work structure used for queuing up requests (i.e. work items in pm_wq) |
188 | |
189 | wait_queue_head_t wait_queue; |
190 | - wait queue used if any of the helper functions needs to wait for another |
191 | one to complete |
192 | |
193 | spinlock_t lock; |
194 | - lock used for synchronisation |
195 | |
196 | atomic_t usage_count; |
197 | - the usage counter of the device |
198 | |
199 | atomic_t child_count; |
200 | - the count of 'active' children of the device |
201 | |
202 | unsigned int ignore_children; |
203 | - if set, the value of child_count is ignored (but still updated) |
204 | |
205 | unsigned int disable_depth; |
206 | - used for disabling the helper functions (they work normally if this is |
207 | equal to zero); the initial value of it is 1 (i.e. runtime PM is |
208 | initially disabled for all devices) |
209 | |
210 | unsigned int runtime_error; |
211 | - if set, there was a fatal error (one of the callbacks returned an error code |
212 | as described in Section 2), so the helper functions will not work until |
213 | this flag is cleared; this is the error code returned by the failing |
214 | callback |
215 | |
216 | unsigned int idle_notification; |
217 | - if set, ->runtime_idle() is being executed |
218 | |
219 | unsigned int request_pending; |
220 | - if set, there's a pending request (i.e. a work item queued up into pm_wq) |
221 | |
222 | enum rpm_request request; |
223 | - type of request that's pending (valid if request_pending is set) |
224 | |
225 | unsigned int deferred_resume; |
226 | - set if ->runtime_resume() is about to be run while ->runtime_suspend() is |
227 | being executed for that device and it is not practical to wait for the |
228 | suspend to complete; means "start a resume as soon as you've suspended" |
229 | |
230 | unsigned int run_wake; |
231 | - set if the device is capable of generating runtime wake-up events |
232 | |
233 | enum rpm_status runtime_status; |
234 | - the runtime PM status of the device; this field's initial value is |
235 | RPM_SUSPENDED, which means that each device is initially regarded by the |
236 | PM core as 'suspended', regardless of its real hardware status |
237 | |
238 | unsigned int runtime_auto; |
239 | - if set, indicates that the user space has allowed the device driver to |
240 | power manage the device at run time via the /sys/devices/.../power/control |
241 | interface; it may only be modified with the help of the pm_runtime_allow() |
242 | and pm_runtime_forbid() helper functions |
243 | |
244 | unsigned int no_callbacks; |
245 | - indicates that the device does not use the runtime PM callbacks (see |
246 | Section 8); it may be modified only by the pm_runtime_no_callbacks() |
247 | helper function |
248 | |
249 | unsigned int irq_safe; |
250 | - indicates that the ->runtime_suspend() and ->runtime_resume() callbacks |
251 | will be invoked with the spinlock held and interrupts disabled |
252 | |
253 | unsigned int use_autosuspend; |
254 | - indicates that the device's driver supports delayed autosuspend (see |
255 | Section 9); it may be modified only by the |
256 | pm_runtime{_dont}_use_autosuspend() helper functions |
257 | |
258 | unsigned int timer_autosuspends; |
259 | - indicates that the PM core should attempt to carry out an autosuspend |
260 | when the timer expires rather than a normal suspend |
261 | |
262 | int autosuspend_delay; |
263 | - the delay time (in milliseconds) to be used for autosuspend |
264 | |
265 | unsigned long last_busy; |
266 | - the time (in jiffies) when the pm_runtime_mark_last_busy() helper |
267 | function was last called for this device; used in calculating inactivity |
268 | periods for autosuspend |
269 | |
270 | All of the above fields are members of the 'power' member of 'struct device'. |
271 | |
272 | 4. Runtime PM Device Helper Functions |
273 | |
274 | The following runtime PM helper functions are defined in |
275 | drivers/base/power/runtime.c and include/linux/pm_runtime.h: |
276 | |
277 | void pm_runtime_init(struct device *dev); |
278 | - initialize the device runtime PM fields in 'struct dev_pm_info' |
279 | |
280 | void pm_runtime_remove(struct device *dev); |
281 | - make sure that the runtime PM of the device will be disabled after |
282 | removing the device from device hierarchy |
283 | |
284 | int pm_runtime_idle(struct device *dev); |
285 | - execute the subsystem-level idle callback for the device; returns 0 on |
286 | success or error code on failure, where -EINPROGRESS means that |
287 | ->runtime_idle() is already being executed |
288 | |
289 | int pm_runtime_suspend(struct device *dev); |
290 | - execute the subsystem-level suspend callback for the device; returns 0 on |
291 | success, 1 if the device's runtime PM status was already 'suspended', or |
292 | error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt |
293 | to suspend the device again in future and -EACCES means that |
294 | 'power.disable_depth' is different from 0 |
295 | |
296 | int pm_runtime_autosuspend(struct device *dev); |
297 | - same as pm_runtime_suspend() except that the autosuspend delay is taken |
298 | into account; if pm_runtime_autosuspend_expiration() says the delay has |
299 | not yet expired then an autosuspend is scheduled for the appropriate time |
300 | and 0 is returned |
301 | |
302 | int pm_runtime_resume(struct device *dev); |
303 | - execute the subsystem-level resume callback for the device; returns 0 on |
304 | success, 1 if the device's runtime PM status was already 'active' or |
305 | error code on failure, where -EAGAIN means it may be safe to attempt to |
306 | resume the device again in future, but 'power.runtime_error' should be |
307 | checked additionally, and -EACCES means that 'power.disable_depth' is |
308 | different from 0 |
309 | |
310 | int pm_request_idle(struct device *dev); |
311 | - submit a request to execute the subsystem-level idle callback for the |
312 | device (the request is represented by a work item in pm_wq); returns 0 on |
313 | success or error code if the request has not been queued up |
314 | |
315 | int pm_request_autosuspend(struct device *dev); |
316 | - schedule the execution of the subsystem-level suspend callback for the |
317 | device when the autosuspend delay has expired; if the delay has already |
318 | expired then the work item is queued up immediately |
319 | |
320 | int pm_schedule_suspend(struct device *dev, unsigned int delay); |
321 | - schedule the execution of the subsystem-level suspend callback for the |
322 | device in future, where 'delay' is the time to wait before queuing up a |
323 | suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work |
324 | item is queued up immediately); returns 0 on success, 1 if the device's PM |
325 | runtime status was already 'suspended', or error code if the request |
326 | hasn't been scheduled (or queued up if 'delay' is 0); if the execution of |
327 | ->runtime_suspend() is already scheduled and not yet expired, the new |
328 | value of 'delay' will be used as the time to wait |
329 | |
330 | int pm_request_resume(struct device *dev); |
331 | - submit a request to execute the subsystem-level resume callback for the |
332 | device (the request is represented by a work item in pm_wq); returns 0 on |
333 | success, 1 if the device's runtime PM status was already 'active', or |
334 | error code if the request hasn't been queued up |
335 | |
336 | void pm_runtime_get_noresume(struct device *dev); |
337 | - increment the device's usage counter |
338 | |
339 | int pm_runtime_get(struct device *dev); |
340 | - increment the device's usage counter, run pm_request_resume(dev) and |
341 | return its result |
342 | |
343 | int pm_runtime_get_sync(struct device *dev); |
344 | - increment the device's usage counter, run pm_runtime_resume(dev) and |
345 | return its result |
346 | |
347 | void pm_runtime_put_noidle(struct device *dev); |
348 | - decrement the device's usage counter |
349 | |
350 | int pm_runtime_put(struct device *dev); |
351 | - decrement the device's usage counter; if the result is 0 then run |
352 | pm_request_idle(dev) and return its result |
353 | |
354 | int pm_runtime_put_autosuspend(struct device *dev); |
355 | - decrement the device's usage counter; if the result is 0 then run |
356 | pm_request_autosuspend(dev) and return its result |
357 | |
358 | int pm_runtime_put_sync(struct device *dev); |
359 | - decrement the device's usage counter; if the result is 0 then run |
360 | pm_runtime_idle(dev) and return its result |
361 | |
362 | int pm_runtime_put_sync_suspend(struct device *dev); |
363 | - decrement the device's usage counter; if the result is 0 then run |
364 | pm_runtime_suspend(dev) and return its result |
365 | |
366 | int pm_runtime_put_sync_autosuspend(struct device *dev); |
367 | - decrement the device's usage counter; if the result is 0 then run |
368 | pm_runtime_autosuspend(dev) and return its result |
369 | |
370 | void pm_runtime_enable(struct device *dev); |
371 | - decrement the device's 'power.disable_depth' field; if that field is equal |
372 | to zero, the runtime PM helper functions can execute subsystem-level |
373 | callbacks described in Section 2 for the device |
374 | |
375 | int pm_runtime_disable(struct device *dev); |
376 | - increment the device's 'power.disable_depth' field (if the value of that |
377 | field was previously zero, this prevents subsystem-level runtime PM |
378 | callbacks from being run for the device), make sure that all of the pending |
379 | runtime PM operations on the device are either completed or canceled; |
380 | returns 1 if there was a resume request pending and it was necessary to |
381 | execute the subsystem-level resume callback for the device to satisfy that |
382 | request, otherwise 0 is returned |
383 | |
384 | int pm_runtime_barrier(struct device *dev); |
385 | - check if there's a resume request pending for the device and resume it |
386 | (synchronously) in that case, cancel any other pending runtime PM requests |
387 | regarding it and wait for all runtime PM operations on it in progress to |
388 | complete; returns 1 if there was a resume request pending and it was |
389 | necessary to execute the subsystem-level resume callback for the device to |
390 | satisfy that request, otherwise 0 is returned |
391 | |
392 | void pm_suspend_ignore_children(struct device *dev, bool enable); |
393 | - set/unset the power.ignore_children flag of the device |
394 | |
395 | int pm_runtime_set_active(struct device *dev); |
396 | - clear the device's 'power.runtime_error' flag, set the device's runtime |
397 | PM status to 'active' and update its parent's counter of 'active' |
398 | children as appropriate (it is only valid to use this function if |
399 | 'power.runtime_error' is set or 'power.disable_depth' is greater than |
400 | zero); it will fail and return error code if the device has a parent |
401 | which is not active and the 'power.ignore_children' flag of which is unset |
402 | |
403 | void pm_runtime_set_suspended(struct device *dev); |
404 | - clear the device's 'power.runtime_error' flag, set the device's runtime |
405 | PM status to 'suspended' and update its parent's counter of 'active' |
406 | children as appropriate (it is only valid to use this function if |
407 | 'power.runtime_error' is set or 'power.disable_depth' is greater than |
408 | zero) |
409 | |
410 | bool pm_runtime_suspended(struct device *dev); |
411 | - return true if the device's runtime PM status is 'suspended' and its |
412 | 'power.disable_depth' field is equal to zero, or false otherwise |
413 | |
414 | bool pm_runtime_status_suspended(struct device *dev); |
415 | - return true if the device's runtime PM status is 'suspended' |
416 | |
417 | void pm_runtime_allow(struct device *dev); |
418 | - set the power.runtime_auto flag for the device and decrease its usage |
419 | counter (used by the /sys/devices/.../power/control interface to |
420 | effectively allow the device to be power managed at run time) |
421 | |
422 | void pm_runtime_forbid(struct device *dev); |
423 | - unset the power.runtime_auto flag for the device and increase its usage |
424 | counter (used by the /sys/devices/.../power/control interface to |
425 | effectively prevent the device from being power managed at run time) |
426 | |
427 | void pm_runtime_no_callbacks(struct device *dev); |
428 | - set the power.no_callbacks flag for the device and remove the runtime |
429 | PM attributes from /sys/devices/.../power (or prevent them from being |
430 | added when the device is registered) |
431 | |
432 | void pm_runtime_irq_safe(struct device *dev); |
433 | - set the power.irq_safe flag for the device, causing the runtime-PM |
434 | callbacks to be invoked with interrupts off |
435 | |
436 | void pm_runtime_mark_last_busy(struct device *dev); |
437 | - set the power.last_busy field to the current time |
438 | |
439 | void pm_runtime_use_autosuspend(struct device *dev); |
440 | - set the power.use_autosuspend flag, enabling autosuspend delays |
441 | |
442 | void pm_runtime_dont_use_autosuspend(struct device *dev); |
443 | - clear the power.use_autosuspend flag, disabling autosuspend delays |
444 | |
445 | void pm_runtime_set_autosuspend_delay(struct device *dev, int delay); |
446 | - set the power.autosuspend_delay value to 'delay' (expressed in |
447 | milliseconds); if 'delay' is negative then runtime suspends are |
448 | prevented |
449 | |
450 | unsigned long pm_runtime_autosuspend_expiration(struct device *dev); |
451 | - calculate the time when the current autosuspend delay period will expire, |
452 | based on power.last_busy and power.autosuspend_delay; if the delay time |
453 | is 1000 ms or larger then the expiration time is rounded up to the |
454 | nearest second; returns 0 if the delay period has already expired or |
455 | power.use_autosuspend isn't set, otherwise returns the expiration time |
456 | in jiffies |
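
The expiration calculation described for pm_runtime_autosuspend_expiration()
can be approximated by the userspace sketch below. It is a simplified model
with hypothetical names: it works in milliseconds instead of jiffies, ignores
counter wrap-around, always rounds up to a whole second, and treats a negative
delay as "no expiration time", which is an assumption of this sketch rather
than something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev {
	bool use_autosuspend;		/* models power.use_autosuspend */
	int autosuspend_delay_ms;	/* models power.autosuspend_delay */
	unsigned long last_busy_ms;	/* models power.last_busy, in ms here */
};

/*
 * Return the time (in ms) at which the current autosuspend delay period
 * ends, or 0 if it has already ended or autosuspend is not in use.
 */
static unsigned long model_autosuspend_expiration(const struct model_dev *d,
						  unsigned long now_ms)
{
	unsigned long expires;

	assert(d != NULL);

	/* Assumption: a negative delay yields no expiration time. */
	if (!d->use_autosuspend || d->autosuspend_delay_ms < 0)
		return 0;

	expires = d->last_busy_ms + (unsigned long)d->autosuspend_delay_ms;

	/* Delays of 1000 ms or more are rounded up to a whole second. */
	if (d->autosuspend_delay_ms >= 1000)
		expires = (expires + 999) / 1000 * 1000;

	return now_ms >= expires ? 0 : expires;
}
```

For example, with last_busy_ms = 2500 and a delay of 2000 ms, the raw value
4500 is rounded up to 5000 in this model, while a 300 ms delay is used without
rounding.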
457 | |
458 | It is safe to execute the following helper functions from interrupt context: |
459 | |
460 | pm_request_idle() |
461 | pm_request_autosuspend() |
462 | pm_schedule_suspend() |
463 | pm_request_resume() |
464 | pm_runtime_get_noresume() |
465 | pm_runtime_get() |
466 | pm_runtime_put_noidle() |
467 | pm_runtime_put() |
468 | pm_runtime_put_autosuspend() |
469 | pm_runtime_enable() |
470 | pm_suspend_ignore_children() |
471 | pm_runtime_set_active() |
472 | pm_runtime_set_suspended() |
473 | pm_runtime_suspended() |
474 | pm_runtime_mark_last_busy() |
475 | pm_runtime_autosuspend_expiration() |
476 | |
477 | If pm_runtime_irq_safe() has been called for a device then the following helper |
478 | functions may also be used in interrupt context: |
479 | |
480 | pm_runtime_suspend() |
481 | pm_runtime_autosuspend() |
482 | pm_runtime_resume() |
483 | pm_runtime_get_sync() |
484 | pm_runtime_put_sync() |
485 | pm_runtime_put_sync_suspend() |
486 | |
487 | 5. Runtime PM Initialization, Device Probing and Removal |
488 | |
489 | Initially, the runtime PM is disabled for all devices, which means that the |
490 | majority of the runtime PM helper functions described in Section 4 will return |
491 | -EACCES until pm_runtime_enable() is called for the device. |
492 | |
493 | In addition to that, the initial runtime PM status of all devices is |
494 | 'suspended', but it need not reflect the actual physical state of the device. |
495 | Thus, if the device is initially active (i.e. it is able to process I/O), its |
496 | runtime PM status must be changed to 'active', with the help of |
497 | pm_runtime_set_active(), before pm_runtime_enable() is called for the device. |
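
The interplay between the initial state, 'power.disable_depth' and the order
of pm_runtime_set_active() and pm_runtime_enable() can be illustrated with a
userspace toy model. The model_*() names are hypothetical and the functions
only mimic the documented behavior: the initial values follow Section 3, and
-EACCES is used for the "runtime PM disabled" case as in the description of
pm_runtime_resume() in Section 4.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum model_status { MODEL_SUSPENDED, MODEL_ACTIVE };

struct model_dev {
	unsigned int disable_depth;
	enum model_status status;
};

/* Initial state: runtime PM disabled, status 'suspended'. */
static void model_init(struct model_dev *d)
{
	d->disable_depth = 1;
	d->status = MODEL_SUSPENDED;
}

static void model_disable(struct model_dev *d)
{
	d->disable_depth++;
}

static void model_enable(struct model_dev *d)
{
	assert(d->disable_depth > 0);	/* unbalanced enable is a bug */
	d->disable_depth--;
}

/* Only valid while runtime PM is disabled (errors are not modelled). */
static bool model_set_active(struct model_dev *d)
{
	if (d->disable_depth == 0)
		return false;
	d->status = MODEL_ACTIVE;
	return true;
}

/* Mimics the documented return values of pm_runtime_resume(). */
static int model_resume(struct model_dev *d)
{
	if (d->disable_depth > 0)
		return -EACCES;
	if (d->status == MODEL_ACTIVE)
		return 1;
	d->status = MODEL_ACTIVE;
	return 0;
}
```

Calling model_set_active() and then model_enable() on a freshly initialized
model device leaves it 'active' with the helpers usable, which mirrors the
ordering required above for devices that are already powered up at probe time.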
498 | |
499 | However, if the device has a parent and the parent's runtime PM is enabled, |
500 | calling pm_runtime_set_active() for the device will affect the parent, unless |
501 | the parent's 'power.ignore_children' flag is set. Namely, in that case the |
502 | parent won't be able to suspend at run time, using the PM core's helper |
503 | functions, as long as the child's status is 'active', even if the child's |
504 | runtime PM is still disabled (i.e. pm_runtime_enable() hasn't been called for |
505 | the child yet or pm_runtime_disable() has been called for it). For this reason, |
506 | once pm_runtime_set_active() has been called for the device, pm_runtime_enable() |
507 | should be called for it too as soon as reasonably possible or its runtime PM |
508 | status should be changed back to 'suspended' with the help of |
509 | pm_runtime_set_suspended(). |
510 | |
511 | If the default initial runtime PM status of the device (i.e. 'suspended') |
512 | reflects the actual state of the device, its bus type's or its driver's |
513 | ->probe() callback will likely need to wake it up using one of the PM core's |
514 | helper functions described in Section 4. In that case, pm_runtime_resume() |
515 | should be used. Of course, for this purpose the device's runtime PM has to be |
516 | enabled earlier by calling pm_runtime_enable(). |
517 | |
518 | If the device bus type's or driver's ->probe() callback runs |
519 | pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts, |
520 | they will fail returning -EAGAIN, because the device's usage counter is |
521 | incremented by the driver core before executing ->probe(). Still, it may be |
522 | desirable to suspend the device as soon as ->probe() has finished, so the driver |
523 | core uses pm_runtime_put_sync() to invoke the subsystem-level idle callback for |
524 | the device at that time. |
525 | |
526 | Moreover, the driver core prevents runtime PM callbacks from racing with the bus |
527 | notifier callback in __device_release_driver(), which is necessary, because the |
528 | notifier is used by some subsystems to carry out operations affecting the |
529 | runtime PM functionality. It does so by calling pm_runtime_get_sync() before |
530 | driver_sysfs_remove() and the BUS_NOTIFY_UNBIND_DRIVER notifications. This |
531 | resumes the device if it's in the suspended state and prevents it from |
532 | being suspended again while those routines are being executed. |
533 | |
534 | To allow bus types and drivers to put devices into the suspended state by |
535 | calling pm_runtime_suspend() from their ->remove() routines, the driver core |
536 | executes pm_runtime_put_sync() after running the BUS_NOTIFY_UNBIND_DRIVER |
537 | notifications in __device_release_driver(). This requires bus types and |
538 | drivers to make their ->remove() callbacks avoid races with runtime PM directly, |
539 | but it also allows more flexibility in the handling of devices during the |
540 | removal of their drivers. |
541 | |
542 | The user space can effectively prevent the driver of the device from power |
543 | managing it at run time by changing the value of its /sys/devices/.../power/control |
544 | attribute to "on", which causes pm_runtime_forbid() to be called. In principle, |
545 | this mechanism may also be used by the driver to effectively turn off the |
546 | runtime power management of the device until the user space turns it on. |
547 | Namely, during the initialization the driver can make sure that the runtime PM |
548 | status of the device is 'active' and call pm_runtime_forbid(). It should be |
549 | noted, however, that if the user space has already intentionally changed the |
550 | value of /sys/devices/.../power/control to "auto" to allow the driver to power |
551 | manage the device at run time, the driver may confuse it by using |
552 | pm_runtime_forbid() this way. |
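
The effect of writing "on" or "auto" to the control attribute can be sketched
with the userspace model below. The names are hypothetical, the sysfs plumbing
is not modelled at all, and making the two operations do nothing when the flag
already has the requested value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct model_dev {
	bool runtime_auto;	/* models power.runtime_auto */
	int usage_count;	/* models the device's usage counter */
};

/* Models pm_runtime_forbid(): clear the flag, take a usage reference. */
static void model_forbid(struct model_dev *d)
{
	if (!d->runtime_auto)
		return;		/* assumption: already forbidden, do nothing */
	d->runtime_auto = false;
	d->usage_count++;
}

/* Models pm_runtime_allow(): set the flag, drop the usage reference. */
static void model_allow(struct model_dev *d)
{
	if (d->runtime_auto)
		return;		/* assumption: already allowed, do nothing */
	assert(d->usage_count > 0);
	d->runtime_auto = true;
	d->usage_count--;
}

/* Models a write to power/control; returns false for unknown values. */
static bool model_control_write(struct model_dev *d, const char *val)
{
	if (strcmp(val, "on") == 0)
		model_forbid(d);
	else if (strcmp(val, "auto") == 0)
		model_allow(d);
	else
		return false;
	return true;
}
```

Because "on" holds a usage reference in this model, the usage counter cannot
drop to zero while runtime PM is forbidden, so the idle and suspend callbacks
are never reached until "auto" is written again.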
553 | |
554 | 6. Runtime PM and System Sleep |
555 | |
556 | Runtime PM and system sleep (i.e., system suspend and hibernation, also known |
557 | as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of |
558 | ways. If a device is active when a system sleep starts, everything is |
559 | straightforward. But what should happen if the device is already suspended? |
560 | |
561 | The device may have different wake-up settings for runtime PM and system sleep. |
562 | For example, remote wake-up may be enabled for runtime suspend but disallowed |
563 | for system sleep (device_may_wakeup(dev) returns 'false'). When this happens, |
564 | the subsystem-level system suspend callback is responsible for changing the |
565 | device's wake-up setting (it may leave that to the device driver's system |
566 | suspend routine). It may be necessary to resume the device and suspend it again |
567 | in order to do so. The same is true if the driver uses different power levels |
568 | or other settings for runtime suspend and system sleep. |
569 | |
570 | During system resume, the simplest approach is to bring all devices back to full |
571 | power, even if they had been suspended before the system suspend began. There |
572 | are several reasons for this, including: |
573 | |
574 | * The device might need to switch power levels, wake-up settings, etc. |
575 | |
576 | * Remote wake-up events might have been lost by the firmware. |
577 | |
578 | * The device's children may need the device to be at full power in order |
579 | to resume themselves. |
580 | |
581 | * The driver's idea of the device state may not agree with the device's |
582 | physical state. This can happen during resume from hibernation. |
583 | |
584 | * The device might need to be reset. |
585 | |
586 | * Even though the device was suspended, if its usage counter was > 0 then most |
587 | likely it would need a runtime resume in the near future anyway. |
588 | |
589 | If the device had been suspended before the system suspend began and it's |
590 | brought back to full power during resume, then its runtime PM status will have |
591 | to be updated to reflect the actual post-system sleep status. The way to do |
592 | this is: |
593 | |
594 | pm_runtime_disable(dev); |
595 | pm_runtime_set_active(dev); |
596 | pm_runtime_enable(dev); |
597 | |
598 | The PM core always increments the runtime usage counter before calling the |
599 | ->suspend() callback and decrements it after calling the ->resume() callback. |
600 | Hence disabling runtime PM temporarily like this will not cause any runtime |
601 | suspend attempts to be permanently lost. If the usage count goes to zero |
602 | following the return of the ->resume() callback, the ->runtime_idle() callback |
603 | will be invoked as usual. |
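
As a rough illustration, the three calls above would normally sit at the end
of a driver's system resume callback.  The sketch below is a user-space model,
not kernel code: 'struct device' is a two-field stand-in, and the three
pm_runtime_*() functions are simplified stubs that only mimic the bookkeeping.
It assumes that pm_runtime_set_active() refuses to change the status while
runtime PM is enabled, which is why the disable/enable pair is needed.
foo_resume() is a hypothetical driver callback.

```c
#include <errno.h>

/* Toy stand-ins for kernel types; only the fields this sketch needs. */
enum rpm_status { RPM_ACTIVE, RPM_SUSPENDED };

struct device {
	int disable_depth;       /* > 0 means runtime PM is disabled */
	enum rpm_status status;  /* runtime PM status as seen by the PM core */
};

/* Simplified stubs modelling the kernel helpers of the same names. */
static void pm_runtime_disable(struct device *dev)
{
	dev->disable_depth++;
}

static void pm_runtime_enable(struct device *dev)
{
	if (dev->disable_depth > 0)
		dev->disable_depth--;
}

static int pm_runtime_set_active(struct device *dev)
{
	/* Assumed rule: the status may only be forced while disabled. */
	if (dev->disable_depth == 0)
		return -EAGAIN;
	dev->status = RPM_ACTIVE;
	return 0;
}

/* Hypothetical system resume callback of a driver "foo". */
int foo_resume(struct device *dev)
{
	int ret;

	/* ... bring the hardware back to full power here ... */

	pm_runtime_disable(dev);
	ret = pm_runtime_set_active(dev);
	pm_runtime_enable(dev);
	return ret;
}
```

In this model, calling pm_runtime_set_active() on its own while runtime PM is
enabled fails, whereas foo_resume() leaves the device marked active with
runtime PM enabled again.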
604 | |
605 | On some systems, however, system sleep is not entered through a global firmware |
606 | or hardware operation. Instead, all hardware components are put into low-power |
607 | states directly by the kernel in a coordinated way. Then, the system sleep |
608 | state effectively follows from the states the hardware components end up in |
609 | and the system is woken up from that state by a hardware interrupt or a similar |
610 | mechanism entirely under the kernel's control. As a result, the kernel never |
611 | gives control away and the states of all devices during resume are precisely |
612 | known to it. If that is the case and none of the situations listed above takes |
613 | place (in particular, if the system is not waking up from hibernation), it may |
614 | be more efficient to leave the devices that had been suspended before the system |
615 | suspend began in the suspended state. |
616 | |
617 | The PM core does its best to reduce the probability of race conditions between |
618 | the runtime PM and system suspend/resume (and hibernation) callbacks by carrying |
619 | out the following operations: |
620 | |
621 | * During system suspend it calls pm_runtime_get_noresume() and |
622 | pm_runtime_barrier() for every device right before executing the |
623 | subsystem-level ->suspend() callback for it. In addition to that it calls |
624 | pm_runtime_disable() for every device right after executing the |
625 | subsystem-level ->suspend() callback for it. |
626 | |
627 | * During system resume it calls pm_runtime_enable() and pm_runtime_put_sync() |
628 | for every device right before and right after executing the subsystem-level |
629 | ->resume() callback for it, respectively. |
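
The bracketing described in the two items above can be traced with a small
user-space model.  Everything in it is an assumption made for illustration:
'struct device' holds just three counters, core_system_suspend() and
core_system_resume() are hypothetical names for the PM core's per-device
steps, and the runtime PM helpers are reduced to the counter updates noted in
the comments.

```c
/* Toy model of the per-device bookkeeping; not the kernel's struct device. */
struct device {
	int usage_count;    /* runtime PM usage counter */
	int disable_depth;  /* > 0 means runtime PM is disabled */
	int idle_calls;     /* how often the idle check ran (model only) */
};

/* Stand-ins for the subsystem-level system sleep callbacks. */
static int subsys_suspend(struct device *dev) { (void)dev; return 0; }
static int subsys_resume(struct device *dev)  { (void)dev; return 0; }

/* Hypothetical: what happens around the suspend callback for one device. */
int core_system_suspend(struct device *dev)
{
	int ret;

	dev->usage_count++;      /* pm_runtime_get_noresume() */
	/* pm_runtime_barrier() would flush pending runtime PM requests here */
	ret = subsys_suspend(dev);
	dev->disable_depth++;    /* pm_runtime_disable() */
	return ret;
}

/* Hypothetical: what happens around the resume callback for one device. */
int core_system_resume(struct device *dev)
{
	int ret;

	dev->disable_depth--;    /* pm_runtime_enable() */
	ret = subsys_resume(dev);
	if (--dev->usage_count == 0)  /* pm_runtime_put_sync() */
		dev->idle_calls++;    /* the idle check runs as usual */
	return ret;
}
```

After one suspend/resume cycle both counters are back at their starting
values and the idle check has run once, so no runtime suspend opportunity is
lost.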
630 | |
631 | 7. Generic subsystem callbacks |
632 | |
633 | Subsystems may wish to conserve code space by using the set of generic power |
634 | management callbacks provided by the PM core, defined in |
635 | drivers/base/power/generic_ops.c: |
636 | |
637 | int pm_generic_runtime_idle(struct device *dev); |
638 | - invoke the ->runtime_idle() callback provided by the driver of this |
639 | device, if defined, and call pm_runtime_suspend() for this device if the |
640 | return value is 0 or the callback is not defined |
641 | |
642 | int pm_generic_runtime_suspend(struct device *dev); |
643 | - invoke the ->runtime_suspend() callback provided by the driver of this |
644 | device and return its result, or return -EINVAL if not defined |
645 | |
646 | int pm_generic_runtime_resume(struct device *dev); |
647 | - invoke the ->runtime_resume() callback provided by the driver of this |
648 | device and return its result, or return -EINVAL if not defined |
649 | |
650 | int pm_generic_suspend(struct device *dev); |
651 | - if the device has not been suspended at run time, invoke the ->suspend() |
652 | callback provided by its driver and return its result, or return 0 if not |
653 | defined |
654 | |
655 | int pm_generic_suspend_noirq(struct device *dev); |
656 | - if pm_runtime_suspended(dev) returns "false", invoke the ->suspend_noirq() |
657 | callback provided by the device's driver and return its result, or return |
658 | 0 if not defined |
659 | |
660 | int pm_generic_resume(struct device *dev); |
661 | - invoke the ->resume() callback provided by the driver of this device and, |
662 | if successful, change the device's runtime PM status to 'active' |
663 | |
664 | int pm_generic_resume_noirq(struct device *dev); |
665 | - invoke the ->resume_noirq() callback provided by the driver of this device |
666 | |
667 | int pm_generic_freeze(struct device *dev); |
668 | - if the device has not been suspended at run time, invoke the ->freeze() |
669 | callback provided by its driver and return its result, or return 0 if not |
670 | defined |
671 | |
672 | int pm_generic_freeze_noirq(struct device *dev); |
673 | - if pm_runtime_suspended(dev) returns "false", invoke the ->freeze_noirq() |
674 | callback provided by the device's driver and return its result, or return |
675 | 0 if not defined |
676 | |
677 | int pm_generic_thaw(struct device *dev); |
678 | - if the device has not been suspended at run time, invoke the ->thaw() |
679 | callback provided by its driver and return its result, or return 0 if not |
680 | defined |
681 | |
682 | int pm_generic_thaw_noirq(struct device *dev); |
683 | - if pm_runtime_suspended(dev) returns "false", invoke the ->thaw_noirq() |
684 | callback provided by the device's driver and return its result, or return |
685 | 0 if not defined |
686 | |
687 | int pm_generic_poweroff(struct device *dev); |
688 | - if the device has not been suspended at run time, invoke the ->poweroff() |
689 | callback provided by its driver and return its result, or return 0 if not |
690 | defined |
691 | |
692 | int pm_generic_poweroff_noirq(struct device *dev); |
693 | - if pm_runtime_suspended(dev) returns "false", run the ->poweroff_noirq() |
694 | callback provided by the device's driver and return its result, or return |
695 | 0 if not defined |
696 | |
697 | int pm_generic_restore(struct device *dev); |
698 | - invoke the ->restore() callback provided by the driver of this device and, |
699 | if successful, change the device's runtime PM status to 'active' |
700 | |
701 | int pm_generic_restore_noirq(struct device *dev); |
702 | - invoke the ->restore_noirq() callback provided by the device's driver |
703 | |
704 | These functions can be assigned to the ->runtime_idle(), ->runtime_suspend(), |
705 | ->runtime_resume(), ->suspend(), ->suspend_noirq(), ->resume(), |
706 | ->resume_noirq(), ->freeze(), ->freeze_noirq(), ->thaw(), ->thaw_noirq(), |
707 | ->poweroff(), ->poweroff_noirq(), ->restore(), ->restore_noirq() callback |
708 | pointers in the subsystem-level dev_pm_ops structures. |
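
The generic callbacks share one pattern: forward to the driver's callback if
it exists, otherwise return a fixed value.  The user-space sketch below models
the behaviour described for pm_generic_runtime_suspend().  The structures are
cut-down stand-ins for the kernel's, generic_runtime_suspend_model() is an
approximation written for this example rather than a copy of generic_ops.c,
and foo_runtime_suspend() is a hypothetical driver callback.

```c
#include <errno.h>
#include <stddef.h>

struct device;

/* Cut-down stand-ins for the kernel structures of the same names. */
struct dev_pm_ops {
	int (*runtime_suspend)(struct device *dev);
};

struct device_driver {
	const struct dev_pm_ops *pm;
};

struct device {
	struct device_driver *driver;
	int suspended;  /* model-only flag set by the example driver */
};

/* Model of the behaviour described for pm_generic_runtime_suspend(). */
int generic_runtime_suspend_model(struct device *dev)
{
	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;

	if (pm && pm->runtime_suspend)
		return pm->runtime_suspend(dev);
	return -EINVAL;  /* no driver callback defined */
}

/* Hypothetical driver callback. */
int foo_runtime_suspend(struct device *dev)
{
	dev->suspended = 1;
	return 0;
}
```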
709 | |
710 | If a subsystem wishes to use all of them at the same time, it can simply assign |
711 | the GENERIC_SUBSYS_PM_OPS macro, defined in include/linux/pm.h, to its |
712 | dev_pm_ops structure pointer. |
713 | |
714 | Device drivers that wish to use the same function as a system suspend, freeze, |
715 | poweroff and runtime suspend callback, and similarly for system resume, thaw, |
716 | restore, and runtime resume, can achieve this with the help of the |
717 | UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its |
718 | last argument to NULL). |
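
A typical use is the last line of the sketch below.  To keep the fragment
compilable in user space, 'struct dev_pm_ops' and the macro are re-created
here in reduced form.  The macro name and its argument order (name, suspend
function, resume function, idle function) follow include/linux/pm.h, but the
expansion shown is only an approximation.  foo_suspend() and foo_resume() are
hypothetical driver routines.

```c
#include <stddef.h>

struct device { int powered; };

/* Reduced stand-in for the kernel's struct dev_pm_ops. */
struct dev_pm_ops {
	int (*suspend)(struct device *dev);
	int (*resume)(struct device *dev);
	int (*freeze)(struct device *dev);
	int (*thaw)(struct device *dev);
	int (*poweroff)(struct device *dev);
	int (*restore)(struct device *dev);
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	int (*runtime_idle)(struct device *dev);
};

/*
 * Approximation of the macro: one suspend routine and one resume routine
 * serve both the system sleep and the runtime PM callbacks.
 */
#define UNIVERSAL_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \
const struct dev_pm_ops name = { \
	.suspend = suspend_fn, .resume = resume_fn, \
	.freeze = suspend_fn, .thaw = resume_fn, \
	.poweroff = suspend_fn, .restore = resume_fn, \
	.runtime_suspend = suspend_fn, .runtime_resume = resume_fn, \
	.runtime_idle = idle_fn, \
}

/* Hypothetical driver routines. */
static int foo_suspend(struct device *dev) { dev->powered = 0; return 0; }
static int foo_resume(struct device *dev)  { dev->powered = 1; return 0; }

/* What a driver would actually write. */
UNIVERSAL_DEV_PM_OPS(foo_pm_ops, foo_suspend, foo_resume, NULL);
```

The driver then points the 'pm' member of its driver structure at foo_pm_ops.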
719 | |
720 | 8. "No-Callback" Devices |
721 | |
722 | Some "devices" are only logical sub-devices of their parent and cannot be |
723 | power-managed on their own. (The prototype example is a USB interface. Entire |
724 | USB devices can go into low-power mode or send wake-up requests, but neither is |
725 | possible for individual interfaces.) The drivers for these devices have no |
726 | need of runtime PM callbacks; if the callbacks did exist, ->runtime_suspend() |
727 | and ->runtime_resume() would always return 0 without doing anything else and |
728 | ->runtime_idle() would always call pm_runtime_suspend(). |
729 | |
730 | Subsystems can tell the PM core about these devices by calling |
731 | pm_runtime_no_callbacks(). This should be done after the device structure is |
732 | initialized and before it is registered (although after device registration is |
733 | also okay). The routine will set the device's power.no_callbacks flag and |
734 | prevent the non-debugging runtime PM sysfs attributes from being created. |
735 | |
736 | When power.no_callbacks is set, the PM core will not invoke the |
737 | ->runtime_idle(), ->runtime_suspend(), or ->runtime_resume() callbacks. |
738 | Instead it will assume that suspends and resumes always succeed and that idle |
739 | devices should be suspended. |
740 | |
741 | As a consequence, the PM core will never directly inform the device's subsystem |
742 | or driver about runtime power changes. Instead, the driver for the device's |
743 | parent must take responsibility for telling the device's driver when the |
744 | parent's power state changes. |
745 | |
746 | 9. Autosuspend, or automatically-delayed suspends |
747 | |
748 | Changing a device's power state isn't free; it requires both time and energy. |
749 | A device should be put in a low-power state only when there's some reason to |
750 | think it will remain in that state for a substantial time. A common heuristic |
751 | says that a device which hasn't been used for a while is liable to remain |
752 | unused; following this advice, drivers should not allow devices to be suspended |
753 | at runtime until they have been inactive for some minimum period. Even when |
754 | the heuristic ends up being non-optimal, it will still prevent devices from |
755 | "bouncing" too rapidly between low-power and full-power states. |
756 | |
757 | The term "autosuspend" is an historical remnant. It doesn't mean that the |
758 | device is automatically suspended (the subsystem or driver still has to call |
759 | the appropriate PM routines); rather it means that runtime suspends will |
760 | automatically be delayed until the desired period of inactivity has elapsed. |
761 | |
762 | Inactivity is determined based on the power.last_busy field. Drivers should |
763 | call pm_runtime_mark_last_busy() to update this field after carrying out I/O, |
764 | typically just before calling pm_runtime_put_autosuspend(). The desired length |
765 | of the inactivity period is a matter of policy. Subsystems can set this length |
766 | initially by calling pm_runtime_set_autosuspend_delay(), but after device |
767 | registration the length should be controlled by user space, using the |
768 | /sys/devices/.../power/autosuspend_delay_ms attribute. |
769 | |
770 | In order to use autosuspend, subsystems or drivers must call |
771 | pm_runtime_use_autosuspend() (preferably before registering the device), and |
772 | thereafter they should use the various *_autosuspend() helper functions instead |
773 | of the non-autosuspend counterparts: |
774 | |
775 | 	Instead of: pm_runtime_suspend    use: pm_runtime_autosuspend; |
776 | 	Instead of: pm_schedule_suspend   use: pm_request_autosuspend; |
777 | 	Instead of: pm_runtime_put        use: pm_runtime_put_autosuspend; |
778 | 	Instead of: pm_runtime_put_sync   use: pm_runtime_put_sync_autosuspend. |
779 | |
780 | Drivers may also continue to use the non-autosuspend helper functions; they |
781 | will behave normally, not taking the autosuspend delay into account. |
782 | Similarly, if the power.use_autosuspend field isn't set then the autosuspend |
783 | helper functions will behave just like the non-autosuspend counterparts. |
784 | |
785 | The implementation is well suited for asynchronous use in interrupt contexts. |
786 | However, such use inevitably involves races, because the PM core can't |
787 | synchronize ->runtime_suspend() callbacks with the arrival of I/O requests. |
788 | This synchronization must be handled by the driver, using its private lock. |
789 | Here is a schematic pseudo-code example: |
790 | |
791 | 	foo_read_or_write(struct foo_priv *foo, void *data) |
792 | 	{ |
793 | 		lock(&foo->private_lock); |
794 | 		add_request_to_io_queue(foo, data); |
795 | 		if (foo->num_pending_requests++ == 0) |
796 | 			pm_runtime_get(&foo->dev); |
797 | 		if (!foo->is_suspended) |
798 | 			foo_process_next_request(foo); |
799 | 		unlock(&foo->private_lock); |
800 | 	} |
801 | |
802 | 	foo_io_completion(struct foo_priv *foo, void *req) |
803 | 	{ |
804 | 		lock(&foo->private_lock); |
805 | 		if (--foo->num_pending_requests == 0) { |
806 | 			pm_runtime_mark_last_busy(&foo->dev); |
807 | 			pm_runtime_put_autosuspend(&foo->dev); |
808 | 		} else { |
809 | 			foo_process_next_request(foo); |
810 | 		} |
811 | 		unlock(&foo->private_lock); |
812 | 		/* Send req result back to the user ... */ |
813 | 	} |
814 | |
815 | 	int foo_runtime_suspend(struct device *dev) |
816 | 	{ |
817 | 		struct foo_priv *foo = container_of(dev, ...); |
818 | 		int ret = 0; |
819 | |
820 | 		lock(&foo->private_lock); |
821 | 		if (foo->num_pending_requests > 0) { |
822 | 			ret = -EBUSY; |
823 | 		} else { |
824 | 			/* ... suspend the device ... */ |
825 | 			foo->is_suspended = 1; |
826 | 		} |
827 | 		unlock(&foo->private_lock); |
828 | 		return ret; |
829 | 	} |
830 | |
831 | 	int foo_runtime_resume(struct device *dev) |
832 | 	{ |
833 | 		struct foo_priv *foo = container_of(dev, ...); |
834 | |
835 | 		lock(&foo->private_lock); |
836 | 		/* ... resume the device ... */ |
837 | 		foo->is_suspended = 0; |
838 | 		pm_runtime_mark_last_busy(&foo->dev); |
839 | 		if (foo->num_pending_requests > 0) |
840 | 			foo_process_next_request(foo); |
841 | 		unlock(&foo->private_lock); |
842 | 		return 0; |
843 | 	} |
844 | |
845 | The important point is that after foo_io_completion() asks for an autosuspend, |
846 | the foo_runtime_suspend() callback may race with foo_read_or_write(). |
847 | Therefore foo_runtime_suspend() has to check whether there are any pending I/O |
848 | requests (while holding the private lock) before allowing the suspend to |
849 | proceed. |
850 | |
851 | In addition, the power.autosuspend_delay field can be changed by user space at |
852 | any time. If a driver cares about this, it can call |
853 | pm_runtime_autosuspend_expiration() from within the ->runtime_suspend() |
854 | callback while holding its private lock. If the function returns a nonzero |
855 | value then the delay has not yet expired and the callback should return |
856 | -EAGAIN. |
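
The check described in the last paragraph might look as follows.  This is
again a user-space sketch under stated assumptions: time is a plain
millisecond counter (now_ms) rather than jiffies, 'struct device' is a
stand-in holding only the fields used here, and the stub of
pm_runtime_autosuspend_expiration() models nothing more than "last busy time
plus delay, or 0 if that moment has passed".  foo_runtime_suspend() is
hypothetical and leaves out the private lock for brevity.

```c
#include <errno.h>

static unsigned long now_ms;  /* model clock; the kernel uses jiffies */

/* Stand-in for the relevant runtime PM fields of struct device. */
struct device {
	unsigned long last_busy;          /* time of the last I/O activity */
	unsigned long autosuspend_delay;  /* in ms; may change at any time */
	int is_suspended;
};

/* Simplified stub: expiration time, or 0 if the delay has elapsed. */
static unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
{
	unsigned long expires = dev->last_busy + dev->autosuspend_delay;

	return expires > now_ms ? expires : 0;
}

/* Hypothetical ->runtime_suspend() callback. */
int foo_runtime_suspend(struct device *dev)
{
	if (pm_runtime_autosuspend_expiration(dev) != 0)
		return -EAGAIN;  /* delay not yet expired; try again later */

	/* ... suspend the device ... */
	dev->is_suspended = 1;
	return 0;
}
```

With last_busy at 1000 ms and a 2000 ms delay, a suspend attempt at 2500 ms is
rejected with -EAGAIN in this model, while one at 3000 ms goes through.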
857 |