The padata parallel execution mechanism
Last updated for 2.6.36

Padata is a mechanism by which the kernel can farm work out to be done in
parallel on multiple CPUs while retaining the ordering of tasks. It was
developed for use with the IPsec code, which needs to be able to perform
encryption and decryption on large numbers of packets without reordering
those packets. The crypto developers made a point of writing padata in a
sufficiently general fashion that it could be put to other uses as well.

The first step in using padata is to set up a padata_instance structure for
overall control of how tasks are to be run:

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(struct workqueue_struct *wq,
                                         const struct cpumask *pcpumask,
                                         const struct cpumask *cbcpumask);

The pcpumask describes which processors will be used to execute work
submitted to this instance in parallel. The cbcpumask defines which
processors are allowed to be used as the serialization callback processor.
The workqueue wq is where the work will actually be done; it should be
a multithreaded queue, naturally.

To allocate a padata instance with the cpu_possible_mask for both
cpumasks, this helper function can be used:

    struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq);

Note: Padata maintains two kinds of cpumasks internally: the user-supplied
cpumasks, passed in through padata_alloc()/padata_alloc_possible(), and the
'usable' cpumasks. The usable cpumasks are always the subset of active CPUs
in the user-supplied cpumasks; these are the cpumasks padata actually uses.
It is therefore legal to supply a cpumask to padata that contains offline
CPUs. Once an offline CPU in the user-supplied cpumask comes online, padata
will start using it.

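The relationship between the user-supplied and the usable cpumasks can be
illustrated with a small userspace model. This is only a sketch, not kernel
code: it assumes a CPU mask fits in one unsigned long, and model_cpu_bit()
and model_usable_mask() are hypothetical names invented for the example. The
kernel itself works on struct cpumask with its own helpers.

```c
#include <assert.h>
#include <limits.h>

/* Number of CPUs this toy model can describe (hypothetical limit). */
#define MODEL_NR_CPUS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Return a mask with only the bit for 'cpu' set. */
static unsigned long model_cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return 1UL << cpu;
}

/*
 * The usable mask is the subset of the user-supplied mask whose CPUs are
 * currently active, i.e. the intersection of the two masks.
 */
static unsigned long model_usable_mask(unsigned long user_mask,
				       unsigned long active_mask)
{
	return user_mask & active_mask;
}
```

In this model, a user mask naming CPUs 0, 1 and 3 yields a usable mask of
CPUs 0 and 1 while CPU 3 is offline; once CPU 3 appears in the active mask,
it appears in the usable mask as well.
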
There are functions for enabling and disabling the instance:

    int padata_start(struct padata_instance *pinst);
    void padata_stop(struct padata_instance *pinst);

These functions set or clear the "PADATA_INIT" flag; if that flag is not
set, other functions will refuse to work. padata_start() returns zero on
success (flag set) or -EINVAL if the padata cpumask contains no active CPU
(flag not set). padata_stop() clears the flag and blocks until the padata
instance is unused.

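The flag handling described above can be modelled in userspace as follows.
This is a sketch under stated assumptions, not the padata implementation:
struct model_instance, MODEL_INIT and the model_*() functions are invented
for illustration, CPU masks are reduced to a single unsigned long, and the
blocking behaviour of padata_stop() is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_INIT 1u	/* stands in for the "PADATA_INIT" flag */

struct model_instance {
	unsigned int flags;
	unsigned long user_mask;	/* CPUs requested by the user */
	unsigned long active_mask;	/* CPUs currently active */
};

/* Set the flag, or fail with -EINVAL when no requested CPU is active. */
static int model_start(struct model_instance *inst)
{
	assert(inst != NULL);
	if ((inst->user_mask & inst->active_mask) == 0)
		return -EINVAL;
	inst->flags |= MODEL_INIT;
	return 0;
}

/* Clear the flag; waiting for the instance to become unused is omitted. */
static void model_stop(struct model_instance *inst)
{
	assert(inst != NULL);
	inst->flags &= ~MODEL_INIT;
}

/* Other entry points would check this before doing any work. */
static int model_is_running(const struct model_instance *inst)
{
	assert(inst != NULL);
	return (inst->flags & MODEL_INIT) != 0;
}
```

A caller in this model would treat a non-zero return from model_start() as
"the instance is not usable yet" and try again after the set of active CPUs
changes.
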
The list of CPUs to be used can be adjusted with these functions:

    int padata_set_cpumasks(struct padata_instance *pinst,
                            cpumask_var_t pcpumask,
                            cpumask_var_t cbcpumask);
    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);
    int padata_add_cpu(struct padata_instance *pinst, int cpu, int mask);
    int padata_remove_cpu(struct padata_instance *pinst, int cpu, int mask);

Changing the CPU masks is an expensive operation, though, so it should not
be done with great frequency.

It's possible to change both cpumasks of a padata instance with
padata_set_cpumasks() by specifying the cpumasks for parallel execution
(pcpumask) and for the serial callback function (cbcpumask).
padata_set_cpumask() is used to change just one of the cpumasks. Here
cpumask_type is either PADATA_CPU_SERIAL or PADATA_CPU_PARALLEL, and cpumask
specifies the new cpumask to use. To simply add or remove one CPU from a
certain cpumask, use padata_add_cpu() or padata_remove_cpu(). cpu specifies
the CPU to add or remove, and mask is either PADATA_CPU_SERIAL or
PADATA_CPU_PARALLEL.

Users interested in padata cpumask changes can register with the padata
cpumask change notifier:

    int padata_register_cpumask_notifier(struct padata_instance *pinst,
                                         struct notifier_block *nblock);

To unregister from that notifier:

    int padata_unregister_cpumask_notifier(struct padata_instance *pinst,
                                           struct notifier_block *nblock);

The padata cpumask change notifier notifies about changes of the usable
cpumasks, i.e. the subset of active CPUs in the user supplied cpumask.

Padata calls the notifier chain with:

    blocking_notifier_call_chain(&pinst->cpumask_change_notifier,
                                 notification_mask,
                                 &pd_new->cpumask);

Here cpumask_change_notifier is the registered notifier chain,
notification_mask is either PADATA_CPU_SERIAL or PADATA_CPU_PARALLEL, and
cpumask is a pointer to a struct padata_cpumask that contains the new
cpumask information.

Actually submitting work to the padata instance requires the creation of a
padata_priv structure:

    struct padata_priv {
        /* Other stuff here... */
        void (*parallel)(struct padata_priv *padata);
        void (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done. Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided. Those functions will
be called in the process of getting the work done as we will see
momentarily.

The submission of work is done with:

    int padata_do_parallel(struct padata_instance *pinst,
                           struct padata_priv *padata, int cb_cpu);

The pinst and padata structures must be set up as described above; cb_cpu
specifies which CPU will be used for the final callback when the work is
done; it must be in the current instance's CPU mask. The return value from
padata_do_parallel() is zero on success, indicating that the work is in
progress. -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being
in that CPU mask or about the instance not running.

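The return values listed above can be summarised in a small userspace model.
It is a sketch only: struct model_instance and model_do_parallel() are
hypothetical, the CPU mask is reduced to one unsigned long, and the order in
which the conditions are tested is an assumption, not something taken from
the padata source.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

struct model_instance {
	int running;		/* nonzero once the instance has been started */
	int mask_changing;	/* nonzero while the CPU masks are being modified */
	unsigned long cb_mask;	/* CPUs allowed to run the serial callback */
};

/* Mimic the documented result codes of a submission attempt. */
static int model_do_parallel(const struct model_instance *inst, int cb_cpu)
{
	assert(inst != NULL);

	if (!inst->running)
		return -EINVAL;		/* instance is not running */
	if (cb_cpu < 0 || cb_cpu >= (int)(sizeof(unsigned long) * CHAR_BIT))
		return -EINVAL;		/* CPU number outside the model's range */
	if (!(inst->cb_mask & (1UL << cb_cpu)))
		return -EINVAL;		/* cb_cpu is not in the CPU mask */
	if (inst->mask_changing)
		return -EBUSY;		/* the CPU mask is being changed */
	return 0;			/* work accepted and in progress */
}
```

A caller would typically retry later on -EBUSY and treat -EINVAL as a setup
error that needs a different cb_cpu or a started instance.
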
Each task submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple tasks. Despite the
fact that the workqueue is used to make these calls, parallel() is run with
software interrupts disabled and thus cannot sleep. The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.

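The container_of() pattern mentioned above might look like the following
userspace sketch. The struct padata_priv here is a cut-down stand-in with
only the two callbacks, the container_of() macro is a simplified version of
the kernel's, and struct my_job and my_parallel() are hypothetical names
chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct padata_priv (private fields omitted). */
struct padata_priv {
	void (*parallel)(struct padata_priv *padata);
	void (*serial)(struct padata_priv *padata);
};

/* Simplified userspace version of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical job: the work-specific data wraps the padata_priv. */
struct my_job {
	int input;
	int result;
	struct padata_priv padata;
};

/* parallel() receives only the padata_priv pointer and recovers the job. */
static void my_parallel(struct padata_priv *padata)
{
	struct my_job *job = container_of(padata, struct my_job, padata);

	job->result = job->input * 2;	/* placeholder for the real work */
}
```

In kernel code both struct padata_priv and container_of() come from the
kernel headers; they are redefined here only so that the sketch compiles on
its own.
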
Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the task from this point. The work
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes. When a task does complete, parallel() (or
whatever function actually finishes the job) should inform padata of the
fact with a call to:

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure. That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
done through the workqueue, but with local software interrupts disabled.
Note that this call may be deferred for a while since the padata code takes
pains to ensure that tasks are completed in the order in which they were
submitted.

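The ordering guarantee, and the reason a serial() call may be deferred, can
be illustrated with a toy reorder queue in userspace. This is not how padata
is implemented; struct model_queue, model_submit() and model_complete() are
hypothetical, and the sketch only shows that a finished task is held back
until every task submitted before it has finished too.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_JOBS 8	/* hypothetical capacity of the toy queue */

struct model_queue {
	int done[MODEL_MAX_JOBS];	/* done[seq] is nonzero once job seq finished */
	unsigned int submitted;		/* number of jobs submitted so far */
	unsigned int next_serial;	/* sequence number whose serial() runs next */
};

/* Submit a job and return its sequence number, or -1 if the queue is full. */
static int model_submit(struct model_queue *q)
{
	assert(q != NULL);
	if (q->submitted >= MODEL_MAX_JOBS)
		return -1;
	q->done[q->submitted] = 0;
	return (int)q->submitted++;
}

/*
 * Mark job 'seq' as finished, then release every job that is now at the
 * head of the queue. Released sequence numbers are written to 'out', which
 * must have room for MODEL_MAX_JOBS entries; the return value is the number
 * of jobs released.
 */
static size_t model_complete(struct model_queue *q, int seq, int *out)
{
	size_t n = 0;

	assert(q != NULL && out != NULL);
	assert(seq >= 0 && (unsigned int)seq < q->submitted);
	q->done[seq] = 1;

	while (q->next_serial < q->submitted && q->done[q->next_serial])
		out[n++] = (int)q->next_serial++;
	return n;
}
```

With three jobs submitted, completing job 2 first releases nothing;
completing job 0 releases job 0 alone; completing job 1 then releases jobs 1
and 2 together, in that order.
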
The one remaining function in the padata API should be called to clean up
when a padata instance is no longer needed:

    void padata_free(struct padata_instance *pinst);

This function will busy-wait while any remaining tasks are completed, so it
might be best not to call it while there is work outstanding. Shutting
down the workqueue, if necessary, should be done separately.