Miscellaneous Device control operations for the autofs4 kernel module
====================================================================

The problem
===========

There is a problem with active restarts in autofs (that is to say
restarting autofs when there are busy mounts).

During normal operation autofs uses a file descriptor opened on the
directory that is being managed in order to be able to issue control
operations. Using a file descriptor gives ioctl operations access to
autofs specific information stored in the super block. The operations
are things such as setting an autofs mount catatonic, setting the
expire timeout and requesting expire checks. As is explained below,
certain types of autofs triggered mounts can end up covering an autofs
mount itself which prevents us being able to use open(2) to obtain a
file descriptor for these operations if we don't already have one open.

Currently autofs uses "umount -l" (lazy umount) to clear active mounts
at restart. While using lazy umount works for most cases, anything that
needs to walk back up the mount tree to construct a path, such as
getcwd(2) and the proc file system /proc/<pid>/cwd, no longer works
because the point from which the path is constructed has been detached
from the mount tree.

The actual problem with autofs is that it can't reconnect to existing
mounts. One immediately thinks that just adding the ability to remount
autofs file systems would solve it, but alas, that can't work. This is
because autofs direct mounts and the implementation of "on demand mount
and expire" of nested mount trees have the file system mounted directly
on top of the mount trigger directory dentry.

For example, there are two types of automount maps, direct (in the kernel
module source you will see a third type called an offset, which is just
a direct mount in disguise) and indirect.

Here is a master map with direct and indirect map entries:

/-      /etc/auto.direct
/test   /etc/auto.indirect

and the corresponding map files:

/etc/auto.direct:

/automount/dparse/g6  budgie:/autofs/export1
/automount/dparse/g1  shark:/autofs/export1
and so on.

/etc/auto.indirect:

g1    shark:/autofs/export1
g6    budgie:/autofs/export1
and so on.

For the above indirect map an autofs file system is mounted on /test and
mounts are triggered for each sub-directory key by the inode lookup
operation. So we see a mount of shark:/autofs/export1 on /test/g1, for
example.

The way that direct mounts are handled is by making an autofs mount on
each full path, such as /automount/dparse/g1, and using it as a mount
trigger. So when we walk on the path we mount shark:/autofs/export1 "on
top of this mount point". Since these are always directories we can
use the follow_link inode operation to trigger the mount.
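
The distinction between the two master map entry types can be sketched in
a few lines of C. This is only an illustration, not the daemon's parser:
the names map_kind, master_entry and parse_master_line are hypothetical,
and the sketch assumes a line holds just a mount point and a map name
(no options or comments).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names for this sketch; they are not autofs identifiers. */
enum map_kind { MAP_INVALID, MAP_DIRECT, MAP_INDIRECT };

struct master_entry {
    char mount_point[256];  /* "/-" or the indirect mount directory */
    char map_name[256];     /* map file holding the keys */
};

/*
 * Split one master map line of the form "<mount point> <map>" and
 * classify it: "/-" marks a direct map, any other absolute path an
 * indirect map.
 */
static enum map_kind parse_master_line(const char *line,
                                       struct master_entry *out)
{
    if (sscanf(line, "%255s %255s", out->mount_point, out->map_name) != 2)
        return MAP_INVALID;
    if (out->mount_point[0] != '/')
        return MAP_INVALID;
    if (strcmp(out->mount_point, "/-") == 0)
        return MAP_DIRECT;
    return MAP_INDIRECT;
}
```

With the master map above, "/- /etc/auto.direct" classifies as a direct
map and "/test /etc/auto.indirect" as an indirect map mounted on /test.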

But, each entry in direct and indirect maps can have offsets (making
them multi-mount map entries).

For example, an indirect mount map entry could also be:

g1 \
   /        shark:/autofs/export5/testing/test \
   /s1      shark:/autofs/export/testing/test/s1 \
   /s2      shark:/autofs/export5/testing/test/s2 \
   /s1/ss1  shark:/autofs/export1 \
   /s2/ss2  shark:/autofs/export2

and similarly a direct mount map entry could also be:

/automount/dparse/g1 \
    /       shark:/autofs/export5/testing/test \
    /s1     shark:/autofs/export/testing/test/s1 \
    /s2     shark:/autofs/export5/testing/test/s2 \
    /s1/ss1 shark:/autofs/export2 \
    /s2/ss2 shark:/autofs/export2

One of the issues with version 4 of autofs was that, when mounting an
entry with a large number of offsets, possibly with nesting, we needed
to mount and umount all of the offsets as a single unit. Not really a
problem, except for people with a large number of offsets in map entries.
This mechanism is used for the well known "hosts" map and we have seen
cases (in 2.4) where the available number of mounts is exhausted or
where the number of privileged ports available is exhausted.

In version 5 we mount only as we go down the tree of offsets and
similarly for expiring them which resolves the above problem. There is
somewhat more detail to the implementation but it isn't needed for the
sake of the problem explanation. The one important detail is that these
offsets are implemented using the same mechanism as the direct mounts
above and so the mount points can be covered by a mount.

The current autofs implementation uses an ioctl file descriptor opened
on the mount point for control operations. The references held by the
descriptor are accounted for in checks made to determine if a mount is
in use, and the descriptor is also used to access autofs file system
information held in the mount super block. So the use of a file handle
needs to be retained.


The Solution
============

To be able to restart autofs leaving existing direct, indirect and
offset mounts in place we need to be able to obtain a file handle
for these potentially covered autofs mount points. Rather than just
implement an isolated operation it was decided to re-implement the
existing ioctl interface and add new operations to provide this
functionality.

In addition, to be able to reconstruct a mount tree that has busy mounts,
the uid and gid of the last user that triggered the mount need to be
available because these can be used as macro substitution variables in
autofs maps. They are recorded at mount request time and an operation
has been added to retrieve them.

Since we're re-implementing the control interface, a couple of other
problems with the existing interface have been addressed. First, when
a mount or expire operation completes a status is returned to the
kernel by either a "send ready" or a "send fail" operation. The
"send fail" operation of the ioctl interface could only ever send
ENOENT so the re-implementation allows user space to send an actual
status. Another expensive operation in user space, for those using
very large maps, is discovering if a mount is present. Usually this
involves scanning /proc/mounts and since it needs to be done quite
often it can introduce significant overhead when there are many entries
in the mount table. An operation to look up the mount status of a mount
point dentry (covered or not) has also been added.

Current kernel development policy recommends avoiding the use of the
ioctl mechanism in favor of systems such as Netlink. An implementation
using this system was attempted to evaluate its suitability and it was
found to be inadequate, in this case. The Generic Netlink system was
used for this as raw Netlink would lead to a significant increase in
complexity. There's no question that the Generic Netlink system is an
elegant solution for common case ioctl functions but it's not a complete
replacement probably because its primary purpose in life is to be a
message bus implementation rather than specifically an ioctl replacement.

While it would be possible to work around this there is one concern
that led to the decision to not use it. This is that the autofs
expire in the daemon has become far too complex because umount
candidates are enumerated, almost for no other reason than to "count"
the number of times to call the expire ioctl. This involves scanning
the mount table which has proved to be a big overhead for users with
large maps. The best way to improve this is to try to get back to the
way the expire was done long ago. That is, when an expire request is
issued for a mount (file handle) we should continually call back to
the daemon until we can't umount any more mounts, then return the
appropriate status to the daemon. At the moment we just expire one
mount at a time. A Generic Netlink implementation would exclude this
possibility for future development due to the requirements of the
message bus architecture.


autofs4 Miscellaneous Device mount control interface
====================================================

The control interface is accessed by opening a device node, typically
/dev/autofs.

All the ioctls use a common structure to pass the needed parameter
information and return operation results:

struct autofs_dev_ioctl {
        __u32 ver_major;
        __u32 ver_minor;
        __u32 size;             /* total size of data passed in
                                 * including this struct */
        __s32 ioctlfd;          /* automount command fd */

        __u32 arg1;             /* Command parameters */
        __u32 arg2;

        char path[0];
};

The ioctlfd field is a mount point file descriptor of an autofs mount
point. It is returned by the open call and is used by all calls except
the check for whether a given path is a mount point, where it may
optionally be used to check a specific mount corresponding to a given
mount point file descriptor, and when requesting the uid and gid of the
last successful mount on a directory within the autofs file system.

The fields arg1 and arg2 are used to communicate parameters and results of
calls made as described below.

The path field is used to pass a path where it is needed and the size field
is used to account for the increased structure length when translating the
structure sent from user space.

This structure can be initialized before setting specific fields by using
the void function call init_autofs_dev_ioctl(struct autofs_dev_ioctl *).
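
What the initializer sets is not spelled out in this text, so the
following is only a sketch of a plausible one. It works on a local copy
of the structure above (with <stdint.h> types and a flexible array member
in place of path[0]); the SKETCH_VERSION_* values are assumptions made
for illustration, the real ones come from the kernel header.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values for illustration only. */
#define SKETCH_VERSION_MAJOR 1
#define SKETCH_VERSION_MINOR 0

/* Local copy of the layout shown above. */
struct autofs_dev_ioctl {
    uint32_t ver_major;
    uint32_t ver_minor;
    uint32_t size;      /* total size passed in, including this struct */
    int32_t  ioctlfd;   /* automount command fd */
    uint32_t arg1;      /* command parameters */
    uint32_t arg2;
    char     path[];    /* optional path follows the fixed part */
};

/* Bring the fixed part of the structure to a known state. */
static void init_autofs_dev_ioctl(struct autofs_dev_ioctl *in)
{
    memset(in, 0, sizeof(*in));
    in->ver_major = SKETCH_VERSION_MAJOR;
    in->ver_minor = SKETCH_VERSION_MINOR;
    in->size = sizeof(*in);   /* no path appended yet */
    in->ioctlfd = -1;         /* no mount point descriptor yet */
}
```

Starting from a size equal to the bare structure and an ioctlfd of -1
means a caller only has to touch the fields a particular call needs.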

All of the ioctls perform a copy of this structure from user space to
kernel space and return -EINVAL if the size parameter is smaller than
the structure size itself, -ENOMEM if the kernel memory allocation fails
or -EFAULT if the copy itself fails. Other checks include a version check
of the compiled in user space version against the module version; a
mismatch results in a -EINVAL return. If the size field is greater than
the structure size then a path is assumed to be present and is checked to
ensure it begins with a "/" and is NUL terminated, otherwise -EINVAL is
returned. Following these checks, for all ioctl commands except
AUTOFS_DEV_IOCTL_VERSION_CMD, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD and
AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD the ioctlfd is validated and if it is
not a valid descriptor or doesn't correspond to an autofs mount point
an error of -EBADF, -ENOTTY or -EINVAL (not an autofs descriptor) is
returned.
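
The size, version and path checks can be modelled in user space as
below. This is a sketch of the checks as described here, not the kernel
code: the structure is a local copy, the SKETCH_VERSION_* values are
assumed, and treating anything other than an exact version match as a
mismatch is also an assumption. The copy from user space and the ioctlfd
validation are left out.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_VERSION_MAJOR 1   /* assumed module version */
#define SKETCH_VERSION_MINOR 0

/* Local copy of the parameter structure described in this document. */
struct autofs_dev_ioctl {
    uint32_t ver_major;
    uint32_t ver_minor;
    uint32_t size;
    int32_t  ioctlfd;
    uint32_t arg1;
    uint32_t arg2;
    char     path[];
};

/*
 * Model of the size, version and path checks. The caller guarantees
 * that p points at a buffer of at least p->size bytes.
 * Returns 0 if the block looks acceptable, -EINVAL otherwise.
 */
static int validate_dev_ioctl(const struct autofs_dev_ioctl *p)
{
    size_t fixed = sizeof(*p);

    if (p->size < fixed)
        return -EINVAL;

    /* Assumption: only an exact version match is accepted. */
    if (p->ver_major != SKETCH_VERSION_MAJOR ||
        p->ver_minor != SKETCH_VERSION_MINOR)
        return -EINVAL;

    if (p->size > fixed) {
        size_t path_len = p->size - fixed;

        if (p->path[0] != '/')
            return -EINVAL;
        /* A terminating NUL must appear within the declared size. */
        if (memchr(p->path, '\0', path_len) == NULL)
            return -EINVAL;
    }
    return 0;
}
```

Bounding the NUL search by the declared size keeps the check from
reading past the data that was actually passed in.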


The ioctls
==========

An example of an implementation which uses this interface can be seen
in autofs version 5.0.4 and later in file lib/dev-ioctl-lib.c of the
distribution tar available for download from kernel.org in directory
/pub/linux/daemons/autofs/v5.

The device node ioctl operations implemented by this interface are:


AUTOFS_DEV_IOCTL_VERSION
------------------------

Get the major and minor version of the autofs4 device ioctl kernel module
implementation. It requires an initialized struct autofs_dev_ioctl as an
input parameter and sets the version information in the passed in structure.
It returns 0 on success or the error -EINVAL if a version mismatch is
detected.
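
A minimal user space sketch of this call follows. It assumes the
installed kernel headers provide <linux/auto_dev-ioctl.h> with the
structure, init_autofs_dev_ioctl() and the AUTOFS_DEV_IOCTL_VERSION
request; the function name query_dev_ioctl_version is hypothetical.
Only the ver_major and ver_minor fields are used.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/auto_dev-ioctl.h>

/*
 * Ask the module behind dev_path (typically /dev/autofs) for its
 * device ioctl version. Returns 0 and fills major/minor on success,
 * -1 with errno set on failure.
 */
static int query_dev_ioctl_version(const char *dev_path,
                                   unsigned int *major, unsigned int *minor)
{
    struct autofs_dev_ioctl param;
    int devfd, ret, saved_errno;

    devfd = open(dev_path, O_RDONLY);
    if (devfd < 0)
        return -1;

    init_autofs_dev_ioctl(&param);
    ret = ioctl(devfd, AUTOFS_DEV_IOCTL_VERSION, &param);
    saved_errno = errno;        /* close() may overwrite errno */
    close(devfd);

    if (ret < 0) {
        errno = saved_errno;
        return -1;
    }
    *major = param.ver_major;
    *minor = param.ver_minor;
    return 0;
}
```

Opening the device node usually needs root privileges and the module to
be loaded, so a caller should expect -1 on an ordinary system.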


AUTOFS_DEV_IOCTL_PROTOVER_CMD and AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD
------------------------------------------------------------------

Get the major and minor version of the autofs4 protocol version understood
by the loaded module. This call requires an initialized struct
autofs_dev_ioctl with the ioctlfd field set to a valid autofs mount point
descriptor and sets the requested version number in structure field arg1.
These commands return 0 on success or one of the negative error codes if
validation fails.


AUTOFS_DEV_IOCTL_OPENMOUNT and AUTOFS_DEV_IOCTL_CLOSEMOUNT
----------------------------------------------------------

Obtain and release a file descriptor for an autofs managed mount point
path. The open call requires an initialized struct autofs_dev_ioctl with
the path field set and the size field adjusted appropriately as well
as the arg1 field set to the device number of the autofs mount. The
device number can be obtained from the mount options shown in
/proc/mounts. The close call requires an initialized struct
autofs_dev_ioctl with the ioctlfd field set to the descriptor obtained
from the open call. The release of the file descriptor can also be done
with close(2) so any open descriptors will also be closed at process exit.
The close call is included in the implemented operations largely for
completeness and to provide for a consistent user space implementation.
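
Preparing the parameter block for the open call could look like the
sketch below, which appends the path and adjusts the size field as
described. The structure is a local copy of the one in this document,
and alloc_openmount_param and the SKETCH_VERSION_* values are
hypothetical; issuing the ioctl itself is not shown.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_VERSION_MAJOR 1   /* assumed values for illustration */
#define SKETCH_VERSION_MINOR 0

/* Local copy of the parameter structure described in this document. */
struct autofs_dev_ioctl {
    uint32_t ver_major;
    uint32_t ver_minor;
    uint32_t size;
    int32_t  ioctlfd;
    uint32_t arg1;
    uint32_t arg2;
    char     path[];
};

/*
 * Allocate a parameter block for an open mount request: the fixed
 * structure followed by the NUL terminated path. The caller frees it.
 */
static struct autofs_dev_ioctl *alloc_openmount_param(const char *path,
                                                      uint32_t devid)
{
    size_t path_len = strlen(path) + 1;     /* include the NUL */
    size_t total = sizeof(struct autofs_dev_ioctl) + path_len;
    struct autofs_dev_ioctl *p = calloc(1, total);

    if (p == NULL)
        return NULL;
    p->ver_major = SKETCH_VERSION_MAJOR;
    p->ver_minor = SKETCH_VERSION_MINOR;
    p->size = (uint32_t)total;  /* structure plus path */
    p->ioctlfd = -1;            /* set by the open call on success */
    p->arg1 = devid;            /* device number of the autofs mount */
    memcpy(p->path, path, path_len);
    return p;
}
```

For the path "/test" the size field ends up six bytes larger than the
bare structure: five characters plus the terminating NUL.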


AUTOFS_DEV_IOCTL_READY_CMD and AUTOFS_DEV_IOCTL_FAIL_CMD
--------------------------------------------------------

Return mount and expire result status from user space to the kernel.
Both of these calls require an initialized struct autofs_dev_ioctl
with the ioctlfd field set to the descriptor obtained from the open
call and the arg1 field set to the wait queue token number, received
by user space in the foregoing mount or expire request. The arg2 field
is set to the status to be returned. For the ready call this is always
0 and for the fail call it is set to the errno of the operation.
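
The field usage for the two status calls can be sketched as follows,
again on a local copy of the structure. The helper names and the
SKETCH_VERSION_* values are hypothetical, and the sign convention of the
failure status is not specified in this text, so the sketch passes the
caller's value through unchanged.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_VERSION_MAJOR 1   /* assumed values for illustration */
#define SKETCH_VERSION_MINOR 0

/* Local copy of the parameter structure described in this document. */
struct autofs_dev_ioctl {
    uint32_t ver_major;
    uint32_t ver_minor;
    uint32_t size;
    int32_t  ioctlfd;
    uint32_t arg1;
    uint32_t arg2;
    char     path[];
};

/* Fill the fields common to the ready and fail calls. */
static void fill_status(struct autofs_dev_ioctl *p, int ioctlfd,
                        uint32_t token, uint32_t status)
{
    memset(p, 0, sizeof(*p));
    p->ver_major = SKETCH_VERSION_MAJOR;
    p->ver_minor = SKETCH_VERSION_MINOR;
    p->size = sizeof(*p);
    p->ioctlfd = ioctlfd;   /* descriptor from the open call */
    p->arg1 = token;        /* wait queue token from the request */
    p->arg2 = status;       /* result reported to the kernel */
}

/* A ready call always reports status 0. */
static void fill_ready(struct autofs_dev_ioctl *p, int ioctlfd,
                       uint32_t token)
{
    fill_status(p, ioctlfd, token, 0);
}

/* A fail call carries the error of the failed operation. */
static void fill_fail(struct autofs_dev_ioctl *p, int ioctlfd,
                      uint32_t token, int err)
{
    fill_status(p, ioctlfd, token, (uint32_t)err);
}
```

Unlike the older interface, which could only ever report ENOENT, the
fail variant here carries whatever status the caller supplies.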


AUTOFS_DEV_IOCTL_SETPIPEFD_CMD
------------------------------

Set the pipe file descriptor used for kernel communication to the daemon.
Normally this is set at mount time using an option but when reconnecting
to an existing mount we need to use this to tell the autofs mount about
the new kernel pipe descriptor. In order to protect mounts against
incorrectly setting the pipe descriptor we also require that the autofs
mount be catatonic (see next call).

The call requires an initialized struct autofs_dev_ioctl with the
ioctlfd field set to the descriptor obtained from the open call and
the arg1 field set to the descriptor of the pipe. On success the call
also sets the process group id used to identify the controlling process
(e.g. the owning automount(8) daemon) to the process group of the caller.


AUTOFS_DEV_IOCTL_CATATONIC_CMD
------------------------------

Make the autofs mount point catatonic. The autofs mount will no longer
issue mount requests, the kernel communication pipe descriptor is released
and any remaining waits in the queue are released.

The call requires an initialized struct autofs_dev_ioctl with the
ioctlfd field set to the descriptor obtained from the open call.


AUTOFS_DEV_IOCTL_TIMEOUT_CMD
----------------------------

Set the expire timeout for mounts within an autofs mount point.

The call requires an initialized struct autofs_dev_ioctl with the
ioctlfd field set to the descriptor obtained from the open call.


AUTOFS_DEV_IOCTL_REQUESTER_CMD
------------------------------

Return the uid and gid of the last process to successfully trigger a
mount on the given path dentry.

The call requires an initialized struct autofs_dev_ioctl with the path
field set to the mount point in question and the size field adjusted
appropriately as well as the arg1 field set to the device number of the
containing autofs mount. Upon return the struct field arg1 contains the
uid and arg2 the gid.

When reconstructing an autofs mount tree with active mounts we need to
re-connect to mounts that may have used the original process uid and
gid (or string variations of them) for mount lookups within the map entry.
This call provides the ability to obtain this uid and gid so they may be
used by user space for the mount map lookups.


AUTOFS_DEV_IOCTL_EXPIRE_CMD
---------------------------

Issue an expire request to the kernel for an autofs mount. Typically
this ioctl is called until no further expire candidates are found.

The call requires an initialized struct autofs_dev_ioctl with the
ioctlfd field set to the descriptor obtained from the open call. In
addition an immediate expire, independent of the mount timeout, can be
requested by setting the arg1 field to 1. If no expire candidates can
be found the ioctl returns -1 with errno set to EAGAIN.

This call causes the kernel module to check the mount corresponding
to the given ioctlfd for mounts that can be expired, issues an expire
request back to the daemon and waits for completion.
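
The "call until no further candidates" pattern can be sketched as a
small loop. To keep the example independent of the device node, the
ioctl is hidden behind a hypothetical callback (expire_once_fn) that is
assumed to return what ioctl(2) returned; fake_expire_once is a stand-in
used only to exercise the loop.

```c
#include <errno.h>

/* Hypothetical callback: performs one expire call, returns 0 or -1. */
typedef int (*expire_once_fn)(void *ctx);

/*
 * Repeat the expire operation until it reports that no candidates
 * remain. Returns the number of successful calls, or -1 if a call
 * failed with an error other than EAGAIN.
 */
static int expire_until_done(expire_once_fn expire_once, void *ctx)
{
    int expired = 0;

    for (;;) {
        if (expire_once(ctx) == 0) {
            expired++;
            continue;
        }
        if (errno == EAGAIN)
            return expired;     /* nothing left to expire */
        return -1;              /* genuine failure, errno is set */
    }
}

/* Stand-in for the real call: succeeds *ctx times, then EAGAIN. */
static int fake_expire_once(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    errno = EAGAIN;
    return -1;
}
```

In a real caller the callback would issue the expire ioctl on the
control device with ioctlfd set to the mount being expired.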


AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD
------------------------------

Checks if an autofs mount point is in use.

The call requires an initialized struct autofs_dev_ioctl with the
ioctlfd field set to the descriptor obtained from the open call and
it returns the result in the arg1 field, 1 for busy and 0 otherwise.


AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
---------------------------------

Check if the given path is a mountpoint.

The call requires an initialized struct autofs_dev_ioctl. There are two
possible variations. Both use the path field set to the path of the mount
point to check and the size field adjusted appropriately. One uses the
ioctlfd field to identify a specific mount point to check while the other
variation uses the path and optionally arg1 set to an autofs mount type.
The call returns 1 if this is a mount point and sets arg1 to the device
number of the mount and field arg2 to the relevant super block magic
number (described below) or 0 if it isn't a mountpoint. In both cases
the device number (as returned by new_encode_dev()) is returned
in field arg1.
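
For readers who need to turn the device number in arg1 back into major
and minor numbers, the following sketch mirrors the bit layout that, to
our understanding, the kernel's new_encode_dev() uses (low 8 bits of the
minor, then 12 bits of major, then the remaining minor bits). The
sketch_* names are hypothetical.

```c
#include <stdint.h>

/* Pack major/minor in the layout assumed for new_encode_dev(). */
static uint32_t sketch_encode_dev(uint32_t major, uint32_t minor)
{
    return (minor & 0xff) | (major << 8) | ((minor & ~0xffu) << 12);
}

/* Extract the major number from an encoded device number. */
static uint32_t sketch_dev_major(uint32_t dev)
{
    return (dev & 0xfff00) >> 8;
}

/* Extract the minor number from an encoded device number. */
static uint32_t sketch_dev_minor(uint32_t dev)
{
    return (dev & 0xff) | ((dev >> 12) & 0xfff00);
}
```

For major 8 and minor 1 the encoded value is 0x801, and decoding it
gives back 8 and 1.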

If supplied with a file descriptor we're looking for a specific mount,
not necessarily at the top of the mounted stack. In this case the path
the descriptor corresponds to is considered a mountpoint if it is itself
a mountpoint or contains a mount, such as a multi-mount without a root
mount. In this case we return 1 if the descriptor corresponds to a mount
point and also return the super magic of the covering mount if there
is one, or 0 if it isn't a mountpoint.

If a path is supplied (and the ioctlfd field is set to -1) then the path
is looked up and is checked to see if it is the root of a mount. If a
type is also given we are looking for a particular autofs mount and if
a match isn't found a fail is returned. If the located path is the
root of a mount 1 is returned along with the super magic of the mount
or 0 otherwise.