1 | Changes since 2.5.0: |
2 | |
3 | --- |
4 | [recommended] |
5 | |
6 | New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(), |
7 | sb_set_blocksize() and sb_min_blocksize(). |
8 | |
9 | Use them. |
10 | |
11 | (sb_find_get_block() replaces 2.4's get_hash_table()) |
12 | |
13 | --- |
14 | [recommended] |
15 | |
16 | New methods: ->alloc_inode() and ->destroy_inode(). |
17 | |
18 | Remove inode->u.foo_inode_i |
19 | Declare |
20 | struct foo_inode_info { |
21 | /* fs-private stuff */ |
22 | struct inode vfs_inode; |
23 | }; |
24 | static inline struct foo_inode_info *FOO_I(struct inode *inode) |
25 | { |
26 | return list_entry(inode, struct foo_inode_info, vfs_inode); |
27 | } |
28 | |
29 | Use FOO_I(inode) instead of &inode->u.foo_inode_i; |
30 | |
31 | Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate |
32 | foo_inode_info and return the address of ->vfs_inode, the latter should free |
33 | FOO_I(inode) (see in-tree filesystems for examples). |
34 | |
35 | Make them ->alloc_inode and ->destroy_inode in your super_operations. |
36 | |
37 | Keep in mind that now you need explicit initialization of private data |
38 | typically between calling iget_locked() and unlocking the inode. |
39 | |
40 | At some point that will become mandatory. |
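The embedding pattern above can be tried outside the kernel. What follows is a minimal userspace sketch, not in-tree code: struct inode is a toy stand-in, container_of() is re-implemented locally, the private_flags field is hypothetical, and foo_alloc_inode() takes no superblock argument (the real method does).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the kernel's struct inode (assumption). */
struct inode {
	unsigned long i_ino;
};

/* Local copy of the container_of() idea: member pointer -> enclosing object. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo_inode_info {
	int private_flags;		/* fs-private stuff (hypothetical field) */
	struct inode vfs_inode;		/* embedded VFS inode */
};

static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* Allocate the container and hand back the address of ->vfs_inode. */
static struct inode *foo_alloc_inode(void)
{
	struct foo_inode_info *fi = calloc(1, sizeof(*fi));

	return fi ? &fi->vfs_inode : NULL;
}

/* Free the whole container, not just the embedded inode. */
static void foo_destroy_inode(struct inode *inode)
{
	free(FOO_I(inode));
}
```

Because the VFS only ever sees &fi->vfs_inode, FOO_I() is the single place that knows about the layout of foo_inode_info.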
41 | |
42 | --- |
43 | [mandatory] |
44 | |
45 | Change of file_system_type method (->read_super to ->get_sb) |
46 | |
47 | ->read_super() is no more. Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV. |
48 | |
49 | Turn your foo_read_super() into a function that would return 0 in case of |
50 | success and negative number in case of error (-EINVAL unless you have more |
51 | informative error value to report). Call it foo_fill_super(). Now declare |
52 | |
53 | int foo_get_sb(struct file_system_type *fs_type, |
54 | int flags, const char *dev_name, void *data, struct vfsmount *mnt) |
55 | { |
56 | return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super, |
57 | mnt); |
58 | } |
59 | |
60 | (or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of |
61 | filesystem). |
62 | |
63 | Replace DECLARE_FSTYPE... with explicit initializer and have ->get_sb set as |
64 | foo_get_sb. |
65 | |
66 | --- |
67 | [mandatory] |
68 | |
69 | Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames. |
70 | Most likely there is no need to change anything, but if you relied on |
71 | global exclusion between renames for some internal purpose - you need to |
72 | change your internal locking. Otherwise exclusion guarantees remain the |
73 | same (i.e. parents and victim are locked, etc.). |
74 | |
75 | --- |
76 | [informational] |
77 | |
78 | Now we have the exclusion between ->lookup() and directory removal (by |
79 | ->rmdir() and ->rename()). If you used to need that exclusion and got |
80 | it by internal locking (most filesystems couldn't care less) - you |
81 | can relax your locking. |
82 | |
83 | --- |
84 | [mandatory] |
85 | |
86 | ->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(), |
87 | ->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename() |
88 | and ->readdir() are called without BKL now. Grab it on entry, drop upon return |
89 | - that will guarantee the same locking you used to have. If your method or its |
90 | parts do not need BKL - better yet, now you can shift lock_kernel() and |
91 | unlock_kernel() so that they would protect exactly what needs to be |
92 | protected. |
93 | |
94 | --- |
95 | [mandatory] |
96 | |
97 | BKL is also moved from around sb operations. ->write_super() is now called |
98 | without BKL held. BKL should have been shifted into individual fs sb_op |
99 | functions. If you don't need it, remove it. |
100 | |
101 | --- |
102 | [informational] |
103 | |
104 | Check for ->link() target not being a directory is done by callers. Feel |
105 | free to drop it... |
106 | |
107 | --- |
108 | [informational] |
109 | |
110 | ->link() callers hold ->i_mutex on the object we are linking to. Some of your |
111 | problems might be over... |
112 | |
113 | --- |
114 | [mandatory] |
115 | |
116 | New file_system_type method - kill_sb(superblock). If you are converting |
117 | an existing filesystem, set it according to ->fs_flags: |
118 | FS_REQUIRES_DEV - kill_block_super |
119 | FS_LITTER - kill_litter_super |
120 | neither - kill_anon_super |
121 | FS_LITTER is gone - just remove it from fs_flags. |
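The mapping above is a simple priority check on the old ->fs_flags. A small userspace sketch of that decision follows; the flag values are illustrative only (assumption), and pick_kill_sb() is a hypothetical helper that returns the name of the function to assign rather than a function pointer.

```c
#include <assert.h>
#include <string.h>

/* Illustrative flag values for this sketch only (assumption). */
#define FS_REQUIRES_DEV	0x1
#define FS_LITTER	0x2

/* Return the name of the kill_sb helper matching the old ->fs_flags. */
static const char *pick_kill_sb(int fs_flags)
{
	if (fs_flags & FS_REQUIRES_DEV)
		return "kill_block_super";
	if (fs_flags & FS_LITTER)
		return "kill_litter_super";
	return "kill_anon_super";
}
```

The table does not say what to do if both flags were set; the sketch simply checks FS_REQUIRES_DEV first.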
122 | |
123 | --- |
124 | [mandatory] |
125 | |
126 | FS_SINGLE is gone (actually, that had happened back when ->get_sb() |
127 | went in - and hadn't been documented ;-/). Just remove it from fs_flags |
128 | (and see ->get_sb() entry for other actions). |
129 | |
130 | --- |
131 | [mandatory] |
132 | |
133 | ->setattr() is called without BKL now. Caller _always_ holds ->i_mutex, so |
134 | watch for ->i_mutex-grabbing code that might be used by your ->setattr(). |
135 | Callers of notify_change() need ->i_mutex now. |
136 | |
137 | --- |
138 | [recommended] |
139 | |
140 | New super_block field "struct export_operations *s_export_op" for |
141 | explicit support for exporting, e.g. via NFS. The structure is fully |
142 | documented at its declaration in include/linux/fs.h, and in |
143 | Documentation/filesystems/nfs/Exporting. |
144 | |
145 | Briefly it allows for the definition of decode_fh and encode_fh operations |
146 | to encode and decode filehandles, and allows the filesystem to use |
147 | a standard helper function for decode_fh, and provide file-system specific |
148 | support for this helper, particularly get_parent. |
149 | |
150 | It is planned that this will be required for exporting once the code |
151 | settles down a bit. |
152 | |
153 | [mandatory] |
154 | |
155 | s_export_op is now required for exporting a filesystem. |
156 | isofs, ext2, ext3, reiserfs, fat |
157 | can be used as examples of very different filesystems. |
158 | |
159 | --- |
160 | [mandatory] |
161 | |
162 | iget4() and the read_inode2 callback have been superseded by iget5_locked() |
163 | which has the following prototype: |
164 | |
165 | struct inode *iget5_locked(struct super_block *sb, unsigned long ino, |
166 | int (*test)(struct inode *, void *), |
167 | int (*set)(struct inode *, void *), |
168 | void *data); |
169 | |
170 | 'test' is an additional function that can be used when the inode |
171 | number is not sufficient to identify the actual file object. 'set' |
172 | should be a non-blocking function that initializes those parts of a |
173 | newly created inode to allow the test function to succeed. 'data' is |
174 | passed as an opaque value to both test and set functions. |
175 | |
176 | When the inode has been created by iget5_locked(), it will be returned with the |
177 | I_NEW flag set and will still be locked. The filesystem then needs to finalize |
178 | the initialization. Once the inode is initialized it must be unlocked by |
179 | calling unlock_new_inode(). |
180 | |
181 | The filesystem is responsible for setting (and possibly testing) i_ino |
182 | when appropriate. There is also a simpler iget_locked function that |
183 | just takes the superblock and inode number as arguments and does the |
184 | test and set for you. |
185 | |
186 | e.g. |
187 | inode = iget_locked(sb, ino); |
188 | if (inode->i_state & I_NEW) { |
189 | err = read_inode_from_disk(inode); |
190 | if (err < 0) { |
191 | iget_failed(inode); |
192 | return err; |
193 | } |
194 | unlock_new_inode(inode); |
195 | } |
196 | |
197 | Note that if the process of setting up a new inode fails, then iget_failed() |
198 | should be called on the inode to render it dead, and an appropriate error |
199 | should be passed back to the caller. |
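To make the roles of 'test' and 'set' concrete, here is a toy userspace model, assuming a filesystem where the inode number alone is not unique. Everything in it is a stand-in: the eight-slot cache, the generation field, struct foo_key, the I_NEW value, and the toy_* functions are hypothetical and only mimic the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

#define I_NEW 0x1	/* illustrative value (assumption) */

struct inode {
	unsigned long i_ino;
	unsigned long i_state;
	unsigned int generation;	/* hypothetical extra key */
	int in_use;
};

/* Hypothetical lookup key: inode number plus generation. */
struct foo_key {
	unsigned long ino;
	unsigned int generation;
};

static struct inode cache[8];	/* toy stand-in for the inode hash */

/* 'test': does this cached inode match the key? */
static int foo_test(struct inode *inode, void *data)
{
	struct foo_key *key = data;

	return inode->i_ino == key->ino &&
	       inode->generation == key->generation;
}

/* 'set': non-blocking; fill in just enough for foo_test() to succeed. */
static int foo_set(struct inode *inode, void *data)
{
	struct foo_key *key = data;

	inode->i_ino = key->ino;
	inode->generation = key->generation;
	return 0;
}

/* Toy model of iget5_locked(): find a match or create an I_NEW inode. */
static struct inode *toy_iget5_locked(int (*test)(struct inode *, void *),
				      int (*set)(struct inode *, void *),
				      void *data)
{
	size_t i;

	for (i = 0; i < 8; i++)
		if (cache[i].in_use && test(&cache[i], data))
			return &cache[i];
	for (i = 0; i < 8; i++) {
		if (cache[i].in_use)
			continue;
		if (set(&cache[i], data))
			return NULL;
		cache[i].in_use = 1;
		cache[i].i_state = I_NEW;
		return &cache[i];
	}
	return NULL;
}

/* Toy model of unlock_new_inode(): initialization is complete. */
static void toy_unlock_new_inode(struct inode *inode)
{
	inode->i_state &= ~I_NEW;
}
```

Two keys with the same ino but different generations end up as two distinct inodes, which is exactly the case a plain iget_locked() cannot express.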
200 | |
201 | --- |
202 | [recommended] |
203 | |
204 | ->getattr() is finally getting used. See instances in nfs, minix, etc. |
205 | |
206 | --- |
207 | [mandatory] |
208 | |
209 | ->revalidate() is gone. If your filesystem had it - provide ->getattr() |
210 | and let it call whatever you had as ->revalidate() + (for symlinks that |
211 | had ->revalidate()) add calls in ->follow_link()/->readlink(). |
212 | |
213 | --- |
214 | [mandatory] |
215 | |
216 | ->d_parent changes are not protected by BKL anymore. Read access is safe |
217 | if at least one of the following is true: |
218 | * filesystem has no cross-directory rename() |
219 | * we know that parent had been locked (e.g. we are looking at |
220 | ->d_parent of ->lookup() argument). |
221 | * we are called from ->rename(). |
222 | * the child's ->d_lock is held |
223 | Audit your code and add locking if needed. Notice that any place that is |
224 | not protected by the conditions above is risky even in the old tree - you |
225 | had been relying on BKL and that's prone to screwups. Old tree had quite |
226 | a few holes of that kind - unprotected access to ->d_parent leading to |
227 | anything from oops to silent memory corruption. |
228 | |
229 | --- |
230 | [mandatory] |
231 | |
232 | FS_NOMOUNT is gone. If you use it - just set MS_NOUSER in flags |
233 | (see rootfs for one kind of solution and bdev/socket/pipe for another). |
234 | |
235 | --- |
236 | [recommended] |
237 | |
238 | Use bdev_read_only(bdev) instead of is_read_only(kdev). The latter |
239 | is still alive, but only because of the mess in drivers/s390/block/dasd.c. |
240 | As soon as it gets fixed is_read_only() will die. |
241 | |
242 | --- |
243 | [mandatory] |
244 | |
245 | ->permission() is called without BKL now. Grab it on entry, drop upon |
246 | return - that will guarantee the same locking you used to have. If |
247 | your method or its parts do not need BKL - better yet, now you can |
248 | shift lock_kernel() and unlock_kernel() so that they would protect |
249 | exactly what needs to be protected. |
250 | |
251 | --- |
252 | [mandatory] |
253 | |
254 | ->statfs() is now called without BKL held. BKL should have been |
255 | shifted into individual fs sb_op functions where it's not clear that |
256 | it's safe to remove it. If you don't need it, remove it. |
257 | |
258 | --- |
259 | [mandatory] |
260 | |
261 | is_read_only() is gone; use bdev_read_only() instead. |
262 | |
263 | --- |
264 | [mandatory] |
265 | |
266 | destroy_buffers() is gone; use invalidate_bdev(). |
267 | |
268 | --- |
269 | [mandatory] |
270 | |
271 | fsync_dev() is gone; use fsync_bdev(). NOTE: lvm breakage is |
272 | deliberate; as soon as struct block_device * is propagated in a reasonable |
273 | way by that code fixing will become trivial; until then nothing can be |
274 | done. |
275 | |
276 | [mandatory] |
277 | |
278 | Block truncation on error exit from ->write_begin and ->direct_IO |
279 | moved from generic methods (block_write_begin, cont_write_begin, |
280 | nobh_write_begin, blockdev_direct_IO*) to callers. Take a look at |
281 | ext2_write_failed and callers for an example. |
282 | |
283 | [mandatory] |
284 | |
285 | ->truncate is going away. The whole truncate sequence needs to be |
286 | implemented in ->setattr, which is now mandatory for filesystems |
287 | implementing on-disk size changes. Start with a copy of the old inode_setattr |
288 | and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence to |
289 | be in order of zeroing blocks using block_truncate_page or similar helpers, |
290 | size update and finally on-disk truncation, which should not fail. |
291 | inode_change_ok now includes the size checks for ATTR_SIZE and must be called |
292 | in the beginning of ->setattr unconditionally. |
293 | |
294 | [mandatory] |
295 | |
296 | ->clear_inode() and ->delete_inode() are gone; ->evict_inode() should |
297 | be used instead. It gets called whenever the inode is evicted, whether it has |
298 | remaining links or not. Caller does *not* evict the pagecache or inode-associated |
299 | metadata buffers; getting rid of those is the responsibility of the method, as it had |
300 | been for ->delete_inode(). |
301 | |
302 | ->drop_inode() returns int now; it's called on final iput() with |
303 | inode->i_lock held and it returns true if the filesystem wants the inode to be |
304 | dropped. As before, generic_drop_inode() is still the default and it's been |
305 | updated appropriately. generic_delete_inode() is also alive and it consists |
306 | simply of return 1. Note that all actual eviction work is done by caller after |
307 | ->drop_inode() returns. |
308 | |
309 | clear_inode() is gone; use end_writeback() instead. As before, it must |
310 | be called exactly once on each call of ->evict_inode() (as it used to be for |
311 | each call of ->delete_inode()). Unlike before, if you are using inode-associated |
312 | metadata buffers (i.e. mark_buffer_dirty_inode()), it's your responsibility to |
313 | call invalidate_inode_buffers() before end_writeback(). |
314 | No async writeback (and thus no calls of ->write_inode()) will happen |
315 | after end_writeback() returns, so actions that should not overlap with ->write_inode() |
316 | (e.g. freeing on-disk inode if i_nlink is 0) ought to be done after that call. |
317 | |
318 | NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out |
319 | if it's zero is not *and* *never* *had* *been* enough. Final unlink() and iput() |
320 | may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly |
321 | free the on-disk inode, you may end up doing that while ->write_inode() is writing |
322 | to it. |
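The ordering rules above can be summarized as: drop the pagecache, invalidate inode-associated metadata buffers, call end_writeback(), and only then free the on-disk inode if i_nlink is 0. The userspace sketch below records that order in a trace string. The functions are stubs that merely share names with the helpers mentioned above; foo_drop_pagecache() and foo_free_disk_inode() are hypothetical fs-private steps.

```c
#include <assert.h>
#include <string.h>

struct inode {
	unsigned int i_nlink;	/* toy stand-in for the kernel structure */
};

static char trace[128];		/* records the order of calls */

static void note(const char *step)
{
	strcat(trace, step);
	strcat(trace, ";");
}

/* Stubs: they only log their name (assumption, not kernel code). */
static void foo_drop_pagecache(struct inode *inode)
{
	(void)inode;
	note("drop_pagecache");
}

static void invalidate_inode_buffers(struct inode *inode)
{
	(void)inode;
	note("invalidate_inode_buffers");
}

static void end_writeback(struct inode *inode)
{
	(void)inode;
	note("end_writeback");
}

static void foo_free_disk_inode(struct inode *inode)
{
	(void)inode;
	note("free_disk_inode");
}

/* Sketch of an ->evict_inode() body following the rules above. */
static void foo_evict_inode(struct inode *inode)
{
	foo_drop_pagecache(inode);
	invalidate_inode_buffers(inode);
	end_writeback(inode);
	/* No ->write_inode() can overlap from here on. */
	if (inode->i_nlink == 0)
		foo_free_disk_inode(inode);
}
```

Freeing the on-disk inode last is the point of the NOTE above: before end_writeback() returns, ->write_inode() may still be writing to it.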
323 | |
324 | --- |
325 | [mandatory] |
326 | |
327 | .d_delete() now only advises the dcache as to whether or not to cache |
328 | unreferenced dentries, and is now only called when the dentry refcount goes to |
329 | 0. Even on 0 refcount transition, it must be able to tolerate being called 0, |
330 | 1, or more times (e.g. constant, idempotent). |
331 | |
332 | --- |
333 | [mandatory] |
334 | |
335 | .d_compare() calling convention and locking rules are significantly |
336 | changed. Read updated documentation in Documentation/filesystems/vfs.txt (and |
337 | look at examples of other filesystems) for guidance. |
338 | |
339 | --- |
340 | [mandatory] |
341 | |
342 | .d_hash() calling convention and locking rules are significantly |
343 | changed. Read updated documentation in Documentation/filesystems/vfs.txt (and |
344 | look at examples of other filesystems) for guidance. |
345 | |
346 | --- |
347 | [mandatory] |
348 | dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c |
349 | for details of what locks to replace dcache_lock with in order to protect |
350 | particular things. Most of the time, a filesystem only needs ->d_lock, which |
351 | protects *all* the dcache state of a given dentry. |
352 | |
353 | --- |
354 | [mandatory] |
355 | |
356 | Filesystems must RCU-free their inodes, if they can have been accessed |
357 | via rcu-walk path walk (basically, if the file can have had a path name in the |
358 | vfs namespace). |
359 | |
360 | i_dentry and i_rcu share storage in a union, and the vfs expects |
361 | i_dentry to be reinitialized before it is freed, so an: |
362 | |
363 | INIT_LIST_HEAD(&inode->i_dentry); |
364 | |
365 | must be done in the RCU callback. |
366 | |
367 | --- |
368 | [recommended] |
369 | vfs now tries to do path walking in "rcu-walk mode", which avoids |
370 | atomic operations and scalability hazards on dentries and inodes (see |
371 | Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes |
372 | (above) are examples of the changes required to support this. For more complex |
373 | filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so |
374 | no changes are required to the filesystem. However, this is costly and loses |
375 | the benefits of rcu-walk mode. We will begin to add filesystem callbacks that |
376 | are rcu-walk aware, shown below. Filesystems should take advantage of this |
377 | where possible. |
378 | |
379 | --- |
380 | [mandatory] |
381 | d_revalidate is a callback that is made on every path element (if |
382 | the filesystem provides it), which requires dropping out of rcu-walk mode. This |
383 | may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be |
384 | returned if the filesystem cannot handle rcu-walk. See |
385 | Documentation/filesystems/vfs.txt for more details. |
386 | |
387 | permission and check_acl are inode permission checks that are called |
388 | on many or all directory inodes on the way down a path walk (to check for |
389 | exec permission). These must now be rcu-walk aware (flags & IPERM_FLAG_RCU). |
390 | See Documentation/filesystems/vfs.txt for more details. |
391 | |
392 | --- |
393 | [mandatory] |
394 | In ->fallocate() you must check the mode option passed in. If your |
395 | filesystem does not support hole punching (deallocating space in the middle of a |
396 | file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode. |
397 | Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set, |
398 | so the i_size should not change when hole punching, even when punching the end of |
399 | a file off. |
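For a filesystem without hole punching, the mode check can be as small as the sketch below. The flag values match <linux/falloc.h> but are redefined so the snippet builds on its own; foo_fallocate_check_mode() is a hypothetical helper, and rejecting unknown flags is an extra precaution of this sketch rather than something the text above requires.

```c
#include <assert.h>
#include <errno.h>

/* Values as in <linux/falloc.h>; redefined so the sketch stands alone. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif

/* Mode check for a filesystem that cannot punch holes. */
static int foo_fallocate_check_mode(int mode)
{
	if (mode & FALLOC_FL_PUNCH_HOLE)
		return -EOPNOTSUPP;
	if (mode & ~FALLOC_FL_KEEP_SIZE)
		return -EOPNOTSUPP;	/* unknown flags: refuse as well */
	return 0;
}
```

A real ->fallocate() would run this check first and only then go on to preallocate blocks.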
400 | |
401 | --- |
402 | [mandatory] |
403 | ->get_sb() is gone. Switch to use of ->mount(). Typically it's just |
404 | a matter of switching from calling get_sb_... to mount_... and changing the |
405 | function type. If you were doing it manually, just switch from setting ->mnt_root |
406 | to some pointer to returning that pointer. On errors return ERR_PTR(...). |
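The "return the pointer, or ERR_PTR(...) on error" convention can be illustrated in userspace. This is a sketch under stated assumptions: ERR_PTR(), PTR_ERR() and IS_ERR() are simplified local copies of the kernel helpers, struct dentry is a toy, and foo_mount() takes only a device name instead of the real ->mount() argument list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified userspace copies of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct dentry {
	const char *name;	/* toy stand-in */
};

static struct dentry foo_root = { "/" };

/* Toy ->mount(): return the root dentry, or an encoded errno on failure. */
static struct dentry *foo_mount(const char *dev_name)
{
	if (!dev_name || !*dev_name)
		return ERR_PTR(-EINVAL);
	return &foo_root;
}
```

The caller checks IS_ERR() on the result and recovers the errno with PTR_ERR(), so no separate out-parameter such as ->mnt_root is needed.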
407 | |
408 | --- |
409 | [mandatory] |
410 | ->permission() and generic_permission() have lost flags |
411 | argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask. |
412 | generic_permission() has also lost the check_acl argument; ACL checking |
413 | has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl |
414 | to read an ACL from disk. |
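A ->permission() instance that may need to block can test MAY_NOT_BLOCK in mask and bail out with -ECHILD, the same "cannot handle rcu-walk" return value mentioned for d_revalidate above. The userspace sketch below assumes a toy inode with a hypothetical attrs_cached flag. The MAY_* values mirror the kernel's but are defined locally, and foo_permission() is not a real kernel function.

```c
#include <assert.h>
#include <errno.h>

/* Local definitions mirroring the kernel's mask bits. */
#define MAY_EXEC	0x01
#define MAY_WRITE	0x02
#define MAY_NOT_BLOCK	0x80

struct inode {
	int attrs_cached;		/* hypothetical: attributes already read? */
	unsigned int mode_bits;		/* toy permission bits */
};

static int foo_permission(struct inode *inode, int mask)
{
	if (!inode->attrs_cached) {
		if (mask & MAY_NOT_BLOCK)
			return -ECHILD;		/* cannot block here, retry outside rcu-walk */
		inode->attrs_cached = 1;	/* pretend we did blocking I/O */
	}
	mask &= ~MAY_NOT_BLOCK;
	return (inode->mode_bits & (unsigned int)mask) == (unsigned int)mask ?
		0 : -EACCES;
}
```

Once the attributes are cached, later calls with MAY_NOT_BLOCK succeed without leaving rcu-walk mode.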
415 | |
416 | --- |
417 | [mandatory] |
418 | If you implement your own ->llseek() you must handle SEEK_HOLE and |
419 | SEEK_DATA. You can handle this by returning -EINVAL, but it would be nicer to |
420 | support it in some way. The generic handler assumes that the entire file is |
421 | data and there is a virtual hole at the end of the file. So if the provided |
422 | offset is less than i_size and SEEK_DATA is specified, return the same offset. |
423 | If the above is true for the offset and you are given SEEK_HOLE, return the end |
424 | of the file. If the offset is i_size or greater return -ENXIO in either case. |
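The generic "whole file is data, virtual hole at EOF" rule fits in a few lines. The sketch below is a userspace model of that rule only: foo_seek_hole_data() is a hypothetical helper, the whence values are the Linux ones redefined locally, and a real ->llseek() would also update the file position.

```c
#include <assert.h>
#include <errno.h>

/* Linux whence values; redefined so the sketch builds without _GNU_SOURCE. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/* Resolve SEEK_DATA/SEEK_HOLE for a file with no real holes. */
static long long foo_seek_hole_data(long long offset, long long i_size,
				    int whence)
{
	if (whence != SEEK_DATA && whence != SEEK_HOLE)
		return -EINVAL;		/* not handled by this helper */
	if (offset < 0 || offset >= i_size)
		return -ENXIO;		/* at or past EOF: no data, no hole */
	if (whence == SEEK_DATA)
		return offset;		/* everything before EOF is data */
	return i_size;			/* the only hole is the virtual one at EOF */
}
```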
425 | |
426 | [mandatory] |
427 | If you have your own ->fsync() you must make sure to call |
428 | filemap_write_and_wait_range() so that all dirty pages are synced out properly. |
429 | You must also keep in mind that ->fsync() is not called with i_mutex held |
430 | anymore, so if you require i_mutex locking you must make sure to take it and |
431 | release it yourself. |
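The resulting shape of an ->fsync() instance is: write out and wait on the dirty page range first, then take i_mutex yourself around whatever needs it. The userspace sketch below models that order with toy state. The toy_* helpers and foo_sync_metadata() are hypothetical stand-ins, not the kernel functions, and the argument list is simplified.

```c
#include <assert.h>

struct inode {
	int i_mutex_held;		/* toy lock state */
	int dirty_pages;		/* toy count of dirty pages */
	int metadata_synced_locked;	/* was metadata synced under the lock? */
};

/* Stand-in for filemap_write_and_wait_range(): flush the page range. */
static int toy_filemap_write_and_wait_range(struct inode *inode,
					     long long start, long long end)
{
	(void)start;
	(void)end;
	inode->dirty_pages = 0;
	return 0;
}

static void toy_mutex_lock(struct inode *inode)
{
	inode->i_mutex_held = 1;
}

static void toy_mutex_unlock(struct inode *inode)
{
	inode->i_mutex_held = 0;
}

/* Hypothetical fs-private step that requires i_mutex. */
static int foo_sync_metadata(struct inode *inode)
{
	inode->metadata_synced_locked = inode->i_mutex_held;
	return 0;
}

static int foo_fsync(struct inode *inode, long long start, long long end)
{
	int err = toy_filemap_write_and_wait_range(inode, start, end);

	if (err)
		return err;
	toy_mutex_lock(inode);		/* the caller no longer holds it for us */
	err = foo_sync_metadata(inode);
	toy_mutex_unlock(inode);
	return err;
}
```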
432 |