Red-black Trees (rbtree) in Linux
January 18, 2007
Rob Landley <rob@landley.net>
=============================

What are red-black trees, and what are they for?
------------------------------------------------

Red-black trees are a type of self-balancing binary search tree, used for
storing sortable key/value data pairs. This differs from radix trees (which
are used to efficiently store sparse arrays and thus use long integer indexes
to insert/access/delete nodes) and hash tables (which are not kept sorted to
be easily traversed in order, and must be tuned for a specific size and
hash function where rbtrees scale gracefully storing arbitrary keys).

Red-black trees are similar to AVL trees, but provide faster real-time bounded
worst case performance for insertion and deletion (at most two rotations and
three rotations, respectively, to balance the tree), with slightly slower
(but still O(log n)) lookup time.

To quote Linux Weekly News:

    There are a number of red-black trees in use in the kernel.
    The deadline and CFQ I/O schedulers employ rbtrees to
    track requests; the packet CD/DVD driver does the same.
    The high-resolution timer code uses an rbtree to organize outstanding
    timer requests. The ext3 filesystem tracks directory entries in a
    red-black tree. Virtual memory areas (VMAs) are tracked with red-black
    trees, as are epoll file descriptors, cryptographic keys, and network
    packets in the "hierarchical token bucket" scheduler.

This document covers use of the Linux rbtree implementation. For more
information on the nature and implementation of red-black trees, see:

  Linux Weekly News article on red-black trees
    http://lwn.net/Articles/184495/

  Wikipedia entry on red-black trees
    http://en.wikipedia.org/wiki/Red-black_tree

Linux implementation of red-black trees
---------------------------------------

Linux's rbtree implementation lives in the file "lib/rbtree.c". To use it,
"#include <linux/rbtree.h>".

The Linux rbtree implementation is optimized for speed, and thus has one
less layer of indirection (and better cache locality) than more traditional
tree implementations. Instead of using pointers to separate rb_node and data
structures, each instance of struct rb_node is embedded in the data structure
it organizes. And instead of using a comparison callback function pointer,
users are expected to write their own tree search and insert functions
which call the provided rbtree functions. Locking is also left up to the
user of the rbtree code.

Creating a new rbtree
---------------------

Data nodes in an rbtree are structures containing a struct rb_node member:

  struct mytype {
  	struct rb_node node;
  	char *keystring;
  };

When dealing with a pointer to the embedded struct rb_node, the containing data
structure may be accessed with the standard container_of() macro. In addition,
individual members may be accessed directly via rb_entry(node, type, member).
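
To see what container_of() does, the pointer arithmetic can be tried in
userspace. The following is a minimal sketch under stated assumptions, not the
kernel's code: struct fake_rb_node, my_container_of() and node_to_mytype() are
hypothetical names invented for this illustration, and the simplified macro
omits the type checking that the kernel macro performs.

```c
#include <stddef.h>	/* offsetof */

/* Stand-in for struct rb_node; it exists only so the example compiles. */
struct fake_rb_node {
	struct fake_rb_node *left, *right;
};

/* Simplified userspace version: subtract the member's offset, no type check. */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mytype {
	int id;				/* hypothetical extra field */
	struct fake_rb_node node;	/* deliberately not the first member */
	const char *keystring;
};

/* Recover the enclosing struct mytype from a pointer to its embedded node. */
static struct mytype *node_to_mytype(struct fake_rb_node *n)
{
	return my_container_of(n, struct mytype, node);
}
```

Because the conversion is only a constant offset subtraction, reaching the data
from an embedded node needs no extra pointer dereference.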

At the root of each rbtree is an rb_root structure, which is initialized to be
empty via:

  struct rb_root mytree = RB_ROOT;

Searching for a value in an rbtree
----------------------------------

Writing a search function for your tree is fairly straightforward: start at the
root, compare each value, and follow the left or right branch as necessary.

Example:

  struct mytype *my_search(struct rb_root *root, char *string)
  {
  	struct rb_node *node = root->rb_node;

  	while (node) {
  		struct mytype *data = container_of(node, struct mytype, node);
  		int result;

  		result = strcmp(string, data->keystring);

  		if (result < 0)
  			node = node->rb_left;
  		else if (result > 0)
  			node = node->rb_right;
  		else
  			return data;
  	}
  	return NULL;
  }

Inserting data into an rbtree
-----------------------------

Inserting data in the tree involves first searching for the place to insert the
new node, then inserting the node and rebalancing ("recoloring") the tree.

The search for insertion differs from the previous search by finding the
location of the pointer on which to graft the new node. The new node also
needs a link to its parent node for rebalancing purposes.

Example:

  bool my_insert(struct rb_root *root, struct mytype *data)
  {
  	struct rb_node **new = &(root->rb_node), *parent = NULL;

  	/* Figure out where to put new node */
  	while (*new) {
  		struct mytype *this = container_of(*new, struct mytype, node);
  		int result = strcmp(data->keystring, this->keystring);

  		parent = *new;
  		if (result < 0)
  			new = &((*new)->rb_left);
  		else if (result > 0)
  			new = &((*new)->rb_right);
  		else
  			return false;	/* key already present */
  	}

  	/* Add new node and rebalance tree. */
  	rb_link_node(&data->node, parent, new);
  	rb_insert_color(&data->node, root);

  	return true;
  }
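
The pointer-to-pointer walk used by my_insert() can be tried outside the
kernel. The sketch below is a stand-in under stated assumptions, not the rbtree
API: struct bst_node and bst_insert() are hypothetical names, the tree is a
plain binary search tree, and no parent link or rebalancing is maintained. It
only illustrates how the link pointer ends up at the empty slot where the new
node is grafted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace node: a plain binary search tree, no colors. */
struct bst_node {
	struct bst_node *left, *right;
	const char *key;
};

/*
 * Walk down from the root while keeping a pointer to the link (a left or
 * right slot, or the root pointer itself) that would hold the new node.
 * Returns 1 on success, 0 if the key is already present.
 */
static int bst_insert(struct bst_node **root, struct bst_node *item)
{
	struct bst_node **link = root;

	while (*link) {
		int result = strcmp(item->key, (*link)->key);

		if (result < 0)
			link = &(*link)->left;
		else if (result > 0)
			link = &(*link)->right;
		else
			return 0;
	}

	item->left = item->right = NULL;
	*link = item;	/* graft the node into the empty slot */
	return 1;
}
```

Since the root pointer is just another slot, an empty tree needs no special
case: the loop body never runs and the node is stored directly in *root.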

Removing or replacing existing data in an rbtree
------------------------------------------------

To remove an existing node from a tree, call:

  void rb_erase(struct rb_node *victim, struct rb_root *tree);

Example:

  struct mytype *data = my_search(&mytree, "walrus");

  if (data) {
  	rb_erase(&data->node, &mytree);
  	myfree(data);
  }

To replace an existing node in a tree with a new one with the same key, call:

  void rb_replace_node(struct rb_node *old, struct rb_node *new,
  			struct rb_root *tree);

Replacing a node this way does not re-sort the tree: if the new node doesn't
have the same key as the old node, the rbtree will probably become corrupted.

Iterating through the elements stored in an rbtree (in sort order)
------------------------------------------------------------------

Four functions are provided for iterating through an rbtree's contents in
sorted order. These work on arbitrary trees, and should not need to be
modified or wrapped (except for locking purposes):

  struct rb_node *rb_first(struct rb_root *tree);
  struct rb_node *rb_last(struct rb_root *tree);
  struct rb_node *rb_next(struct rb_node *node);
  struct rb_node *rb_prev(struct rb_node *node);

To start iterating, call rb_first() or rb_last() with a pointer to the root
of the tree, which will return a pointer to the node structure contained in
the first or last element in the tree. To continue, fetch the next or previous
node by calling rb_next() or rb_prev() on the current node. This will return
NULL when there are no more nodes left.

The iterator functions return a pointer to the embedded struct rb_node, from
which the containing data structure may be accessed with the container_of()
macro, and individual members may be accessed directly via
rb_entry(node, type, member).

Example:

  struct rb_node *node;
  for (node = rb_first(&mytree); node; node = rb_next(node))
  	printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
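
For intuition about what the iterator functions compute, here is a userspace
sketch of first-node and in-order-successor lookup on a tree with explicit
parent pointers. struct pnode, pnode_first() and pnode_next() are hypothetical
names for this illustration; the kernel functions operate on struct rb_node
and are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical userspace node with an explicit parent pointer. */
struct pnode {
	struct pnode *parent, *left, *right;
	int key;
};

/* Leftmost node of the tree, i.e. the smallest key; NULL for an empty tree. */
static struct pnode *pnode_first(struct pnode *root)
{
	if (!root)
		return NULL;
	while (root->left)
		root = root->left;
	return root;
}

/* In-order successor of node, or NULL if node is the last one. */
static struct pnode *pnode_next(struct pnode *node)
{
	struct pnode *parent;

	/* With a right subtree, the successor is that subtree's leftmost node. */
	if (node->right) {
		node = node->right;
		while (node->left)
			node = node->left;
		return node;
	}

	/* Otherwise climb until we reach a parent from its left side. */
	while ((parent = node->parent) && node == parent->right)
		node = parent;
	return parent;
}
```

Walking from pnode_first() with repeated pnode_next() calls visits the keys in
ascending order without recursion or an explicit stack, which is why the loop
in the example above needs only a single node pointer.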

Support for Augmented rbtrees
-----------------------------

An augmented rbtree is an rbtree with "some" additional data stored in
each node, where the additional data for node N must be a function of
the contents of all nodes in the subtree rooted at N. This data can
be used to add new functionality to the rbtree. Augmented rbtree
is an optional feature built on top of the basic rbtree infrastructure.
An rbtree user who wants this feature will have to call the augmentation
functions with the user-provided augmentation callback when inserting
and erasing nodes.

C files implementing augmented rbtree manipulation must include
<linux/rbtree_augmented.h> instead of <linux/rbtree.h>. Note that
linux/rbtree_augmented.h exposes some rbtree implementation details
you are not expected to rely on; please stick to the documented APIs
there, and do not include <linux/rbtree_augmented.h> from header files
either, so as to minimize the chances of your users accidentally relying on
such implementation details.

On insertion, the user must update the augmented information on the path
leading to the inserted node, then call rb_link_node() as usual and
rb_insert_augmented() instead of the usual rb_insert_color() call.
If rb_insert_augmented() rebalances the rbtree, it will call back into
a user-provided function to update the augmented information on the
affected subtrees.

When erasing a node, the user must call rb_erase_augmented() instead of
rb_erase(). rb_erase_augmented() calls back into user-provided functions
to update the augmented information on affected subtrees.

In both cases, the callbacks are provided through struct rb_augment_callbacks.
Three callbacks must be defined:

- A propagation callback, which updates the augmented value for a given
  node and its ancestors, up to a given stop point (or NULL to update
  all the way to the root).

- A copy callback, which copies the augmented value for a given subtree
  to a newly assigned subtree root.

- A tree rotation callback, which copies the augmented value for a given
  subtree to a newly assigned subtree root AND recomputes the augmented
  information for the former subtree root.

The compiled code for rb_erase_augmented() may inline the propagation and
copy callbacks, which results in a large function, so each augmented rbtree
user should have a single rb_erase_augmented() call site in order to limit
compiled code size.


Sample usage:

An interval tree is an example of an augmented rbtree. Reference:
"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
More details about interval trees:

A classical rbtree has a single key, and it cannot be directly used to store
interval ranges like [lo:hi] and do a quick lookup for any overlap with a new
lo:hi or to find whether there is an exact match for a new lo:hi.

However, an rbtree can be augmented to store such interval ranges in a
structured way, making it possible to do efficient lookup and exact match.

The "extra information" stored in each node is the maximum hi (max_hi)
value among the node and all of its descendants (called __subtree_last in
the code below). This information can be maintained at each node just by
looking at the node and its immediate children. It is used in an O(log n)
lookup for the lowest match (lowest start address among all possible
matches) with something like:

  struct interval_tree_node *
  interval_tree_first_match(struct rb_root *root,
  			    unsigned long start, unsigned long last)
  {
  	struct interval_tree_node *node;

  	if (!root->rb_node)
  		return NULL;
  	node = rb_entry(root->rb_node, struct interval_tree_node, rb);

  	while (true) {
  		if (node->rb.rb_left) {
  			struct interval_tree_node *left =
  				rb_entry(node->rb.rb_left,
  					 struct interval_tree_node, rb);
  			if (left->__subtree_last >= start) {
  				/*
  				 * Some nodes in left subtree satisfy Cond2.
  				 * Iterate to find the leftmost such node N.
  				 * If it also satisfies Cond1, that's the match
  				 * we are looking for. Otherwise, there is no
  				 * matching interval as nodes to the right of N
  				 * can't satisfy Cond1 either.
  				 */
  				node = left;
  				continue;
  			}
  		}
  		if (node->start <= last) {		/* Cond1 */
  			if (node->last >= start)	/* Cond2 */
  				return node;	/* node is leftmost match */
  			if (node->rb.rb_right) {
  				node = rb_entry(node->rb.rb_right,
  					struct interval_tree_node, rb);
  				if (node->__subtree_last >= start)
  					continue;
  			}
  		}
  		return NULL;	/* No match */
  	}
  }
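
Cond1 and Cond2 in the code above together form the usual overlap test for
closed intervals. As a sketch for checking expectations (struct interval,
interval_overlaps() and first_match_linear() are hypothetical helpers, not
part of the kernel), a linear scan over an array sorted by start returns the
same "lowest match" the tree search is meant to find, only in O(n) time:

```c
/* Hypothetical closed interval [start, last], mirroring the fields above. */
struct interval {
	unsigned long start, last;
};

/*
 * Two closed intervals overlap exactly when each one starts no later
 * than the other one ends.
 */
static int interval_overlaps(const struct interval *it,
			     unsigned long start, unsigned long last)
{
	return it->start <= last &&	/* Cond1 */
	       it->last >= start;	/* Cond2 */
}

/*
 * Reference scan: v[] must be sorted by ascending start.
 * Returns the index of the first overlapping interval, or -1 if none.
 */
static int first_match_linear(const struct interval *v, int n,
			      unsigned long start, unsigned long last)
{
	int i;

	for (i = 0; i < n; i++)
		if (interval_overlaps(&v[i], start, last))
			return i;
	return -1;
}
```

The augmented tree reaches the same answer faster because the stored subtree
maximum lets it skip any subtree whose intervals all end before start.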

Insertion/removal are defined using the following augmented callbacks:

  static inline unsigned long
  compute_subtree_last(struct interval_tree_node *node)
  {
  	unsigned long max = node->last, subtree_last;
  	if (node->rb.rb_left) {
  		subtree_last = rb_entry(node->rb.rb_left,
  			struct interval_tree_node, rb)->__subtree_last;
  		if (max < subtree_last)
  			max = subtree_last;
  	}
  	if (node->rb.rb_right) {
  		subtree_last = rb_entry(node->rb.rb_right,
  			struct interval_tree_node, rb)->__subtree_last;
  		if (max < subtree_last)
  			max = subtree_last;
  	}
  	return max;
  }

  static void augment_propagate(struct rb_node *rb, struct rb_node *stop)
  {
  	while (rb != stop) {
  		struct interval_tree_node *node =
  			rb_entry(rb, struct interval_tree_node, rb);
  		unsigned long subtree_last = compute_subtree_last(node);
  		if (node->__subtree_last == subtree_last)
  			break;
  		node->__subtree_last = subtree_last;
  		rb = rb_parent(&node->rb);
  	}
  }

  static void augment_copy(struct rb_node *rb_old, struct rb_node *rb_new)
  {
  	struct interval_tree_node *old =
  		rb_entry(rb_old, struct interval_tree_node, rb);
  	struct interval_tree_node *new =
  		rb_entry(rb_new, struct interval_tree_node, rb);

  	new->__subtree_last = old->__subtree_last;
  }

  static void augment_rotate(struct rb_node *rb_old, struct rb_node *rb_new)
  {
  	struct interval_tree_node *old =
  		rb_entry(rb_old, struct interval_tree_node, rb);
  	struct interval_tree_node *new =
  		rb_entry(rb_new, struct interval_tree_node, rb);

  	new->__subtree_last = old->__subtree_last;
  	old->__subtree_last = compute_subtree_last(old);
  }

  static const struct rb_augment_callbacks augment_callbacks = {
  	augment_propagate, augment_copy, augment_rotate
  };

  void interval_tree_insert(struct interval_tree_node *node,
  			    struct rb_root *root)
  {
  	struct rb_node **link = &root->rb_node, *rb_parent = NULL;
  	unsigned long start = node->start, last = node->last;
  	struct interval_tree_node *parent;

  	while (*link) {
  		rb_parent = *link;
  		parent = rb_entry(rb_parent, struct interval_tree_node, rb);
  		if (parent->__subtree_last < last)
  			parent->__subtree_last = last;
  		if (start < parent->start)
  			link = &parent->rb.rb_left;
  		else
  			link = &parent->rb.rb_right;
  	}

  	node->__subtree_last = last;
  	rb_link_node(&node->rb, rb_parent, link);
  	rb_insert_augmented(&node->rb, root, &augment_callbacks);
  }

  void interval_tree_remove(struct interval_tree_node *node,
  			    struct rb_root *root)
  {
  	rb_erase_augmented(&node->rb, root, &augment_callbacks);
  }