| 1 | This file describes the kernel architecture. It does not go into detail on all |
| 2 | the fields of structs; for that, refer to the source code. |
| 3 | |
| 4 | # Overview |
| 5 | |
| 6 | Iris is an operating system. The kernel should be called "the Iris kernel", |
| 7 | but sometimes it is simply called "Iris". If there can be confusion, the terms |
| 8 | "kernel" and "userspace" are used to clarify. |
| 9 | |
| 10 | Iris uses a capability-based microkernel. Being a microkernel means that most |
| 11 | components that a monolithic kernel would contain are not part of the Iris |
| 12 | kernel, but of the Iris userspace. Being capability-based means that there is |
| 13 | no public dictionary of running processes; in order to communicate with another |
| 14 | process, the caller must have received a capability to it. |
| 15 | |
| 16 | |
| 17 | # First class objects |
| 18 | |
| 19 | First class objects are implemented by the kernel. They can be used through |
| 20 | capabilities. |
| 21 | |
| 22 | - Cap: a single capability. Can be invoked and passed to others. |
| 23 | - Caps: storage container for a fixed number of Cap objects. Every Thread has |
| 24 | at least one of these so it can communicate with the kernel, its parent, and |
| 25 | other processes. |
| 26 | - Receiver: an object that can create Cap objects. When those are |
| 27 | invoked, the Receiver's listener receives the message. |
| 28 | - Thread: an execution context. On creation, a number of slots is specified |
| 29 | and space is reserved for that many Caps pointers. Only Cap objects in those |
| 30 | Caps can be invoked from the thread. |
| 31 | - Page: a single page of memory, always 4kB. A Page can be mapped in a Memory |
| 32 | and then accessed by a Thread. |
| 33 | - Memory: Everything[*] needs a Memory object to be stored in. In addition to |
| 34 | storing first class objects, a Memory can own Page objects and map them. A |
| 35 | mapped Page is accessible to running Threads that are stored in the Memory. |
| 36 | - List: Helper for implementing a list of Cap objects, which are stored by the |
| 37 | caller. A List allows servers to keep a list of clients without paying for |
| 38 | its storage. This prevents a denial of service attack. Each item is stored |
| 39 | with a code that is set and only accessible by the List owner. |
| 40 | - ListItem: an item in a List object. |
| 41 | |
| 42 | [*] There is of course one exception to the rule that everything is stored in a |
| 43 | Memory. Everything is a tree, with Memory objects as nodes and all other |
| 44 | objects as leaves. The root of the tree is not stored in anything. This |
| 45 | node is called the "top Memory". |
| 46 | |
| 47 | Example: A new process consists of a Memory with one or more mapped Page |
| 48 | objects that hold the code and data, a Receiver, a Thread, and a Caps |
| 49 | that contains a Cap for each of those objects, plus one for its parent |
| 50 | process. That Caps is stored in slot 0. |
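
The example process above can be modelled in a few lines. This is only an illustrative sketch, not kernel code: the class names follow the object names in this file, but `KObject`, `new_process` and all attribute names are assumptions made for the example.

```python
class KObject:
    """A first class object. Everything except the top Memory has a parent."""

    def __init__(self, kind, parent=None):
        self.kind = kind
        self.parent = parent          # the Memory this object is stored in
        if parent is not None:
            parent.children.append(self)


class Memory(KObject):
    """A node of the tree: stores other objects and maps Pages it owns."""

    def __init__(self, parent=None):
        self.children = []
        super().__init__("Memory", parent)
        self.mapped = []

    def map(self, page):
        assert page.parent is self    # only owned Pages can be mapped
        self.mapped.append(page)


class Cap:
    def __init__(self, target):
        self.target = target          # what is reached when the Cap is invoked


class Caps(KObject):
    """Storage for a fixed number of Cap objects."""

    def __init__(self, parent, size):
        super().__init__("Caps", parent)
        self.caps = [None] * size


class Thread(KObject):
    def __init__(self, parent, slots):
        super().__init__("Thread", parent)
        self.slots = [None] * slots   # each slot can point to a Caps


def new_process(parent_memory, parent_cap, num_pages=1):
    """Build the objects of the example process (hypothetical helper)."""
    memory = Memory(parent_memory)
    pages = [KObject("Page", memory) for _ in range(num_pages)]
    for page in pages:
        memory.map(page)
    receiver = KObject("Receiver", memory)
    thread = Thread(memory, slots=1)
    objects = [memory, *pages, receiver, thread]
    caps = Caps(memory, size=len(objects) + 1)
    for index, obj in enumerate(objects):
        caps.caps[index] = Cap(obj)   # one Cap for each object
    caps.caps[-1] = parent_cap        # plus one for the parent process
    thread.slots[0] = caps            # that Caps is stored in slot 0
    return memory, thread
```

Calling `new_process(top, parent_cap)` with `top = Memory()` yields a subtree with the new Memory as a node and the Pages, Receiver, Thread and Caps as its leaves.
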
| 51 | |
| 52 | Note that the kernel provides system calls through capabilities. If a thread |
| 53 | doesn't hold the capability, it cannot make the system call. The parent Cap is |
| 54 | used to request access to other processes, or devices. The Thread has no way |
| 55 | to know if it is talking to the thing it requested, or something that simulates |
| 56 | it. That is intentional; Threads should not be able to detect that they are |
| 57 | being debugged. |
| 58 | |
| 59 | |
| 60 | # Capability invocations |
| 61 | |
| 62 | When a Cap is invoked, a message is sent to the Receiver that created it (or, |
| 63 | if it was created by the kernel, to the kernel). This message contains three |
| 64 | 64-bit numbers (which are usually treated as two 32-bit numbers each) and two |
| 65 | Cap objects. Two of the numbers, named d0 and d1, are passed with the |
| 66 | invocation; the third, named protected_data, is defined when the Cap |
| 67 | is created. The owner of the Cap cannot see or change protected_data; it is |
| 68 | the target's way of recognizing who's sending the message. |
| 69 | |
| 70 | The Cap objects in the message are called arg and reply. By convention, a call |
| 71 | that requires a reply passes a Cap for it, which will be invoked with the |
| 72 | reply. However, this is only a convention; if no reply is required, a program |
| 73 | can use both arg and reply as regular arguments. If more than one Cap should |
| 74 | be sent, though, a Caps is normally passed in arg. |
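
To make the message layout concrete, here is a small model of an invocation. It is a sketch under assumptions, not the kernel's interface: `Message`, `create_cap`, `invoke`, `queue` and `split32` are names invented for the example, and Caps created by the kernel are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

MASK64 = (1 << 64) - 1


@dataclass
class Message:
    d0: int                        # passed with the invocation
    d1: int                        # passed with the invocation
    protected_data: int            # fixed when the Cap was created
    arg: Optional["Cap"] = None
    reply: Optional["Cap"] = None


class Receiver:
    def __init__(self):
        self.queue = []            # messages waiting for the listener

    def create_cap(self, protected_data):
        return Cap(self, protected_data & MASK64)


class Cap:
    def __init__(self, receiver, protected_data):
        # The holder of the Cap is not supposed to read or change these.
        self._receiver = receiver
        self._protected_data = protected_data

    def invoke(self, d0=0, d1=0, arg=None, reply=None):
        message = Message(d0 & MASK64, d1 & MASK64,
                          self._protected_data, arg, reply)
        self._receiver.queue.append(message)


def split32(value):
    """Treat one 64-bit number as two 32-bit numbers: (low, high)."""
    return value & 0xFFFFFFFF, (value >> 32) & 0xFFFFFFFF
```

A server that hands out one Cap per client, each with a different protected_data, can tell its clients apart without trusting anything they send. A client that wants an answer passes a Cap from its own Receiver as reply, and the server invokes it.
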
| 75 | |
| 76 | Cap objects can be passed around. The target of the invocation cannot see if |
| 77 | the original recipient is calling, or some other process that was given access. |
| 78 | The Receiver can revoke a Cap, however; after this, invoking it no longer |
| 79 | sends a message to the Receiver. When sending a Cap, a flag specifies whether |
| 80 | it is mapped (the default), or copied. A mapped Cap is revoked when its source |
| 81 | is revoked; a copy is not. To give a Cap to another process and then drop it, |
| 82 | the sender must copy it. Otherwise the new Cap is revoked as soon as the original is dropped. |
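
The difference between mapping and copying can be shown with a toy model. The sketch assumes that revocation cascades to every Cap mapped from the revoked one and that dropping a Cap behaves like revoking it; `send`, `revoke`, `invoke` and the attribute names are made up for the example.

```python
class Receiver:
    def __init__(self):
        self.queue = []


class Cap:
    def __init__(self, receiver, source=None):
        self.receiver = receiver
        self.revoked = False
        self.mapped_children = []      # Caps mapped (not copied) from this one
        if source is not None:
            source.mapped_children.append(self)

    def send(self, copy=False):
        """Return the Cap the recipient gets; mapping is the default."""
        if self.revoked:
            raise ValueError("cannot send a revoked Cap")
        if copy:
            return Cap(self.receiver)              # independent of its source
        return Cap(self.receiver, source=self)     # tied to its source

    def revoke(self):
        """Revoke this Cap and, recursively, every Cap mapped from it."""
        self.revoked = True
        for child in self.mapped_children:
            child.revoke()
        self.mapped_children.clear()

    def invoke(self, d0=0):
        """Deliver d0 to the Receiver; return False if the Cap is revoked."""
        if self.revoked:
            return False
        self.receiver.queue.append(d0)
        return True
```

In this model, `original.send()` followed by `original.revoke()` leaves the recipient with a dead Cap, while `original.send(copy=True)` leaves it with a working one.
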
| 83 | |
| 84 | |
| 85 | # Interrupts |
| 86 | |
| 87 | Interrupts are handled by one or a few interrupt handlers. In a microkernel, |
| 88 | it would be ideal to let userspace handle them, but that is not reasonable |
| 89 | given the hardware architecture. However, it is possible for the kernel to |
| 90 | find out who should handle an interrupt, and then pass it to userspace. In |
| 91 | Linux terms: the top half is in kernel space, but the bottom half is not. (Note |
| 92 | that those two halves are highly asymmetrical; the top half is very small, the |
| 93 | bottom half can be very large.) So this is what Iris does. A process can |
| 94 | register as an interrupt handler. When the interrupt arrives, the kernel masks |
| 95 | it so that it isn't immediately triggered again, enables all interrupts and |
| 96 | sends a message to the registered process. That process will normally clear the |
| 97 | interrupt condition and reregister itself as the interrupt handler. The |
| 98 | reregistration is required to avoid queueing of interrupts; an interrupt whose |
| 99 | handler has not reregistered is no longer handled. |
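
The register, mask, notify and reregister cycle can be simulated as follows. This is a sketch, not the kernel's code: `Kernel`, `Driver` and their methods are hypothetical, and it assumes that registering a handler is also what unmasks the interrupt.

```python
class Kernel:
    def __init__(self):
        self.handlers = {}     # irq number -> registered process
        self.masked = set()    # interrupts that are currently masked

    def register(self, irq, process):
        """Register a handler. Assumption: this also unmasks the interrupt."""
        self.handlers[irq] = process
        self.masked.discard(irq)

    def raise_irq(self, irq):
        """Small 'top half': returns True if a message was sent to userspace."""
        if irq in self.masked or irq not in self.handlers:
            return False                   # not delivered and not queued
        self.masked.add(irq)               # so it is not triggered again
        process = self.handlers.pop(irq)   # a registration is used only once
        process.inbox.append(irq)          # message to the registered process
        return True


class Driver:
    """Userspace 'bottom half' of an interrupt handler."""

    def __init__(self, kernel, irq):
        self.kernel = kernel
        self.inbox = []
        self.handled = 0
        kernel.register(irq, self)

    def run_once(self, reregister=True):
        irq = self.inbox.pop(0)
        self.handled += 1                  # stands in for clearing the condition
        if reregister:
            self.kernel.register(irq, self)
```

While the driver is still busy, a second `raise_irq` for the same interrupt returns False and nothing is queued. Once the driver skips the reregistration, the interrupt stays masked.
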
| 100 | |
| 101 | |
| 102 | # Userspace |
| 103 | |
| 104 | When the system boots, the kernel is started with its first process. This |
| 105 | process sets up userspace. Unlike Linux init systems, the first process does |
| 106 | not continue running; it is hard to change (because the filesystem is not yet |
| 107 | accessible) and so it must be as simple as possible. |
| 108 | |
| 109 | As part of the startup, drivers for built-in devices are started. These are |
| 110 | regular userspace programs; most of them handle interrupts and all of them have |
| 111 | access to memory mapped I/O. Note that this means they are just as critical as |
| 112 | the kernel; in a monolithic system, only the kernel needs to be ultimately |
| 113 | trusted (if it is compromised, all is lost). With a microkernel, it's both the |
| 114 | kernel and some parts of userspace. The total amount of trusted code is likely |
| 115 | smaller in a microkernel design, because it is easier to split parts that don't |
| 116 | need to be critical into their own process. |
| 117 | |
| 118 | A user session is a process which can start other processes and switch between |
| 119 | them. For this, it contains the following components: |
| 120 | |
| 121 | - A bag of device Cap objects, which can be mapped to the active process (and |
| 122 | revoked when it is deactivated). What's in the bag can change. For |
| 123 | example, if the user wants sound to continue playing while switching to |
| 124 | another user, the sound Cap must not be in the bag. |
| 125 | - An interface for task switching: when the user makes a system request (which |
| 126 | is some dedicated hardware, such as a button), the active process is |
| 127 | deactivated and the session itself (or a designated helper) is activated. It |
| 128 | allows switching to a different process, or starting a new one, or stopping |
| 129 | or ending running processes. The session can also allow communication |
| 130 | between certain processes. (The processes need to cooperate to actually make |
| 131 | the link; they ask the session for a link of a certain type (for example, a |
| 132 | file system) and the session responds with the Cap or an error.) |
| 133 | - There is a list of things that can be started; an important one is a shell, |
| 134 | which allows control over the session. In other words, the shell is able to |
| 135 | start and end other processes, make and break communication links, and define |
| 136 | which programs can be started. |
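
The bag of device Cap objects can be pictured with the following model. It is a rough sketch under assumptions: `Session`, `Process`, `MappedCap` and their attributes are invented for the example, and the real session may track its Caps quite differently.

```python
class MappedCap:
    """A device Cap as held by a process."""

    def __init__(self, device):
        self.device = device
        self.revoked = False


class Process:
    def __init__(self, name):
        self.name = name
        self.device_caps = {}       # device name -> MappedCap


class Session:
    def __init__(self, bag):
        self.bag = set(bag)         # devices that follow the active process
        self.active = None
        self._handed_out = []       # (process, device, cap) for the active one

    def activate(self, process):
        """Deactivate the current process and map the bag to the new one."""
        self.deactivate()
        for device in sorted(self.bag):
            cap = MappedCap(device)
            process.device_caps[device] = cap
            self._handed_out.append((process, device, cap))
        self.active = process

    def deactivate(self):
        """Revoke every Cap that was mapped from the bag."""
        for process, device, cap in self._handed_out:
            cap.revoked = True
            del process.device_caps[device]
        self._handed_out = []
        self.active = None
```

A Cap that a process received outside the bag, such as the sound Cap in the example above, is not touched by `deactivate`, so playback can continue after a switch.
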
| 137 | |
| 138 | |
| 139 | # Multi user support |
| 140 | |
| 141 | For multi user support, a login manager is required which can start user |
| 142 | sessions and switch between them. This is very similar to what a user session |
| 143 | does, and so the same process is used for it. Just a few changes are required, |
| 144 | and those can be implemented by choosing different helper programs. |
| 145 | |
| 146 | The login manager lets the user select an identity to log in as. The login |
| 147 | program itself is run by the user session, so that users can change the way |
| 148 | they log in without asking the administrator to set it up for them. For |
| 149 | example, one user may allow logging in only with a physical crypto device, |
| 150 | while a guest login may be set up that doesn't require credentials at |
| 151 | all. |
| 152 |
