The structure of the Reiser file system
For our example partition, part of the S+ tree looks like this (think of the key as a large 128-bit number for now):
Block headers
Each disk block that belongs to an internal or leaf node starts with a block header. Only unformatted blocks don't have a block header. A block header is always 24 bytes long and contains the following information:
Name         | Size (bytes) | Description
-------------|--------------|------------------------------------
Level        | 2            | level of the block in the tree
Nr. of items | 2            | number of items in the block
Free space   | 2            | free space left in the block, in bytes
Reserved     | 2            |
Right key    | 16           | right delimiting key for the block
The right delimiting key was originally used for leaf nodes but is now only kept for compatibility.
Example:
The following is the block header of block 8416, the leftmost leaf node in the tree:
    00000000  01 00 06 00 e4 04 00 00 00 00 00 00 00 00 00 00  ....ä...........
    00000010  00 00 00 00 00 00 00 00
Level: 1, Items: 6, Free space: 1252 bytes
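The example dump can be decoded mechanically. The following is a minimal sketch that unpacks the 24-byte block header according to the table above. The byte order is little-endian, as the dump itself shows (`06 00` reads as 6 items). The function name `parse_block_header` is hypothetical and not taken from any reiserfs library.

```python
import struct

# Layout from the block header table: four little-endian 16-bit fields
# (level, number of items, free space, reserved) followed by the
# 16-byte right delimiting key. Total size: 24 bytes.
BLOCK_HEAD = struct.Struct("<HHHH16s")
assert BLOCK_HEAD.size == 24


def parse_block_header(block: bytes) -> dict:
    """Decode the header at the start of an internal or leaf node (hypothetical helper)."""
    level, nr_items, free_space, _reserved, right_key = BLOCK_HEAD.unpack_from(block, 0)
    return {
        "level": level,
        "nr_items": nr_items,
        "free_space": free_space,
        "right_key": right_key,
    }


# First 24 bytes of block 8416, copied from the dump above.
header = bytes.fromhex(
    "01 00 06 00 e4 04 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00"
)
info = parse_block_header(header)
print(info["level"], info["nr_items"], info["free_space"])  # 1 6 1252
```

The free space field `e4 04` is 0x04e4, which is 1252 in decimal. This matches the decoded values listed above.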
Keys
Keys are used in the Reiser file system to uniquely identify items, but also to locate them in the tree and to group items that belong together. A key consists of four fields: the directory id, the object id, the offset within the object, and a type. Note that the actual object identifier is only one part of the key. The directory id is present so that files belonging to the same directory are grouped together and, for the most part, are located in the same subtree(s). The offset is present because an indirect item can contain at most (blocksize-48)/4 pointers to unformatted blocks (see indirect items below). For a block size of 4096 bytes, this is 1012 pointers. If a file could have only one indirect item, its size would be limited to 4048KB. To handle larger files, multiple keys are used to reference the file. All fields of these keys are the same except for the offset, which gives the byte offset within the file that a particular key references. I do not know why the type of an object is part of the actual key.
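The arithmetic behind the offset field can be sketched as follows. The constant 48 comes directly from the formula in the text; it presumably accounts for the block header plus one item header, but the sketch does not depend on that interpretation. All function names are hypothetical. The item count assumes every indirect item is completely full and ignores tails, so it is a lower bound on the number of keys a file needs.

```python
def max_indirect_pointers(blocksize: int) -> int:
    """Pointers one indirect item can hold, per the text: (blocksize - 48) / 4."""
    return (blocksize - 48) // 4


def bytes_per_indirect_item(blocksize: int) -> int:
    """File data addressable through a single, completely full indirect item."""
    return max_indirect_pointers(blocksize) * blocksize


def indirect_items_needed(file_size: int, blocksize: int) -> int:
    """Minimum number of indirect items (and thus keys) for a file,
    assuming every item is full and ignoring tails."""
    per_item = bytes_per_indirect_item(blocksize)
    return -(-file_size // per_item)  # ceiling division


print(max_indirect_pointers(4096))                     # 1012
print(bytes_per_indirect_item(4096) // 1024)           # 4048 (KB)
print(indirect_items_needed(10 * 1024 * 1024, 4096))   # 3
```

Each of these items would be referenced by a key with identical directory id, object id, and type, differing only in the offset.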
In reiserfs up to version 3.5, the offset and type fields were both 4-byte values. This meant that the maximum file size was limited to roughly 2^32 bytes, or 4GB (more precisely, 2^32 bytes plus the data of one more indirect item plus the tail). To increase the maximum file size, version 3.6 widened the offset field to 60 bits and shrank the type field to 4 bits. This allows a theoretical maximum file size of 2^60 bytes. However, there can be at most 2^32 blocks with at most 2^16 bytes per block, so the file system itself supports only 2^48 bytes.
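The difference between the two key layouts can be shown with a short sketch. The field order follows the text. Little-endian byte order is assumed, as in the block header dump. The bit placement within the 3.6 key is also an assumption: the 60-bit offset is taken to occupy the low bits of a 64-bit word and the 4-bit type the high bits. The helpers `decode_key_v1` and `decode_key_v2` are hypothetical.

```python
import struct

# Hypothetical helpers. Field order from the text: directory id, object id,
# offset, type. ASSUMPTION: little-endian byte order throughout.


def decode_key_v1(raw: bytes) -> tuple:
    """Version 3.5 key: offset and type are separate 32-bit fields."""
    dir_id, obj_id, offset, ktype = struct.unpack_from("<IIII", raw, 0)
    return dir_id, obj_id, offset, ktype


def decode_key_v2(raw: bytes) -> tuple:
    """Version 3.6 key: offset and type share one 64-bit field.
    ASSUMPTION: offset in the low 60 bits, type in the high 4 bits."""
    dir_id, obj_id, packed = struct.unpack_from("<IIQ", raw, 0)
    offset = packed & ((1 << 60) - 1)  # 60-bit offset
    ktype = packed >> 60               # 4-bit type
    return dir_id, obj_id, offset, ktype


# The same 16 bytes decode differently depending on the assumed key version.
raw = struct.pack("<IIQ", 1, 2, (2 << 60) | 5000)
print(decode_key_v2(raw))  # (1, 2, 5000, 2)
print(decode_key_v1(raw))  # (1, 2, 5000, 536870912)

# File system size limit quoted in the text: 2^32 blocks of at most 2^16 bytes.
assert 2**32 * 2**16 == 2**48
```

Read as an old-format key, the type bits of the new format show up as a huge type value. Without a version number, the 16 bytes alone cannot tell you which reading is correct.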
To remain compatible with older versions of the file system, there are now two different versions of keys. This can be very confusing because the key itself doesn't carry a version number. To make up for this, the formerly reserved last 16 bits of the item header now contain a version number, so the key's version can be obtained from there if necessary. This is fairly straightforward for keys contained in leaf nodes. To determine the version of a key inside an internal node, however, one would first have to follow the tree down to a leaf. The code in the reiserfs library actually uses this ugly hack to determine the key format: