File System Design part 1: XFS

All "leaf" nodes are at the same level of the tree and are connected to each other, so you can move between them without having to go back up into the tree.

So, a B+Tree is a type of balanced tree, with these extra features to ensure better performance on discs. XFS uses a ton of these. For example, XFS stores inodes in B+Trees. You can imagine that searching a balanced tree such as a B+Tree for a file is a lot more efficient and scalable than searching a list: the number of steps grows with the height of the tree, not with the number of entries. So, now that we understand the data structure on which XFS is based, we can look at the basic design of the file system.
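To make the linked-leaf idea concrete, here is a minimal sketch of a range lookup: descend from the root once, then walk sideways along the leaves. The names `Leaf`, `Internal`, `find_leaf` and `range_scan` are made up for this illustration, and the tree is built by hand instead of through inserts and splits. It is not how XFS lays out its trees on disc.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # link to the right-hand sibling leaf


@dataclass
class Internal:
    # children[i] holds keys < separators[i]; the last child holds the rest
    separators: List[int]
    children: List[Union["Internal", Leaf]] = field(default_factory=list)


def find_leaf(node, key):
    """Descend from the root to the leaf that would contain `key`."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.separators, key)]
    return node


def range_scan(root, low, high):
    """Return all keys in [low, high]: one descent, then a walk along the leaves."""
    leaf = find_leaf(root, low)
    found = []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return found
            if key >= low:
                found.append(key)
        leaf = leaf.next  # move sideways, never back up into the tree
    return found


# Hand-built two-level tree (hypothetical data, purely for illustration).
leaf_a, leaf_b, leaf_c = Leaf([1, 4]), Leaf([7, 9]), Leaf([12, 15])
leaf_a.next, leaf_b.next = leaf_b, leaf_c
root = Internal(separators=[7, 12], children=[leaf_a, leaf_b, leaf_c])

print(range_scan(root, 4, 12))
```

The scan touches the internal node only once, at the start. Everything after that follows the `next` links, which is what makes range queries cheap in this kind of tree.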

The Design Of XFS

Let’s start with the XFS version of cylinder groups: allocation groups. These are meant to solve the same thrashing problem we saw before. There is another reason for allocation groups in XFS: they are completely autonomous. Each group manages its own inodes and free space, so the kernel can interact with multiple allocation groups at the same time. This makes XFS very multi-thread friendly.

Like in FFS, these allocation groups store the inode list for their region of the drive. However, in XFS these inodes are stored in B+Trees, and actually in two B+Trees, one for free inodes and the other for used inodes. This makes creating a new file a very efficient operation, since you don’t have to search for a free inode.

Now, let’s talk about the inodes themselves. XFS is designed for large files. Because of this, the XFS inodes use "extents" to define data block ranges. An extent is a starting address on the disc and a length, the number of contiguous blocks that follow that address. These are much more efficient than actually listing all the addresses of the data blocks which make up a file. Extents are really better for long contiguous stretches of data. They are very poor for fragmented discs, as short stretches of data turn an extent list into just a list of addresses where data is stored, just like the old system.

XFS has an interesting way of keeping files unfragmented called delayed allocation. XFS waits as long as possible to actually write data to the disc. Sure, it reserves the amount of space it needs, but it doesn’t actually define where that space is going to be on the disc. This means if you’re appending a lot of data to the end of a file, it can wait until it knows how much data there is and then intelligently pick one contiguous stretch of space for all of it.
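The difference between a block list and an extent list is easy to see in a small sketch. The `Extent` class and `blocks_to_extents` function below are hypothetical names for this illustration only, and the block numbers are invented. The sketch also leaves out details a real on-disc extent record would need, such as where in the file each run belongs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extent:
    start: int   # first block of the run
    length: int  # number of contiguous blocks in the run


def blocks_to_extents(blocks: List[int]) -> List[Extent]:
    """Collapse a file's block list (in file order) into contiguous runs."""
    extents: List[Extent] = []
    for block in blocks:
        last = extents[-1] if extents else None
        if last is not None and block == last.start + last.length:
            # The block directly follows the current run, so grow the run.
            extents[-1] = Extent(last.start, last.length + 1)
        else:
            # A gap: start a new run of length 1.
            extents.append(Extent(block, 1))
    return extents


# Eight contiguous blocks collapse into a single extent...
contiguous = blocks_to_extents(list(range(1000, 1008)))
# ...while four scattered blocks need four extents, no better than a block list.
fragmented = blocks_to_extents([1000, 1002, 1004, 1006])

print(len(contiguous), len(fragmented))
```

This is also why delayed allocation pays off: the longer the file system waits before choosing blocks, the better its chances of ending up in the first case and not the second.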

So, that’s how XFS deals with the scalability and speed problems of FFS. Dealing with the power loss problem is at least as interesting. XFS uses what is called a metadata journal. Basically, this means that every disc transaction is written in a journal before it is written to the disc and then marked as "done" in the journal when it finishes. If the system crashes during the writing of the journal entry, that incomplete entry can be ignored, since the data on the disc has not been touched yet. If the journal entry is complete but not marked done, the file system knows exactly which operation was interrupted and can replay it from the journal or roll it back to preserve disc integrity. It’s a very nice system. As stated above, XFS practices a type of journaling called "metadata journaling." This means only metadata such as the inodes is journaled, not the actual data. This will preserve the integrity of the file system, but does not preserve the integrity of the data. As noted, the actual data tends to be considered rather boring and unimportant in file system design. [5] [4]
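The journal protocol described above can be sketched in a few lines. This is a toy model, not XFS code: the `Journal` class, its method names, and the `inode_7.*` keys are all invented here, the "disc" is just a dictionary, and recovery in this sketch finishes an interrupted operation by replaying it. That is one common way of handling an entry that was never marked done.

```python
class Journal:
    """Toy write-ahead journal over an in-memory 'disc' (a dict)."""

    def __init__(self, disc):
        self.disc = disc
        self.entries = []  # each entry: changes, complete flag, done flag

    def log(self, changes):
        """Step 1: record the intended metadata changes before touching the disc."""
        entry = {"changes": dict(changes), "complete": True, "done": False}
        self.entries.append(entry)
        return entry

    def apply(self, entry):
        """Step 2: write the changes to the disc, then mark the entry done."""
        self.disc.update(entry["changes"])
        entry["done"] = True

    def recover(self):
        """After a crash: skip torn entries, replay complete ones not marked done."""
        replayed = 0
        for entry in self.entries:
            if not entry["complete"]:
                continue           # crash hit while journaling; disc untouched
            if not entry["done"]:
                self.apply(entry)  # crash hit mid-write; finish the job
                replayed += 1
        return replayed


disc = {"inode_7.size": 0}
journal = Journal(disc)
entry = journal.log({"inode_7.size": 4096, "inode_7.extents": 1})

# Simulated crash: the entry was journaled, but only half of it reached the disc.
disc["inode_7.size"] = 4096

replayed = journal.recover()
print(replayed, disc)
```

Note that only metadata keys appear in the journal entry. The file's contents never pass through it, which is exactly the trade-off of metadata journaling: the file system structure comes back consistent, while the data itself gets no such guarantee.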
