This information has been compiled from resources published on the internet, the Linux kernel source and information gained from experimentation.
The primary focus of the document is to provide enough information to allow someone to read the file system.
Introduction
When Linus was first creating Linux, he used the Minix filesystem. This served initial development well but soon there was a need for something bigger and better.
In April 1992, the Extended File System was created. Although solving the problems of Minix, it had problems of it's own so a successor was created. This new filesystem was known as the Second Extended File System or Ext2 FS (or EXT2).
Structure
The file system is created from a sequential collection of blocks. These blocks can be either 1k, 2k or 4k in size. These blocks are divided up into groups for various reasons which will be discussed below.
The starting point for the filesystem is the superblock. This is a structure 1024 bytes in length and is always located at an offset of 1024 bytes from the start of the filesystem. This means that it might be in block 0 or block 1, depending on the block size of the filesystem. This is the reason for the s_first_data_block entry in the superblock.
The following is a list of the structure used by Linux. Note that other OS's may use a different structure. For details see ???.h
field name | type | comment |
s_inodes_count | ULONG | Count of inodes in the filesystem |
s_blocks_count | ULONG | Count of blocks in the filesystem |
s_r_blocks_count | ULONG | Count of the number of reserved blocks |
s_free_blocks_count | ULONG | Count of the number of free blocks |
s_free_inodes_count | ULONG | Count of the number of free inodes |
s_first_data_block | ULONG | The first block which contains data |
s_log_block_size | ULONG | Indicator of the block size |
s_log_frag_size | LONG | Indicator of the size of the fragments |
s_blocks_per_group | ULONG | Count of the number of blocks in each block group |
s_frags_per_group | ULONG | Count of the number of fragments in each block group |
s_inodes_per_group | ULONG | Count of the number of inodes in each block group |
s_mtime | ULONG | The time that the filesystem was last mounted |
s_wtime | ULONG | The time that the filesystem was last written to |
s_mnt_count | USHORT | The number of times the file system has been mounted |
s_max_mnt_count | SHORT | The number of times the file system can be mounted |
s_magic | USHORT | Magic number indicating ex2fs |
s_state | USHORT | Flags indicating the current state of the filesystem |
s_errors | USHORT | Flags indicating the procedures for error reporting |
s_pad | USHORT | padding |
s_lastcheck | ULONG | The time that the filesystem was last checked |
s_checkinterval | ULONG | The maximum time permissable between checks |
s_creator_os | ULONG | Indicator of which OS created the filesystem |
s_rev_level | ULONG | The revision level of the filesystem |
s_reserved | ULONG[235] | padding to 1024 bytes |
s_r_blocks_count :
this
is the number of blocks which are reserved for the super user
s_first_data_block :
Depending on the block size, the first data block will be either 0 or 1. See
diagram below
s_log_block_size :
0 = 1k
block size
1 = 2k
2 = 4k
s_log_frag_size :
At the
moment, it seems that fragments are not implemented. In the future I may have to
find out how they work.
s_blocks_per_group
The
filesystem is divided up into block groups. Note that the last block group may
be incomplete
s_inodes_per_group
Each
block group has space reserved for a number of inodes.
s_mtime, s_wtime
This may
be the mount time, or the umount time. I am not sure which.
s_mnt_count, s_max_mnt_count
Once this count reaches the maximum, the filesystem must be
checked, the count is then reset.
s_magic
This should
contain the magic number 0xEF53
s_state
This contains a
set of flags which indicate wether the filesystem is clean etc.
s_errors
This contains
flags which indicate how the filesystem should be treated if errors are
found.
s_creator_os
For Linux
this is ???
s_rev_level
The current
revision is ???
The information in the superblock is used to access the rest of the data on the disk.
The number of block groups = the number of blocks / the number of blocks per group; // rounded up
(There are several calculations which require divisions rounded up. I wrote a function called div2 which does the division and then checks to see if it should round up the result)
All block and inode addresses start at 1. The first block on the disk is block 1. 0 is used to indicate no block. (Sparse files can have these inside them)
Each block group can be found at the block address ((group_number - 1)* blocks_per_group) and is of course blocks_per_group long. Group numbers are 1 based as well
Each group is just a series of blocks, however the first blocks in the group have a special purpose. The remainder are used for storing data.
| Superblock | Group Descriptors | Block Bitmap | Node Bitmap | Node Table | Data blocks ||---------------------------------------------|------------------------------------------------------------------------------|| This is the same for all groups | this is specific to each group |
As discussed above, the superblock is stored at offset 1024 on the disk. There are also backup superblocks stored in the first datablock of all blockgroups grater than one. With the advent of revision 1 of the filesystem, there is a flag SPARSE_SUPERBLOCK which means that the superblock is not stored on every block group but just a select few. This improves performance as changes to the superblock do not have to be written to as many places.
The Group Descriptors contains information on the block groups. This data is covers all the groups and is stored in all the groups for redundency. This is an array of the following structure
field name | type | comment |
bg_block_bitmap | ULONG | The address of the block containing the block bitmap for this group |
bg_inode_bitmap | ULONG | The address of the block containing the inode bitmap for this group |
bg_inode_table | ULONG | The address of the block containing the inode table for this group |
bg_free_blocks_count | USHORT | The count of free blocks in this group |
bg_free_inodes_count | USHORT | The count of free inodes in this group |
bg_used_dirs_count | USHORT | The number inodes in this group which are directories |
bg_pad | USHORT | padding |
bg_reserved | ULONG[3] | padding |
The size of the descriptors can be calculated as (sizeof(ext2_group) * number_of_groups) / block size; // rounded up if necessary
The information in this structure us used to locate the block and inode bitmaps and inode table.
Remember that the first entry corresponds to block group 1.
The block bitmap is a bitmap indicating which blocks in the group have been allocated. If the bit is set then the block is allocated. The size of the bitmap is (blocks_per_group / 8) / block_size;// with both divisions rounded up.
It is necessary to find out which group a particular group is in to be able to look up the bitmap. The group = ((block_number - 1) / blocks_per_group) + 1; // rounded up
The block in that group is then block_number - (group * blocks_per_group)
The inode bitmap is essentially the same as the block bitmap, but indicates which inodes are allocated. The size of the inode bitmpap is (inodes_per_group / 8) / block_size;// with both divisions rounded up.
The same calculations can be used for finding the group of a particular inode. The group = ((INode_number - 1) / INodes_per_group) + 1; // rounded up
The inode in that group is then INode_Number - (Group * INodes_per_group)
The inode table is an array of the inodes for that particular group. Again, the first entry is for the first inode in that group.
field name | type | description |
i_mode | USHORT | File mode |
i_uid | USHORT | Owner Uid |
i_size | ULONG | Size in bytes |
i_atime | ULONG | Access time |
i_ctime | ULONG | Creation time |
i_mtime | ULONG | Modification time |
i_dtime | ULONG | Deletion Time |
i_gid | USHORT | Group Id |
i_links_count | USHORT | Links count |
i_blocks | ULONG | Blocks count |
i_flags | ULONG | File flags |
i_reserved1 | ULONG | OS dependent |
i_block | ULONG[15] | Pointers to blocks |
i_version | ULONG | File version (for NFS) |
i_file_acl | ULONG | File ACL |
i_dir_acl | ULONG | Directory ACL |
i_faddr | ULONG | Fragment address |
i_frag | UCHAR | Fragment number |
i_fsize | UCHAR | Fragment size |
i_pad1 | USHORT | |
i_reserved2 | ULONG[2] |
The file mode is a set of flags that specify the type of file and the access permissions
identifier | value | comment |
S_IFMT | F000 | format mask |
S_IFSOCK | A000 | socket |
S_IFLNK | C000 | symbolic link |
S_IFREG | 8000 | regular file |
S_IFBLK | 6000 | block device |
S_IFDIR | 4000 | directory |
S_IFCHR | 2000 | character device |
S_IFIFO | 1000 | fifo |
S_ISUID | 0800 | SUID |
S_ISGID | 0400 | SGID |
S_ISVTX | 0200 | sticky bit |
S_IRWXU | 01C0 | user mask |
S_IRUSR | 0100 | read |
S_IWUSR | 0080 | write |
S_IXUSR | 0040 | execute |
S_IRWXG | 0038 | group mask |
S_IRGRP | 0020 | read |
S_IWGRP | 0010 | write |
S_IXGRP | 0008 | execute |
S_IRWXO | 0007 | other mask |
S_IROTH | 0004 | read |
S_IWOTH | 0002 | write |
S_IXOTH | 0001 | execute |
The i_block entry is an array of block addresses. The first EXT2_NDIR_BLOCKS (12) are direct block addresses. The data in these blocks is the content of the file. (According to ???, ??% of files in a unix environment are less than 12k, assuming a 1k block size the decision to use 12 direct blocks makes a lot of sense).
The next block EXT2_IND_BLOCK in the indirect block. This is the address of a block which contains a list of addresses of blocks which contain the data. There are block_size / sizeof(ULONG) addresses in this block.
The EXT2_DIND_BLOCK is simalar, but it is a double indirect block. It countains the address of a block which has a list of indirect block addresses. Each indirect block then has another list is blocks.
The EXT2_TIND_BLOCK is simalar again, but it is the tripple indirect block. It contains a list of double indirect blocks etc.
Now that you know how to find and read inodes, you can start to read the files. There are a set of special inodes which are reserved for certain puposes. These include
indetifier | value | description |
EXT2_BAD_INO | 1 | Bad blocks inode |
EXT2_ROOT_INO | 2 | Root inode |
EXT2_ACL_IDX_INO | 3 | ACL inode |
EXT2_ACL_DATA_INO | 4 | ACL inode |
EXT2_BOOT_LOADER_INO | 5 | Boot loader inode |
EXT2_UNDEL_DIR_INO | 6 | Undelete directory inode |
EXT2_FIRST_INO | 11 | First non reserved inode |
The most important inode here is the root inode. This is the inode at the root of the file system. This inode is a directory, which like all directories has the following structure:
field name | type | description |
inode | ULONG | address if inode |
rec_len | USHORT | length of this record |
name_len | USHORT | length of file name |
name | CHAR[0] | the file name |
A directory is a list of these structures. The structures can not pass over a block boundary, so the last record is extended to fill the block. And entry with an inode of 0 should be ignored.