Fat 32
Fat 32
Home MP3 Player 8051 Tools All Projects PJRC Store Site Map
You are here: 8051 Tools Code Library IDE Hard Drive FAT32 Info Search PJRC
The MBR can only represent four partitions. A technique called "extended" partitioning is used to allow more than
four, and often times it is used when there are more than two partitions. All we're going to say about extended
partitions is that they appear in this table just like a normal partition, and their first sector has another partition table
that describes the partitions within its space. But for the sake of simply getting some code to work, we're going to
not worry about extended partitions (and repartition and reformat any drive that has them....) The most common
scenario is only one partition using the whole drive, with partitions 2, 3 and 4 blank.
Each partition description is just 16 bytes, and the good news is that you can usually just ignore most of them. The
fifth byte is a Type Code that tells what type of filesystem is supposed to be contained within the partition, and the
ninth through twelth bytes indicate the LBA Begin address where that partition begins on the disk.
Normally you only need to check the Type Code of each entry, looking for either 0x0B or 0x0C (the two that are
used for FAT32), and then read the LBA Begin to learn where the FAT32 filesystem is located on the disk.
TODO: add a table of known type codes, with the FAT32 ones colored
The Number of Sectors field can be checked to make sure you do not access (particularly write) beyond the end of
the space that is allocated for the parition. However, the FAT32 filesystem itself contains information about its size,
so this Number of Sectors field is redundant. Several of Microsoft's operating systems ignore it and instead rely on
the size information embedded within the first sector of the filesystem. (yes, I have experimentally verified this,
though unintentionally :) Linux checks the Number of Sectors field and properly prevents access beyond the
allocated space. Most firmware will probably ignore it.
The Boot Flag, CHS Begin, and CHS End fields should be ignored.
Microsoft's specification lists many variables, and the FAT32 Volume ID is slightly different than the older ones
used for FAT16 and FAT12. Fortunately, most of the information is not needed for simple code. Only four variables
are required, and three others should be checked to make sure they have the expected values.
After checking the three fields to make sure the filesystem is using 512 byte sectors, 2 FATs, and has a correct
signature, you may want to "boil down" these variables read from the MBR and Volume ID into just four simple
numbers that are needed for accessing the FAT32 filesystem. Here are simple formulas in C syntax:
http://www.pjrc.com/tech/8051/ide/fat32.html (4 of 15) [8/1/2007 17:32:27]
Paul's 8051 Code Library: Understanding the FAT32 Filesystem
As you can see, most of the information is needed only to learn the location of the first cluster and the FAT. You will
need to remember the size of the clusters and where the root directory is located, but the other information is
usually not needed (at least for simply reading files).
If you compare these formulas to the ones in Microsoft's specification, you should notice two differences. They lack
"RootDirSectors", because FAT32 stores the root directory the same way as files and subdirectories, so
RootDirSectors is always zero with FAT32. For FAT16 and FAT12, this extra step is needed to compute the special
space allocated for the root directory.
Microsoft's formulas do not show the "Partition_LBA_Begin" term. Their formulas are all relative to the beginning of
the filesystem, which they don't explicitly state very well. You must add the "Partition_LBA_Begin" term found from
the MBR to compute correct LBA addresses for the IDE interface, because to the drive the MBR is at zero, not the
Volume ID. Not adding Partition_LBA_Begin is one of the most common errors most developers make, so
especially if you are using Microsoft's spec, do not forget to add this for correct LBA addressing.
The rest of this page will usually refer to "fat_begin_lba", "cluster_begin_lba", "sectors_per_cluster", and
"root_dir_first_cluster", rather than the individual fields from the MBR and Volume ID, because it is easiest to
compute these numbers when starting up and then you no longer need all the details from the MBR and Volume
ID.
The vast majority of the disk space is the clusters section, which is used to hold all the files and directories. The
clusters begin their numbering at 2, so there is no cluster #0 or cluster #1. To access any particular cluster, you
need to use this formula to turn the cluster number into the LBA address for the IDE drive:
Normally clusters are at least 4k (8 sectors), and sizes of 8k, 16k and 32k are also widely used. Some later
versions of Microsoft Windows allow using even larger cluster sizes, by effectively considering the sector size to be
some mulitple of 512 bytes. The FAT32 specification from Microsoft states that 32k is the maximum cluster size.
In this section, we'll only briefly look at directories as much as is necessary to learn where the files are, then we'll
look at how to access the rest of a file using the FAT, and later we'll revisit directory structure in more detail.
Directory data is organized in 32 byte records. This is nice, because any sector holds exactly 16 records, and no
directory record will ever cross a sector boundry. There are four types of 32-byte directory records.
Unused directory records are a result of deleting files. The first byte is overwritten with 0xE5, and later when a new
file is created it can be reused. At the end of the directory is a record that begins with zero. All other records will be
non-zero in their first byte, so this is an easy way to determine when you have reached the end of the directory.
Records that do not begin with 0xE5 or zero are actual directory data, and the format can be determined by
checking the Attrib byte. For now, we are only going to be concerned with the normal directory records that have
the old 8.3 short filename format. In FAT32, all files and subdirectories have short names, even if the user gave the
file a longer name, so you can access all files without needing to decode the long filename records (as long as your
code simply ignores them). Here is the format of a normal directory record:
The Attrib byte has six bits defined, as shown in the table below. Most simple firmware will check the Attrib byte to
determine if the 32 bytes are a normal record or long filename data, and to determine if it is a normal file or a
subdirectory. Long filename records have all four of the least significant bits set. Normal files rarely have any of
these four bits set.
The remaining fields are relatively simple and straigforward. The first 11 bytes are the short filename (old 8.3
format). The extension is always the last three bytes. If the file's name is shorter than 8 bytes, the unused bytes are
filled with spaces (0x20). The starting cluster number is found as two 16 bit sections, and the file size (in bytes) is
found in the last four bytes of the record. In both, the least significant byte is first. The first cluster number tells you
where the file's data begins on the drive, and the size field tells you how long the file is The actual space allocated
on the disk will be an integer number of clusters, so the file size lets you know how much of the last cluster is the
file's data.
The File Allocation Table is a big array of 32 bit integers, where each one's position in the array corresponds to a
cluster number, and the value stored indicates the next cluster in that file. The purpose of the FAT is to tell you
where the next cluster of a file is located on the disk, when you know where the current cluster is at. Every sector of
of the FAT holds 128 of these 32 bit integers, so looking up the next cluster of a file is relatively easy. Bits 7-31 of
the current cluster tell you which sectors to read from the FAT, and bits 0-6 tell you which of the 128 integers in that
sector contain is the number of the next cluster of your file (or if all ones, that the current cluster is the last).
Here is a visual example of three small files and a root directory, all allocated near within first several clusters (so
that one one sector of the FAT is needed for the drawing). Please note that in this example, the root directory is 5
clusters... a very unlikely scenario for a drive with only 3 files. The idea is to show how to follow cluster chains, but
please keep in mind that a small root directory (in only 1 cluster) would have "FFFFFFFF" at position 2 (or
whereever the volume ID indicated as the first cluster), rather than "00000009" leading to even more clusters.
However, with more files, the root directory would likely span several clusters, and rarely would a root directory
occupy consecutive clusters due to all the allocation for files written before the directory grew to use more clusters.
With that caveat in mind, here's that simple example:
Figure 7: FAT32 Sector, Cluster chains for root directory and three files
In this example, the root directory begins at cluster 2, which is learned from the Volume ID. The number at position
2 in the FAT has a 9, so the second cluster is #9. Likewise, clusters A, B, and 11 hold the remainder of the root
directory. The number at position 11 has 0xFFFFFFFF which indicates that this is the last cluster of the root
directory. In practice, your code may not even attempt to read position 11 because the an end-of-directory marker
(first of 32 bytes is zero) would be found. Likewise, a filesystem with a small root directory might fit entirely into one
cluster, and the FAT could not be needed at all to read such a directory. But if you reach the end of the 32 byte
entries within the first cluster without finding the end-of-directory marker, then the FAT must be used to find the
remaining clusters.
Similarly, three files are shown. In each case, the FAT gives no indication of which cluster is the first... that must be
extracted from the directory, and then the FAT is used to access subsequent clusters of the file. The number of
clusters allocated to the file should always be enough to hold the number of bytes specified by the size field in the
directory. A portion of the last cluster will be unused, except in the rare case where the file's size is an exact
multiple of the cluster size. Files that are zero length do not have any clusters allocated to them, and the cluster
number in the directory should be zero. Files that fit into just one cluster will have only the "FFFFFFFF" end of file
marker in the FAT at their cluster position.
Shortcut Hint: One approach to the simplest possible code is to keep the root directory very small (few files, only
8.3 filenames), avoid subdirectories, and run the "defrag" program on a PC. That way, the root directory is found in
just one cluster and every file occupies only consecutive clusters. Though this approach is very limiting, it means
you can read files without using the FAT at all.
According to Microsoft's spec, the cluster numbers are really only 28 bits and the upper 4 bits of a cluster are
"reserved". You should clear those top for bits to zeros before depending on a cluster number. Also, the end-of-file
number is actually anything equal to or greater than 0xFFFFFFF8, but in practive 0xFFFFFFFF is always written.
Zeros in the FAT mark clusters that are free space. Also, please remember that the cluster numbers are stored with
their least significant byte first (Figure 7 shows them human-readable formatted, but they are actually stored LSB
first).
The good news for a firmware designer working with limited resources is that FAT is really very simple. However,
FAT's simplicity is also its weakness as a general purpose file system for high performance computing. For
example, to append data to a file, the operating system must traverse the entire cluster chain. Seeking to random
places within a file also requires many reads within the FAT. For the sake of comparison, unix filesystems use a
tree-like structure, where a cluster (also called a "block") has either a list of blocks (often called "inode" in unix
terminology), or a list of other inodes which in turn point to blocks. In this manner, the location on disk of any block
can be found by reading 1, 2, or 3 (for huge files) inodes. The other weakness of FAT, which is mitigated by
caching, is that it is physically located near the "beginning" of the disk, rather than physically distributed. Unix inode-
based filesystems scatter the inode blocks evenly among the data blocks, and attempt to describe the location of
data using nearby inode blocks.
However, you can find FAT32 code in the mp3 player project. The 0.1.x and 0.5.x firmware used a simple one-
sector-at-a-time approach that is very easy to understand and only needs a single 512 byte buffer, but is not very
The 0.6.x mp3 player firmware does use the FAT properly to follow cluster chains. It uses megabytes of memory
from the SIMM to buffer cluster and sectors from the FAT. A table of recently read FAT sectors is maintained, and
a binary search is implemented to quickly locate the memory associated with any cached FAT sector. The firmware
manages the megabytes of RAM by dividing it into 4k chunks (which works together with the bank swapping
implemented by the FPGA), and the first 32 hold a big array of 16-byte structs with info about what is held in each
4k block. These structs implement linked lists that for each file cached in memory. A file I/O API is built on top of
this to allow the main program to access the cached data. The 0.6.x firmware also uses DMA, so data transfers are
quite fast.
Both of the MP3 player project FAT32 implementations are read-only. Both are GPL'd, but both are quite specific to
that project. You can choose either very simple without features and needing defrag, or quite complex with lots of
features but heavily dependent on the FPGA and SIMM.
Perhaps someday I will write a more general purpose FAT32 implementation with nice documentation for this code
library section. But this is not planned for the near future. Hopefully the description on this page will at least help
you to understand how FAT32 works so you can write your own code or understand the project-specific code in the
mp3 players.
Please Help....
If you have found this web page helpful, please create a direct link on your own project webpage (hopefully with at
least "FAT32" in your link text). Visitors to your site will be able to find this page, and your link will help search
engines to direct other developers searching for FAT32 information to this page.
Understanding FAT32 can be difficult, especially if you can only find poorly written documentation like Microsoft's
http://www.pjrc.com/tech/8051/ide/fat32.html (12 of 15) [8/1/2007 17:32:27]
Paul's 8051 Code Library: Understanding the FAT32 Filesystem
specification. I wrote this page to help bridge the gap and make understanding FAT32 easier. Please help make it
easier for others to find it.
Thanks.