In this article, I'll present a small utility that I wrote to sort the files' table for the FAT32 file system. This utility solves the problem that most FAT32-based music players have - Undefined songs sorting. This problem occured in my car's stereo system, and in Amazon's Kindle.
Seeking and resolving the problem was a nice journey, that involved researching and programming down to the low level of the flash drive's file system. I've supplied most of the background knowledge needed to understand this article, but you need some prior knowledge in C++ programming in order to understand the code.
A Random Stranger
So this is how this whole thing started:
Not a long time ago, I bought a stereo system for my car - Pioneer DEH-P4050UB. The main advantage in this product, was the ability to read music from a USB flash drive. So far so good, but I like listening to my music in a random order, and I've noticed something disturbing — on random mode, if the song ends, then the next one will be randomly picked. HOWEVER, if I switch to the next song by myself (Using the Fast-Forward button) then the next song will be the SEQUENTIAL song after the current one.
After thinking about it for a while, I remembered that there's a "Next Directory" button, so I can just switch to the next album anytime. That way, I can put some very short audio file, which will be the first in the album in an alphabetical order (i.e. 11111udi.mp3), and after playing this file (for a very short time), the next song will be randomly picked! That way I can get a random song with only one click!
Unfortunately, the gods of Pioneer had different plans for me. I've noticed that, for some reason, the new short audio file was not the first one to be played in the directory. After that, I've also noticed that lots of my songs in the albums are not played in an alphabetical order, which tells me that there's some mysterious order that needs to be revealed.
FAT32 in a Nutshell
Before getting our hands dirty, let's get to know a little about the file system that my USB was formatted to (as required by the stereo system). Learning all the aspects of FAT32, can take a long time. I'll hereby give some key features of this file system, that'll help you understand the solution to my problem.
Not getting into specifics, the file system begins with the boot sector, which holds all the parameters' values for the file system (i.e sectors' size, as explained later). The FAT32 file system layout is described in this picture:
Taken from the article "Paul's 8051's code library: Understanding FAT32 Filesystems."
Let's discuss the important things:
- Volume ID (Boot Sector) - Holds some vital information about the current partition, like the size of a cluster and the size of the FAT Tables in the file system.
- Clusters - In order to navigate conveniently through the file system, all the bits are divided into larger groups of data called "Sectors" and "Clusters." Sectors are the smallest group, their size is fixed and stated in the Boot Sector (Currently - it's 512 Bytes). Since Sectors are big, but not big enough — we use the Clusters, which are nothing more than just a collection of sectors. The official number of sectors per cluster is also defined in the boot sector (Usually, Cluster = 8 Sectors). From now on — we'll just work with the clusters as our smallest data unit.
- FAT Tables - One cluster cannot always be enough to hold the file's data. A single cluster can hold 4096KB of data (8 Sectors * 512 Bytes Each). So in order to spread the data in different clusters, we need a way to track them down later on. That's why we have the FAT Tables! Given some cluster number, we can go to one of the FAT tables and calculate the next cluster in the chain, and then the next one, and the next one...up until reaching some flag that stops this chain. There are TWO tables, that backup each other in case one table gets corrupted for some reason.
Every file and folder in the file system, is represented in one or more entries called "Dir Entries". The Dir Entry holds all the information about the file/folder, as can be seen in the next figure:
- The "First Cluster Number" (DIR_FstClusLO and DIR_FstClustHI) is the cluster where we can find the DATA itself for the File/Folder.
- File's data can vary from Text to Binary file of some image or even executable.
- Folder's data contains a list of all its files and sub-folders. Each entry in that list is a "Dir Entry", that leads to its own data cluster
The "DIR_Name" holds the file/folder's name. You might ask yourself: "Only 11 bytes for a "filename" field?! But I can use longer file names in FAT32!" Well, You're right. There's a little "trick" to write longer file names, using other type of enrtries called "Long File Name Entries" (LFN Entries):
Every file that has a long file name (more then 11 bytes) or a name in some different language, that requires UNICODE, is using 1 or more LFN entries. Each entry has a few bytes to hold the file's name. Combining all the LFN entries we get the long file name!
Next to all the file's LFN entries, there's a short dir entry, which holds some vital information about the file, and helps support older operating systems, which are not familier with the LFN entries. So actually, a file that has a long name uses more then one Dir Entry. It uses LFN Entries as much as it need, and one short Dir Entry for backward compatibility.
In the next figure, we can see entries for 2 files: "udini01.txt" and "long name for a file.txt". The first one has a short name, therefore it uses only one short dir entry. But, the second file needs more entries, so it uses 2 LFN entries (Marked in #1 and #2), and after that — a backwards compatibility short entry, with a truncated file name.
This screen shot is from the program "DiskExplorer for FAT," which gives you a great look into your file system to see how everything is stored in there. You can download it here
If you look closely, you can even see some other attributes of the files. For instance, the data of these files is stored in cluster number 3 and 4 for "udini.txt" and "long name for a file.txt" respectively.
Seeking the Problem
After getting some background knowledge, let's get back to our issue: My songs were ordered in some mysterious way, while playing in my car stereo. After playing with some OS commands to read directories, I've found that the
FindNextFile functions, return the same files' order as my car stereo. So what does dictate their order?
Regarding the way these functions order the files, the MSDN says: "The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system."
Meaning — I need to find the exact cause in the file system for this order, and sort it in the device itself. The alternative is to change the firmware of my car stereo, to sort the files in each directory it reads, and that's just a lot of work.
Let's see some files, and the way they're ordered in the Windows Explorer, and inside the FAT32 file system.
The files in the windows explorer (Sorted):
The files in the File System's files table (Unsorted):
Using the excellent "DiskExplorer for FAT," I could see the order of the files in the Files Table inside the file system.
Let's concentrate on the order of the numbers (marked in circles). We can see that the order in the file system is different from the order in the Windows Explorer's window. I wrote a small C program that uses the
FindNextFile functions (presented above), and it gave me the order:
- Riverside - 04 - left out
- Riverside - 02 - driven to destruction
- Riverside - 05 - hybrid times
- Riverside - 03 - egoist hedonist
- Riverside - 01 - hyperactive.mp3
Which is exactly the same order as in the Files Table, concluding that the functions return the files in the order they are stored in the Files Table! This order is caused by the OS that copies the files to the device, which cannot be controlled easily. If I'll try to copy the files one at a time in the correct order, then the Files Table will be sorted in the same way. Of course that's hardly a solution for a USB flash drive with a few hundreds of songs in it... What should we do?! Let's see...
Sorting Things Out
In order to get the music played in the right order (Alphabetically and NOT the order you copy them!), I had to write a utility that sorts the Dir Entries in an alphabetical order.
The Entries Entities
There's a lot of information when you read the file system. Every file or folder can have more than 10 dir entries (Thanks to the LFN support), and we need to organize the data properly. Here's a simplified class diagram of the classes I used to maintain the data:
- CEntry - That's the abstract father of all the entry types in the system. Each file or folder in the file system is stored in one of its sub-classes. The
m_data data member holds the 32 bytes of the short entry (And not the LFN entry, if there is one). The
m_chainedLFNEntry is a dynamic array with all the LFN entries for this file/folder. The
getName() returns the name of the file/folder. If there are LFN entries, then the file name's bytes are collected from all the LFN entries in the
m_chainedLFNEntries, and combined to one string that is the long file name itself.
- CFileEntry - Represents a single file in the system.
- CSpecialEntry - This is a special entry that represents some entries that are not files or folders. Such entries are the volume label entry (containing the device's label name). Other special entries are the "." and the ".." entries, which are just "pointers" to the current folder and to the father folder of the current folder, respectively.
- CFolderEntry - This entry represents a folder, which contains all of its files inside it (in the
m_files), and all of its sub-folders (in the
m_folders). It also has the sorting functionality, which is the main feature of this utility.
This class is a high level API to the file system. It uses the
CVolumeAccess class, which will be described later, for the low level access to the device. It gives a more simplified API to perform the main function on the file system. The class holds the
m_rootDir data member, which is the root entity (
CRootEntry) that holds the files and folders' tree, representing the entire file system. The function
initFDT() is in charge of populating the data. The
sort() function is used to sort all the files and folders in the file system.
All the low level access for the file system is done by this utility class.
This class is a singleton. When initialized, it sends a command to the device to lock and dismount it (dismounting the device, when a low level access is needed, is imperative since Windows Vista). Also in the initialization stage, it loads the FAT tables to the memory (the m_FAT1Data\m_FAT2Data data members), so we can use it later on to find all the clusters chained one after another. In the next code snippet, we see how to read the data from a random cluster chain:
DWORD dwNextClusterNum = aStartClusterNum;
DWORD dwNumClustersRead = 0;
// As long we didn't reach the end of the chain
while (dwNextClusterNum != FAT_END_CHAIN)
// Reads from the device the whole cluster, and puts it in the next free
// position in the buffer
dwNextClusterNum = m_FAT1Data[dwNextClusterNum];
The above code is taken from the
readChainedClusters() function. The
aoChainedClustersData buffer is filled with the data from the clusters's chain.
The clusters, in a chain, are not necessarily sequential (actually, most likely they aren't). For example, a clusters' chain could be the series of the clusters: 4, 5, 8, 23, 87, 432. The
aStartClusterNum holds the number of the first cluster in the chain. That number is stated in the file\folders's Dir Entry (See the "FAT32 Byte Directory Entry Structure" figure above).
dwNextClusterNum variable, is used to store the next number. If we'll mark the current cluster's number with X, The next clusters number is the number located in the Xth index on the FAT table. That explains the line
dwNextClusterNum = m_FAT1Data[dwNextClusterNum];
The first FAT table,
m_FAT1Data, was arbitrary chosen over
m_FAT2Data, since both tables are the same. This loop keeps on going until the next cluster has reached the
FAT_END_CHAIN constant, which marks the end of the chain.
readChainedClusters(), we can get all the folders' files/sub-folders list, and so we can navigate through the file system easily, populating the data into appropriate
CEntry objects. The Root folder will be saved in the
CFileSystem object, and all of it's sub-folders will be sorted, recursively, by the
Working with the Utility
What can it do?
The Utility that I wrote, provides the following features:
- Dump Files table (Backup) - Saving the files table to a file, from which we can recover later in case there's some error. The backup file name will always be "dirs.dat".
- Load Files table (Recover) - Load a files table from a file previously saved. The data will be loaded from a file named "dirs.dat" ONLY, located in the same directory as the utility.
- Export Folders List - Stores a list of all the files and folders on the device, as they're ordered in the files' table, in a file called "Files List.txt". The data will be printed in a tree form, and each directory will be in the format: [Serial Num|Name]. The serial num is just a counter of all the folders in the device. It's useful for me, because my car stereo shows me only the folders' serial numbers. With a list like that, I can easily search for some particular album. I hope that other people will also find it usefull for their purposes.
Here's an example for such list
Sort the FAT Table - This is the main function. This command sorts the files table, and flushes the sorted table back to the device. NOTE: This action automatically saves a backup file, so there's no need to use the backup prior to that. If there's a problem with the sorted table, you can always recover by renaming the created backup file to "dirs.dat", and use the "Recover" option in the menu.
How to begin?
In order to sort the files table, you need to choose the drive on which the device is mounted to. Use option 1 in the main menu, and enter the correct drive letter.
Note that the device needs to be connected and mounted to this drive. It also needs to be formatted to FAT32 file system. Otherwise - Any operation on the device will fail.
After that, you can use the sort option (Number 5 in the menu), which will cause the utility to load the files' table, sort it, and flush it back to the device.
Note: The "???" are just some folders in Hebrew characters that this DOS window can't display, but the sorting process won't get affected nor harmed by this.
This process takes about 1-2 minutes for a 16GB USB Drive. It's pretty safe, because you can always recover the table from the backup file. To see the changes, you can print the FilesList before and after the sorting process (don't forget to rename it after each action, or else it'll get overwritten!)
With this utility you can easily order your songs in the right order as it should be! This solution is also applicable to the Amazon's Kindle (Also plays all the songs in the order they were loaded, as described Here)- Just use this utility to sort the kindle's files table (Which is FAT32) and Voila! All the songs are alphabetically ordered..