
nFS - File System Within a File

25 Apr 2008
A portable library for emulating a file system within a file

Introduction

This article presents nFS, a portable library for emulating a file system within a file. This library has nothing to do with Linux's "NFS".

The library is compatible with MSVS/Win32, gcc/Linux and gcc/Solaris. It is written in C. However, because of its clean OOP-like structure, it is easily portable to C++.

Strictly speaking, for performance reasons, the external representation of the file system is made of two files, as described below; however, these two files always go together, and the library manipulates both at the same time. So, conceptually, the term "within a file" still applies to some extent.

Background

Many applications nowadays have to work with large amounts of data. In some instances, the data needs to be scattered across thousands of files, for various reasons. Letting the OS manage those files itself is one (often costly) option. Another option is to have a mechanism that:

  • Archives files within a single file in a transparent manner
  • Provides a file manipulation API similar to the one provided by the OS (creat, open, etc.)
  • Provides fast globbing (searching for files that match a given pattern)
  • Provides first-match-only globbing
  • Provides fast file caching
  • Supports Unix-style "hard links" on all platforms (including Windows)
  • Supports a virtually unlimited number of files open at the same time
  • Imposes virtually no limit on the number of files
  • Imposes no maximum file size.

The library presented here tries to address these requirements.

Design and Implementation Notes

The file system is structured in 3 main abstraction layers (see Figure 1, below). The bottom layer (1) implements a cached I/O infrastructure on which the other two layers (2, 3) are based.

Figure 1: The main abstraction layers.

The Virtual Disk Module (nfs_data.h/cpp)

This module implements a virtual disk. The disk is divided into 512-byte sectors, which are accessed through an internal 128-block cache. Nothing is written to the physical disk until it is absolutely necessary. The cache is implemented as a classic sliding-window buffer (see Figure 2, below) that gets dumped to the real disk only when the following two conditions are both satisfied:

  1. The buffer is out of sync with the physical medium
  2. The cache window must slide across the data file (virtual disk)
Figure 2: Cached virtual disk.

The size of the cache is a constant defined in nfs_data.h, and it can be changed. The module has an interface that provides block access functions (reading and writing blocks).
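The flush condition described above can be sketched as follows. The type and names (disk_cache, cache_needs_flush) are illustrative assumptions for this sketch, not the library's actual API:

```c
#include <assert.h>

#define SECTOR_SIZE  512   /* bytes per virtual-disk sector */
#define CACHE_BLOCKS 128   /* sliding-window size, in sectors */

/* Hypothetical cache state; field names are illustrative. */
typedef struct {
    char buf[CACHE_BLOCKS * SECTOR_SIZE];
    long window_start;  /* first sector covered by the window */
    int  dirty;         /* out of sync with the physical medium? */
} disk_cache;

/* Flush only when both conditions hold: the buffer is dirty AND the
   requested sector falls outside the current window (window must slide). */
static int cache_needs_flush(const disk_cache *c, long sector)
{
    int outside = sector < c->window_start ||
                  sector >= c->window_start + CACHE_BLOCKS;
    return c->dirty && outside;
}
```

A clean window, or a dirty window that still covers the requested sector, never triggers physical I/O.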

Interlaced I/O (nfs_iio.h/cpp)

This module implements a mechanism for storing multiple logical files in one physical file. The current implementation is based on a multiple-channel scheme with variable channel widths ("interlaced I/O"). The file is logically divided into N channels (N channels are allocated for N files to be packed in the IIO file). Within a "chunk" (see Figure 3, below), each channel has a fixed number of blocks, where the block is the base unit of the IIO file (usually 256 bytes). The channels need not all have the same number of blocks (the same width). The channel ratio (the number of blocks per channel, for each channel) is expressed as "k1:k2:k3:...:kN", where ki is the number of blocks (the "width") of channel i.

Based on the kind of data to be stored in the IIO file, the optimal channel ratio can be computed by simplifying the N-level ratio p1:p2:p3:...:pN, where pi is the average number of bytes written to channel i. A number of experiments led me to an optimal default ratio of 4:8:1:4 (nFS uses 4 channels).

Figure 3: The Interlaced I/O (IIO) file.

Each channel has its own individual 8-block sliding-window cache. Although the blocks in a channel are not physically contiguous, the logically contiguous ones are brought together in the cache buffer (see image above). Because the channels have known "widths", the access to any logical block can be done very quickly by computing a formula that translates a logical block number to a physical block number (given the width of each channel and the number of channels).
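The logical-to-physical translation can be sketched like this, assuming the default 4:8:1:4 ratio and ignoring any header blocks the real file may reserve; the function name and layout details are assumptions, not taken from nfs_iio.cpp:

```c
#include <assert.h>

/* Channel widths for the default 4:8:1:4 ratio. */
static const int width[4] = { 4, 8, 1, 4 };

/* Translate a logical block number within channel `ch` to a physical
   block number in the IIO file: find the chunk, then add the offsets
   of the preceding channels within that chunk. */
static long logical_to_physical(int ch, long logical)
{
    long chunk_size = 0, prefix = 0;
    for (int i = 0; i < 4; i++) {
        chunk_size += width[i];            /* blocks per chunk (17 here) */
        if (i < ch) prefix += width[i];    /* slots before channel ch */
    }
    long chunk  = logical / width[ch];     /* which chunk holds the block */
    long offset = logical % width[ch];     /* slot inside the channel slice */
    return chunk * chunk_size + prefix + offset;
}
```

Because this is pure arithmetic over known widths, any logical block maps to its physical position in constant time.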

Possible improvement: Although the above implementation is very fast, its main drawback is that the channels can become imbalanced over time, because they grow at different rates. This leads to excess space in some channels and a shortage in others, which means that some channels will not be used to their full potential and some space will be wasted. The channel ratio was provided as a mechanism for minimizing the uncontrolled growth of the IIO file due to channel imbalance. Still, when the size of the data cannot be approximated beforehand, the channel ratio cannot be estimated correctly, and the phenomenon described above will occur. An alternative implementation could use exactly the same interface, i.e. the file would still be logically divided into "channels", but the channels could be implemented as completely independent linked lists of blocks. Special blocks would need to be reserved for describing the allocation of blocks in the IIO file, as seen in the figure below.

Figure 4: Alternate implementation of the IIO file.

The first block in the file is a control block, and it has fixed size entries that specify what kinds of blocks follow. Every M-th block is a control block, where M is the number of entries in the control block plus 1. Each entry in the control block specifies what kind of data is stored in the block associated with it (i.e. what "channel" that block belongs to), and what the following block in the channel is. For example, in the above image there are three channels (i.e. three kinds of data, drawn in black, red and blue). Blocks 1,2,5,7,12 belong to the first channel (black), containing data of type D1, and this is described in the control block by the entries associated with those blocks (i.e. 1,2,5,7,12). Channels (and blocks) are allocated on the fly as needed.
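Under this sketched layout, locating the control block that governs a given data block is simple arithmetic. The entry count and helper names below are assumptions chosen for illustration only:

```c
#include <assert.h>

/* Assume each control block holds 15 entries, so every M-th block
   (M = entries + 1 = 16) is a control block. */
#define ENTRIES_PER_CB 15
#define M (ENTRIES_PER_CB + 1)

/* Absolute index of the control block governing data block b. */
static long control_block_of(long b) { return (b / M) * M; }

/* Entry slot inside that control block (block cb+1 is entry 0). */
static int entry_slot_of(long b) { return (int)(b % M) - 1; }
```

Each entry at that slot would record the block's channel and the index of the next block in the chain.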

This method leads to more compact IIO files in most circumstances, but the big drawback is that completely random access to the physical blocks is no longer possible, because the blocks are chained. Each read/write operation in Layer 3 (the top level, see above) involves several I/O operations in Layer 1 (the IIO file); each of those I/O operations requires traversing a chain to find the block where the data should be transferred, which is itself a succession of cached I/O accesses. My impression is therefore that file accesses would be much slower, even with a cache buffer. A classic space-time tradeoff.

System Tables

The FAT, or File Allocation Table, specifies the way the files are made out of blocks. For example, if a file has 1024 bytes (2 blocks), it will have two nodes allocated in FAT. There is a 1:1 mapping between the FAT and the virtual disk (nfs_data), which means that for every node (4 bytes) in the FAT there will be 1 block (512 bytes) in the data file. Allocating or deallocating data blocks is then reduced to marking nodes in FAT as belonging to one chain or another. The unused space is also chained, which means that when a node is recovered, it will be added to the chain of unused nodes.
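The chaining can be sketched as follows; the in-memory array, sentinel value, and helper are illustrative assumptions (the real nodes live in the FAT channel of the IIO file):

```c
#include <assert.h>

/* Assumed sentinel marking the end of a FAT chain. */
#define FAT_EOF (-1L)

/* Each FAT node holds the index of the next block in the file.
   Walking the chain from a file's first node yields its block count. */
static int chain_length(const long *fat, long first)
{
    int n = 0;
    for (long b = first; b != FAT_EOF; b = fat[b])
        n++;
    return n;
}
```

For the 1024-byte file mentioned above, the chain starting at its first node would have exactly two nodes; the free-space chain is walked the same way when recovering or allocating blocks.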

The NT, or Node Table, contains one special node for each file stored in the system. It does not have anything to do with Windows NT. When a file is created, a new node is allocated for it in the NT. The node contains a refcount which is initially 1, a pointer to the start of the FAT chain, and the size of the file. Each time a hard link to that file is added, refcount is incremented, and each time a hard link (or the file) is removed, refcount is decremented. When refcount becomes 0, the node is recovered and the associated FAT chain is destroyed.
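The refcounting behavior described above can be sketched with a minimal node structure; the field and function names are illustrative, not the library's actual declarations:

```c
#include <assert.h>

/* Sketch of an NT node, per the description above. */
typedef struct {
    int  refcount;   /* starts at 1 when the file is created */
    long fat_start;  /* first node of the file's FAT chain */
    long size;       /* file size in bytes */
} nt_node;

/* Removing a hard link (or the file itself) decrements the refcount.
   Returns 1 when the node should be recovered and its FAT chain
   destroyed (the last link is gone). */
static int nt_unlink(nt_node *n)
{
    return --n->refcount == 0;
}
```

This is what makes Unix-style hard links portable: every directory entry is just another reference to the same NT node.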

The DT, or Directory Table, contains surface information about files (such as their names, and pointers to the NT nodes associated with them). Its main purpose is to categorize the file names in such a way that nfs_glob() will be very fast when applied to those files. nfs_glob() is a function that returns a list of file names matching a given pattern. The DT uses a prefix-based Patricia trie structure (finite state automaton with the keys stored in the nodes). For a description of the Patricia trie, see:

  • My article on Patricia Tries
  • R. Sedgewick. Algorithms in C. Addison-Wesley, 1st edn, p. 253, 1990
  • G.H. Gonnet. Handbook of Algorithms and Data Structures. Addison-Wesley, International Computer Science Series, p. 109, 1984

Because of the highly specialized search structure that is being used in nFS, the search for particular keys (file names) is performed in sub-linear time in the number of bits in the search key. This makes globbing very fast. The pattern matching is performed in two steps:

  1. The maximum possible prefix without wildcards is extracted from the pattern and the (partial) key given by that prefix is looked up in the Patricia structure.
  2. A pattern matching is performed on every node in the subtree starting at the node found at (1). The results are quick-sorted by default, unless the GLOB_NOSORT flag is specified when the nfs_glob() function is invoked. See the include file for more info.
Figure 5: Relationship between DT, NT, and FAT.
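Step (1) amounts to finding the longest wildcard-free prefix of the pattern; it can be sketched with the standard strcspn() function (the helper name is an assumption):

```c
#include <assert.h>
#include <string.h>

/* Length of the longest prefix of `pattern` containing none of the
   wildcard characters; this prefix is the (partial) key looked up
   in the Patricia trie before matching the subtree. */
static size_t literal_prefix_len(const char *pattern)
{
    return strcspn(pattern, "*?[]|");
}
```

For the pattern "SomeFile*.d?t" used later in this article, the trie lookup would use the prefix "SomeFile".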

Pattern Matching (nfs_pmatch.h/cpp)

The pattern matching algorithm was borrowed from GNU's implementation of fnmatch(). Unfortunately, fnmatch() is not available on Win32, so I had to provide this cross-platform function for doing what fnmatch() achieves on Unix. The function takes two parameters, a string and a pattern containing wildcards (*, ?, [, ], |), and returns true (non-zero) if the string matches the given pattern, or false (zero) otherwise. See the include file (nfs_pmatch.h) for details.
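The semantics of * and ? can be illustrated with a minimal recursive matcher. This is a simplified sketch, not the library's nfs_pmatch (which, like fnmatch(), also handles bracket classes and alternation):

```c
#include <assert.h>

/* Returns non-zero if `s` matches `pat`, where '*' matches any run of
   characters (including none) and '?' matches exactly one character. */
static int pmatch(const char *pat, const char *s)
{
    if (*pat == '\0')                 /* pattern exhausted: match only */
        return *s == '\0';            /* if the string is too          */
    if (*pat == '*')                  /* '*' matches nothing, or one   */
        return pmatch(pat + 1, s) ||  /* more character and recurse    */
               (*s && pmatch(pat, s + 1));
    if (*s && (*pat == '?' || *pat == *s))
        return pmatch(pat + 1, s + 1);
    return 0;
}
```

Production code would use an iterative formulation to avoid deep recursion on long strings, but the logic is the same.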

nFS Main Module (nfs.h/cpp)

The main module uses the lower-level functions defined in layers 1 and 2 to provide a clean interface that closely resembles the Unix set of I/O functions: open, creat, read, write, link, unlink, glob, globfree, lseek, perror. The information about the files that are created at run-time is stored in the IIO file, and the actual data is stored in the "virtual disk" file. All accesses are cached. There are a total of 5 cache buffers in the system: one for each IIO channel (DT (trie), DT (keys), NT, FAT), and a sliding-window cache for the data file.

The file system has to be initialized ("started") before it can be used. At the end of a session the FS has to be closed, an operation that also flushes all the out-of-sync cache buffers. The FS can also be "restarted": if the system files exist, they will be "loaded" instead of being re-created from scratch. This allows a client application to create file systems on the fly when necessary, and to "load" them as needed later. Errors that occur during nFS operations are stored in the global variable nfs_errno, and a subsequent call to nfs_perror() will dump the associated error message to the standard output.

Using the Code

The unit test (nfs_unit_test.cpp) included with this package demonstrates how to use the library. The example below shows how to:

  • Create an nFS file system
  • Use file_exists() to determine whether some file exists
  • Use file_create() to create a file
  • Use file_close() to close a file
  • Use file_write() to write data into virtual nFS files
  • Use glob() to search for virtual files
  • Use perror() to dump error information to stdout
#define how_many  6999
#define data_size 2048
#define gen_fname(i) ...
#define pattern   "SomeFile*.d?t"
char data[...];

void test_nFS() {
    int i, k;

    printf("Starting fs...\n");
    nfs_Handle* fs = nfs_start("fs", FS_RW);
    if (fs == NULL) {
        nfs_perror(fs, "start error: ");
        fatal("File system could not be created");
    }

    printf("Creating %d files, each %d bytes. wait...\n", how_many, data_size);
    for (i = 0; i < how_many; i++) {
        if (nfs_file_exists(fs, gen_fname(i)))
            printf("file %s already exists\n", gen_fname(i));
        else {
            int f1 = nfs_file_create(fs, gen_fname(i), 0);
            if (f1 >= 0) {
                k = nfs_file_write(fs, f1, data, data_size);
                if (k != data_size) {
                    printf("file: %s\n", gen_fname(i));
                    fatal("too few bytes written to file");
                }
                }
                nfs_file_close(fs, f1);
            } else {
                printf("error creating file %s\n", gen_fname(i));
                fatal("see above");
            }
        }
    }

    /* globbing */
    nfs_glob_t  tt;
    printf("globbing (%s)...\n", pattern);
    tt.gl_offs = 0;
    nfs_glob(fs, pattern, GLOB_DOOFS, NULL, &tt);

    printf("Results: [%d found]\n", tt.gl_pathc);
    for (i = tt.gl_offs; i < tt.gl_pathc + tt.gl_offs; i++) {
        printf("    match: %s\n", tt.gl_pathv[i]);
    }

    printf("freeing glob results...\n");
    nfs_glob_free(fs, &tt);

    printf("stop...\n");
    nfs_end(fs, 0);
}

Remarks

The library included with this article is fairly limited in that it supports only a small subset of the total number of file-related operations that an OS supports. A possible improvement to nFS would be to extend this subset with additional such operations. Other possible improvements could include encryption, password protection, support for file attributes, etc. Please contact me (see below) if you are interested in such extensions to nFS that are not included in this article. I might be able to help.

Another interesting project would be to convert this to object-oriented C++. Again, contact me if you are interested in that.

History

  • 22nd February, 2005 - Initial public domain release
  • 9th February, 2008 - Bug fixes, cosmetic changes
  • 25th April, 2008 - Updated downloads for the article

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
