|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
|
Announcements
Want a new Job?
Chapters
Services
Feature Zones
|
IntroductionThis article presents The library is compatible with MSVS/Win32, gcc/Linux and gcc/Solaris. It is written in C. However, because of its clean OOP-like structure, it is easily portable to C++. Strictly speaking, for performance reasons, the external representation of the file system is made of two files, as described below; however, these two files always go together, and the library manipulates both at the same time. So, conceptually, the term "within a file" still applies to some extent. BackgroundMany applications nowadays have to work with large amounts of data. In some instances, the data needs to be scattered across thousands of files, for various reasons. Letting the OS manage those files itself is one (often costly) option. Another option is to have a mechanism that:
The library presented here tries to address these requirements. Design and Implementation NotesThe file system is structured in 3 main abstraction layers (see Figure 1, below). The bottom layer (1) implements a cached I/O infrastructure on which the other two layers (2, 3) are based.
Figure 1: The main abstraction layers.
The Virtual Disk Module (nfs_data.h/cpp)This module implements a virtual disk. The disk is divided into 512 byte sectors. The access to those sectors is made via an internal 128-block cache. Nothing is immediately written physically on the real disk, unless it is absolutely necessary. The cache is implemented as a classic sliding-window buffer (see Figure 2, below), that gets dumped on the real disk only when the following two conditions are satisfied:
Figure 2: Cached virtual disk.
The size of the cache is a constant defined in nfs_data.h, and it can be changed. The module has an interface that provides block access functions (reading and writing blocks). Interlaced I/O (nfs_iio.h/cpp)This module implements a mechanism for storing multiple logical files into one physical file. The current implementation is based on a variable size channel, multiple channel scheme ("interlaced I/O"). The file is logically divided into N channels (N channels are allocated for N files to be packed in the IIO file). In a "chunk" (see below, Figure 3), each channel has a fixed number of blocks, where the block is the base unit of the IIO file (usually 256 bytes). The channels do not have to have the same number of blocks (same width). The channel ratio (number of blocks per channel for each channel) is expressed as "k1:k2:k3...:kN", where ki is the number of blocks ("width") of channel i. Based on the kind of data to be stored in the IIO file, the optimal channel ratio can be computed by simplifying the N-level ratio p1:p2:p3:...:pN, where pi is the average number of bytes written to channel i. A number of experimental results lead me to an optimal default ratio of 4:8:1:4 (
Figure 3: The Interlaced I/O (IIO) file.
Each channel has its own individual 8-block sliding-window cache. Although the blocks in a channel are not physically contiguous, the logically contiguous ones are brought together in the cache buffer (see image above). Because the channels have known "widths", the access to any logical block can be done very quickly by computing a formula that translates a logical block number to a physical block number (given the width of each channel and the number of channels). Possible improvement: Although the above implementation is very fast, its main drawback is that the channels can become imbalanced in time, due to the different rate at which they grow. This leads to excess data in some channels and lack of data in other channels, which means that some channels will not be used at full potential and thus there will be some space wasted. The channel ratio was provided as a mechanism for minimizing the uncontrolled growth of the IIO file due to channel imbalance. Still, when the size of the data cannot be approximated beforehand, the channel ratio will not be estimated correctly, and the phenomenon described above will occur. An alternative implementation could use exactly the same interface, i.e. the file will still be logically divided in "channels", but the channels could be implemented as completely independent "linked lists" of blocks. Special blocks need to be reserved for describing the "allocation" of blocks in the IIO file, as seen in the figure below.
Figure 4: Alternate implementation of the IIO file.
The first block in the file is a control block, and it has fixed size entries that specify what kinds of blocks follow. Every M-th block is a control block, where M is the number of entries in the control block plus 1. Each entry in the control block specifies what kind of data is stored in the block associated with it (i.e. what "channel" that block belongs to), and what the following block in the channel is. For example, in the above image there are three channels (i.e. three kinds of data, drawn in black, red and blue). Blocks 1,2,5,7,12 belong to the first channel (black), containing data of type D1, and this is described in the control block by the entries associated with those blocks (i.e. 1,2,5,7,12). Channels (and blocks) are allocated on the fly as needed. This method leads to more compact IIO files in most circumstances, but the big drawback here is that completely random accesses to the physical blocks are not possible, because the blocks are chained. Each read/write operation in Layer 3 (top level, see above) involves several I/O operations in Layer 1 (the IIO file); each of those I/O operations requires the traversal of a chain to find the appropriate block where the data should be transferred, which is a succession of cached I/O accesses itself. Therefore my impression is that the file accesses would be much slower, even when using a cache buffer. A classical space-time tradeoff... System TablesThe FAT, or File Allocation Table, specifies the way the files are made out of blocks. For example, if a file has 1024 bytes (2 blocks), it will have two nodes allocated in FAT. There is a 1:1 mapping between the FAT and the virtual disk ( The NT, or Node Table, contains one special node for each file stored in the system. It does not have anything to do with Windows NT. When a file is created, a new node is allocated for it in the NT. The node contains a The DT, or Directory Table, contains surface information about files (such as their names, and pointers to the NT nodes associated with them). Its main purpose is to categorize the file names in such a way that
Because of the highly specialized search structure that is being used in
Figure 5: Relationship between DT, NT, and FAT.
Pattern Matching (nfs_pmatch.h/cpp)The pattern matching algorithm was borrowed from GNU's implementation of nFS Main Module (nfs.h/cpp)The main module uses the lower level functions defined in layers 1 and 2 to provide a clean interface that closely resembles the Unix set of I/O functions: Using the CodeThe unit test (nfs_unit_test.cpp) included with this package demonstrates how to use the library. The example below shows how to:
#define how_many 6999
#define data_size 2048
#define gen_fname(i) ...
#define pattern "SomeFile*.d?t"
char data[...];
void test_nFS() {
int i, k;
printf("Starting fs...\n");
nfs_Handle* fs = nfs_start("fs", FS_RW);
if (fs == NULL) {
nfs_perror(fs, "start error: ");
fatal("stop");
}
if (fs == nfs_NULL) {
nfs_perror(fs, "perror reports");
fatal("File system could not be created");
}
printf("Creating %d files, each %d bytes. wait...\n", how_many, data_size);
for (i = 0; i < how_many; i++) {
if (nfs_file_exists(fs, gen_fname(i)))
printf("file %s already exists\n", gen_fname(i));
else {
int f1 = nfs_file_create(fs, gen_fname(i), 0);
if (f1 >= 0) {
k = nfs_file_write (fs, f1, data, data_size);
if (k != data_size) {
printf("file: %s\n", fname);
fatal("too few bytes written to file");
}
nfs_file_close(fs, f1);
} else {
printf("error creating file %s\n", gen_fname(i));
fatal("see above");
}
}
}
/* globbing */
nfs_glob_t tt;
printf("globbing (%s)...\n", pattern);
tt.gl_offs = 0;
nfs_glob(fs, pattern, GLOB_DOOFS, NULL, &tt);
printf("Results: [%d found]\n", tt.gl_pathc);
for (i = tt.gl_offs; i < tt.gl_pathc + tt.gl_offs; i++) {
printf(" match: %s\n", tt.gl_pathv[i]);
}
printf("freeing glob results...\n");
nfs_glob_free(fs, &tt);
printf("stop...\n");
nfs_end(fs, 0);
}
RemarksThe library included with this article is fairly limited in that it supports only a small subset of the total number of file-related operations that an OS supports. A possible improvement to Another interesting project would be to convert this to object-oriented C++. Again, contact me if you are interested in that. History
|
||||||||||||||||||||||