Introduction
Some time ago, I was sent to work with an old mainframe system. There I became familiar with a source repository in which a single file on the system contained several files of the source code. Although this will seem very familiar to many people (through TAR or ZIP files), I wanted the ability to work with the files inside without copying them to a temporary location. This article is the result of that quest.
The Archive Class
It would be very impractical to create such a class just to store files. It would work just like many of the compression libraries out there, except with no compression at all. Storing only files, with no directories to organize them, would also be quite limiting.
I created a simple file system based on what has been described as WinFS (the Windows Longhorn file system based on SQL). A single file contains a table with an entry for every file or folder inside the Archive. Each entry is a simple record holding nothing more than a name, the number of the parent folder, and either an entry point (for files) or an index (for folders). I chose to record no dates or times. I also chose to use the UNIX-style path separator to avoid escaped backslashes or verbatim strings.
In the end, it's a very simple class containing much of the functionality provided by the File and Directory classes in the System.IO namespace, including methods like OpenRead and OpenText.
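To make the record layout concrete, here is a minimal sketch of such a table and of how a UNIX-style path could be resolved against it. It is not the actual Archive code: the EntrySketch and EntryTableSketch names, the field names, and the choice of 0 as the root folder number are all assumptions made for illustration. The only detail taken from the class itself is that folders are told apart from files by a negative start index.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entry record; the names are illustrative, not the Archive class's own.
public struct EntrySketch
{
    public readonly string Name;   // file or folder name, without any path
    public readonly int Parent;    // number of the containing folder
    public readonly int Start;     // >= 0: entry point of a file; < 0: index of a folder

    public EntrySketch(string name, int parent, int start)
    {
        Name = name;
        Parent = parent;
        Start = start;
    }

    public bool IsDirectory
    {
        get { return Start < 0; }
    }
}

public static class EntryTableSketch
{
    // Assumption: the root folder is number 0 and every other folder has a negative index.
    public const int RootId = 0;

    // Returns the position of the entry in the table, or -1 when the path does not resolve.
    public static int Find(IList<EntrySketch> table, string path)
    {
        int current = RootId;
        int found = -1;
        string[] parts = path.Split('/');
        for (int p = 0; p < parts.Length; p++)
        {
            if (parts[p].Length == 0) continue; // tolerate leading, trailing or doubled separators

            // The previous segment matched a file, so it cannot have children.
            if (found >= 0 && !table[found].IsDirectory) return -1;

            found = -1;
            for (int i = 0; i < table.Count; i++)
            {
                if (table[i].Parent == current && table[i].Name == parts[p])
                {
                    found = i;
                    break;
                }
            }
            if (found < 0) return -1;
            if (table[found].IsDirectory) current = table[found].Start;
        }
        return found;
    }
}
```

A linear scan is enough for a sketch like this; a real table with many entries would probably want a lookup keyed by parent and name.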
Why would I use it?
At first sight, many would consider it a waste of time to use an Archive. And maybe they're right. The usefulness of any piece of technology depends on the needs of the project. Maybe it doesn't fit yours.
At first, I designed the Archive to contain the source code for all of my projects. When it was near completion, I realized that wasn't such a great idea, but I found other uses for it, such as small databases (especially for storing serialized objects) and files I don't want users picking at. You may even find uses I haven't thought of; if so, let us know.
Limitations
As far as I can see, there are only a few limits worth noting:
- An Int32 is used to index file blocks, so the archive size limit is 1.5 KB × 2,147,483,647 (≈ 3,221,225,470 KB, or roughly 3 TB, theoretically). This is also the largest a single file can get.
- Directories differ from files by using negative start indexes, thus limiting an Archive to 2,147,483,648 directories. Currently, a stored index is used to retrieve the index for a new directory and this is never reset. (Should I revise it?)
- Maybe others that I can't remember right now.
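The first two limits can be checked with a little arithmetic. The sketch below is not part of the Archive class; it assumes that "1.5 KB" means 1,536 bytes per block, and the class and method names are made up for illustration.

```csharp
using System;

public static class ArchiveLimitsSketch
{
    // Assumption: a 1.5 KB block is 1,536 bytes (1 KB = 1,024 bytes).
    public const long BlockSizeBytes = 1536;

    // Largest archive (and largest single file) when blocks are indexed by an Int32.
    public static long MaxArchiveBytes()
    {
        // The multiplication is done in 64 bits; the result does not fit in an Int32.
        return BlockSizeBytes * int.MaxValue;
    }

    public static long MaxArchiveKilobytes()
    {
        return MaxArchiveBytes() / 1024;
    }

    // Directories use the negative half of the Int32 range: -1 down to Int32.MinValue.
    public static long MaxDirectories()
    {
        return -(long)int.MinValue;
    }
}
```

Under that assumption, MaxArchiveBytes() returns 3,298,534,881,792, MaxArchiveKilobytes() returns 3,221,225,470 and MaxDirectories() returns 2,147,483,648, which matches the figures above.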
Expanded Universe
This is the goal of open source, isn't it? So, the Archive class is made easy to understand, maintain and modify. After a few minutes reading the code, you'll be able to extend the class and tailor it to your own needs. Here are some examples that can be easily accomplished:
- Replace the "Archive:" header with a more complex version detailing the use or meaning of the files within;
- Simple scrambling: using a crypto-stream would require a temporary stream for the file, but some simple obfuscation (e.g., using XOR) is easy;
- Additional file information like creation and modification times (which I chose not to include);
- A new property to define which character is to be used for path separation;
- etc.
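The XOR idea from the list above can be sketched in a few lines. This is only an illustration, not code from the Archive class: XorScrambleSketch and its parameters are hypothetical, and hooking it into the archive's read and write paths is left open. Keying each byte by its position in the file means the same call scrambles and unscrambles, and it keeps working when reads start in the middle of a file. Keep in mind that XOR with a repeating key only hides content from casual inspection; it is not real encryption.

```csharp
using System;

public static class XorScrambleSketch
{
    // XORs count bytes of buffer, starting at offset, with a repeating key.
    // streamPosition is the position of buffer[offset] within the archived file.
    public static void Apply(byte[] buffer, int offset, int count, long streamPosition, byte[] key)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (key == null || key.Length == 0) throw new ArgumentException("Key must not be empty.", "key");

        for (int i = 0; i < count; i++)
        {
            // Pick the key byte from the absolute position, not from the buffer index.
            int k = (int)((streamPosition + i) % key.Length);
            buffer[offset + i] ^= key[k];
        }
    }
}
```

Calling Apply just before writing a buffer and again just after reading it back, with the same key and position, restores the original bytes.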
Some of these modifications might even be useful to other users, so if you modify the Archive class, I suggest you post a brief description of your changes here too.
Points of Interest
The code has been extensively tested, except for the file operations (as can be seen in the Main procedure), although that code was tested in a previous version. So, there should be no problems with it. If there are any problems, let me know.
Also, the Archive class makes extensive use of .NET generic collections, which limits it to .NET Framework 2.0 or later. If anyone is interested in a port to earlier versions, I can post the results here.
Message Board (9 messages)
Hi,
I want to start by saying thanks to the author for a great article (so far). I am not too experienced in "playing" with the file system, but I'm trying to learn.
For the last couple of hours I have been playing around with the code, but for some reason I still can't comprehend the task of actually getting a file into the archive. As the sample demonstrates, it is quite easy to play around with folders (creating them, moving them, etc.), but what function/stream adds/appends a file to the archive?
If someone can help me, please do.
Allan Haugsted
You can use the Open or OpenWrite methods. I'd recommend the first one, since it has more parameters and you can always set FileMode.OpenOrCreate (overwriting existing files).
There is no ready-made method to import a file into the archive, if that is what you meant. You have to manually create or open a stream and copy the file, byte by byte. But I may provide an update to the archive classes that includes methods to import and export files this way.
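A minimal sketch of that manual copy follows. It only uses the standard System.IO.Stream API; ImportSketch and CopyStream are hypothetical names, and how the destination stream is obtained from the archive (Open or OpenWrite and their exact parameters) is not shown because it depends on the Archive class itself.

```csharp
using System;
using System.IO;

public static class ImportSketch
{
    // Copies everything from source to destination in 4 KB chunks
    // and returns the number of bytes copied.
    public static long CopyStream(Stream source, Stream destination)
    {
        byte[] buffer = new byte[4096];
        long total = 0;
        int read;
        // Read returns 0 once the end of the source stream is reached.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

The source can be a regular stream from File.OpenRead, and the destination would be whatever stream the archive hands back when the file is opened for writing. Copying in chunks rather than one byte at a time keeps the number of calls down.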
[]'s Harkos --- "Money isn't our god, integrity will free our soul." Cut Throat - Sepultura
-- modified at 6:08 Monday 12th December, 2005
Hello,
I've been spending a little time working with your very nice code. Thank you. Perhaps you can solve an issue I'm having.
I open a new archive then create some directories and finally add a file (via a byte[] array). I then close the archive and come back to it sometime later. I notice the directories are all correct but the name of the file is missing (so now I can't locate the file and open it).
I looked around and found that if I call UpdateDir method after CreateNew I can now see the filename when I reopen the archive. So I'm happy on that point -- this could be a little bug.
Now a new problem is that when I open the archive I see the file, but the file size was not captured (nor am I sure the offsets are correct). I cannot load my bytes back because the length is unknown, and I get 'read past end of file' every time.
I like this project for the following reasons:
- I'm not forced to load the entire file into memory to use it (am I correct?)
- It's a much simpler API (File.IO style) than the other options noted in other posts
- Not very much code, so very lightweight
- No GPL (no restrictions)
- Created in C# (and .NET v2)
- Fast
All positives in my book. I can understand why game developers would like it.
Any help on my questions is much appreciated. Better yet, please post an update when you have a chance
Thanks again --- Scott
Any help is appreciated
Scott
Hi Scott,
Thanks for your reply. I'm glad my work is of help to others. I'll try to help as much as I can. First, I have two comments about why you like this project:
- You're correct. I don't load any file into memory. The Archive and all other classes are an interface to a single file. If you read the sources, you know that when you read an archived stream, it reads directly from disk as much as you need.
- I really imposed no GPL or other open source license because I really don't believe in this open source madness. At any time they can pull their licenses and charge you for that. Even Torvalds did it. This is no open source at all; if you want to give code away, give it with no restrictions. I put no licenses, but I'd like to be mentioned as the author of the class if it's used in any projects.
- I'm especially glad to hear that the code is simple to use, lightweight and fast. I never performed any tests on this matter, so thanks again.
Now, about your questions:
- The directory entry is a very simple one. It was meant to record only filenames, start offsets and parent directory IDs. No sizes or dates/times. But it can be altered to include them, if you want. Also, take a look at the Length property of the FileInfo class. You will see how I get sizes and perhaps understand a little better how the pseudo-filesystem works.
- There is no CreateNew method, but I assume you're creating a file by opening it. And you're right; there seems to be a bug here. After creating a directory, there is a call to UpdateDirs, where all entries are written to the real file. This doesn't happen when a file is created, only when one is deleted. You can correct it yourself (I have little time right now, but I will update the sources as soon as I can): search the Archive class for the last Open method. You will see a switch statement with a FileMode.Create case. At the end of that case, just before the break statement, add the line UpdateDirs();. This should solve the problem.
- As for the file size problem, I'll need a little more time to review the code and see if there is really any bug. If there is, I might post it back together with other fixes.
I hope this helps for now, but I'll try to solve these issues as soon as I can.
[]'s Harkos --- "Money isn't our god, integrity will free our soul." Cut Throat - Sepultura
Indeed, I wasn't aware of this NTFS capability. Although interesting, it requires that the file never be moved elsewhere (for a backup, for example) and that the filesystem be NTFS. Even so, working with Archives is very simple and their functionality is available on different filesystems. This is one of my reasons for creating it.
[]'s Harkos --- "Money isn't our god, integrity will free our soul." Cut Throat - Sepultura
First - I have a pet project that could benefit from dividing a stream into multiple streams. So, I tend to read all the articles on that subject. I was surprised that this article had very little to do with System.IO.Stream.
Secondly - why create something like this? You could have just reused one of the many .NET archive examples that use the ZIP functionality from the Java compatibility portions of the framework. You could have even used it in a way to bypass the compression. Maybe you wouldn't have had all of the flexibility of your custom code, but it would have been a universal file format. Or you could have used the horrid compound document storage through COM interop. Or you could have been modern and created an XML format with binary file detection and conversion.
I have spent a lot of time converting custom file formats to XML or DB implementations. This rework was almost always due to maintenance overhead and field support issues.
Sorry to blast your article - but I think this is very misguided.
Dale Thompson
Allow me to dissect your post:
First - You're right, this has nothing DIRECTLY to do with System.IO.Stream (apart from one of the classes in the project inheriting from it). But in a certain way I AM dividing a stream into many.
Second - Perhaps I could have tried any of the things you suggested, but they wouldn't meet the need I have for this (which does not involve a universal file format of any sort). Besides, I like to code some of my own solutions, and this sounded like a pretty good way to implement some basic filesystem functionality without needing to create a whole filesystem. It works for me.
I don't think the article is misguided, but if you're talking about the title, I may agree and try to change it to something more appropriate.
[]'s Harkos --- "Money isn't our god, integrity will free our soul." Cut Throat - Sepultura
Working with zip files, even when there is no compression, isn't always an option. When you add files to an archive, the entire file has to be rebuilt.
I am using this code as a base for one of my projects - a game editor that can have many thousands of game assets on the hard drive at any given time. Consolidating them into one file greatly improves performance and cuts down on massive wasted space and fragmentation.
So don't be so quick to call other people's work useless, just because you haven't yet encountered a use for it. You obviously don't know everything yet, but keep trying - you'll get there someday.
And to the author: Thanks for this code - it has been very useful!