Hello there! I'm trying to figure out how to write large generated text files directly into a compressed stream using C#. Each file is a list of character strings such as:

AAA
AAB
AAC

They compress very well, but the uncompressed files are enormous. The GZIP libraries in C# only cut the file size in half, whereas WinRAR knocks it down to a tenth. A tenth is acceptable; half isn't.

These files get extremely large, over a terabyte.
Comments
Sergey Alexandrovich Kryukov 5-Nov-11 22:44pm    
I wonder how big the set of all possible characters is. Is it more than just the 3 characters A, B, C? If it's a relatively small set, you could compress it far more effectively. GZIP and RAR are just not optimized for such atypical content. And of course you can create a compressed stream and write into it.
--SA
joeswindell 5-Nov-11 22:57pm    
The max I'm looking at is 25^10: every arrangement of 10 slots, with each slot holding one of 25 possible characters.
BillWoodruff 6-Nov-11 0:08am    
Are there consistent patterns of duplicated content within the files, which could somehow be analyzed and replaced by some form of tagging system, pre-compression?
joeswindell 6-Nov-11 15:09pm    
Yes, MANY of them! I've been looking at compression schemes that work that way, but I'm not sure I understand them.
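
The suggestion in the comments to "create a compressed stream and write into it" could be sketched roughly as follows. This is only a minimal illustration, not anyone's posted solution: the class name CompressedWordListWriter and the method WriteAll are hypothetical, and it assumes a .NET version whose System.IO.Compression.GZipStream has the constructor taking a CompressionLevel (.NET Framework 4.5 or later). It generates the strings in order and pushes them through GZipStream, so the uncompressed text is never written to disk.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, for illustration only.
public static class CompressedWordListWriter
{
    // Writes every string of `length` symbols drawn from `alphabet`
    // (one per line) straight into a gzip file and returns the line count.
    // Assumes `alphabet` is non-empty.
    public static long WriteAll(string path, string alphabet, int length)
    {
        long count = 0;
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (var gzip = new GZipStream(file, CompressionLevel.Optimal))
        using (var writer = new StreamWriter(gzip, Encoding.ASCII, 1 << 16))
        {
            var indices = new int[length];   // current position in each slot
            var line = new char[length];
            while (true)
            {
                for (int i = 0; i < length; i++)
                    line[i] = alphabet[indices[i]];
                writer.Write(line);
                writer.Write('\n');
                count++;

                // Advance like an odometer: bump the last slot, carry leftwards.
                int pos = length - 1;
                while (pos >= 0 && ++indices[pos] == alphabet.Length)
                {
                    indices[pos] = 0;
                    pos--;
                }
                if (pos < 0) break;          // every slot wrapped: done
            }
        }
        return count;
    }
}
```

For example, CompressedWordListWriter.WriteAll("words.txt.gz", "ABC", 3) writes the 27 lines AAA, AAB, AAC, ... CCC. For scale, 25^10 is 95,367,431,640,625 strings; assuming one byte per character plus a newline, that is roughly 1.05 × 10^15 bytes (about a petabyte) before compression, so the ratio matters a great deal. Whether this gets near the WinRAR result depends on the runtime's deflate implementation and on the compression level chosen, so it is worth measuring on a small sample first.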

1 solution


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)