Introduction
This article presents a class library for encoding and decoding files and text in .NET with several algorithms: Base64, Quoted Printable, UUEncode and Yenc.

Using The Code
Remember to add a reference to TextCoDec.dll in your project. Once the reference is added, Visual Studio .NET should take care of copying the file for you.

Dim Yenc As New TextCodec.Yenc
Dim Parts() As String
'encode the windows calculator in 3 parts with a line length of 80
Parts = Yenc.Encode("C:\WINDOWS\system32\calc.exe", 80, 2, _
TextCodec.Yenc.yencVersion.Version1_1)
'decode to c:\
Yenc.Decode(Parts, "c:\", 0)
'test it
Shell("c:\calc.exe")

The Algorithms

Base64
Base64 encoding/decoding was the easiest one to implement: it is part of the framework.

Quoted Printable
This algorithm is mainly used to encode non-English text. Character codes outside the range 32 to 126 are replaced by an equal sign followed by their two-digit hexadecimal value. The one exception inside that range is character code 61 (the equal sign itself), which must also be encoded.

For i = 0 To Chars.Length - 1
Ascii = Asc(Chars(i))
If Ascii < 32 Or Ascii = 61 Or Ascii > 126 Then
EncodedChar = Hex(Ascii).ToUpper
If EncodedChar.Length = 1 Then EncodedChar = "0" & EncodedChar
ReturnString.Append("=" & EncodedChar)
Else
ReturnString.Append(Chars(i))
End If
Next
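
The rule above can also be tried in isolation. The following is a minimal sketch, not the library's code: the module and function names are made up for illustration, it assumes every character code fits in one byte, and it leaves out the line-length handling that a complete quoted-printable encoder needs.

```vbnet
Imports System
Imports System.Text

Module QuotedPrintableSketch
    ' Hypothetical helper (not part of TextCodec): applies the rule described
    ' above. Codes below 32, above 126, and the equal sign (61) become "="
    ' followed by two uppercase hex digits. Assumes codes in the range 0-255.
    Function EncodeQuotedPrintable(ByVal text As String) As String
        Dim result As New StringBuilder()
        For Each ch As Char In text
            Dim code As Integer = AscW(ch)
            If code < 32 OrElse code = 61 OrElse code > 126 Then
                result.Append("="c).Append(code.ToString("X2"))
            Else
                result.Append(ch)
            End If
        Next
        Return result.ToString()
    End Function

    Sub Main()
        Console.WriteLine(EncodeQuotedPrintable("a=b"))              ' prints a=3Db
        Console.WriteLine(EncodeQuotedPrintable("caf" & ChrW(233)))  ' prints caf=E9
    End Sub
End Module
```

The "X2" format string produces the zero-padded, uppercase hex value in one step, which replaces the manual padding check in the loop above.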

UUEncode
The best definition of the algorithm I found is the following. The uuencode algorithm hinges on a 3-byte-to-4-byte (8-bit to 6-bit data) encoding that converts all data to printable characters. To perform this encoding, read 3 bytes from the file to be encoded, regroup their 24 bits into four 6-bit values, and add 32 to each value so that it maps to a printable character. A value that would come out as a space (32) is written as a backquote (96) instead.

The main encoding is achieved in VB by the following code:

For i = 0 To Chars.Length - 1 Step 3
DecodedBytes(0) = Asc(Chars(i))
DecodedBytes(1) = Asc(Chars(i + 1))
DecodedBytes(2) = Asc(Chars(i + 2))
EncodedBytes(0) = (DecodedBytes(0) \ 4 + 32)
EncodedBytes(1) = ((DecodedBytes(0) Mod 4) * 16) + _
(DecodedBytes(1) \ 16 + 32)
EncodedBytes(2) = ((DecodedBytes(1) Mod 16) * 4) + _
(DecodedBytes(2) \ 64 + 32)
EncodedBytes(3) = (DecodedBytes(2) Mod 64) + 32
If (EncodedBytes(0) = 32) Then EncodedBytes(0) = 96
If (EncodedBytes(1) = 32) Then EncodedBytes(1) = 96
If (EncodedBytes(2) = 32) Then EncodedBytes(2) = 96
If (EncodedBytes(3) = 32) Then EncodedBytes(3) = 96
ReturnString.Append(Chr(EncodedBytes(0)))
ReturnString.Append(Chr(EncodedBytes(1)))
ReturnString.Append(Chr(EncodedBytes(2)))
ReturnString.Append(Chr(EncodedBytes(3)))
Next
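
The loop above reads three characters at a time, so it expects an input whose length is a multiple of 3. A minimal sketch of the same arithmetic on a byte array, with the final group padded with zero bytes, might look like this. The names are hypothetical and are not the library's API, and the per-line length prefix and the begin/end lines of a complete uuencoded file are left out.

```vbnet
Imports System
Imports System.Text

Module UUEncodeSketch
    ' Maps a 6-bit value (0-63) to its uuencode character.
    ' Zero is written as a backquote (96) rather than a space (32).
    Function ToUUChar(ByVal sixBits As Integer) As Char
        If sixBits = 0 Then Return ChrW(96)
        Return ChrW(sixBits + 32)
    End Function

    ' Hypothetical helper: encodes the bytes in groups of three, padding the
    ' last group with zero bytes when the length is not a multiple of 3.
    Function EncodeGroups(ByVal data As Byte()) As String
        Dim result As New StringBuilder()
        For i As Integer = 0 To data.Length - 1 Step 3
            Dim b0 As Integer = data(i)
            Dim b1 As Integer = 0
            Dim b2 As Integer = 0
            If i + 1 < data.Length Then b1 = data(i + 1)
            If i + 2 < data.Length Then b2 = data(i + 2)
            result.Append(ToUUChar(b0 \ 4))
            result.Append(ToUUChar((b0 Mod 4) * 16 + b1 \ 16))
            result.Append(ToUUChar((b1 Mod 16) * 4 + b2 \ 64))
            result.Append(ToUUChar(b2 Mod 64))
        Next
        Return result.ToString()
    End Function

    Sub Main()
        Dim sample As Byte() = Encoding.ASCII.GetBytes("Cat")
        Console.WriteLine(EncodeGroups(sample))   ' prints 0V%T
    End Sub
End Module
```

Working on bytes rather than characters avoids any dependence on the code page when the input is binary data.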

Yenc
In essence, the yenc algorithm can be implemented by the following expressions:

EncodedCharacter = (Character + 42) Mod 256
EncodedSpecialCharacter = (EncodedCharacter + 64) Mod 256

There are, as always, some exceptions: null (0), line feed (LF), carriage return (CR) and the equal sign (=). The tab character was also an exception, but that was removed in version 1.2. If the encoded character is one of these, re-encode it with the EncodedSpecialCharacter expression and escape it by putting an equal sign in front of it.

The yenc algorithm is flexible, however. If for some reason a character isn't suitable in the encoded stream, escape it as you would a special character. This is especially useful for NNTP transmission. In that protocol, a line containing only a single dot marks the end of the stream, and a dot at the start of any other line has to be doubled by the sender. The dot isn't a special yenc character by default, so an encoded line may start with a dot and go out on the wire starting with a double dot. This would confuse some newsreaders; a good principle is to always escape a dot located at the beginning of a line.

There is another exception, dealing with line length. The choice of line length is flexible, but the actual length can also vary, because a line can't end with the escape character. If the last character of a line turns out to be a special character, you escape it normally and end up with two characters (the escape character and the encoded one), giving a line of length+1.

For more information on yenc go to www.yenc.org.

The main encoding is achieved in VB by the following code:

For i = 0 To n - 1
CharCode = (Bytes(i) + 42) Mod 256
Select Case CharCode
Case 0, 13, 10, 61
OutputLine &= "=" & Chr((CharCode + 64) Mod 256)
Case Else
If Version = yencVersion.Version1_1 And CharCode = 9 Then
OutputLine &= "=" & Chr((CharCode + 64) Mod 256)
Else
OutputLine &= Chr(CharCode)
End If
End Select
If OutputLine.Length >= LineLength Then
Output.Append(OutputLine & vbCrLf)
OutputLine = ""
End If
Next
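
To check the two expressions against each other, it helps to run an encode and a decode side by side. The sketch below is an illustration under stated assumptions, not the library's code: the names are hypothetical, it works on byte arrays instead of strings, and it leaves out line breaks, headers and the tab rule of version 1.1.

```vbnet
Imports System
Imports System.Collections.Generic

Module YencSketch
    ' Hypothetical helper: applies (byte + 42) Mod 256 and escapes the four
    ' critical values (0, 10, 13, 61) with "=" followed by (value + 64) Mod 256.
    Function EncodeBytes(ByVal data As Byte()) As Byte()
        Dim output As New List(Of Byte)()
        For Each b As Byte In data
            Dim code As Integer = (b + 42) Mod 256
            Select Case code
                Case 0, 10, 13, 61
                    output.Add(CByte(61))
                    output.Add(CByte((code + 64) Mod 256))
                Case Else
                    output.Add(CByte(code))
            End Select
        Next
        Return output.ToArray()
    End Function

    ' Reverses EncodeBytes: undoes the escape first, then the +42 shift.
    Function DecodeBytes(ByVal encoded As Byte()) As Byte()
        Dim output As New List(Of Byte)()
        Dim i As Integer = 0
        While i < encoded.Length
            Dim code As Integer = encoded(i)
            If code = 61 AndAlso i + 1 < encoded.Length Then
                i += 1
                code = (encoded(i) - 64 + 256) Mod 256
            End If
            output.Add(CByte((code - 42 + 256) Mod 256))
            i += 1
        End While
        Return output.ToArray()
    End Function

    Sub Main()
        ' 65 is a plain byte; 214 and 19 both land on critical values.
        Dim original As Byte() = New Byte() {65, 214, 19}
        Dim encoded As Byte() = EncodeBytes(original)
        Console.WriteLine(BitConverter.ToString(encoded))               ' prints 6B-3D-40-3D-7D
        Console.WriteLine(BitConverter.ToString(DecodeBytes(encoded)))  ' prints 41-D6-13
    End Sub
End Module
```

Byte 214 shifts to 0 and byte 19 shifts to 61, so both are written as an equal sign followed by the re-encoded value, which is why three input bytes produce five output bytes here.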

Streams
As I was rewriting the code from scratch, I was amazed at how much easier streams made my life. Not only that, but the code also got an almost unbelievable speed boost (about 11400%, actually). So why easier, you may ask? Well, almost anything can be turned into a stream. Take a look at the following examples:

Dim MyPath As String
Dim MyByteArray() As Byte
Dim MyString As String
'a stream over a file (the FileStream constructor needs a FileMode)
Dim FromFile As New System.IO.FileStream(MyPath, System.IO.FileMode.Open)
'a stream over a byte array
Dim FromBytes As New System.IO.MemoryStream(MyByteArray)
'a stream over a string
Dim FromString As New System.IO.MemoryStream(System.Text.Encoding.Default.GetBytes(MyString))
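
The practical benefit is that one routine written against the Stream base class can serve every kind of input. A minimal sketch of that idea follows. SumBytes is a made-up stand-in for an encoder routine, not something from the library; it only shows that the same code runs unchanged over a byte array and over a string.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module StreamSourceSketch
    ' Hypothetical stand-in for an encoder: reads any stream in chunks
    ' and returns the sum of all its bytes.
    Function SumBytes(ByVal source As Stream) As Long
        Dim buffer(4095) As Byte
        Dim total As Long = 0
        Dim bytesRead As Integer = source.Read(buffer, 0, buffer.Length)
        While bytesRead > 0
            For i As Integer = 0 To bytesRead - 1
                total += buffer(i)
            Next
            bytesRead = source.Read(buffer, 0, buffer.Length)
        End While
        Return total
    End Function

    Sub Main()
        Using fromBytes As New MemoryStream(New Byte() {1, 2, 3})
            Console.WriteLine(SumBytes(fromBytes))   ' prints 6
        End Using
        Using fromText As New MemoryStream(Encoding.ASCII.GetBytes("AB"))
            Console.WriteLine(SumBytes(fromText))    ' prints 131
        End Using
    End Sub
End Module
```

A FileStream could be passed to the same function without any change, which is how a single implementation can sit behind several overloads.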
As you can see, streams are very versatile. That took care of almost all the overloads! A stream also has no fixed size, which sidestepped the problem of decoding multipart yenc files. Because data can be written anywhere in a seekable stream, I didn't have to sort the parts to write them sequentially. I opened a stream, positioned it at the offset of the part (parsed from the part header) and just dumped the decoded data into it.

Other Optimizations
One other object of the .NET Framework made the amazing speed increase possible: the StringBuilder object. If you have to concatenate large strings, I strongly recommend using it. In some measurements I made, string concatenation was 250 times faster with this object. It is ideal for this project, as an enormous part of the encoding/decoding process involves string concatenation.

A Few Words Of Advice
If speed is more important than presentation, don't declare the encoder/decoder with events. If you handle the progress event, decoding will be noticeably slower.

When encoding large files (larger than 10MB), don't encode them to memory. Use the overloads that encode to a file. Also, don't rely too much on the garbage collector. Always destroy variables; it's a good principle.

If you're trying to time the encoding/decoding process, be sure to disable any anti-virus software. The reason is that the written file streams won't close until the anti-virus has finished checking the file for viruses. For small files this may not affect the results noticeably, but for large files it can really make a mess of things.

Finally, if you're encoding really large files and you're fortunate enough to have two hard disks, make certain that you read from the slower one and write to the faster one. HDD writing is always slower than reading, and a HDD can't read and write at the same time (so if you read from and write to the same HDD, it will position its head, read a chunk of data, reposition the head, write another chunk, and so on).
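
The out-of-order write described in the Streams section can be sketched in a few lines. This is an illustration only: WritePart is a hypothetical helper rather than the library's method, a MemoryStream stands in for the output file, and the offsets are hard-coded where a decoder would take them from the part headers.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module MultipartWriteSketch
    ' Hypothetical helper: writes one decoded part at its byte offset
    ' within the target stream, wherever the stream is currently positioned.
    Sub WritePart(ByVal target As Stream, ByVal offset As Long, ByVal data As Byte())
        target.Seek(offset, SeekOrigin.Begin)
        target.Write(data, 0, data.Length)
    End Sub

    Sub Main()
        Using target As New MemoryStream()
            ' The second part arrives first; no sorting is needed.
            WritePart(target, 5, Encoding.ASCII.GetBytes("World"))
            WritePart(target, 0, Encoding.ASCII.GetBytes("Hello"))
            Console.WriteLine(Encoding.ASCII.GetString(target.ToArray()))   ' prints HelloWorld
        End Using
    End Sub
End Module
```

Writing past the current end simply extends the stream, so the gap left for the first part is filled in when that part shows up.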

Credits
Documentation was compiled to XML by VB.Doc, a free documentation system for the VB.NET programming language. The help file was generated by NDoc.

Feedback and Improvements
Feel free to post questions, enhancements or problems to the forum below. I'll keep an eye on them and help where possible. If you have an enhancement or optimization, post it so that everyone may benefit from it. I'll review submissions and add them to the project with due credit.