
Introduction
This is an implementation of the yEnc algorithm, as described at http://www.yenc.org/ . yEnc is not an official standard, but it is nonetheless a very popular encoding method on binary newsgroups. As algorithms go, yEnc is very simple. It uses 8-bit characters to encode binary data. Since binary data is usually stored as 8-bit bytes, it does not have to accomplish too much :)
yEnc's popularity is due to the fact that it uses nearly the full 8-bit range of each byte to encode the data, whereas older methods such as UUencode and Base64 restrict themselves to 7-bit-safe characters and carry only 6 bits of data per character. According to the website, those methods add 33-40% overhead, while yEnc adds only 1-2%. Smaller means quicker to upload and download, which is important when dealing with large binary files. It has additional benefits as well, in the form of an optional CRC32 check.
My implementation is interesting from 2 points of view:
- It is the only open-source one written in C#
- It is implemented as a cryptographic transform - more on that later
Some Info on yEnc
There are some peculiarities of newsgroup messages:
- messages must be broken into lines, max around 1000 characters
- some characters have meaning, and as such need to be escaped out
The current yEnc algorithm escapes CR, LF, the NULL character and the escape character "=" itself by default. However, individual encoders are free to escape other characters as they wish. By convention, lines are broken at 128 or 256 characters; other line lengths are supported.
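The escaping rule can be sketched in a few lines. This is a minimal illustration of the byte transform as documented at yenc.org (add 42 to each byte, and for a critical result write "=" followed by the result plus 64), not the code from the download. The class name YEncSketch and its methods are hypothetical, and line breaking is deliberately left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of the yEnc byte transform; not the article's YEncEncoder.
public static class YEncSketch
{
    private const byte Escape = (byte)'=';

    // Characters that must be escaped: NULL, LF, CR and the escape character itself.
    private static bool IsCritical(byte b)
    {
        return b == 0x00 || b == 0x0A || b == 0x0D || b == Escape;
    }

    public static byte[] Encode(byte[] input)
    {
        var output = new List<byte>(input.Length + input.Length / 32);
        foreach (byte b in input)
        {
            // Add 42 modulo 256; unchecked lets the value wrap instead of throwing.
            byte encoded = unchecked((byte)(b + 42));
            if (IsCritical(encoded))
            {
                output.Add(Escape);
                encoded = unchecked((byte)(encoded + 64));
            }
            output.Add(encoded);
        }
        return output.ToArray();
    }

    public static byte[] Decode(byte[] input)
    {
        var output = new List<byte>(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            byte b = input[i];
            if (b == Escape && i + 1 < input.Length)
            {
                // Undo the extra offset of 64 applied to an escaped character.
                b = unchecked((byte)(input[++i] - 64));
            }
            output.Add(unchecked((byte)(b - 42)));
        }
        return output.ToArray();
    }
}
```

A real decoder would also have to skip the CR/LF line terminators that the encoder inserts between lines; the sketch assumes they have already been removed.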
yEnc data begins with a =ybegin tag at the start of a line. The tag has additional attributes that specify the number of bytes to expect, as well as the name of the file and the length of the lines. Multipart messages are supported. The data ends with a line starting with =yend. The reason for the "=y" is that, due to the nature of the algorithm, it could never occur naturally as part of the data.
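Although my code does not parse these tags, the layout of a header line is simple enough to sketch. The following is only an illustration, assuming space-separated key=value attributes with name as the last one (so that file names may contain spaces); the class YEncHeaderSketch and its Parse method are hypothetical and not part of the download.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for reading "=ybegin" / "=yend" lines; not part of the article's code.
public static class YEncHeaderSketch
{
    // Parses a line such as "=ybegin line=128 size=123456 name=my file.bin".
    // Returns null if the line does not start with the given tag.
    public static Dictionary<string, string> Parse(string line, string tag)
    {
        if (line == null || !line.StartsWith(tag + " ", StringComparison.Ordinal))
            return null;

        var fields = new Dictionary<string, string>(StringComparer.Ordinal);
        string rest = line.Substring(tag.Length + 1);

        // "name" is treated as the last attribute so that it may contain spaces.
        // Padding with a space ensures we only match a whole " name=" keyword.
        int nameAt = (" " + rest).IndexOf(" name=", StringComparison.Ordinal);
        if (nameAt >= 0)
        {
            fields["name"] = rest.Substring(nameAt + 5).Trim();
            rest = rest.Substring(0, nameAt);
        }

        foreach (string pair in rest.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            if (eq > 0)
                fields[pair.Substring(0, eq)] = pair.Substring(eq + 1);
        }
        return fields;
    }
}
```

Calling Parse(line, "=ybegin") on a header line gives a dictionary from which the expected size and line length can be read; the same method can be pointed at a "=yend" line.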
Using the code
My implementation of the algorithm deals purely with encoding and decoding the data, not parsing of messages, or even parsing of the yEnc headers and footers. To me, that is a separate challenge, which I'll leave to someone else.
Initially, I started coding the encoder as an implementation of System.Text.Encoder. However, I soon realized that, although I could read the data as text, I was really dealing with bytes. Probably, that should have been obvious to me from the beginning, but sometimes it takes a while :( Eventually, I decided it would work best as an implementation of ICryptoTransform. This is not to imply that it is a cryptographic algorithm, just that it transforms data in similar ways - the size of the input data does not necessarily match the size of the output data. Microsoft chose to implement the Base64 transformation objects in a similar way.
The benefit is that you can use the objects together with a CryptoStream object, which is a fairly easy interface to use, and automatically adds support for streams. I'll stress again though, that this is not an encryption technique - I am just making use of existing Framework objects and interfaces to add power to my objects.
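To show the pattern with nothing but Framework classes, here is a small sketch that pushes bytes through any ICryptoTransform using a write-mode CryptoStream. The helper class TransformDemo is hypothetical; ToBase64Transform and FromBase64Transform are the Base64 transform objects mentioned above, and a yEnc transform would presumably plug in the same way.

```csharp
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper; works with any ICryptoTransform, encrypting or not.
public static class TransformDemo
{
    // Pushes bytes through the transform using a write-mode CryptoStream.
    public static byte[] Apply(ICryptoTransform transform, byte[] input)
    {
        using (var ms = new MemoryStream())
        {
            using (var cs = new CryptoStream(ms, transform, CryptoStreamMode.Write))
            {
                cs.Write(input, 0, input.Length);
                cs.FlushFinalBlock(); // emits any partial final block
            }
            return ms.ToArray(); // ToArray still works after the stream is closed
        }
    }
}
```

For example, TransformDemo.Apply(new ToBase64Transform(), bytes) returns the Base64 text as ASCII bytes, and passing that result with a FromBase64Transform returns the original bytes.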
To encode some yEnc data, your code might look like this:
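// Requires: using System.IO; using System.Security.Cryptography;
MemoryStream ms = new MemoryStream();
YEncEncoder encoder = new YEncEncoder();
CryptoStream cs = new CryptoStream(ms, encoder, CryptoStreamMode.Write);
StreamWriter w = new StreamWriter(cs);
w.Write("Test string");
w.Flush();               // push the text through to the CryptoStream
cs.FlushFinalBlock();    // make the transform emit its final block into ms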
To decode it again, the code might continue:
ms.Position = 0;
YEncDecoder decoder = new YEncDecoder();
CryptoStream cs2 = new CryptoStream(ms, decoder, CryptoStreamMode.Read);
StreamReader r = new StreamReader(cs2);
string finalText = r.ReadToEnd();
This is pretty standard code that you might write if you were encrypting your data; the only difference is that we are using the YEncEncoder and YEncDecoder instead of a system-supplied encryption algorithm.
Points of Interest
I have made use of Phil Bolduc's implementation of the CRC32 algorithm, found at http://www.codeproject.com/csharp/crc32_dotnet.asp . Unfortunately, it contained some bugs that consumed a significant amount of my time, and I had to make some modifications to make it work 100%. Other than that, the code is my own original work, not based on any other implementation. You are free to use it for whatever purpose you may desire, as long as you attribute it to me in the code comments.
The downloadable code includes a lot of NUnit tests, which test things to a point where I am comfortable that everything works. They should make the code easy to expand on for anyone who wants to add functionality.
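For readers unfamiliar with the checksum, the standard CRC-32 (the reflected polynomial 0xEDB88320 used by zip files) can be sketched as a table-driven loop. This is an independent illustration, assuming yEnc uses that standard variant; it is neither Phil Bolduc's code nor the modified version in the download, and the class name Crc32Sketch is hypothetical.

```csharp
// Hypothetical, self-contained CRC-32 illustration; not the implementation in the download.
public static class Crc32Sketch
{
    private static readonly uint[] Table = BuildTable();

    // Precomputes the CRC of every possible byte value.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[n] = c;
        }
        return table;
    }

    // Computes the CRC-32 of a whole buffer in one call.
    public static uint Compute(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }
}
```

A decoder would compute this over the decoded bytes and compare it with the value carried in the trailer, when one is present.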
Message board
I'm using VB.NET, which doesn't have a direct equivalent for the C# "unchecked" keyword, so this is giving me a little bit of grief. If I simply remove it then I obviously get an OverflowException.
Should I be catching the OverflowException and truncating it manually?
Any help would be greatly appreciated.
(Fantastic component if I can get it working btw, I'm glad I'm not the only one interested in Usenet these days - although I notice this article is rather old!)
I'm using unchecked when dealing with bytes so that they just wrap around, e.g. 255+1=0, 255+2=1, 1-2=255.
I don't think you should trap an exception, because you still need to get a good value back out - it would be better to figure out the equivalent logic in VB.
It's hard though, and I do not know what it should be. Sounds like a question for stackoverflow.com
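To make the wrap-around concrete: in C#, an unchecked cast keeps only the low eight bits, and masking with 0xFF gives the same result without relying on overflow behavior at all, which is the part that should carry over to a language with no unchecked keyword. A small sketch; the class and method names are hypothetical.

```csharp
// Hypothetical illustration of byte wrap-around with and without "unchecked".
public static class WrapSketch
{
    // Relies on unchecked truncation: the int result is cut down to its low 8 bits.
    public static byte AddUnchecked(byte value, int delta)
    {
        return unchecked((byte)(value + delta));
    }

    // Same result via masking; the value is always in 0..255 before the cast,
    // so this does not throw even where overflow checking is switched on.
    public static byte AddMasked(byte value, int delta)
    {
        return (byte)((value + delta) & 0xFF);
    }
}
```

In VB.NET the masked form would presumably be written as CByte((value + delta) And &HFF), though I have not tried it against this component.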
Thanks for getting back to me.
jmcilhinney at VBForums suggested using a Short for the result and then taking the lower-order Byte (with any overflow discarded in the higher-order Byte). I'll give it a go shortly but I've just stuck the original C# version in a .dll for now.
I've been playing with it this morning but I'm struggling to get the right number of bytes after decoding. I'm expecting 228,217 but I'm getting 228,224. The most frustrating thing is that I can open the zip file and see the contents but it won't extract!
So I'm hoping it's something very simple. The input Byte array I'm giving the decoder contains the raw data received from the NNTP server. I give it the offset and length I've already calculated to be the start/end points of the raw yEnc data. I expect it is the offset and/or length that are slightly off. I am giving it everything between the start of the line following the line containing the "=ybegin..." and the line before the "=yend". Should I be including the trailing \r\n, for example? I believe I'm starting the data in the correct place because when I open the output file next to the original in a hex editor, the first kilobyte or so is identical. However, differences start to appear before the end of the file, so I'm not sure it's as simple as that.
This is my code:
Dim ms As New MemoryStream
ms.Write(recvbytes, 0, received)
ms.Seek(0, SeekOrigin.Begin)
Dim sr As StreamReader = New StreamReader(ms, Encoding.ASCII, False)
Dim strHeaders As New StringBuilder
For i As Integer = 1 To 4
    strHeaders.AppendLine(sr.ReadLine)
Next
Dim mymatch As Match = Regex.Match(strHeaders.ToString, "=ybegin part=(\d+) line=(\d+) size=(\d+) name=(.+\.zip)")
Dim strParts As String = mymatch.Groups(1).Value
Dim strLines As String = mymatch.Groups(2).Value
Dim iSize As Integer = mymatch.Groups(3).Value
Dim strFilename As String = mymatch.Groups(4).Value
Dim iDataStartOffset As Integer = strHeaders.Length
ms.Seek(iDataStartOffset, SeekOrigin.Begin)
Dim yend(4) As Byte
Dim bytesread As Integer
Dim iYencLength As Integer
Dim enc As New ASCIIEncoding
Do
    bytesread = ms.Read(yend, 0, 5)
    ms.Seek(-4, SeekOrigin.Current)
    iYencLength = iYencLength + 1
Loop Until enc.GetString(yend) = "=yend" Or bytesread = 0
iYencLength = iYencLength - 3
Dim fs As FileStream = File.Open(strDataPath & "downloaded\" & strFilename & ".ntx", FileMode.Create, FileAccess.Write)
fs.Write(recvbytes, iDataStartOffset, iYencLength)
fs.Flush()
fs.Close()
' Decode
Dim decoder As New YEncDecoder()
Dim decoded(256000) As Byte
Dim decodedbytes As Integer = decoder.GetBytes(recvbytes, iDataStartOffset, iYencLength, decoded, 0, True)
' Save the actual file
fs = File.Open(strDataPath & "downloaded\" & strFilename, FileMode.Create, FileAccess.Write)
fs.Write(decoded, 0, decodedbytes)
fs.Flush()
fs.Close()
' Warn about file size
If decodedbytes <> iSize Then MsgBox("Incorrect file size... probably corrupt.")
It appears my code is in fact working fine.
If I encode the same file using your code or yEnc32 (yenc32.com) then I can decode it fine. It's only when I retrieve it from Usenet that it fails.
So either my sockets code is screwing it up (unlikely as all it does is stick it in a byte array) or YencPowerPost is doing something funny. This becomes more confusing when I download the same file using NewsLeecher and it works perfectly.
Using a packet sniffer I have confirmed that YencPowerPostA&A11b is generating the yEnc data I am unable to decode.
I really cannot get my head around this. Newsleecher (and I'm sure lots of other clients) are able to decode messages posted using PowerPost. I've used both for years.
So far I have not been able to find another program capable of posting binaries to test with. I guess my only option is to code my own...
Man.. I was happy to find your class, but it seems to fail with some yEncoded files, even though I can decode them successfully with other programs.
I'll see if I can fix it..
Forgive me.. I was wrong.. duh!!!
Guys, when you use this class, make sure you use the proper "encoding" for reading the file... You know.. those, ahem, "nfo" files...
What encoding did you use for nfo files? I am trying to write a simple console app that will grab only nfo files from certain groups, decode them and put them in a db. For some reason, so far they keep coming out just a little bit off. The only thing I can think of is that I have the encoding wrong on something.
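The thread does not record which encoding solved the problem. One general point that may apply: if yEnc data passes through a string at any stage, the text encoding has to map every byte value from 0 to 255 to a distinct character. ASCII does not do that (bytes above 127 come back as "?"), whereas ISO-8859-1 does. The sketch below only demonstrates that difference; the class name EncodingSketch is hypothetical.

```csharp
using System.Text;

// Hypothetical demonstration of lossy versus byte-preserving text encodings.
public static class EncodingSketch
{
    // Decodes bytes to a string and encodes them back using the same encoding.
    public static byte[] RoundTrip(Encoding encoding, byte[] data)
    {
        return encoding.GetBytes(encoding.GetString(data));
    }
}
```

Passing Encoding.GetEncoding("iso-8859-1") returns the input unchanged for any byte values, while Encoding.ASCII replaces every byte above 127, which is enough to corrupt yEnc data.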
Hi.
I have a yEnc'ed file stored in a text file. I'm trying to decode it and save it as a binary file again. However, I can't find out how. Apparently, CryptoStream doesn't support seeking (.Length doesn't work). Because of this, my code doesn't work. Can anyone point me in the right direction please? Here's my code (Working on a VB project... sorry!)
Dim decoder As New YEncDecoder
Dim file as string = "myfile.yenc"
Dim filename as string = "myfile.bin"
ms = New MemoryStream
fs = New FileStream(file, FileMode.Open)
Dim b(CInt(fs.Length)) As Byte
fs.Read(b, 0, CInt(fs.Length))
ms.Write(b, 0, CInt(fs.Length))
ms.Position = 0
Dim cs As New CryptoStream(ms, decoder, CryptoStreamMode.Read)
fs2 = New FileStream(GetOutputPath(keyId, Nothing, fileName), FileMode.Create)
Dim b2(CInt(cs.Length)) As Byte
cs.Read(b2, 0, CInt(cs.Length))
fs2.Write(b2, 0, CInt(cs.Length))
Here's how I would do it, in a sort of mangled mix of VB and C#...
fs2 = New FileStream(GetOutputPath(keyId, Nothing, fileName), FileMode.Create)
Dim cs As New CryptoStream(ms, decoder, CryptoStreamMode.Read)
byte[] checkBytes = new byte[16];
bytesRead = cs.Read(checkBytes, 0, 16);
while (bytesRead > 0)
{
    fs2.Write(checkBytes, 0, bytesRead);
    bytesRead = cs.Read(checkBytes, 0, 16);
}
cs.Flush();
The "16" is arbitrary -- you should probably use a bigger number if you are working with large files.
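The same loop written out as a complete C# helper might look like this. It is a sketch only: the class StreamCopySketch and its method are hypothetical, the 4096-byte default is just one reasonable buffer size, and any ICryptoTransform can be passed in, so a yEnc decoder object would presumably work the same way as the Framework's FromBase64Transform.

```csharp
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: copies transformed data without ever asking for Length.
public static class StreamCopySketch
{
    // Reads transformed data until Read returns 0, writing each chunk to destination.
    // Note: disposing the CryptoStream also closes the source stream.
    public static long CopyTransformed(Stream source, ICryptoTransform transform,
                                       Stream destination, int bufferSize = 4096)
    {
        long total = 0;
        var buffer = new byte[bufferSize];
        using (var cs = new CryptoStream(source, transform, CryptoStreamMode.Read))
        {
            int bytesRead;
            while ((bytesRead = cs.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, bytesRead);
                total += bytesRead;
            }
        }
        return total;
    }
}
```

The return value is the number of decoded bytes written, which can be compared with the size given in the header.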
I never wrote any code to parse messages, or to create or decode multipart messages. I would like to write code like that, but I do not yet have a clear vision of what that code might look like. If I get around to it, I'll post it here.
Until then, I have no idea how to deal with multipart posts - that would depend entirely on the application that you write to parse or create messages! Nor do I have any examples of usage, except for what you see in the unit tests.
I've never actually used the component, except in testing. I only know it works because of the unit tests. Sorry I can't help you more.