Text files generally contain several "lines" of text, separated by "linefeeds" (ASCII 10).
On some operating systems, the file will also contain "carriage returns" (ASCII 13).
The developers of the original C library decided that, for portability, the programmer shouldn't need to know the details.
So, in C you just write '\n' to a file opened in text mode, and the runtime library will add a '\r' if the platform calls for it. Likewise, when reading a text file that contains '\r\n', the '\r' is not returned to the program.
And it was good.
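To make that concrete, here is a rough sketch in Python rather than C, since Python's `open()` lets me force the on-disk terminator with its `newline` argument and so imitate what a C runtime does in text mode on a "\r\n" platform. The helper name `roundtrip` is made up for illustration.

```python
import os
import tempfile

def roundtrip(text, disk_newline):
    """Write text with disk_newline as the on-disk terminator, then return
    (raw bytes on disk, text as a text-mode reader sees it)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # On write, every '\n' is translated to disk_newline.
        with open(path, "w", encoding="ascii", newline=disk_newline) as f:
            f.write(text)
        # Binary mode shows exactly what landed on disk.
        with open(path, "rb") as f:
            raw = f.read()
        # newline=None (the default) maps '\r\n', '\r' and '\n' back to '\n'.
        with open(path, "r", encoding="ascii", newline=None) as f:
            seen = f.read()
        return raw, seen
    finally:
        os.remove(path)

raw, seen = roundtrip("one\ntwo\n", "\r\n")
print(raw)                   # bytes on disk include the '\r'
print(seen == "one\ntwo\n")  # the program only ever dealt with '\n'
```

The program wrote '\n' and read '\n'; the '\r' existed only on disk.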
But the designers of .NET decided otherwise: they think the programmer should know the details. So we usually have to specify the '\r' ourselves or use System.Environment.NewLine, which is supposed to aid portability (not that they expect .NET to be ported) but which will do just the opposite in the end.
In theory, on a Windows system NewLine will equate to "\r\n" and on some other system it may be "\n" or "\r" or who knows what.
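Here is a small sketch of one way this goes wrong, again in Python as a stand-in (not .NET). It assumes a lower layer that already translates '\n' to "\r\n", which I imitate with `io.TextIOWrapper(..., newline="\r\n")`; the constant `PLATFORM_NEWLINE` is hypothetical and plays the role of NewLine on a "\r\n" platform.

```python
import io

# Hypothetical stand-in for a platform-specific newline constant.
PLATFORM_NEWLINE = "\r\n"

def write_through_translating_layer(text):
    """Write text through a layer that turns each '\n' into '\r\n',
    and return the bytes that come out the other side."""
    buf = io.BytesIO()
    writer = io.TextIOWrapper(buf, encoding="ascii", newline="\r\n")
    writer.write(text)
    writer.flush()
    return buf.getvalue()

# The program spells out the terminator AND the layer below translates:
doubled = write_through_translating_layer("line" + PLATFORM_NEWLINE)
# The program just says '\n' and leaves the rest to the layer below:
clean = write_through_translating_layer("line\n")

print(doubled)
print(clean)
```

When both the program and the layer beneath it try to handle the terminator, the '\r' ends up doubled; when only one of them does, the output is clean.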
The problem, as I see it, is that this won't work on OpenVMS -- OpenVMS supports many types of files, including many types of text files.
On an OpenVMS system, you can have text files with "\r\n", "\n", or "\r", or with Fortran carriage control, or just about anything else. So what will NewLine be set to on OpenVMS if .NET gets ported to it? I suspect they will settle on "\n" and let the operating system figure it out -- just like in C.
|
Probably depends on how dynamic the newline functionality is in OpenVMS. Is it just something that is supported across the board, or is it a user setting? That is, does one go to the OpenVMS control panel (or whatever that OS has) and set the preferred newline character(s)? Or do you just get files from various sources with various newlines, and most of the OpenVMS software can handle any of them? What about new files -- do they get a certain type of newline, or does it depend on the application? Then you've got to wonder if it supports mixed-mode line endings (that is, a single file with both \n and \r\n). I see a small chance that they'd modify the meaning of \n for mixed-mode line endings, but it seems unlikely otherwise. The only reason I can think of for modifying \n would be so that it could match the line endings of nearby lines in a block of text. That just seems overly complicated though, so I'd guess they would stick with \n meaning \n and pick either \n or \r\n for Environment.NewLine.
And then there's System.Text.RegularExpressions. I'm pretty sure newlines are (and must be) well defined in that assembly. By default (that is, without RegexOptions.Singleline), "." matches every character except \n, so it will still pick up \r. And this is on Windows, where the standard newline is \r\n. Seems to me that they would want to keep \n unambiguous so that, at least, a regular expression written on one platform won't change meaning when ported to another. By the way, here is a regular expression that recognizes both \n and \r\n newlines, though not a bare \r (I use this little puppy quite often):
\r?\n
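A quick sketch of that pattern in action. I'm using Python's `re` module here as an assumption-laden stand-in for System.Text.RegularExpressions; for these particular patterns the two behave the same way ("." skips only \n unless told otherwise).

```python
import re

# The pattern from above: an optional '\r' followed by '\n'.
NEWLINE = re.compile(r"\r?\n")

# A file with mixed line endings splits cleanly on either form.
mixed = "alpha\r\nbeta\ngamma"
lines = NEWLINE.split(mixed)

# A bare '\r' (old Mac style) is not recognized as a newline.
bare_cr = NEWLINE.split("alpha\rbeta")

# '.' stops at '\n' but happily consumes the '\r' in front of it.
dot_match = re.match(r".*", "ab\r\ncd").group()

print(lines)
print(bare_cr)
print(repr(dot_match))
```

So the "." behavior means a pattern like `.*` leaves a stray \r on the end of each match when the input has \r\n endings, which is worth remembering before trimming or comparing the result.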
|
It's a per-file setting. I just did some experimenting to be sure I knew what I was talking about.
When I create a new text file, it has the attributes:
Record format: Variable length, maximum 255 bytes, longest 7 bytes
Record attributes: Carriage return carriage control
When reading it in "binary" mode I don't see \r or \n characters (I expected to see them). I believe the operating system (Record Management Services) stores the line length rather than inserting "special characters".
I can convert it to Stream_LF:
Record format: Stream_LF, maximum 255 bytes, longest 7 bytes
Record attributes: Carriage return carriage control
Then I see \n characters (as expected).
I can convert it to Stream_CR:
Record format: Stream_CR, maximum 0 bytes, longest 7 bytes
Record attributes: Carriage return carriage control
And again I don't see \r or \n characters (I expected to see \r).
When I read them in "text" mode, all three have \n embedded.
When I FTP them to Windows, all three have \r\n embedded.
The point is, what the program sees isn't necessarily what is written to disk.
All the program needs to know is that one line ends and another begins; the rest is just details.
But .NET seems to want the program to know the details; this is a step backward.
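To illustrate the "line length instead of special characters" idea, here is a toy sketch in Python. The 2-byte little-endian length prefix is my simplifying assumption for illustration, not a faithful description of the RMS on-disk format, and all four function names are made up.

```python
import struct

def encode_variable(lines):
    """Toy 'variable length' format: 2-byte length, then the line's bytes.
    No terminator characters are stored at all."""
    out = bytearray()
    for line in lines:
        data = line.encode("ascii")
        out += struct.pack("<H", len(data)) + data
    return bytes(out)

def decode_variable(blob):
    """Read length-prefixed records back into a list of lines."""
    lines, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        lines.append(blob[pos:pos + length].decode("ascii"))
        pos += length
    return lines

def encode_stream_lf(lines):
    """Toy 'Stream_LF' format: each line followed by '\n'."""
    return "".join(line + "\n" for line in lines).encode("ascii")

def decode_stream_lf(blob):
    """Split on '\n' and drop the empty piece after the final terminator."""
    return blob.decode("ascii").split("\n")[:-1]

lines = ["hello", "world"]
var_blob = encode_variable(lines)
lf_blob = encode_stream_lf(lines)

print(var_blob)  # no '\n' anywhere in the raw bytes
print(lf_blob)   # '\n' after each line
print(decode_variable(var_blob) == decode_stream_lf(lf_blob))
```

Two quite different byte layouts, and a program reading through the decoders sees the same list of lines either way.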
|