Delphi CSV File and String Reader Classes






TnvvCSVFileReader and TnvvCSVStringReader are lightweight, fast classes that resemble a unidirectional data set.
Introduction
The classes presented here are functionally identical to the classes described in the article C# CSV File and String Reader Classes and have the same set of public methods and properties, which are explained there in detail. Everything said in that article also applies here. I recommend reading that article first, since I am not going to repeat everything here. There are some minor differences, such as variable types, but the mapping between Delphi and C# specifics is obvious. Below, I outline the CSV reader features and provide information specific to using the Delphi code.
Version 2.0 update: Version 2.0 significantly improves performance and adds encoding control to TnvvCSVFileReader, which resulted in slight modifications to the public interfaces of the base TnvvCSVReader class (minor change) and the derived TnvvCSVFileReader class (more significant change) compared to version 1.0. The differences are explained in the section "Notable Difference between Delphi and C# CSV Reader Classes" below. Performance-related notes are in the "History" section below.
TnvvCSVFileReader and TnvvCSVStringReader are lightweight, fast classes that resemble a unidirectional data set. They are very simple to use and have properties that allow handling a number of existing variations of CSV and “CSV-like” formats.
Both classes are derived from the abstract TnvvCSVReader class, which does not specify a data source and instead works with an instance of the TTextReader class.
TnvvCSVFileReader and TnvvCSVStringReader accept a file and a string as data sources, respectively. They introduce additional “CSV source” related properties and override the abstract method that returns an instance of a specific TTextReader descendant:
function CreateDataSourceReader: TTextReader; virtual; abstract;
Classes for other CSV data sources can be created in a similar way.
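As a rough illustration of that last point, a reader for an arbitrary TStream might look like the sketch below. The unit name, the class name TMyCSVStreamReader, its Source property, the choice of UTF-8 and the protected visibility of the override are all assumptions made for this example. It also assumes that the base class takes ownership of the returned TTextReader. Check the unit source before relying on any of this.

```pascal
unit MyCSVStreamReader; // hypothetical unit name

interface

uses
  Classes, SysUtils, Nvv.IO.CSV.Delphi.NvvCSVClasses;

type
  // Hypothetical descendant: reads CSV data from any TStream.
  TMyCSVStreamReader = class(TnvvCSVReader)
  private
    FSource: TStream; // supplied by the caller, not owned by this class
  protected
    // Assumption: the base method is declared in the protected section.
    function CreateDataSourceReader: TTextReader; override;
  public
    property Source: TStream read FSource write FSource;
  end;

implementation

function TMyCSVStreamReader.CreateDataSourceReader: TTextReader;
begin
  if FSource = nil then
    raise Exception.Create('Source stream is not assigned');
  // TStreamReader does not take ownership of the stream by default,
  // so the caller remains responsible for freeing it.
  Result := TStreamReader.Create(FSource, TEncoding.UTF8);
end;

end.
```

Usage would then follow the same Open / Next / Close pattern as the two provided classes, with Source assigned before Open is called.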
CSV Reader Features
- Supports three kinds of line delimiters: <CR>, <CR><LF> and <LF>, all of which can be present in the same CSV file simultaneously. Consequently, the <LF><CR> pair will result in an empty line. This situation can nonetheless be handled by setting the property IgnoreEmptyLines to true.
- Presence of a header in the very first record of the file is controlled by the boolean property HeaderPresent.
- Empty lines can be ignored (by default, they are not ignored).
- The number of fields is auto-detected (by default) on the basis of the first record, or must be set explicitly if auto-detection is off.
- The field separator is a comma (0x2C) by default, but virtually any (Unicode) character can be used, for example, TAB.
- Field quoting allows multi-line field values and the presence of quote and field separator characters within a field. By default, it is assumed that a field may or may not be enclosed in quotes, but the reader can be instructed not to use field quoting.
- The quote character is a double quote (0x22) by default, but virtually any (Unicode) character can be used. It is assumed that the quote character is also used as the escape character.
- The Unicode range of character codes is assumed by default, but it can be limited to ASCII only by setting the corresponding property to true.
- Characters with codes below 0x20 (and above 0x7E in the ASCII case) are considered “special characters” and by default must not appear in the file. This requirement does not affect line delimiters, nor the field separator and/or quote character if they are from this range. As an option, the reader can be instructed to simply ignore special characters.
- The reader itself does not use buffering. It uses just enough memory to store the field names and the field values of the current record. If any buffering takes place, then standard Delphi classes like TStreamReader and TStringReader are responsible for it.
Version 2.0 update: Version 2.0 does use buffering in order to significantly improve performance. Performance-related notes are in the "History" section below.
- The reader is expected to be fast, since it reads each character directly from TTextReader and analyzes each character just once, i.e., it does one-pass parsing. The parser also uses minimal conditional logic.
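The line-delimiter rule in the first item can be illustrated with a small stand-alone program. This is not the reader's parsing code, only a minimal sketch of the same rule: <CR>, <CR><LF> and <LF> each end a line, so an <LF><CR> pair ends two lines and leaves an empty one between them. The names SplitLines and LineDelimiterDemo are made up for the example, and 1-based string indexing (desktop Delphi) is assumed.

```pascal
program LineDelimiterDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Splits AText into lines, treating <CR>, <CR><LF> and <LF> as delimiters.
procedure SplitLines(const AText: string; ALines: TStrings);
var
  i, start: Integer;
begin
  ALines.Clear;
  start := 1;
  i := 1;
  while i <= Length(AText) do
  begin
    if (AText[i] = #13) or (AText[i] = #10) then
    begin
      ALines.Add(Copy(AText, start, i - start));
      // <CR><LF> counts as a single delimiter; <LF><CR> does not.
      if (AText[i] = #13) and (i < Length(AText)) and (AText[i + 1] = #10) then
        Inc(i);
      start := i + 1;
    end;
    Inc(i);
  end;
  // Last line without a trailing delimiter.
  if start <= Length(AText) then
    ALines.Add(Copy(AText, start, Length(AText) - start + 1));
end;

var
  lines: TStringList;
  k: Integer;
begin
  lines := TStringList.Create;
  try
    // <CR><LF> after the first record, <LF><CR> after the second.
    SplitLines('a,b'#13#10'c,d'#10#13'e,f', lines);
    for k := 0 to lines.Count - 1 do
      Writeln(k, ': [', lines[k], ']');
  finally
    lines.Free;
  end;
end.
```

With this input the third line comes out empty, which is the case that IgnoreEmptyLines is meant to absorb.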
Using the Code
Use is straightforward. Simply create an instance of the corresponding class, specify the source of the CSV data, modify some properties if necessary, call Open, and iterate through the records by calling Next. Within each record, iterate through the field values. Call Close when done.
Using TnvvCSVFileReader Class
uses Nvv.IO.CSV.Delphi.NvvCSVClasses;

procedure ReadCSVFile(const ACSVFilePath: string);
var
  csvReader: TnvvCSVFileReader;
  i: Integer;
begin
  //Constructor can have parameter that, if >0 and <>512(default), sets buffer size in chars
  csvReader := TnvvCSVFileReader.Create;
  try
    //Specify source CSV data file using one of the three overloaded methods.
    //If, for example, it is ASCII file:
    csvReader.SetFile(ACSVFilePath, TEncoding.ASCII);
    // Modify values of other input properties if necessary. For example:
    csvReader.HeaderPresent := True;
    csvReader.Open;
    if (csvReader.HeaderPresent) then
      for i:=0 to csvReader.FieldCount-1 do
        DoSomethingWithFieldName(csvReader.Fields[i].Name);
    while (not csvReader.Eof) do
    begin
      for i:=0 to csvReader.FieldCount-1 do
        DoSomethingWithFieldValue(csvReader.Fields[i].Value);
      csvReader.Next;
    end;
    csvReader.Close;
  finally
    csvReader.Free;
  end;
end;
Using TnvvCSVStringReader Class
uses Nvv.IO.CSV.Delphi.NvvCSVClasses;

procedure ReadCSVString(const ACSVString: string);
var
  csvReader: TnvvCSVStringReader;
  i: Integer;
begin
  //Constructor can have parameter that, if >0 and <>512(default), sets buffer size in chars
  csvReader := TnvvCSVStringReader.Create;
  try
    csvReader.DataString := ACSVString; // Assign string containing CSV data
    // Modify values of other input properties if necessary. For example:
    csvReader.HeaderPresent := True;
    csvReader.Open;
    if (csvReader.HeaderPresent) then
      for i:=0 to csvReader.FieldCount-1 do
        DoSomethingWithFieldName(csvReader.Fields[i].Name);
    while (not csvReader.Eof) do
    begin
      for i:=0 to csvReader.FieldCount-1 do
        DoSomethingWithFieldValue(csvReader.Fields[i].Value);
      csvReader.Next;
    end;
    csvReader.Close;
  finally
    csvReader.Free;
  end;
end;
Notable Difference between Delphi and C# CSV Reader Classes
The Delphi counterpart defines an event in the following way:
property OnFieldCountAutoDetectComplete : TNotifyEvent
{- This event fires from within Open if FieldCount_AutoDetect is true. Use of this event is
   optional since the "auto-detected" FieldCount is available upon completion of Open anyway.}
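A handler for this event could be wired up roughly as follows. Since TNotifyEvent is a method pointer, the handler has to be a method of some object. The TCSVLoader class and its members are hypothetical names used only for this sketch, which is meant to sit in the implementation section of a unit in a console application. It also assumes that FieldCount already holds the detected value at the moment the event fires.

```pascal
uses
  Classes, SysUtils, Nvv.IO.CSV.Delphi.NvvCSVClasses;

type
  // Hypothetical helper class; only the reader members come from the unit.
  TCSVLoader = class
  private
    FReader: TnvvCSVStringReader;
    procedure FieldCountDetected(Sender: TObject);
  public
    procedure Load(const ACSVString: string);
  end;

procedure TCSVLoader.FieldCountDetected(Sender: TObject);
begin
  // Called from within Open once auto-detection has finished
  // (assumption: FieldCount is already valid here).
  Writeln('Auto-detected field count: ', FReader.FieldCount);
end;

procedure TCSVLoader.Load(const ACSVString: string);
begin
  FReader := TnvvCSVStringReader.Create;
  try
    FReader.DataString := ACSVString;
    // Field count auto-detection is on by default, so the event will fire.
    FReader.OnFieldCountAutoDetectComplete := FieldCountDetected;
    FReader.Open;
    while not FReader.Eof do
      FReader.Next;
    FReader.Close;
  finally
    FreeAndNil(FReader);
  end;
end;
```

One reason to use the event rather than reading FieldCount after Open is to prepare per-field structures (column arrays, grid columns) at a single, well-defined point.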
Starting with version 2.0:
- The constructor of TnvvCSVReader, and consequently the constructors of TnvvCSVFileReader and TnvvCSVStringReader, have an optional parameter that defines the capacity, in chars, of the buffer between CSVReader and the source stream. Experiments show that increasing the size beyond the default of 512 does not give a visible performance improvement.
constructor Create( ABufferReadFromStreamCapacityInChars: Integer = 512 ); override;
- Instead of the read-write property FileName, TnvvCSVFileReader uses three overloaded SetFile methods to specify the source file. These methods correspond to the three overloaded constructors of TStreamReader, with the same sets of parameters and the same parameter meanings. Calling a particular form of SetFile results in a call to the corresponding TStreamReader constructor when TnvvCSVFileReader instantiates TStreamReader internally to actually read the file. Note that Delphi's TStreamReader with default encoding settings, unlike .NET's StreamReader, may not be very good at automatically detecting the source's encoding. Sometimes TStreamReader returns just part of the source data, causing CSVReader to generate an error (such as "wrong number of fields"); sometimes it appears to just "hang". Therefore, if an attempt to read CSV data using SetFile with a single file name parameter, or with some encoding parameters, generates an error, it is possible that TStreamReader needs more, or more accurate, information about the source's encoding (the AEncoding and/or ADetectBOM parameters).
procedure SetFile( const AFileName: string ); overload;
procedure SetFile( const AFileName: string; ADetectBOM: Boolean ); overload;
procedure SetFile( const AFileName: string; AEncoding: TEncoding;
ADetectBOM: Boolean = False; AStreamReaderInternBufferSize: Integer = 1024 ); overload;
- TnvvCSVFileReader has five read-only properties that are related to the source file. Their values are set by the above-mentioned SetFile methods. The meaning of the first four properties is obvious. The property StreamReader_ConstructorKind has the type TStreamReaderConstructorKind, and its value shows which TStreamReader constructor (with regard to the set of parameters) is called when the stream reader is instantiated.
type
TStreamReaderConstructorKind = ( srckFile, srckFileBOM, srckFileEncodingBOMBuffsize );
property FileName: string read FFileName;
property StreamReader_Encoding: TEncoding read FStreamReader_Encoding;
property StreamReader_DetectBOM: Boolean read FStreamReader_DetectBOM;
property StreamReader_InternBufferSize: Integer read FStreamReader_InternBufferSize;
property StreamReader_ConstructorKind: TStreamReaderConstructorKind
read FStreamReader_ConstructorKind;
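To make the choice of overload concrete, the sketch below opens a file as UTF-8 with BOM detection enabled and then reads back the properties that SetFile populates. The procedure name OpenUtf8CSV and the use of Writeln (a console application is assumed) are illustrative choices. That the third overload maps to srckFileEncodingBOMBuffsize is inferred from the enumeration names, not confirmed.

```pascal
uses
  SysUtils, Nvv.IO.CSV.Delphi.NvvCSVClasses;

procedure OpenUtf8CSV(const ACSVFilePath: string); // hypothetical helper
var
  csvReader: TnvvCSVFileReader;
begin
  // Default buffer capacity (512 chars) is used.
  csvReader := TnvvCSVFileReader.Create;
  try
    // Third overload: explicit encoding plus BOM detection.
    // The internal TStreamReader buffer size keeps its default of 1024.
    csvReader.SetFile(ACSVFilePath, TEncoding.UTF8, True);

    // The read-only properties reflect what was passed to SetFile.
    Writeln('File: ', csvReader.FileName);
    Writeln('Detect BOM: ', csvReader.StreamReader_DetectBOM);
    Writeln('Stream reader buffer: ', csvReader.StreamReader_InternBufferSize);
    // Presumably set by the third overload (inferred from the enum names).
    if csvReader.StreamReader_ConstructorKind = srckFileEncodingBOMBuffsize then
      Writeln('Using the (file, encoding, BOM, buffer size) constructor');

    csvReader.Open;
    try
      while not csvReader.Eof do
        csvReader.Next;
    finally
      csvReader.Close;
    end;
  finally
    csvReader.Free;
  end;
end;
```

If a file fails to read with the single-parameter SetFile, switching to an explicit encoding in this way is the first thing to try.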
Downloading Source Code
The following source code, which should work with Delphi 2009 (and later versions), is available for download above:
- Unit "Nvv.IO.CSV.Delphi.NvvCSVClasses.pas" containing the classes TnvvCSVReader, TnvvCSVFileReader and TnvvCSVStringReader.
- The code part of the main form of a VCL Forms Application that tests both the TnvvCSVFileReader and TnvvCSVStringReader classes is in the "CSVReaderTest_MainForm.pas" file. Detailed instructions on how to quickly create the test application are provided at the beginning of the file.
History
Version 2.0 (2015-03-10)
- Significantly improved performance, roughly seven times for TnvvCSVFileReader and four times for TnvvCSVStringReader, due to the following:
  - Use of a TCharArray buffer for reading chars from the stream object in bigger chunks and then reading single chars from this new buffer. This was done mainly because of the very inefficient handling by Delphi's TStreamReader of its internal buffer (see issue description for example here and here).
  - Use of a dynamic TCharArray, instead of String, as the buffer for the field value currently being assembled. The array grows dynamically in increments of 128 to accommodate the longest field value. Incidentally, a dynamic array is at least two times more efficient here than TStringBuilder.
  - Merging frequently called methods/procedures into one big procedure at the expense of code structure and readability. Apparently the cost of a procedure call is significant.
- Added encoding control to TnvvCSVFileReader. See the "Notable Difference between Delphi and C# CSV Reader Classes" section above for details.
Version 1.0 (2014-06-08)
- First release