
Minimalistic CSV Streams

SV Wong, 2 Apr 2014
Write and read CSV in a few lines of code!

Introduction

MiniCSV is a small, single-header library based on C++ file streams, and it is comparatively easy to use. Without further ado, let us see some code in action.

Writing

Let us look at an example of writing tab-separated values to a file using the csv::ofstream class. Tab is a good choice of separator because it seldom appears in the data. I once encountered a comma in a company name, and it broke the CSV processing.
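The comma problem is easy to reproduce. The sketch below uses a hypothetical helper, split_line (not part of MiniCSV), that splits a record on a single delimiter with no quoting support, roughly what a simple CSV reader does. The company name "Acme, Inc." is made up for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line on a single-character delimiter.
// There is no quoting support, so a delimiter inside a value always
// starts a new field.
std::vector<std::string> split_line(const std::string& line, char delim)
{
    assert(delim != '\n');  // this sketch works on one line at a time
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, delim))
        fields.push_back(field);
    return fields;
}
```

With a comma delimiter, split_line("Acme, Inc.,200,15", ',') returns four fields ("Acme", " Inc.", "200", "15") instead of three, while the tab-delimited record "Acme, Inc.\t200\t15" splits into the expected three. The MiniCSV example that follows writes its records with a tab delimiter for this reason.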

#include "minicsv.h"
#include <string>

struct Product
{
    Product() : name(""), qty(0), price(0.0f) {}
    Product(std::string name_, int qty_, float price_) 
        : name(name_), qty(qty_), price(price_) {}
    std::string name;
    int qty;
    float price;
};

int main()
{
    csv::ofstream os("products.txt", std::ios_base::out);
    os.set_delimiter('\t');
    if(os.is_open())
    {
        Product product("Shampoo", 200, 15.0f);
        os << product.name << product.qty << product.price << NEWLINE;
        Product product2("Soap", 300, 6.0f);
        os << product2.name << product2.qty << product2.price << NEWLINE;
    }
    return 0;
}

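NEWLINE is a macro defined as '\n'. We cannot use std::endl here because csv::ofstream is not derived from std::ofstream, and its generic << operator template cannot accept std::endl, which is itself a function template. The reason for not deriving from std::ofstream is to avoid copying a std::ofstream during a << call. We will see this soon enough in the source code section.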
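The restriction comes from template argument deduction rather than from the file stream itself: a generic operator<< taking const T& cannot deduce T from std::endl. The sketch below uses a hypothetical Writer class (not MiniCSV's) to show the usual workaround, the same one std::ostream relies on: a non-template overload for manipulator function pointers. MiniCSV takes the simpler route of the NEWLINE character instead.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical wrapper: like csv::ofstream it is not derived from
// std::ostream and forwards values through a template operator<<.
class Writer
{
public:
    explicit Writer(std::ostream& os) : m_os(os) {}
    std::ostream& stream() { return m_os; }
private:
    std::ostream& m_os;
};

// Generic overload: deduction fails for std::endl, so on its own
// 'w << std::endl' would not compile.
template<typename T>
Writer& operator<<(Writer& w, const T& val)
{
    w.stream() << val;
    return w;
}

// Non-template overload for manipulators such as std::endl: the concrete
// parameter type lets the compiler pick std::endl<char, std::char_traits<char>>.
Writer& operator<<(Writer& w, std::ostream& (*manip)(std::ostream&))
{
    manip(w.stream());
    return w;
}
```

With both overloads in place, w << "Soap" << 300 << std::endl writes "Soap300\n" and flushes the underlying stream.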

Reading

To read the same file back, we use csv::ifstream, and std::cout to display the items read on the console.

#include "minicsv.h"
#include <iostream>
#include <string>

// Product: the same struct as defined in the writing example

int main()
{
    csv::ifstream is("products.txt", std::ios_base::in);
    is.set_delimiter('\t');
    if(is.is_open())
    {
        Product temp;
        while(!is.eof())
        {
            is >> temp.name >> temp.qty >> temp.price;
            // display the read items
            std::cout << temp.name << "," << temp.qty << "," << temp.price << std::endl;
        }
    }
    return 0;
}

The console output is as follows:

Shampoo,200,15
Soap,300,6

ofstream

First, we look at the ofstream class, its constructors and its data member.

struct ostruct
{
    ostruct() : after_newline(true), delimiter(',') {}
    std::ofstream ostm;
    bool after_newline;
    char delimiter;
};
class ofstream
{
public:
    ofstream()
    {
        m_ptr = std::shared_ptr<ostruct>(new ostruct());
    }
    ofstream(const char * file, std::ios_base::openmode mode)
    {
        m_ptr = std::shared_ptr<ostruct>(new ostruct());
        m_ptr->ostm.open(file, mode);
    }
    ofstream(ofstream& other)
    {
        m_ptr = other.m_ptr;
    }
    ...
private:
    std::shared_ptr<ostruct> m_ptr;
};

The data member is only a smart pointer to ostruct. For pre-C++11 compilers, std::shared_ptr can be replaced with boost::shared_ptr by defining USE_BOOST_PTR at the top of the header. The smart pointer is there to avoid copying every member of ostruct during a << operator call, which is probably premature optimization. after_newline tracks whether a delimiter must be written before the next value; no delimiter is written right after a linefeed. The remaining member functions either forward to std::ofstream or are simple mutators and accessors.

void open(const char * file, std::ios_base::openmode mode)
{
    m_ptr->ostm.open(file, mode);
}
void flush()
{
    m_ptr->ostm.flush();
}
void close()
{
    m_ptr->ostm.close();
}
bool is_open()
{
    return m_ptr->ostm.is_open();
}
void set_delimiter(char delimiter)
{
    m_ptr->delimiter = delimiter;
}
char get_delimiter() const
{
    return m_ptr->delimiter;
}
std::shared_ptr<ostruct>& get_ptr()
{
    return m_ptr;
}
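Because the only data member is a std::shared_ptr, copying a csv::ofstream (for example through its copy constructor) gives a second handle to the same stream and settings rather than an independent copy. The C++11 sketch below illustrates this with hypothetical State and Handle types that mirror just the delimiter member; they are not MiniCSV's classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for ostruct, keeping only the settings.
struct State
{
    char delimiter = ',';
    bool after_newline = true;
};

// Hypothetical stand-in for csv::ofstream: the only member is the pointer.
class Handle
{
public:
    Handle() : m_ptr(std::make_shared<State>()) {}
    // Copying a Handle copies the pointer, not the State it points to.
    Handle(const Handle& other) = default;

    void set_delimiter(char d)
    {
        // this sketch rejects line terminators, which would break line-based reading
        assert(d != '\n' && d != '\r');
        m_ptr->delimiter = d;
    }
    char get_delimiter() const { return m_ptr->delimiter; }
    long use_count() const { return m_ptr.use_count(); }

private:
    std::shared_ptr<State> m_ptr;
};
```

After Handle b = a; b.set_delimiter('\t');, a.get_delimiter() also returns '\t' and both handles report a use count of 2. The same holds for csv::ofstream: a delimiter or after_newline change made through one copy is visible through every other copy.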

What follows are the non-member << operators. The first << operator is a template, so csv::ofstream supports every data type that std::ofstream can handle. Therefore, to write custom data types, overload the << operator for std::ofstream (or its base class, std::ostream), not for csv::ofstream! The second, a specialization for char, tracks the linefeed and sets after_newline to true.

#define NEWLINE '\n'

template<typename T>
csv::ofstream& operator << (csv::ofstream& ostm, const T& val)
{
    if(!ostm.get_ptr()->after_newline)
        ostm.get_ptr()->ostm << ostm.get_ptr()->delimiter;

    ostm.get_ptr()->ostm << val;

    ostm.get_ptr()->after_newline = false;

    return ostm;
}
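template<>
inline csv::ofstream& operator << (csv::ofstream& ostm, const char& val)
{
    if(val==NEWLINE)
    {
        // end of record: write the linefeed (std::endl also flushes)
        ostm.get_ptr()->ostm << std::endl;

        ostm.get_ptr()->after_newline = true;
    }
    else
    {
        // any other char is an ordinary field, delimited like the generic case
        if(!ostm.get_ptr()->after_newline)
            ostm.get_ptr()->ostm << ostm.get_ptr()->delimiter;

        ostm.get_ptr()->ostm << val;

        ostm.get_ptr()->after_newline = false;
    }

    return ostm;
}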
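The delimiter logic is easier to watch on an in-memory stream. The sketch below is a simplified, hypothetical DelimWriter (not MiniCSV's API) that applies the same after_newline rule to any std::ostream. It also puts the advice above into practice: the made-up Price type only overloads << for std::ostream, and the generic field writer picks it up. In this sketch, a char other than NEWLINE is written as an ordinary delimited field.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

#define NEWLINE '\n'

// Hypothetical writer with csv::ofstream's after_newline rule.
class DelimWriter
{
public:
    DelimWriter(std::ostream& os, char delimiter)
        : m_os(os), m_delimiter(delimiter) {}

    // Generic field: any type std::ostream can already print.
    template<typename T>
    DelimWriter& operator<<(const T& val) { return write_field(val); }

    // Non-template overload is preferred for char: NEWLINE ends the
    // record, any other char is written as an ordinary field.
    DelimWriter& operator<<(char ch)
    {
        if (ch != NEWLINE)
            return write_field(ch);
        m_os << '\n';
        m_after_newline = true;
        return *this;
    }

private:
    template<typename T>
    DelimWriter& write_field(const T& val)
    {
        if (!m_after_newline)
            m_os << m_delimiter;  // no delimiter right after a linefeed
        m_os << val;
        m_after_newline = false;
        return *this;
    }

    std::ostream& m_os;
    char m_delimiter;
    bool m_after_newline = true;
};

// A user-defined type: overloading << for std::ostream is enough.
struct Price { int cents; };

std::ostream& operator<<(std::ostream& os, const Price& p)
{
    return os << p.cents / 100 << '.'
              << (p.cents % 100 < 10 ? "0" : "") << p.cents % 100;
}
```

Writing "Shampoo", 200 and Price{1500} followed by NEWLINE, then "Soap" and 'A' followed by NEWLINE, with a tab delimiter produces "Shampoo\t200\t15.00\nSoap\tA\n".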

ifstream

Like csv::ofstream, csv::ifstream holds a smart pointer to a structure, istruct. As you can see, even their constructors are similar.

struct istruct
{
    istruct() : str(""), pos(0), delimiter(',') {}
    std::ifstream istm;
    std::string str;
    size_t pos;
    char delimiter;
};
class ifstream
{
public:
    ifstream()
    {
        m_ptr = std::shared_ptr<istruct>(new istruct());
    }
    ifstream(const char * file, std::ios_base::openmode mode)
    {
        m_ptr = std::shared_ptr<istruct>(new istruct());
        m_ptr->istm.open(file, mode);
    }
    ifstream(ifstream& other)
    {
        m_ptr = other.m_ptr;
    }
    ...
private:
    std::shared_ptr<istruct> m_ptr;
};

All the member functions delegate their calls to std::ifstream except for one heavy-duty function, get_delimited_str. The main reason get_delimited_str does not use strtok is that strtok treats consecutive delimiters as a single delimiter, which is a serious problem for CSV processing. For instance, strtok treats ",," as one delimiter, not as two delimiters with an empty string between them.

void open(const char * file, std::ios_base::openmode mode)
{
    m_ptr->istm.open(file, mode);
}
void close()
{
    m_ptr->istm.close();
}
bool is_open()
{
    return m_ptr->istm.is_open();
}
bool eof() const
{
    return m_ptr->istm.eof();
}
void set_delimiter(char delimiter)
{
    m_ptr->delimiter = delimiter;
}
char get_delimiter() const
{
    return m_ptr->delimiter;
}
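std::string get_delimited_str()
{
    std::string str = "";
    char ch = '\0';
    do
    {
        // current line exhausted: fetch the next one
        if(m_ptr->pos>=m_ptr->str.size())
        {
            if(!m_ptr->istm.eof())
            {
                std::getline(m_ptr->istm, m_ptr->str);
                m_ptr->pos = 0;
            }
            else
                break;

            // a field ends at the end of its line
            if(!str.empty())
                return str;

            // an empty line (or a failed final getline) yields an empty field
            if(m_ptr->str.empty())
                break;
        }

        ch = m_ptr->str[m_ptr->pos];
        ++(m_ptr->pos);
        // a delimiter or a stray line terminator ends the field
        if(ch==m_ptr->delimiter||ch=='\r'||ch=='\n')
            break;

        str += ch;
    }
    while(true);

    return str;
}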
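To make the strtok problem concrete, the sketch below splits the same record two ways using hypothetical helpers, strtok_fields and scan_fields (neither is part of MiniCSV). strtok collapses ",," into one separator and loses the empty field; a character-by-character scan that ends a field at every delimiter, in the spirit of get_delimited_str, keeps it.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Tokens as strtok sees them: runs of delimiters collapse, so empty
// fields disappear. strtok modifies its input, so work on a copy.
std::vector<std::string> strtok_fields(const std::string& line, const char* delims)
{
    assert(delims != nullptr);
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');
    std::vector<std::string> out;
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
        out.push_back(tok);
    return out;
}

// Every delimiter ends a field, so ",," yields an empty field between them.
std::vector<std::string> scan_fields(const std::string& line, char delim)
{
    std::vector<std::string> out;
    std::string field;
    for (char ch : line)
    {
        if (ch == delim) { out.push_back(field); field.clear(); }
        else field += ch;
    }
    out.push_back(field);  // last field (possibly empty)
    return out;
}
```

For "Soap,,6", strtok_fields returns two tokens ("Soap", "6") while scan_fields returns three fields ("Soap", "", "6"), so the column positions stay intact. strtok also keeps hidden static state between calls, which is another reason to avoid it inside a library.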

Now, we'll look at the >> operators. The first, a template, calls get_delimited_str and uses std::istringstream to convert the field to the target data type. The second, a specialization for std::string, does not use std::istringstream, because extracting a string from std::istringstream stops at the first whitespace character and would split a field such as "Hair Shampoo". It is advisable to switch to boost::lexical_cast by defining USE_BOOST_LEXICAL_CAST, because std::istringstream is slow and its conversion is not robust. For example, converting an empty string to an integer fails silently: the failure is never reported, so the value can quietly end up as zero!

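// requires <sstream> for std::istringstream
template<typename T>
csv::ifstream& operator >> (csv::ifstream& istm, T& val)
{
    std::string str = istm.get_delimited_str();
    // convert the field text to T; a failed conversion is not reported
    std::istringstream is(str);
    
    is >> val;

    return istm;
}
// std::string: take the whole field as-is, including any spaces
template<>
inline csv::ifstream& operator >> (csv::ifstream& istm, std::string& val)
{
    val = istm.get_delimited_str();

    return istm;
}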
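Both weaknesses mentioned above are easy to observe with plain std::istringstream: extracting a std::string stops at the first space, and a failed numeric conversion is only recorded in the stream's failbit, which the generic >> operator never checks. If Boost is not an option, a checked conversion can be sketched with the standard library alone. try_convert below is a hypothetical helper, not part of MiniCSV, that fails on empty fields and on trailing garbage.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical checked conversion of one field: succeeds only if the
// whole field was consumed (surrounding whitespace allowed).
template<typename T>
bool try_convert(const std::string& field, T& out)
{
    std::istringstream is(field);
    T val{};
    if (!(is >> val))
        return false;          // empty or non-numeric field
    is >> std::ws;             // skip trailing whitespace
    if (!is.eof())
        return false;          // leftover characters, e.g. "12abc"
    out = val;                 // only touch the output on success
    return true;
}
```

For example, try_convert(std::string(""), n) returns false and leaves n untouched, try_convert(std::string("300"), n) stores 300, and try_convert(std::string("12abc"), n) is rejected because "abc" remains unread. Reading "Hair Shampoo" with >> into a std::string yields only "Hair", which is why MiniCSV specializes >> for std::string.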

Conclusion

MiniCSV is a minimalistic CSV library based on C++ file streams. Initially, I planned to base it on my Elmax file library for UTF-8, but that is a monolithic library! To keep things small, I chose file streams. File streams are not as fast as the C file APIs, but MiniCSV should be adequate for most tasks that involve parsing small CSV files. MiniCSV is hosted at Google Code. Thank you for reading!

License

This article, along with any associated source code and files, is licensed under The MIT License

About the Author

SV Wong
Software Developer
Singapore

Currently into areas like 3D graphics and application security. Hoping to revisit the cryptography and design pattern topics if time permits.


Last Updated 2 Apr 2014
Article Copyright 2014 by SV Wong