Posted 19 Sep 2016

C++17: string_view Conversion to Integral Types

5 Jun 2018 | CPOL | 3 min read
Implementing string_view conversion to integral types using Boost Spirit Qi v2

Rationale

Before discussing string_view, we have to revisit a C API, strtok(), whose purpose is to split a char array into tokens.

    char * strtok ( char * str, const char * delimiters );

    #include <cstdio>
    #include <cstring>   // strtok, memcpy
    #include <algorithm> // std::replace
    #include <iterator>  // std::begin, std::end

    int main()
    {
        using namespace std;

        char str[] = "Apple,Orange,Mango";
        /* dup_arr will hold a copy of str later */
        char dup_arr[sizeof(str)];
        const char s[2] = ",";
        char *token;

        /* display str address */
        printf("str address:%p\n", static_cast<void*>(str));

        /* get the first token */
        token = strtok(str, s);

        /* walk through other tokens */
        while (token != NULL)
        {
            /* display token */
            printf("%s\n", token);
            /* display token address */
            printf("token address:%p\n", static_cast<void*>(token));

            /* copy str into dup_arr */
            memcpy(dup_arr, str, sizeof(dup_arr));
            /* replace each embedded null char with '#' (the final terminator is kept) */
            std::replace(begin(dup_arr), end(dup_arr)-1, '\0', '#');
            /* display dup_arr */
            printf("%s\n", dup_arr);

            /* get next token */
            token = strtok(NULL, s);
        }

        return 0;
    }

The output is shown below. I added ^ to indicate where the token points to in str. I use # to represent the null terminator since it is a non-printable character.

    str address:008FF704

    Apple
    token address:008FF704
    Apple#Orange,Mango
    ^

    Orange
    token address:008FF70A
    Apple#Orange#Mango
          ^

    Mango
    token address:008FF711
    Apple#Orange#Mango
                 ^

strtok has two problems. First, it modifies its str parameter, although its redeeming point is that it is very fast because it does not have to allocate a string for each token. Second, it cannot return empty tokens: consecutive delimiters are treated as a single separator, so for an input such as ",," strtok finds no token at all and returns null, which prematurely signals to the client code that it has reached the end of str.

This is where C++17 string_view comes to the rescue. A string_view contains a char pointer and a size as data members and includes many of the useful member functions of std::string. Its length does not include a null terminator, meaning a string_view does not have to be null-terminated, which makes it a perfect candidate for writing a const-correct C++17 strtok. But this article is not about writing a string_view version of strtok.
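Although a full strtok replacement is outside the scope of this article, a minimal sketch may help show why string_view fits the job. The helper name split_view is hypothetical and not part of the article's download; it assumes a single delimiter character and returns views into the original buffer, so the buffer must outlive the result.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the article's source code): splits `text`
// on `delim` without modifying or copying it. Empty tokens are preserved.
std::vector<std::string_view> split_view(std::string_view text, char delim)
{
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    while (true)
    {
        const std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos)
        {
            // Last token: everything after the final delimiter (may be empty).
            tokens.push_back(text.substr(start));
            break;
        }
        // Token is the range [start, pos); it is empty when pos == start.
        tokens.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return tokens;
}
```

With this sketch, "Apple,Orange,Mango" yields three views and ",," yields three empty views, while the input itself is left untouched.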

This in-place string modification is widely used in fast XML DOM parsers such as RapidXML and pugixml. RapidJSON is a JSON parser that makes use of this technique as well.

    // original xml text
    <Fruit name="Orange" type="Citrus" />
    // mutated xml text
    <Fruit#name#"Orange# type#"Citrus# />
     ^     ^     ^       ^     ^

An XML parser cannot totally avoid string allocation: if the text must remain strictly immutable and needs to be unescaped (shown below), or if the text is modified to be longer, a new string has to be allocated.

    <Food name="Ben &amp; Jerry" type="Ice Cream" />

    "Ben &amp; Jerry" needs to be unescaped to "Ben & Jerry"
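The allocation in the immutable case can be sketched as follows. The function unescape_amp is a hypothetical illustration that handles only the &amp; entity; a real XML parser also has to deal with the other predefined entities and character references.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (illustration only): returns a newly allocated string
// in which every "&amp;" in `text` is replaced by "&". The input view is
// never modified, which is why a separate std::string is required.
std::string unescape_amp(std::string_view text)
{
    constexpr std::string_view entity = "&amp;";
    std::string out;
    out.reserve(text.size()); // the result is never longer than the input
    std::size_t start = 0;
    while (true)
    {
        const std::size_t pos = text.find(entity, start);
        if (pos == std::string_view::npos)
        {
            out.append(text.substr(start)); // copy the remaining tail
            break;
        }
        out.append(text.substr(start, pos - start)); // copy text before the entity
        out.push_back('&');
        start = pos + entity.size();
    }
    return out;
}
```

Calling unescape_amp("Ben &amp; Jerry") returns "Ben & Jerry" as a std::string, while text without entities is simply copied.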

Conversion to float and integer

For conversion, we use Boost Spirit Qi. str_to_value is an overloaded function template that works with std::string, string_view and char arrays (but not char pointers). For demo purposes, we use Boost string_ref because string_view is not yet available in Visual C++. For simplicity, the other overloads are not shown; readers can view them in str_to_value.h. float, short, long and long long, together with their unsigned counterparts, are supported.

    #include <string>
    #include <iostream>
    #include <boost/utility/string_ref.hpp>
    #include <boost/spirit/include/qi.hpp>

    // Converts src to double; returns false if no number could be parsed.
    template<typename string_type>
    inline bool str_to_value(const string_type& src, double& dest)
    {
        namespace qi = boost::spirit::qi;

        return qi::parse(std::cbegin(src), std::cend(src), qi::double_, dest);
    }

    // Converts src to int; returns false if no number could be parsed.
    template<typename string_type>
    inline bool str_to_value(const string_type& src, int& dest)
    {
        namespace qi = boost::spirit::qi;

        return qi::parse(std::cbegin(src), std::cend(src), qi::int_, dest);
    }

    int main(int argc, char *argv [])
    {
        boost::string_ref srd("123.456");
        double d = 0.0;
        if (str_to_value(srd, d))
        {
            std::cout << d << std::endl; // display 123.456
        }

        boost::string_ref srn("123");
        int n = 0;
        if (str_to_value(srn, n))
        {
            std::cout << n << std::endl; // display 123
        }

        return 0;
    }

Boost Spirit Qi Benchmark

C++ String to Double Benchmark (Looping 1 million times)

Version 1.1.1 double benchmark

This is the latest double benchmark. It fixes the crack_atof scientific notation conversion problem and improves its performance by 10%, which puts it on par with fast_atof.

                  atof:  100ms
          lexical_cast:  648ms
    std::istringstream:  677ms <== Probably an unfair comparison since istringstream instantiates a string
             std::stod:  109ms
           std::strtod:   96ms
            crack_atof:    7ms
             fast_atof:    7ms <== do not use this one because its conversion is not correct
          boost_spirit:   17ms <== reported to be inaccurate in some cases
          google_dconv:   38ms
       std::from_chars:   71ms

C++ String to Integer Benchmark (Looping 10 million times)

                  atol:  243ms
          lexical_cast:  952ms
    std::istringstream: 5338ms
            std::stoll:  383ms
           simple_atol:   74ms
            sse4i_atol:   72ms
          boost_spirit:   78ms
       std::from_chars:   59ms
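Since std::from_chars appears in the integer benchmark, here is a minimal sketch of how it could be applied to a string_view. The name from_chars_to_value is hypothetical and is not part of str_to_value.h. The sketch assumes a C++17 standard library with <charconv> and adopts a stricter contract than strictly necessary: the whole input must be consumed. Note that std::from_chars accepts neither leading whitespace nor a leading '+'.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>
#include <type_traits>

// Hypothetical helper (not from the article's source code): converts `src`
// to an integer type using std::from_chars. Returns true only if the
// conversion succeeded AND every character of `src` was consumed.
// Intended for the integer types that std::from_chars supports (not bool).
template<typename T>
inline std::enable_if_t<std::is_integral_v<T>, bool>
from_chars_to_value(std::string_view src, T& dest)
{
    const char* first = src.data();
    const char* last = src.data() + src.size();
    const auto result = std::from_chars(first, last, dest);
    // ec is value-initialized on success; ptr points past the last digit read.
    return result.ec == std::errc() && result.ptr == last;
}
```

No null terminator is needed because the end of the input is passed explicitly as a pointer, which is exactly what a string_view can provide.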

Summary

  • string_view was originally called string_ref in Boost
  • A string_view is not necessarily null-terminated, so atoi() cannot be used safely
  • Use Boost Spirit Qi
    • Can be used for std::string and char arrays; char pointers are not supported
    • Or any class that provides cbegin() and cend()
    • Caveat: Boost Spirit Qi floating-point conversion has been reported to be inaccurate in some cases
Related Source Code Repositories are below.

History

  • 1st Oct 2016: First release
  • 5th Apr 2017: Updated string-to-float benchmark with strtod and Google double conversion
  • 6th June 2018: Uploaded floatbench 1.1.0 which fixes the crack_atof() scientific notation conversion problem and improves its performance by 10%. Thanks to Tian Bo.
  • 7th June 2018: Uploaded intbench 1.1.0 which includes std::from_chars in the benchmark. std::from_chars requires C++17 support and VC++ has problems compiling it for floating point conversion, thus it is only added in intbench for now.
  • 27th Oct 2018: Uploaded floatbench 1.1.2 which includes std::from_chars from VS 2017 Update 15.8 in the benchmark.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Shao Voon Wong
Software Developer (Senior)
Singapore
Shao Voon is from Singapore. CodeProject awarded him an MVP in recognition of his article contributions in 2019. In his spare time, he prefers writing applications based on 3rd party libraries to rolling his own. His interest lies primarily in computer graphics, software optimization, concurrency, security and Agile methodologies.

You can reach him by sending a message on CodeProject or at his Coding Tidbit Blog!
