Given a std::string, please remove all whitespace from it. How would you do it? Despite its seeming simplicity, it’s an interesting question, because it can be done in so many ways.
To start with, how do you identify whitespace? Let’s have a look at some different approaches (all of which I've seen in the wild):
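```cpp
#include <cctype>  // C isspace
#include <locale>  // C++ std::isspace(c, loc)
#include <string>

bool iswhitespace1(char c) {
    return (c == ' ') || (c == '\t') || (c == '\r') || (c == '\n');
}
bool iswhitespace2(char c) {
    static const std::string spaces(" \t\r\n");
    return (std::string::npos != spaces.find(c));
}
bool iswhitespace3(char c) {  // the C version; the cast keeps negative chars out
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}
bool iswhitespace4(char c) {
    static const std::locale loc;  // copy of the global locale at first call
    return std::isspace(c, loc);   // the C++ version
}
```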
If we were to run these four functions over every value of c from 0 to 255, the first two would agree with each other, and the last two would (probably) agree with each other, but the two pairs would not agree.
There are two reasons for this. First of all, the C and C++ isspace functions include a couple of often-forgotten whitespace characters: the vertical tab ('\v', 0x0b) and the form feed ('\f', 0x0c). They don’t see much use nowadays, but are still defined as whitespace in both the C and C++ standards.
The second reason the results from isspace may differ from a hard-coded solution is that both the C and C++ versions depend on the locale in use. A change of locale will never make any of the standard whitespace characters (" \t\r\n\v\f") stop being whitespace, but it may add further characters to the set.
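Since these functions already exist in the standard, it’s rather silly of us to write our own, so let’s just use isspace. Unless you muck about and change locales (and let’s not, if we can avoid it), the C and C++ versions behave the same way, so which you use is up to you. One caveat with the C version: its argument must be representable as an unsigned char (or be EOF), so a plain char should be cast to unsigned char first. Otherwise, on platforms where char is signed, characters above 127 are passed as negative values and the behaviour is undefined.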
Knowing how to identify whitespace characters, we only need to remove them. How do we do that? Well, that depends on whether we want to modify the string or create a copy. In either case, the simplistic, completely hand-made solutions, which once again we’d rather avoid, look something like this:
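```cpp
// In place: erase each whitespace character where it is found
std::string::size_type p = 0;
while (p < str.size())
    if (std::isspace(static_cast<unsigned char>(str[p]))) str.erase(p, 1); else ++p;
// Copy: append each non-whitespace character to output
for (std::string::size_type i = 0; i < str.size(); ++i)
    if (!std::isspace(static_cast<unsigned char>(str[i]))) output += str[i];
```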
Both these solutions work, but there are well-established, standardised ways of doing these things using algorithms.
Not obvious at first glance? OK, let’s break it up. The functions in the C++ <algorithm> header generally work on three types of parameters: iterators, predicates and function objects (aka functors). The calls we’re dissecting here don’t use any functors, so we’ll put them aside for the moment.
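&::isspace – predicate. This is simply a pointer to a function that takes one parameter and returns a truth value, in this case indicating whether a given character is whitespace or not, as discussed earlier. (Strictly speaking, isspace returns an int, with non-zero meaning “yes”, which the algorithms happily treat as a bool.) The leading :: selects the C function declared in the global namespace by <ctype.h>. Writing &std::isspace instead can fail to compile, because <locale> adds a two-argument template overload and the compiler can’t tell which one is meant.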
str.begin(), str.end() – iterators, in this case indicating where to start and stop running the algorithm. We want to go through the whole string, so we start at the beginning, and end at the, well, end.
str.erase(std::remove_if(...), str.end()); – this is the erase-remove idiom. Because the remove_if function only takes iterators, it can’t actually remove anything. What it can do is re-shuffle: it moves every element (here, every character in the string) that does not match the predicate to the front of the range, keeping their relative order, and returns an iterator to the new logical end, one past the last element kept. Whatever is left between that iterator and the end of the range has an unspecified value; it is not necessarily the removed whitespace. The returned iterator is then given to the string’s erase member function as the start of the characters to erase, and str.end() as the end.
std::back_inserter – iterator. This is a handy little helper that returns an output iterator for the given container, i.e. an iterator that appends each element written through it to the end of the container. (Unfortunately, Microsoft’s documentation still says the container given to it must be a std::deque, which is not true. The only real requirement is that the container has a push_back member function, which std::string does. Given how popular their development tools are, it’s surprising this hasn’t been amended.)
std::remove_copy_if – this is an amazingly poorly named function, which ought to be called std::copy_if_not. What it does is go through the range given (i.e. from begin to end), call the predicate (i.e. isspace) with each element in the range, and skip the element if the predicate returns true. It doesn’t remove anything from the input range (it can’t, as it only has iterators), and in fact doesn’t change the range it’s given at all. I guess that, conceptually, it removes each element for which the predicate is true from a list of elements to copy. Except there is no such list. In short: horrible name; it copies the elements that don’t fulfil the predicate.
So, there we are. Two simple and useful functions to remove whitespace:
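```cpp
#include <algorithm>  // std::remove_if, std::remove_copy_if
#include <ctype.h>    // ::isspace; expects char values that fit in unsigned char (e.g. ASCII)
#include <iterator>   // std::back_inserter
#include <string>

// Removes all whitespace from str, in place (erase-remove idiom)
void remove_whitespace(std::string& str)
{ str.erase(std::remove_if(str.begin(), str.end(), &::isspace), str.end()); }

// Appends the non-whitespace characters of input to output; input is untouched
void remove_whitespace(const std::string& input, std::string& output)
{ std::remove_copy_if(input.begin(), input.end(), std::back_inserter(output), &::isspace); }
```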
(Of course, if you really want to use std::locale, things start to get a bit… well, complicated. I might return to that at some later point.)