Just a really dumb question: you're sure you're looking at a Release build, and not a Debug build? I'm not sure that would even matter, unless MS provides an unoptimized libc for debugging?
Keep Calm and Carry On
I've tried building it in Release under several different configurations (different architectures and optimization levels) and I'm not getting much difference, which leads me to believe MSVC's strpbrk() is not SIMD-optimized, unlike gcc's stdlib (glibc) implementation.
Real programmers use butterflies
You're not wrong...
Here's some code that scans through a 1 GB string (finding a character at the very end of it) using the four equivalent-but-different approaches I could think of (std::string::find_first_of, std::string_view::find_first_of, std::find_first_of and strpbrk):
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <string>
#include <string_view>
int main()
{
    // 1 GB of spaces, with the only match ('c') at the very end
    std::string s(std::size_t(1024) * 1024 * 1024, ' ');
    s.back() = 'c';

    auto start = std::chrono::steady_clock::now();
    auto x = s.find_first_of("abc");
    auto end = std::chrono::steady_clock::now();
    std::cout << "std::string::find_first_of -> " << x << " in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;

    start = std::chrono::steady_clock::now();
    std::string_view s_as_view{s.c_str(), s.size()};
    auto x1 = s_as_view.find_first_of("abc");
    end = std::chrono::steady_clock::now();
    std::cout << "std::string_view::find_first_of -> " << x1 << " in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;

    start = std::chrono::steady_clock::now();
    std::string needle{"abc"};
    auto x2 = std::distance(std::begin(s), std::find_first_of(std::begin(s), std::end(s),
                                                              std::begin(needle), std::end(needle)));
    end = std::chrono::steady_clock::now();
    std::cout << "std::find_first_of -> " << x2 << " in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;

    start = std::chrono::steady_clock::now();
    auto y = std::distance(s.c_str(), strpbrk(s.c_str(), "abc"));
    end = std::chrono::steady_clock::now();
    std::cout << "strpbrk -> " << y << " in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;
}
and here's the output when compiled with cl.exe -std:c++17 -Ob2 -O2 -Os -EHsc a.cpp and run on the i7-6820HQ in my work laptop:
std::string::find_first_of -> 1073741823 in 552 ms
std::string_view::find_first_of -> 1073741823 in 557 ms
std::find_first_of -> 1073741823 in 2741 ms
strpbrk -> 1073741823 in 2359 ms
That's about 1.8 GB/s for the first two, and around 423 MB/s for strpbrk. However, when compiled with gcc-10 (with the command g++-10 -o ./a a.cpp -O3 -std=c++17) on Ubuntu 18.04 (same laptop; I'm using WSL), I get this:
std::string::find_first_of -> 1073741823 in 3341 ms
std::string_view::find_first_of -> 1073741823 in 3563 ms
std::find_first_of -> 1073741823 in 715 ms
strpbrk -> 1073741823 in 122 ms
That ranges from about 300 MB/s for the first two to about 8.2 GB/s for strpbrk...
honey the codewitch wrote:
Does anyone know if GCC will work on Windows without some virtual env like MinGW installed?
MinGW is actually OK - Cygwin is the 'gcc on Windows' that introduces nastiness. As this site says, "MinGW is a port of GCC to Windows. ... It produces standalone Windows executables which may be distributed in any manner." I'd use the distro from that site, or maybe one from this site
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
Naming things is hard, but really? strpbrk?
string pointer...? b? return? k? Really, what?
const char * strpbrk ( const char * str1, const char * str2 );
char * strpbrk ( char * str1, const char * str2 );
Locate characters in string
Returns a pointer to the first occurrence in str1 of any of the characters that are part of str2, or a null pointer if there are no matches.
The search does not include the terminating null-character of either string, but ends there.
"string pointer break" seems closest. The person that named it was probably drunk at the time.
Real programmers use butterflies
honey the codewitch wrote: "string pointer break" seems closest
Hah, amateur! I'd have gone for "spb"
Knowing the C stdlib it was probably already used for something.
Real programmers use butterflies
String Pointer BReaK. But you probably knew that. What you may not know is that this goes back to the dawn of Unix on a PDP with actual, real teletypes as I/O devices. Punching the keys on them was hard, so anything that could be abbreviated was. Thus cp, mv and ls rather than copy, move and list. Sure, only 2 chars each (abbrev, again!), but at the end of a day stabbing at the keys, it would make a difference ... if it only meant you could pick up that beer without wincing.
Keep Calm and Carry On
Sander Rossel wrote: I thought all that old stuff was abbreviated to save memory.
Sort of. I seem to recall that early linkers had only an 8- (or maybe 16-) character limit for external identifiers, so that too played a part in the naming of system functions.
Keep Calm and Carry On
I am sure that you are right.
My next question is how much time your typical application spends inside strpbrk(). I can imagine that you can set up testbeds where it exceeds one percent of the total CPU load. But that is a testbed.
Can you set up a true, user-level application, solving a true user problem, where more than a single percent of the CPU time is spent inside strpbrk()? At a single percent, doubling the speed of strpbrk() might speed up the application by a whopping half percent. Woooah!
Sure: I see that thirty or seventy-five such optimizations together might be significant, taken as a whole. So go ahead with the twenty-nine, or seventy-four, other optimizations. Then serve the pudding.
The proof of the pudding is the pudding you serve to the end user.
I am using real-world data collected from an online repository at TMDB.com. I have 200 kB of actual data from their repository, and then I synthesized 20 MB of similar data in the same schema. I could have downloaded 20 MB of JSON from TMDB.com; the only problem is that then I'm downloading 20 MB of data from TMDB.com, and their rate limiting will hate me.
Now, for a real-world scenario where you're actually using TMDB's data, you'll likely end up mirroring their repository as you retrieve parts of it. For example, their repository contains every show and movie you'll find at IMDb.com, but in JSON format. If I only want shows from 2019, I can get just those; the point is that the process is fetch-on-request, then cache. If you were to retrieve all the data, the entire repository would be mirrored locally.
It is from this mirror that I'd want to extract data.
So yes, that's a real-world scenario. I even have a C# library that does this for TMDB.com already, but not using this JSON parser, which is in C++.
I've profiled it using the GNU profiler on Linux, but nothing else.
Most of the function time is in strpbrk(), at least for long scans.
More importantly, I know my actual throughput: I'm currently getting 2/3 of the throughput I got on a Linux machine, on a Windows machine whose hardware is maybe 10 times as fast or more.
And I know what function primarily impacts that throughput, because I've already profiled.
It's skipToAny(), which in the best case uses strpbrk(); it can't on Arduinos, but it will on Windows.
Real programmers use butterflies
Want to use MS VC++ under Windows with VS Code?
Good luck. Microsoft, in their infinite wisdom:
A) Set it up so you can't use MSVC without running a batch file first.
B) Made the batch file completely unreadable. I can't even tell where it sets PATH. How do you even do that?
C) Is just generally terrible.
D) Negated all the "Run VS Code here" shell extensions, which are now useless because you need to launch Code from the batch file's environment in order for the compiler to work.
Tell me: why in the world would *anyone* think it was a good idea to install MSVC++, not put it in the PATH, and then make it near impossible for you to do it yourself? Why?
Do they *want* me to move away from Windows for all my C++ development?
Real programmers use butterflies
Well, possibly they want you to move away from C++ for all your Windows development ...
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
Probably. I just opened a nastygram of an issue over at the VS Code C++ extension's github repo.
This is just unacceptable.
Real programmers use butterflies
Hmmm, all you need to do is execute vcvarsall.bat to set up your environment?
honey the codewitch wrote: Tell me: Why in the world would *anyone* think it was a good idea to install MSVC++, not put it in the PATH
Because most C++ developers are using multiple tools and compilers (and multiple compiler versions). Keep in mind that Visual Studio allows you to compile with older versions of CL and ancient linkers.
Yes, I know how to do that.
That is not the problem.
The problem is that it kills my workflow. I can no longer click on my project folder and go "Open with VS Code", because Microsoft stinks.
If Microsoft didn't stink (like that will ever happen), they would run vcvars from inside VS Code when you're using the C++ extension.
Adding: "people use ancient compilers" isn't really an acceptable justification. Workflow shouldn't be broken for the standard compiler just because you might use an ancient one. You can OVERWRITE environment variables, after all. Linux gets it. Microsoft is clueless.
Real programmers use butterflies
Well,
VS Code is open-sourced under the MIT license, so you are free to modify the behavior. Or you can open an issue to request a new feature. It sounds like a great feature to add to the VS Code C++ extension.
I've opened an issue already.
Real programmers use butterflies
honey the codewitch wrote: I've opened an issue already.
Great.
In the old days I would always manually remove the build tools from the %PATH% environment variable.
I don't know how long you've been developing with C++ on Windows, but in the old days ('90s-2000s) the build tools were added to the %PATH% environment variable. That caused a lot of problems, because developers would install the Windows SDK, which had its own compiler and linker. Device driver developers would also install the DDK, which yet again had its own compiler and linker. Then there were some guys (like me) who would have VC6, VS2005, VS2008, VS2010, VS2012.NET and VS2013 all installed on the same workstation. I was so happy when VS2015 allowed me to compile with older versions - it meant I didn't have to install 5 different versions of Visual Studio.
Best Wishes,
-David Delaune
Randor wrote: which had it's own compiler and linker
I think I see the problem right there. I wonder why Microsoft didn't?
Randor wrote: Then there were some guys (like me) that would have VC6,VS2005,VS2008,VS2010,VS2012.NET,VS2013 all installed on the same workstation.
Sane thing (meaning not in the cards for MS): set the env vars to the latest compiler, and let the user run a batch file to set the env vars for older compilers. Better yet, make the latest compiler support older compilation.**
** which would have been easier if Microsoft hadn't spent years ignoring the C++ standard
Insane thing: make everyone's life harder by not having sane defaults, and by shipping crap compilers for years before finally deciding that standards matter.
Real programmers use butterflies
It's been a really long time, but I believe with VS2005 there was a post-install step, "Add build tools to environment" or some such. It caused a lot of problems, because developers would install VS2005 *after* VS2008/VS2010, and then they would go and compile the Boost library or something, and BJAM would use the older VS2005 compiler.
Anyway, what you are proposing is more viable today, now that VS integrates the older build tools.
Yeah, see, I wouldn't have made that decision. I would have included "Add build tools to environment" with every version, and made it replace the old PATH entries and such. More work, but way better in the end. I want to say (but it's a guess) that this is what happens when GCC is installed, or something similar, but I could be wrong.
Real programmers use butterflies
gcc (on Linux) generally installs different versions as symlinks in /usr/bin to architecture/version specific executables. Each symlink has the version appended - this is what my /usr/bin contains:
lrwxrwxrwx 1 root root 22 Dec 4 2019 /usr/bin/gcc-7 -> x86_64-linux-gnu-gcc-7
lrwxrwxrwx 1 root root 22 Mar 10 2020 /usr/bin/gcc-8 -> x86_64-linux-gnu-gcc-8
lrwxrwxrwx 1 root root 22 Apr 23 2020 /usr/bin/gcc-9 -> x86_64-linux-gnu-gcc-9
And then gcc (with no version) is a symlink to one of the versioned symlinks (e.g. gcc -> gcc-8 ). All this means that using a specific (major) version of gcc is pretty simple...
If you want to compile 32-bit code rather than 64-bit, then install g++-multilib and compile with the -m32 flag.
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
Randor wrote: Then there were some guys (like me) that would have VC6,VS2005,VS2008,VS2010,VS2012.NET,VS2013 all installed on the same workstation
As far as I'm concerned, VMs are the best thing to have been invented to keep my build systems clean and manageable. Separation of concerns and that type of thing.
My understanding is that containers are intended to take things one step further. Maybe it's because I'm now set in my ways, but nothing I've read so far about containers actually seems to make things simpler enough to make me want to change my methodology.
Same with multibooting. Why bother? In this day and age, as far as I'm concerned, rebooting is a sin. If someone has to abandon what they're working on to boot another OS, IMO they're doing it wrong.