|
This lib I wrote was originally in C#, and I ported it. I originally designed it (the C# version) to do bulk loads of data - basically exactly what you're doing, but perhaps a lot more of it.
Real programmers use butterflies
|
|
|
|
|
Whichever lets me stream the file into a database, without clogging any system resources.
Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger
|
|
|
|
|
I would want the "text" of the JSON to be well-formed (proper braces, quotes, commas, colons, brackets, etc.), but as to the contents, whether they map to the backing entity or not doesn't much matter, though obviously things would break if a collection is expected and it's not a collection, or vice versa. Same with automatic data type conversion.
So, yeah, basically I would want the "defensive driver" approach.
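As a rough illustration of that "defensive driver" idea, here is a minimal Python sketch that streams the text and raises at the first structural problem instead of reading the whole file first. The names `check_structure` and `MalformedJson` are made up for this example, and it only checks brace/bracket balance and string termination, so it is a cheap pre-filter under those assumptions rather than a full JSON validator.

```python
import io


class MalformedJson(ValueError):
    """Raised at the first structural error found in the stream."""


def check_structure(stream, chunk_size=4096):
    """Read a text stream chunk by chunk and fail fast on the first
    mismatched brace/bracket or unterminated string.

    Commas, colons, and literal values are NOT validated here.
    Returns the number of characters read.
    """
    closers = {"}": "{", "]": "["}
    stack = []          # open brackets seen so far
    in_string = False   # currently inside a "..." string
    escaped = False     # previous character was a backslash inside a string
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch in "{[":
                stack.append(ch)
            elif ch in closers:
                # Stop immediately: nothing after this point is read.
                if not stack or stack.pop() != closers[ch]:
                    raise MalformedJson(f"unexpected {ch!r} at offset {offset}")
            offset += 1
    if in_string:
        raise MalformedJson("unterminated string at end of input")
    if stack:
        raise MalformedJson(f"{len(stack)} unclosed bracket(s) at end of input")
    return offset
```

Calling `check_structure(open("dump.json", encoding="utf-8"))` (the file name is hypothetical) either returns the character count or raises with the offset of the first problem. Only one chunk is held in memory at a time.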
|
|
|
|
|
So what I'm hearing is that if it wasn't, you'd want to error as soon as you catch it, even if that meant a slower parse.
|
|
|
|
|
|
It depends on context, of course.
"In testa che avete, Signor di Ceprano?"
-- Rigoletto
|
|
|
|
|
I barely ever use JSON and have never written (nor am I likely to) a parser, but what I do know is that I can't answer a question like this without knowing the context.
- Is it more important to be fast 100% of the time and permit errors 1% of the time, or to be 100% reliable at the cost of a few percentage points in speed? (i.e. how critical is the data, and how critical is speed? This is a pretty common trade-off)
- Is the data coming from another system I / we have written, or a trusted partner, or from Joe Public? Is the data machine generated or hand-crafted?
|
|
|
|
|
|
First of all, this is a hypothetical. Second, hosting the .NET CLI in C++ just to use a .NET package from C++ to parse a little JSON seems heavy handed and horribly inefficient.
Plus, C# won't run on Arduinos.
|
|
|
|
|
|
I should add that I originally wrote it in C# and then ported it to C++.
Why did I write it in C#? Because I didn't know about Newtonsoft's JSON on the day I wrote it, and when I found out about it, it turned out Newtonsoft's pull parser sucks and is slow.
I'm glad I did.
People are religious about never reinventing the wheel, but it's not always such a bad thing - it depends on the wheel.
|
|
|
|
|
We use Newtonsoft with all of our Web APIs, etc., and have never had any noticeable issues with performance.
I guess if you are parsing big JSON files, then perhaps that is an issue, but we don't do that. So...
|
|
|
|
|
If you ever find yourself bulk loading JSON dumps into a database, you can do better. Hell, you could use my tiny JSON C# lib which is around here at CP somewhere.
|
|
|
|
|
Tell me when you make a parser for XML.
I'm loading 80 GB into a database every week, and XML (or rather the built-in tools) seriously isn't made for that.
|
|
|
|
|
Will do!
|
|
|
|
|
I load 51GB of XML with what SSIS has built-in. It takes about twelve minutes.
I load 5GB of JSON with my own parser. It takes about eight minutes.
I load 80GB of JSON with my own parser -- this dataset has tripled in size over the last month. It's now taking about five hours.
These datasets are in no way comparable, I'm just comparing the size-on-disk of the files.
I will, of course, accept that my JSON loader is a likely bottleneck, but I have nothing else to compare it against. It seemed "good enough" two years ago when I had a year-end deadline to meet.
I may also be able to configure my JSON Loader to use BulkCopy, as I do for the 5GB dataset, but I seem to recall that the data wasn't suited to it.
At any rate, I'm in need of an alternative, but it can't be third-party.
Next year will be different.
|
|
|
|
|
PIEBALDconsult wrote: I load 51GB of XML with what SSIS has built-in. It takes about twelve minutes.
How much memory do you have?
Early tests of mine ran out of memory.
Or have I done something wrong?
Mine takes an hour for 85GB XML, but that uses bulkcopy. Early versions without bulkcopy indicated that it would indeed take 5-6 hours.
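The gap between row-by-row inserts and a bulk copy comes mostly from batching: many rows per round trip and per transaction instead of one. Here is a minimal Python sketch of that idea. It uses the standard-library `sqlite3` module purely as a stand-in (it is not SqlBulkCopy or SSIS), and the `items` table, its columns, and the function names are hypothetical.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of up to `size` rows from any iterable, lazily."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_in_batches(conn, rows, batch_size=1000):
    """Insert (id, name) rows in batches, one transaction per batch.

    `rows` can be a generator fed by a streaming parser, so only one
    batch is held in memory at a time. Returns the number of rows loaded.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER, name TEXT)")
    total = 0
    for batch in batched(rows, batch_size):
        with conn:  # commits on success, rolls back if the batch fails
            conn.executemany(
                "INSERT INTO items (id, name) VALUES (?, ?)", batch
            )
        total += len(batch)
    return total
```

For example, `load_in_batches(sqlite3.connect(":memory:"), ((i, f"n{i}") for i in range(2500)))` loads 2,500 rows in three transactions. A suitable batch size depends on the database and the row width, so treat 1000 as a placeholder.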
|
|
|
|
|
I don't know what SSIS does internally, but I doubt it loads the entire XML document into memory all at once.
I don't know how much RAM or how many processors the servers have.
I ran the XML load on my laptop, which has 16GB of RAM, and memory usage increased by only four percent.
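For what it's worth, one common way to read a very large XML file without holding the whole document in memory is incremental parsing. The Python sketch below uses the standard library's `xml.etree.ElementTree.iterparse`. It is not a description of what SSIS does, and the `row` tag and the function name are assumptions for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag="row"):
    """Yield one dict per <tag> element, reading the XML incrementally.

    `source` is a filename or a binary file-like object. Each record is
    copied into a plain dict and the element is then cleared, so finished
    records do not pile up in memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            record = {child.tag: child.text for child in elem}
            yield record
            # Drop the element's children and text once it has been consumed.
            elem.clear()
```

Each yielded dict can be fed straight into a batched insert. Note that the cleared (now empty) elements stay attached to the root, so memory still grows slowly on huge files. Removing them from the parent as well is a common refinement.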
|
|
|
|
|
Ok, then I had some other problem, I might take another look at SSIS then.
|
|
|
|
|
|
If people didn't constantly reinvent the wheel, we'd still be using wooden wheels several feet in diameter.
Use the right wheel for the right job. Don't try to adapt to an existing wheel if it just doesn't do the job.
|
|
|
|
|
Agreed!
|
|
|
|
|
honey the codewitch wrote:
People are religious about never reinventing the wheel, but it's not always such a bad thing - it depends on the wheel.
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
|
|
|
|
honey the codewitch wrote: hosting the .NET CLI in C++ just to use a .NET package from C++ to parse a little JSON seems heavy handed and horribly inefficient.
If you're using C++, why not use a C++ JSON library such as JSON for Modern C++ (nlohmann/json), RapidJSON, or simdjson?
Or if you do develop your own library, you might be interested to look at simdjson's 'On Demand' parsing approach...
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
|
|
|
|
|
They use too much memory and can't target IoT. Of them, simdjson shows the most potential, but it still can't do an episodes query off of a tmdb.com show data dump in about 71 bytes.
modified 18-Dec-20 6:46am.
|
|
|
|