The Lounge is rated Safe For Work. If you're about to post something inappropriate for a shared office environment, then don't post it. No ads, no abuse, and no programming questions. Trolling (political, climate, religious, or whatever) will result in your account being removed.
I barely ever use JSON and have never written (nor am I likely to) a parser, but what I do know is that I can't answer a question like this without knowing the context.
- Is it more important to be fast 100% of the time and permit errors 1% of the time, or to be 100% reliable at the cost of a few percentage points in speed? (i.e. how critical is the data, and how critical is speed? This is a pretty common trade-off)
- Is the data coming from another system I/we have written, or from a trusted partner, or from Joe Public? Is the data machine-generated or hand-crafted?
I load 51GB of XML with what SSIS has built-in. It takes about twelve minutes.
I load 5GB of JSON with my own parser. It takes about eight minutes.
I load 80GB of JSON with my own parser -- this dataset has tripled in size over the last month. It's now taking about five hours.
These datasets are in no way comparable; I'm just comparing the size-on-disk of the files.
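Putting the three loads side by side as raw throughput makes the gap concrete. A quick back-of-the-envelope calculation (using the sizes and times quoted above; the ratios matter more than the exact figures):

```python
# Rough throughput for each load described above (MB, seconds).
# The figures are the ones quoted in the thread; GB is taken as 1024 MB.
loads = {
    "XML via SSIS (51GB / 12min)":      (51 * 1024, 12 * 60),
    "JSON, own parser (5GB / 8min)":    (5 * 1024, 8 * 60),
    "JSON, own parser (80GB / 5h)":     (80 * 1024, 5 * 3600),
}

for name, (mb, secs) in loads.items():
    print(f"{name}: {mb / secs:.1f} MB/s")
# XML via SSIS (51GB / 12min): 72.5 MB/s
# JSON, own parser (5GB / 8min): 10.7 MB/s
# JSON, own parser (80GB / 5h): 4.6 MB/s
```

So the custom JSON path runs at roughly a seventh of the SSIS XML rate even in the good case, and the 80GB load is slower still, which supports the suspicion that the loader itself is the bottleneck.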
I will, of course, accept that my JSON loader is the likely bottleneck, but I have nothing else to compare it against. It seemed "good enough" two years ago when I had a year-end deadline to meet.
I may also be able to configure my JSON loader to use BulkCopy, as I do for the 5GB dataset, but I seem to recall that the data wasn't suited to it.
At any rate, I'm in need of an alternative, but it can't be third-party.
I don't know what SSIS does internally, but I doubt it loads the entire XML document into memory all at once.
I don't know how much RAM or how many processors the servers have.
I ran the XML load on my laptop (16GB of RAM), and memory usage increased by only four percent.
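That flat memory profile is what you'd expect from a streaming reader. As a sketch of the idea (an assumption about what SSIS does internally, shown here with Python's standard-library `iterparse` rather than anything SSIS-specific): elements are handed to you as they complete and can be discarded immediately, so memory stays roughly constant regardless of file size.

```python
# Streaming XML parse sketch: process each element as it closes and
# clear it, so the whole document is never held in memory at once.
import xml.etree.ElementTree as ET
from io import StringIO

# Stand-in for a large file on disk.
xml_data = StringIO(
    "<rows>" + "".join(f"<row id='{i}'/>" for i in range(1000)) + "</rows>"
)

count = 0
for event, elem in ET.iterparse(xml_data, events=("end",)):
    if elem.tag == "row":
        count += 1
        elem.clear()  # release the element so memory is reclaimed
print(count)  # 1000
```

The same pattern applies to JSON: a pull-style tokenizer that emits one record at a time would keep the 80GB load's footprint flat, whatever its speed.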
People are religious about never reinventing the wheel, but it's not always such a bad thing - it depends on the wheel.
If something has a solution, why worry about it? If it has no solution, what's the point of worrying about it?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
Some people have to work on air-gapped networks, where you cannot copy anything onto the network. It comes configured with a couple of approved things, like the operating system and whatever comes bundled with, say, Visual Studio 2015, and that's it. Nothing else gets in. With good reason, too: see supply-chain poisoning like the recent SolarWinds incident.
I'm pretty trusting.
When someone says they're going to give me JSON I assume they'll give me JSON.
So I'd go for it and worry about validation when the party that should be giving me JSON isn't giving me JSON.
So far that has worked pretty well.
In practice, these kinds of things rarely break.
You either get JSON or no JSON at all, but rarely (if ever) badly formed JSON.
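That "trust first, validate on failure" approach can be sketched in a few lines (a minimal illustration, assuming a hypothetical `load_payload` helper; nothing here comes from the thread itself): parse optimistically, and only deal with malformed input in the branch that almost never fires.

```python
# Optimistic parsing: assume the other party sends valid JSON and
# handle the malformed case only when it actually happens.
import json

def load_payload(text):
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # Rarely reached in practice: you get JSON or nothing at all.
        raise ValueError(f"expected JSON, got malformed input: {e}") from None

print(load_payload('{"ok": true}'))  # {'ok': True}
```

The design choice is simply to pay the validation cost only on the failure path, rather than pre-checking every payload on the (overwhelmingly common) success path.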