|
Is 925mb not quite one bit?
|
Have you tried Notepad, then?
Same thing, just with a little less of that annoying IntelliSense. (After all, XML is "human readable", so no help should be required.)
|
It's probably to encourage smaller source files: 10MB of code in one file is probably a little too big ...
Why on earth do you want to load a 1GB XML anyway? That's far too big for me to want to read!
Sent from my Amstrad PC 1640
Never throw anything away, Griff
Bad command or file name. Bad, bad command! Sit! Stay! Staaaay...
AntiTwitter: @DalekDave is now a follower!
|
That's a problem with XML; it has to be read in its entirety before you can do anything with it.
At work I receive a 6GB XML file every stinking day and I have to use SSIS to get it into a database.
I'm beginning to prefer JSON, which I can read one object at a time (provided the outermost value is an array of objects).
However, I have written a fairly simple XML file splitter so I can make smaller files from one big one when I need to find out where a problem (e.g. non-well-formed XML) exists.
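A minimal sketch of what such a splitter might look like, for illustration only. It is written in Python rather than with the SSIS/.NET tooling mentioned in this thread, and the function name `split_records`, the `wrapper` element, and the sample data are all assumptions. It is parser-based, so on non-well-formed input `iterparse` raises `ParseError` at the first problem; the chunks produced before that point show roughly where the bad spot is.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, tag, per_chunk, wrapper="chunk"):
    """Yield small standalone XML documents, each holding up to per_chunk <tag> elements."""
    parts = []
    # iterparse reads the input incrementally instead of building the whole tree first.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue
        parts.append(ET.tostring(elem, encoding="unicode"))
        elem.clear()  # release the record's children and text once it is serialized
        if len(parts) == per_chunk:
            yield f"<{wrapper}>{''.join(parts)}</{wrapper}>"
            parts = []
    if parts:  # final, possibly shorter, chunk
        yield f"<{wrapper}>{''.join(parts)}</{wrapper}>"


# Hypothetical sample: five <item> records split into chunks of two.
sample = io.BytesIO(b"<report>" + b"".join(
    b"<item id='%d'/>" % i for i in range(5)) + b"</report>")
chunks = list(split_records(sample, "item", per_chunk=2))
sizes = [len(ET.fromstring(chunk)) for chunk in chunks]
print(sizes)
```

In real use each chunk would be written to its own file rather than collected in a list.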
|
That's not entirely true. You can use XmlReader, which reads sequentially, one node at a time (it's slower than XDocument, and it's forward-only, so you can't read backwards, but it solves my issue).
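To illustrate the streaming idea, here is a rough analogue in Python's standard library rather than the .NET XmlReader itself. The element names (`report`, `item`, `name`) and the helper `iter_records` are made up for the example; it is a sketch of the technique, not the code used for the files discussed here.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield each <tag> element from an XML stream without loading the whole document."""
    # iterparse parses incrementally; an "end" event fires once an element is complete.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Runs when the caller asks for the next record: drop this record's
            # children and text so memory use does not grow with file size.
            elem.clear()


# Hypothetical sample document standing in for a multi-gigabyte file.
sample = io.BytesIO(
    b"<report>"
    b"<item id='1'><name>alpha</name></item>"
    b"<item id='2'><name>beta</name></item>"
    b"</report>"
)
names = [item.findtext("name") for item in iter_records(sample, "item")]
print(names)
```

Each record is a small in-memory tree that can be queried conveniently, which is the same division of labour as pairing a forward-only reader with a per-record DOM.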
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
Combining XmlReader and LINQ to XML, the memory consumption never goes above 350 MB, and it takes about 45 minutes to run through the sample files (this includes adding the data to the database, one record at a time, for 426,000 records).
When I add a dash of TPL, it only takes about 9 minutes to process the same three files.
I think I could get it even faster if I inserted multiple records per query, but I'm tired of dickin' with it.
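The multiple-records-per-query idea could be sketched like this. It is only an illustration: it uses Python with an in-memory SQLite database as a stand-in, since the thread names neither the database nor the schema, and the `findings` table, its columns, and `insert_in_batches` are hypothetical. Whether it would actually beat the 9 minutes above is untested.

```python
import sqlite3
from itertools import islice


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert (host, finding) rows batch_size at a time instead of one statement per row."""
    rows = iter(rows)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return total
        # "with conn" makes each batch one transaction: commit on success, rollback on error.
        with conn:
            conn.executemany(
                "INSERT INTO findings (host, finding) VALUES (?, ?)", batch
            )
        total += len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE findings (host TEXT, finding TEXT)")
# Hypothetical data: 2,500 generated rows fed from a generator, as a parser would.
inserted = insert_in_batches(conn, ((f"host{i}", "open port") for i in range(2500)))
stored = conn.execute("SELECT COUNT(*) FROM findings").fetchone()[0]
print(inserted, stored)
```

Because the rows come from an iterator, this composes with a streaming parser without holding all records in memory at once.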
|
Well, for the most part I'm limited to built-in SSIS components.
Potentially I could write something custom, as I have for JSON and CSV files (ones which aren't stable enough for the flat-file components).
|
I just wanted to see what was in it.
They range from 1 MB up to about 3.5 GB. I have no control over how large the files to be processed are (they're generated by Nessus security scans). The idiots that generate the files are completely unwilling to accommodate us, so it's essentially an "it is what it is" situation.
I have to parse these files and store the results in our database. Using just XDocument, I was running out of memory (the server in question only has 8 GB, of which most is already used by other processes), so I had to resort to using a combination of XmlReader and LINQ to XML.
Notepad, IE, Firefox, WordPad, and MS Word all load the file, but it takes more than five MINUTES for them, and WordPad/Word become completely unusable.
<rant>
I wish people here (not you but some others) would stop f*ckin assuming I'm a rookie programmer. I have more years in the industry than most people on CP have even been alive.
</rant>
|
John Simmons / outlaw programmer wrote: <rant>
I wish people here (not you but some others) would stop f*ckin assuming I'm a rookie programmer. I have more years in the industry than most people on CP have even been alive.
</rant>
It's been a while since I've seen you assert yourself on CP. I kinda miss the ol' smackdowns.
|
Ouch! That's a stupid amount of data, particularly for a text-based transfer mechanism. Have these people never heard of databases?
On the bright side, at least it's not XLSX?
|
OriginalGriff wrote: On the bright side, at least it's not XLSX?
Dunno, XLSX isn't so bad, and it's easier (OK, lazier) to debug: if there are bad data elements, just load it into Excel and scroll down to the line with the issue.
If you're suggesting interop (i.e. slower than molasses), that's a completely different issue, and there are way, way faster [read & write] alternatives.
If worst comes to worst, you can unpack the XLSX and voilà, it's XML (pretty much exactly the same).
(Not criticizing, just unsure why you think it's any worse.)
|
Have you ever tried to load 1GB into Excel?
(And bear in mind that XLSX is packaged, zipped, XML - and thus slower and more memory hungry than "naked" XML)
|
OriginalGriff wrote: Have these people never heard of databases?
That's our job.
Got memory consumption down to no more than 350 MB, and it only takes 9 minutes to process my three sample files, for a total of 426,000 records. I'm going to look awesome on Tuesday. Upside: this app replaces a large Perl script that was doing the same job, and everyone in the shop can maintain it because - well - it's not Perl.
|
John Simmons / outlaw programmer wrote: I'm going to look awesome on Tuesday.
Only if the other people appreciate your work. It wouldn't be the first time that awesome tools that are real improvements get dumped because a couple of co-workers say:
- We have always done it this way
- That is not going to work (without even giving a try)
- Or similar crap arguments...
and they don't even give a damned "Thank you".
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
I've already gotten thank-yous for this. They're grateful that they don't have to maintain that monster Perl script anymore.
|
If you just want to take a peek in the file you can use the Lister that comes with Total Commander.
It's still immediate on a 20GB file. No specific support for XML though, it's treated the same as any file.
|
I'd recommend UltraEdit. You can disable the "make automatic backups when opening files" option, and then you are able to open and work with very large files. Fast. That feature, plus the built-in hex editor that lets me see everything, including BOM bytes in files, makes it worth the license fee.
Just in case you didn't know it and needed something better than Notepad and Notepad++ for large files.
Do you know why it's important to make fast decisions? Because you give yourself more time to correct your mistakes, when you find out that you made the wrong one. Chris Meech on deciding whether to go to his daughters graduation or a Neil Young concert
|
John Simmons / outlaw programmer wrote: I wish people here (not you but some others) would stop f*ckin assuming I'm a rookie programmer.
Just like at work, other people mess up, but you get the blame!
John Simmons / outlaw programmer wrote: I have more years in the industry than most people on CP have even been alive.
That's no guarantee of actually being a good programmer.
For example, the programmer who gives you 3.5 GB of XML in a single file probably says the same.
|
Sander Rossel wrote: For example, the programmer who gives you 3.5 GB of XML in a single file probably says the same
We don't get the files from programmers - we get them from security nazis.
|
John Simmons / outlaw programmer wrote: we get them from security nazis.
And on the other side of that barrier there is some poor sod producing the XML. Or it was designed in the 90s and they refuse to even consider changing something that works - sort of.
Never underestimate the power of human stupidity -
RAH
I'm old. I know stuff - JSOP
|
A scan tool called Nessus generates the file. I know nothing about it, or its configurability where file generation is concerned.
|
More likely it was designed in the 90s, when the log data was small and (the then new and cutting-edge) XML made some sense. But ... the developer who wrote that moved on, and file formats are boring, so the new guy just tested that it worked at small scale and then worked on the sexier stuff.
And now ... intrusion / vulnerability data has grown like everything else and it's just a silly decision with hindsight.
|
I'm sure they're suffering from the same thing we all have to deal with - management that doesn't (want to) see a reason to re-architect the app that generates the files.
|
Methinks you need to rethink your XML design
CQ de W5ALT
Walt Fair, Jr., P. E.
Comport Computing
Specializing in Technical Engineering Software
|
It ain't my design, and it won't be changing to anything better.
|