|
|
I'm using an XmlReader to chop the file stream into an XDocument for every record.
Using an XmlReader all the way became too much work, handling null nodes and such stuff.
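A minimal sketch of that hybrid approach, not the actual code: stream with XmlReader and materialize one small XElement (an XDocument works the same way) per record, so the multi-gigabyte file is never loaded whole. The element name "record" and the inline sample are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public class StreamRecords
{
    // Yields one XElement per matching element without buffering its
    // siblings, which is what keeps memory flat on a huge file.
    public static IEnumerable<XElement> Records(TextReader source, string name)
    {
        using var reader = XmlReader.Create(source);
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                // ReadFrom consumes exactly one element subtree and
                // leaves the reader just past its end tag.
                yield return (XElement)XNode.ReadFrom(reader);
            else
                reader.Read();
        }
    }

    static void Main()
    {
        // Inline sample standing in for the big file; with a real file
        // you would pass File.OpenText(path) instead.
        const string xml =
            "<records><record><id>1</id></record><record><id>2</id></record></records>";
        foreach (var record in Records(new StringReader(xml), "record"))
            Console.WriteLine(record.Element("id")?.Value);
    }
}
```

The manual EOF loop (rather than ReadToFollowing) matters: after XNode.ReadFrom the reader already sits on the next sibling, and ReadToFollowing would skip past it.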
|
|
|
|
|
I wrote a command line app that imports a NESSUS security scan XML data file - the largest I've seen to date is about 8 GB. We import the data into a SQL Server database. It's not multi-threaded at all, as I recall. I do remember that the file was too big for XDocument to work.
I feel your pain.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
If the parsing can be partitioned into n subproblems, where n is the number of cores, then I would consider creating n daemons and locking each one into its own core. If any of them block, offloading the blocking operations to thread pools might help.
Partitioning the problem will help to reduce semaphore contention and cache collisions.
But I haven't had to populate a large database this way, so I could be full of shite.
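The partition-per-core idea could be sketched roughly like this in .NET, with one caveat: managed code can't easily pin a thread to a specific core, so this settles for one contiguous chunk per core and capping the worker count at the core count. The summing loop is an invented stand-in for "parse one record".

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class PartitionedWork
{
    // Splits [0, n) into one contiguous chunk per core and runs one
    // worker per chunk; per-chunk locals keep the workers from fighting
    // over shared state, which is the point of partitioning.
    public static long SumPartitioned(int n)
    {
        int cores = Environment.ProcessorCount;
        long total = 0;
        var ranges = Partitioner.Create(0, n, Math.Max(1, n / cores));
        Parallel.ForEach(
            ranges,
            new ParallelOptions { MaxDegreeOfParallelism = cores },
            range =>
            {
                long local = 0;                     // no contention inside a chunk
                for (int i = range.Item1; i < range.Item2; i++)
                    local += i;                     // stand-in for real per-record work
                Interlocked.Add(ref total, local);  // combine once per chunk
            });
        return total;
    }

    static void Main() => Console.WriteLine(SumPartitioned(1000)); // 499500
}
```

Combining results once per chunk, instead of touching shared state per item, is what keeps semaphore contention and cache-line bouncing down.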
|
|
|
|
|
This is exactly what I didn't want to have to learn.
At least all proper databases already handle parallel execution properly.
|
|
|
|
|
yup - learnt the hard way: first identify where the program uses its resources
|
|
|
|
|
More proof that some people have real problems.
So stop complaining people, you could be Jörgen today.
|
|
|
|
|
Ron Anders wrote: So stop complaining people, you could be Jörgen today.
...and have no toilet paper.
Jeremy Falcon
|
|
|
|
|
Isn't it enough if I'm being me?
|
|
|
|
|
I was vaguely reminded of an episode of Home Improvement, where one of the kids got himself in some trouble, so one of the brothers says "I wouldn't wanna be you right now", and the other responds with "I wouldn't wanna be you, ever".
|
|
|
|
|
I was going to say - size of the work done in each task is key...
But the underlying technology can also have an effect, by reducing the cost of task creation. If you're using a work queue on top of a thread pool, you're not creating a thread for each task, you're pushing/popping tasks on and off a queue.
I created a little tool to detect duplicate files using that sort of parallelism. It contains two main areas of parallelism:
- The file search library that I use adds a new task for each directory it sees. Each task processes just the files that are immediate children of the directory the task was created for.
- The detection of duplicates is split so that each task hashes a group of files that have the same size. This is performed using a data parallelism library, which makes parallelising things very easy.
The amount of speedup I get isn't anywhere near the number of processor cores in use (I get a factor of just over two speedup on an eight core machine), but I think that the amount of IO being done serialises the processing to a certain degree. Benchmarking ripgrep, another tool that uses similar parallelism, shows that running with 8 threads (on 8 logical/4 physical cores) is just over 3x faster than using 1.
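The size-grouping scheme described above might look something like this; it's not the actual tool, PLINQ stands in for the data-parallelism library (it rides on the thread pool rather than creating a thread per task), and all names are invented:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public class DupFinder
{
    // Groups candidate files by size first (files of different sizes
    // can't be duplicates), then hashes each size group in parallel.
    public static List<List<string>> FindDuplicates(IEnumerable<string> paths)
    {
        return paths
            .GroupBy(p => new FileInfo(p).Length)
            .Where(g => g.Count() > 1)          // singletons need no hashing
            .AsParallel()                       // one work item per size group
            .SelectMany(g => g
                .GroupBy(Hash)                  // equal hash => duplicate content
                .Where(h => h.Count() > 1)
                .Select(h => h.ToList()))
            .ToList();
    }

    static string Hash(string path)
    {
        using var sha = SHA256.Create();
        using var stream = File.OpenRead(path);
        return BitConverter.ToString(sha.ComputeHash(stream));
    }

    static void Main()
    {
        var files = Directory.EnumerateFiles(".", "*", SearchOption.AllDirectories);
        foreach (var group in FindDuplicates(files))
            Console.WriteLine(string.Join(", ", group));
    }
}
```

As noted above, the hashing is IO-bound, so the speedup from the parallel stage flattens out well before the core count.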
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
|
|
|
|
|
Why are you even parsing XML files, 80 GB ones at that, and then saving them to the database? You could use the SQL Server bulk import tools to do this and avoid programming such stuff altogether...
Caveat Emptor.
"Progress doesn't come from early risers – progress is made by lazy men looking for easier ways to do things." Lazarus Long
|
|
|
|
|
Because I want to have the data extracted into normalized tables.
|
|
|
|
|
|
I've missed out on that possibility completely.
A bit late now, but I'll take a look at it anyway.
|
|
|
|
|
I think I see why I missed out on that possibility: it does not seem to exist on SQL Server 2012.
|
|
|
|
|
If it's the dev environment you have, you can run the SQL Server setup and select the components needed to get the SSIS service and the VS-based client tools; they're on the ISO or DVD. Also, there is OPENROWSET, a simple way to import XML data into SQL Server with T-SQL...
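A minimal OPENROWSET sketch of the kind suggested above; the file path, target table, and element names are all placeholders. Note that SINGLE_BLOB pulls the entire file into one XML value, so for an 80 GB file the streaming SSIS XML Source would still be a better fit than this.

```sql
-- Load the whole file as one XML value (fine for modest files).
DECLARE @doc XML;

SELECT @doc = CAST(BulkColumn AS XML)
FROM OPENROWSET(BULK N'C:\scans\scan.xml', SINGLE_BLOB) AS src;

-- Shred the XML into a normalized table with nodes()/value().
INSERT INTO dbo.Records (Id, Name)
SELECT r.value('(id/text())[1]',   'INT'),
       r.value('(name/text())[1]', 'NVARCHAR(200)')
FROM @doc.nodes('/records/record') AS t(r);
```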
Caveat Emptor.
"Progress doesn't come from early risers – progress is made by lazy men looking for easier ways to do things." Lazarus Long
|
|
|
|
|
I think I see why I missed out on that possibility: XMLSource does not seem to exist on SQL Server 2012.
|
|
|
|
|
Welcome to the cool club though. Ladies can't resist an async coder. #science
Jeremy Falcon
|
|
|
|
|
That's seriously the best answer today.
|
|
|
|
|
Jeremy Falcon
|
|
|
|
|
Jeremy Falcon wrote: Ladies can't resist an async coder.
What for...certainly not her and her sister...
|
|
|
|
|
I can't help but read "parse" in the body of the message...
This is clearly a case for... HONEY THE @CODE-WITCH tatatataaaaaaa
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
|
|
|
|
Are you sure the bottleneck isn't disk I/O?
Real programmers use butterflies
|
|
|
|
|
Yes, I've checked just to make sure.
I've made test runs reading only an ID from every record, which run twice as fast, and that's on a slow HDD here at home.
And when I move this to a server, the disks will be considerably faster.
|
|
|
|