I have a few folders, each of which contains millions of XML files. I need to read all the XML files in these folders and update a database, and this has to happen very quickly.

What I have tried:

Currently I run multiple threads against each folder to process the XML files.
This works well for small volumes, but not for millions of XML files.
A few high-priority XML files are mixed in among these millions, and they need to be processed quickly. Since I do not know in advance which files are high priority, I am forced to process everything.

The area I want to improve is the database update step. When all the threads try to update the database simultaneously, performance degrades.

Any better solution would be appreciated.
Updated 26-Sep-17 20:28pm
Comments
Mehdi Gholam 26-Sep-17 10:36am    
Which part of the current process is slow?
Richard MacCutchan 26-Sep-17 11:07am    
Millions of files will take millions of time intervals to process. And running in multiple threads is not likely to have a significant impact on the processing time.

1 solution

Your computer has only a limited number of physical and logical cores, so only a limited number of tasks can actually execute in parallel. Running more threads than there are cores can degrade performance rather than improve it. To go faster you need more hardware (processing cores): either a more expensive CPU with more cores, or more physical computers. Storage, and how you use it, can also slow you down.
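As a rough illustration of capping the worker count at the number of cores, here is a minimal sketch in Python. The question does not say which language is in use, so Python is only a stand-in. `parse_file` and `process_files` are hypothetical names, and the "processing" is reduced to reading the root tag. Note that in CPython, CPU-bound parsing in threads is also limited by the interpreter lock, so a process pool may suit real workloads better.

```python
import os
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def parse_file(path):
    """Parse one XML file and return (path, root tag).

    Hypothetical stand-in for whatever the real per-file processing is.
    """
    root = ET.parse(path).getroot()
    return path, root.tag


def process_files(paths, max_workers=None):
    """Parse the given files with a pool no larger than the logical core count."""
    # os.cpu_count() may return None, so fall back to a single worker.
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input paths.
        return list(pool.map(parse_file, paths))
```

Calling `process_files(list_of_paths)` returns one `(path, root_tag)` pair per file. The point is only that the pool size is bounded by the hardware, not by the number of files.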

If you don't want to invest in physical hardware, you could use a cloud service provider like Azure and pay only for the CPU time that you use. Set the type of machine and the number of machines required, then let it run...
 
 
Comments
Member 11936418 27-Sep-17 2:28am    
Thank you @Graeme_Grant. I totally agree with your answer. This is one of the options I can consider in the near future, but moving the server right now is not feasible for me.
What I am currently looking at is how to speed up the database updates when many threads are hitting the same database.
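One common way to reduce contention when many threads hit the same database is to have the parser threads put rows on a queue and let a single writer apply them in batches, one transaction per batch. The sketch below is only an illustration under stated assumptions. The thread never names the database, so Python's built-in `sqlite3` is used as a stand-in. The `records` table, its columns, `BATCH_SIZE`, `writer` and `run` are all hypothetical. A real server database would normally use its own bulk-load facility instead.

```python
import queue
import sqlite3
import threading

BATCH_SIZE = 500   # hypothetical value; tune for the real database
_STOP = object()   # sentinel telling the writer to flush and exit


def writer(db_path, rows, batch_size=BATCH_SIZE):
    """Drain the queue and write rows in batches, one transaction per batch."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (name TEXT PRIMARY KEY, payload TEXT)"
    )
    batch = []
    while True:
        item = rows.get()
        if item is not _STOP:
            batch.append(item)
        # Flush when the batch is full, or when shutting down with rows pending.
        if batch and (item is _STOP or len(batch) >= batch_size):
            with conn:  # commits the whole batch as a single transaction
                conn.executemany(
                    "INSERT OR REPLACE INTO records (name, payload) VALUES (?, ?)",
                    batch,
                )
            batch.clear()
        if item is _STOP:
            break
    conn.close()


def run(db_path, parsed_rows):
    """Feed (name, payload) rows to a single writer thread."""
    rows = queue.Queue(maxsize=10_000)  # bounded, so producers cannot outrun the writer
    t = threading.Thread(target=writer, args=(db_path, rows))
    t.start()
    for row in parsed_rows:  # in practice the parser threads would call rows.put(...)
        rows.put(row)
    rows.put(_STOP)
    t.join()
```

With this shape only one connection ever writes, so the parser threads no longer compete for locks, and each commit covers many rows instead of one.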

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


