If cutting the time short were so simple, why do you think we have a subject or course titled "Data Structures and Algorithms"? :-) The data is expected to grow, but you, as the database administrator or system administrator, are required to make sure that the logic keeps running the way it was expected to, not that it keeps running for weeks.
There are many ways to cut that time short. I will give you those points in a list, and I hope you will try to follow them, because there is no "other way"! You need to follow these rules to reduce the time required.
1. Change the language! Python is not a fast language. Did you know that Python is an interpreted language? That generally makes it much slower than a compiled language for this kind of heavy processing.
- Use something like C++. It supports the same paradigms, and I am sure a prebuilt library is already available on CodeProject, GitHub or somewhere else.
2. Change how you arrange the data.
- How the data is laid out is very important. A mix of many small files and large files also causes a problem. After each file, the program has to free the RAM and read in the next file. Find an alternative to keeping the data in small chunks of files.
3. Increase the CPU speed. You don't want to do the job that a supercomputer does, using a personal computer. That doesn't make any sense.
4. Think again!
The most common thing to use is common sense. Your data spans over 500GB. Why? Also, when you want to query the data, why do you want to query all of it? These are a few things that you should think about while updating the data structures, while updating your algorithms and while updating the system hardware.
Otherwise, the most you can hope to save is about 1 day (which still leaves 6 days!) and nothing more.
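To make points 2 and 4 a little more concrete, here is a minimal sketch in Python (since that is what you are using now). It assumes the data is plain text with one record per line and that a query only needs the records starting with a given key; neither of those is known from your question. The names `merge_small_files` and `query` are made up for this illustration. It merges many small files into one file once, then streams through that single file and keeps only the matching lines, instead of loading everything.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def merge_small_files(src_dir: Path, merged_path: Path) -> int:
    """Concatenate every *.txt file in src_dir into one file.

    Returns the number of files merged. merged_path is assumed to live
    outside src_dir (or not end in .txt), so it is not merged into itself.
    """
    count = 0
    with merged_path.open("w", encoding="utf-8") as out:
        for path in sorted(src_dir.glob("*.txt")):
            with path.open("r", encoding="utf-8") as src:
                for line in src:
                    # Make sure the last line of each file ends with a newline,
                    # so records from different files never run together.
                    out.write(line if line.endswith("\n") else line + "\n")
            count += 1
    return count


def query(merged_path: Path, key: str) -> Iterator[str]:
    """Stream the merged file and yield only the lines that start with key."""
    with merged_path.open("r", encoding="utf-8") as f:
        for line in f:  # one line in memory at a time, not the whole file
            if line.startswith(key):
                yield line.rstrip("\n")


if __name__ == "__main__":
    # Tiny demonstration with made-up sample data in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "small_files"
        src.mkdir()
        (src / "part1.txt").write_text("alice,10\nbob,20", encoding="utf-8")
        (src / "part2.txt").write_text("alice,30\n", encoding="utf-8")

        merged = Path(tmp) / "merged.dat"
        merge_small_files(src, merged)
        for record in query(merged, "alice,"):
            print(record)
```

This does not change the fact that a linear scan over 500GB is slow. If the same kind of query runs often, an index or a real database would probably be the better data structure, which is exactly the "think again" part.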