I'm trying to figure out the best way to read a large text file (>5GB) line by line in Python. Each line will then be processed sequentially (e.g. slicing the string and passing it to some function).

I'm wondering whether this can be done with parallel threads / multithreading in Python to make it run faster. I would also like to minimize the memory footprint, since there are other processes running on the same machine.

Any help or push in the right direction is much appreciated.

What I have tried:

Python's readline, readlines, and streaming the file line by line.
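
For reference, the streaming approach looks roughly like this (a minimal sketch; process_line stands in for my actual per-line work):

def process_line(line):
    # placeholder for the real work: slice the string, pass it to a function
    return line.rstrip("\n").split("\t")

with open("big_file.txt", "r", encoding="utf-8") as f:
    for line in f:            # the file object yields one line at a time,
        process_line(line)    # so only the current line is held in memory

This keeps memory flat, but it runs on a single thread.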

1 solution

Since the file has to be read sequentially anyway, you'd need one component doing all the reading and doling out the lines to your processing workers.
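
A minimal sketch of that pattern, assuming the per-line work is heavy enough to be worth parallelising: multiprocessing.Pool.imap lets the parent process do all the reading while a pool of workers handles the lines (process_line and big_file.txt are placeholders):

import multiprocessing

def process_line(line):
    # placeholder: slice the string and push it to your function here
    return len(line)

if __name__ == "__main__":
    with multiprocessing.Pool() as pool, open("big_file.txt", "r") as f:
        # imap streams lines from the file iterator to the workers in
        # chunks, so the whole file is never held in memory; a large
        # chunksize keeps the inter-process overhead per line low.
        for result in pool.imap(process_line, f, chunksize=1000):
            pass  # collect or aggregate results here

The trade-off: every line has to be pickled and sent to a worker process, so this only pays off when the per-line processing costs more than that transfer.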

Keep in mind that multi-threading does not automatically make a process faster. In CPython specifically, the Global Interpreter Lock (GIL) prevents threads from running CPU-bound Python code in parallel, which is why the sketch above uses worker processes instead. Beyond that, it depends on the problem and the amount of processing you're doing: how long each pass takes, where the bottlenecks are, and so on.

If the major bottleneck is I/O (reading the file from whatever device or external source), threading the processing isn't going to do anything for you. Your threads are going to be stalled waiting for something to process.
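
A quick way to check is to time a read-only pass over the file and compare it with the full read-and-process run (a rough sketch, reusing the placeholder file name from above):

import time

start = time.perf_counter()
with open("big_file.txt", "r") as f:
    for line in f:
        pass  # read only, no processing
print(f"pure read took {time.perf_counter() - start:.1f}s")

If the read-only pass takes nearly as long as the full job, you're I/O-bound and parallel processing won't help; if it's much faster, the per-line work dominates and a process pool is worth trying.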
 