I have tried to use the Task Parallel Library (TPL) inside my For Each loop to search the file system, with no luck. I must be doing something wrong, or I do not understand how to implement the multithreading.

That being said, I have seen some limited solutions that use the GetFiles method, but I have moved away from it because I want to search for more than one file extension.



Can someone please give a link or a quick example of how to iterate over a file system using the TPL?

Thank you very much!

The For Each loop is as follows:

VB
Try
    Using fse As New FindFiles.FileSystemEnumerator(pathToSearchCombo.Text, fileSpecsCombo.Text, includeSubdirectoriesCheck.Checked)
        Dim ien As IEnumerator(Of FileInfo) = fse.Matches().GetEnumerator()
        ien.Dispose()
        For Each fi As FileInfo In fse.Matches

I doubt you are going to get the TPL to give you much of a performance gain here. Most of the processing time is involved in generating your list. If you just do something simple with each file you won't get any benefits from the TPL and you will just add complexity. However, if you were to do something complicated or processor-intensive with each file then the TPL would really help you out.
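As a rough illustration of the case where the TPL does pay off, here is a minimal sketch that enumerates the files on one thread and lets Parallel.ForEach do the expensive per-file work. The HashFile helper and the choice of SHA-256 hashing are only assumptions standing in for "something processor-intensive"; they are not from the original question.

```vb
Imports System
Imports System.Collections.Concurrent
Imports System.IO
Imports System.Security.Cryptography
Imports System.Threading.Tasks

Module ParallelPerFileWork

    ' Hypothetical stand-in for processor-intensive work done on each file.
    Function HashFile(filePath As String) As String
        Using sha As SHA256 = SHA256.Create()
            Using input As FileStream = File.OpenRead(filePath)
                Return BitConverter.ToString(sha.ComputeHash(input))
            End Using
        End Using
    End Function

    Sub Main(args As String())
        ' Assumption: search the folder given on the command line,
        ' or the current directory if none is given.
        Dim root As String = If(args.Length > 0, args(0), Directory.GetCurrentDirectory())
        Dim hashes As New ConcurrentDictionary(Of String, String)

        ' The file list is produced lazily by a single enumerator;
        ' only the per-file work runs in parallel.
        Parallel.ForEach(Directory.EnumerateFiles(root, "*", SearchOption.TopDirectoryOnly),
            Sub(filePath)
                hashes(filePath) = HashFile(filePath)
            End Sub)

        Console.WriteLine("{0} files hashed.", hashes.Count)
    End Sub

End Module
```

The sketch assumes every file in the folder can be opened for reading; a locked or protected file would make File.OpenRead throw, so real code would need a Try/Catch inside the lambda. If the per-file work is trivial, the plain For Each loop is likely just as fast.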

The other option here would be to use the TPL to search each directory in a set of directories. Say, for example, you wanted to do the search in the following directories for the files you wanted:

  • C:\Files
  • C:\Temp
  • C:\Users\Tim\Documents
  • F:\
  • G:\Data
  • S:\SavedDocs


You could put these folder names in an array and write a FindFiles method that takes a folder path. Then you could call Parallel.ForEach on the array and run the method for each folder. That would give you some performance gain, although probably not a lot, because the file system only has so much performance to give.
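A minimal sketch of that idea follows. The FindFiles helper, the folder names, and the extension list are illustrative assumptions, not code from the question. The helper filters on a set of extensions, so it also covers searching for more than one file type at once.

```vb
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.IO
Imports System.Threading.Tasks

Module ParallelFolderSearch

    ' Hypothetical helper: returns the files under root whose extension is in the set.
    Function FindFiles(root As String, extensions As HashSet(Of String)) As List(Of String)
        Dim results As New List(Of String)
        Try
            For Each filePath As String In Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                If extensions.Contains(Path.GetExtension(filePath)) Then
                    results.Add(filePath)
                End If
            Next
        Catch ex As UnauthorizedAccessException
            ' Enumeration of this root stops at the first inaccessible folder;
            ' the matches found so far are still returned.
        Catch ex As DirectoryNotFoundException
            ' The root (or a folder under it) does not exist or disappeared; skip it.
        End Try
        Return results
    End Function

    Sub Main()
        ' Assumed example roots; replace with the folders you want to search.
        Dim roots As String() = {"C:\Files", "C:\Temp", "G:\Data"}
        Dim extensions As New HashSet(Of String)(New String() {".txt", ".log"},
                                                 StringComparer.OrdinalIgnoreCase)
        Dim found As New ConcurrentBag(Of String)

        ' One task per root folder; each task runs the serial search above.
        Parallel.ForEach(roots,
            Sub(root)
                For Each filePath As String In FindFiles(root, extensions)
                    found.Add(filePath)
                Next
            End Sub)

        Console.WriteLine("{0} matching files found.", found.Count)
    End Sub

End Module
```

The results go into a ConcurrentBag because several tasks add to it at the same time; the HashSet is only read inside the loop, which is safe without locking. Roots that sit on the same physical disk still compete for the same device, so any gain is most likely to come from roots on separate drives.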

Usually the TPL does best when you are doing large calculations or are dealing with tasks that have waits in them (async for example). Then the TPL will go to the next task when one is waiting and thus you don't waste your processor. Disk access won't be sped up with the TPL.
 
Comments
Sergey Alexandrovich Kryukov 3-Jun-12 23:18pm    
Good points, a 5. Overall, using the TPL should not be very effective for such a task.
--SA
The disk is pretty much a serial device: it can only deliver the file/directory information, or a single disk sector, to one thread at a time. For that reason, file system/disk scanning doesn't really lend itself to parallelism.

I know caching makes that statement false to a limited degree, but not when you scan an entire file system, because caching only goes so far before you are reading the disk again.
 
Comments
Sergey Alexandrovich Kryukov 3-Jun-12 23:17pm    
Agree, a 5.
--SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


