IIRC, NTFS uses a B-tree variant to store file names in a directory. This guarantees fast access to a single file, but may slow down access if you are trying e.g. to enumerate all files in the directory.
FAT32 has a limit of just under 64K entries. The search is linear. Note that a long filename takes at least two entries - one for the short name and one for the long name.
I don't know how exFAT stores directories.
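To put the FAT32 numbers in perspective, here is a rough entry count. It assumes the usual VFAT layout, where every file gets one 32-byte short (8.3) entry and each long-name entry holds up to 13 UTF-16 characters, and it takes 65,536 entries as the per-directory limit. Treat the result as an estimate: names that fit 8.3 exactly may need no long-name entry at all.

```python
import math

# Per-directory limit assumed here: 65,536 32-byte entries (a 16-bit entry index).
MAX_DIR_ENTRIES = 65_536
# A VFAT long-name entry stores up to 13 UTF-16 characters of the name.
CHARS_PER_LFN_ENTRY = 13


def entries_for_name(name_length: int, needs_long_name: bool = True) -> int:
    """Entries used by one file: one short (8.3) entry plus any long-name entries."""
    if not needs_long_name:
        return 1
    return 1 + math.ceil(name_length / CHARS_PER_LFN_ENTRY)


def max_files(name_length: int, reserved: int = 2) -> int:
    """Rough upper bound on files per directory; `reserved` covers '.' and '..'."""
    return (MAX_DIR_ENTRIES - reserved) // entries_for_name(name_length)


if __name__ == "__main__":
    for length in (12, 26, 40):
        print(length, entries_for_name(length), max_files(length))
```

Under these assumptions, 12-character names use two entries each, so a subdirectory tops out at roughly 32,000 files rather than 64K.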
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
If there are reasons to distribute the files over a series of subdirectories, what are they, and why would it be an advantage?
If performance degrades with the number of files in a directory, there is only one explanation:
The directory is organized as a flat, unsorted list of files.
This implies that to find a file you have to scan the list sequentially, which is O(n).
If the OS keeps the directory sorted by the key you search on (the file name), the cost of finding a file is O(log n).
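The O(n) versus O(log n) argument can be illustrated with a toy model: an unsorted listing scanned front to back, versus a listing kept sorted by name and searched by bisection. This is only an analogy (NTFS uses a B-tree-style index, not a sorted array), but the difference in comparison counts is the same idea.

```python
from typing import List, Tuple


def linear_lookup(entries: List[str], name: str) -> Tuple[bool, int]:
    """Scan an unsorted listing front to back; returns (found, comparisons)."""
    comparisons = 0
    for entry in entries:
        comparisons += 1
        if entry == name:
            return True, comparisons
    return False, comparisons


def sorted_lookup(entries: List[str], name: str) -> Tuple[bool, int]:
    """Binary search in a listing kept sorted by name; returns (found, probes)."""
    lo, hi, probes = 0, len(entries), 0
    while lo < hi:
        probes += 1
        mid = (lo + hi) // 2
        if entries[mid] < name:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(entries) and entries[lo] == name, probes


if __name__ == "__main__":
    names = [f"file{i:06d}.dat" for i in range(100_000)]  # already in sorted order
    target = "file099999.dat"
    print(linear_lookup(names, target))
    print(sorted_lookup(names, target))
```

Finding the last of 100,000 names takes 100,000 comparisons in the linear case and at most 17 probes with bisection.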
“Everything should be made as simple as possible, but no simpler.” Albert Einstein
As already mentioned, the problems will start when you try to browse the disk in question with pretty much any existing application.
A better option would be to put the files in a database as blobs. At that point, you'll only have one file on the disk for the database itself. It would also be easier to organize and manage than a complex folder hierarchy.
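A minimal sketch of the "files as blobs" idea, using Python's built-in sqlite3 module purely as a stand-in (the post doesn't name a database). The table and column names are hypothetical; the primary key on the name gives an index, so lookups don't scan the whole table.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a single-file store with one row per logical file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " name TEXT PRIMARY KEY,"  # indexed, so lookup by name is not a full scan
        " content BLOB NOT NULL)"
    )
    return conn


def put_file(conn: sqlite3.Connection, name: str, data: bytes) -> None:
    """Insert or overwrite one logical file."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (name, content) VALUES (?, ?)",
            (name, data),
        )


def get_file(conn: sqlite3.Connection, name: str) -> Optional[bytes]:
    """Return the stored bytes, or None if no such file exists."""
    row = conn.execute("SELECT content FROM files WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_store()
    put_file(conn, "2020/07/14/report.txt", b"hello")
    print(get_file(conn, "2020/07/14/report.txt"))
```

The "path" becomes just a key, so any folder-like naming scheme is free and never touches the file system's directory structures.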
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
We have some directories that contain that many files; the record I can remember right now is around 450k files in one folder.
They come from long-term measurements that produce 3 to 5 data files per minute, each between 1 and 5 MB.
Accessing the directory is slow, changing the sort order from name to timestamp is slow, moving the directory to another place is slow, getting the folder properties is slow, and deleting the folder once it's no longer needed is slow.
Windows 10 is even slower, especially for "folder properties": it needs over 15 minutes to count the files and report the folder size.
Windows 7 did it in 30 or 40 seconds.
We can't move that to FAT drives because of the entry limits others mentioned; it needs to be NTFS.
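A quick back-of-the-envelope check of the rates quoted above (3 to 5 files per minute, 1 to 5 MB each) shows how fast one folder reaches that size. This is plain arithmetic on those figures, using decimal units (1 GB = 1000 MB).

```python
MINUTES_PER_DAY = 24 * 60


def files_per_day(files_per_minute: int) -> int:
    """Files produced in one day at a constant rate."""
    return files_per_minute * MINUTES_PER_DAY


def days_to_reach(target_files: int, files_per_minute: int) -> float:
    """Days of continuous recording until a folder holds target_files files."""
    return target_files / files_per_day(files_per_minute)


def total_size_gb(file_count: int, file_size_mb: float) -> float:
    """Total size in decimal gigabytes."""
    return file_count * file_size_mb / 1000


if __name__ == "__main__":
    for rate in (3, 5):
        print(f"{rate}/min: {files_per_day(rate)} files/day, "
              f"{days_to_reach(450_000, rate):.1f} days to 450k files")
    print(f"450k files: {total_size_gb(450_000, 1):.0f} to {total_size_gb(450_000, 5):.0f} GB")
```

So 450k files is roughly 62 to 104 days of recording and 450 GB to 2.25 TB of data, all in one folder.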
If something has a solution... why do we have to worry about it? If it has no solution... for what reason do we have to worry about it?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
Neither Windows nor Linux does well when you put too many files in a single folder. I've tried it with a million files and it is very painful. Some operations, like simply listing the directory or even trying to delete the files, take absurdly long.
It seems to be doing some operations that are simply not designed for large numbers of files.
As others have said, around 10,000 files in a folder is a reasonable maximum. I simply make it 1,000. So for a million files, spread them across 1,000 folders. There is a nice symmetry here, and it works like a charm.
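One way to get that 1,000 × 1,000 layout, assuming files are numbered sequentially from 0; the `data` root and the zero-padded folder and file names are hypothetical.

```python
from pathlib import PurePath

FILES_PER_FOLDER = 1_000


def path_for_index(index: int, root: str = "data") -> PurePath:
    """Map a sequential file number to root/<folder>/<slot>.dat, 1,000 files per folder."""
    if index < 0:
        raise ValueError("index must be non-negative")
    folder, slot = divmod(index, FILES_PER_FOLDER)
    return PurePath(root, f"{folder:03d}", f"{slot:03d}.dat")


if __name__ == "__main__":
    print(path_for_index(0))
    print(path_for_index(999_999))
```

File 0 lands in `data/000/000.dat` and file 999,999 in `data/999/999.dat`; past a million files the folder number simply grows beyond three digits.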
A few years ago I worked on a system that generated around 50,000 to 100,000 files a day.
We ran into trouble right away.
Storing the files was not a problem, but retrieving them was impossible.
A second problem was that we needed to search the contents of the files to find all files containing a certain string.
We eventually chose to store all files in a database. This was quite easy because the files were small (less than 10K).
We chose an Oracle database because of the CLOB datatype (it allows for indexing and searching).
We have had no problems since, and now have more than 200 million files.
I worked on a system that had to stream 1MB images to disk at 75fps. I found that once there were about 700 files in a directory, creating new files suddenly became slower and the required transfer rate was unachievable. I ended up creating a new subdirectory every 500 files.
Of course this won't be a problem if your system is purely for archiving.
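The "new subdirectory every 500 files" approach can be sketched as a small writer that rolls over to a fresh folder once the current one is full. The class name, folder naming and `.raw` suffix are illustrative, the 500 threshold is the figure from the post rather than a universal constant, and nothing here is tuned for 75 fps throughput.

```python
import tempfile
from pathlib import Path


class RollingDirWriter:
    """Write files into root/00000, root/00001, ..., starting a new folder every `per_dir` files."""

    def __init__(self, root: Path, per_dir: int = 500) -> None:
        if per_dir <= 0:
            raise ValueError("per_dir must be positive")
        self.root = Path(root)
        self.per_dir = per_dir
        self.count = 0
        self._dir = self.root

    def write(self, data: bytes, suffix: str = ".raw") -> Path:
        # Create the next subfolder only when the previous one is full.
        if self.count % self.per_dir == 0:
            self._dir = self.root / f"{self.count // self.per_dir:05d}"
            self._dir.mkdir(parents=True, exist_ok=True)
        path = self._dir / f"{self.count:08d}{suffix}"
        path.write_bytes(data)
        self.count += 1
        return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        writer = RollingDirWriter(Path(tmp), per_dir=500)
        for _ in range(1200):
            writer.write(b"\x00" * 16)
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

A real recorder would also resume the counter after a restart instead of starting from 0 and overwriting earlier files.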
I don't know about access issues for a large number of files in a directory, but you might also consider security issues.
If, for example, you have several different users whose files should not be accessible to the others, creating a subfolder for each user might allow you to secure them so that only that user has access to their subfolder (plus maybe an 'admin' account that can see all directories). There are obvious organizational advantages as well.
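On a POSIX system, the per-user idea can be sketched by giving each user a subfolder with mode 0o700 (owner-only access). This is an illustration under clear assumptions: it only sets permission bits, leaves ownership with whichever account runs the code (handing the folder to the real user would need `os.chown` and root privileges), and does not cover Windows/NTFS, where you would set ACLs instead. The function name is hypothetical.

```python
import os
import stat
import tempfile
from pathlib import Path


def make_private_dir(root: Path, user: str) -> Path:
    """Create root/<user> with owner-only access (mode 0o700) on POSIX systems."""
    # Reject names that could escape the root folder, e.g. "../other".
    if user in ("", ".", "..") or Path(user).name != user:
        raise ValueError(f"unsafe folder name: {user!r}")
    folder = Path(root) / user
    folder.mkdir(parents=True, exist_ok=True)
    # chmod explicitly, because the mode passed to mkdir is filtered by the umask.
    os.chmod(folder, stat.S_IRWXU)
    return folder


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        alice = make_private_dir(Path(tmp), "alice")
        print(oct(stat.S_IMODE(alice.stat().st_mode)))
```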
It really depends on your use case for accessing/managing these files. If you're going to be enumerating the files a lot (or portions of the files) then everything in one directory/folder may not be the best. You can at least "chunk up" the enumeration by subfolder if you create those.
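Chunking the enumeration by subfolder could look like the generator below: it walks one subfolder at a time with `os.scandir` and yields that folder's files as a batch, so a caller can process (or stop after) one folder without listing everything first. The one-level `root/<subfolder>/<file>` layout is an assumption.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_batches(root: Path) -> Iterator[Tuple[str, List[str]]]:
    """Yield (subfolder, file names) one subfolder at a time, in name order."""
    with os.scandir(root) as top:
        subdirs = sorted(entry.name for entry in top if entry.is_dir())
    for name in subdirs:
        with os.scandir(Path(root) / name) as entries:
            files = sorted(entry.name for entry in entries if entry.is_file())
        yield name, files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for sub in ("001", "000"):
            (Path(tmp) / sub).mkdir()
            for i in range(3):
                (Path(tmp) / sub / f"{i}.dat").write_bytes(b"")
        for name, files in iter_batches(Path(tmp)):
            print(name, len(files))
```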
Also, if you break them up into subfolders in some logical way, managing those units or groupings of files becomes much easier, e.g. backups, restores, archiving and deletion.
If you are storing the path to each file in a database, you're going to get the same performance either way (subdirectories, or everything together in one pool).
Can you explain a little more about the repository and how you'll be using it?
Consider drive corruption, backups, replication, file listeners, aging/document retention and all the other access aspects as well. A folder per day/month/year, as suggested in another post, can help with some of those items.
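A folder-per-year/month/day layout can be derived from each file's timestamp. In this minimal sketch, the `archive` root, the zero-padded names and the retention helper are assumptions; zero-padding keeps folders in chronological order when sorted by name, so age-based retention becomes a matter of deleting whole day folders.

```python
from datetime import date, datetime
from pathlib import PurePath


def dated_dir(ts: datetime, root: str = "archive") -> PurePath:
    """Folder for a timestamp: root/YYYY/MM/DD (zero-padded so names sort by date)."""
    return PurePath(root, f"{ts:%Y}", f"{ts:%m}", f"{ts:%d}")


def is_expired(day_dir: PurePath, keep_days: int, today: date) -> bool:
    """True if a .../YYYY/MM/DD folder is older than the retention window."""
    year, month, day = (int(part) for part in day_dir.parts[-3:])
    return (today - date(year, month, day)).days > keep_days


if __name__ == "__main__":
    folder = dated_dir(datetime(2020, 7, 14, 5, 32))
    print(folder, is_expired(folder, keep_days=30, today=date(2020, 8, 14)))
```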
You might get by if you're using SSDs, or if the files are large, accessed directly and infrequently, and won't grow by orders of magnitude.
Better to spread them out.
Huge directories in NTFS:
* Accessing individual files is OK
* Adding/removing/listing/sorting gets slow (consider EnumerateFiles instead of GetFiles)
* Reading metadata (mod date) is slow (makes Explorer detail view slow)
* Network access is slower
* Defragging directories (with contig) helps some (also moving large dirs with robocopy /create)
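The EnumerateFiles/GetFiles advice is .NET-specific. A rough Python analogue is `os.scandir`, which yields entries lazily instead of building the whole list up front, and whose `DirEntry` objects can reuse metadata returned while scanning. Per the Python documentation, `DirEntry.stat()` on Windows generally needs no extra system call, which bears on the slow-metadata point above; on POSIX it still costs one `stat()` per entry. The function names here are illustrative.

```python
import os
import tempfile
from itertools import islice
from typing import Iterator, List, Tuple


def iter_files_with_mtime(path: str) -> Iterator[Tuple[str, float]]:
    """Lazily yield (name, modification time) for the regular files in `path`."""
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                # On Windows this normally reuses data from the directory scan;
                # on POSIX it costs one stat() call per entry (then cached).
                yield entry.name, entry.stat().st_mtime


def first_n(path: str, n: int) -> List[Tuple[str, float]]:
    """Take n files and stop, instead of enumerating the whole directory first."""
    return list(islice(iter_files_with_mtime(path), n))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(10):
            open(os.path.join(tmp, f"{i}.log"), "wb").close()
        print(len(first_n(tmp, 3)))
```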
Directories (and empty/tiny files) are stored in the MFT.
A massive number of MFT entries can be a problem.
The MFT starting size is set when (and only when) you format the disk (controlled by a registry key). It will expand if needed (becoming fragmented), and will contract (if possible) when space is low.
Defragging MFT is possible but slow and difficult.
Once a disk has been filled with files, or its MFT has been filled by directories or tiny files, it may be best to reformat.
How to segment depends on how sparse the file IDs will be.
About 4k entries is a good starting target.
If files have numeric IDs, avoid bit shifts for simplicity:
Group into 3 digits (base 10) = up to 1,000 files + 1,000 subdirs
or 3 hex chars (0x000-0xFFF) = up to 4,096 files + 4,096 subdirs
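The digit-grouping scheme can be sketched as follows: format the ID with a fixed width, then cut it into 3-character groups, using all but the last as folder names. Each level then holds at most 1,000 (decimal) or 4,096 (hex) children, and no bit shifts are involved. The fixed width of 9 characters and the `store` root are assumptions you'd size to your largest expected ID.

```python
from pathlib import PurePath


def grouped_path(file_id: int, base: int = 10, width: int = 9, root: str = "store") -> PurePath:
    """Split a zero-padded ID into 3-character groups; the last group names the file."""
    if base not in (10, 16):
        raise ValueError("base must be 10 or 16")
    if width <= 0 or width % 3:
        raise ValueError("width must be a positive multiple of 3")
    if file_id < 0:
        raise ValueError("file_id must be non-negative")
    # Plain string formatting, no bit shifts: each group is 3 decimal or hex digits.
    text = f"{file_id:0{width}d}" if base == 10 else f"{file_id:0{width}x}"
    if len(text) > width:
        raise ValueError(f"ID {file_id} does not fit in {width} digits")
    groups = [text[i:i + 3] for i in range(0, width, 3)]
    return PurePath(root, *groups)


if __name__ == "__main__":
    print(grouped_path(123_456_789))
    print(grouped_path(0xABCDEF, base=16))
```

With the defaults, ID 123456789 maps to `store/123/456/789` and the hex ID 0xABCDEF to `store/000/abc/def`.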
A while ago I dropped half a mug of coffee over my cheap USB keyboard and of course it died. I was given a Corsair K68 RGB, a mechanical keyboard (plus a non-spill mug). Sturdy keys, good for bulk typing. So, a few days ago, the unthinkable happened: I dropped a full mug of coffee on the new keyboard.
..and it still works, without flaws. There's an anti-spill rubber layer between the keys and the internals, and a metal cover under that which protects the electronics. I'm convinced that nothing is idiot-proof and I will find a weak spot, but for now, I'm pretty impressed. It's easily taken apart and cleaned and might actually last longer than a year.
Really boring - an MS 600 I bought after my Logitech died. And I only got the Logi because after 20 years of abuse the keytop legends had nearly all worn off and Herself complained that she didn't know where the letters were ...
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt