I have a dataset whose directory structure is extremely messy. A sample of the directory structure is shown below:
Data
----1
--------Jpeg
------------<arbitrary-name>.jpg
------------<arbitrary-name>(1).jpg
------------<arbitrary-name>(2).jpg
------------<arbitrary-name>(3).jpg
----2
--------Jpeg
------------<arbitrary-name>.jpg
----3
--------Jpeg
------------<arbitrary-name>.jpg
------------<arbitrary-name>(1).jpg
------------<arbitrary-name>(2).jpg
------------<arbitrary-name>(3).jpg
.
.
.
----67
--------Pose and expression change
------------<arbitrary-file-name>(1).jpg
------------<arbitrary-file-name>(2).jpg
------------<arbitrary-file-name>(3).jpg
--------Reference Image
------------<arbitrary-file-name>.jpg

Note that this is not the exact directory structure.
For example, folders 1-15 might each have one subfolder, JPEG, containing 3 or more images with extremely long file names,
while folders 20 to 25 each have two subfolders similar to the ones shown in folder 67.

In order to make the dataset more consistent for further processing, I want to reorganize the folder as follows:
data
----1
--------caucasian_male_1_[x].jpg
--------(if folder 1 contains 3 images, then x takes the values 0, 1, 2)
----2
----3
----4
----5
----6
.
.
.
----500
--------caucasian_male_500_[x].jpg
--------(again, if folder 500 contains 4 jpg files, then x takes the values 0, 1, 2, 3)

My platform is Windows, and I am trying to come up with a solution to automate the process in Python. Any suggestions on how to reorganise the data folder would be welcome.

What I have tried:

I am currently using the following tool, available on GitHub:
weisslj/dir-edit: Rename or remove files in a directory using an editor

But this is not efficient, as you need to specify the directory every time and then open a text file to edit the file names.
If the directory has, say, 1000 subfolders and each folder has 3 or more images, then that approach does not scale.

I have also tried the following method.
It lists all the directories in the folder and is able to print the file names, but I don't think the renaming part is correct:
Python
import os
import shutil

files_l=[]
for root ,dirs,files  in os.walk('D:/dataset/'):
    
    for dire in dirs:
        file=os.path.join(root,dire)
        files_l.append(file)
        
files_l=sorted(files_l)   
x='D:/synthetic_photo/synthetic_data\\1\JPEG'
images=[]
ctr=0
for x in files_l:
    for file in os.scandir(x):
        new_path=os.path.join(x,'my_image_name.jpg')
        old_path=os.path.join(x,file)
        os.rename(old_path,new_path)
Posted
Updated 29-May-22 18:34pm
v2

1 solution

You should start with a function that lists all the files and directories at the root level of your tree. For each file it finds, it should call a function that renames the file according to your rule. For each directory it finds, it should call itself recursively to process that directory and any others further down the chain. The main issue to deal with is setting up the rules for renaming your files. Take a look at the "File and Directory Access" section of the Python 3.10.4 documentation for the built-in support.
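As a rough illustration of the recursive approach described above, here is a minimal sketch. The names `process_tree`, `rename_file` and `make_name` are hypothetical and not from the question; `make_name` stands in for whatever renaming rule you settle on. It assumes the rule produces a name that does not already exist in the same directory.

```python
import os


def rename_file(path, new_name):
    """Rename one file in place, keeping it in the same directory."""
    new_path = os.path.join(os.path.dirname(path), new_name)
    os.rename(path, new_path)
    return new_path


def process_tree(directory, make_name):
    """Recursively visit `directory` and rename every file found.

    `make_name` is a hypothetical callback: it receives an os.DirEntry
    and returns the new file name (name only, not a full path).
    """
    # Take a snapshot of the listing first so that renaming files
    # does not disturb the directory iteration.
    with os.scandir(directory) as it:
        entries = sorted(it, key=lambda e: e.name)

    renamed = []
    for entry in entries:
        if entry.is_dir():
            # Directories are not renamed; recurse into them instead.
            renamed.extend(process_tree(entry.path, make_name))
        elif entry.is_file():
            renamed.append(rename_file(entry.path, make_name(entry)))
    return renamed


# Example call (the path is only an illustration):
# process_tree('D:/dataset', lambda e: 'renamed_' + e.name)
```

Note that only files are passed to `os.rename`; directories are recursed into and left alone.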
   
Comments
Member 9842745 30-May-22 0:31am
   
Hi Richard, I have attached my current solution. Can you recommend what changes I need to make to it?
With it I am able to get a list of all the directories as well as the file names in each directory.
But when I try to rename the files I get the following error:
```
Cannot create a file when that file already exists: 'D:/synthetic_photo/synthetic_data\\10' -> 'D:/synthetic_photo/synthetic_data\\caucasian_male.jpg
```
I also tried using glob.glob to list all the files in the root directory, but that is not working either.
Richard MacCutchan 30-May-22 4:06am
   
You cannot use the same filename for multiple files; you need to make each one unique. A reasonable idea would be to use the original name as a suffix, something like:
new_path = os.path.join(x, 'my_image_name_' + file.name)

This assumes that all the original files are named 1, 2, 3, etc. If they are different then you may have to use some other scheme, but the principle is the same.
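To make the "original name as a suffix" idea concrete, here is a small sketch. The function names `prefixed_path` and `prefix_files` are made up for illustration. It assumes the original names are already unique within each directory, which the file system guarantees. The new name is built by string concatenation, and `os.path.join` is only used to attach the directory.

```python
import os


def prefixed_path(directory, prefix, original_name):
    """Build directory/<prefix>_<original_name>; the extension is kept."""
    return os.path.join(directory, prefix + "_" + original_name)


def prefix_files(directory, prefix):
    """Rename every regular file directly inside `directory`."""
    # Snapshot the files first; subdirectories are deliberately skipped.
    with os.scandir(directory) as it:
        files = [e for e in it if e.is_file()]

    for entry in files:
        if entry.name.startswith(prefix + "_"):
            continue  # already renamed on a previous run
        os.rename(entry.path, prefixed_path(directory, prefix, entry.name))


# Example call (the path is only an illustration):
# prefix_files('D:/dataset/1/Jpeg', 'my_image_name')
```

Skipping directories matters here: the error quoted above shows a directory (`...\\10`) being renamed to a `.jpg` name.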
Member 9842745 30-May-22 6:19am
   
Actually, I was thinking of adding a numeric suffix. For example, if there are 3 files in the folder, then the suffixes should be _0, _1 and _2. But when I try to print the number of files in that directory... Do you think that might work?
Richard MacCutchan 30-May-22 6:54am
   
Yes, anything will work as long as all filenames in a directory are unique.
Member 9842745 30-May-22 7:32am
   
Could you give me some suggestions on what changes I should make to my code to incorporate the suggested idea?
I tried introducing the ctr variable, but the final value of ctr comes out as 1726, which is the total number of files in the entire root folder.
Richard MacCutchan 30-May-22 7:42am
   
You just need to reset the counter each time you start a new directory.
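A sketch of what resetting the counter per directory could look like, under some assumptions that are not confirmed in the thread: every numbered folder sits directly under the root, all images below a numbered folder (in `Jpeg`, `Reference Image`, and so on) should end up directly inside it, and the label `caucasian_male` is the same for every folder. The function name `reorganise` is hypothetical.

```python
from pathlib import Path


def reorganise(root, label="caucasian_male"):
    """Rename images to <label>_<folder>_<index>.jpg, one counter per folder."""
    root = Path(root)
    for subject in sorted(p for p in root.iterdir() if p.is_dir()):
        # Collect every JPEG below this numbered folder, at any depth.
        images = sorted(
            p for p in subject.rglob("*")
            if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
        )
        # enumerate() starts again at 0 for each numbered folder,
        # which is the "reset the counter" step.
        for index, image in enumerate(images):
            target = subject / f"{label}_{subject.name}_{index}.jpg"
            if image == target:
                continue  # already in place
            if target.exists():
                # Refuse to overwrite instead of silently losing a file.
                raise FileExistsError(target)
            image.rename(target)

        # Remove subfolders that are now empty, deepest first.
        subfolders = sorted(
            (p for p in subject.rglob("*") if p.is_dir()), reverse=True
        )
        for sub in subfolders:
            if not any(sub.iterdir()):
                sub.rmdir()


# Example call (the path is only an illustration):
# reorganise('D:/dataset')
```

It is worth trying this on a copy of the dataset first, since the renames are not reversible once the original names are gone.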

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


