Hello Everyone,

I have a question about for loops. I want a loop that walks through directories, subdirectories, and files, finds files with a .gz extension, and reads each one into a list that I would later like to turn into a Pandas DataFrame.

I'm using this loop:
Python
pdFrame = []
for dir, subdir, files in os.walk(path):
    for file in files:
        if glob2.fnmatch.fnmatch(file, '*tar.gz'):
            columns = ['Gene_ID', file[:file.find('.')]]
            df = pd.read_csv(os.path.join(dir, file), compression='gzip', sep='\t', names=columns, header=None)
            df = df.set_index('Gene_ID')
            for name, value in df.items():
                pdFrame.append(value)
            data_frame = pd.concat(pdFrame, axis=1, ignore_index=False)
data_frame.to_csv('final_samples.csv', header=True)


After running the above-mentioned code, I get the following error message:

"NameError Traceback (most recent call last)
<ipython-input-5-40ae48c7e923> in <module>()
10 pdFrame.append(value)
11 data_frame = pd.concat(pdFrame, axis=1, ignore_index=False)
---> 12 data_frame.to_csv('final_samples.csv', header=True)
13
14

NameError: name 'data_frame' is not defined"

From what I have found so far, it seems that data_frame is never defined because my if clause never evaluates to true. I have tried initializing the data_frame variable outside of the loop, but that hasn't fixed the problem. Can someone help me with this?

What I have tried:

I have tried initializing data_frame to None before the loops and checking that it is not None before writing the file:

Python
pdFrame = []
data_frame = None
for dir, subdir, files in os.walk(path):
    for file in files:
        if glob2.fnmatch.fnmatch(file, '*.gz'):
            columns = ['Gene_ID', file[:file.find('.')]]
            df = pd.read_csv(os.path.join(dir, file), compression='gzip', sep='\t', names=columns, header=None)
            df = df.set_index('Gene_ID')
            for name, value in df.items():
                pdFrame.append(value)
            data_frame = pd.concat(pdFrame, axis=1, ignore_index=False)
if data_frame is not None:
    data_frame.to_csv('final_samples.csv', header=True)
Updated 12-Jan-21 1:43am

1 solution

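Python resolves names by scope; see "Python Scope & the LEGB Rule: Resolving Names in Your Code" (Real Python).
Note, though, that for loops and if blocks do not create a new scope in Python, so a variable assigned inside them is still available afterwards, provided the assignment actually ran. The NameError means the line that assigns data_frame never executed: no file passed the if test (or os.walk found nothing under path), so the name was never bound. Check that path is correct and that the pattern '*tar.gz' really matches your file names.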
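To see how names assigned inside a loop behave, here is a minimal sketch. The function names and the sample file list are made up for illustration and are not from the question. It shows that a name assigned inside a for/if is usable after the loop when the assignment ran, and that the error only appears when it never ran.

```python
def last_match_unsafe(names, suffix):
    # 'found' is only bound if the if-branch runs at least once.
    for name in names:
        if name.endswith(suffix):
            found = name
    # Inside a function this raises UnboundLocalError, a subclass of NameError,
    # when nothing matched. At module level it would be a plain NameError.
    return found


def last_match_safe(names, suffix):
    found = None  # bound before the loop, so the name always exists
    for name in names:
        if name.endswith(suffix):
            found = name
    return found


files = ["a.txt", "b.gz"]  # hypothetical file names
print(last_match_unsafe(files, ".gz"))  # assignment ran, so the name is visible after the loop
print(last_match_safe(files, ".zip"))   # nothing matched, falls back to None

try:
    last_match_unsafe(files, ".zip")    # nothing matched, 'found' was never assigned
except NameError as exc:
    print("NameError:", exc)
```

So if the error shows up, the first thing to check is whether the body of the if ever executes, for example by printing each file name that os.walk yields.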
   
Comments
B. Copeland 12-Jan-21 6:58am
   
@OriginalGriff...thank you for your reply. If I were to create a data_frame variable outside of the loops, could I possibly leave it an empty list for example to side-step this error?
OriginalGriff 12-Jan-21 7:50am
   
Yes.
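One caveat on the empty-list idea: a plain list has no to_csv method, so the write step still needs a check that something was actually collected. It may help to separate "find the files" from "read the files", so you can see right away whether anything matched. Below is a rough sketch using only the standard library; find_gz_files and column_name are hypothetical helper names, and the '*.gz' default pattern is an assumption taken from the second code block in the question.

```python
import fnmatch
import os


def find_gz_files(root, pattern="*.gz"):
    """Return sorted paths of files under root whose names match pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if fnmatch.fnmatch(filename, pattern):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


def column_name(path):
    """Text before the first dot of the file name, like file[:file.find('.')]
    in the question (for names that contain a dot)."""
    return os.path.basename(path).split(".", 1)[0]


paths = find_gz_files(".")  # "." is a placeholder; use your own root directory
if not paths:
    print("No matching files found; check the path and the pattern.")
for p in paths:
    print(column_name(p), "<-", p)
```

With the list of paths in hand, read each file as in the question, append the columns to pdFrame, and call pd.concat once after the loops, only when pdFrame is non-empty; pd.concat raises a ValueError when it is given an empty list.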

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



