For loop causes name error [Python]

Question

Hello Everyone,

I have a question regarding for loops. I have a for loop that should walk through directories, subdirectories, and files, pick out the files with a .gz extension, and unpack them into a list that I would later like to turn into a Pandas dataframe.

I'm using this loop:

Python

pdFrame = []
for dir, subdir, files in os.walk(path):
    for file in files:
        if glob2.fnmatch.fnmatch(file, '*tar.gz'):
            columns = ['Gene_ID', file[:file.find('.')]]
            df = pd.read_csv(os.path.join(dir, file), compression='gzip', sep='\t', names=columns, header=None)
            df = df.set_index('Gene_ID')
            for name, value in df.items():
                pdFrame.append(value)
            data_frame = pd.concat(pdFrame, axis=1, ignore_index=False)
data_frame.to_csv('final_samples.csv', header=True)

After running the above-mentioned code, I get the following error message:

"NameError Traceback (most recent call last)
<ipython-input-5-40ae48c7e923> in <module>()
10 pdFrame.append(value)
11 data_frame = pd.concat(pdFrame, axis=1, ignore_index=False)
---> 12 data_frame.to_csv('final_samples.csv', header=True)
13
14

NameError: name 'data_frame' is not defined"

From what I found so far, it seems like one of my variables is not being defined because my if clause is not being evaluated as true. I have tried initializing the data_frame variable outside of the loop, however that hasn't fixed the problem. Can someone possibly help me with this?
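One way to check whether the if clause can ever be true is to test the patterns against a few file names in isolation. The sketch below uses the standard-library fnmatch module rather than glob2, and the file names are hypothetical, chosen only for illustration.

```python
import fnmatch

# Hypothetical file names, used only to illustrate the two patterns.
names = ["sample1.counts.gz", "sample2.tar.gz", "notes.txt"]


def matching(names, pattern):
    """Return the names that match a shell-style pattern."""
    return [n for n in names if fnmatch.fnmatch(n, pattern)]


tar_matches = matching(names, "*tar.gz")  # only names ending in "tar.gz"
gz_matches = matching(names, "*.gz")      # any name ending in ".gz"

print(tar_matches)
print(gz_matches)
```

If the real directory tree contains only plain .gz files, the '*tar.gz' pattern matches nothing and the body of the if never runs.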

What I have tried:

I have tried initializing data_frame to None before the loop and checking that it is not None before writing the CSV:

Python

pdFrame = []
data_frame = None
for dir, subdir, files in os.walk(path):
    for file in files:
        if glob2.fnmatch.fnmatch(file, '*.gz'):
            columns = ['Gene_ID', file[:file.find('.')]]
            df = pd.read_csv(os.path.join(dir, file), compression='gzip', sep='\t', names=columns, header=None)
            df = df.set_index('Gene_ID')
            for name, value in df.items():
                pdFrame.append(value)
            data_frame = pd.concat(pdFrame, axis=1, ignore_index=False)
if data_frame is not None:
    data_frame.to_csv('final_samples.csv', header=True)

Posted 12-Jan-21 0:09am

B. Copeland

Updated 12-Jan-21 0:43am

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

OriginalGriff · Answer 1 · 2021-01-12T00:43:00

Solution 1

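Python resolves names by scope: see "Python Scope & the LEGB Rule: Resolving Names in Your Code" on Real Python.
A for loop or an if block does not create a new scope, though, so a variable assigned inside the loop is still visible after it. The NameError means the assignment to data_frame never ran: either os.walk(path) found no files, or no file name matched '*tar.gz', so the body of the if was skipped every time. Check path and the pattern (your second attempt uses '*.gz'), and assign data_frame before the loop so the name always exists.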
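For what it is worth, a for loop in Python does not introduce a scope of its own, so a name assigned in the loop body stays visible afterwards. The name is missing only when the body never runs. A minimal sketch, with made-up variable names:

```python
# The loop body runs, so "square" exists after the loop.
for n in [1, 2, 3]:
    square = n * n
print(square)  # 9

# The loop body never runs, so "cube" is never assigned.
for n in []:
    cube = n ** 3

try:
    print(cube)
except NameError as exc:
    # Same kind of error as in the question: the name was never bound.
    message = str(exc)
    print(message)
```

The second loop mirrors the situation in the question: when no file matches the pattern, the assignment inside the if is skipped and the name is never created.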

Posted 12-Jan-21 0:43am

OriginalGriff

Comments

B. Copeland 12-Jan-21 6:58am

@OriginalGriff...thank you for your reply. If I were to create a data_frame variable outside of the loops, could I possibly leave it an empty list for example to side-step this error?

OriginalGriff 12-Jan-21 7:50am

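Yes, defining it before the loops avoids the NameError. Note that a plain list has no to_csv method, so check that you actually collected something before you write the file.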
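A minimal sketch of that pattern, using only the standard library: the result list is created before the loops, filled inside them, and checked once afterwards. The helper name find_files and the demo file names are hypothetical; in the original code the pandas read and a single pd.concat call would go where the final check is.

```python
import fnmatch
import os
import tempfile


def find_files(root, pattern="*.gz"):
    """Return sorted paths under root whose file name matches pattern."""
    found = []  # created before the loops, so it exists even with no matches
    for dirpath, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            if fnmatch.fnmatch(filename, pattern):
                found.append(os.path.join(dirpath, filename))
    return sorted(found)


# Demo on a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "run1"))
    for name in ("a.counts.gz", "readme.txt"):
        open(os.path.join(root, "run1", name), "w").close()
    gz_paths = find_files(root, "*.gz")
    tar_paths = find_files(root, "*tar.gz")

# Guard once, after the loops, instead of relying on the loop body having run.
if tar_paths:
    print("would read", len(tar_paths), "files")
else:
    print("no files matched '*tar.gz'; nothing to concatenate or write")
```

Checking the list itself also protects the pandas step, since pd.concat raises a ValueError when it is given an empty list.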