Hello Everyone,
I have a question about for loops. I have a for loop that should walk through directories, subdirectories, and files looking for files with a .gz extension, read each one, and append the result to a list, which I would later like to turn into a pandas DataFrame.
I'm using this loop:
pdFrame = []
for dir, subdir, files in os.walk(path):
    for file in files:
        if glob2.fnmatch.fnmatch(file, '*tar.gz'):
            columns = ['Gene_ID', file[:file.find('.')]]
            df = pd.read_csv(os.path.join(dir, file), compression='gzip', sep='\t', names=columns, header=None)
            df = df.set_index('Gene_ID')
            for name, value in df.items():
                pdFrame.append(value)
            data_frame = pd.concat(pdFrame, axis=1, ignore_index=False)
data_frame.to_csv('final_samples.csv', header=True)
After running the above-mentioned code, I get the following error message:
"NameError Traceback (most recent call last)
<ipython-input-5-40ae48c7e923> in <module>()
10 pdFrame.append(value)
11 data_frame = pd.concat(pdFrame, axis=1, ignore_index=False)
---> 12 data_frame.to_csv('final_samples.csv', header=True)
13
14
NameError: name 'data_frame' is not defined"
From what I have found so far, it seems that one of my variables is never defined because my if clause never evaluates as true. I have tried initializing the data_frame variable outside of the loop; however, that hasn't fixed the problem. Can someone help me with this?
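One way to check whether the if clause ever matches is to list the matching files before doing anything else. The following is a minimal sketch using only the standard library; find_gz_files is a hypothetical helper name, and the default pattern '*.gz' is an assumption (the loop above uses '*tar.gz').

```python
import fnmatch
import os


def find_gz_files(root, pattern="*.gz"):
    """Return the sorted paths under root whose file name matches pattern."""
    matches = []
    # os.walk visits root and every subdirectory below it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if fnmatch.fnmatch(filename, pattern):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

If find_gz_files(path) returns an empty list, the body of the if clause never runs, so data_frame is never assigned. Note that os.walk yields nothing at all, without raising an error, when path does not exist, so an empty result can also mean the path itself is wrong.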
What I have tried:
I have tried the following loop, which checks that data_frame is not None before writing it:
pdFrame = []
data_frame = None
for dir, subdir, files in os.walk(path):
    for file in files:
        if glob2.fnmatch.fnmatch(file, '*.gz'):
            columns = ['Gene_ID', file[:file.find('.')]]
            df = pd.read_csv(os.path.join(dir, file), compression='gzip', sep='\t', names=columns, header=None)
            df = df.set_index('Gene_ID')
            for name, value in df.items():
                pdFrame.append(value)
            data_frame = pd.concat(pdFrame, axis=1, ignore_index=False)
if data_frame is not None:
    data_frame.to_csv('final_samples.csv', header=True)
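For comparison, here is a sketch of the same idea with the concatenation done once, after the loop, and an explicit result for the case where nothing matched. build_frame is a hypothetical name, and the sketch assumes each matching file is a gzip-compressed, two-column, tab-separated text file (a plain .gz rather than a tar archive), as the read_csv call above implies.

```python
import fnmatch
import os

import pandas as pd


def build_frame(root, pattern="*.gz"):
    """Collect one column per matching file; return None if nothing matched."""
    series_list = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not fnmatch.fnmatch(filename, pattern):
                continue
            # Use the part of the file name before the first dot as the column name.
            sample = filename.split(".", 1)[0]
            df = pd.read_csv(
                os.path.join(dirpath, filename),
                compression="gzip",
                sep="\t",
                names=["Gene_ID", sample],
                header=None,
            )
            series_list.append(df.set_index("Gene_ID")[sample])
    if not series_list:
        # No file matched, so there is nothing to concatenate.
        return None
    # Concatenate once, after every file has been read.
    return pd.concat(series_list, axis=1)
```

The result can then be written only when it exists: call frame = build_frame(path), and run frame.to_csv('final_samples.csv', header=True) if frame is not None. A None result means no file name matched the pattern under path.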