Click here to Skip to main content
15,886,873 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Certainly! I have two Excel files, each containing multiple sheets. In my existing code, using Data frames, I've implemented logic to compare a specific sheet from both files, and I'm obtaining the desired results.

However, the issue is that my code currently only retrieves that specific sheet from both files. But I would like to modify my existing code to also include the remaining sheets from the input files in my output file. How can I achieve this?

What I have tried:

The code I tried:

Python
import pandas as pd
import numpy as np
import openpyxl
from openpyxl import load_workbook
from openpyxl.styles import PatternFill, Border, Side

df1= pd.read_excel(r'D:\excel1.xlsx','Sheet1',na_values=['NA']).fillna('')
df2= pd.read_excel(r'D:\excel2.xlsx','Sheet2',na_values=['NA']).fillna('')

df1 = df1.set_index('ID')
df2 = df2.set_index('ID')

df3 = pd.concat([df1,df2], sort=True, copy=True)
df3a = df3.stack().explode().groupby(level=[0,1]).apply(lambda x: ', '.join(map(str, x.unique()))).unstack(1).copy()

df3a['status'] = ""
       
df3a.loc[~df3a.index.isin(df2.index),'status'] = 'old' # if not in df2 index 
df3a.loc[~df3a.index.isin(df1.index),'status'] = 'new'     # if not in df1 index

idx = df3.stack().groupby(level=[0,1]).nunique() # get modified cells. 
df3a.loc[idx.mask(idx <= 1).dropna().index.get_level_values(0),'status'] = 'modified'
df3a['status'] = df3a['status'].fillna('same') # assume that anything not fufilled by above rules is the same.
        
reorder_columns = df1.columns.tolist() + ['status']
df3a = df3a[reorder_columns]

#-------------------------------------Highlight rows with different colors---------------------------------------------------------------------------#
with pd.ExcelWriter(r'D:\excel_output.xlsx') as writer:
        df3a.to_excel(writer, sheet_name='Sheet1', index=True)

workbook = load_workbook(r'D:\excel_output.xlsx')

sheet1 = workbook['Sheet1']

#rest of the code I implemented logic to apply formatting to that particular sheets..........

sheet1.delete_cols(13)

workbook.save(r'D:\excel_output.xlsx')
Posted
Updated 25-Oct-23 4:09am
v2
Comments
Richard MacCutchan 17-Oct-23 4:14am    
Check the documentation to see if there is a function to list all worksheets in a workbook.

You just need to iterate through the worksheets in the workbooks. So something like rhe following will list the sheet names in the workbook.
Python
workbook = load_workbook(r'D:\excel1.xlsx')
for sheet in workbook.worksheets:
    print(sheet.title)

So you need something similar for the other workbook. You can then select each pair of worksheets that you wish to compare, and pass the details to the function that processes the content.
 
Share this answer
 
Comments
CPallini 17-Oct-23 8:04am    
5.
Maciej Los 27-Oct-23 11:32am    
5ed!
You can obtain the name of sheets through panda's dataframe methods. See:

Python
import pandas
import openpyxl

# read workbook:
df = pandas.read_excel("D:\\ConflictData.xlsx", None)
# then - method 1:
print(df.keys())


As you can see, df returns a dictionary of dataframes - the names of sheets :)
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900