I am comparing a source and a target dataset (9000 rows, 14 columns each) and highlighting the differences with the code below. The highlighting function is called through the Styler's apply method. The problem is that highlighting only works up to a certain row; after that, differing cells are no longer highlighted. Is there a limitation I am running into, or am I missing something?
Note: I am not sure how to attach the dataset files to this question, but I am using sample CSV files with 9000 rows and 14 columns generated at
https://extendsclass.com/csv-generator.html
For example, the difference in the first record is highlighted, but nothing is highlighted after ID 4144.
What I have tried:
import pandas as pd
import numpy as np
from IPython.display import display, HTML
src = pd.read_csv('mysrc.csv')
tar = pd.read_csv('mytar.csv')
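def highlight_diff(data, color='yellow'):
    # Return a DataFrame of CSS strings with the same shape as data.
    attr = 'background-color: {}'.format(color)
    # Source values, one column per field, used as the comparison baseline.
    other = data.xs('Source', axis='columns', level=-1)
    # Mark every cell whose value differs from the Source value of its field.
    return pd.DataFrame(np.where(data.ne(other, level=0), attr, ''),
                        index=data.index, columns=data.columns)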
df_all = pd.concat([src.set_index('id'), tar.set_index('id')],
                   axis='columns', keys=['Source', 'Target'])
df_final = df_all.swaplevel(axis='columns')[src.columns[1:]]
caption_styles = {
    'selector': 'caption',
    'props': [('font-weight', 'bold'), ('margin-bottom', '25px'), ('font-size', '25px')]
}
select_styles = {
    'selector': '',
    'props': [('border-collapse', 'collapse'), ('margin', '100px auto')]
}
table_styles = {
    'selector': 'table',
    'props': [('border-collapse', 'collapse'), ('margin', '100px auto')]
}
tbody_styles = {
    'selector': 'tbody',
    'props': [('border', '1px solid #A2DBFA')]
}
td_styles = {
    'selector': 'td',
    'props': [('border', '1px solid #A2DBFA'), ('padding', '1em'),
              ('border-bottom', '2px solid #A2DBFA')]
}
th_styles = {
    'selector': 'th',
    'props': [('border', '1px solid #A2DBFA'), ('padding', '1em'),
              ('background-color', '#39A2DB'), ('border-bottom', '2px solid #A2DBFA')]
}
df_out = (
    df_final.style
    .set_table_styles([caption_styles, select_styles, table_styles,
                       tbody_styles, td_styles, th_styles])
    .set_caption("Test Case 1")
    .apply(highlight_diff, axis=None)
)
df_out
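Since I cannot attach the real files, here is a minimal sketch with made-up data that may help narrow the problem down. It calls highlight_diff directly, without the Styler, and lists the ids of the rows that get flagged. The column names a, b and c, the two ids that are changed (1 and 8000) and the random seed are assumptions for illustration only; they are not taken from my real dataset. If a late id shows up in the result, the comparison itself is working for all rows and the problem is presumably in how the styled table is rendered rather than in the function.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 9000 rows, ids 1..9000, three value columns.
rng = np.random.default_rng(0)
ids = np.arange(1, 9001)
src = pd.DataFrame({
    'id': ids,
    'a': rng.integers(0, 100, size=ids.size),
    'b': rng.integers(0, 100, size=ids.size),
    'c': rng.integers(0, 100, size=ids.size),
})
tar = src.copy()
# Introduce one difference early and one late (well past id 4144).
tar.loc[tar['id'] == 1, 'a'] += 1
tar.loc[tar['id'] == 8000, 'b'] += 1


def highlight_diff(data, color='yellow'):
    # Same function as in the question: CSS string where a cell differs
    # from the Source value of its field, empty string otherwise.
    attr = 'background-color: {}'.format(color)
    other = data.xs('Source', axis='columns', level=-1)
    return pd.DataFrame(np.where(data.ne(other, level=0), attr, ''),
                        index=data.index, columns=data.columns)


df_all = pd.concat([src.set_index('id'), tar.set_index('id')],
                   axis='columns', keys=['Source', 'Target'])
df_final = df_all.swaplevel(axis='columns')[src.columns[1:]]

# Call the function directly so that no HTML rendering is involved.
styles = highlight_diff(df_final)
flagged = styles.ne('').any(axis='columns')
flagged_ids = flagged[flagged].index.tolist()
print(flagged_ids)
```

Running the same last four lines against the real df_final and comparing the largest flagged id with 4144 should show whether the rows after that id are being compared correctly.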