It's a little bit complicated. I have this dataframe:
ID TimeandDate Date Time
10 2020-08-07 07:40:09 2022-08-07 07:40:09
10 2020-08-07 08:50:00 2022-08-07 08:50:00
10 2020-08-07 12:40:09 2022-08-07 12:40:09
10 2020-08-08 07:40:09 2022-08-08 07:40:09
10 2020-08-08 17:40:09 2022-08-08 17:40:09
12 2020-08-07 08:03:09 2022-08-07 08:03:09
12 2020-08-07 10:40:09 2022-08-07 10:40:09
12 2020-08-07 14:40:09 2022-08-07 14:40:09
12 2020-08-07 16:40:09 2022-08-07 16:40:09
13 2020-08-07 09:22:45 2022-08-07 09:22:45
13 2020-08-07 17:57:06 2022-08-07 17:57:06
I want to create a new dataframe with two new columns. The first one is df["Check-in"]: as you can see, my data doesn't have any indicator showing when an ID checked in, so I will assume that the first time for every ID is a check-in and the next row is a check-out, which goes into df["Check-out"]. Also, if a check-in doesn't have a matching check-out time, it has to be registered as the check-out for the previous check-in of the same day.
I tried taking the max and min of Time for every ID and Date, but I'm afraid that's not correct, because it only shows the first and last times of the day. Imagine ID=13 enters at 07:40:09 and checks out at 08:40:09, then returns later that day at 19:20:00 and leaves ten minutes later at 19:30:00; that function would show he worked for 12 hours.
Desired Result
ID Date Check-in Check-out
10 2020-08-07 07:40:09 12:40:09
10 2020-08-08 07:40:09 17:40:09
12 2020-08-07 08:03:09 10:40:09
12 2020-08-07 14:40:09 16:40:09
13 2020-08-07 09:22:45 17:57:06
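The pairing rule described above can be sketched with a plain loop (a hypothetical reconstruction of the frame with Date/Time as strings; it assumes zero-padded HH:MM:SS times, which sort correctly as text):

```python
import pandas as pd

# Hypothetical reconstruction of the question's data
df = pd.DataFrame({
    "ID":   [10]*5 + [12]*4 + [13]*2,
    "Date": ["2020-08-07"]*3 + ["2020-08-08"]*2
            + ["2020-08-07"]*4 + ["2020-08-07"]*2,
    "Time": ["07:40:09", "08:50:00", "12:40:09", "07:40:09", "17:40:09",
             "08:03:09", "10:40:09", "14:40:09", "16:40:09",
             "09:22:45", "17:57:06"],
})

rows = []
for (i, d), g in df.groupby(["ID", "Date"]):
    t = sorted(g["Time"])
    if len(t) % 2 and len(t) > 1:
        del t[-2]                  # orphan: the last time closes the previous check-in
    # even positions are check-ins, odd positions the matching check-outs
    for cin, cout in zip(t[0::2], t[1::2]):
        rows.append({"ID": i, "Date": d, "Check-in": cin, "Check-out": cout})

result = pd.DataFrame(rows)
print(result)
```

On this toy frame it produces exactly the five rows of the desired result.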
What I have tried:
import numpy as np
import pandas as pd

df = df[['ID', 'Date', 'Time']]

def check(x):
    # keep an even number of rows per group: drop the orphan
    # time just before the last one
    x = x.reset_index(drop=True)
    if len(x) % 2 != 0:
        x = x.drop(len(x) - 2)
    return x

g = (df.groupby(['ID', 'Date'], as_index=False)
       .agg(check)
       .explode('Time')
       .reset_index(drop=True))
g['in'] = np.where(g.index % 2 == 0, g['Time'], np.nan)
g['out'] = np.where(g.index % 2 != 0, g['Time'], np.nan)
out = g.groupby(['ID', 'Date'], as_index=False).agg(list)
out['in'] = out['in'].apply(lambda x: [i for i in x if str(i) != 'nan'])
out['out'] = out['out'].apply(lambda x: [i for i in x if str(i) != 'nan'])
out[['ID', 'Date', 'in', 'out']].explode(['in', 'out']).reset_index(drop=True)
I also tried:
new_col = []
for i in df.ID.unique():
    for d in df.Date.unique():
        p = df.loc[(df.ID == i) & (df.Date == d)]
        # pair labels 1,1,2,2,... truncated to the group length
        suffix = sorted(list(range(1, len(p))) * 2)[:len(p)]
        if len(suffix) % 2 != 0 and len(suffix) > 1:
            suffix[-2] = np.nan   # orphan time
            suffix[-1] -= 1       # last time closes the previous pair
        new_col.extend(suffix)
df['new'] = new_col
df.dropna().groupby(['ID', 'Date', 'new'], as_index=False).agg({'Time': ['min', 'max']}).drop('new', axis=1, level=0)
but I always get errors.
The output I want:
ID Date Time
min max
0 10 2022-08-07 07:40:09 12:40:09
1 10 2022-08-08 07:40:09 17:40:09
2 12 2022-08-07 08:03:09 10:40:09
3 12 2022-08-07 14:40:09 16:40:09
4 13 2022-08-07 09:22:45 17:57:06