I am trying to separate a dataframe according to unique values of two columns and then calculate the difference in their timestamps. If a difference does not satisfy the limits, then that particular row is deleted.
I have done this using nested for loops but I want to use a more efficient approach as this does not scale well with larger datasets. The code is provided below:
What I have tried:
import pandas as pd
d = {'Timestamp': ['2020-09-02 07:00:00','2020-09-02 07:10:00', '2020-09-02 07:30:00', '2020-09-02 08:00:00', '2020-09-02 10:00:00', '2020-09-02 11:10:00', '2020-09-02 11:30:00'], 'type': ['A','A','B','A', 'A','A','B'], 'value': [1,2,3,1,1,2,3]}
df = pd.DataFrame(data=d)
df
df3 = pd.DataFrame()
unique_type = pd.unique(df['type']).astype(str)
for i in range(0,len(unique_type)):
df1 = df[df.type == unique_type[i]]
unique_val = pd.unique(df1['value']).astype(int)
for j in range(0, len(unique_val)):
df2 = df1[df1.value == unique_val[j]]
trange = pd.to_datetime(df2.Timestamp)
tdiff = (trange-min(trange)).dt.total_seconds()/3600
df2['tdiff'] = tdiff#.round(1)
df3 = df3.append(df2, ignore_index=True)
df4 = df3[~((df3.tdiff>0) & (df3.tdiff<3))]
print(df)
df4.sort_values(by=['Timestamp'])