Click here to Skip to main content
14,773,218 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I am trying to separate a dataframe according to unique values of two columns and then calculate the difference in their timestamps. If a difference does not satisfy the limits, then that particular row is deleted.

I have done this using nested for loops but I want to use a more efficient approach as this does not scale well with larger datasets. The code is provided below:

What I have tried:

import pandas as pd  


d = {'Timestamp': ['2020-09-02 07:00:00','2020-09-02 07:10:00', '2020-09-02 07:30:00', '2020-09-02 08:00:00', '2020-09-02 10:00:00', '2020-09-02 11:10:00', '2020-09-02 11:30:00'], 'type': ['A','A','B','A', 'A','A','B'], 'value': [1,2,3,1,1,2,3]}


df = pd.DataFrame(data=d)
df
df3 = pd.DataFrame()
    
unique_type = pd.unique(df['type']).astype(str)  

for i in range(0,len(unique_type)):
    
    df1 = df[df.type == unique_type[i]]
    unique_val = pd.unique(df1['value']).astype(int)  
    
    for j in range(0, len(unique_val)):
        df2 = df1[df1.value == unique_val[j]]
        
        trange = pd.to_datetime(df2.Timestamp)
        tdiff = (trange-min(trange)).dt.total_seconds()/3600
        df2['tdiff'] = tdiff#.round(1)
       
        
        df3 = df3.append(df2, ignore_index=True)




df4 = df3[~((df3.tdiff>0) & (df3.tdiff<3))] 
print(df)

df4.sort_values(by=['Timestamp'])
Posted
Updated 12-Dec-20 13:47pm
Comments
Richard MacCutchan 13-Dec-20 4:01am
   
Looking at your code I cannot see exactly what you are trying to achieve. If you need to go through a list of elements then some sort of loop is the only way to do it.

1 solution

Quote:
How do I move away from for loops in Python

try to read documentation
Python for Keyword[^]
   
v2

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)




CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900