Click here to Skip to main content
15,878,430 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
I'm programmatically trying to detect the column in a dataframe that contains dates & I'm converting the date values to the same format.

My logic is to find the column name that contains the word 'Date' either as a whole word or as a sub-word (using contains()) & then work on the dates in that column.

My dataframe:

random  joiningdate branch
1       25.09.2019  rev
8       9/16/2015   pop
98      10.12.2017  switch
65      02.12.2014  high
45      08-Mar-18   aim
2       08-12-2016  docker
0       26.04.2016  grit
9       05-03-2016  trevor
56      24.12.2016  woll
4       10-Aug-19   qerty
78      abc         yak
54      05-06-2015  water
42      12-2012-18  rance
43      24-02-2010  stream
38      2008,13,02  verge
78      16-09-2015  atom


What I have tried:

from dateutil.parser import parse
import re
from datetime import datetime
import calendar

import pandas as pd

def date_fun(filepath):

            lst_to_ser=pd.Series(filepath.columns.values)

            date_col_search= lst_to_ser.str.contains(pat = 'date')

            #print(date_col_search.columns.values)

            for i in date_col_search:

               if i is True:

                   formatted_dates=pd.to_datetime(date_col_search[i], errors='coerce')
                   print(formatted_dates)

main_path = pd.read_csv('C:/Data_Cleansing/random_dateset.csv')
fpath=main_path.copy()
    
date_fun(fpath)



The retrieved column names are stored in an array & since contains() works only on 'Series' I converted the array to series.

This is what 'date_col_search' variable contains:

0    False
1     True
2    False
dtype: bool




I want to access the column corresponding to the 'True' value. But I'm getting the following error at the line 'formatted_dates=......':

Exception has occurred: KeyError True
How should I access the column that is 'True'?
Posted
Updated 8-Jan-23 23:16pm

1 solution

The KeyError is because you are using the boolean value True as an index into the date_col_search list in line 15. So change your code from line 13 on to the following:
Python
    for i in range(date_col_search):
        if date_col_search[i] is True:
#            formatted_dates=pd.to_datetime(date_col_search[i], errors='coerce')
            print(formatted_dates)

The commented line is incorrect; you cannot convert a boolean value to a datetime. You need to be addressing the column value of the current row. So you probably need an outer loop to iterate through all the rows of the series.

[edit]
Try the following code. Note that the parameter passed in to date_fun is a pandas table, not a file path.
Python
def date_fun(table):
#    print(table)
    lst_to_ser=pd.Series(table.columns.values)
#    print(lst_to_ser)
    date_col_search= lst_to_ser.str.contains(pat = 'date')
#    print(F'{date_col_search = }')
    for index, row in table.iterrows():
        for i in range(date_col_search.size):
            if date_col_search[i] == True:
#                print(row.iloc[i])
                dt = pd.to_datetime(row.iloc[i], errors='coerce')
                print(index, dt)

You can uncomment the print statements to see what happens at each part of the code.
[/edit]
 
Share this answer
 
v4

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900