I wrote a function that converts all the dates of the specified date column to the specified format. Any missing or invalid date is replaced with a value specified by the user.
The code also takes "serial dates" like "5679" into consideration. But my code isn't working on the serial numbers. Let me know where the issue is.
Example:
Input: 45678
Expected Output: 2024-06-27
Current Output: NaN
What I have tried:
Code:
import pandas as pd
import math
def date_fun(df, col_name, date_format, replace_value):
# Convert column values to string
df[col_name] = df[col_name].astype(str)
# Check if the column contains serial dates
if df[col_name].str.isnumeric().all():
# Convert the column to integer
df[col_name] = pd.to_numeric(df[col_name], errors='coerce')
# Check if the values are within the valid range of serial dates in Excel
if df[col_name].between(1, 2958465).all():
df[col_name] = pd.to_datetime(df[col_name], unit='D', errors='coerce')
else:
if replace_value is not None:
df[col_name] = replace_value
else:
df[col_name] = "Invalid Date"
else:
df[col_name] = pd.to_datetime(df[col_name], errors='coerce')
# Convert the datetime values to the specified format
df[col_name] = df[col_name].dt.strftime(date_format)
# Replace invalid or null dates with the specified value (if any)
if replace_value is not None:
replace_value = str(replace_value) # convert to string
df[col_name] = df[col_name].fillna(replace_value)
new_data = df[col_name].to_dict()
# Handle NaN and infinity values
def handle_nan_inf(val):
if isinstance(val, float) and (math.isnan(val) or math.isinf(val)):
return str(val)
else:
return val
new_data = {k: handle_nan_inf(v) for k, v in new_data.items()}
return new_data
Input
25.09.2019
9/16/2015
10.12.2017
02.12.2014
08-Mar-18
08-12-2016
26.04.2016
05-03-2016
24.12.2016
10-Aug-19
abc
05-06-2015
12-2012-18
24-02-2010
2008,13,02
16-09-2015
23-01-1992, 7:45
2nd December 2018
45678
My output
"2019/09/25",
"2015/09/16",
"2017/10/12",
"2014/02/12",
"2018/03/08",
"2016/08/12",
"2016/04/26",
"2016/05/03",
"2016/12/24",
"2019/08/10",
"nan",
"2015/05/06",
"nan",
"2010/02/24",
"2008/02/01",
"2015/09/16",
"1992/01/23",
"nan",
"2018/12/02",
"nan"
Date Format specified: "%Y/%m/%d"