The dataframe df contains many rows and 2 columns: "Key", which is blank, and "News", which is populated.
Data Frame:
Key News
Tata Steel to declare results
Tesla Plans Expansion as per press release and dividend
Oracle to give results and announce buyback
Bharti Airtel Meeting for results and dividend:
The keyword list is ['result', 'buyback', 'dividend'].
The dataframe df has already been filtered by me from a larger data frame based on this list, so every row contains at least one keyword from the list in the "News" column.
Command used: df = df_large[df_large['News'].str.contains('|'.join(list))]
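For reference, the filtering step can be sketched as below. This is only a minimal sketch: the contents of df_large are made up for illustration, the name keywords is an assumption (it avoids shadowing the built-in list), and re.escape and na=False are optional safeguards that the original command does not use.

```python
import re
import pandas as pd

# Hypothetical stand-in for df_large; the real frame is not shown in the post.
df_large = pd.DataFrame({
    "Key": ["", "", ""],
    "News": ["Tata Steel to declare results", "Weather update for Monday", None],
})

keywords = ["result", "buyback", "dividend"]

# Escape each keyword so regex metacharacters are matched literally,
# then join them into one alternation pattern: result|buyback|dividend
pattern = "|".join(re.escape(k) for k in keywords)

# na=False treats missing News values as "no match" instead of propagating NaN.
# .copy() makes df an independent frame, so later assignments do not warn.
df = df_large[df_large["News"].str.contains(pattern, na=False)].copy()
print(df)
```

Only the first row survives here, because it is the only one whose "News" text contains a keyword.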
I want the "Key" field in df to be populated with every keyword from the list (one or more) that appears in the "News" column of that row.
The resulting data frame should look like this:
Key News
result Tata Steel to declare results
dividend Tesla Plans Expansion as per press release and dividend
result, buyback Oracle to give results and announce buyback
result, dividend Bharti Airtel Meeting for results and dividend:
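One possible way to get this result without writing an explicit loop is sketched below. It assumes the four sample rows above, uses the hypothetical names keywords and pattern, and relies on Series.str.findall to collect the matches per row. Duplicates are dropped with dict.fromkeys, which is an assumption about the desired behaviour when a keyword appears more than once in one headline.

```python
import re
import pandas as pd

# Sample data taken from the example rows; "Key" starts out blank.
df = pd.DataFrame({
    "Key": ["", "", "", ""],
    "News": [
        "Tata Steel to declare results",
        "Tesla Plans Expansion as per press release and dividend",
        "Oracle to give results and announce buyback",
        "Bharti Airtel Meeting for results and dividend:",
    ],
})

keywords = ["result", "buyback", "dividend"]
pattern = "|".join(re.escape(k) for k in keywords)

# findall returns, for each row, the list of keywords found in order of appearance.
# dict.fromkeys removes repeated keywords while keeping that order.
df["Key"] = df["News"].str.findall(pattern).apply(
    lambda hits: ", ".join(dict.fromkeys(hits))
)
print(df)
```

The keywords come out in the order they occur in the text, which happens to match the expected output above. If list order is wanted instead, the matches would need to be re-sorted against keywords.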
Is iteration the only way? Even if it is, what is the optimal way to do it?
An analogous example with other data is posted below.
What I have tried:
import pandas as pd
import numpy as np
data = {'Name': ['Tom', 'Joseph', 'Krish', 'Mohan', 'Ram'], 'Age': [20, 21, 19, 18, 29], 'Sport': ['football', 'hockey football badminton', 'cricket', 'tennis football', 'hockey cricket']}
df = pd.DataFrame(data)
df = df.assign(Key="")  # empty column, to be filled with the matched keywords
print(df)
sports = ['football', 'hockey']  # list of sports to filter on (renamed so it does not shadow the built-in list)
list_s = np.array(sports)
print(list_s)
#Filter rows from df which are in list_s
dff = df[df['Sport'].str.contains('|'.join(list_s))]
print(dff)
# dff is the filtered frame; this is working fine and I wish to retain it.
# Not clear hereafter:
# depending on which strings from the list appear in the 'Sport' column of dff, the respective strings (one or more) should go into the 'Key' column of dff
# e.g. the 2nd row should contain 'hockey,football' in column 'Key' of dff
# maybe some modification of dff["Key"] = dff.loc[(dff["Sport"].str.contains('|'.join(list_s)))] works
# df["topic"] = df.loc[(df["tags"].str.contains('|'.join(pattern), na=False)), True] = True
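For the sports example, a minimal sketch of one possible approach is given below. It assumes the column is named Key, uses the hypothetical names sports and pattern, and takes a .copy() of the filtered frame so that assigning to dff does not raise a SettingWithCopyWarning. The pandas .str methods still work through the rows internally, so this is mainly shorter than an explicit loop rather than a guaranteed speedup.

```python
import pandas as pd

data = {
    "Name": ["Tom", "Joseph", "Krish", "Mohan", "Ram"],
    "Age": [20, 21, 19, 18, 29],
    "Sport": ["football", "hockey football badminton", "cricket",
              "tennis football", "hockey cricket"],
}
df = pd.DataFrame(data).assign(Key="")

sports = ["football", "hockey"]   # sports to filter on
pattern = "|".join(sports)        # football|hockey

# Keep only rows mentioning at least one sport; copy so dff can be modified safely.
dff = df[df["Sport"].str.contains(pattern)].copy()

# findall gives the list of matched sports per row; str.join glues each list together.
dff["Key"] = dff["Sport"].str.findall(pattern).str.join(",")
print(dff)
```

With this data the second row of dff (Joseph) gets 'hockey,football', and the row for Krish is dropped by the filter.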