Tags: Python

The DataFrame df contains many rows and two columns: 'Key', which is blank, and 'News', which is populated.
DataFrame:
Key     News
        Tata Steel to declare results
        Tesla Plans Expansion as per press release and dividend
        Oracle to give results and announce buyback
        Bhart Airtel Meeting for results and dividend:

The keyword list is ['result', 'buyback', 'dividend'].

I have already filtered df from a larger DataFrame using this list, so every row of df contains at least one keyword from the list in its 'News' column.
Command used: df = df_large[df_large['News'].str.contains('|'.join(list))]
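If a keyword could ever contain a regex metacharacter (such as `.` or `+`), escaping each one before joining keeps the match literal, and `na=False` stops missing news values from breaking the boolean mask. A minimal sketch; the `df_large` rows below are hypothetical stand-ins, not the real data:

```python
import re

import pandas as pd

keywords = ['result', 'buyback', 'dividend']

# Hypothetical stand-in for the larger frame; the None row simulates missing news.
df_large = pd.DataFrame({
    'Key': '',
    'News': [
        'Tata Steel to declare results',
        'Quarterly sales update',
        None,
        'Oracle to give results and announce buyback',
    ],
})

# re.escape makes each keyword match literally inside the alternation pattern
pattern = '|'.join(map(re.escape, keywords))
# na=False maps missing values to False so the mask stays purely boolean
df = df_large[df_large['News'].str.contains(pattern, na=False)]
print(df)
```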

I want the 'Key' column in df to be populated with whichever keyword or keywords from the list appear in that row's 'News' column.
The resulting DataFrame should look like this:
Key                 News
result              Tata Steel to declare results
dividend            Tesla Plans Expansion as per press release and dividend
result, buyback     Oracle to give results and announce buyback
result, dividend    Bhart Airtel Meeting for results and dividend:
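One vectorized option (a sketch, not the only approach) is `Series.str.findall`, which returns each row's list of keyword matches in order of appearance; joining each list gives the Key column. The sample frame from the table above is rebuilt here, and `dict.fromkeys` is used (an assumption about the desired behaviour) to drop a keyword that occurs twice in the same headline while keeping first-seen order. Matching is case-sensitive; `flags=re.IGNORECASE` can be passed to `findall` if needed.

```python
import re

import pandas as pd

keywords = ['result', 'buyback', 'dividend']
df = pd.DataFrame({
    'Key': [''] * 4,
    'News': [
        'Tata Steel to declare results',
        'Tesla Plans Expansion as per press release and dividend',
        'Oracle to give results and announce buyback',
        'Bhart Airtel Meeting for results and dividend:',
    ],
})

pattern = '|'.join(map(re.escape, keywords))
# findall returns, per row, the list of keyword matches in order of appearance
matches = df['News'].str.findall(pattern)
# dict.fromkeys drops repeated keywords while keeping first-seen order
df['Key'] = matches.apply(lambda found: ', '.join(dict.fromkeys(found)))
print(df)
```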

Is iterating over the rows the only way? Even if it is, what is the most efficient way to do it?
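Explicit iteration is not required, but it is not necessarily slow either: for the default object-dtype strings, pandas' `.str` methods still perform Python-level string operations row by row, so a plain list comprehension with substring checks is a reasonable alternative. A sketch; note that the keywords come out in keyword-list order rather than text order, and timing both versions on the real data is the only reliable way to choose.

```python
import pandas as pd

keywords = ['result', 'buyback', 'dividend']
news = pd.Series([
    'Tata Steel to declare results',
    'Tesla Plans Expansion as per press release and dividend',
    'Oracle to give results and announce buyback',
    'Bhart Airtel Meeting for results and dividend:',
])

# For each headline, keep the keywords that occur as substrings, in keyword-list order
keys = [', '.join(k for k in keywords if k in text) for text in news]
df = pd.DataFrame({'Key': keys, 'News': news})
print(df)
```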

An analogous example with different data is posted below.

What I have tried:

import pandas as pd

data = {'Name': ['Tom', 'Joseph', 'Krish', 'Mohan', 'Ram'],
        'Age': [20, 21, 19, 18, 29],
        'Sport': ['football', 'hockey football badminton', 'cricket',
                  'tennis football', 'hockey cricket']}
df = pd.DataFrame(data)
df = df.assign(Key="")  # empty 'Key' column, to be filled in later
print(df)
sports = ['football', 'hockey']  # list of sports to filter on
print(sports)
# Keep only the rows of df whose 'Sport' contains at least one sport from the list.
# .copy() makes dff an independent frame, so assigning to dff['Key'] later
# does not trigger pandas' SettingWithCopyWarning.
dff = df[df['Sport'].str.contains('|'.join(sports))].copy()
print(dff)
# Filtering into dff works fine, and I want to keep it.
# Not clear from here on:
# for each row of dff, the sport(s) from the list that appear in its 'Sport' column
# should be written into its 'Key' column,
# e.g. the 2nd row should get 'hockey, football' in 'Key'.
# Maybe some modification of one of these attempts would work:
# dff["Key"] = dff.loc[(dff["Sport"].str.contains('|'.join(sports)))]
# df["topic"] = df.loc[(df["tags"].str.contains('|'.join(pattern), na=False)), True] = True
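Applied to the sports example, one possible way to fill `Key` is `Series.str.findall` with the same joined pattern, followed by `.str.join(', ')` to turn each row's list of matches into a single string. A sketch, reusing the data above and escaping the sport names as a precaution:

```python
import re

import pandas as pd

data = {'Name': ['Tom', 'Joseph', 'Krish', 'Mohan', 'Ram'],
        'Age': [20, 21, 19, 18, 29],
        'Sport': ['football', 'hockey football badminton', 'cricket',
                  'tennis football', 'hockey cricket']}
df = pd.DataFrame(data)

sports = ['football', 'hockey']
pattern = '|'.join(map(re.escape, sports))

# Filter as before, copying so the new column is assigned to an independent frame
dff = df[df['Sport'].str.contains(pattern)].copy()
# findall lists the matched sports per row (in text order); str.join flattens each list
dff['Key'] = dff['Sport'].str.findall(pattern).str.join(', ')
print(dff)
```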
Updated 16-Jan-22 10:55am

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


