I have implemented this code:

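import pandas as pd
import recordlinkage as rl
from recordlinkage.index import Full

# args is assumed to come from argparse (parsing not shown)
dfA = pd.read_csv(args.file, index_col="Full_url", sep=",", engine='c',
                  skipinitialspace=True, encoding='utf-8',
                  dtype={"City": object, "Country": object, "State": object,
                         "Email": object, "Identifier": object,
                         "Family": object, "Given": object, "Prefix": object,
                         "Suffix": object, "Phone": object})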

indexer = rl.Index()
indexer.add(Full())
candidate_links = indexer.index(dfA)
compare_cl = rl.Compare()

compare_cl.exact('Identifier', 'Identifier', label='Identifier')
compare_cl.string('City', 'City', method='jarowinkler', threshold=0.85, label='City')
compare_cl.string('Country', 'Country', method='jarowinkler', threshold=0.85, label='Country')
compare_cl.string('State', 'State', method='jarowinkler', threshold=0.85, label='State')
compare_cl.string('Email', 'Email', method='damerau_levenshtein', threshold=0.80, label='Email')
compare_cl.string('Family', 'Family', method='jarowinkler', threshold=0.80, label='Family')
compare_cl.string('Given', 'Given', method='jarowinkler', threshold=0.80, label='Given')
compare_cl.string('Prefix', 'Prefix', method='jarowinkler', threshold=0.80, label='Prefix')
compare_cl.string('Suffix', 'Suffix', method='jarowinkler', threshold=0.80, label='Suffix')
compare_cl.exact('Phone', 'Phone', label='Phone')

features = compare_cl.compute(candidate_links, dfA)

However, I have a problem: the 'Family' column contains a variable-length vector of names rather than a single name.
For example, a record could be: Family=Daniel||Alex||John||Felix
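
A minimal sketch, in plain Python, of turning a value like the one above into a list of names. The helper name `split_vector` is hypothetical. One pitfall worth noting: `re.split`, and pandas `Series.str.split` when the pattern is longer than one character and `regex` is not set to `False`, treat `||` as a regular expression, where `|` means alternation. The separator therefore has to be split literally or escaped. In pandas that would be `dfA['Family'].str.split('||', regex=False)` (pandas 1.4+) or `dfA['Family'].str.split(r'\|\|')`.

```python
import re

SEP = "||"

def split_vector(value, sep=SEP):
    """Split a 'Daniel||Alex||...' field into a list of trimmed names.

    Missing values (None or empty string) become an empty list.
    """
    if not value:
        return []
    # str.split treats sep literally, so '|' needs no escaping here
    return [part.strip() for part in value.split(sep) if part.strip()]

record = "Daniel||Alex||John||Felix"
print(split_vector(record))  # ['Daniel', 'Alex', 'John', 'Felix']

# Regex pitfall: an unescaped '||' is an alternation of empty patterns,
# so it splits between every character instead of on the separator.
print(re.split("||", "a||b"))

# Escaping the separator restores the intended split.
print(re.split(re.escape(SEP), record))  # ['Daniel', 'Alex', 'John', 'Felix']
```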

The items in a vector are always separated by the characters "||". Can I compare the 'Family' column as a vector? How do I specify the separator?
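
One way to compare the 'Family' column as a vector rather than as a single string, sketched in plain Python under these assumptions: names are compared case-insensitively, `jaccard_names` scores the overlap of exact names, and `fuzzy_names` tolerates spelling variants by matching each name to its most similar counterpart on the other side. Both function names are hypothetical. `difflib.SequenceMatcher` is swapped in for the Jaro-Winkler measure used in the code above only because it is in the standard library; its scores are not the same as Jaro-Winkler's.

```python
from difflib import SequenceMatcher

def _names(value, sep="||"):
    """Parse a 'Daniel||Alex||...' field into lower-cased, trimmed names."""
    if not value:
        return []
    return [p.strip().lower() for p in value.split(sep) if p.strip()]

def jaccard_names(a, b, sep="||"):
    """Share of distinct names the two vectors have in common (0.0 to 1.0)."""
    left, right = set(_names(a, sep)), set(_names(b, sep))
    if not left or not right:
        return 0.0  # treat a missing vector as no agreement
    return len(left & right) / len(left | right)

def _best_match_mean(src, dst):
    # For each name in src, keep its highest similarity to any name in dst,
    # then average those best scores.
    return sum(max(SequenceMatcher(None, s, d).ratio() for d in dst)
               for s in src) / len(src)

def fuzzy_names(a, b, sep="||"):
    """Symmetric fuzzy similarity between two name vectors (0.0 to 1.0)."""
    left, right = _names(a, sep), _names(b, sep)
    if not left or not right:
        return 0.0
    # Average both directions so the score does not depend on argument order
    return (_best_match_mean(left, right) + _best_match_mean(right, left)) / 2

a = "Daniel||Alex||John||Felix"
b = "Alex||Jon"
print(jaccard_names(a, b))  # 0.2
print(fuzzy_names(a, b))
```

To use this in the pipeline above, the `compare_cl.string('Family', ...)` line would be replaced by a custom comparison. Recent recordlinkage releases provide `Compare.compare_vectorized(comp_func, labels_left, labels_right, label=...)`, where `comp_func` receives the two aligned columns for the candidate pairs. Something like `compare_cl.compare_vectorized(lambda s1, s2: np.array([fuzzy_names(x, y) >= 0.80 for x, y in zip(s1, s2)], dtype=float), 'Family', 'Family', label='Family')` (with `import numpy as np`) should mirror the 0.80 threshold of the other string comparisons, but the exact signature should be checked against the documentation for the installed version.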


What I have tried:

I haven't tried anything yet because I can't find a viable solution.
Updated 14-Mar-19 1:44am

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
