You are given a file called data.txt. Each line in this file contains a name and an email address separated by a comma. However, some of the lines are corrupted and might have more than one comma or might be missing an email or a name. Your task is to write a function that parses this file, filters out the corrupted lines, and returns a dictionary where names are keys and email addresses are values.

The output dictionary should have all the names in lowercase and sorted based on the names.

For example, if data.txt contains:

John Doe,
Malformed Data,anotherdata,
Bob Smith

Your function should return:

'alice': '',
'eve': '',
'john doe': ''

What I have tried:

def parse_data(filename):
    result = {}

    with open(filename, 'r') as file:
        lines = file.readlines()

    for line in lines:
        parts = line.strip().split(',')

    return result

This is all I got at the moment. The function parse_data is supposed to open a file and read its content line by line. Each line is split using the comma as a delimiter. I need to complete the function such that it:
1. Checks if the line is valid (i.e., it has exactly two parts: a name and an email).
2. Adds valid data to the result dictionary.
3. Sorts the dictionary based on names.

However, I am not sure where or with what to start in the for loop. May I please get some hints?
Updated 24-Jul-23 22:29pm

Start by creating a dictionary (see 5. Data Structures — Dictionary[^]) to hold your valid entries. Then read the file one line at a time, processing each line as required. You can use the split method of the string type (see Built-in Types — String[^]) to find lines that have more, or fewer, than two items. If the line is valid, add it to the dictionary. When you have processed all the lines, sort the dictionary and print it.
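As a rough sketch of those steps (not the only way to do it), the loop could look something like the following. It assumes that "valid" means exactly two non-empty fields after splitting on the comma, and it relies on dictionaries keeping insertion order (Python 3.7+) so that rebuilding the dictionary from sorted items gives a name-sorted result.

```python
def parse_data(filename):
    result = {}

    with open(filename, "r") as file:
        for line in file:
            # Split on the comma and trim whitespace around each field.
            parts = [part.strip() for part in line.split(",")]

            # Assumption: a valid line has exactly two non-empty fields.
            if len(parts) != 2 or not all(parts):
                continue

            name, email = parts
            result[name.lower()] = email

    # Rebuild the dictionary with its keys in alphabetical order.
    return dict(sorted(result.items()))
```

Calling parse_data("data.txt") then returns the filtered, sorted dictionary, which you can print or pass on as needed.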
Start with the documentation: Python String split() Method[^] and look at what it returns. You can use that to check how many parts it has.
You can then check if the first is not blank, and the second is a valid email address: Valid email address format[^]
Create a Dictionary outside the loop, and add only valid name / address pairs to it.
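To illustrate the two checks, here is a small helper you could call on the list that split() returns. The name is_valid_pair is made up for this example, and the regular expression is a deliberately loose assumption (something, an @, something, a dot, something), not a full implementation of the email format rules.

```python
import re

# Loose pattern: text, "@", text, ".", text, with no spaces or extra "@".
# This is an assumption for illustration, not a complete email validator.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_pair(parts):
    """Return True if parts is a [name, email] pair worth keeping."""
    if len(parts) != 2:
        return False

    name, email = parts[0].strip(), parts[1].strip()
    if not name:
        return False

    return EMAIL_PATTERN.fullmatch(email) is not None
```

Inside the loop you would then write something like: if is_valid_pair(parts): add the pair to the dictionary.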

If you are having problems getting started at all, then this may help: How to Write Code to Solve a Problem, A Beginner's Guide[^]
Here's a high-level approach. I don't do Python, but the concept will be the same in any language.

Split at the at sign (@). Make sure you get two values. The right-hand side is the email domain.
Split the left-hand value on the comma (,). Make sure there are two values. The first value is the person's name, the second value is the email name (minus the domain).
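For what it's worth, that idea might translate to Python roughly as below. The function name split_record is hypothetical, and rejecting a domain that contains a comma is my own assumption, added so that lines with trailing extra fields are not accepted.

```python
def split_record(line):
    """Return (name, email) for a well-formed line, or None otherwise."""
    # Step 1: split at the at sign; expect exactly two pieces.
    halves = line.strip().split("@")
    if len(halves) != 2:
        return None

    left, domain = halves[0], halves[1].strip()
    # Assumption: an empty domain or one containing a comma is corrupted.
    if not domain or "," in domain:
        return None

    # Step 2: split the left-hand side on the comma; expect name and local part.
    fields = left.split(",")
    if len(fields) != 2:
        return None

    name, local = fields[0].strip(), fields[1].strip()
    if not name or not local:
        return None

    return name, f"{local}@{domain}"
```

Any line for which this returns None can simply be skipped in the loop.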
To catch malformed entries, put your split within a try/except block:
try: additional processing... except: rejected records...

Once you've split the record, ensure it meets the following basic validation criteria:
- it contains 2 non-null members, and the 2nd member contains an 'at' character (@)
Once that's done, add it to your dictionary, for example:
mydictionary[name.strip().lower()] = emailaddr.strip().lower()
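One possible way to put that together is sketched below. It leans on the fact that unpacking the result of split(",") into two names raises ValueError when there are not exactly two parts. The function name load_entries and the rejected list are illustrative choices, not something from the original question.

```python
def load_entries(lines):
    """Return (entries, rejected) built from an iterable of text lines."""
    entries = {}
    rejected = []

    for line in lines:
        try:
            # Raises ValueError unless the split yields exactly two parts.
            name, emailaddr = line.split(",")
            name = name.strip().lower()
            emailaddr = emailaddr.strip().lower()

            # Basic validation: non-empty name and an '@' in the address.
            if not name or "@" not in emailaddr:
                raise ValueError("missing name or '@' in email")
        except ValueError:
            rejected.append(line)
        else:
            entries[name] = emailaddr

    return entries, rejected
```

Since an open file can be iterated line by line, you could pass the file object straight in, and keeping the rejected lines makes it easy to see what was filtered out.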
Richard MacCutchan 25-Jul-23 4:00am    
The '@' (at sign) character is not an ampersand, that is represented by '&'.
Member 10601191 25-Jul-23 8:32am    
thks, early morning for me, have amended.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
