Remove duplicates from datatable

Question

I have a CSV file containing several columns of data: one is a datetime and the rest are various values. Multiple rows share the same datetime, and within those rows only one of the other columns differs.

I need to end up with only one row for each datetime, with every column filled in. I have done this in Visual Studio with a data table, but would like to be able to do the same with something outside of VS.

What I have tried:

I've looked for Python applications but can't find a suitable data table method.

Posted 29-Apr-22 1:09am

Member 11109279

Updated 8-May-22 2:53am

Comments

Richard MacCutchan 29-Apr-22 7:56am

You can read the data into Excel and manipulate it there.

Maciej Los 29-Apr-22 14:49pm

Richard, why?
The OP should say which language they want to use.

Richard MacCutchan 30-Apr-22 3:29am

Why not? If it works then it is a solution.

Member 11109279 30-Apr-22 4:38am

The problem is that it works on my computer, but VS has so many requirements that it hardly ever works when I send it to somebody else, so I was looking for a solution other than VS, preferably Python.

Member 11109279 30-Apr-22 4:43am

We're talking about hundreds, if not thousands, of rows of data. The person responsible for doing this monthly is doing it that way and it's taking far too much time.

Richard MacCutchan 30-Apr-22 5:01am

Maybe you could have put that information in your original question. Remember the only information we have to work on is what you tell us.

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Richard MacCutchan · Answer 1 · 2022-04-29T22:49:00

Solution 1

You can manipulate Excel data with pandas or with openpyxl (see "A Guide to Excel Spreadsheets in Python With openpyxl – Real Python").

Maciej Los · Answer 2 · 2022-05-08T02:53:00

Your requirement is not quite clear. I'd suggest using pandas.DataFrame with these methods:
- read_csv
- drop_duplicates

For example:

Python

import pandas as pd

# Create the DataFrame
df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

# Remove duplicates, keeping the first row for each brand.
# drop_duplicates returns a new DataFrame; df itself is unchanged.
deduped = df.drop_duplicates(subset=['brand'])

# Before (df):
#      brand style  rating
# 0  Yum Yum   cup     4.0
# 1  Yum Yum   cup     4.0
# 2  Indomie   cup     3.5
# 3  Indomie  pack    15.0
# 4  Indomie  pack     5.0

# After (deduped):
#      brand style  rating
# 0  Yum Yum   cup     4.0
# 2  Indomie   cup     3.5
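
Note that drop_duplicates keeps one whole row per key and discards the others, so values that appear only in the discarded rows are lost. If, as the question describes, each duplicate row fills in a different column, grouping by the datetime and taking the first non-empty value per column may be closer to what is needed. This is a minimal sketch, not a tested solution for the actual file: the column names (datetime, temp, humidity, pressure) and the sample data are made up for illustration, and it assumes that empty cells in the CSV are read as missing values (NaN).

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file; the column names
# and values are assumptions, not taken from the question.
csv_text = """datetime,temp,humidity,pressure
2022-04-29 01:00,20.5,,
2022-04-29 01:00,,55,
2022-04-29 01:00,,,1013
2022-04-29 02:00,21.0,,
2022-04-29 02:00,,53,
"""

# Empty cells are read as NaN; the datetime column is parsed as timestamps.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

# For each datetime, take the first non-missing value found in every column.
# Columns with no value at all for a given datetime stay NaN.
merged = df.groupby("datetime", as_index=False).first()

print(merged)
```

With a real file, the path would be passed to read_csv in place of the io.StringIO object, and the result could be saved with merged.to_csv. If two rows with the same datetime hold different values in the same column, first() silently keeps only the earliest one, so that case would need to be checked separately.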