Hi,
I have a list of strings. I want to find index of duplicates and remove them.
How can I do this?

What I have tried:

List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
var indexOf = myList.Select((value, index) => new { value, index }).GroupBy(g => g.value).Where(pair => pair.Count() > 1).Select(pair => pair.index);
Updated 24-Aug-21 14:25pm
Comments
Richard MacCutchan 24-Aug-21 2:49am
   
And? What happens when you run it?
Alex Dunlop 24-Aug-21 2:56am
   
The last part is wrong (pair.index).
BillWoodruff 25-Aug-21 2:52am
   
Question up-voted to show appreciation for this poster using the appropriate folder, and making a good effort towards solving the problem.

Alex, the behavior of GroupBy, and dealing with anonymous types, are advanced topics. Each IGrouping in the IEnumerable produced by GroupBy is itself an IEnumerable of the grouped items (plus a Key), so it has no index property of its own; you have to enumerate the group to reach the index of each item in it.
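
To make that concrete, here is one possible way to repair the query from the question. It is only a sketch, not the poster's method: it assumes C# 9 top-level statements, and the name duplicateIndexes is made up for illustration. Each group is enumerated, its first item is skipped, and the index of every remaining item is collected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };

// Pair each value with its position, group by value, then keep the
// positions of every occurrence after the first one in each group.
List<int> duplicateIndexes = myList
    .Select((value, index) => new { value, index })
    .GroupBy(item => item.value)
    .SelectMany(group => group.Skip(1))   // skip the first occurrence in each group
    .Select(item => item.index)
    .OrderBy(index => index)
    .ToList();

Console.WriteLine(string.Join(", ", duplicateIndexes)); // 3, 6

// Remove from the highest index down so earlier removals do not shift later ones.
for (int i = duplicateIndexes.Count - 1; i >= 0; i--)
{
    myList.RemoveAt(duplicateIndexes[i]);
}

Console.WriteLine(string.Join(", ", myList)); // txt1, txt2, txt3, txt4, txt5
```

Sorting the indexes before removing is a deliberate choice here: the groups come out in order of first appearance, so the collected indexes are not guaranteed to be ascending once values interleave.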
BillWoodruff 25-Aug-21 4:57am
   
If you want to see how a Zen master solves a problem like this: see Richard Deeming's comment on my post below.

Try a Google search for "linq remove duplicates".

[edit]
This will produce a list with the name and index, so you could maybe build on that:
C#
List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
var unique = myList.Select((value, index) => new { Value = value, Index = index });
foreach (var x in unique)
{
    Console.Write($"{x.Value}: {x.Index}, ");
}
Console.WriteLine("\n");


[/edit]
   
Comments
Alex Dunlop 24-Aug-21 2:58am
   
Thanks. I know that Distinct() can produce a list of unique values, but I want to find the indexes of the duplicates.
Richard MacCutchan 24-Aug-21 3:18am
   
There does not seem to be a simple solution. You can use IndexOf, FindIndex etc., but I am not sure they will do what you want.
Richard MacCutchan 24-Aug-21 4:00am
   
See my update.
Maciej Los 24-Aug-21 4:07am
   
5ed!
BillWoodruff 24-Aug-21 17:43pm
   
I don't see the value of this: you create an IEnumerable of anonymous Types that replicates the structure of the List.

But, a for loop index is an easier way to get the index.

The one thing I see in this that could be exploited is the fact that each instance of the anonymous Type is unique.

imho, the OP's real problem is a lack of understanding of GroupBy and anonymous Types, and the fact that string instances with identical content, but different indexes in the list, will all be "equal" if compared. Yes: FindIndex, with the right predicate, can be used in a solution.
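
As a rough illustration of the FindIndex idea (a sketch only, assuming C# 9 top-level statements; the extra "txt4" at the end of the list and the name duplicateIndexes are added here for illustration): an element is a duplicate when the first position holding an equal string comes before the element's own position.

```csharp
using System;
using System.Collections.Generic;

// Sample data: the list from the question plus one extra "txt4" (added here
// for illustration) so that one value occurs three times.
var myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4", "txt4" };

var duplicateIndexes = new List<int>();
for (int i = 0; i < myList.Count; i++)
{
    string current = myList[i];

    // FindIndex returns the position of the first element matching the predicate.
    int firstIndex = myList.FindIndex(s => s == current);

    // If an equal string already appeared earlier, position i holds a duplicate.
    if (firstIndex < i)
    {
        duplicateIndexes.Add(i);
    }
}

Console.WriteLine(string.Join(", ", duplicateIndexes)); // 3, 6, 7
```

Because strings with identical content compare as equal, the predicate s == current matches every copy, and FindIndex always reports the earliest one.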
Richard MacCutchan 25-Aug-21 3:40am
   
Hence my comment above the code.
See here: Enumerable.Distinct Method (System.Linq) | Microsoft Docs
C#
List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
var unique = myList.Distinct();
   
Quote:
I want to find index of duplicates and remove them


Well, if you want to remove the duplicates from the original list, take a look at the code below and read the comments:

C#
List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4" };
// get each unique value together with the index of its first occurrence
List<Tuple<string, int>> uniquelist = myList.Distinct()
	.Select(x => new Tuple<string, int>(x, myList.IndexOf(x)))
	.ToList();
// find the indexes of the duplicates: every index that is not a first occurrence
List<int> indOfDup = Enumerable.Range(0, myList.Count)
	.Where(x => !uniquelist.Any(y => y.Item2 == x))
	.ToList();
// remove the duplicates from the original list, highest index first,
// so that the indexes still to be removed stay valid
for (int i = indOfDup.Count - 1; i >= 0; i--)
{
	myList.RemoveAt(indOfDup[i]);
}

// done!
// the original list now contains no duplicates ;)


Good luck!
   
Comments
Richard MacCutchan 24-Aug-21 6:57am
   
+5. I had a feeling it could not be done in a single step.
Maciej Los 24-Aug-21 7:03am
   
Thank you, Richard.
BillWoodruff 24-Aug-21 20:17pm
   
+5 nice ... even though I think you are doing it the hard way :)
Maciej Los 25-Aug-21 0:19am
   
Thank you, Bill.
George Swan 25-Aug-21 11:31am
   
Does this work when there are more than 2 identical matching values?
Maciej Los 25-Aug-21 14:58pm
   
The easiest way to find out is to try it ;)

George, did I tell you what my last name means? "Łoś" ("Los" without the Polish diacritics) means moose. Swan and Moose. Funny, I thought. :)
George Swan 25-Aug-21 15:45pm
   
I have tried it and it did not work for me but I am a bird of very little brain so I could be wrong. When I added another 'txt4' to the end of your list only one duplicate instance was removed and two were left.
Maciej Los 25-Aug-21 23:47pm
   
Well, I have tested it and it's working fine, unless I missed the fact that there's more than one instance of the text.
Hey! Do not say such things. You're a very smart person. Please forgive me. It wasn't my intention to hurt you.
George Swan 26-Aug-21 2:27am
   
Thanks Maciej. On another point, I am a big fan of value tuples, they can be used in your example to simplify it and avoid the need to 'new up' objects which may save some time. Best wishes, George.
// use named values
List<(string value, int index)> uniquelist = myList.Distinct()
    // simply declare the tuple rather than instantiate it
    .Select(x => (x, myList.IndexOf(x)))
    .ToList();
List<int> indOfDup = Enumerable.Range(0, myList.Count)
    // reference the index by name rather than 'Item2'
    .Where(x => !uniquelist.Any(y => y.index == x))
    .ToList();
Maciej Los 26-Aug-21 2:57am
   
Good point! Value tuples are very useful. I prefer to use explicitly declared tuples.

BTW:
I've created a .NET Fiddle with an extra "txt4" at the end to prove that my code is working fine. Please take a look at: RemoveDuplicatesFromOriginalList | C# Online Compiler | .NET Fiddle

All the best, George!
Cheers!
Maciej
George Swan 26-Aug-21 4:16am
   
Maciej, you are quite correct, +5. In my code I added 'text4' instead of 'txt4' to the list. Please accept my apologies for a stupid mistake.
Maciej Los 26-Aug-21 5:02am
   
Thank you, George.
:)
A simpler way to find the duplicate indexes that also handles more than one duplicate of the same entry:
C#
List<int> toremovestrs = new List<int>();

for (int i = 0; i < myList.Count; i++)
{
    int first = myList.IndexOf(myList[i]);
    string firststr = myList[first];

    int last = myList.LastIndexOf(myList[i]);

    if (first < last)
    {
        for (int j = first + 1; j <= last; j++)
        {
            if (myList[j] == firststr && ! toremovestrs.Contains(j))
            {
                toremovestrs.Add(j);
            }
        }
    }
}
Test:
List<string> myList = new List<string> { "txt1", "txt2", "txt3", "txt1", "txt4", "txt5", "txt4", "txt2", "txt3", "txt4","txt10", "txt10" };

// result
[0]: 3
[1]: 7
[2]: 8
[3]: 6
[4]: 9
[5]: 11
   
Comments
Maciej Los 25-Aug-21 0:20am
   
5ed!
Richard Deeming 25-Aug-21 4:14am
   
I can see some room for improvement there! :)

Eg:
for (int i = myList.Count - 1; i > 0 /* No need to check the first element */; i--)
{
    int index = myList.IndexOf(myList[i]);
    if (index < i) // This is not the first occurrence of the element
    {
        myList.RemoveAt(i);
    }
}
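
One more variation, offered only as a sketch: IndexOf rescans the list for every element, which can get slow on long lists. A HashSet<string> can remember the values already seen, so a single pass is enough to find the indexes. This assumes C# 9 top-level statements; the names seen and duplicateIndexes are invented for illustration, and the data is the test list from the post above.

```csharp
using System;
using System.Collections.Generic;

var myList = new List<string>
{
    "txt1", "txt2", "txt3", "txt1", "txt4", "txt5",
    "txt4", "txt2", "txt3", "txt4", "txt10", "txt10"
};

var seen = new HashSet<string>();
var duplicateIndexes = new List<int>();

for (int i = 0; i < myList.Count; i++)
{
    // HashSet<T>.Add returns false when the value is already in the set,
    // which means this position holds a repeat of an earlier element.
    if (!seen.Add(myList[i]))
    {
        duplicateIndexes.Add(i);
    }
}

Console.WriteLine(string.Join(", ", duplicateIndexes)); // 3, 6, 7, 8, 9, 11

// The indexes were collected in ascending order, so walk them backwards
// to keep the positions still to be removed valid.
for (int i = duplicateIndexes.Count - 1; i >= 0; i--)
{
    myList.RemoveAt(duplicateIndexes[i]);
}

Console.WriteLine(string.Join(", ", myList)); // txt1, txt2, txt3, txt4, txt5, txt10
```

The trade-off is a little extra memory for the set in exchange for not searching the list again for each element.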
BillWoodruff 25-Aug-21 4:55am
   
Wonderful ! I hear the sound of one hand clapping :)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



