Hi there

I have a file with a lot of sentences. I need to build a dictionary of the words from that file. So far I've separated the words and sorted them using the Split() and Sort() methods. My problem is producing a list without duplicate words. How can I do that?
C#
static int n = 0;

public static string[] NoDuplicate(string[] array)
{
    int i;
    string[] res = (string[])array.Clone();
    for (i = 0; i < array.Length - 1; i++)
    {
        if (array[i + 1] != array[i])
            res[n++] = (string)array[i];
    }
    return res;
}


1) How can I make this neater?
2) I don't like that method because it is initialized using Clone() and the resulting length is too big.

Many thanks
Posted
Updated 30-Nov-11 21:53pm
Comments
[no name] 1-Dec-11 3:54am    
EDIT: added "code" tag

Take a look at the HashSet&lt;String&gt; class (.NET 3.5 and later). It provides an optimised hash collection that doesn't allow duplicates (it just ignores attempts to add them), and you can call ToArray() when you are done with it if you really need a string array.
 
Comments
Andrew Rissing 9-Jun-10 13:59pm    
If you don't have .NET 3.5, you can also use Dictionary&lt;string, object&gt; and just set the value to null in all cases, as an alternative to HashSet.
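A minimal sketch of that Dictionary&lt;string, object&gt; workaround (the DedupHelper/DedupWithDictionary names are mine, not from the thread):

```csharp
using System.Collections.Generic;

public static class DedupHelper
{
    // Works on .NET 2.0+: the dictionary keys act as the set,
    // and the values are simply ignored (always null).
    public static string[] DedupWithDictionary(string[] words)
    {
        Dictionary<string, object> seen = new Dictionary<string, object>();
        foreach (string word in words)
        {
            if (!seen.ContainsKey(word))
                seen[word] = null;
        }
        string[] result = new string[seen.Count];
        seen.Keys.CopyTo(result, 0);
        return result;
    }
}
```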
Xeqtr 27-Jun-10 19:07pm    
Reason for my vote of 5
Useful thing
Dalek Dave 27-Aug-10 5:02am    
Agreed.
Here is your code using HashSet:
public static string[] RemoveDuplicates(string[] s)
{
    HashSet<string> set = new HashSet<string>(s);
    string[] result = new string[set.Count];
    set.CopyTo(result);
    return result;
}
Comments
Laxman Auti 30-Jun-10 6:34am    
Reason for my vote of 5
The method provided is very efficient compared to those provided in the other answers.
Dalek Dave 27-Aug-10 5:02am    
Good call.
Laurence1234 27-Jan-11 8:00am    
Can anyone give a more formal proof of the time complexity?
You can also filter the input array without defining a predicate.

If the input array is not null, the method allocates a List&lt;string&gt;.

The allocated collection is filled iteratively, adding each element only if it has not been previously inserted (note that List&lt;string&gt;.Contains is itself a linear scan, so the loop is O(n²) overall, not linear).

Finally, the collection is sorted and converted to string[].

public static string[] NoDuplicate(string[] inputSentences)
{
    if (inputSentences == null)
        return inputSentences;

    List<string> sentences = new List<string>();

    foreach (string inputSentence in inputSentences)
    {
        // Skip anything we've already collected
        if (sentences.Contains(inputSentence))
            continue;

        sentences.Add(inputSentence);
    }

    sentences.Sort();
    return sentences.ToArray();
}
 
Comments
Laxman Auti 30-Jun-10 6:29am    
If the size of inputSentences is large, the time taken to remove duplicates grows significantly.
Alessandro Cislaghi 30-Jun-10 6:49am    
Removing duplicate items is not part of the solution shown.

The 'sentences' list is built avoiding duplications: each 'inputSentence' item is added only if it has not been previously inserted.
You can use List<string>.RemoveAll(Predicate<string> match)
Try this (Microsoft documents List&lt;T&gt;.RemoveAll as O(n)):

private static string PreviousItem;

private static bool Match(string item)
{
    bool result = (item == PreviousItem);
    PreviousItem = item;
    return result;
}

public static string[] NoDuplicates(string[] input)
{
    PreviousItem = null;
    List<string> result = new List<string>(input);
    result.Sort();
    result.RemoveAll(Match);
    return result.ToArray();
}
 
Comments
Andrew Rissing 9-Jun-10 13:58pm    
The check for the predicate is O(n), but the sort is at least O(n*Ln(n)). So, you can only claim O(n*Ln(n)) for speed.
I don't think LINQ was available at that time, but now it's quite easy to do with LINQ.

C#
string[] str = new string[] { "Hiren", "Solanki", "Hiren" };
List<string> lst = str.ToList();
lst = lst.Distinct().ToList();
str = lst.ToArray();
 
Comments
Nish Nishant 27-Jan-11 9:53am    
Why reactivate a 2 year old thread?
Nish Nishant 27-Jan-11 12:33pm    
BTW the 1 vote was not me :-)
Hiren solanki 28-Jan-11 0:29am    
I just saw that it still isn't accepted. BTW, don't worry about the 1 vote; my questions have been intentionally downvoted by some platinum member. I'm not worrying about votes now.
The simple approach I'll suggest: convert the array to a List, use list.Contains(key) to check whether the key is already present, and only then add it to the dictionary.
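A minimal sketch of that List + Contains approach (the ListDedup helper name is mine; the dictionary values are insertion indices, since the answer doesn't say what the values should be):

```csharp
using System.Collections.Generic;

public static class ListDedup
{
    // Builds a dictionary of unique words, guarding each insertion
    // with List.Contains. Note: Contains is a linear scan, so this
    // approach is O(n^2) overall, as the comments below point out.
    public static Dictionary<string, int> BuildDictionary(string[] words)
    {
        List<string> unique = new List<string>();
        Dictionary<string, int> dictionary = new Dictionary<string, int>();
        foreach (string word in words)
        {
            if (!unique.Contains(word))
            {
                unique.Add(word);
                dictionary.Add(word, unique.Count - 1);
            }
        }
        return dictionary;
    }
}
```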

Cheers...
 
Comments
Toli Cuturicu 26-Aug-10 8:50am    
Reason for my vote of 3
very inefficient
Hello! Why don't you use a Hashtable, which has Key and Value properties?
Using a Hashtable will solve your problem.
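A minimal sketch of that Hashtable approach (the helper names are mine, not from the answer):

```csharp
using System.Collections;

public static class HashtableDedup
{
    public static string[] Dedup(string[] words)
    {
        Hashtable table = new Hashtable();
        foreach (string word in words)
        {
            // Assigning to an existing key just overwrites its value,
            // so duplicate words collapse automatically.
            table[word] = true;
        }
        string[] result = new string[table.Count];
        table.Keys.CopyTo(result, 0);
        return result;
    }
}
```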


Regards ,
Hemanth Kumar VLN
 
Comments
Christian Graus 27-Aug-10 5:03am    
This question is two years old (not sure how that is possible)! Why are you adding to a question that's been well and truly answered?
This is a method to remove the duplicates and then sort the result in the neatest way possible :)

C#
public string[] FilterAndSort(string[] array)
{
    List<string> retList = new List<string>();
    foreach (string item in array)
    {
        if (!retList.Contains(item))
        {
            retList.Add(item);
        }
    }
    retList.Sort();
    return retList.ToArray();
}
 
Comments
Dimitri Witkowski 9-Jun-10 13:36pm    
Its performance will be O(n²); this is the worst way to achieve the goal.
Toli Cuturicu 26-Aug-10 8:49am    
Reason for my vote of 3
poor performance
If it were me, I wouldn't use an array. Instead, I'd use a List. Arrays can't grow/shrink to contain their content (without some manual manipulation).

However, since your spec is currently for an array, here's an extension method that you can use to add unique strings to your array. It assumes that your array is already allocated to the appropriate size, but attempts a sanity check before adding the item.

C#
public static class ExtensionMethods
{
    public static bool AddUnique(this string[] collection, string text, bool caseSensitive)
    {
        bool added = false;
        var count = (from item in collection
                     where item != null &&
                           ((caseSensitive) ? item == text : item.ToLower() == text.ToLower())
                     select item).Count();
        // Sanity check: find the first unused (null) slot;
        // the array must be pre-allocated large enough.
        int slot = Array.IndexOf(collection, null);
        if (count == 0 && slot >= 0)
        {
            collection[slot] = text;
            added = true;
        }
        return added;
    }
}
 
Comments
Nish Nishant 27-Jan-11 9:52am    
Wow, what's with everyone answering a 2 year old thread?
#realJSOP 27-Jan-11 12:22pm    
I didn't notice the date; it was at the top of the list so I looked at it. I think I suggested last year that questions be locked against answering/editing after they got to be a certain age so this kind of thing wouldn't happen.
Nish Nishant 27-Jan-11 12:34pm    
Yeah it wasn't you, someone else replied to it bringing it to the top!
thatraja 27-Jan-11 21:19pm    
I remember this movie & dialogue
"Program Alice Activated" - Resident Evil: Apocalypse (2004)
LOL :):):)
'Collection' is the answer to your question.

You can use any keyed collection, such as HashSet, Dictionary, etc. On finding each word, insert it as a key. When a duplicate word is read, the existing key is simply replaced (or the add is ignored), so duplicates are avoided automatically.

Beyond this, you can count the number of occurrences during the same operation.
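A minimal sketch of counting occurrences while collecting the unique words (the WordCounter name is mine, not from the answer):

```csharp
using System.Collections.Generic;

public static class WordCounter
{
    // The dictionary keys end up being the set of unique words;
    // the values count how many times each word occurred.
    public static Dictionary<string, int> CountWords(IEnumerable<string> words)
    {
        Dictionary<string, int> counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            int current;
            counts.TryGetValue(word, out current); // leaves 0 if absent
            counts[word] = current + 1;
        }
        return counts;
    }
}
```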
 
Your solution is already close to the right one. You just don't have to clone the array; edit it in place.

C#
int i, j;

// Remove duplicates from the sorted array, by shifting the elements
for (i = 1, j = 1; i < array.Length; i++)
{
    if (array[i] != array[i - 1])
    {
        // Different element, keep it
        array[j] = array[i];
        j++;
    }
}


In the end, j contains the number of valid entries in array.

Simple and effective. You can resize the array but this involves a copy.
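If you do want to shrink the array to the j valid entries afterwards, Array.Resize performs that copy for you (a small sketch; the sample values assume the compaction loop above has already run):

```csharp
using System;

// A sorted array after in-place compaction: the first j entries
// are the unique values, the tail is leftover garbage.
string[] array = { "apple", "banana", "cherry", "cherry" };
int j = 3; // number of valid entries produced by the loop

// Array.Resize allocates a new array of length j and copies the
// first j elements over, then rebinds the reference.
Array.Resize(ref array, j);
```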

[I am not questioning the approach. Creating the array, sorting then purging will do the trick. This will take time proportional to N.Log(N).L for N words of L characters (on average) and require storage of N.L characters.

Using a hash table as some suggested will tend to reduce the time to N.L and the storage to N'.L, where N' denotes the number of distinct words. But there will be some overhead on time and space.

Determining the best approach would take some comparative experimentation.]
 
Please check this; I haven't tested it.

string[] Testing(string[] stringArray)
{
    List<string> stringList = new List<string>();
    foreach (string str in stringArray)
    {
        if (!stringList.Contains(str))
        {
            stringList.Add(str);
        }
    }
    return stringList.ToArray();
}
 
Comments
raju melveetilpurayil 25-Jul-10 12:24pm    
Reason for my vote of 1
The function's return type is string[], but the code returns a string.
Toli Cuturicu 26-Aug-10 8:51am    
Reason for my vote of 1
Does not even compile. What about making some sense?
If all your words are in an array, you can get the distinct words directly:
string[] distinctArray = myarray.Select(word => word.Trim()).Distinct().ToArray();
 
Comments
Rajesh Anuhya 21-Oct-10 4:14am    
This question was posted in 2008; why are you answering it now?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


