
Dictionary Sorting and Inverse Duplication Removal

CorvetteGuru, 10 Sep 2012
A few fun Dictionary utilities.

Introduction

I have been using Dictionary data containers since VB6. As neat as they are, they have some limits, and my latest attempts exposed them in a big, ugly way.

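The biggest? Changing the dictionary. Change it while you are looping over it with For Each and the keys can no longer be enumerated: the enumerator is invalidated and the next step of the loop throws an InvalidOperationException. Toss in the lack of a built-in sort and all sorts of sad things happen.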

Let us say that you have this set of data:

AAA - VVV

BBB - XXX

CCC - WWW

VVV - AAA

We want to remove the VVV - AAA row, because it is simply the inverse of the first one. Storing the pairs in the values of a dictionary, keyed by a running index, supports many-to-many relationships; using the left-hand item as the dictionary key would throw an error as soon as that item appears twice.
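To make the key-versus-value point concrete, here is a minimal sketch. It assumes one extra, hypothetical row (AAA - XXX) so that AAA has two partners; the module and function names are illustrative only. Dictionary.Add throws an ArgumentException when the key already exists, while an index-keyed dictionary of "left:right" strings accepts every row.

```vbnet
Imports System
Imports System.Collections.Generic

Module IndexedPairsDemo

    ' Stores each pair as a "left:right" string under a running index key,
    ' so the same left-hand item may appear any number of times.
    Function BuildIndexedPairs(pairs As List(Of String())) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)
        Dim index As Integer = 0
        For Each pair As String() In pairs
            index += 1
            result.Add(index.ToString(), pair(0) & ":" & pair(1))
        Next
        Return result
    End Function

    ' Returns True if keying a dictionary by the left-hand item fails:
    ' Dictionary.Add throws ArgumentException when the key already exists.
    Function KeyedAddThrows(pairs As List(Of String())) As Boolean
        Dim keyed As New Dictionary(Of String, String)
        Try
            For Each pair As String() In pairs
                keyed.Add(pair(0), pair(1))
            Next
        Catch ex As ArgumentException
            Return True
        End Try
        Return False
    End Function

    Sub Main()
        ' The article's sample rows plus a hypothetical extra row, AAA - XXX
        Dim pairs As New List(Of String()) From {
            New String() {"AAA", "VVV"},
            New String() {"BBB", "XXX"},
            New String() {"CCC", "WWW"},
            New String() {"VVV", "AAA"},
            New String() {"AAA", "XXX"}
        }
        Console.WriteLine(KeyedAddThrows(pairs))           ' True
        Console.WriteLine(BuildIndexedPairs(pairs).Count)  ' 5
    End Sub

End Module
```

The index key carries no meaning of its own; all of the pair information lives in the value string, which is what the value-based routine later in the article manipulates.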

Background

My project is built around deduplication. There are many sub-tables, holding things like phone numbers and email addresses, that are very easy to match up, so I can get hard matches without a lot of work. The problem? Sorting them and INVERSE MATCHES!!!

Using the code

This code is simple to use. There are three separate routines: one sorts, and the other two remove inverse duplicates. The first of those looks for inverses within the values; the second compares each value against the keys already kept.

The first one is a basic sort of a dictionary with a string key. It copies the keys into a list, sorts the list, and rebuilds a new dictionary in that order:

Public Function SortDictionaryKeyString(Unsorted As Dictionary(Of String, String)) As Dictionary(Of String, String)

    ' Copy the keys into a list and sort them
    ' (List.Sort uses the default, culture-sensitive string comparison)
    Dim Working As New List(Of String)(Unsorted.Keys)
    Working.Sort()

    ' Rebuild a new dictionary, adding the entries in sorted key order.
    ' Every key came from Unsorted, so no ContainsKey check is needed.
    Dim Sorted As New Dictionary(Of String, String)
    For Each Key As String In Working
        Sorted.Add(Key, Unsorted(Key))
    Next

    Return Sorted

End Function
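One caveat: the .NET documentation describes the enumeration order of a Dictionary(Of TKey, TValue) as undefined. A freshly rebuilt dictionary usually enumerates in insertion order, but that is not guaranteed, and it stops being sorted the moment you add to it. If the data has to stay sorted while it changes, a SortedDictionary(Of TKey, TValue) may fit better. This is a sketch of that swapped-in approach, not the routine above; the module and function names are hypothetical, and StringComparer.Ordinal is an illustrative choice of ordering.

```vbnet
Imports System
Imports System.Collections.Generic

Module SortedDictionarySketch

    ' Copies the entries into a SortedDictionary, which keeps its keys ordered
    ' by the comparer on every Add or Remove, not just once at build time.
    Function ToSortedByKey(unsorted As Dictionary(Of String, String)) As SortedDictionary(Of String, String)
        ' StringComparer.Ordinal gives a culture-independent ordering (an assumption)
        Return New SortedDictionary(Of String, String)(unsorted, StringComparer.Ordinal)
    End Function

    Sub Main()
        Dim data As New Dictionary(Of String, String) From {
            {"CCC", "WWW"}, {"AAA", "VVV"}, {"VVV", "AAA"}, {"BBB", "XXX"}
        }
        Dim sorted As SortedDictionary(Of String, String) = ToSortedByKey(data)
        sorted.Add("ABC", "YYY")   ' hypothetical new row; lands between AAA and BBB
        For Each pair As KeyValuePair(Of String, String) In sorted
            Console.WriteLine(pair.Key & " - " & pair.Value)
        Next
    End Sub

End Module
```

The trade-off is that SortedDictionary lookups are O(log n) rather than the near O(1) of a Dictionary, in exchange for an order that survives later changes.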

The next one is real clever. With many-to-many relationships the key has to be a running index, so each pair is stored in the value as one string with a colon separator (for example AAA:VVV). Splitting that string and swapping the halves gives its inverse (VVV:AAA), which can then be looked up among the values already kept.

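Public Function DeDupeDictionaryValues(ByVal Dupe As Dictionary(Of String, String)) As Dictionary(Of String, String)

    ' Each value holds one pair as "left:right"; the key is just a running index
    Dim Result As New Dictionary(Of String, String)
    Dim iIdx As Integer = 0

    For Each KeyPair As KeyValuePair(Of String, String) In Dupe
        Dim sValue As String = KeyPair.Value
        ' Assumes exactly one colon per value, so the split yields {left, right}
        Dim sSplit() As String = sValue.Split(":"c)
        Dim sTemp As String = sSplit(1) & ":" & sSplit(0)
        ' Keep the pair only if its inverse has not already been kept
        If Not Result.ContainsValue(sTemp) Then
            iIdx += 1
            ' The key type is String, so convert the index explicitly
            ' (an implicit Long-to-String conversion fails under Option Strict On)
            Result.Add(iIdx.ToString(), sValue)
        End If
    Next

    Return Result

End Function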
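DeDupeDictionaryValues calls ContainsValue once per row, and ContainsValue is a linear scan of the dictionary, so the routine is O(n²) on large tables. A minimal sketch of one way around that, assuming the same "left:right" value format: remember each kept value in a HashSet(Of String), whose lookups are close to O(1). The names are hypothetical, and unlike the routine above it keeps (rather than fails on) any value without exactly one colon.

```vbnet
Imports System
Imports System.Collections.Generic

Module HashSetDedupe

    ' Hypothetical alternative to DeDupeDictionaryValues: keeps the first
    ' "left:right" value of each inverse pair, tracking kept values in a
    ' HashSet so each inverse lookup avoids a linear ContainsValue scan.
    Function DeDupeValuesWithSet(dupe As Dictionary(Of String, String)) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)
        Dim kept As New HashSet(Of String)
        Dim index As Integer = 0

        For Each pair As KeyValuePair(Of String, String) In dupe
            Dim parts As String() = pair.Value.Split(":"c)
            ' Skip the row only when it splits cleanly and its inverse was kept
            If parts.Length = 2 AndAlso kept.Contains(parts(1) & ":" & parts(0)) Then
                Continue For
            End If
            kept.Add(pair.Value)
            index += 1
            result.Add(index.ToString(), pair.Value)
        Next

        Return result
    End Function

    Sub Main()
        Dim indexed As New Dictionary(Of String, String) From {
            {"1", "AAA:VVV"}, {"2", "BBB:XXX"}, {"3", "CCC:WWW"}, {"4", "VVV:AAA"}
        }
        Dim result As Dictionary(Of String, String) = DeDupeValuesWithSet(indexed)
        Console.WriteLine(result.Count)                     ' 3
        Console.WriteLine(result.ContainsValue("VVV:AAA"))  ' False
    End Sub

End Module
```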

The last one removes inverse duplicates from a plain Dictionary(Of String, String) by comparing each value against the keys already kept:

Public Function DeDupeDictionary(ByVal Dupe As Dictionary(Of String, String)) As Dictionary(Of String, String)

    Dim Result As New Dictionary(Of String, String)
    Dim sExisting As String = Nothing

    For Each KeyPair As KeyValuePair(Of String, String) In Dupe
        ' (B, A) is an inverse duplicate only if (A, B) was already kept: the value
        ' must be an existing key AND that key must map back to this row's key.
        ' Checking only that the value is a key would also drop unrelated pairs,
        ' such as CCC - AAA when AAA - VVV is present.
        If Result.TryGetValue(KeyPair.Value, sExisting) AndAlso sExisting = KeyPair.Key Then
            Continue For
        End If
        Result.Add(KeyPair.Key, KeyPair.Value)
    Next

    Return Result

End Function
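Both routines above depend on the direction of each pair. A different technique, not the one used in this article, is to build an order-independent key for every pair by always putting the smaller string first, so AAA - VVV and VVV - AAA produce the same key. A minimal sketch under that assumption, with hypothetical names; because the key ignores direction, an unrelated pair such as CCC - AAA is kept even when AAA - VVV is already present.

```vbnet
Imports System
Imports System.Collections.Generic

Module UnorderedPairDedupe

    ' Order-independent key: the ordinally smaller string always goes first,
    ' so (AAA, VVV) and (VVV, AAA) map to equal Tuples.
    Function PairKey(a As String, b As String) As Tuple(Of String, String)
        If String.CompareOrdinal(a, b) <= 0 Then
            Return Tuple.Create(a, b)
        End If
        Return Tuple.Create(b, a)
    End Function

    ' Keeps the first occurrence of each unordered pair. Tuple compares by value,
    ' and HashSet.Add returns False when an equal key is already present.
    Function DeDupeUnorderedPairs(dupe As Dictionary(Of String, String)) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)
        Dim seen As New HashSet(Of Tuple(Of String, String))
        For Each pair As KeyValuePair(Of String, String) In dupe
            If seen.Add(PairKey(pair.Key, pair.Value)) Then
                result.Add(pair.Key, pair.Value)
            End If
        Next
        Return result
    End Function

    Sub Main()
        Dim data As New Dictionary(Of String, String) From {
            {"AAA", "VVV"}, {"BBB", "XXX"}, {"CCC", "WWW"}, {"VVV", "AAA"}
        }
        Dim result As Dictionary(Of String, String) = DeDupeUnorderedPairs(data)
        Console.WriteLine(result.Count)               ' 3
        Console.WriteLine(result.ContainsKey("VVV"))  ' False
    End Sub

End Module
```

The same PairKey idea would also work for the index-keyed "left:right" values, by splitting each value first and feeding the two halves to PairKey.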

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

CorvetteGuru
Software Developer (Senior)
United States United States
I have been coding for over 25 years.
 
Started with Apple II BASIC, then in college dabbled in all sorts of languages:
 
COBOL
RPGII
Pascal
Fortran
GWBASIC
 
My career started with VAX BASIC and then went backwards with WANG MVP2200 BASIC!
 
From there, I transitioned into mainframes, using mostly ADABAS/Natural and some COBOL. I was fully engulfed in mainframes for 15 years: TSO, JCL, SyncSort, even subbing as an operator during special mass updates!
 
After 10 years in mainframes, I taught myself VB6. My next four jobs were a mixture of VB6 and mainframe.
 
My last job gave me some exposure to .NET, but since my VB6 was so good, I was left supporting the legacy system.
 
I left that job in 2010 to work for my wife and started teaching myself VB 2008, which led to where I am now: working with VB 2010 and SQL Server 2012.

Last Updated 10 Sep 2012
Article Copyright 2012 by CorvetteGuru
Everything else Copyright © CodeProject, 1999-2014