5,693,062 members and growing! (14,655 online)
Email Password   helpLost your password?
Languages » VB.NET » General     Intermediate

DataTable with SelectDistinct in VB

By vacen2

Allows you to perform a Distinct query against a Datatable.
SQL, VB 8.0, VB, Windows, .NET, .NET 2.0, ADO.NET, SQL 2000, SQL 2005, VS2005, SQL Server, Visual Studio, DBA, Dev

Posted: 4 Oct 2007
Updated: 11 Oct 2007
Views: 14,073
Bookmarked: 17 times
Announcements
Loading...



Search    
Advanced Search
Sitemap
4 votes for this Article.
Popularity: 2.07 Rating: 3.44 out of 5
0 votes, 0.0%
1
0 votes, 0.0%
2
2 votes, 50.0%
3
1 vote, 25.0%
4
1 vote, 25.0%
5
Note: This is an unedited contribution. If this article is inappropriate, needs attention or copies someone else's work without reference then please Report This Article
Screenshot - Demo_VB.gif

Introduction

The ADO.Net objects provided with VS.Net allow you to retrieve data from a database or other sources and stores this data in objects called DataSets or DataTables. In fact, the DataSet is a collection of DataTables. One commonly used method available with DataTables is Select(). It allows a developer to perform simple filtering against the DataTable and returns the result as a collection of DataRows. Presently, there is no way to perform a Distinct Query against the DataTable. I just found out that there is a way to perform a distinct query in .Net. The DataView object exposes a method called ToTable(). This method expects Boolean flag to determine distinct or not and the fields you wish to distinct on. While this method actually does return distinct rows from a DataTable, it appears that it should only be used for small recordsets. When used on a DataTable with 40,000 rows, it took about 18 seconds. I would imagine that it is pretty quick with a hundred or so. But, if you wish to distinct a large table, there is no better way. Until now.

Background

I had recently encountered a problem with a server being too slow to process user requests for the large amount of data that was returned. I needed to retrieve the data and completely manipulate the data in memory to reduce the number of trips made to SQL Server. One major obstacle was the fact that I needed to apply a Distinct query against my DataTable. I quickly found out that it wasn't possible. There was an article from Microsoft that demonstrated a simple Distinct, but it wasn't robust enough to serve my needs. It only allowed for single column to be specified and I needed an unknown number of columns at different times. I worked and reworked the Microsoft example until I was finally able to handle multiple columns, filtering/conditions and sorting.

Using the code

Within the object there is only one major method called SelectDistinct(). It is overloaded about ten times to allow flexibility in coding. To use my object, simply instantiate the object and call the SelectDistinct method.

I have included a demo application that will allow you to specify a Sql Connection string and a query or stored procedure, so that you can test the performance and accuracy of the SelectDistinct Object.

dsh = New clsDSH(dt)
Dim dt2 as DataTable = dsh.SelectDistinct(Fields,Filter,Sort)
'Or
dsh = New clsDSH()
Dim dt2 as DataTable = dsh.SelectDistinct(dt,Fields,Filter,Sort)

The code examples assume that "dt" has been defined and filled already. Take note that the "dsh" object in both cases will actually retain a copy of the datatable and can be accessed via the .Table property.

After completing the object, I discovered that it could be made to run faster. I originally did the whole thing with two DataTables, taking only the unique rows and placing them into the new table and returning that table from the object. To my dismay, that was slower than anyone could have imagined. To make a very long story short, I ended up hashing each datarow and using that hash to compare and produce a distinct table. Unfortunately, the DataRow.GetHashCode() method wasn't distinct enough to be utilized. Luckily, I found that if I carefully converted each DataRow into a string and then executed String.GetHashCode() it worked. And it worked fast. Currently, the object can take 40,000 records and turn them into around 2000 distinct records in a little over a second! Maybe it could be faster, maybe not. **UPDATE** - Based on some of the code provided to me by srkinyon, I rewrote the object to incorporate some of his ideas and reorganized and stripped out anything that was no longer necessary. That includes the previously mentioned hashing. Now the object is capable of performing a distinct on a DataTable with over 40,000 rows in about 0.6 second. That's ~3 times faster than it was before. Now that's fast. I have to admit though that I had a slight redundancy in my code as well. Although, the majority of the speed increase seems to have come from the fact that I am now storing the datacolumn values as the objects they are, "object" instead of converting them to a string and then hashing that. I also have to admit that hashing isn't reliable enough to be used for distinct purposes. So, kudos to srkinyon. If anyone else can make it even faster or better, don't hesitate.

I also added a couple of extra properties for the heck of it. One is RecordCount, which should be self explanatory. Also, ElapsedTime which stores the total time the last Distinct call took. A slight waste of time, but I wanted to know how fast the object is.

Points of Interest

I found out during my development cycle that even the smallest thing can slow you down tremendously. Even a .Trim() or a .Split(). So use them wisely.

Also, I wrote the same object in C# which is also available on this site, so you can have it both ways:) Incidentally, there was no real difference in speed between the two languages.

History

None

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

vacen2



Occupation: Web Developer
Location: United States United States

Other popular VB.NET articles:

Article Top
Sign Up to vote for this article
You must Sign In to use this message board.
FAQ FAQ Noise ToleranceSearch Search Messages 
 Layout  Per page   
 Msgs 1 to 11 of 11 (Total in Forum: 11) (Refresh)FirstPrevNext
GeneralCompare with LINQmemberMichael Freidgeim13:00 22 Oct '07  
GeneralIndex?memberdherrmann3:23 18 Oct '07  
General*.mdbmember-tusk-13:31 16 Oct '07  
GeneralThanks & An improved version...membersrkinyon19:40 10 Oct '07  
GeneralRe: Thanks & An improved version...membervacen29:49 11 Oct '07  
GeneralRe: Thanks & An improved version...membersrkinyon10:45 11 Oct '07  
GeneralThanks for sharingmember-tusk-17:27 7 Oct '07  
GeneralBubblememberjabryanwalmart8:06 4 Oct '07  
GeneralRe: Bubblemembervacen29:06 4 Oct '07  
GeneralRe: Bubblemembersrkinyon19:46 10 Oct '07  
Generalthanksmemberpeter gabris7:52 4 Oct '07  

General General    News News    Question Question    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

PermaLink | Privacy | Terms of Use
Last Updated: 11 Oct 2007
Editor:
Copyright 2007 by vacen2
Everything else Copyright © CodeProject, 1999-2008
Web07 | Advertise on the Code Project