Okay, I am making an application in VB.NET and I need to solve a problem.
I am using an Access (.accdb) database, but if necessary I could use any other type; it doesn't matter as long as it does the job.

I need to search a database for a specific entry and return whether there is one or not.
The problem is that the database is extremely large: I am talking about 100,000,000 (one hundred million), maybe even 200,000,000 (two hundred million) entries, so if I use...

SELECT Count(*) FROM Table1 WHERE Column1=Entry1


...it can take a while for my application to do the search for, e.g., 100 entries.

Does anyone know the fastest possible way of doing this? Thanks

Add an index on Column1. Indexes basically provide a very fast way to look up rows. Rather than scan the entire table, the index can be looked up and you can find the row in O(1) or O(log(N)) time.
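As a rough illustration of the idea (using SQLite in Python purely as a stand-in for Access; table and index names here are made up), an index turns the `WHERE Column1 = ...` lookup from a full scan into a B-tree search:

```python
# Sketch only: SQLite standing in for Access to show the effect of an index.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Table1 (Column1 TEXT)")
con.executemany("INSERT INTO Table1 VALUES (?)",
                [(str(n),) for n in range(10000)])

# Without this index the WHERE clause below scans every row;
# with it, the lookup is a B-tree search.
con.execute("CREATE INDEX idx_column1 ON Table1 (Column1)")

(count,) = con.execute(
    "SELECT Count(*) FROM Table1 WHERE Column1 = ?", ("4242",)).fetchone()

# The query plan confirms the index is actually used for the lookup.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT Count(*) FROM Table1 "
    "WHERE Column1 = '4242'").fetchall()
```

The same principle applies in Access: create the index on the searched column once, and every subsequent existence check benefits.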
 
 
Comments
AspDotNetDev 26-Sep-10 13:52pm    
http://www.ehow.com/how_2020413_create-index-access.html

Also, for a large database, you may want to consider SQL Server rather than Access.
Holc 27-Sep-10 3:35am    
I was considering something like that: create a two-hundred-million-entry random database in advance and then simply give the user the next consecutive entry. This could work, but I need to do the whole procedure more than once. What I mean is that the same thing has to be done, let's say, once a week: two hundred million entries every week, non-cumulative (I would clear the database each week). Another solution would be to keep those same 200 million entries and let the database randomize their order (if that is even possible, I don't know); that would work like a charm.
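The pre-generation idea in the comment above could be sketched like this (illustrative only; function and format choices are assumptions, not from the thread): build all the codes up front so they are unique by construction, shuffle once, then hand them out sequentially with no existence checks at all.

```python
# Sketch of the "generate in advance, randomize the order" idea.
import random

def pregenerate_codes(n, seed=None):
    """Return n codes, unique by construction, in random order."""
    rng = random.Random(seed)
    # Sequential values are trivially unique; the shuffle hides the order.
    codes = [format(i, "09d") for i in range(1, n + 1)]
    rng.shuffle(codes)
    return codes

codes = pregenerate_codes(1000)
```

Handing out `codes[k]` for the k-th click replaces 200 million random lookups with one up-front generation pass per week.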
It might be faster to store the entries in a temp table, join on Column1, group them, and count each group. It should be something like the SQL below (untested, however):

SELECT Table.Column1, Count(*) FROM Table
  INNER JOIN Temp
  ON Table.Column1 = Temp.Column1
  GROUP BY Table.Column1


Good luck!
 
 
Indexing!

To get faster searches you need to put a few indexes on your table. For a start, read this[^].
If needed, Google for more.
 
 
I know about indexing; I did that before anything else. However, the time needed to complete the operation is enormous. Here is the code (a bit altered, but it is pretty much the same):

VB
Dim connString As String = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & Application.StartupPath & "\Database.accdb;Persist Security Info=False"
Dim con As New OleDb.OleDbConnection(connString)

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

    Randomize() ' seed once, outside the loop
    con.Open()
    For i As Integer = 1 To 200000000
        Dim randv As Integer = CInt(Int(200000000 * Rnd() + 1))
        ' Test whether the entry already exists (parameterised to avoid string concatenation)
        Dim check As New OleDb.OleDbCommand("SELECT Count(*) FROM Table1 WHERE Column1 = ?", con)
        check.Parameters.AddWithValue("?", randv.ToString())
        Dim count As Integer = CInt(check.ExecuteScalar())
        If count = 0 Then
            ' Insert the non-existing entry
            Dim insert As New OleDb.OleDbCommand("INSERT INTO Table1 VALUES (?)", con)
            insert.Parameters.AddWithValue("?", randv.ToString())
            insert.ExecuteNonQuery()
        Else
            i -= 1 ' collision: retry this iteration
        End If
        Application.DoEvents()
    Next i
    con.Close()
End Sub


There is just this one column in the table.

OK, I have now coded in a 5-second timer that tells me the estimated time remaining for all 200,000,000 entries to be inserted, and it says 6 days.

To complicate things further, the data entered into the database are not numbers but strings.

Are there any faster solutions, or do I just have to live with this?
I could try SQL Server, but I really doubt there would be a significant change.
Is there a better (faster) way to store this kind of data than a database? (This is meant to be part of a web application; the above is coded for testing purposes only.)
 
 
Comments
AspDotNetDev 26-Sep-10 18:02pm    
Ah, I thought the number of records was 200 million. I didn't know it was the number of SELECT statements you are performing. In that case, I recommend you bulk insert all 200 million records into a temp table and then perform a join as another person suggested. I know SQL Server supports a bulk insert; I'm not sure whether Access does. It shouldn't take more than a couple of seconds or (maybe) minutes.
AspDotNetDev 26-Sep-10 18:54pm    
Also, if the strings are long, you can first do a bulk insert of the hash codes of each string. Then, once you have those results, you can rule out the false positives by bulk inserting only those records that were positive for having a match, and then do the search again. Though that would require an extra column to store and index the hash code of each string.
Holc 27-Sep-10 1:11am    
Sorry for the inconvenience; I might have misled you with the code I posted. That is the whole problem: I can't do that. This is meant to be a web-based application, and it should work something like this.

A visitor of the website (containing this app) clicks a button, and the app generates a random string (let's say a code). However, I have to guarantee that every string generated is unique, so I need to check whether the string (code) exists in the database: if it doesn't, insert it; if it does, generate another string. It is impossible to introduce any kind of pattern to those strings; they have to be very random. There is a reasonable expectation of more than 100,000,000 (approaching 200,000,000) clicks within a certain time period (maybe 2-3 days). Therefore it is not possible for me to use bulk statements, since I have to run the app (a function within it) for every visitor who clicks the button.

The code I posted roughly simulates clicks on the button, and in that setup the time needed to complete this kind of operation is enormous; it would not be very nice if a visitor had to wait 3 days for his string (code).

Perhaps this is a better explanation of what my problem is. Thanks for any help.
AspDotNetDev 27-Sep-10 2:12am    
You could make the user wait, say, 5 minutes (or whatever time period). Build up a few requests, then perform them in bulk. Not sure about Access, but SQL Server has a random function, which means you could avoid a few of those round trips by doing everything in the database with a stored procedure. Also, you might consider generating the random numbers in advance, then picking from them using a field to toggle whether or not the number was already used (you'd put an index on that field). Or you could use a GUID. Or when you make a call to the database, return 100 valid random numbers and that way you can reduce the number of calls to the server. Let me know how some of that works out for you (FYI, I don't get notified via email when you comment on your own answer... comment on mine instead).
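The GUID suggestion above is perhaps the simplest way out: a version-4 UUID per click makes a collision so improbable that the existence check (and hence the round trip per click) can be skipped entirely. A minimal sketch:

```python
# Sketch of the GUID idea: each code is a random UUID, unique for all
# practical purposes without any database lookup.
import uuid

def generate_code():
    return str(uuid.uuid4())

# e.g. hand back 100 codes per call to cut round trips, as suggested above
batch = [generate_code() for _ in range(100)]
```

The trade-off is that the codes are 36 characters long and not in any format of your choosing, but uniqueness comes for free.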
E.F. Nijboer 27-Sep-10 11:32am    
I would definitely use the session id, the current server time, and maybe some other simple generated value to eliminate the need for those excessive search operations. You could also store those 3 values separately and use them together as a composite primary key; you could then also identify different users and their activity if you wanted. Doing a search is a waste of time and a disaster waiting to happen, as the database will keep growing and slowing down with it. You might also want to think about some compression/efficiency scheme, because it will otherwise explode anyhow. It would be nice if you could explain the goal of storing all those unique values generated by clicking a button.
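The composite-key idea in this comment can be sketched as follows (the code format and counter are illustrative assumptions): combine the session id, the server time, and a per-process counter so every code is unique by construction, with no search at all.

```python
# Sketch: uniqueness by construction from session id + time + counter.
import itertools
import time

_counter = itertools.count()  # separates clicks within the same second/session

def make_code(session_id):
    # Unique as long as session ids are unique: the counter breaks ties
    # between clicks that share a session and a timestamp.
    return f"{session_id}-{int(time.time())}-{next(_counter)}"

codes = [make_code("sess42") for _ in range(5)]
```

If the codes must also look random, the three components could be fed through a hash or cipher afterwards; the uniqueness guarantee comes from the components, not from the randomness.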

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


