Originally I made a compare statement that works well. It Full joins my tables based on their table ID's. then I import an excel sheet into its own table and then compare both. However, I have not imported all my data yet. When doing research I found that if i am filling a table or tables in a data set then each table could only consist of 16,777,216 rows. However, the issue is my database is for genealogy and the SSDI (Social Security Death Index) is over 90 million names alone.
I need to be able to Full join all my tables that are in the database so i can make a true comparison when matching. mainly, I need to take the external table, the excel import, and compare this against the entire database. How can I compare my data up against hundreds of millions of individuals?
temptable.Reset()
'first make a connection string to your excel file, if you want to search for multiple excel files to pick from.
Dim excelConnection As New OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + files.FileName + ";Extended Properties=Excel 12.0;")
Dim excmd As New OleDbCommand("Select * from [Sheet1$]", excelConnection)
excelConnection.Open()
Dim sheet1 As String = excelConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing).Rows(0)("TABLE_NAME").ToString()
'Dim dtExcelData As New DataTable()
'dtExcelData.Reset()
Dim oda As New OleDbDataAdapter((Convert.ToString("SELECT * FROM [") & sheet1) + "]", excelConnection)
oda.Fill(temptable)
excelConnection.Close()
'dipose
excelConnection.Dispose()
oda.Dispose()
excmd.Dispose()
then I insert using sqlbulkinsert from the temp table, but looking at this now since its a temp table I do not think I need this. I would just use the in memory table and not make this part of the database.
next I Full join all my tables
'connect to
Dim sqlconn1 As New SqlConnection("SQLCON")
Dim sqladaptor = New SqlDataAdapter
sqlconn1.Open()
sqladaptor.SelectCommand = New SqlCommand("Select Profile.Individual_ID, First_name, Middle_name, Last_name, Sex, Race, Place_of_birth, County_of_birth, State_of_birth, Date_of_birth, death.Death_ID, Place_of_death, State_of_death, County_of_death, Date_of_death, Cause_of_death, Cemetery_reference.Cemetery_ID, Cemetery_name, Cemetery_state, Cemetery_county, Cemetery_address, Cemetery_Section, Cemetery_Row, Cemetery_Lot, Grave, Burial_date from Profile full join Death On profile.Individual_ID = death.individual_ID full join Burial On Profile.Individual_ID = Burial.Individual_ID full join Cemetery_reference On Burial.Cemetery_ID = Cemetery_reference.Cemetery_ID ORDER BY Last_name ASC, First_name ASC, Date_of_birth ASC", sqlconn1)
sqladaptor.SelectCommand.CommandTimeout = False
sqladaptor.SelectCommand.ExecuteNonQuery()
sqladaptor.Fill(dt)
sqlconn1.Close()
sqlconn1.Dispose()
sqladaptor.Dispose()
next I rearrange my view of the temp table
Dim view As New DataView(temptable)
view.Sort = "Last_name ASC, First_name ASC, Date_of_birth ASC"
Dim temptab As DataTable = view.ToTable
temptable.Clear()
temptable = temptab
then i do my comparisons row by row
For I = 0 to temptable.rows.count
For i2 = 0 to dt.rows.count
'match code
next
next
one issue that arises is the fact that a person can be buried multiple times. I am working on showing these exhumations and reburials but each of these with an inner join would create a new row. however, the burial date will now be different than the original burial date because they were reburied unless cremated after.
so far the only constants are Date of birth and date of death. you can only be dead once and be alive once.
First_name Middle_name Last_name Gender Date_of_death Date_of_birth Place_of_birth
WILLIAM THAMES HAMMES M 1/1/1957 1/1/1900 New York
JAMES W ODONNELL M 1/2/1957 1/2/1900 New York
the code will determine if the temp table has any relevant column to any of the tables that are used in the join statement. If a record exists for that individual in the table it will update missing information. If a record does not exist then a new row will be added with the relevant information related to each table.
hopefully this additional information will help out.
What I have tried:
As of right now I have done a inner join all database tables and then compared them to a bulksql inserted temp excel table. I used sqldataadapter.fill method for both the database tables and the external data. I just need help being pushed in the right direction that would show the best method to achieve what I am looking for.