Hi.


One of our large customers has a database design with a classical one-to-many relation between two loosely connected tables. There is a data table (PK incremental BIGINT and one VARBINARY Data column) and a lookup table (PK incremental INT plus 8 BIGINT columns referring to the PK in the data table).


The relation is handled by the business layer. This works flawlessly.
In some rare situations they need to run through all the data in these tables, mainly in the order of the data table.
Because of this design, the reference columns in the lookup table end up pointing to PK values in the data table that lie further and further apart.
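
To make the layout concrete, roughly like this (simplified; the lookup table and its column names below are placeholders, only DataTable, Identifier and Data are the real names used in the queries):

SQL
-- Data table: incremental BIGINT PK plus one VARBINARY payload column
CREATE TABLE DataTable (
    Identifier BIGINT IDENTITY(1,1) PRIMARY KEY,
    Data       VARBINARY(MAX) NOT NULL
);

-- Lookup table: incremental INT PK plus 8 BIGINT columns referring to DataTable.Identifier
CREATE TABLE LookupTable (
    Id   INT IDENTITY(1,1) PRIMARY KEY,
    Ref1 BIGINT, Ref2 BIGINT, Ref3 BIGINT, Ref4 BIGINT,
    Ref5 BIGINT, Ref6 BIGINT, Ref7 BIGINT, Ref8 BIGINT
);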


They select 8 rows at a time from the data table like:
SQL
SELECT Data FROM DataTable WHERE Identifier IN (1,2,3,4,5,6,7,8)  
SELECT Data FROM DataTable WHERE Identifier IN (1,2,3,4,1000,10001,2000,2001)

The issue is that the further apart the PK values in the data table are, the longer the query takes, so things slow down as the process runs.


Can any design change or trick solve this and boost performance?


Regards Thomas





If you are using raw queries as you show, then you should ensure that the PK fields are properly indexed.
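
For example, something along these lines can be used to see what indexes exist (table name taken from your query; the index name in the commented-out statement is just an example):

SQL
-- List existing indexes on the data table; the PK should show up as a (clustered) index
EXEC sp_helpindex 'DataTable';

-- If the lookup column were not covered, an index could be added, for example:
-- CREATE INDEX IX_DataTable_Identifier ON DataTable (Identifier);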
 
 
Comments
tguttesen 4-Jan-12 16:18pm    
Hi Marcus.

The Identifier column is an incremental BIGINT value and the PK.
Everything is optimized.

Regards

Thomas
If the PK is created using a primary key constraint, the indexing should be fine, but what about the statistics? Have they been updated lately?

This could also be an optimizer flaw, so forcing an index scan could correct the situation, especially if the actual values are passed to the DBMS as parameters, not as literals.
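
As a rough sketch of those two suggestions (table name taken from your post, parameter names just examples):

SQL
-- Refresh the statistics on the data table with a full scan
UPDATE STATISTICS DataTable WITH FULLSCAN;

-- Pass the values as parameters instead of literals, e.g. via sp_executesql
EXEC sp_executesql
    N'SELECT Data FROM DataTable WHERE Identifier IN (@p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8)',
    N'@p1 bigint, @p2 bigint, @p3 bigint, @p4 bigint, @p5 bigint, @p6 bigint, @p7 bigint, @p8 bigint',
    @p1 = 1, @p2 = 2, @p3 = 3, @p4 = 4, @p5 = 1000, @p6 = 10001, @p7 = 2000, @p8 = 2001;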
 
 
Comments
tguttesen 4-Jan-12 17:34pm    
Hi Mika

BTW, it is MS SQL 2005 we are talking about.
Yes, the PK is created using a primary key constraint, and we use the IDENTITY property to ensure an incremental value.
The execution plan shows that it uses a clustered index seek.

Regards

Thomas
Wendelius 4-Jan-12 18:40pm    
OK, when did you last update the statistics? Also, do you use parameters, or is the IN-list built with literals?
[no name] 4-Jan-12 20:24pm    
Sorry to put my oar in on this one, but I definitely second Mika's comment. We've had a similar issue with BIGINT columns in the past month, and the 'solve' for it was to remove the stats on the table and manually build them for each individual join column. This is just an educated guess, but in addition to doing this for your data table's PK column, I suppose in your case it would also be done on the 8 BIGINTs in the lookup table.
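Just to sketch what we did (the statistics name and the lookup column names below are placeholders, not your actual schema):

SQL
-- Drop the existing (auto-created) statistics on the table; the name here is a placeholder
-- DROP STATISTICS LookupTable.ExistingStatsName;

-- Then build column statistics manually, one per join column, e.g.:
CREATE STATISTICS ST_LookupTable_Ref1 ON LookupTable (Ref1) WITH FULLSCAN;
CREATE STATISTICS ST_LookupTable_Ref2 ON LookupTable (Ref2) WITH FULLSCAN;
-- ...and so on for the remaining reference columns and the data table's PK column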
Wendelius 5-Jan-12 15:28pm    
No problem at all, all comments are welcome :)
tguttesen 5-Jan-12 15:17pm    
Hi.

The lookup table is loaded by the business layer, which in turn queries the data table, so there are no built-in SQL relations.

We have updated the statistics, and it has no effect :(.
We changed the call so that we use parameters and fetch only one row at a time, like:
SELECT Data FROM DataTable WHERE Identifier = @Identifier
The parameterized call has some positive effect, but things still slow down over time.

It appears that the further from the first row the query has to select, the longer it takes.

I must say we are talking decent times, but we are running about 160,000,000 passes on the data table, so good ideas are really welcome.

Regards Thomas
Hi Mika.

Thanks for your time. I have wasted some of it, sorry.
We have found the problem. It turns out that while processing the lookup table, we process it in chunks of 100,000 rows to conserve memory. These chunks were not sorted sequentially as expected but were fragmented, and the fragmentation got worse the more chunks we processed. That is why data fetching took longer and longer.
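
For illustration, reading the chunks in sequential key order on SQL 2005 would look something like this (the lookup table and column names are placeholders; @ChunkStart advances by 100,000 per pass):

SQL
-- Fetch one chunk of 100,000 lookup rows in key order (ROW_NUMBER is available from SQL Server 2005)
SELECT Id, Ref1, Ref2, Ref3, Ref4, Ref5, Ref6, Ref7, Ref8
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Id) AS rn
    FROM LookupTable
) AS Chunk
WHERE rn BETWEEN @ChunkStart AND @ChunkStart + 99999
ORDER BY Id;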

But I learned to use parameters instead of literals. :)
I mentioned earlier that I tried to use a stored procedure for the calls; setting the query directly in SqlDBCommand.CommandText is more efficient. Any reason for that?

Regards Thomas
 
 
Comments
Wendelius 10-Jan-12 14:28pm    
No problem at all. Even though the root cause was different from what was first suspected, it's always nice to do some actual 'research' :)

Without seeing the procedure, I'd guess that using the SqlDbCommand better utilizes the internal cache of plans and compiled versions of the statement. But as said, just a guess :)
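
If you want to check that guess, the plan cache can be inspected with the DMVs that came with SQL Server 2005, for example:

SQL
-- Show cached plans and their reuse counts for statements touching DataTable
SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%DataTable%';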
