Hi.


One of our large customers has a database design with a classical one-to-many relation between two loosely connected tables. There is a data table (PK incremental BIGINT and one VARBINARY Data column) and a lookup table (PK incremental INT plus 8 BIGINT columns referring to the PK in the data table).


The relation is handled by the business layer. This works flawlessly.
In some rare situations they need to run through all the data in these tables, mainly in the order of the data table.
Because of this design, the reference columns in the lookup table end up pointing to PK values in the data table that lie further and further apart.
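
To make the layout concrete, roughly like this (simplified; the lookup table and its column names below are placeholders, only DataTable, Identifier and Data are the real names used in the queries):

SQL
-- Data table: incremental BIGINT PK plus one VARBINARY payload column
CREATE TABLE DataTable (
    Identifier BIGINT IDENTITY(1,1) PRIMARY KEY,
    Data       VARBINARY(MAX) NOT NULL
);

-- Lookup table: incremental INT PK plus 8 BIGINT columns referring to DataTable.Identifier
CREATE TABLE LookupTable (
    Id   INT IDENTITY(1,1) PRIMARY KEY,
    Ref1 BIGINT, Ref2 BIGINT, Ref3 BIGINT, Ref4 BIGINT,
    Ref5 BIGINT, Ref6 BIGINT, Ref7 BIGINT, Ref8 BIGINT
);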


They select 8 rows at a time from the data table like:
SQL
SELECT Data FROM DataTable WHERE Identifier IN (1,2,3,4,5,6,7,8)  
SELECT Data FROM DataTable WHERE Identifier IN (1,2,3,4,1000,10001,2000,2001)

The issue is that the further apart the PK values in the data table are, the longer the query takes, so things slow down as the process runs.


Can any design change or trick solve this and boost performance?


Regards Thomas





If you are using raw queries as you show, then you should ensure that the PK fields are properly indexed.
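
For example, something along these lines can be used to see what indexes exist (table name taken from your query; the index name in the commented-out statement is just an example):

SQL
-- List existing indexes on the data table; the PK should show up as a (clustered) index
EXEC sp_helpindex 'DataTable';

-- If the lookup column were not covered, an index could be added, for example:
-- CREATE INDEX IX_DataTable_Identifier ON DataTable (Identifier);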
 
 
Comments
tguttesen 4-Jan-12 16:18pm    
Hi Marcus.

The Identifier column is an incremental BIGINT value and the PK.
Everything is optimized.

Regards

Thomas
If the PK is created using a primary key constraint, the indexing should be fine, but what about the statistics? Have they been updated lately?

This could also be an optimizer flaw, so forcing an index scan could correct the situation, especially if the actual values are passed to the DBMS as parameters, not as literals.
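
As a rough sketch of those two suggestions (table name taken from your post, parameter names just examples):

SQL
-- Refresh the statistics on the data table with a full scan
UPDATE STATISTICS DataTable WITH FULLSCAN;

-- Pass the values as parameters instead of literals, e.g. via sp_executesql
EXEC sp_executesql
    N'SELECT Data FROM DataTable WHERE Identifier IN (@p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8)',
    N'@p1 bigint, @p2 bigint, @p3 bigint, @p4 bigint, @p5 bigint, @p6 bigint, @p7 bigint, @p8 bigint',
    @p1 = 1, @p2 = 2, @p3 = 3, @p4 = 4, @p5 = 1000, @p6 = 10001, @p7 = 2000, @p8 = 2001;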
 
 
Comments
tguttesen 4-Jan-12 17:34pm    
Hi Mika

BTW, it is MS SQL 2005 we are talking about.
Yes, the PK is created using a primary key constraint, and we use the IDENTITY property to ensure an incremental value.
The execution plan shows that it uses a clustered index seek.

Regards

Thomas
Wendelius 4-Jan-12 18:40pm    
OK, when did you last update the statistics? Also, do you use parameters, or is the IN-list built with literals?
[no name] 4-Jan-12 20:24pm    
Sorry to put my oar in on this one, but I definitely second Mika's comment. We've had a similar issue with BIGINT columns in the past month, and the 'solve' for it was to remove the stats on the table and manually build them for each individual join column. This is just an educated guess, but in addition to doing this for your data table's PK column, I suppose in your case it would also be done on the 8 BIGINTs in the lookup table.
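Just to sketch what we did (the statistics name and the lookup column names below are placeholders, not your actual schema):

SQL
-- Drop the existing (auto-created) statistics on the table; the name here is a placeholder
-- DROP STATISTICS LookupTable.ExistingStatsName;

-- Then build column statistics manually, one per join column, e.g.:
CREATE STATISTICS ST_LookupTable_Ref1 ON LookupTable (Ref1) WITH FULLSCAN;
CREATE STATISTICS ST_LookupTable_Ref2 ON LookupTable (Ref2) WITH FULLSCAN;
-- ...and so on for the remaining reference columns and the data table's PK column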
Wendelius 5-Jan-12 15:28pm    
No problem at all, all comments are welcome :)
tguttesen 5-Jan-12 15:17pm    
Hi.

The lookup table is loaded by the business layer, which in turn queries the data table, so there are no built-in SQL relations.

We have updated the statistics, and it has no effect :(.
We changed the call so that we use parameters and fetch only one row at a time, like:
SELECT Data FROM DataTable WHERE Identifier = @Identifier
The parameterized call has some positive effect, but things still slow down over time.

It appears that the further from the first row the query has to select, the longer it takes.

I must say we are talking decent times, but we are running about 160,000,000 passes on the data table, so good ideas are really welcome.

Regards Thomas
Hi Mika.

Thanks for your time. I have wasted some of it, sorry.
We have found the problem. It turns out that while processing the lookup table, we process it in chunks of 100,000 rows to conserve memory. These chunks were not sorted sequentially as expected but were fragmented, and the fragmentation got worse the more chunks we processed. That is why data fetching took longer and longer.
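
For illustration, reading the chunks in sequential key order on SQL 2005 would look something like this (the lookup table and column names are placeholders; @ChunkStart advances by 100,000 per pass):

SQL
-- Fetch one chunk of 100,000 lookup rows in key order (ROW_NUMBER is available from SQL Server 2005)
SELECT Id, Ref1, Ref2, Ref3, Ref4, Ref5, Ref6, Ref7, Ref8
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Id) AS rn
    FROM LookupTable
) AS Chunk
WHERE rn BETWEEN @ChunkStart AND @ChunkStart + 99999
ORDER BY Id;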

But I learned to use parameters instead of literals. :)
I mentioned earlier that I tried to use a stored procedure for the calls; setting the query directly in SqlDBCommand.CommandText is more efficient. Any reason for that?

Regards Thomas
 
 
Comments
Wendelius 10-Jan-12 14:28pm    
No problem at all. Even though the root cause was different from what was first suspected, it's always nice to do some actual 'research' :)

Without seeing the procedure, I'd guess that using the SqlDbCommand better utilizes the internal cache of plans and compiled versions of the statement. But as said, just a guess :)
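
If you want to check that guess, the plan cache can be inspected with the DMVs that came with SQL Server 2005, for example:

SQL
-- Show cached plans and their reuse counts for statements touching DataTable
SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%DataTable%';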
