Re: When to NOT normalize? - Database Discussion Boards

Re: Compare table data on two different databases

Corporal Agarn19-Jan-11 6:05

Corporal Agarn

19-Jan-11 6:05

What database server are you using?

Re: Compare table data on two different databases

vanikanc19-Jan-11 6:59

vanikanc

19-Jan-11 6:59

sql server 2005.

Re: Compare table data on two different databases

Corporal Agarn19-Jan-11 7:18

Corporal Agarn

19-Jan-11 7:18

Example of finding records in D1 that are not in D2

SELECT D1.[DepartmentID]
      ,D1.[Name]
      ,D1.[GroupName]
      ,D1.[ModifiedDate]
FROM [AdventureWorks].[HumanResources].[Department] D1
LEFT JOIN [AdventureWorks2008R2].[HumanResources].[Department] D2
ON D1.DepartmentID = D2.DepartmentID 
WHERE D2.DepartmentID IS NULL

Notice how the table is referenced by Database.Schema.Table.

Use the where clause to compare individual columns.

Re: Compare table data on two different databases

Eddy Vluggen19-Jan-11 7:20

Eddy Vluggen

19-Jan-11 7:20

vanikanc wrote:
Is there a way to compare data of the same table in two different databases?

Yes! Multiple! What have you Googled?

On a more serious note, RedGate has a nice tool that could help out.

vanikanc wrote:
Thanks!

You're welcome Smile | :)

bcp command

vanikanc19-Jan-11 3:22

vanikanc

19-Jan-11 3:22

Hello,

When I try the followinf in sql 2005 i get an error message:
---
declare @rc varchar(8000)
select @rc = 'bcp hqsqlsrvr.web_ref.dbo.anytable_USA out c:\anytable_usa.dat -c -T'
exec xp_cmdshell @rc
---

an error occured while processing command line.

Any pointers?

thanks.

Re: bcp command

Pravin Patil, Mumbai19-Jan-11 3:58

Pravin Patil, Mumbai

19-Jan-11 3:58

Try this one :

declare @rc varchar(8000)
select @rc = 'bcp "select col1, col2 from hqsqlsrvr.web_ref.dbo.anytable_USA" out c:\anytable_usa.dat -c -T'
exec xp_cmdshell @rc

Putting the query in double quote may solve the problem..

When to NOT normalize?

Sander Rossel18-Jan-11 10:48

Sander Rossel

18-Jan-11 10:48

Hi all,

A bit of quick info before I come to my question:
My company uses SQL Server 2008, two databases for one application and almost no schemas.
Our biggest customer now has a DB of 40 gig (took them a few years to get that 'big').
One of the databases is for storing production data and the other is for some data our application needs and user settings.

Now I recently had a discussion with my boss about one particular table in one of our databases.
It stores settings on stock, sales, client servers, user rights, user settings and general settings (no joke... a topic about it would even shame the hall of shame).
I told my boss that normalization might be very necessary for this table, but he told me that normalizing is not always a good thing. In the case of this particular table I know it is, but I am more interested in when we should not normalize?

One of my boss' arguments was that one table to store 'all' settings is rather handy because it keeps the amount of tables low and you can always find a setting (for whatever) in one table.

I know there are some rules for normalizing and denormalizing, but rules are meant to be broken Wink | ;)

So I was wondering if my boss' argument about having lesser tables is valid at all?
And I would like to see some real-life examples of people who chose not to normalize and for what reason.

Re: When to NOT normalize?

dasblinkenlight18-Jan-11 12:03

dasblinkenlight

18-Jan-11 12:03

This is a very good question - in fact, there's a lengthy discussion on the subject in my favorite SQL tuning book[^].

To me, denormalization is always an exercise in query optimization. I start off with my "ideal" (i.e. normalized) schema, populate its key parts with random but realistic data, quickly prototype my queries, and then look for particularly ugly execution plans.

What your boss had in mind is probably minimizing the number of joins: if you always look for all (or a large subset of all) settings at once, and you keep doing it over and over again, using a single table for all settings will reduce the number of joins, and therefore improve your speed. However, I would not make a decision one way or the other without running a query analyzer: optimizing without a profiler is like driving in the dark without a map.

Re: When to NOT normalize?

Mycroft Holmes18-Jan-11 13:46

Mycroft Holmes

18-Jan-11 13:46

Your boss is right and you are right, there is no magic rule that can be applied, it all comes down to "depends on requirements".

If you have a single table with an expanding number of columns to hold the settings then I would change the design. I often use a SettingType, SettingValue table and store the lot in there. Hoewever you cannot use FKs in that scenario so it is less than ideal.

I do have 1 general rule and it is based on the database usage, if it is a transactional database for supporting applications I normalise as much as is reasonable. If it is a reporting database, this includes BI support DBs then normalisation goes out the window in favour of query optimisation. Try servicing Oracle BI, it will drive a relational database person nuts.

Never underestimate the power of human stupidity
RAH

Re: When to NOT normalize?

Yusuf18-Jan-11 14:15

Yusuf

18-Jan-11 14:15

There is no specific rule as to when to normalize/denormalize. The answer is "it depends"

In general it is a trade off between Query response vs Cost of data inconsistency. Normalized data keep the data consistent at a cost of query time. See Database normalization[^] and Denormalization[^] for some descriptions in wikipedia.

Yusuf
May I help you?

Re: When to NOT normalize?

PIEBALDconsult18-Jan-11 18:39

PIEBALDconsult

18-Jan-11 18:39

Yep, it depends on the situation. I try to normalize as much as I can and then deal with troublesome processes.

One technique I have used is when it comes time to perform a bunch of queries against complex data, I make a denormalized copy of the rows I need (I hesitate to say "temporary table"). This allows me to perform the queries without tying up the actual tables -- a little like a mini reporting database that gets created on the fly I guess. This has been useful in systems with a lot of updates that require frequent display to multiple users.

Re: When to NOT normalize?

Sander Rossel19-Jan-11 4:28

Sander Rossel

19-Jan-11 4:28

Thanks for the answers Big Grin | :-D

So basically just normalise as much as possible until performance issues occur?
Since we use a relational database with lots of read AND write (orders, stock, etc.) it would be best to normalize where possible?

One thing I didn't get from the denormalizing DB article on wikipedia was the usage of views. Is making a view a way of denormalizing?
Because in a view you use joins too, but in the end you can just select from one table(view) Smile | :)

Re: When to NOT normalize?

PIEBALDconsult19-Jan-11 17:49

PIEBALDconsult

19-Jan-11 17:49

Naerling wrote:
the usage of views

Temporary denormalization, kinda like I described, but more temporary.

Re: When to NOT normalize?

Eddy Vluggen19-Jan-11 7:33

Eddy Vluggen

19-Jan-11 7:33

Naerling wrote:
One of my boss' arguments was that one table to store 'all' settings is rather handy because it keeps the amount of tables low and you can always find a setting (for whatever) in one table.

That's one argument. Where's the rest?

Naerling wrote:
I know there are some rules for normalizing and denormalizing, but rules are meant to be broken

That in itself isn't a reason to break the rules in a production environment. You can always test it on a VM, if required.

First up, I'd collect a list of arguments on why it shouldn't be normalized. You bosses' argument isn't really an argument to skip the normalization-process, a database doesn't perform better if the table-count is lower. Quite the opposite is true; if the datacollection within that table has several fields that different parts of the system uses, then updating a setting could become a costly operation since it would also have to update several indexes - possibly blocking other updates.

I tend to normalize every entity in the database, so help me Codd, to BNF. The only reason to denormalize that I stumbled across was partitioning Smile | :)

I are Troll Suspicious | :suss:

Re: When to NOT normalize?

Sander Rossel19-Jan-11 11:00

Sander Rossel

19-Jan-11 11:00

'The rest' of his arguments are: "it's easy that all settings are in one table", "you always know where to find a setting", "normalizing isn't always a good thing" and so on Poke tongue | ;-P

In practice this isn't even true, because the 'type of setting' is stored as a varchar which isn't really documented anywhere. So user settings can be found "where [category] = 'usersetting'" or "where [category] = 'setting'" and maybe "where [category] = options". And then there's a column called sub which specifies what setting it is.
Don't use a 'like' in this table, because 'setting' could also give settings on stock or sales. But if you do use '=' for 'sales' you won't find 'sales_statusses' etc... D'Oh! | :doh:

Then there's the employee column which does not hold a FK to the employee table because in most cases there is no employee for a particular setting. So I recently found lots of settings for non-existant employee's... It's really great Smile | :)

Actually there is no key or constraint at this table at all!

There is good news though. It's a very old table which was created with the best of intents at a time that my company did not know very much about SQL Server (migrated from DBF). We only keep it because redesigning this part of our DB (and software) would cost to much time at this moment. My boss does not think the table is good, it's just 'not that bad' Smile | :)

Re: When to NOT normalize?

Eddy Vluggen21-Jan-11 4:56

Eddy Vluggen

21-Jan-11 4:56

Naerling wrote:
"it's easy that all settings are in one table"

For whom? The computer doesn't care.

Naerling wrote:
"you always know where to find a setting"

He wouldn't be able to find them if they were divided over two tables?

<religious rant>
By that same logic, let's put everything in a *single* table; that way you always know in what table your data resides. Let's put everything under a single schema, for a single user and use a single password to access that single table. How about using a single VARCHAR-column in that table? That way you know exactly what column to search on and what column to retrieve from that table.
</religious rant>

Naerling wrote:
"normalizing isn't always a good thing"

That's not even an argument, it's an opinion.

Naerling wrote:
Actually there is no key or constraint at this table at all!

..then why use a database at all?

Naerling wrote:
There is good news though. It's a very old table which was created with the best of intents at a time that my company did not know very much about SQL Server (migrated from DBF). We only keep it because redesigning this part of our DB (and software) would cost to much time at this moment. My boss does not think the table is good, it's just 'not that bad'

On Error Resume Next Laugh | :laugh:

I are Troll Suspicious | :suss:

Re: When to NOT normalize?

Sander Rossel21-Jan-11 8:35

Sander Rossel

21-Jan-11 8:35

I couldn't agree with you more!
Luckily not every table in our DB is like that. It's really the worst example I could think of by far Wink | ;)

Yesterday I made some DB design with 9 tables and 2-3 columns per table. I had expected my boss to say I should use less tables. Instead he said I should've made 1 extra (maybe he reads this forum too?) Laugh | :laugh:

Don't start about on On Error Resume Next...
I know some very nice .NET examples of that. But that's another topic Poke tongue | ;-P

Re: When to NOT normalize?

Eddy Vluggen22-Jan-11 0:23

Eddy Vluggen

22-Jan-11 0:23

Naerling wrote:
Instead he said I should've made 1 extra (maybe he reads this forum too?)

Would've been quite the surprise if he'd mixed into the discussion Smile | :)

Naerling wrote:
Don't start about on On Error Resume Next..

$Shucks | :-\$

I are Troll Suspicious | :suss:

Bringing in a New Table to Refresh an Existing One; Record Counts are Different; How do I say "Gimme the discrepant records?"

Brian C Hart18-Jan-11 5:19

Brian C Hart

18-Jan-11 5:19

So here I have a table, TABLE1, which I copied over from a different server:

SELECT COUNT(*) FROM TABLE1 gives me, e.g., 83977 records.

However, there's another table, TABLE2, which lives on the current server, and I want to know if TABLE1 and TABLE2 have identically the same data. That's kind of a different question than I am asking here, because that question depends on how the tables are indexed, keyed etc. For now, however, I do a

SELECT COUNT(*) FROM TABLE2 and it gives 83911 records.

So the two tables differ by 66 records. How do I say to the database, "OK, what are those 66 records?"

I tried a LEFT OUTER JOIN between TABLE1 (on the left) and TABLE2 (on the right) and I join on the tables' keys and filter on the right table's keys being NULL. I get 130 rows. I'm confused, shouldn't I have 66 rows?

Brian

Sincerely Yours,
Brian Hart

Re: Bringing in a New Table to Refresh an Existing One; Record Counts are Different; How do I say "Gimme the discrepant records?"

David Skelly18-Jan-11 6:08

David Skelly

18-Jan-11 6:08

Assume table A contains keys:

a
b
c
d

and table B contains keys:

a
b
e
f

Both tables contain 4 rows.

However, an oouter join of the kind you describe would return 2 rows (c and d).

In your case, table 1 contains 130 keys which are not in table 2. Table 2 contains 64 rows which are not in table 1. Hence, the difference in size is 66 rows (130 - 64).

Re: Bringing in a New Table to Refresh an Existing One; Record Counts are Different; How do I say "Gimme the discrepant records?"

Chris Meech18-Jan-11 7:07

Chris Meech

18-Jan-11 7:07

What you are really after is a union of what is in TABLE1, that's not in TABLE2, along with what is in TABLE2, that's not in TABLE1. I'd try the following

select * from TABLE1 t1
 where not exists ( select * from TABLE2 t2 where t1.key = t2.key )
union all
select * from TABLE2 t2
 where not exists ( select * from TABLE1 t1 where t2.key = t1.key )

You might want to create a view for this, depending upon the circumstances. Smile | :)

Chris Meech
I am Canadian. [heard in a local bar]

In theory there is no difference between theory and practice. In practice there is. [Yogi Berra]

posting about Crystal Reports here is like discussing gay marriage on a catholic church’s website.[Nishant Sivakumar]

Re: Bringing in a New Table to Refresh an Existing One; Record Counts are Different; How do I say "Gimme the discrepant records?"

Brian C Hart18-Jan-11 7:18

Brian C Hart

18-Jan-11 7:18

Nice try, but TABLE1 and TABLE2 have completely different record counts (right, they are off by 66) so the database complains:

Msg 205, Level 16, State 1, Line 1<br />
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.

Sincerely Yours,
Brian Hart

Re: Bringing in a New Table to Refresh an Existing One; Record Counts are Different; How do I say "Gimme the discrepant records?"

Chris Meech18-Jan-11 7:35

Chris Meech

18-Jan-11 7:35

If that is the error message, that means the the * which is being selected in each case is different. Replace the * in my select with just the key identifier columns and it should work. Smile | :)

Re: Bringing in a New Table to Refresh an Existing One; Record Counts are Different; How do I say "Gimme the discrepant records?"

David Skelly19-Jan-11 1:55

David Skelly

19-Jan-11 1:55

It's not complaining because the number of rows don't match. It's complaining because the number of columns don't match.

See Chris's previous answer for how a suggestion on how to correct the query.

Re: Bringing in a New Table to Refresh an Existing One; Record Counts are Different; How do I say "Gimme the discrepant records?"

Mycroft Holmes18-Jan-11 13:53

Mycroft Holmes

18-Jan-11 13:53

[Product plug]
We use Red-Gate SQL data compare for this job and it is excellent but a bit expensive for a single requirement.
[/Product plug]

Do an inner join between the 2 tables using EVERY column, this will identify the identical records, then select from the(s) where the Ids are not in the inner join.

Inspect this result and discard the joins that are not relevant and repeat.

When you have the critical records decide what you need to do to merge them.

Never underestimate the power of human stupidity
RAH

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.