JOIN with probable GROUP clause

Question

0.00/5 (No votes)

See more:

Hi,

I've got two tables (P and R) joined by an ID column. Table P also has a TYPE column which indicates the TYPE (1 of about 100) of the entry indicated by the ID.
Table R also have a column called REF which indicates the files which the ID was loaded from.

What I need to know is how I can join P and R in order to give me the files (REFs) where more than one TYPE of ID was in the file.
Note that there can be more than one of the same ID in Table R but there will only be one of it in Table P.

Its a friday afternoon and my brain is fried....HELP!!!!

I need this to be as efficient as possible.
Bear in mind Table R has over 35,000,000 rows and Table P has about 30,000

E.g.
Table P
ID TYPE
1 a
2 a
3 b
4 b
5 b

Table R
ID REF
1 aa
1 aa
2 aa
2 aa
3 aa
4 bb
5 bb
1 cc
1 cc
1 dd

In the above example I would be returned REF 'aa' because it contained IDs 1, 2 and 3 which are of TYPEs 'a' and 'b'. Whereas REF 'bb' only contained IDs with a TYPE 'b' and I want the ones where there is more than one TYPE in a a file(REF).

Does this SQL give me what I want?
SELECT r.REF FROM R as r JOIN P as p ON p.ID=r.ID group by r.REF HAVING count(p.TYPE) > 1

Posted 14-Dec-12 5:15am

kevin.goodman

Updated 14-Dec-12 5:37am

v4

Add a Solution

Comments

[no name] 14-Dec-12 11:38am

Can you share your code that you have tried so far...
And what should be the o/p you want from your query?

1 solution

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Kschuler · Accepted Answer · 2012-12-14T11:21:00

Solution 1

If I'm understanding it correctly...I think you just need to add the keyword DISTINCT to your count function. So it would be:

SQL

SELECT r.REF FROM R as r JOIN P as p ON p.ID=r.ID group by r.REF HAVING count(distinct p.TYPE) > 1

See google[^] for more info.

------------ Update ------------

For performance issues, perhaps using temporary tables would help? Since R is the table with the most records, perhaps first pulling it with distinct values and using THAT table to join to P would be better? Something like this:

SQL

SELECT r.Ref FROM (SELECT DISTINCT ID, REF FROM R as r) INNER JOIN P as p ON p.ID=r.ID GROUP BY r.Ref HAVING COUNT(DISTINCT p.Type) > 1

Hope this helps.

Posted 14-Dec-12 11:21am

Kschuler

Updated 17-Dec-12 3:35am

v2

Comments

kevin.goodman 15-Dec-12 11:19am

Kschuler, althought I think you are technically right unfortunately the request times out after several hours! I knew I was close but thanks for your help.

As I said the R table has over 35m rows and P table about 30k.

Partially using your solution and introducing a third table I am able to chunk the data but this isn't really a solution to my bigger problem.
If anyone out there is able to help with the performance of the request it would be greatly appreciated.
This is my solution to chunking the requests where the table fileSubmissions JOINS table R ON the REF column

SELECT r.FREF FROM R asr
JOIN filesubmission as fs ON fs.REF=r.REF
JOIN P as p ON p.ID=r.ID
where fs.id > 1000 AND fs.id < 2000
group by r.REF
HAVING count(DISTINCT p.TYPE) > 1;

Kschuler 17-Dec-12 9:35am

Not sure how much it will help, but I updated my solution with another thing you might want to try. Using temporary tables.

Jörgen Andersson 18-Dec-12 7:05am

Kschulers answer seems to be the correct one. If the query takes hours there must be a problem with the indexing. The database is probably joining both tables into a temp table and aggregate from there, which is highly inefficient.
I'd suggest that you add two composite indexes, one on P(id,type) and one on R(ref,id), then the database can do the joining and aggregation on the fly.
<edit>reversed the indexes</edit>