Click here to Skip to main content
Rate this: bad
good
Please Sign up or sign in to vote.
See more: SQL MySQL
Hi,
 
I've got two tables (P and R) joined by an ID column. Table P also has a TYPE column which indicates the TYPE (1 of about 100) of the entry indicated by the ID.
Table R also have a column called REF which indicates the files which the ID was loaded from.
 
What I need to know is how I can join P and R in order to give me the files (REFs) where more than one TYPE of ID was in the file.
Note that there can be more than one of the same ID in Table R but there will only be one of it in Table P.
 
Its a friday afternoon and my brain is fried....HELP!!!!
 
I need this to be as efficient as possible.
Bear in mind Table R has over 35,000,000 rows and Table P has about 30,000

E.g.
Table P
ID TYPE
1 a
2 a
3 b
4 b
5 b
 
Table R
ID REF
1 aa
1 aa
2 aa
2 aa
3 aa
4 bb
5 bb
1 cc
1 cc
1 dd
 
In the above example I would be returned REF 'aa' because it contained IDs 1, 2 and 3 which are of TYPEs 'a' and 'b'. Whereas REF 'bb' only contained IDs with a TYPE 'b' and I want the ones where there is more than one TYPE in a a file(REF).
 
Does this SQL give me what I want?
SELECT r.REF FROM R as r JOIN P as p ON p.ID=r.ID group by r.REF HAVING count(p.TYPE) > 1
Posted 14-Dec-12 6:15am
Edited 14-Dec-12 6:37am
v4

1 solution

Rate this: bad
good
Please Sign up or sign in to vote.

Solution 1

If I'm understanding it correctly...I think you just need to add the keyword DISTINCT to your count function. So it would be:
 
SELECT r.REF FROM R as r JOIN P as p ON p.ID=r.ID group by r.REF HAVING count(distinct p.TYPE) > 1 
 
See google[^] for more info.
 
------------ Update ------------
 
For performance issues, perhaps using temporary tables would help? Since R is the table with the most records, perhaps first pulling it with distinct values and using THAT table to join to P would be better? Something like this:
SELECT r.Ref FROM (SELECT DISTINCT ID, REF FROM R as r) INNER JOIN P as p ON p.ID=r.ID GROUP BY r.Ref HAVING COUNT(DISTINCT p.Type) > 1
 
Hope this helps.
  Permalink  
v2
Comments
kevin.goodman at 15-Dec-12 11:19am
   
Kschuler, althought I think you are technically right unfortunately the request times out after several hours! I knew I was close but thanks for your help.
 
As I said the R table has over 35m rows and P table about 30k.
 
Partially using your solution and introducing a third table I am able to chunk the data but this isn't really a solution to my bigger problem.
If anyone out there is able to help with the performance of the request it would be greatly appreciated.
This is my solution to chunking the requests where the table fileSubmissions JOINS table R ON the REF column
 
SELECT r.FREF FROM R asr
JOIN filesubmission as fs ON fs.REF=r.REF
JOIN P as p ON p.ID=r.ID
where fs.id > 1000 AND fs.id < 2000
group by r.REF
HAVING count(DISTINCT p.TYPE) > 1;
Kschuler at 17-Dec-12 9:35am
   
Not sure how much it will help, but I updated my solution with another thing you might want to try. Using temporary tables.
Jörgen Andersson at 18-Dec-12 7:05am
   
Kschulers answer seems to be the correct one. If the query takes hours there must be a problem with the indexing. The database is probably joining both tables into a temp table and aggregate from there, which is highly inefficient.
I'd suggest that you add two composite indexes, one on P(id,type) and one on R(ref,id), then the database can do the joining and aggregation on the fly.
<edit>reversed the indexes</edit>

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



Advertise | Privacy | Mobile
Web04 | 2.8.150302.1 | Last Updated 17 Dec 2012
Copyright © CodeProject, 1999-2015
All Rights Reserved. Terms of Service
Layout: fixed | fluid

CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100