Click here to Skip to main content
15,891,529 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Hi,

I've got two tables (P and R) joined by an ID column. Table P also has a TYPE column which indicates the TYPE (1 of about 100) of the entry indicated by the ID.
Table R also have a column called REF which indicates the files which the ID was loaded from.

What I need to know is how I can join P and R in order to give me the files (REFs) where more than one TYPE of ID was in the file.
Note that there can be more than one of the same ID in Table R but there will only be one of it in Table P.

Its a friday afternoon and my brain is fried....HELP!!!!

I need this to be as efficient as possible.
Bear in mind Table R has over 35,000,000 rows and Table P has about 30,000

E.g.
Table P
ID TYPE
1 a
2 a
3 b
4 b
5 b

Table R
ID REF
1 aa
1 aa
2 aa
2 aa
3 aa
4 bb
5 bb
1 cc
1 cc
1 dd

In the above example I would be returned REF 'aa' because it contained IDs 1, 2 and 3 which are of TYPEs 'a' and 'b'. Whereas REF 'bb' only contained IDs with a TYPE 'b' and I want the ones where there is more than one TYPE in a a file(REF).

Does this SQL give me what I want?
SELECT r.REF FROM R as r JOIN P as p ON p.ID=r.ID group by r.REF HAVING count(p.TYPE) > 1
Posted
Updated 14-Dec-12 5:37am
v4
Comments
[no name] 14-Dec-12 11:38am    
Can you share your code that you have tried so far...
And what should be the o/p you want from your query?

1 solution

If I'm understanding it correctly...I think you just need to add the keyword DISTINCT to your count function. So it would be:

SQL
SELECT r.REF FROM R as r JOIN P as p ON p.ID=r.ID group by r.REF HAVING count(distinct p.TYPE) > 1 


See google[^] for more info.

------------ Update ------------

For performance issues, perhaps using temporary tables would help? Since R is the table with the most records, perhaps first pulling it with distinct values and using THAT table to join to P would be better? Something like this:
SQL
SELECT r.Ref FROM (SELECT DISTINCT ID, REF FROM R as r) INNER JOIN P as p ON p.ID=r.ID GROUP BY r.Ref HAVING COUNT(DISTINCT p.Type) > 1


Hope this helps.
 
Share this answer
 
v2
Comments
kevin.goodman 15-Dec-12 11:19am    
Kschuler, althought I think you are technically right unfortunately the request times out after several hours! I knew I was close but thanks for your help.

As I said the R table has over 35m rows and P table about 30k.

Partially using your solution and introducing a third table I am able to chunk the data but this isn't really a solution to my bigger problem.
If anyone out there is able to help with the performance of the request it would be greatly appreciated.
This is my solution to chunking the requests where the table fileSubmissions JOINS table R ON the REF column

SELECT r.FREF FROM R asr
JOIN filesubmission as fs ON fs.REF=r.REF
JOIN P as p ON p.ID=r.ID
where fs.id > 1000 AND fs.id < 2000
group by r.REF
HAVING count(DISTINCT p.TYPE) > 1;
Kschuler 17-Dec-12 9:35am    
Not sure how much it will help, but I updated my solution with another thing you might want to try. Using temporary tables.
Jörgen Andersson 18-Dec-12 7:05am    
Kschulers answer seems to be the correct one. If the query takes hours there must be a problem with the indexing. The database is probably joining both tables into a temp table and aggregate from there, which is highly inefficient.
I'd suggest that you add two composite indexes, one on P(id,type) and one on R(ref,id), then the database can do the joining and aggregation on the fly.
<edit>reversed the indexes</edit>

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900