Click here to Skip to main content
15,938,218 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Hello All,

Am trying to work out on best possible algorithm for "who read this also read that". Basically there is a table with all the data i.e. user id and article id.

1. Say I go to article ID 11.
2. Now the system needs to find out all the users who viewed article 11.
3. Now take those user ids and find out the other articles they viewed and populate the best 3 articles for "who read this also read that"

Step 2 is easy. But there might be 1000s of users who viewed article 11. So say now system got the information 2222 users viewed article 11.

Now the confusing part is how to calculate step 3??

Any suggestions??

You may take reference from my example below:

First, you need 3 tables in your mysql database with minimum fields as follows:
1. 'reader' table with 'readerid' and 'readername' ('readid' is primary key)
2. 'article' table with 'articleid' and 'articlename' ('articleid' is primary key)
3. 'reader_article' table with 'readerid' and 'articleid' (both composite primary key)

The answer to step 3 is like this:
SELECT a.articlename, COUNT(ra1.articleid) FROM reader_article ra1 INNER JOIN article a ON ra1.articleid = a.articleid WHERE ra1.readerid IN
(SELECT  ra2.readerid  FROM reader_article ra2 WHERE ra2.articleid="11")
GROUP BY ra1.aricleid ORDER BY COUNT(ra1.articleid) DESC LIMIT 3
Share this answer

I can give you example how to do something like this in SQL Server (should be very similar in MySQL too).

First, let’s say that you have table named UserAndArticle with the following data:
ID	UserID	ArticleID
1	10	101
2	10	102
3	10	103
4	20	101
5	20	103
6	30	101
7	30	103
8	40	101
9	40	102
10	40	103
11	50	102
12	50	103
13	60	101
14	60	102
15	70	101
16	70	102
17	70	104
18	70	105
19	70	106

SQL Server script would be something like this (in MySQL you need to change TOP with LIMIT clause):
DECLARE @ArticleID INT = 102

SELECT TOP 3 t2.ArticleID,
             COUNT(t2.ArticleID) AS NoOfViews
FROM   UserAndArticle AS t1
       INNER JOIN UserAndArticle AS t2
               ON t1.UserID = t2.UserID
WHERE  t1.ArticleID = @ArticleID
       AND t2.ArticleID <> @ArticleID
GROUP  BY t2.ArticleID

In this example, I am searching the best 3 articles by the number of views who read article 102.
ArticleID	NoOfViews
101	4
103	3
105	1

Also, you may want to eliminate current user ArticleIDs from the result.

I hope this was helpful.
Share this answer
AndyInUK 2-Jan-14 5:40am    
Hello Andrius,

Thanks for your reply. I am just trying to understand your solution. So basically what you did here is -
1. Searched the users who viewed article 102
2. Then scanned all the users viewed article and on the basis of maximum views the best 3 result is displayed and the same article id should not be displayed.

If this is how this query works then great.. i wasn't expecting such a simple solution to the problem. And also how can I remove current user ArticleIDs from the result?

Andrius Leonavicius 2-Jan-14 11:03am    

It basically works as you described. You can remove current user ArticleIDs by adding this line:
AND t2.UserID <> @UserID
AndyInUK 2-Jan-14 6:14am    
I guess just adding this line should remove the current user -
AND t2.UserID <> '@UserID'
Andrius Leonavicius 2-Jan-14 11:04am    
That's right (parameter should not be in quotes).
AndyInUK 8-Jan-14 11:20am    
I think there is a small problem here. by adding AND t2.UserID <> '@UserID', it will remove that particular UserID from the list but it will not removed all the articles that particular user viewed.

It's suggesting the same articles again and again and just remove the article it's coming from.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900