Design and Architecture Discussion Boards

How to find the similarity between users in Twitter ? How to design a good and efficient idea?

ldaneil

27-Dec-12 7:26

I am working on a data mining project. My company has given me dummy Twitter data for 6 million customers, and I have been assigned to measure the similarity between any two users. Can anyone give me some ideas on how to deal with such a large community dataset? Thanks in advance.

Problem: I use tweets and hashtags (hashtags are the words a user highlights) as the two criteria for measuring the similarity between two users. The number of users is large, and each user may have millions of hashtags and tweets. Can anyone suggest a fast way to calculate the similarity between two users? I have tried using TF-IDF to calculate the similarity between pairs of users, but it seems infeasible at this scale. Does anyone know an algorithm, or have a good idea, that would let me quickly find all the similarities between users?

For example:
user A's hashtags = {cat, bull, cow, chicken, duck}
user B's hashtags = {cat, chicken, cloth}
user C's hashtags = {lenovo, Hp, Sony}

Clearly, C has no relation to A, so calculating their similarity would be a waste of time. We could filter out all such unrelated users before calculating any similarity; in fact, more than 90% of all users are unrelated to any particular user. How can I use hashtags as a criterion to quickly find the group of users potentially similar to A? Is this a good idea, or should we just directly calculate the similarity between A and every other user? Which algorithm would be fastest and best suited to this problem?
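
One way to sketch the filtering idea described above is an inverted index from hashtag to users, so that only users sharing at least one hashtag with A are ever scored. The code below is a minimal illustration, not a tuned solution: the function names are hypothetical, Jaccard similarity is an assumed choice of measure (the post does not name one), and hashtags are lowercased by assumption.

```python
from collections import defaultdict


def build_inverted_index(user_hashtags):
    """Map each hashtag to the set of users who have used it."""
    index = defaultdict(set)
    for user, tags in user_hashtags.items():
        for tag in tags:
            index[tag].add(user)
    return index


def candidates_for(user, user_hashtags, index):
    """Users sharing at least one hashtag with `user`, excluding `user`."""
    found = set()
    for tag in user_hashtags[user]:
        found |= index[tag]
    found.discard(user)
    return found


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_users(user, user_hashtags, index):
    """Score only the candidate users, highest similarity first."""
    tags = user_hashtags[user]
    scores = {
        other: jaccard(tags, user_hashtags[other])
        for other in candidates_for(user, user_hashtags, index)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# The three users from the example above (hashtags lowercased by assumption).
user_hashtags = {
    "A": {"cat", "bull", "cow", "chicken", "duck"},
    "B": {"cat", "chicken", "cloth"},
    "C": {"lenovo", "hp", "sony"},
}
index = build_inverted_index(user_hashtags)
result = similar_users("A", user_hashtags, index)
print(result)  # only B is scored; C is never compared with A
```

With this layout, the cost of scoring A depends on how many users share a hashtag with A rather than on the total number of users. Very common hashtags would still produce large candidate sets, so dropping or down-weighting them may be worth considering.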

Re: How to find the similarity between users in Twitter ? How to design a good and efficient idea?

Pete O'Hanlon

27-Dec-12 7:39

Is your company going to give your salary to anyone here for solving this? It's your job after all, not ours.

Re: How to find the similarity between users in Twitter ? How to design a good and efficient idea?

ldaneil

27-Dec-12 7:50

No, I am a university student and I do not get any salary. I just want to discuss this with some coding pros and smart people. I would really appreciate it if someone could give me some ideas. I think the forum is for discussing programming questions, so that we can help each other and improve our programming skills. I hope some capable coders can give me some hints. Thanks.

Re: How to find the similarity between users in Twitter ? How to design a good and efficient idea?

jschell

27-Dec-12 9:21

You should eliminate trivial words like 'a', 'and', etc.

Then research matching algorithms. I would start with the following Google search string:

algorithms for set matching -string
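
As a rough illustration of the stop-word suggestion above, the sketch below turns a tweet into a set of terms with trivial words removed, so that two tweets can be compared as sets. The stop-word list is a tiny hypothetical sample and the tokenizing regular expression is an assumption; a real project would likely use a fuller list.

```python
import re

# A small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on"}
)


def terms(tweet):
    """Lowercase a tweet, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", tweet.lower())
    return {token for token in tokens if token not in STOP_WORDS}


def shared_terms(tweet_a, tweet_b):
    """Terms that appear in both tweets once stop words are removed."""
    return terms(tweet_a) & terms(tweet_b)


print(terms("The cat and the duck"))
print(shared_terms("The cat and the duck", "A duck is in the pond"))
```

Because the tokenizer keeps only letters, digits, and apostrophes, a hashtag such as `#cat` is reduced to the term `cat`.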

Re: How to find the similarity between users in Twitter ? How to design a good and efficient idea?

ldaneil

28-Dec-12 8:31

Yes, I will definitely have to use strings and arrays to process the data. However, I don't know exactly how to do it; the idea is not clear yet. Thanks very much for your reply. ;)

Re: How to find the similarity between users in Twitter ? How to design a good and efficient idea?

April Fans

27-Dec-12 15:43

Well, you could try to find the similarity, or "document distance", between Twitter users by matching their tweets against each other, much as one would search for plagiarism; perhaps that might work. You could start by searching the tweets of a particular Twitter user using some sort of application. If I am not mistaken, Twitter does have something like this available. Comparisons between groups of users can then be carried out in the same way, which gives the similarity or "document distance" between Twitter users.

April

modified 27-May-14 8:34am.
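
A common reading of "document distance" is the angle between two word-count vectors, and the sketch below follows that reading as an assumption; the post does not specify a formula. Each user's tweets are treated as one combined text, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count how often each word token occurs in the text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_distance(a, b):
    """Angle in radians: near 0 for proportional counts, pi/2 for no shared terms."""
    cosine = min(1.0, max(-1.0, cosine_similarity(a, b)))  # guard against rounding
    return math.acos(cosine)


user_a = term_counts("my cat chased the duck and the cow ignored the cat")
user_b = term_counts("the cat sat near the chicken")
user_c = term_counts("new lenovo laptop beats my old sony")

print(document_distance(user_a, user_b))  # smaller angle: shared words
print(document_distance(user_b, user_c))  # pi/2: no words in common
```

Comparing every pair of users this way is still quadratic in the number of users, so a sketch like this would normally be applied only after some candidate filtering.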

Re: How to find the similarity between users in Twitter ? How to design a good and efficient idea?

ldaneil

28-Dec-12 8:36

Thanks very much for your suggestion. I will do some research on document distance. With such a huge amount of data, the normal approach is definitely infeasible, so I have to find a good idea for how to implement it. The project's focus is the idea; the coding should be very simple, but if the idea is lousy, the whole project becomes useless. I really appreciate your suggestion.

Re: How to find the similarity between users in Twitter ? How to design a good and efficient idea?

April Fans

3-Jan-13 16:46

You're very welcome! It was what initially popped into my head, though I believe there is probably a stronger, more suitable way to carry out such a project given the large amounts of data you will be dealing with.

I find your project quite interesting!

Best of Luck!

With Kind Regards,

April

modified 27-May-14 8:33am.

Re: How to find the similarity between users in Twitter ? How to design a good and efficient idea?

Marc Koutzarov

29-Aug-14 23:54

Take a look at the Levenshtein distance
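
For reference, Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another. It compares strings rather than users, so in this project it would most plausibly help with spotting misspelled or near-duplicate hashtags; that use is an assumption, not something stated in the thread. A minimal sketch using the standard two-row dynamic programming approach:

```python
def levenshtein(s, t):
    """Minimum number of single-character edits that turn s into t."""
    if len(s) < len(t):
        s, t = t, s  # keep the row length equal to the shorter string
    previous = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, char_s in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to ""
        for j, char_t in enumerate(t, start=1):
            cost = 0 if char_s == char_t else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_s
                    current[j - 1] + 1,      # insert char_t
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))    # 3
print(levenshtein("hastags", "hashtags"))  # 1
```

The running time is proportional to the product of the two string lengths, so this suits short strings such as hashtags better than whole tweet histories.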
