Click here to Skip to main content
15,884,425 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
I know in SQL, you need to specify what are the attributes you want to select and find out if those attributes are similar, like you can count all records that have same values in column a, b and c.

What I'm asking is ,it possible to search all records to see what records have similar attributes values and sort of like group them as group 1 (count them like n results found) then show what are those column/fields they have similar values with? Hope you get my point in my crazy question.

My dataset is composed mostly of textual data and time.

Most like this is what is looks like:
http://img14.imageshack.us/img14/6015/84e2.png[^]

Thanks in advanced.
Posted
Updated 6-Sep-13 7:02am
v3
Comments
phil.o 6-Sep-13 12:44pm    
Not really clear :)
Maybe you could give us a short example of what you are looking for based on a limited example set of data ?
Virtually everything is possible in SQL, the big thing is to define exactly what we want to get.
ZurdoDev 6-Sep-13 13:08pm    
Can you just do your grouping in SQL?
Gynehs 6-Sep-13 13:40pm    
Sorry about my weird question, i was just wondering if is it possible to cluster similar records by their attributes in SQL.

Do you mean that you want to find the number of rows (groups) where you have two or more attributes that are identical? As below?

SQL
-- Some dummy data to demonstrate.
create table test
(
 a varchar(10) null,
 b varchar(10) null,
 c varchar(10) null,
 d varchar(10) null,
 e varchar(10) null,
 f varchar(10) null
)


insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','ethel','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','ethel','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wubble','womble')
insert into test (a,b,c,d,e,f) values ('fred','albert','sheila','wibble','wubble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')

-- Count the number of rows having identical values of a,b,c,d,e,f
select 
    count(*) as groupCount,
    a,b,c,d,e,f
from test
group by 
  a,b,c,d,e,f

groupCount  a          b          c          d          e          f
----------- ---------- ---------- ---------- ---------- ---------- ----------
1           fred       albert     sheila     wibble     wubble     womble
2           fred       ethel      sheila     wibble     wobble     womble
4           fred       jim        sheila     wibble     wobble     womble
1           fred       jim        sheila     wibble     wubble     womble

(4 row(s) affected)

select 
    count(*) as groupCount,
    b,c,d
from test
group by 
   b,c,d

groupCount  b          c          d
----------- ---------- ---------- ----------
1           albert     sheila     wibble
2           ethel      sheila     wibble
5           jim        sheila     wibble

(3 row(s) affected)
 
Share this answer
 
v2
Well, your question is vague but any way it is interpreted, the answer is "yes".

I believe you want to ask Google about "Map reduce" and apply it to your data while reading each of the records.
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

  Print Answers RSS
Top Experts
Last 24hrsThis month


CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900