I need to insert one row at a time into my target table. Before each insert, I need to scan the target table and see if the value already exists. If it does, then the uniqueID of the inserted row should match the uniqueID of the matched row in the target table. If the uniqueID does not exist, then increment the MAX uniqueID in the target table by 1 and use this as the uniqueID.
I've been told I can do this with loops and variables etc but Amazon Redshift does not support these at the moment.
My source table (tbl_source) has the fields anonID, userID and rowNum.
My target table (tbl_target) has fields anonID, userID and uniqueID
My insert is very simple in essence
INSERT into tbl_target
(select anonID, userID, XXXX
from tbl_source)
The XXX is where I need help. XXX is uniqueID.
SAMPLE DATA
╔════════╦════════╦════════╗
║ rownum ║ anonID ║ userID ║
╠════════╬════════╬════════╣
║ 1 ║ A ║ 1 ║
║ 2 ║ A ║ 2 ║
║ 3 ║ A ║ 3 ║
║ 4 ║ B ║ 5 ║
║ 5 ║ B ║ 6 ║
║ 6 ║ C ║ 7 ║
║ 7 ║ D ║ 8 ║
║ 8 ║ D ║ 9 ║
║ 9 ║ E ║ 1 ║
║ 10 ║ E ║ 2 ║
║ 11 ║ E ║ 3 ║
║ 12 ║ F ║ 9 ║
╚════════╩════════╩════════╝
To show you the logic I need, I will take this table row by row and show you what should be computed;
rowNum 1:
Search for "A" and "1" in the target table -> Neither exist in the target table (since it is empty, this is the first row bring inserted), therefore set uniqueID in the target table to 1
TARGET TABLE
+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A | 1 | 1 |
+========+========+==========+
rowNum 2:
Search for "A" and "2" in the target table -> A exists. Therefore set the uniqueID of the new row to the SAME uniqueID as in the target table = 1
+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A | 1 | 1 |
+--------+--------+----------+
| A | 2 | 1 |
+========+========+==========+
rowNum 3:
Search for "A" and "3" in the target table -> A exists. Therefore set the uniqueID of the new row to the SAME uniqueID as in the target table = 1.
+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A | 1 | 1 |
+--------+--------+----------+
| A | 2 | 1 |
+--------+--------+----------+
| A | 3 | 1 |
+========+========+==========+
rowNum 4:
Search for "B" and "5" in the target table -> neither exist. Therefore find the MAX uniqueID in the target table (1) and increment by 1.
+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A | 1 | 1 |
+--------+--------+----------+
| A | 2 | 1 |
+--------+--------+----------+
| A | 3 | 1 |
+--------+--------+----------+
| B | 5 | 2 |
+========+========+==========+
rowNum 5:
Search for "B" and "6" in the target tale -> "B" exists. Therefore set the uniqueID of the new row to the SAME uniqueID as in the target table = 1
+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A | 1 | 1 |
+--------+--------+----------+
| A | 2 | 1 |
+--------+--------+----------+
| A | 3 | 1 |
+--------+--------+----------+
| B | 5 | 2 |
+--------+--------+----------+
| B | 6 | 2 |
+========+========+==========+
rowNum 6:
Search for "C" and "7". Neither found
+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A | 1 | 1 |
+--------+--------+----------+
| A | 2 | 1 |
+--------+--------+----------+
| A | 3 | 1 |
+--------+--------+----------+
| B | 5 | 2 |
+--------+--------+----------+
| B | 6 | 2 |
+--------+--------+----------+
| C | 7 | 3 |
+========+========+==========+
.....
.....
.....
+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A | 1 | 1 |
+--------+--------+----------+
| A | 2 | 1 |
+--------+--------+----------+
| A | 3 | 1 |
+--------+--------+----------+
| B | 5 | 2 |
+--------+--------+----------+
| B | 6 | 2 |
+--------+--------+----------+
| C | 7 | 3 |
+--------+--------+----------+
| D | 8 | 4 |
+--------+--------+----------+
| D | 9 | 4 |
+========+========+==========+
rowNum 9:
Search for "E" and "1" in the target table. "1" already exists! Therefore set the uniqueID to the same uniqueID as the row which already exists with "1" - Which is uniqueID 1.
+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A | 1 | 1 |
+--------+--------+----------+
| A | 2 | 1 |
+--------+--------+----------+
| A | 3 | 1 |
+--------+--------+----------+
| B | 5 | 2 |
+--------+--------+----------+
| B | 6 | 2 |
+--------+--------+----------+
| C | 7 | 3 |
+--------+--------+----------+
| D | 8 | 4 |
+--------+--------+----------+
| D | 9 | 4 |
+--------+--------+----------+
| E | 1 | 1 |
+========+========+==========+
RowNum 10:
Search for "E" and "2" in the target table. Both "E" and "2 already exist. In this case just return the uniqueID for the first one it finds (the uniqueID will be the same for either one).
+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A | 1 | 1 |
+--------+--------+----------+
| A | 2 | 1 |
+--------+--------+----------+
| A | 3 | 1 |
+--------+--------+----------+
| B | 5 | 2 |
+--------+--------+----------+
| B | 6 | 2 |
+--------+--------+----------+
| C | 7 | 3 |
+--------+--------+----------+
| D | 8 | 4 |
+--------+--------+----------+
| D | 9 | 4 |
+--------+--------+----------+
| E | 1 | 1 |
+--------+--------+----------+
| E | 2 | 1 |
+--------+--------+----------+
....
ROwNum 12
Search for "F" and "9" in the target table -> 9 exists. Therefore set uniqueID for 9 to the same uniqueID where 9 exists -> 4
THE FINAL TABLE then should look like this;
+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A | 1 | 1 |
+--------+--------+----------+
| A | 2 | 1 |
+--------+--------+----------+
| A | 3 | 1 |
+--------+--------+----------+
| B | 5 | 2 |
+--------+--------+----------+
| B | 6 | 2 |
+--------+--------+----------+
| C | 7 | 3 |
+--------+--------+----------+
| D | 8 | 4 |
+--------+--------+----------+
| D | 9 | 4 |
+--------+--------+----------+
| E | 1 | 1 |
+--------+--------+----------+
| E | 2 | 1 |
+--------+--------+----------+
| E | 3 | 1 |
+--------+--------+----------+
| F | 9 | 4 |
+========+========+==========+
If you wish to use my data;
CREATE TABLE tbl_source
(
rownum integer,
anonid varchar(8),
userid integer
);
insert into tbl_source
values
(1,'A',1), (2,'A',2), (3,'A',3),
(4,'B',5), (5,'B',6), (6,'C',7),
(7,'D',8), (8,'D',9), (9,'E',1),
(10,'E',2), (11,'E',3), (12,'F',9)
;
CREATE TABLE tbl_target
(
anonid varchar(8),
userid integer,
uniqueID integer
);
What I have tried:
I've been told I can do this with loops and variables etc but Amazon Redshift does not support these at the moment.