Conditional row by row insert SQL - if value exists in the target table, tag the inserted row - redshift

Question

I need to insert one row at a time into my target table. Before each insert, I need to scan the target table and see if either value (anonID or userID) already exists. If it does, the uniqueID of the inserted row should match the uniqueID of the matched row in the target table. If neither value exists, then increment the MAX uniqueID in the target table by 1 and use this as the uniqueID.

I've been told I can do this with loops and variables etc but Amazon Redshift does not support these at the moment.

My source table (tbl_source) has the fields anonID, userID and rowNum.
My target table (tbl_target) has the fields anonID, userID and uniqueID.

My insert is very simple in essence:

SQL

INSERT into tbl_target
    (select anonID, userID, XXXX
     from tbl_source)

The XXXX is where I need help. XXXX is the uniqueID.

SAMPLE DATA

╔════════╦════════╦════════╗
║ rownum ║ anonID ║ userID ║
╠════════╬════════╬════════╣
║      1 ║ A      ║      1 ║
║      2 ║ A      ║      2 ║
║      3 ║ A      ║      3 ║
║      4 ║ B      ║      5 ║
║      5 ║ B      ║      6 ║
║      6 ║ C      ║      7 ║
║      7 ║ D      ║      8 ║
║      8 ║ D      ║      9 ║
║      9 ║ E      ║      1 ║
║     10 ║ E      ║      2 ║
║     11 ║ E      ║      3 ║
║     12 ║ F      ║      9 ║
╚════════╩════════╩════════╝

To show you the logic I need, I will take this table row by row and show you what should be computed:

rowNum 1:

Search for "A" and "1" in the target table -> Neither exists in the target table (since it is empty, this is the first row being inserted), therefore set uniqueID in the target table to 1

TARGET TABLE

+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A      | 1      | 1        |
+========+========+==========+

rowNum 2:

Search for "A" and "2" in the target table -> A exists. Therefore set the uniqueID of the new row to the SAME uniqueID as in the target table = 1

+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A      | 1      | 1        |
+--------+--------+----------+
| A      | 2      | 1        |
+========+========+==========+

rowNum 3:

Search for "A" and "3" in the target table -> A exists. Therefore set the uniqueID of the new row to the SAME uniqueID as in the target table = 1.

+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A      | 1      | 1        |
+--------+--------+----------+
| A      | 2      | 1        |
+--------+--------+----------+
| A      | 3      | 1        |
+========+========+==========+

rowNum 4:

Search for "B" and "5" in the target table -> neither exists. Therefore find the MAX uniqueID in the target table (1) and increment by 1.

+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A      | 1      | 1        |
+--------+--------+----------+
| A      | 2      | 1        |
+--------+--------+----------+
| A      | 3      | 1        |
+--------+--------+----------+
| B      | 5      | 2        |
+========+========+==========+

rowNum 5:

Search for "B" and "6" in the target table -> "B" exists. Therefore set the uniqueID of the new row to the SAME uniqueID as in the target table = 2

+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A      | 1      | 1        |
+--------+--------+----------+
| A      | 2      | 1        |
+--------+--------+----------+
| A      | 3      | 1        |
+--------+--------+----------+
| B      | 5      | 2        |
+--------+--------+----------+
| B      | 6      | 2        |
+========+========+==========+

rowNum 6:

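Search for "C" and "7" in the target table -> neither exists. Therefore find the MAX uniqueID in the target table (2) and increment by 1.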

    +========+========+==========+
    | anonID | userID | uniqueID |
    +========+========+==========+
    | A      | 1      | 1        |
    +--------+--------+----------+
    | A      | 2      | 1        |
    +--------+--------+----------+
    | A      | 3      | 1        |
    +--------+--------+----------+
    | B      | 5      | 2        |
    +--------+--------+----------+
    | B      | 6      | 2        |
    +--------+--------+----------+
    | C      | 7      | 3        |
    +========+========+==========+
.....
.....
.....

    +========+========+==========+
    | anonID | userID | uniqueID |
    +========+========+==========+
    | A      | 1      | 1        |
    +--------+--------+----------+
    | A      | 2      | 1        |
    +--------+--------+----------+
    | A      | 3      | 1        |
    +--------+--------+----------+
    | B      | 5      | 2        |
    +--------+--------+----------+
    | B      | 6      | 2        |
    +--------+--------+----------+
    | C      | 7      | 3        |
    +--------+--------+----------+
    | D      | 8      | 4        |
    +--------+--------+----------+
    | D      | 9      | 4        |
    +========+========+==========+

rowNum 9:

Search for "E" and "1" in the target table. "1" already exists! Therefore set the uniqueID to the same uniqueID as the existing row with "1", which is uniqueID 1.

+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A      | 1      | 1        |
+--------+--------+----------+
| A      | 2      | 1        |
+--------+--------+----------+
| A      | 3      | 1        |
+--------+--------+----------+
| B      | 5      | 2        |
+--------+--------+----------+
| B      | 6      | 2        |
+--------+--------+----------+
| C      | 7      | 3        |
+--------+--------+----------+
| D      | 8      | 4        |
+--------+--------+----------+
| D      | 9      | 4        |
+--------+--------+----------+
| E      | 1      | 1        |
+========+========+==========+

rowNum 10:

Search for "E" and "2" in the target table. Both "E" and "2" already exist. In this case just return the uniqueID of the first match found (the uniqueID will be the same for either one).

    +========+========+==========+
    | anonID | userID | uniqueID |
    +========+========+==========+
    | A      | 1      | 1        |
    +--------+--------+----------+
    | A      | 2      | 1        |
    +--------+--------+----------+
    | A      | 3      | 1        |
    +--------+--------+----------+
    | B      | 5      | 2        |
    +--------+--------+----------+
    | B      | 6      | 2        |
    +--------+--------+----------+
    | C      | 7      | 3        |
    +--------+--------+----------+
    | D      | 8      | 4        |
    +--------+--------+----------+
    | D      | 9      | 4        |
    +--------+--------+----------+
    | E      | 1      | 1        |
    +--------+--------+----------+
    | E      | 2      | 1        |
    +--------+--------+----------+
....

rowNum 12:

Search for "F" and "9" in the target table -> "9" exists. Therefore set the uniqueID of the new row to the same uniqueID as the existing row with "9" -> 4

THE FINAL TABLE should then look like this:

+========+========+==========+
| anonID | userID | uniqueID |
+========+========+==========+
| A      | 1      | 1        |
+--------+--------+----------+
| A      | 2      | 1        |
+--------+--------+----------+
| A      | 3      | 1        |
+--------+--------+----------+
| B      | 5      | 2        |
+--------+--------+----------+
| B      | 6      | 2        |
+--------+--------+----------+
| C      | 7      | 3        |
+--------+--------+----------+
| D      | 8      | 4        |
+--------+--------+----------+
| D      | 9      | 4        |
+--------+--------+----------+
| E      | 1      | 1        |
+--------+--------+----------+
| E      | 2      | 1        |
+--------+--------+----------+
| E      | 3      | 1        |
+--------+--------+----------+
| F      | 9      | 4        |
+========+========+==========+

If you wish to use my data:

SQL

CREATE TABLE tbl_source
(
    rownum  integer,
    anonid  varchar(8),
    userid  integer
);

INSERT INTO tbl_source
VALUES
    (1,'A',1), (2,'A',2), (3,'A',3),
    (4,'B',5), (5,'B',6), (6,'C',7),
    (7,'D',8), (8,'D',9), (9,'E',1),
    (10,'E',2), (11,'E',3), (12,'F',9);

CREATE TABLE tbl_target
(
    anonid   varchar(8),
    userid   integer,
    uniqueID integer
);

Posted 27-Feb-18 5:31am

Member 10104822

Updated 28-Feb-18 1:34am

Maciej Los

v2

Comments

CHill60 27-Feb-18 12:43pm

I would avoid loops, but I have some serious doubts about your logic here. Are you really saying that if there had been an entry B,1 then you would just choose the "first" 1 to assign to the E,1 row? How do you think you are going to determine that "first" without some ordering? What are you really trying to do, or is this just a contrived exercise?

Member 10104822 27-Feb-18 14:40pm

This is not a contrived exercise. The data supplied is just analogous dummy data. All I'm saying is that, for what I need this for, if there are conflicts I don't need any additional logic to determine the correct value. The data could be ordered first, but for simplicity's sake that isn't needed.

Jörgen Andersson 28-Feb-18 3:23am

How would you handle (a,1),(b,1),(b,2),(c,2),(c,3) ?
Shall they all get the same unique ID?

Member 10104822 28-Feb-18 4:46am

Correct

Jörgen Andersson 28-Feb-18 4:50am

So, how would you handle this case:
First you add (a,1) it gets UniqueID 1
Then add (b,2), this will get a new UniqueID 2
How do you handle the next row containing (a,2)?

Or is all data supposed to be added as a set in one go?

Member 10104822 28-Feb-18 5:53am

This scenario can't occur. There will always be an overlap.
It would go from
A1 -> A2 -> B2 or vice versa.
You'll always have an overlap.

Jörgen Andersson 28-Feb-18 5:55am

How about the series (a,1), (b,2), (b,1) is that also impossible?

Member 10104822 28-Feb-18 6:09am

It is

Jörgen Andersson 28-Feb-18 6:22am

One last question, do you start with an empty target table or not?

Member 10104822 28-Feb-18 6:24am

It is empty.

2 solutions

Solution 1

INSERT INTO #tbl_target
         SELECT anonid,
                userid,
                COALESCE((SELECT uniqueID FROM #tbl_target WHERE anonid=@anonID),
                        (SELECT MAX(uniqueID)+1 FROM #tbl_target),1)AS uniqueID
           FROM #tbl_source
                       WHERE anonid=@anonID AND userid=@userID;
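
One way to express the same lookup without `@` variables is to have the client that drives the load substitute literal values and issue one statement per source row, in rowNum order. This is only a sketch under that assumption, shown here for source row 4 ('B', 5). It also takes MIN() so the lookup returns a single value when several rows match, and it checks userID as well as anonID, as the question describes.

```sql
-- Sketch: insert source row 4 ('B', 5). The literals are assumed to be
-- filled in by the calling client, one statement per source row.
INSERT INTO tbl_target (anonid, userid, uniqueid)
SELECT 'B',
       5,
       COALESCE(
           -- reuse the ID of any existing row sharing the anonID or the userID
           (SELECT MIN(uniqueid) FROM tbl_target
             WHERE anonid = 'B' OR userid = 5),
           -- otherwise take the next free ID
           (SELECT MAX(uniqueid) + 1 FROM tbl_target),
           -- the target table is still empty
           1);
```

Because each statement reads the rows inserted before it, the statements have to run one at a time and in order.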

Posted 27-Feb-18 18:49pm

Santosh kumar Pithani

Updated 27-Feb-18 18:51pm

v2

Comments

Member 10104822 28-Feb-18 7:00am

Unfortunately this does not work. I get the error:

Invalid operation: operator does not exist: character varying =@ character varying;

Are you using variables? Redshift does not support variables unfortunately

Santosh kumar Pithani 28-Feb-18 7:22am

If you insert all the records into the target table at once, how do you compare them with the source table?

Member 10104822 28-Feb-18 7:26am

That's the thing. I need to be able to insert one row at a time into the target table and query that table before each new row is inserted.

That is the crux of the problem really

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CHill60 · Accepted Answer · 2018-02-28T01:35:00

Assuming you can use temporary tables, then this appears to work:

Firstly, get a list of the unique anonIDs:

SQL

create table #t1 (id integer identity(1,1), anonid varchar(8));
-- name the target column so the identity column is left to auto-generate
insert into #t1 (anonid)
SELECT DISTINCT anonid FROM tbl_source;

Results:

1	A
2	B
3	C
4	D
5	E
6	F

Then get an initial starting position based purely on anonID:

SQL

select B.id as uniqueID, A.*
INTO #T2
FROM tbl_source A
INNER JOIN #t1 B ON A.anonid = B.anonid;

Then adjust that position based on whether or not userID has already appeared:

SQL

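-- T-SQL style UPDATE ... FROM using an alias as the target;
-- other engines may need a different form
UPDATE A SET uniqueID = B.uniqueID
FROM #T2 A
INNER JOIN #T2 B ON A.userid = B.userid AND A.anonid <> B.anonid
WHERE B.uniqueID < A.uniqueID;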

You can then just query from #t2 to get your target:

SQL

insert into tbl_target (anonid, userid, uniqueID)
SELECT anonID, userID, uniqueID FROM #t2;
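
If the data can be loaded as one set, the temporary tables can in principle be folded into a single INSERT ... SELECT using a window function. The following is a sketch rather than a tested Redshift solution: it assumes the rownum column defines the insert order, numbers each anonID by first appearance, and gives every anonID the lowest number among the anonIDs it shares a userID with. The aliases (first_seen, numbered, merged, first_row, base_id) are made up for the example.

```sql
INSERT INTO tbl_target (anonid, userid, uniqueid)
SELECT s.anonid, s.userid, merged.uniqueid
FROM tbl_source s
JOIN (
    -- lowest base_id among all anonIDs sharing at least one userID
    SELECT a.anonid, MIN(numbered.base_id) AS uniqueid
    FROM tbl_source a
    JOIN tbl_source a2 ON a2.userid = a.userid
    JOIN (
        -- number each anonID in order of first appearance
        SELECT anonid, ROW_NUMBER() OVER (ORDER BY first_row) AS base_id
        FROM (
            SELECT anonid, MIN(rownum) AS first_row
            FROM tbl_source
            GROUP BY anonid
        ) first_seen
    ) numbered ON numbered.anonid = a2.anonid
    GROUP BY a.anonid
) merged ON merged.anonid = s.anonid;
```

On the sample data this gives A and E the ID 1, B 2, C 3, and D and F 4, matching the expected table. It only follows one level of overlap, so a chain such as (a,1),(b,1),(b,2),(c,2),(c,3) would need the merge step repeated until nothing changes.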
