16,015,973 members

See more:

I have the following code:

I want to create a new column c2 based on the values in "c1" and "orderdate." The column "index" counts the number of entries for each "customerid."

For each customerid, we look at each result in c1 (pos or neg) by order date from earliest to latest.

1. If c1 is "pos", then that row in c2 is NA and all later rows in c2 for that customerid are "No."

2. If c1 is "neg", then that row in c2 is NA. From here, there are 2 options:

A) All later rows with orderdate **≤ 500 days** from the earliest order date within a customerid are "No".

B) Alternatively, the first row with orderdate **> 500 days** from the earliest order date within a customerid is "Yes". This entry becomes the new "earliest result" and we repeat steps 1-2AB (depending on whether it is "pos" or "neg") with it.

3. Of note, if a row already has a value from the 1st condition, the 1st condition takes precedence and it is left alone. For example, if the first result was "pos" for a customer, all of that customerid's c2 entries would be "No". Their entries in c1 and orderdates won't matter.

Each customerid is independent of each other. The result should be:

c2 <- c(NA, "No", NA, "No", NA, "No", "Yes", NA)

**What I have tried:**

I am not sure how to approach this problem. I tried various solutions using lead, lag, dplyr, +(any...) and am not getting the correct output. Thanks!

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4") index <- c("1", "2", "1", "2", "1", "2", "3", "1") c1 <- c("pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg") orderdate <- c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25", "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10") df <- data.frame(customerid, index, c1, orderdate)

I want to create a new column c2 based on the values in "c1" and "orderdate." The column "index" counts the number of entries for each "customerid."

For each customerid, we look at each result in c1 (pos or neg) by order date from earliest to latest.

1. If c1 is "pos", then that row in c2 is NA and all later rows in c2 for that customerid are "No."

2. If c1 is "neg", then that row in c2 is NA. From here, there are 2 options:

A) All later rows with orderdate **≤ 500 days** from the earliest order date within a customerid are "No".

B) Alternatively, the first row with orderdate **> 500 days** from the earliest order date within a customerid is "Yes". This entry becomes the new "earliest result" and we repeat steps 1-2AB (depending on whether it is "pos" or "neg") with it.

3. Of note, if a row already has a value from the 1st condition, the 1st condition takes precedence and it is left alone. For example, if the first result was "pos" for a customer, all of that customerid's c2 entries would be "No". Their entries in c1 and orderdates won't matter.

Each customerid is independent of each other. The result should be:

c2 <- c(NA, "No", NA, "No", NA, "No", "Yes", NA)

I am not sure how to approach this problem. I tried various solutions using lead, lag, dplyr, +(any...) and am not getting the correct output. Thanks!

Comments

Richard Andrew x64
7-Jul-23 10:50am

Have you tried writing down a step-by-step method of how you would perform this by hand? Sometimes that really gives you insight into how to have the computer do it.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CodeProject,
20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8
+1 (416) 849-8900