I have the following code:
customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
index <- c("1", "2", "1", "2", "1", "2", "3", "1")
c1 <- c("pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg")
orderdate <- c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25", "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10")
df <- data.frame(customerid, index, c1, orderdate)
I want to create a new column c2 based on the values in "c1" and "orderdate." The column "index" counts the number of entries for each "customerid."
For each customerid, we look at each result in c1 (pos or neg) by order date from earliest to latest.
1. If c1 is "pos", then that row in c2 is NA and all later rows in c2 for that customerid are "No."
2. If c1 is "neg", then that row in c2 is NA. From here, there are 2 options:
A) All later rows with orderdate **≤ 500 days** from the earliest order date within a customerid are "No".
B) Alternatively, the first row with orderdate **> 500 days** from the earliest order date within a customerid is "Yes". This entry becomes the new "earliest result" and we repeat steps 1-2AB (depending on whether it is "pos" or "neg") with it.
3. Of note, if a row already has a value from the 1st condition, the 1st condition takes precedence and it is left alone. For example, if the first result was "pos" for a customer, all of that customerid's c2 entries would be "No". Their entries in c1 and orderdates won't matter.
Each customerid is independent of each other. The result should be:
c2 <- c(NA, "No", NA, "No", NA, "No", "Yes", NA)
What I have tried:
I am not sure how to approach this problem. I tried various solutions using lead, lag, dplyr, +(any...) and am not getting the correct output. Thanks!