dplyr - R: keeping the first observation per time-series group
This is a follow-up on this question (I want to keep the threads separate): I want to look at each user and the fruits they ate. I'm only interested in the first time they eat each fruit. From there, I want to rank-order the fruits eaten by time.
Some data:
    set.seed(1234)
    library(dplyr)

    data <- data.frame(
      user  = sample(c("1234", "9876", "4567"), 30, replace = TRUE),
      fruit = sample(c("banana", "apple", "pear", "lemon"), 30, replace = TRUE),
      date  = rep(seq(as.Date("2010-02-01"), length = 10, by = "1 day"), 3)
    )

    # sort by user, then by date
    data <- data %>% arrange(user, date)
In this case, we can see that, for example, user 1234 ate a banana on 2010-02-01, and again on 02-03, 02-04, and 02-05.
      user  fruit       date
    1 1234 banana 2010-02-01
    2 1234  lemon 2010-02-02
    3 1234 banana 2010-02-03
    4 1234  apple 2010-02-03
    5 1234  lemon 2010-02-03
    6 1234 banana 2010-02-04
    7 1234 banana 2010-02-05
I don't want to change the relative order of the fruits over time, but I do want to remove all subsequent instances of "banana" after the first one (and likewise for every other fruit).
For user 1234 in this case, I'm looking for:
      user  fruit       date
    1 1234 banana 2010-02-01
    2 1234  lemon 2010-02-02
    4 1234  apple 2010-02-03
One way I can think of going about this is arranging the data frame by user > fruit > date, then keeping the first unique observation of "fruit" within each user grouping. I'm getting hung up on how to do this in dplyr. Any thoughts?
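The arrange-then-keep-first idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `eaten` is a hypothetical hand-built data frame that mirrors the user 1234 rows shown earlier (not the sampled `data`), and `slice(1)` is used to take the first row of each user/fruit group.

```r
library(dplyr)

# Hypothetical toy data mirroring the user 1234 rows from the question
eaten <- data.frame(
  user  = rep("1234", 7),
  fruit = c("banana", "lemon", "banana", "apple", "lemon", "banana", "banana"),
  date  = as.Date(c("2010-02-01", "2010-02-02", "2010-02-03", "2010-02-03",
                    "2010-02-03", "2010-02-04", "2010-02-05")),
  stringsAsFactors = FALSE
)

first_by_group <- eaten %>%
  arrange(user, fruit, date) %>%  # earliest date comes first within each user/fruit
  group_by(user, fruit) %>%
  slice(1) %>%                    # keep only that earliest row per user/fruit
  ungroup() %>%
  arrange(user, date)             # restore time order within each user
```

The final `arrange(user, date)` is needed because sorting by fruit loses the original time order.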
Here is an approach using the duplicated function.
    data %>%
      group_by(user) %>%
      filter(!duplicated(fruit))
    #    user  fruit       date
    # 1  1234  apple 2010-02-01
    # 2  1234 banana 2010-02-01
    # 3  1234   pear 2010-02-03
    # 4  1234  lemon 2010-02-10
    # 5  4567   pear 2010-02-01
    # 6  4567 banana 2010-02-05
    # 7  4567  lemon 2010-02-08
    # 8  9876  apple 2010-02-02
    # 9  9876   pear 2010-02-02
    # 10 9876  lemon 2010-02-06
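The question also asks for a rank order of the fruits by time, which the snippet above does not add. One possible extension, as a minimal sketch: sort by user and date first, drop the repeats with `duplicated`, then number the remaining rows within each user using `row_number()`. The `eaten` data frame and the `rank` column name are assumptions made for illustration. The toy rows mirror user 1234 from the question rather than the sampled `data`.

```r
library(dplyr)

# Hypothetical toy data mirroring the user 1234 rows from the question
eaten <- data.frame(
  user  = rep("1234", 7),
  fruit = c("banana", "lemon", "banana", "apple", "lemon", "banana", "banana"),
  date  = as.Date(c("2010-02-01", "2010-02-02", "2010-02-03", "2010-02-03",
                    "2010-02-03", "2010-02-04", "2010-02-05")),
  stringsAsFactors = FALSE
)

first_eaten <- eaten %>%
  arrange(user, date) %>%          # make sure rows are in time order per user
  group_by(user) %>%
  filter(!duplicated(fruit)) %>%   # keep the first row for each fruit within a user
  mutate(rank = row_number()) %>%  # position in which each fruit was first eaten
  ungroup()
```

Because `duplicated` keeps the first occurrence it sees, the result depends on the rows already being sorted by date within each user.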