r - dplyr: keeping the first observation of each group in a time series


This is a follow-up on this question (I want to keep the threads separate): I want to look at each user and the fruits they ate. I'm only interested in the first time they eat each fruit. From there, I want to rank order the fruits eaten by time.

Some data:

    set.seed(1234)
    library(dplyr)

    # 30 random (user, fruit) pairs spread over 10 consecutive days
    data <- data.frame(
        user  = sample(c("1234", "9876", "4567"), 30, replace = TRUE),
        fruit = sample(c("banana", "apple", "pear", "lemon"), 30, replace = TRUE),
        date  = rep(seq(as.Date("2010-02-01"), length.out = 10, by = "1 day"), 3))

    # Sort chronologically within each user
    data <- data %>% arrange(user, date)

In this case, we can see that, for example, user 1234 ate a banana on 2010-02-01, and again on 02-03, 02-04, and 02-05.

       user  fruit       date
    1  1234 banana 2010-02-01
    2  1234  lemon 2010-02-02
    3  1234 banana 2010-02-03
    4  1234  apple 2010-02-03
    5  1234  lemon 2010-02-03
    6  1234 banana 2010-02-04
    7  1234 banana 2010-02-05

I don't want to change the relative order of the fruits by time, but I want to remove the subsequent instances of "banana" after the first one (and likewise for every other fruit).

For user 1234 in this case, I'm looking for:

       user  fruit       date
    1  1234 banana 2010-02-01
    2  1234  lemon 2010-02-02
    4  1234  apple 2010-02-03

One way I can think of going about this is arranging the data frame by user > fruit > date, and then keeping the first unique observation of "fruit" within each user grouping. I'm getting hung up on how to do this in dplyr. Any thoughts?

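Here is an approach using the duplicated function. Because the data is already sorted by date within each user, dropping the duplicated fruits per user keeps the earliest occurrence of each one.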

    data %>%
      group_by(user) %>%
      filter(!duplicated(fruit))
    #    user  fruit       date
    # 1  1234  apple 2010-02-01
    # 2  1234 banana 2010-02-01
    # 3  1234   pear 2010-02-03
    # 4  1234  lemon 2010-02-10
    # 5  4567   pear 2010-02-01
    # 6  4567 banana 2010-02-05
    # 7  4567  lemon 2010-02-08
    # 8  9876  apple 2010-02-02
    # 9  9876   pear 2010-02-02
    # 10 9876  lemon 2010-02-06
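To see why this works, and to get the rank order the question also asks for, here is a minimal sketch on a small hand-typed copy of the user 1234 rows from the question. The names `eaten`, `first_eaten` and `rank` are made up for illustration. It swaps `filter(!duplicated(fruit))` for `distinct(user, fruit, .keep_all = TRUE)`, which should keep the same first rows, and it assumes `row_number()` is an acceptable rank, meaning fruits first eaten on the same date are ranked in row order rather than tied.

```r
library(dplyr)

# Hand-typed copy of the user 1234 rows shown in the question (illustrative only)
eaten <- data.frame(
  user  = "1234",
  fruit = c("banana", "lemon", "banana", "apple", "lemon", "banana", "banana"),
  date  = as.Date(c("2010-02-01", "2010-02-02", "2010-02-03", "2010-02-03",
                    "2010-02-03", "2010-02-04", "2010-02-05"))
)

# duplicated() flags every repeat of a value after its first occurrence
duplicated(eaten$fruit)
# FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE

first_eaten <- eaten %>%
  arrange(user, date) %>%                     # make sure earliest rows come first
  distinct(user, fruit, .keep_all = TRUE) %>% # keep first row per user/fruit pair
  group_by(user) %>%
  mutate(rank = row_number()) %>%             # 1 = first fruit this user ever ate
  ungroup()

first_eaten
```

For this data the result should contain banana, lemon and apple, in that order, with ranks 1 to 3. If ties on the same date should share a rank, `min_rank(date)` could be used in place of `row_number()`.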
