python - Sorting Pandas dataframe data within Groupby groups
I have a large pandas dataframe that can be represented structurally as:
    id       date  status
0   12 2015-05-01       0
1   12 2015-05-22       1
2   12 2015-05-14       1
3   12 2015-05-06       0
4   45 2015-05-03       1
5   45 2015-05-12       1
6   45 2015-05-02       0
7   51 2015-05-05       1
8   51 2015-05-01       0
9   51 2015-05-23       1
10  51 2015-05-17       1
11  51 2015-05-03       0
12  51 2015-05-05       0
13  76 2015-05-04       1
14  76 2015-05-22       1
15  76 2015-05-08       0
and can be created in Python 3.4 using:
import pandas as pd

tempdf = pd.DataFrame({
    'id': [12,12,12,12,45,45,45,51,51,51,51,51,51,76,76,76],
    'date': ['2015-05-01','2015-05-22','2015-05-14','2015-05-06',
             '2015-05-03','2015-05-12','2015-05-02','2015-05-05',
             '2015-05-01','2015-05-23','2015-05-17','2015-05-03',
             '2015-05-05','2015-05-04','2015-05-22','2015-05-08'],
    'status': [0,1,1,0,1,1,0,1,0,1,1,0,0,1,1,0]})
# convert the date strings to datetimes so they sort chronologically
tempdf['date'] = pd.to_datetime(tempdf['date'])
I would like to divide the dataframe into groups based on the variable 'id', sort within each group by 'date', and then get the last 'status' value within each group.
So far, I have:
tempgrouped = tempdf.groupby('id')
tempgrouped['status'].last()
which produces:
id
12    0
45    0
51    0
76    0
However, status should be 1 in each case (the value associated with the latest date). I can't work out how to sort the groups by date before selecting the last value. It's likely I'm a little snow-blind after trying to work this out for a while, so I apologise in advance if the solution is obvious.
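To see why last() returns 0 everywhere, here is a minimal sketch (assuming a reasonably recent pandas) that rebuilds the example frame and, for group 51, compares the date of the row that last() actually reads with the latest date in that group. groupby keeps each group's rows in the order they already have in the frame; it does not reorder them by date.

```python
import pandas as pd

# Rebuild the example frame from the question.
tempdf = pd.DataFrame({
    'id': [12, 12, 12, 12, 45, 45, 45, 51, 51, 51, 51, 51, 51, 76, 76, 76],
    'date': ['2015-05-01', '2015-05-22', '2015-05-14', '2015-05-06',
             '2015-05-03', '2015-05-12', '2015-05-02', '2015-05-05',
             '2015-05-01', '2015-05-23', '2015-05-17', '2015-05-03',
             '2015-05-05', '2015-05-04', '2015-05-22', '2015-05-08'],
    'status': [0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
})
tempdf['date'] = pd.to_datetime(tempdf['date'])

tempgrouped = tempdf.groupby('id')

# Rows of group 51, in their original frame order.
group_51 = tempgrouped.get_group(51)

print(group_51['date'].iloc[-1].date())  # date of the row last() reads: 2015-05-05
print(group_51['date'].max().date())     # latest date in the group: 2015-05-23

# last() follows row order, so it picks the physically final row per group.
print(tempgrouped['status'].last().tolist())  # [0, 0, 0, 0]
```

The fix therefore has to put the rows in date order before grouping, so that "physically last" and "latest date" coincide.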
You can sort first, then group:
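tempdf.sort_values(['id','date']).groupby('id')['status'].last()

(Older pandas versions spelled this tempdf.sort(['id','date']); DataFrame.sort was deprecated in pandas 0.17.0 in favour of sort_values and has since been removed.)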
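A self-contained sketch of the answer's approach, written with sort_values so it runs on current pandas, followed by an alternative that is not from the answer: it uses groupby(...).idxmax() to find the index label of each group's latest date and selects those rows with .loc, which avoids sorting the whole frame. Treat both as illustrations rather than the only correct way.

```python
import pandas as pd

tempdf = pd.DataFrame({
    'id': [12, 12, 12, 12, 45, 45, 45, 51, 51, 51, 51, 51, 51, 76, 76, 76],
    'date': ['2015-05-01', '2015-05-22', '2015-05-14', '2015-05-06',
             '2015-05-03', '2015-05-12', '2015-05-02', '2015-05-05',
             '2015-05-01', '2015-05-23', '2015-05-17', '2015-05-03',
             '2015-05-05', '2015-05-04', '2015-05-22', '2015-05-08'],
    'status': [0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
})
tempdf['date'] = pd.to_datetime(tempdf['date'])

# Answer's approach: order rows by id, then date, so the final row of each
# group is its latest date; last() then returns that row's status.
latest_status = (
    tempdf.sort_values(['id', 'date'])
          .groupby('id')['status']
          .last()
)
print(latest_status)  # 1 for every id

# Alternative (swapped-in technique): idxmax gives the index label of the
# maximum date in each group; on ties it picks the first occurrence.
latest_rows = tempdf.loc[tempdf.groupby('id')['date'].idxmax()]
print(latest_rows.set_index('id')['status'])  # also 1 for every id
```

One caveat: GroupBy.last returns the last non-null value in each column, so if status can be missing, it may skip the latest row. The idxmax variant, or .tail(1) per group on the sorted frame, returns the literal latest row instead.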