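How to solve pandas: how to use .agg
I have a dataframe containing customers' ratings of the restaurants they visited, along with a few other attributes.
- What I want to do is compute the difference between the average star rating in a restaurant's last year and the average star rating in its first year.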
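import numpy as np
import pandas as pd

# NOTE: as posted, the lists below have unequal lengths (some values appear to be missing),
# so pd.DataFrame raises a ValueError until every column has one value per rating.
data = {
    'rating_id': ['1', '2', '3', '4', '5', '6', '7'],
    'user_id': ['56', '13', '56', '99', '12'],
    'restaurant_id': ['xxx', 'xxx', 'yyy', 'zzz', 'zzz'],
    'star_rating': ['2.3', '3.7', '1.2', '5.0', '1.0', '3.2', '1.0'],
    'rating_year': ['2012', '2012', '2020', '2001', '2015', '2000'],
    'first_year': ['2012', '2000'],
    'last_year': ['2020', '2015'],
}
df = pd.DataFrame(data, columns=['rating_id', 'user_id', 'restaurant_id', 'star_rating', 'rating_year', 'first_year', 'last_year'])
df.head()
# the ratings are stored as strings, so convert them before averaging
df['star_rating'] = df['star_rating'].astype(float)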
# calculate the average of the stars of the first year
ratings_mean_firstYear = df.groupby(['restaurant_id', 'first_year']).agg({'star_rating': ['mean']})
ratings_mean_firstYear.columns = ['avg_firstYear']
ratings_mean_firstYear.reset_index()  # result is only displayed, not assigned
# calculate the average of the stars of the last year
ratings_mean_lastYear = df.groupby(['restaurant_id', 'last_year']).agg({'star_rating': ['mean']})
ratings_mean_lastYear.columns = ['avg_lastYear']
ratings_mean_lastYear.reset_index()  # result is only displayed, not assigned
# merge the means into a single table
ratings_average = ratings_mean_firstYear.merge(
    ratings_mean_lastYear.groupby('restaurant_id')['avg_lastYear'].max(),
    on='restaurant_id',
)
ratings_average.head(20)
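My problem is that the averages for the first year and the last year are exactly the same, which makes no sense, and I really don't know where my reasoning went wrong. I suspect { {1}}, because this is my first time using the pandas library.
Any suggestions?
Solution
In the data you provided, each user/restaurant pair has a single rating, and you use that same rating in both the first-year and the last-year aggregation, so naturally the two averages are equal. I would first filter the data with the condition rating_year == first_year and then apply groupby and agg, repeat the same steps for the last year, and finally merge the two results. In your example, no review has a year that matches the first or last year of any restaurant, so showing a proper example requires more data. I assume you have that in your larger dataframe.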
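To make the cause concrete, here is a minimal sketch with made-up ratings for a single restaurant (the values are hypothetical and not taken from the question). It assumes, as in the question, that first_year and last_year are constant within a restaurant. Under that assumption, grouping by either column puts every rating of the restaurant into the same group, so both means are identical; only filtering on rating_year separates the two years.

```python
import pandas as pd

# Hypothetical ratings for one restaurant; values are made up for illustration.
df = pd.DataFrame({
    "restaurant_id": ["xxx", "xxx", "xxx"],
    "star_rating": [2.0, 3.0, 4.0],
    "rating_year": ["2012", "2015", "2020"],
    "first_year": ["2012", "2012", "2012"],
    "last_year": ["2020", "2020", "2020"],
})

# first_year and last_year are constant within a restaurant, so both groupings
# contain every rating of that restaurant and yield the same mean.
by_first = df.groupby(["restaurant_id", "first_year"])["star_rating"].mean()
by_last = df.groupby(["restaurant_id", "last_year"])["star_rating"].mean()
print(by_first.iloc[0], by_last.iloc[0])  # 3.0 3.0

# Filtering on rating_year first keeps only the ratings made in that year.
first_only = df[df["rating_year"] == df["first_year"]]
last_only = df[df["rating_year"] == df["last_year"]]
print(first_only["star_rating"].mean(), last_only["star_rating"].mean())  # 2.0 4.0
```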
Here is an example of that approach; I added more rows and changed the years so that there are more matches:
import pandas as pd

# NOTE: as posted, the lists below have unequal lengths (some values appear to be missing),
# so pd.DataFrame raises a ValueError until every column has one value per rating.
data = {
    'rating_id': ['1', '2', '3', '4', '5', '6', '7', '8', '9'],
    'user_id': ['56', '56', '99', '99'],
    'restaurant_id': ['xxx', 'xxx', 'yyy', 'xxx'],
    'star_rating': ['2.3', '3.7', '1.2', '5.0', '1.0', '3.2', '4.0', '2.5', '3.0'],
    'rating_year': ['2012', '2020', '2001', '2012', '2019'],
    'first_year': ['2012', '2012'],
    'last_year': ['2020', '2020'],
}
df = pd.DataFrame(data, columns=['rating_id', 'user_id', 'restaurant_id', 'star_rating', 'rating_year', 'first_year', 'last_year'])
df['star_rating'] = df['star_rating'].astype(float)
# keep only the ratings made in each restaurant's first year, then average per restaurant
ratings_mean_firstYear = df[df.rating_year == df.first_year].groupby('restaurant_id').agg({'star_rating': 'mean'})
ratings_mean_firstYear.columns = ['avg_firstYear']
# same for the ratings made in each restaurant's last year
ratings_mean_lastYear = df[df.rating_year == df.last_year].groupby('restaurant_id').agg({'star_rating': 'mean'})
ratings_mean_lastYear.columns = ['avg_lastYear']
Result:
ratings_mean_firstYear.merge(ratings_mean_lastYear, left_index=True, right_index=True)
               avg_firstYear  avg_lastYear
restaurant_id
xxx                     1.65          3.45
yyy                     2.60          3.75
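The question asked for the difference between the two averages, which the merged table above stops short of. The following is a rough sketch of that last step under stated assumptions: the data is made up, and the helper yearly_mean and the column name rating_change are hypothetical names introduced here for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical data, made up for illustration.
df = pd.DataFrame({
    "restaurant_id": ["xxx", "xxx", "xxx", "yyy", "yyy", "yyy"],
    "star_rating": [1.0, 2.0, 4.0, 3.0, 5.0, 2.0],
    "rating_year": ["2012", "2012", "2020", "2001", "2010", "2019"],
    "first_year": ["2012"] * 3 + ["2001"] * 3,
    "last_year": ["2020"] * 3 + ["2019"] * 3,
})


def yearly_mean(frame, year_col, out_name):
    """Mean star_rating per restaurant, using only ratings made in year_col."""
    in_year = frame[frame["rating_year"] == frame[year_col]]
    # Named aggregation: the output column is called out_name.
    return in_year.groupby("restaurant_id").agg(**{out_name: ("star_rating", "mean")})


# Inner join on the restaurant_id index keeps restaurants that have
# ratings in both their first and their last year.
avg = yearly_mean(df, "first_year", "avg_firstYear").join(
    yearly_mean(df, "last_year", "avg_lastYear"), how="inner"
)
avg["rating_change"] = avg["avg_lastYear"] - avg["avg_firstYear"]
print(avg)
```

With this made-up data, rating_change is 2.5 for xxx (4.0 minus 1.5) and -1.0 for yyy (2.0 minus 3.0). A restaurant with no rating in either its first or its last year drops out of the inner join; passing how="outer" instead would keep it with a missing value.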