How do I group rows with duplicate names in R?

I'm still new to R and am struggling with subsetting a dataset. Here is where the dataset comes from and how I cleaned it up.

Here is my code, which tries to extract two columns: category and average complexity.

# Read the raw board game data
board_game_original <- read.csv("https://raw.githubusercontent.com/bryandmartin/STAT302/master/docs/Projects/project1_bgdataviz/board_game_raw.csv")

# Tidy up the mechanic and category columns with cSplit():
# each comma-separated value is moved onto its own row (long format)
library(splitstackshape)
mechanic <- board_game_original$mechanic
board_game_tidy <- cSplit(board_game_original, splitCols = c("mechanic", "category"), sep = ",", direction = "long")

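My end goal: create a data frame with only two columns, category and average complexity, where the average complexity values are averaged across all rows that share the same category name.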

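The problem I ran into: I have 5 rows of categories but 30 rows of average complexity. How do I take the mean of all the average complexity values under the same category name? Any help would be greatly appreciated. Thanks!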

Solution

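filter() the rows down to the top 5 categories, then group_by() category and take the mean() of average_complexity. The code below assumes top_5_category is a named object already defined in your session.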

library(dplyr)

board_game_tidy %>% 
  filter(category %in% names(top_5_category)) %>%
  group_by(category) %>%
  summarise(average_complexity = mean(average_complexity))

# category           average_complexity
#  <fct>                           <dbl>
#1 Abstract Strategy               0.844
#2 Action / Dexterity              0.469
#3 Adventure                       1.25 
#4 Age of Reason                   1.95 
#5 American Civil War              1.68
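
To make the filter / group_by / summarise pattern easier to follow, here is a minimal, self-contained sketch on made-up data. The data frame `toy` and the vector `keep` are hypothetical stand-ins for `board_game_tidy` and `names(top_5_category)`; they are not part of the original dataset.

```r
library(dplyr)

# Hypothetical toy data: one row per (game, category) pair,
# mimicking the long format produced by cSplit()
toy <- data.frame(
  category = c("Adventure", "Adventure", "Card Game", "Card Game", "Dice"),
  average_complexity = c(1.0, 2.0, 3.0, 5.0, 4.0)
)

# Hypothetical set of categories to keep
keep <- c("Adventure", "Card Game")

result <- toy %>%
  filter(category %in% keep) %>%   # keep only the wanted categories
  group_by(category) %>%           # one group per distinct category name
  summarise(average_complexity = mean(average_complexity))  # one row per group

print(result)
```

Each duplicated category name collapses into a single row, which is why the result has one row per category rather than one row per game.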

You were very close. You need dplyr::summarise().

complexity_top_5_category <- board_game_tidy %>% 
        group_by(category) %>%
        dplyr::summarise(mean_average_complexity = mean(average_complexity,na.rm=TRUE)) %>% 
        top_n(5,mean_average_complexity) 
        #select(average_complexity) %>% # you don't need this
        #filter(category == c("Abstract Strategy Action / Dexterity","Adventure","Age of Reason","American Civil War "))
complexity_top_5_category

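You don't strictly have to put dplyr:: in front of summarise(). However, some other common packages have their own version of summarise(), so writing the prefix is a bit safer.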
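
As a small illustration of the namespace prefix, the sketch below calls dplyr's functions with the `pkg::fun` form and never attaches the package, so it does not matter which other packages are loaded. The data frame `toy` is hypothetical.

```r
# Hypothetical toy data; note there is no library(dplyr) call in this sketch
toy <- data.frame(
  category = c("A", "A", "B", "B"),
  average_complexity = c(1, 3, 2, NA)
)

# The pkg::fun form always resolves to dplyr's own functions,
# regardless of what else is attached in the session
by_category <- dplyr::summarise(
  dplyr::group_by(toy, category),
  # na.rm = TRUE drops missing values before averaging
  mean_average_complexity = mean(average_complexity, na.rm = TRUE)
)

print(by_category)
```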

You can use top_n() to automatically select the top n categories instead of using filter().
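
A minimal sketch of that selection step on made-up per-category means follows. The data frame `toy_means` is hypothetical. The second call uses slice_max(), which newer dplyr releases recommend in place of top_n(); it assumes dplyr 1.0.0 or later is installed.

```r
library(dplyr)

# Hypothetical per-category means, as summarise() would produce them
toy_means <- data.frame(
  category = c("A", "B", "C", "D"),
  mean_average_complexity = c(1.2, 3.4, 2.5, 0.9)
)

# Keep the 2 rows with the largest mean_average_complexity
top_2 <- toy_means %>% top_n(2, mean_average_complexity)

# Same selection with slice_max() (assumes dplyr >= 1.0.0)
top_2_alt <- toy_means %>% slice_max(mean_average_complexity, n = 2)

print(top_2)
print(top_2_alt)
```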
