Understanding dplyr pipes and summarise functions

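I'm looking for some help understanding dplyr's pipe system and its summarise functions. I feel my code is a bit verbose and could be simplified. There are two questions here, because I know I'm missing some concepts but I'm not sure where the gaps are. I've included the full code at the bottom. Thanks in advance, since this is a fairly long question.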

1a. Given the sample data below and using dplyr, is there a way to count the games (dates) for each team without using an intermediate table?

1b. I've included my original method for calculating n_games, which does not work. Why?

set.seed(123)
shot_df_ex <- tibble(
  Team_Name = sample(LETTERS[1:5], 250, replace = TRUE),
  Date = sample(as.Date(c("2019-08-01", "2019-09-01", "2018-08-01",
                          "2018-09-01", "2017-08-01", "2017-09-01")),
                size = 250, replace = TRUE),
  Type = sample(c("shot", "goal"), size = 250, replace = TRUE, prob = c(0.9, 0.1))
)

# count shots per team per game(date)
n_shots_per_game <- shot_df_ex %>% 
  count(Team_Name,Date)

n_shots_per_game

# count games (dates) per team [ISSUES!!!]
# is there a way to do this piping from the shot_df_ex tibble instead of 
#  using an intermediate tibble?

# count number of games using the tibble created above [DOES NOT WORK--WHY?]
n_games <- n_shots_per_game %>% 
  count(Team_Name)

n_games #what is this counting? It should be 6 for each.

# this works,but isn't count() just a quicker way to run
#  group_by() %>% summarise()? 
n_games <- n_shots_per_game %>% 
  group_by(Team_Name) %>% 
  summarise(N_Games=n())

n_games
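
2. Below is my process for creating a summary table. I know pipes are meant to cut down on creating intermediate variables/tables. Where can I combine the steps below to build the final table with the fewest intermediate steps?

# load libraries -----------------------------------------------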
library(tidyverse)

# build sample shot data ---------------------------------------
set.seed(123)
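shot_df_ex <- tibble(
  Team_Name = sample(LETTERS[1:5], 250, replace = TRUE),
  Date = sample(as.Date(c("2019-08-01", "2019-09-01", "2018-08-01",
                          "2018-09-01", "2017-08-01", "2017-09-01")),
                size = 250, replace = TRUE),
  Type = sample(c("shot", "goal"), size = 250, replace = TRUE, prob = c(0.9, 0.1))
)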

# calculate data ----------------------------------------------
# since every row is a shot,the following function counts shots for ea. team
n_shots <- shot_df_ex %>% 
  count(Team_Name) %>% 
  rename(N_Shots = n)

n_shots

# do the same for goals for each team
n_goals <- shot_df_ex %>% 
  filter(Type == "goal") %>% 
  count(Team_Name,sort = T) %>% 
  rename(N_Goals = n) %>% 
  arrange(Team_Name)

n_goals

# count shots per team per game(date)
n_shots_per_game <- shot_df_ex %>% 
  count(Team_Name,Date)

n_shots_per_game

# count games (dates) per team [ISSUES!!!]
# is there a way to do this piping from the shot_df_ex tibble instead of 
#  using an intermediate tibble?

# count number of games using the tibble created above [DOES NOT WORK]
n_games <- n_shots_per_game %>% 
  count(Team_Name)

n_games #what is this counting? It should be 6 for each.

# this works,but isn't count() just a quicker way to run
#  group_by() %>% summarise()? 
n_games <- n_shots_per_game %>% 
  group_by(Team_Name) %>% 
  summarise(N_Games=n())

n_games

# combine data ------------------------------------------------
# combine columns and add average shots per game
shot_table_ex <- n_games %>% 
  left_join(n_shots) %>% 
  left_join(n_goals)

# final table with final average calculations
shot_table_ex <- shot_table_ex %>% 
  mutate(Shots_per_Game = round(N_Shots / N_Games,1),Goals_per_Game = round(N_Goals / N_Games,1)) %>% 
  arrange(Team_Name)

shot_table_ex

Solution

For 1a, you can pipe directly from the tibble() call into count(), i.e.

tibble(
  Team_Name = sample(LETTERS[1:5], 250, replace = TRUE),
  Date = sample(as.Date(c("2019-08-01", "2019-09-01", "2018-08-01",
                          "2018-09-01", "2017-08-01", "2017-09-01")),
                size = 250, replace = TRUE),
  Type = sample(c("shot", "goal"), size = 250, replace = TRUE, prob = c(0.9, 0.1))
) %>%
  count(Team_Name, Date)

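In 1b, count() uses your column n (the number of shots) as a weighting variable, so it adds up each team's total shots instead of counting rows. It prints a message telling you so:

Using `n` as weighting variable
i Quiet this message with `wt = n` or count rows with `wt = 1`

Using count(Team_Name, wt = n()) will give you the behaviour you want.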
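
To make the difference between a weighted count and a row count concrete, here is a minimal sketch. The `shots` tibble and the output names `N_Shots` and `N_Games` are hypothetical stand-ins, not the asker's objects. The weight is passed explicitly with `wt = n` so the result should not depend on whether the installed dplyr applies the weighting automatically; that default appears to differ between dplyr versions, which is an assumption worth checking against your own installation.

```r
library(dplyr)

set.seed(123)
# Hypothetical miniature of the question's data: one row per shot
shots <- tibble(
  Team_Name = sample(LETTERS[1:5], 250, replace = TRUE),
  Date = sample(as.Date(c("2019-08-01", "2019-09-01", "2018-08-01")),
                250, replace = TRUE)
)

# One row per (team, date); the n column holds the shots in that game
per_game <- shots %>% count(Team_Name, Date)

# Weighted count: sums the n column, giving total shots per team
shots_per_team <- per_game %>% count(Team_Name, wt = n, name = "N_Shots")

# Row count: counts the rows of per_game, giving games per team
games_per_team <- per_game %>%
  group_by(Team_Name) %>%
  summarise(N_Games = n())

shots_per_team
games_per_team
```

Spelling out `wt` (or falling back to `group_by() %>% summarise()`) keeps the intent visible to the reader regardless of which default is in effect.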

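Edit: Part 2

shot_table_ex <- tibble(
  Team_Name = sample(LETTERS[1:5], 250, replace = TRUE),
  Date = sample(as.Date(c("2019-08-01", "2019-09-01", "2018-08-01",
                          "2018-09-01", "2017-08-01", "2017-09-01")),
                size = 250, replace = TRUE),
  Type = sample(c("shot", "goal"), size = 250, replace = TRUE, prob = c(0.9, 0.1))
) %>%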
     group_by(Team_Name) %>%
     summarise(n_shots = n(),n_goals = sum(Type == "goal"),n_games = n_distinct(Date)) %>%
     mutate(Shots_per_Game = round(n_shots / n_games,1),Goals_per_Game = round(n_goals / n_games,1))

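1a. Given the sample data below and using dplyr, is there a way to count the games (dates) for each team without using an intermediate table?

This is how I would do it: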

shot_df_ex %>% 
  distinct(Team_Name,Date) %>% #Keeps only the cols given and one of each combo
  count(Team_Name)

You can also use unique():

shot_df_ex %>% 
  group_by(Team_Name) %>%
  summarize(N_Games = length(unique(Date)))

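1b. I've included my original method for calculating n_games, which does not work. Why?

Your code works for me. Did you save the intermediate table? It counts the expected 6 for each team.

2. Below is my process for creating a summary table. I know pipes are meant to cut down on creating intermediate variables/tables. Where can I combine the steps below to build the final table with the fewest intermediate steps?
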
shot_df_ex %>% 
  group_by(Team_Name) %>% 
  summarize(
    N_Games = length(unique(Date)),
    # counts only rows typed "shot"; use n() to count every row, as the question does
    N_Shots = sum(Type == "shot"),
    N_Goals = sum(Type == "goal")
  ) %>% 
  mutate(
    Shots_per_Game = round(N_Shots / N_Games, 1),
    Goals_per_Game = round(N_Goals / N_Games, 1)
  )

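As long as you don't need to change the grouping, you can compute several summaries in a single summarize() step. Here, in the sum() calls, we exploit the fact that TRUE is interpreted as 1 and FALSE as 0. length(), of course, gives us the length of the vector produced by unique().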
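
A small sketch of the two ideas used here, on hypothetical toy data that is not taken from the question: summing a logical vector counts its TRUE values, and `n_distinct()` is a dplyr shorthand that should give the same number as `length(unique())`.

```r
library(dplyr)

# Hypothetical toy data: team A plays two dates, team B plays one
toy <- tibble(
  Team_Name = c("A", "A", "A", "B", "B"),
  Date = as.Date(c("2019-08-01", "2019-08-01", "2019-09-01",
                   "2019-08-01", "2019-08-01")),
  Type = c("shot", "goal", "shot", "shot", "shot")
)

# A logical vector is coerced to 0/1 when summed, so this counts the goals
sum(toy$Type == "goal")  # 1

toy_summary <- toy %>%
  group_by(Team_Name) %>%
  summarise(
    N_Games = n_distinct(Date),          # same value as length(unique(Date))
    N_Goals = sum(Type == "goal")        # TRUE counts as 1, FALSE as 0
  )

toy_summary
```

Both summaries are computed in one summarise() call because they share the same grouping.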

This (count) works, but isn't count() just a quicker way to run group_by() %>% summarise()?

count() is just a combination of group_by(col) %>% tally(), and tally() is essentially summarize(x = n()), so yes. :)
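
The equivalence can be checked directly. The sketch below uses a hypothetical one-column tibble and compares the three spellings; it assumes the data has no pre-existing column named n, since that case is where count() and tally() can behave differently.

```r
library(dplyr)

# Hypothetical toy data with no existing n column
toy <- tibble(Team_Name = c("A", "B", "A", "C", "A", "B"))

via_count     <- toy %>% count(Team_Name)
via_tally     <- toy %>% group_by(Team_Name) %>% tally()
via_summarise <- toy %>% group_by(Team_Name) %>% summarise(n = n())

# Compare as plain data frames so only names and values are checked
all.equal(as.data.frame(via_count), as.data.frame(via_tally))
all.equal(as.data.frame(via_count), as.data.frame(via_summarise))
```

count() is the shortest form when all you need is the group sizes; the summarise() form is the one to reach for once other summaries are needed alongside the count.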
