How to reshape a DF with categorical variables from long to wide in R?

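I'm new to reshaping data frames. I have a df that I would like to make wide so that I can use it in analyses such as clustering and NMDS. I found several questions (and answers) about reshaping data that is mostly quantitative (using aggregation functions), but in my case all of my variables are categorical.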

Since my df has a thousand rows and dozens of columns, I created a toy df as an example. It looks like this:

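# NOTE: as posted, the vectors have different lengths (7, 9, 9, 11 and 7 values),
# so this call fails; data.frame() needs every column to have the same length.
df <- data.frame(
  id      = c("a", "c", "a", "b", "d", "e", "d"),
  color   = c("red", "blue", "gray", "yellow", "green", "purple", "black", "red", "gray"),
  fruit   = c("apple", "orange", "avocado", "strawberry", "banana", "apple", "watermelon", "lemon", "lemon"),
  country = c("Italy", "Spain", "Brazil", "Australia", "Italy", "Japan", "India", "USA", "Mexico", "France", "France"),
  animal  = c("alligator", "camel", "alligator", "bat", "dolphin", "elephant", "dolphin")
)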

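I want the "id" column to be the first column of my reshaped data frame and the "animal" column to be the second, followed by the levels of "color", "fruit", and "country". The point here is that I want to keep them separate.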

The code below shows some of my attempts:

library(reshape2)   # provides dcast()

df <- dplyr::mutate_if(df, is.character, as.factor)
attach(df)

dcast(df,id ~ color,value.var = "id") #The output is exactly what I wanted! 

dcast(df,id + animal ~ color,value.var = "id") #Exactly what I wanted!

dcast(df,id + animal ~ fruit,id ~ country,value.var = "id") #Not the output I wanted. Only "works well" if I specify "fun.aggregate=length". Why?

dcast(df,id ~ color + country,value.var = "id") #Not the output I wanted.

dcast(df,id + animal~ color + country,value.var = "id") #Not the output I wanted.

dcast(df,id + animal~ color + country + fruit,value.var = "id") #Not the output I wanted.

My expected reshaped df should look like this:

Expected reshape data frame

To achieve this, I tried all of the following commands, but none of them worked well:

dcast(df,id + animal ~ color + country + fruit,fun.aggregate=length)

dcast(df,id + animal ~ c(color,country,fruit),id + animal ~ c("color","country","fruit"),id + animal ~ color:fruit,fun.aggregate=length)

I also tried to do this with tidyr::pivot_wider, without success.

Is there a way to achieve my goal using reshape2::dcast, tidyr::pivot_wider, or any other function in R? I would really appreciate any help. Thanks in advance.

Solution

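First, you have to use pivot_longer to move the values that will become column names into a single column. Then I arranged the rows by those future column names so that the words are grouped as in your image, and then I used pivot_wider. That step drops the animal column, so I joined it back in and then arranged by id so that the observations are in the same order as in your image.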

library(dplyr)
library(tidyr)

pivot_longer(df, cols = color:country, names_to = "variable", values_to = "value") %>%   # column names to rows
  arrange(variable, value) %>%                             # organize future column names
  pivot_wider(id_cols = !variable, names_from = value, values_from = animal,
              values_fn = list(animal = length), values_fill = 0) %>%   # make it wide, counting occurrences
  left_join(distinct(df[, c(1, 5)]), by = "id") %>%        # add animals back
  select(id, animal, everything()) %>%                     # rearrange columns
  arrange(id)                                              # reorder observations
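
Because the toy df in the question cannot be built as posted, here is a minimal self-contained sketch of the same pivot_longer / pivot_wider steps. The data frame `toy` and its values are hypothetical, made up only so that every column has the same length and each id has a single animal. It also assumes that no value appears under more than one variable (for example "orange" as both a color and a fruit), because such values would be merged into one column.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the asker's data): equal-length columns, one animal per id
toy <- data.frame(
  id      = c("a", "a", "b", "c", "c"),
  animal  = c("alligator", "alligator", "bat", "camel", "camel"),
  color   = c("red", "gray", "yellow", "blue", "red"),
  fruit   = c("apple", "avocado", "lemon", "orange", "lemon"),
  country = c("Italy", "Brazil", "Japan", "Spain", "Italy"),
  stringsAsFactors = FALSE
)

wide <- toy %>%
  # stack color, fruit and country into (variable, value) pairs
  pivot_longer(cols = color:country, names_to = "variable", values_to = "value") %>%
  # new columns are created in order of first appearance, so sort first
  arrange(variable, value) %>%
  # one column per level, holding how many times it occurs for each id
  pivot_wider(
    id_cols     = id,
    names_from  = value,
    values_from = animal,
    values_fn   = length,
    values_fill = 0
  ) %>%
  # animal was consumed as the counted value, so join it back in
  left_join(distinct(toy, id, animal), by = "id") %>%
  select(id, animal, everything()) %>%
  arrange(id)

print(wide)
```

The result has one row per id, with a count of 0 wherever a level never occurs for that id.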

Update based on your comment - ordered by color, fruit, and country

I added a mutate and modified the first arrange and the pivot_wider.

pivot_longer(df, cols = color:country, names_to = "variable", values_to = "value") %>%   # future col names to rows
  mutate(ordering = ifelse(variable == "color", 1,         # create organizer variable
                           ifelse(variable == "fruit", 2, 3))) %>%
  arrange(ordering, value) %>%                             # organize future column order
  pivot_wider(id_cols = !c(variable, ordering),            # make it wide
              names_from = value, values_from = animal,
              values_fn = list(animal = length), values_fill = 0) %>%
  left_join(distinct(df[, c(1, 5)]), by = "id") %>%        # add the animals back
  select(id, animal, everything()) %>%                     # move animals to 2nd position
  arrange(id)                                              # reorder observations
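
Since the question also asks about reshape2::dcast, the same idea can be sketched with melt followed by dcast. This is an alternative, not part of the answer above, and it again uses a hypothetical `toy` data frame. Putting `variable + value` on the right-hand side of the formula prefixes each new column with its source variable (for example color_red), which keeps the three groups separate and avoids clashes when the same word occurs under two variables.

```r
library(reshape2)

# Hypothetical toy data (not the asker's data): equal-length columns, one animal per id
toy <- data.frame(
  id      = c("a", "a", "b", "c", "c"),
  animal  = c("alligator", "alligator", "bat", "camel", "camel"),
  color   = c("red", "gray", "yellow", "blue", "red"),
  fruit   = c("apple", "avocado", "lemon", "orange", "lemon"),
  country = c("Italy", "Brazil", "Japan", "Spain", "Italy"),
  stringsAsFactors = FALSE
)

# melt stacks the three categorical columns into (variable, value) pairs;
# the order of measure.vars sets the level order of the variable factor
long <- melt(
  toy,
  id.vars      = c("id", "animal"),
  measure.vars = c("color", "fruit", "country")
)

# Categorical cells have nothing to sum or average, so count occurrences with length;
# combinations that never occur come out as 0
wide_dcast <- dcast(
  long,
  id + animal ~ variable + value,
  fun.aggregate = length,
  value.var     = "value"
)

print(wide_dcast)
```

This also speaks to the "Why?" in the question: when several rows can fall into the same cell, dcast needs an aggregation function to reduce them to one value, and length is the natural choice for categorical data.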

