How to reshape a DF from long to wide with categorical variables in R?
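I am new to reshaping data frames. I have a df that I would like to make wide so that I can use it in analyses such as clustering and NMDS. I have found several questions (and answers) about reshaping data that is mostly quantitative (using aggregation functions), but in my case all of my variables are categorical.
Since my df has a thousand rows and dozens of columns, I created a toy df as an example. It looks like this: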
# Note: as posted, these vectors have different lengths (7, 9, 9, 11, 7),
# so data.frame() stops with an error until they are made equal.
df <- data.frame(
  id      = c("a","c","a","b","d","e","d"),
  color   = c("red","blue","gray","yellow","green","purple","black","red","gray"),
  fruit   = c("apple","orange","avocado","strawberry","banana","apple","watermelon","lemon","lemon"),
  country = c("Italy","Spain","Brazil","Australia","Italy","Japan","India","USA","Mexico","France","France"),
  animal  = c("alligator","camel","alligator","bat","dolphin","elephant","dolphin"))
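I want the "id" column to be the first column of my reshaped data frame and the "animal" column to be the second, followed by the levels of "color", "fruit", and "country". The point here is that I want to keep them separate.
The code below shows some of the attempts I have made: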
library(reshape2) # provides dcast()
df <- dplyr::mutate_if(df, is.character, as.factor)
attach(df)
dcast(df, id ~ color, value.var = "id") #The output is exactly what I wanted!
dcast(df, id + animal ~ color, value.var = "id") #Exactly what I wanted!
dcast(df,id + animal ~ fruit,id ~ country,value.var = "id") #Not the output I wanted. Only "works well" if I specify "fun.aggregate=length". Why?
dcast(df, id ~ color + country, value.var = "id") #Not the output I wanted.
dcast(df, id + animal ~ color + country, value.var = "id") #Not the output I wanted.
dcast(df, id + animal ~ color + country + fruit, value.var = "id") #Not the output I wanted.
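My expected reshaped df should look like this:
[image of the expected wide table, not reproduced here]
To achieve this, I tried all of the following commands, but none of them worked well: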
dcast(df,id + animal ~ color + country + fruit,fun.aggregate=length)
dcast(df,id + animal ~ c(color,country,fruit),id + animal ~ c("color","country","fruit"),id + animal ~ color:fruit,fun.aggregate=length)
I also tried to do this with tidyr::pivot_wider, without success.
Is there a way to achieve my goal using reshape2::dcast, tidyr::pivot_wider, or any other function in R? I would appreciate any help. Thanks in advance.
Solution
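First, you have to use pivot_longer to get the desired column names into a single column. Then I arranged the rows by those future column names, so that the words are grouped as in your image, and then I used pivot_wider. That step drops the animal column, so I joined it back and then arranged by id so that the observations are in the same order as in your image.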
library(dplyr)
library(tidyr)
pivot_longer(df, cols = color:country, names_to = "variable", values_to = "value") %>% # column names to rows
  arrange(variable, value) %>% # organize future column names
  pivot_wider(id_cols = id, names_from = value, values_from = animal, values_fn = list(animal = length), values_fill = 0) %>%
  left_join(distinct(df[, c(1, 5)]), by = "id") %>% # add animals back
  select(id, animal, everything()) %>% # rearrange columns
  arrange(id) # reorder observations
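The same idea can be tried end to end on a small data frame. The sketch below is only an illustration under stated assumptions: `toy` is hypothetical data (not the asker's df), and it uses a slight variant of the pipeline above in which both id and animal are kept as `id_cols`, so no join is needed to bring animal back.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (an assumption for illustration); all columns have equal length
toy <- data.frame(
  id     = c("a", "b", "a", "c"),
  color  = c("red", "blue", "gray", "red"),
  fruit  = c("apple", "lemon", "lemon", "orange"),
  animal = c("ant", "bat", "ant", "cat")
)

wide <- toy %>%
  # stack the categorical columns: one row per (id, animal, variable, value)
  pivot_longer(cols = color:fruit, names_to = "variable", values_to = "value") %>%
  # group the future column names by their source variable
  arrange(variable, value) %>%
  # one column per level; each cell counts how often that level occurs for the id
  pivot_wider(id_cols = c(id, animal),
              names_from = value,
              values_from = variable,
              values_fn = list(variable = length),
              values_fill = 0) %>%
  arrange(id)

print(wide)
```

Keeping animal in `id_cols` only works cleanly when each id maps to a single animal, as it appears to in the question; otherwise one id would produce several rows.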
Update based on your comment: columns ordered by color, then fruit, then country.
I added a mutate and modified the first arrange and the pivot_wider:
pivot_longer(df, cols = color:country, names_to = "variable", values_to = "value") %>% # future col names to rows
  mutate(ordering = ifelse(variable == "color", 1, # create organizer variable
                    ifelse(variable == "fruit", 2, 3))) %>%
  arrange(ordering, value) %>% # organize future column order
  pivot_wider(id_cols = id, # make it wide
              names_from = value,
              values_from = animal,
              values_fn = list(animal = length),
              values_fill = 0) %>%
  left_join(distinct(df[, c(1, 5)]), by = "id") %>% # add the animals back
  select(id, animal, everything()) %>% # move animals to 2nd position
  arrange(id) # reorder observations
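If the same level could occur in more than one variable, or if the source variable should stay visible in the column names, one possible variant is to build the names from both the variable and the value. This is a sketch under assumptions, not part of the original answer: `toy` is hypothetical data and `present` is a helper column introduced only for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (an assumption for illustration)
toy <- data.frame(
  id     = c("a", "b", "a", "c"),
  color  = c("red", "blue", "gray", "red"),
  fruit  = c("apple", "lemon", "lemon", "orange"),
  animal = c("ant", "bat", "ant", "cat")
)

prefixed <- toy %>%
  pivot_longer(cols = color:fruit, names_to = "variable", values_to = "value") %>%
  mutate(present = 1L) %>%            # hypothetical indicator: this level was observed
  arrange(variable, value) %>%        # group future columns by source variable
  pivot_wider(id_cols = c(id, animal),
              names_from = c(variable, value), # names such as color_red, fruit_lemon
              values_from = present,
              values_fn = list(present = sum), # count repeated occurrences
              values_fill = 0L) %>%
  arrange(id)

print(prefixed)
```

Because the column names carry the variable as a prefix, the color, fruit, and country blocks stay separate even when two variables share a level.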