How to reshape a DF with categorical variables from long to wide in R?

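I am new to reshaping data frames. I have a df that I would like to make wide so that I can use it in analyses such as clustering and NMDS. I found several questions (and answers) about reshaping data that is mainly quantitative (using aggregation functions), but in my case all of my variables are categorical.

As my df has a thousand rows and dozens of columns, I created a toy df as an example. It looks like this: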

# NOTE: as given, the vectors below have unequal lengths (7, 9, 9, 11, 7),
# so data.frame() stops with "arguments imply differing number of rows".
df <- data.frame(
  id      = c("a","c","a","b","d","e","d"),
  color   = c("red","blue","gray","yellow","green","purple","black","red","gray"),
  fruit   = c("apple","orange","avocado","strawberry","banana","apple","watermelon","lemon","lemon"),
  country = c("Italy","Spain","Brazil","Australia","Italy","Japan","India","USA","Mexico","France","France"),
  animal  = c("alligator","camel","alligator","bat","dolphin","elephant","dolphin"))

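I want the "id" column to be the first column of my reshaped data frame and "animal" to be the second, followed by the levels of "color", "fruit" and "country". The main point is that I want to keep them separate.

The code below shows some of the attempts I made: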

library(reshape2)  # provides dcast()

df <- dplyr::mutate_if(df, is.character, as.factor)
attach(df)

dcast(df,id ~ color,value.var = "id") #The output is exactly what I wanted! 

dcast(df,id + animal ~ color,value.var = "id") #Exactly what I wanted!

dcast(df,id + animal ~ fruit,id ~ country,value.var = "id") #Not the output I wanted. Only "works well" if I specify "fun.aggregate=length". Why?

dcast(df,id ~ color + country,value.var = "id") #Not the output I wanted.

dcast(df,id + animal~ color + country,value.var = "id") #Not the output I wanted.

dcast(df,id + animal~ color + country + fruit,value.var = "id") #Not the output I wanted.

My expected reshaped df should look like this:

[Image: expected reshaped data frame]

To achieve this, I tried all of the following commands, but none of them worked well:

dcast(df,id + animal ~ color + country + fruit,fun.aggregate=length)

dcast(df,id + animal ~ c(color,country,fruit),id + animal ~ c("color","country","fruit"),id + animal ~ color:fruit,fun.aggregate=length)

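I also tried to do this with tidyr::pivot_wider, without success.

Is there any way to reach my goal using reshape2::dcast, tidyr::pivot_wider, or any other function in R? I would appreciate it if you could help me. Thanks in advance.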

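Solution

First, you have to use pivot_longer to get the desired column names into a column. Then I arranged the result by the future column names, so that the words are grouped as in your image, and used pivot_wider. That dropped the animal column, so I joined it back on and then arranged by id, so the observations are in the same order as in your image.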

library(dplyr)
library(tidyr)

pivot_longer(df, cols = color:country,
             names_to = "variable", values_to = "value") %>%  # column names to rows
  arrange(variable, value) %>%                                # organize future column names
  pivot_wider(id_cols = !variable, names_from = value, values_from = animal,
              values_fn = list(animal = length), values_fill = 0) %>%  # count rows per id and level
  left_join(distinct(df[, c(1, 5)]), by = "id") %>%           # add animals back
  select(id, animal, everything()) %>%                        # rearrange columns
  arrange(id)                                                 # reorder observations
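
The values_fn = length step above just counts how many rows fall into each id-by-level cell. As a rough sketch of the same idea with no packages, base R's table() can build one count block per variable. The data frame toy below is hypothetical (it is not the asker's data) and was made up so that every column has the same length and no level name occurs in two variables.

```r
# Hypothetical toy data (an assumption for illustration): 6 rows, equal-length columns.
toy <- data.frame(
  id      = c("a", "a", "b", "c", "c", "d"),
  animal  = c("alligator", "alligator", "bat", "camel", "camel", "dolphin"),
  color   = c("red", "gray", "yellow", "blue", "red", "green"),
  fruit   = c("apple", "avocado", "lemon", "orange", "lemon", "banana"),
  country = c("Italy", "Brazil", "Japan", "Spain", "France", "Italy"),
  stringsAsFactors = FALSE
)

vars <- c("color", "fruit", "country")  # desired block order in the wide result

# One id-by-level count matrix per variable, bound side by side.
counts <- do.call(cbind, lapply(vars, function(v) {
  unclass(table(toy$id, toy[[v]]))
}))

# One animal per id, looked up in the row order of the count matrix.
animals <- unique(toy[, c("id", "animal")])

wide <- data.frame(
  id     = rownames(counts),
  animal = animals$animal[match(rownames(counts), animals$id)],
  counts,
  check.names = FALSE
)
rownames(wide) <- NULL

print(wide)
```

Because the blocks are built in the order given by vars, the color columns come first, then fruit, then country. This sketch assumes each id maps to exactly one animal.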

Update based on your comment: ordering by color, fruit and country

I added a mutate and modified the first arrange and the pivot_wider:

pivot_longer(df, cols = color:country,
             names_to = "variable", values_to = "value") %>%  # future col names to rows
  mutate(ordering = ifelse(variable == "color", 1,            # create organizer variable
                           ifelse(variable == "fruit", 2, 3))) %>%
  arrange(ordering, value) %>%                                # organize future column order
  pivot_wider(id_cols = !c(variable, ordering),               # make it wide
              names_from = value, values_from = animal,
              values_fn = list(animal = length), values_fill = 0) %>%
  left_join(distinct(df[, c(1, 5)]), by = "id") %>%           # add the animals back
  select(id, animal, everything()) %>%                        # move animals to 2nd position
  arrange(id)                                                 # reorder observations
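
Since the question also asks about reshape2::dcast, here is a minimal sketch of a melt-then-cast alternative. It is not part of the answer above, and the toy data frame is hypothetical. The idea is to melt the three categorical columns into variable/value pairs first, then cast on variable + value so each level gets a prefixed column such as color_red. fun.aggregate = length is needed because there is no numeric value to spread, so each cell holds a row count instead.

```r
library(reshape2)

# Hypothetical toy data (an assumption for illustration): 6 rows, equal-length columns.
toy <- data.frame(
  id      = c("a", "a", "b", "c", "c", "d"),
  animal  = c("alligator", "alligator", "bat", "camel", "camel", "dolphin"),
  color   = c("red", "gray", "yellow", "blue", "red", "green"),
  fruit   = c("apple", "avocado", "lemon", "orange", "lemon", "banana"),
  country = c("Italy", "Brazil", "Japan", "Spain", "France", "Italy"),
  stringsAsFactors = FALSE
)

# Long format: one row per (id, animal, variable, value).
long <- melt(toy,
             id.vars      = c("id", "animal"),
             measure.vars = c("color", "fruit", "country"))

# Wide format: one column per variable_level, each cell a row count.
wide <- dcast(long, id + animal ~ variable + value,
              fun.aggregate = length, value.var = "value")

print(wide)
```

Prefixing the level with its variable name keeps the three groups separate even if the same label (for example "orange") were to appear as both a color and a fruit.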
