Tidying data with multiple repeated variables in R


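I have a data frame like the one below. Several variables (for example "c" and "z") are each measured for health, animals, enviro and money. In the real data frame there are many other columns that do not follow this pattern, and they are scattered among these columns.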

id  c_health  c_animals  c_enviro  c_money  z_health  z_animals  z_enviro  z_money
1   3         2          4         5        7         9          6         8
2   2         3          5         4        8         7          6         9
3   4         1          2         3        9         6          8         7

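I am trying to rearrange the data so that it is "tidy". I am not sure how to do this when the current dataset contains more than one variable. The result I ultimately want looks like this: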

id  c  z  message
1   3  7  health
1   2  9  animals
1   4  6  enviro
1   5  8  money
2   2  8  health
2   3  7  animals
2   5  6  enviro
2   4  9  money
3   4  9  health
3   1  6  animals
3   2  8  enviro
3   3  7  money

If the data frame contained only the following columns, I could tidy it like this:

id  c_health  c_animals  c_enviro  c_money
1   3         2          4         5
2   2         3          5         4
3   4         1          2         3

df <- df %>%
   gather(.,key = "question",value = "response",2:5)

Solution

You can do this with pivot_longer() from the tidyr package.
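The pivot_longer() approach can be sketched as follows. This is a minimal sketch rather than the answerer's original code: it assumes every column to reshape is named prefix_topic with exactly one underscore, and it uses matches("^(c|z)_") to skip the unrelated columns mentioned in the question. The data frame is rebuilt from the example table, and the name tidy is only illustrative.

```r
library(tidyr)

# Example data rebuilt from the table in the question
df <- data.frame(
  id = c(1, 2, 3),
  c_health = c(3, 2, 4), c_animals = c(2, 3, 1),
  c_enviro = c(4, 5, 2), c_money = c(5, 4, 3),
  z_health = c(7, 8, 9), z_animals = c(9, 7, 6),
  z_enviro = c(6, 6, 8), z_money = c(8, 9, 7)
)

tidy <- pivot_longer(
  df,
  # assumption: only columns starting with "c_" or "z_" are reshaped
  cols = matches("^(c|z)_"),
  # ".value" turns the part before "_" into output columns (c, z);
  # the part after "_" goes into the "message" column
  names_to = c(".value", "message"),
  names_sep = "_"
)

print(tidy)
```

If some column names contain more than one underscore, names_sep = "_" would not split them into two pieces cleanly; names_pattern with a regular expression is the usual alternative in that case.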

You were on the right track with gather, but a few extra steps are needed to separate the prefix from the column names. Try the following:

library(dplyr)
library(tidyr)

df = data.frame(
  id = c(1,2,3),
  c_health = c(3,2,4),
  c_animals = c(2,3,1),
  z_health = c(7,8,9),
  z_animals = c(9,7,6),
  stringsAsFactors = FALSE
)

output = df %>%
  # gather on all columns other than id
  gather(key = "question",value = "response",-all_of("id")) %>%
  # split off prefix and rest of column name
  # (assumes a one-character prefix followed by "_")
  mutate(prefix = substr(question,1,1),
         desc = substr(question,3,nchar(question))) %>%
  # keep just the columns of interest
  select(id,prefix,desc,response) %>%
  # reshape wider
  spread(prefix,response)

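Update: my comment about handling prefixes of different lengths did not return the correct answer, because [] indexing does not work that way inside mutate. The idea is the same, but the syntax is as follows: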

output = df %>%
  # gather on all columns other than id
  gather(key = "question",value = "response",-all_of("id")) %>%
  # split each column name at "_" into prefix and description
  mutate(split = strsplit(question,"_")) %>%
  mutate(prefix = sapply(split,function(x){x[1]}),
         desc = sapply(split,function(x){x[2]})) %>%
  # keep just the columns of interest
  select(id,prefix,desc,response) %>%
  # reshape wider
  spread(prefix,response)
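The strsplit and sapply steps can also be written with separate() from tidyr, which splits one column into several in a single call. The following is a sketch under the same assumption as above (each column name is prefix_desc with a single underscore), not part of the original answer; the data frame is a cut-down version of the example data.

```r
library(dplyr)
library(tidyr)

# Cut-down version of the example data from the question
df <- data.frame(
  id = c(1, 2, 3),
  c_health = c(3, 2, 4), c_animals = c(2, 3, 1),
  z_health = c(7, 8, 9), z_animals = c(9, 7, 6)
)

output <- df %>%
  # gather on all columns other than id
  gather(key = "question", value = "response", -id) %>%
  # split "c_health" into prefix = "c" and desc = "health"
  separate(question, into = c("prefix", "desc"), sep = "_") %>%
  # one column per prefix (c, z)
  spread(prefix, response)

print(output)
```

Because separate() splits on the delimiter rather than on a fixed position, it copes with prefixes of different lengths as long as each name contains exactly one underscore.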
