How to split strings, merge, and stack multiple columns
I have the following raw data:
rawData <- data.frame(ID = c(1, 2, 3),
                      Name = c("Company B; Company A; Company C", "Company A; Company D", "Company E"),
                      Name_location = c("Company A (USA (Primary)); Company B (Japan(Primary)); Company C (Korea,South (Primary))",
                                        "Company A (USA (Primary)); Company D (USA (Primary))",
                                        "European (Primary)"),
                      stringsAsFactors = FALSE)
ID Name Name_location
1 Company B; Company A; Company C Company A (USA (Primary)); Company B (Japan(Primary)); Company C (Korea,South (Primary))
2 Company A; Company D Company A (USA (Primary)); Company D (USA (Primary))
3 Company E European (Primary)
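I need to transform the data into the form shown below.
The Name_location field holds the location of each company listed in the Name field, but possibly in a different order. Also, if the Name field contains only one company, Name_location holds just that location, whereas if Name contains several companies, Name_location follows the syntax "Company (Location (Primary)); Company (Location (Primary))".
I need a way to separate the companies and their locations into individual rows that can be identified by ID.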
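To make the multi-company format concrete, here is a small base-R sketch that splits one entry into its company name and location. The helper name parse_entry is hypothetical, and it assumes the entry follows the "Company (Location (Primary))" pattern described above. The single-company form such as "European (Primary)" has no company part and would need separate handling.

```r
# Hypothetical helper: split one "Company (Location (Primary))" entry
# into company name and location with base-R regular expressions.
# Assumes the multi-company format; "European (Primary)" is NOT handled.
parse_entry <- function(entry) {
  entry <- trimws(entry)
  # company = everything before the first " ("
  company <- sub("\\s*\\(.*$", "", entry)
  # location = text between the first "(" and the next "("
  location <- trimws(sub("^[^(]*\\(([^(]*)\\(.*$", "\\1", entry))
  c(company = company, location = location)
}

parse_entry("Company C (Korea,South (Primary))")
```

Note that the location capture stops at the second "(", so a comma inside the location (as in "Korea,South") is kept intact.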
IdealData <- data.frame(ID = c(1, 1, 1, 2, 2, 3),
                        Name = c("Company B", "Company A", "Company C", "Company A", "Company D", "Company E"),
                        Location = c("Japan", "USA", "Korea,South", "USA", "USA", "European"))
ID Name Location
1 Company B Japan
1 Company A USA
1 Company C Korea,South
2 Company A USA
2 Company D USA
3 Company E European
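I'd like to do this in R.
Solution
Use separate_rows to put each company on its own row, then str_extract to pull the location out of each Name_location entry and match it back to the company name: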
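library(stringr)
library(dplyr)
library(tidyr)
rawData %>%
  # one row per company; both columns must split into the same number of pieces
  separate_rows(Name, Name_location, sep = ";\\s*") %>%
  # "Company A (USA (Primary))" -> Name1 = "Company A", Location = "(USA (Primary))"
  separate(Name_location, into = c("Name1", "Location"), sep = "\\s+(?=\\()", extra = "merge") %>%
  # match names within the same ID only, not across the whole column
  group_by(ID) %>%
  # a single-company ID holds only "Location (Primary)", so Name1 is the location;
  # otherwise take the text after the first "(" and reorder it to follow Name
  mutate(Location = if (n() == 1) Name1 else
           trimws(str_extract(Location, "(?<=\\()[^(]+"))[match(Name, Name1)]) %>%
  ungroup() %>%
  select(-Name1)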
# A tibble: 6 x 3
# ID Name Location
# <dbl> <chr> <chr>
#1 1 Company B Japan
#2 1 Company A USA
#3 1 Company C Korea,South
#4 2 Company A USA
#5 2 Company D USA
#6 3 Company E European
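Calling separate_rows on Name and Name_location together only pairs companies with locations correctly when every row splits into the same number of pieces in both columns. A quick base-R check for that assumption could look like this (a sketch; the variable names n_name, n_loc and mismatched are illustrative only):

```r
rawData <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Company B; Company A; Company C", "Company A; Company D", "Company E"),
  Name_location = c("Company A (USA (Primary)); Company B (Japan(Primary)); Company C (Korea,South (Primary))",
                    "Company A (USA (Primary)); Company D (USA (Primary))",
                    "European (Primary)"),
  stringsAsFactors = FALSE
)

# number of pieces each column splits into, per row
n_name <- lengths(strsplit(rawData$Name, ";\\s*"))
n_loc  <- lengths(strsplit(rawData$Name_location, ";\\s*"))

# IDs whose companies and locations cannot be paired one-to-one
mismatched <- rawData$ID[n_name != n_loc]
mismatched
```

For the sample data both counts are 3, 2 and 1, so mismatched is empty.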
If you want to do it without any packages, you can loop over the entries and build a new data.frame:
rawData <- data.frame(ID = c(1, 2, 3),
                      Name = c("Company B; Company A; Company C", "Company A; Company D", "Company E"),
                      Name_location = c("Company A (USA (Primary)); Company B (Japan(Primary)); Company C (Korea,South (Primary))",
                                        "Company A (USA (Primary)); Company D (USA (Primary))",
                                        "European (Primary)"),
                      stringsAsFactors = FALSE)
# grow one vector per output column
idealData <- list(ID = c(), Company = c(), Location = c())
for (i in seq_along(rawData$ID)) {
  # split the company list on ";" and drop the surrounding spaces
  vcomp <- trimws(strsplit(rawData$Name[i], ";")[[1]])
  ncomp <- length(vcomp)
  if (ncomp == 1) {
    # a single company: Name_location holds only "Location (Primary)"
    idealData[["ID"]] <- c(idealData[["ID"]], rawData$ID[i])
    idealData[["Company"]] <- c(idealData[["Company"]], vcomp)
    idealData[["Location"]] <- c(idealData[["Location"]], strsplit(rawData$Name_location[i], " \\(")[[1]][1])
  } else {
    # entries look like "Company A (USA (Primary))" and may be in a different order
    loc <- trimws(strsplit(rawData$Name_location[i], ";")[[1]])
    for (compi in seq_len(ncomp)) {
      # pick the entry that starts with "<company> (" (plain prefix match, not a regex)
      entry <- loc[startsWith(loc, paste0(vcomp[compi], " ("))][1]
      idealData[["ID"]] <- c(idealData[["ID"]], rawData$ID[i])
      idealData[["Company"]] <- c(idealData[["Company"]], vcomp[compi])
      # the location sits between the first "(" and the next one
      idealData[["Location"]] <- c(idealData[["Location"]], trimws(strsplit(entry, "\\(")[[1]][2]))
    }
  }
}
idealData <- as.data.frame(idealData)
This gives the output:
> idealData
ID Company Location
1 1 Company B Japan
2 1 Company A USA
3 1 Company C Korea,South
4 2 Company A USA
5 2 Company D USA
6 3 Company E European
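The same result can also be built without growing vectors inside a loop. The sketch below is a base-R variant under the same assumptions as the loop (";" separates entries, a single company means Name_location holds only the location); it splits both columns once, parses each ID's entries with sub, and reorders locations with match. A company that has no matching entry gets NA.

```r
rawData <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Company B; Company A; Company C", "Company A; Company D", "Company E"),
  Name_location = c("Company A (USA (Primary)); Company B (Japan(Primary)); Company C (Korea,South (Primary))",
                    "Company A (USA (Primary)); Company D (USA (Primary))",
                    "European (Primary)"),
  stringsAsFactors = FALSE
)

# one trimmed element per company / per location entry
names_split <- lapply(strsplit(rawData$Name, ";"), trimws)
locs_split  <- lapply(strsplit(rawData$Name_location, ";"), trimws)

rows <- Map(function(id, nm, loc) {
  if (length(nm) == 1) {
    # single company: the entry is just "Location (Primary)"
    location <- sub("\\s*\\(.*$", "", loc[1])
  } else {
    # company before the first " (", location between the first two "("
    company  <- sub("\\s*\\(.*$", "", loc)
    place    <- trimws(sub("^[^(]*\\(([^(]*)\\(.*$", "\\1", loc))
    location <- place[match(nm, company)]  # reorder to follow Name
  }
  data.frame(ID = id, Company = nm, Location = location, stringsAsFactors = FALSE)
}, rawData$ID, names_split, locs_split)

idealData <- do.call(rbind, rows)
rownames(idealData) <- NULL
idealData
```

Because match works on whole strings, "Company A" is never confused with a longer name such as "Company AB".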