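How to work around tidy evaluation with unite and map when unnesting a dataset with list-columns
I am trying to unnest a dataset produced by pivot_wider in which several columns need to be unnested. On the full dataset, the unnest function does not work (I get an error: > Error: Incompatible lengths: 3, 2.), so I tried a workaround. A portion of the dataset: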
my_data <- structure(list(RNAcentral_id = c("URS000000C731","URS000000C731","URS000001F3AA","URS0000023ED8","URS0000050C72","URS00000527A6","URS000007CAC8","URS000007DA54","URS000007F1D7","URS0000088F47","URS00000B589B","URS00000B589B"),Database = c("ENSEMBL","ENSEMBL","GENCODE","LNCIPEDIA","GENECARDS","LNCBOOK","NONCODE","ENA","GENCODE"),RNA_type = c("lncRNA","lncRNA","snoRNA","snoRNA"),gene_name = c("ENSG00000250666.1","ENSG00000281830.1","ENSG00000281377.1","LINC01596","ENSG00000242086.8","ENSG00000280512.2","ENSG00000281603.2","ENSG00000281060.2","ENSG00000281794.2","ENSG00000281915.2","ENSG00000280993.2","ENSG00000282953.1","MUC20-OT1","lnc-MUC20-67","ENSG00000235273.1","ENSG00000233950.1","ENSG00000230089.1","ENSG00000225188.1","LOC101929006","HSALNG0049045","lnc-OR14J1-2","NONHSAG043350.2","NONHSAG045640.2","NONHSAG045830.2","NONHSAG046018.2","NONHSAG046538.2","ENSG00000231860.1","ENSG00000224328.1","ENSG00000236766.1","ENSG00000224508.1","ENSG00000236522.1","ENSG00000229681.1","ENSG00000233883.1","MDC1-AS1","HSALNG0049184","NONHSAG043427.2","NONHSAG045580.2","NONHSAG045701.2","NONHSAG045891.2","NONHSAG046074.2","NONHSAG046228.2","NONHSAG046589.2","ENSG00000249981.1","ENSG00000276297.1","ENSG00000280619.1","AC145141.1","LOC107987420","LOC107987434","HSALNG0042531","lnc-BDP1-1","NONHSAG040656.2","NONHSAG037073.2","HSALNG0031832","ENSG00000224835.1","ENSG00000227198.1","ENSG00000233169.1","ENSG00000225390.1","C6orf47-AS1","HSALNG0049305","NONHSAG043504.2","NONHSAG046125.2","NONHSAG046270.2","NONHSAG046461.2","ENSG00000272566.1","ENSG00000280590.1","ENSG00000280853.1","ENSG00000281916.1","AF250324.1","ENSG00000272566","lnc-FRG2-13","ACA38 snoRNA","ENSG00000200816.1","ENSG00000266847.1","ENSG00000263994.1","ENSG00000264153.1","ENSG00000263879.1","SNORA38")),row.names = c(NA,-88L),class = c("tbl_df","tbl","data.frame"),spec = structure(list(cols = list(RNAcentral_id = structure(list(),class = c("collector_character","collector")),Database = 
structure(list(),external_id = structure(list(),NCBI_taxon_id = structure(list(),class = c("collector_double",RNA_type = structure(list(),gene_name = structure(list(),"collector"))),default = structure(list(),class = c("collector_guess",delim = "\t"),class = "col_spec"))
The code that produces the error:
my_data %>%
pivot_wider(names_from = Database,values_from = c(gene_name)) %>%
unnest()
My attempted workaround:
mynested_data <- my_data %>%
pivot_wider(names_from = Database,values_from = c(gene_name))
c("ENSEMBL","LNCIPEDIA") %>%
set_names(.) %>%
map(~ mynested_data %>%
unnest_wider(.x,names_sep = "_") %>%
unite(col = !!.x,vars(starts_with(!!quo(.x))),sep = ";"))
Error: Must subset columns with a valid subscript vector.
x Subscript has the wrong type `quosures`.
ℹ It must be numeric or character.
Run `rlang::last_error()` to see where the error occurred.
I also tried using col = .x or col = !!quo(.x), but I got the same error.
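A possible reading of the error message: `vars()` returns a list of quosures, whereas `unite()` selects columns with tidyselect, which expects names, positions, or helpers such as `starts_with()` passed directly. The sketch below drops `vars()` and chains the columns with `purrr::reduce()` instead of `map()`, so that a single tibble comes out rather than a list of tibbles. The toy tibble `nested`, the helper `collapse_one`, and the shortened IDs are hypothetical and only stand in for the output of `pivot_wider()`.

```r
library(tidyr)
library(tibble)
library(purrr)

# Hypothetical stand-in for the list-columns that pivot_wider() produces
nested <- tibble(
  RNAcentral_id = c("URS1", "URS2"),
  ENSEMBL = list(c("geneA", "geneB"), "geneD"),
  GENCODE = list("geneC", c("geneE", "geneF"))
)

# Hypothetical helper: spread one list-column into several columns,
# then glue them back into a single ";"-separated string column
collapse_one <- function(data, db) {
  widened <- unnest_wider(data, all_of(db), names_sep = "_")
  unite(
    widened,
    col = !!db,                      # unquote the string to name the new column
    starts_with(paste0(db, "_")),    # tidyselect helper, no vars() wrapper
    sep = ";",
    na.rm = TRUE                     # drop the NA padding added by unnest_wider()
  )
}

# Apply the helper to each database column in turn
result <- reduce(c("ENSEMBL", "GENCODE"), collapse_one, .init = nested)
```

This is only meant to show where the selection went wrong; the `values_fn` approach in the answer avoids the unnesting step altogether.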
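Edit 1: the result I expect. I am doing this to obtain a tibble in which each row (entry) has one RNAcentral_id and the list-columns are turned into strings, with multiple entries separated by ";". There should be one column for ENSEMBL, one for GENCODE, and so on.
Solution
We can use pivot_wider directly here: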
tidyr::pivot_wider(my_data,names_from = Database,values_from = gene_name,values_fn = toString)
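Note that `toString` joins values with ", ". If the ";" separator from the question is wanted, a custom function can be passed to `values_fn` instead. The following is a minimal sketch on a hypothetical toy tibble (`toy` and its shortened IDs are made up for illustration), assuming a tidyr version in which `values_fn` accepts a single function.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data with the same three key columns as my_data
toy <- tibble(
  RNAcentral_id = c("URS1", "URS1", "URS1", "URS2"),
  Database = c("ENSEMBL", "ENSEMBL", "GENCODE", "ENSEMBL"),
  gene_name = c("geneA", "geneB", "geneC", "geneD")
)

# Collapse all gene names that fall into one cell into a ";"-separated string
wide <- pivot_wider(
  toy,
  names_from = Database,
  values_from = gene_name,
  values_fn = function(x) paste(x, collapse = ";")
)
```

Cells with no matching rows (here GENCODE for URS2) are left as NA unless `values_fill` is supplied.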
Or use dcast from data.table:
library(data.table)
dcast(setDT(my_data),RNA_type + RNAcentral_id~ Database,value.var = 'gene_name',fun.aggregate = toString)
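The same ";" separator can be obtained with `dcast` by passing a custom aggregation function. This is a minimal sketch on a hypothetical toy table (`toy_dt` and its shortened IDs are made up for illustration); it builds a fresh data.table rather than calling `setDT()`, so no existing object is modified by reference.

```r
library(data.table)

# Hypothetical toy data with the same three key columns as my_data
toy_dt <- data.table(
  RNAcentral_id = c("URS1", "URS1", "URS1", "URS2"),
  Database = c("ENSEMBL", "ENSEMBL", "GENCODE", "ENSEMBL"),
  gene_name = c("geneA", "geneB", "geneC", "geneD")
)

# One row per RNAcentral_id, one column per Database,
# with multiple gene names joined by ";"
wide_dt <- dcast(
  toy_dt,
  RNAcentral_id ~ Database,
  value.var = "gene_name",
  fun.aggregate = function(x) paste(x, collapse = ";")
)
```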