Converting an XML file to R


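I have a dataset from the HMDB Saliva Metabolites data, which comes as an XML file. I want to convert this XML file into a list of data frames in R, but I don't want every node to end up in the list.

Importing the file and converting it to a list: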

library(XML)
library("methods")
data <- xmlParse("D:/rout/to/my/downloaded/file/saliva_metabolites/saliva_metabolites.xml")

xml_data <- xmlToList(data)

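Now I'm not sure how to select specific nodes. My goal is to create a list of metabolites in which each metabolite holds its own list of items: for each <metabolite>, the <accession> as a string, the <name> as a string, and all <synonym> entries as a data frame.

I looked at the question "More direct way to create a list of data frames from XML file?", but the link to the data in that question no longer works, and I don't know how to apply it to my code.

I also tried selecting specific nodes with the code from the question "xml to R dataframe", but it did not work: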

x <- lapply(data["//metabolite"],XML:::xmlAttrsToDataFrame)

But this gives me an empty list:

> x
list()

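Any hints, references, or help would be greatly appreciated.

Solution

I'm not sure this is exactly what you're looking for, but here is a code sample for the first three metabolites and two of their child nodes.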

library(xml2)
library(magrittr)  # for the pipe operator %>%

doc <- read_xml("./temp/saliva_metabolites.xml")
# get the metabolite nodes (only the first three are used in this sample);
# d1 is the prefix xml2 assigns to the document's default namespace
met.nodes <- xml_find_all(doc, ".//d1:metabolite")[1:3]
# XPaths of the child nodes to collect, one data.frame each
# (only two in this sample)
xpath_child.v <- c("./d1:secondary_accessions/d1:accession", "./d1:synonyms/d1:synonym")
# names these data.frames should get in the list
child.names.v <- c("secondary_accessions", "synonyms")
# first, loop over met.nodes
L.sec_acc <- lapply(met.nodes, function(x) {
  # second, loop over the XPaths of the desired child nodes
  temp <- lapply(xpath_child.v, function(y) {
    xml_find_all(x, y) %>% xml_text() %>% data.frame(value = .)
  })
  # set their names
  names(temp) <- child.names.v
  temp
})
# name each list element after its metabolite
names(L.sec_acc) <- xml_find_first(met.nodes, ".//d1:name") %>% xml_text()

Output

# $`1-Methylhistidine`
# $`1-Methylhistidine`$secondary_accessions
# value
# 1   HMDB00001
# 2 HMDB0004935
# 3 HMDB0006703
# 4 HMDB0006704
# 5   HMDB04935
# 6   HMDB06703
# 7   HMDB06704
# 
# $`1-Methylhistidine`$synonyms
# value
# 1  (2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoic acid
# 2                                         1-Methylhistidine
# 3                                        Pi-methylhistidine
# 4      (2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoate
# 5                                         1 Methylhistidine
# 6                                        1-Methyl histidine
# 7                                        1-Methyl-histidine
# 8                                      1-Methyl-L-histidine
# 9                                                    1-MHis
# 10                                   1-N-Methyl-L-histidine
# 11                                      L-1-Methylhistidine
# 12                                    N1-Methyl-L-histidine
# 13                        1-Methylhistidine dihydrochloride
# 
# 
# $`2-Ketobutyric acid`
# $`2-Ketobutyric acid`$secondary_accessions
# value
# 1   HMDB00005
# 2 HMDB0006544
# 3   HMDB06544
# 
# $`2-Ketobutyric acid`$synonyms
# value
# 1                  2-Ketobutanoic acid
# 2                    2-Oxobutyric acid
# 3                3-Methyl pyruvic acid
# 4                   alpha-Ketobutyrate
# 5               alpha-Ketobutyric acid
# 6             alpha-oxo-N-Butyric acid
# 7                      2-Ketobutanoate
# 8                       2-Ketobutyrate
# 9                        2-Oxobutyrate
# 10                   3-Methyl pyruvate
# 11                      a-Ketobutyrate
# 12                  a-Ketobutyric acid
# 13                      a-ketobutyrate
# 14                  a-ketobutyric acid
# 15                    a-oxo-N-Butyrate
# 16                a-oxo-N-Butyric acid
# 17                alpha-oxo-N-Butyrate
# 18                    a-oxo-N-butyrate
# 19                a-oxo-N-butyric acid
# 20                     2-oxo-Butanoate
# 21                 2-oxo-Butanoic acid
# 22                      2-oxo-Butyrate
# 23                  2-oxo-Butyric acid
# 24                    2-oxo-N-Butyrate
# 25                2-oxo-N-Butyric acid
# 26                      2-Oxobutanoate
# 27                  2-Oxobutanoic acid
# 28                    3-Methylpyruvate
# 29                3-Methylpyruvic acid
# 30                   a-keto-N-Butyrate
# 31               a-keto-N-Butyric acid
# 32                       a-Oxobutyrate
# 33                   a-Oxobutyric acid
# 34               alpha-keto-N-Butyrate
# 35           alpha-keto-N-Butyric acid
# 36               alpha-Ketobutric acid
# 37                   alpha-Oxobutyrate
# 38               alpha-Oxobutyric acid
# 39                     Methyl-pyruvate
# 40                 Methyl-pyruvic acid
# 41                   Propionyl-formate
# 42               Propionyl-formic acid
# 43 alpha-Ketobutyric acid,sodium salt
# 
# 
# $`2-Hydroxybutyric acid`
# $`2-Hydroxybutyric acid`$secondary_accessions
# value
# 1 HMDB00008
# 
# $`2-Hydroxybutyric acid`$synonyms
# value
# 1                               2-Hydroxybutanoic acid
# 2                           alpha-Hydroxybutanoic acid
# 3                            alpha-Hydroxybutyric acid
# 4                                   2-Hydroxybutanoate
# 5                                    2-Hydroxybutyrate
# 6                                   a-Hydroxybutanoate
# 7                               a-Hydroxybutanoic acid
# 8                               alpha-Hydroxybutanoate
# 9                                   a-hydroxybutanoate
# 10                              a-hydroxybutanoic acid
# 11                                   a-Hydroxybutyrate
# 12                               a-Hydroxybutyric acid
# 13                               alpha-Hydroxybutyrate
# 14                                   a-hydroxybutyrate
# 15                               a-hydroxybutyric acid
# 16                              (RS)-2-Hydroxybutyrate
# 17                          (RS)-2-Hydroxybutyric acid
# 18                                 2-Hydroxy-butanoate
# 19                             2-Hydroxy-butanoic acid
# 20                               2-Hydroxy-DL-butyrate
# 21                           2-Hydroxy-DL-butyric acid
# 22                                2-Hydroxy-N-butyrate
# 23                            2-Hydroxy-N-butyric acid
# 24                                a-Hydroxy-N-butyrate
# 25                            a-Hydroxy-N-butyric acid
# 26                            alpha-Hydroxy-N-butyrate
# 27                        alpha-Hydroxy-N-butyric acid
# 28                               DL-2-Hydroxybutanoate
# 29                           DL-2-Hydroxybutanoic acid
# 30                                DL-a-Hydroxybutyrate
# 31                            DL-a-Hydroxybutyric acid
# 32                            DL-alpha-Hydroxybutyrate
# 33                        DL-alpha-Hydroxybutyric acid
# 34                   2-Hydroxybutyric acid,(R)-isomer
# 35              2-Hydroxybutyric acid,monosodium salt
# 36                  2-Hydroxybutyric acid,(+-)-isomer
# 37 2-Hydroxybutyric acid,monosodium salt,(+-)-isomer
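
A note on the d1: prefix used in the XPath expressions above, since it probably also explains the empty list in the question: the file declares a default namespace, so an unprefixed path such as //metabolite matches nothing. The sketch below shows this on a tiny inline document. The sample XML and its namespace URI are assumptions made for illustration, not copied from the real file.

```r
library(xml2)

# Hypothetical miniature of the HMDB file; the namespace URI is an assumption.
sample_xml <- '<hmdb xmlns="http://www.hmdb.ca">
  <metabolite>
    <accession>HMDB0000001</accession>
    <name>1-Methylhistidine</name>
  </metabolite>
</hmdb>'

doc <- read_xml(sample_xml)

# An unprefixed name only matches elements in no namespace: nothing is found.
n_plain <- length(xml_find_all(doc, "//metabolite"))

# xml2 registers the default namespace under the prefix "d1";
# xml_ns() prints the prefix-to-URI mapping.
xml_ns(doc)
n_prefixed <- length(xml_find_all(doc, "//d1:metabolite"))

# Alternative: strip the default namespace (this modifies doc in place),
# after which plain element names work.
xml_ns_strip(doc)
n_stripped <- length(xml_find_all(doc, "//metabolite"))

c(plain = n_plain, prefixed = n_prefixed, stripped = n_stripped)
```

Stripping the namespace is convenient for a one-off script; keeping the d1: prefix leaves the parsed document untouched.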

Another option:

### packages

library(XML)
library(data.table)
library(dplyr)

### parse the XML

xml <- xmlTreeParse("C://Users/.../saliva_metabolites/saliva_metabolites.xml", useInternalNodes = TRUE)

### get the context nodes (local-name() sidesteps the default namespace)

ns <- getNodeSet(xml, "//*[local-name()='metabolite']")

### rbind the results of a function that extracts the data, to build the df

df <- rbindlist(lapply(ns, function(x) {
  nm  <- xpathSApply(x, "(.//*[local-name()='name'])[1]", xmlValue)
  acc <- xpathSApply(x, "(.//*[local-name()='accession'])[1]", xmlValue)
  syn <- xpathSApply(x, "(.//*[local-name()='synonyms'])[1]/*", xmlValue)
  data.frame(name = nm, accession = acc, synonyms = paste(syn, collapse = '¤'))
}), fill = TRUE)

### put the synonyms of each row in a list (not mandatory);
### rows with no synonyms get NA instead of an empty vector

df$synonyms <- lapply(strsplit(as.character(df$synonyms), split = '¤'),
                      function(s) if (length(s)) trimws(s) else NA_character_)

### export the dfs (one per metabolite)

outp <- df %>% group_split(xml_pos = row_number())

Output (first 3 results):

[[1]]
# A tibble: 1 x 4
  name              accession   synonyms   xml_pos
  <chr>             <chr>       <list>       <int>
1 1-Methylhistidine HMDB0000001 <chr [13]>       1

[[2]]
# A tibble: 1 x 4
  name               accession   synonyms   xml_pos
  <chr>              <chr>       <list>       <int>
1 2-Ketobutyric acid HMDB0000005 <chr [43]>       2

[[3]]
# A tibble: 1 x 4
  name                  accession   synonyms   xml_pos
  <chr>                 <chr>       <list>       <int>
1 2-Hydroxybutyric acid HMDB0000008 <chr [37]>       3
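
To get the shape described in the question (per metabolite: accession as a string, name as a string, synonyms as a data frame), one possible sketch with xml2 follows. It runs on a small inline document rather than the real file; the sample XML, its namespace URI, and the names sample_xml and metabolites are assumptions made for illustration.

```r
library(xml2)

# Hypothetical two-record miniature of the HMDB file (namespace URI assumed).
sample_xml <- '<hmdb xmlns="http://www.hmdb.ca">
  <metabolite>
    <accession>HMDB0000001</accession>
    <secondary_accessions><accession>HMDB00001</accession></secondary_accessions>
    <name>1-Methylhistidine</name>
    <synonyms>
      <synonym>Pi-methylhistidine</synonym>
      <synonym>1-MHis</synonym>
    </synonyms>
  </metabolite>
  <metabolite>
    <accession>HMDB0000008</accession>
    <name>2-Hydroxybutyric acid</name>
    <synonyms/>
  </metabolite>
</hmdb>'

doc <- read_xml(sample_xml)
met_nodes <- xml_find_all(doc, "//d1:metabolite")

metabolites <- lapply(met_nodes, function(node) {
  list(
    # "./" restricts the match to direct children, so the accessions
    # nested in <secondary_accessions> are not picked up here
    accession = xml_text(xml_find_first(node, "./d1:accession")),
    name      = xml_text(xml_find_first(node, "./d1:name")),
    # zero synonyms yields a data frame with zero rows
    synonyms  = data.frame(
      synonym = xml_text(xml_find_all(node, "./d1:synonyms/d1:synonym")),
      stringsAsFactors = FALSE
    )
  )
})

# index the list by primary accession
names(metabolites) <- vapply(metabolites, function(m) m$accession, character(1))

str(metabolites, max.level = 2)
```

For the real file, read_xml() would take the file path instead of the inline string; the rest should carry over as long as the element names match.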
