|
XMLStarlet or another XPath engine is the correct tool for this job. For instance, with

    <root>
      <item>
        <title>15:54:57 - George:</title>
        <description>Diane DeConn? You saw Diane DeConn!</description>
      </item>
      <item>
        <title>15:55:17 - Jerry:</title>
        <description>Something huh?</description>
      </item>
    </root>

...you can extract only the first title with the following:

    xmlstarlet sel -t -m '(//title)[1]' -v . -n <data.xml

The parentheses matter: a bare //title[1] selects the first title under every parent element, so it would match both items here.

Trying to use sed for this job is troublesome. For instance, the regex-based approaches won't work if the title has attributes; won't handle CDATA sections; won't correctly recognize namespace mappings; can't determine whether a portion of the XML document is commented out; won't unescape entity references (such as changing
Do you really have to use only those tools? They're not designed for XML processing, and although it's possible to get something that works OK most of the time, it will fail on edge cases such as encodings and line breaks. I recommend xml_grep:

    xml_grep 'job' jobs.xml --text_only

Which gives the output:

    programming

On Ubuntu/Debian, xml_grep is in the xml-twig-tools package. |
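To illustrate why both answers prefer a real parser, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which supports a limited XPath subset. It is not the method either answer uses; the sample document, the `lang` attribute, the description text, and the helper name `first_text` are all hypothetical and chosen only to exercise a few of the edge cases listed above (attributes, CDATA, comments, entity references). Namespaces are not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the first answer's document, with edge
# cases that defeat regex-based extraction added on purpose.
SAMPLE = """<root>
  <!-- <item><title>commented out</title></item> -->
  <item>
    <title lang="en"><![CDATA[15:54:57 - George:]]></title>
    <description>Elaine &amp; Kramer</description>
  </item>
  <item>
    <title>15:55:17 - Jerry:</title>
    <description>Something huh?</description>
  </item>
</root>"""


def first_text(xml_text, tag):
    """Return the text of the first <tag> element in document order, or None."""
    root = ET.fromstring(xml_text)   # the default parser discards comments
    node = root.find(f".//{tag}")    # find() returns only the first match
    return None if node is None else node.text


print(first_text(SAMPLE, "title"))        # 15:54:57 - George:
print(first_text(SAMPLE, "description"))  # Elaine & Kramer
```

The commented-out title is skipped, the CDATA wrapper and the attribute do not affect the match, and `&amp;` comes back as a plain `&`, all without any special handling in the calling code.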