How to split a string that contains HTML tags
I have this HTML string:
this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too
I want to split it into a result array like this:
this simple
the<b>html string</b>
text test
that<b>need</b>to<b>spl</b>it
it too
I tried it this way:
var string ='this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';
var regex = XRegExp('((?:[\\p{L}\\p{Mn}]+|)<\\s*.*?[^>]*>.*?<\/.*?>(?:[\\p{L}\\p{Mn}]+|))',"g");
result = string.split(regex);
It did not work, and I do not want to split word by word...
Solution
Use
string.split(/\s*(?<!\S)([^\s<>]+(?:\s+[^\s<>]+)*)(?!\S)\s*/).filter(Boolean);
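Because the pattern contains a capturing group, split() keeps each captured match as an element of the result array. The trailing filter(Boolean) then drops the empty strings that split() produces at the edges.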
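To illustrate the capturing-group behaviour on its own, here is a minimal sketch using a made-up input ('a1b2c' and '1a2' are only illustrative strings, not part of the question). It shows that split() discards separators by default, keeps them when they are captured, and leaves empty strings when a match sits at the start or end of the input.

```javascript
// Without a capturing group, the separators are dropped.
const withoutGroup = 'a1b2c'.split(/\d/);
console.log(withoutGroup); // [ 'a', 'b', 'c' ]

// With a capturing group, each captured separator is kept in the result.
const withGroup = 'a1b2c'.split(/(\d)/);
console.log(withGroup); // [ 'a', '1', 'b', '2', 'c' ]

// A match at the very start or end leaves an empty string at that edge.
const edges = '1a2'.split(/(\d)/);
console.log(edges); // [ '', '1', 'a', '2', '' ]

// filter(Boolean) removes those empty strings.
const cleaned = edges.filter(Boolean);
console.log(cleaned); // [ '1', 'a', '2' ]
```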
Regular expression explanation
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^\s<>]+                 any character except: whitespace (\n,
                             \r, \t, \f, and " "), '<', '>' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ")
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      [^\s<>]+                 any character except: whitespace (\n,
                               \r, \t, \f, and " "), '<', '>' (1 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
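To see what the capturing group actually matches, the same pattern can be run with matchAll instead of split. This is only a sketch for inspection: the names sample, plainRun and captured are chosen here for illustration, and the g flag is added because matchAll requires it. The group captures the runs of whitespace-separated words that contain no '<' or '>', so everything between those runs is a tagged chunk.

```javascript
const sample = 'this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';

// Same pattern as in the answer, with the g flag that matchAll requires.
const plainRun = /\s*(?<!\S)([^\s<>]+(?:\s+[^\s<>]+)*)(?!\S)\s*/g;

// Collect only group 1 of every match: the tag-free runs of words.
const captured = Array.from(sample.matchAll(plainRun), (m) => m[1]);
console.log(captured); // [ 'this simple', 'text test', 'it too' ]
```

The lookbehind (?<!\S) and lookahead (?!\S) are what stop a run from starting or ending in the middle of a token such as the<b>html.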
JavaScript:
const string = 'this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';
const regex = /\s*(?<!\S)([^\s<>]+(?:\s+[^\s<>]+)*)(?!\S)\s*/;
console.log(string.split(regex).filter(Boolean));
Output:
[
"this simple","the<b>html string</b>","text test","that<b>need</b>to<b>spl</b>it","it too"
]
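The regex above relies on a lookbehind assertion, which older JavaScript engines do not support. As a possible alternative for such environments, the sketch below groups whitespace-separated tokens instead. Note that splitByMarkup is a hypothetical helper written for this illustration and is not part of the original answer. It also assumes that collapsing each run of whitespace into a single space is acceptable, which the regex version does not do.

```javascript
// Hypothetical helper (not from the original answer): joins consecutive
// tokens that are all plain, or all contain '<' or '>', into one chunk.
function splitByMarkup(input) {
  // Break the input into whitespace-separated tokens, dropping empties.
  const tokens = input.split(/\s+/).filter(Boolean);
  const groups = [];
  let lastTagged = null;

  for (const token of tokens) {
    // A token is "tagged" when it contains any part of an HTML tag.
    const tagged = /[<>]/.test(token);
    if (groups.length > 0 && tagged === lastTagged) {
      // Same kind as the previous token: extend the current chunk.
      groups[groups.length - 1] += ' ' + token;
    } else {
      // Kind changed (or first token): start a new chunk.
      groups.push(token);
    }
    lastTagged = tagged;
  }
  return groups;
}

const html = 'this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';
console.log(splitByMarkup(html));
```

For the string from the question this produces the same five chunks as the regex solution.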