Using SED to replace specific patterns within parentheses?


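I'm running into some trouble... I'm trying to process the following text with a Bash script (sed in particular). Other approaches are welcome too, of course! But I'd prefer a Bash solution...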

Tricky input:

("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"

Desired output:

("a"|"b"|"c")."ABC".("e"|"f")."EF"

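Basically, what I think I want to do is replace the string "|" (quote, pipe, quote) with nothing, but restrict the change to text outside any existing parentheses.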

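The problem gets even crazier with the different forms of text input in my dataset: the combinations of parenthesized and non-parenthesized blocks (separated by .) vary a lot.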
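Since sed was the preferred tool, here is a minimal sed-only sketch of the requirement above (not from the original thread). It assumes parentheses are always balanced and never nested; `strip_outside_parens` is a hypothetical helper name. The `:a` ... `ta` loop deletes one "|" per pass, but only a "|" whose prefix consists of complete `(...)` groups and paren-free text, i.e. one that sits outside parentheses, and repeats until none is left, so it should cope with any mix of parenthesized and plain blocks.

```shell
#!/bin/bash
# Hypothetical helper: delete every "|" (quote, pipe, quote) that lies
# outside parentheses. Assumes parentheses are balanced and never nested.
strip_outside_parens() {
    # Each pass removes one "|" whose prefix is made only of complete (...)
    # groups and paren-free text, i.e. one that is outside any parentheses.
    # "ta" jumps back to label "a" as long as a substitution was made.
    sed -E -e ':a' \
           -e 's/^(([^()]*\([^()]*\))*[^()]*)"[|]"/\1/' \
           -e 'ta'
}

strip_outside_parens <<< '("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"'
# Prints: ("a"|"b"|"c")."ABC".("e"|"f")."EF"
```

Separate `-e` arguments are used for the label and the branch so the script does not rely on GNU sed accepting `;` after a label.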

Thanks in advance.

What I tried with sed:

gsed -E "s/(\.\"[[:graph:]]+)\"\|\"/\1/g" input.txt

The output I got is:

("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."EF"

It looks like I only get part of the desired output... the substitution only takes effect in a limited range (the last block)...
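To see why only the last block changes, here is a diagnostic sketch (not part of the original post) that wraps whatever the attempted regex matches in `[ ]` instead of editing it. It uses plain `sed -E` rather than `gsed`, and writes the literal pipe as `[|]`, which is equivalent to `\|` here.

```shell
#!/bin/bash
# Diagnostic: show the text matched by the attempted regex, wrapped in [ ].
# "&" in the replacement stands for the whole match.
sed -E 's/(\."[[:graph:]]+)"[|]"/[&]/g' \
    <<< '("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"'
# Prints: ("a"|"b"|"c")[."A"|"B"|"C".("e"|"f")."E"|"]F"
```

Because `[[:graph:]]+` is greedy and also matches `"`, `|`, `(` and `.`, the single match runs from the first `."` to the last "|" on the line. `\1` then restores everything except that last "|", and `/g` has nothing left to match, which yields exactly the partial output shown above.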

Solution

Would you please try the following:

#!/bin/bash

awk 'BEGIN {FS = OFS = "."}                     # use "." as a field separator
{
    for (i = 1; i <= NF; i++) {                 # loop over the fields
        if ($i !~ "^\\(.+\\)$") {               # if the field is not enclosed with "(" and ")"
            gsub("\"\\|\"","",$i)             # then remove "|"s
        }
    }
    print
}' <<< '("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"'

Output:

("a"|"b"|"c")."ABC".("e"|"f")."EF"

[Explanation]

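The BEGIN {} block is executed only once, before the input is processed. It is useful for initializing variables.
Since the awk variable FS is set to ".", each input line is automatically split on ".". Then $1 is assigned the first field ("a"|"b"|"c"), $2 is assigned the second field "A"|"B"|"C", and so on. The awk variable NF is set to the number of fields (4 in this case).
The for loop for (i = 1; i <= NF; i++) iterates over the fields to examine $1, $2, ... in turn.
The regex "^\\(.+\\)$" matches if the variable $i, the value of the i-th field, starts with ( and ends with ). The operator !~ negates the match result, so the if condition is met for fields not enclosed in parentheses, such as "A"|"B"|"C".
The function gsub("\"\\|\"","",$i) removes every occurrence of the substring "|" in $i. The character " must be escaped with \, and | must be escaped with \\ (the string literal turns \\ into a single \, so the regex engine sees \|). This can make the code obscure and less readable.
The final print is shorthand for print $0, which prints the modified line: the fields $1, $2, ... $4 joined by OFS, which is also set to ".".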
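To avoid the double escaping mentioned above, the same loop can be written with awk regex constants (`/.../`), where each special character is escaped only once. A sketch under that assumption; the bracket expressions `[(]`, `[)]` and `[|]` stand in for the escaped characters, and the wrapper name `strip_unparenthesized` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: same logic as the answer above, but written with
# awk regex constants (/.../) instead of strings.
strip_unparenthesized() {
    awk 'BEGIN { FS = OFS = "." }
    {
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[(].+[)]$/)      # field is not wrapped in ( )
                gsub(/"[|]"/, "", $i)    # delete every "|" in that field
        print
    }'
}

strip_unparenthesized <<< '("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"'
# Prints: ("a"|"b"|"c")."ABC".("e"|"f")."EF"
```

Since it only uses `gsub()` and regex constants, this should behave the same in gawk, mawk or any POSIX awk.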

Assumptions/understanding:

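Fields are delimited by periods
Fields wrapped in parentheses are to be left unchanged
All other fields keep a leading/trailing double quote, while all other double quotes, as well as the pipes, are to be removed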

Sample data:

$ cat pipes.dat
("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"
"j"|"K"|"L"."m"|"n"|"o"|"p".("x"|"y"|"z")

An awk idea (note that gensub() is a GNU awk extension):

awk '
BEGIN { FS=OFS="." }                                      # define input/output field separator as a period

      { printf "############\nbefore: %s\n",$0            # print a record separator and the current input line;
                                                          # solely for display purposes; this line can
                                                          # be removed/commented-out once logic is verified

        for (i=1; i<=NF; i++)                             # loop through fields
            if ( $i !~ "^[(].*[)]$" )                     # if field does not start/end with parens then ...
                $i="\"" gensub(/"|\|/,"","g",$i) "\""  # replace field with a new double quote (+) modified string
                                                          # whereby all double quotes and pipes are removed (+)
                                                          # a new ending double quote

        printf "after : %s\n",$0                          # print the newly modified line;
                                                          # can be replaced with "print" once logic is verified
      }
' pipes.dat                                               # read data from file; to read from a variable remove this line and ...
#' <<< "${variable_name}"                                 # uncomment this line

The above generates:

############
before: ("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"
after : ("a"|"b"|"c")."ABC".("e"|"f")."EF"
############
before: "j"|"K"|"L"."m"|"n"|"o"|"p".("x"|"y"|"z")
after : "jKL"."mnop".("x"|"y"|"z")

After removing the comments and replacing the printf calls with print:

awk '
BEGIN { FS=OFS="." }
      { for (i=1; i<=NF; i++)
            if ( $i !~ "^[(].*[)]$" )
                $i="\"" gensub(/"|\|/,"","g",$i) "\""
        print
      }
' pipes.dat

Produces:

("a"|"b"|"c")."ABC".("e"|"f")."EF"
"jKL"."mnop".("x"|"y"|"z")
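`gensub()` exists only in GNU awk. A portable sketch of the same idea for other awks: copy the field, strip quotes and pipes from the copy with `gsub()`, then re-wrap it in double quotes. The helper name `requote_fields` is hypothetical, and the two sample lines are fed with `printf` instead of reading `pipes.dat`.

```shell
#!/bin/bash
# Hypothetical helper: POSIX awk variant of the gensub() answer.
requote_fields() {
    awk '
    BEGIN { FS = OFS = "." }                 # fields are period-delimited
    {
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[(].*[)]$/) {        # leave (...) fields untouched
                s = $i
                gsub(/["|]/, "", s)          # drop every double quote and pipe
                $i = "\"" s "\""             # re-add the surrounding quotes
            }
        print
    }'
}

printf '%s\n' \
    '("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"' \
    '"j"|"K"|"L"."m"|"n"|"o"|"p".("x"|"y"|"z")' | requote_fields
# Prints:
# ("a"|"b"|"c")."ABC".("e"|"f")."EF"
# "jKL"."mnop".("x"|"y"|"z")
```

To process the file instead, call it as `requote_fields < pipes.dat`.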
