How to sort lines within data blocks in a file: moving lines to different positions inside each data block based on the index in a column
I have data in a file, arranged as follows. Only two data blocks/iterations are shown here.
21 ! <-- This is the number of lines of data in the data block/iteration. It never changes.
Linkages. Iteration:1_1010 ! <-- This number does not always increase by 5 like in this example, but always increases.
A 1.010 -3.582 -3.135
B 0.730 -4.428 -3.854
B -3.883 4.671 0.010
A -0.223 2.522 -4.893
B 2.769 4.634 0.179
B -2.024 -3.640 -1.032
A 4.613 3.914 1.567
B 2.746 -0.545 1.430
B -0.532 3.380 -2.107
C 3.944 2.513 -5.172
C -4.669 1.056 2.747
C 0.645 0.001 -3.737
C -2.875 -1.233 -0.538
C 4.279 -5.187 -2.820
C 1.067 -2.279 2.021
C 2.667 -1.558 0.588
C 3.628 -0.025 2.464
C -0.023 1.717 1.175
C 0.925 -1.548 2.273
C 1.152 2.914 1.039
C 0.878 -0.445 -0.948
21
Linkages. Iteration:1_1015
A 1.010 -3.582 -3.135
B 0.730 -4.428 -3.854
B -3.883 4.671 0.010
A -0.223 2.522 -4.893
B 2.769 4.634 0.179
B -2.024 -3.640 -1.032
A 4.613 3.914 1.567
B 2.746 -0.545 1.430
B -0.532 3.380 -2.107
C 3.944 2.513 -5.172
C -4.669 1.056 2.747
C 0.645 0.001 -3.737
C -2.875 -1.233 -0.538
C 4.279 -5.187 -2.820
C 1.067 -2.279 2.021
C 2.667 -1.558 0.588
C 3.628 -0.025 2.464
C -0.023 1.717 1.175
C 0.925 -1.548 2.273
C 1.152 2.914 1.039
C 0.878 -0.445 -0.948
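The layout above can be read one block at a time. Each block is a count line, a "Linkages." header, and then exactly that many records. Below is a minimal Python sketch of such a reader. It assumes the count is always the first whitespace-separated field of its line and that blank lines between blocks can be skipped. The function name read_blocks is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# (count line, "Linkages. Iteration:..." header line, data records)
Block = Tuple[str, str, List[str]]


def read_blocks(lines: Iterable[str]) -> Iterator[Block]:
    """Split the file's lines into data blocks/iterations.

    Assumption: each block starts with a line whose first field is the
    number of data records (e.g. "21 ! <-- ..."), followed by one header
    line and then exactly that many records.
    """
    it = iter(lines)
    for count_line in it:
        if not count_line.strip():
            continue  # assumption: blank lines between blocks can be ignored
        count = int(count_line.split()[0])
        # islice pulls the header plus `count` records from the same iterator,
        # so the outer loop resumes at the next block's count line
        chunk = list(islice(it, count + 1))
        if len(chunk) != count + 1:
            raise ValueError(f"truncated block after {count_line.strip()!r}")
        yield count_line, chunk[0], chunk[1:]
```

For example, `for count_line, header, records in read_blocks(open("file")):` would give each iteration's 21 records as a list. Lines read this way keep their trailing newline.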
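What I need to do is redistribute the "C" lines. Specifically, I need to split the "C" lines into blocks of four. Then I need to move the first block of C lines below the first set of "ABB" lines, the second block below the second set, and so on. Here is an example for one data block/iteration (I want to do exactly the same thing to every data block/iteration in the file):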
21
Linkages. Iteration:1_1010
A 1.010 -3.582 -3.135
B 0.730 -4.428 -3.854
B -3.883 4.671 0.010
C 3.944 2.513 -5.172
C -4.669 1.056 2.747
C 0.645 0.001 -3.737
C -2.875 -1.233 -0.538
A -0.223 2.522 -4.893
B 2.769 4.634 0.179
B -2.024 -3.640 -1.032
C 4.279 -5.187 -2.820
C 1.067 -2.279 2.021
C 2.667 -1.558 0.588
C 3.628 -0.025 2.464
A 4.613 3.914 1.567
B 2.746 -0.545 1.430
B -0.532 3.380 -2.107
C -0.023 1.717 1.175
C 0.925 -1.548 2.273
C 1.152 2.914 1.039
C 0.878 -0.445 -0.948
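The transformation can be stated independently of any tool. Collect each A line together with the B lines that follow it. Cut the C lines into consecutive chunks of four. Then append chunk k after group k.

Below is a minimal Python sketch of that idea for one block's records. It assumes every B line comes after some A line. It raises ValueError unless there are exactly four C lines per A group. The name reorder_block is hypothetical.

```python
from typing import List


def reorder_block(records: List[str], chunk_size: int = 4) -> List[str]:
    """Place the k-th chunk of C lines right after the k-th A/B/B group."""
    groups: List[List[str]] = []  # each group: one A line plus the B lines after it
    c_lines: List[str] = []
    for line in records:
        label = line.split()[0]
        if label == "C":
            c_lines.append(line)
        elif label == "A":
            groups.append([line])
        else:
            groups[-1].append(line)  # assumption: no B line precedes the first A
    if len(c_lines) != chunk_size * len(groups):
        raise ValueError("expected exactly %d C lines per A group" % chunk_size)
    out: List[str] = []
    for k, group in enumerate(groups):
        out.extend(group)
        out.extend(c_lines[k * chunk_size:(k + 1) * chunk_size])
    return out
```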
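I have been trying to do this in bash using "sort", but haven't made much progress. I found that the general way to sort by a column index (such as my first column) is: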
sort -n -k1 file
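Sorting on the first column alone cannot produce this layout, because it only groups lines by letter: all A lines, then all B lines, then all C lines. A sort can still do the job if each line gets a computed key instead. The key is the index of the group the line belongs to, then whether it is a C line, then its original position.

Below is a Python sketch of that decorate-and-sort idea. It assumes C lines are assigned to groups in chunks of four. Both function names are hypothetical.

```python
from typing import List, Tuple


def sort_by_first_column(records: List[str]) -> List[str]:
    """Stable sort on the letter in column 1 only: groups A, then B, then C."""
    return sorted(records, key=lambda line: line.split()[0])


def reorder_by_computed_key(records: List[str], chunk_size: int = 4) -> List[str]:
    """Sort on (group index, is-C flag, original position) instead of the letter."""
    keys: List[Tuple[int, int, int]] = []
    group = -1   # index of the most recent A line
    c_seen = 0   # number of C lines seen so far
    for pos, line in enumerate(records):
        label = line.split()[0]
        if label == "C":
            keys.append((c_seen // chunk_size, 1, pos))  # C chunk n joins group n
            c_seen += 1
        else:
            if label == "A":
                group += 1
            keys.append((group, 0, pos))  # A/B lines stay in their own group
    order = sorted(range(len(records)), key=lambda i: keys[i])
    return [records[i] for i in order]
```

Including the original position in the key keeps lines in their input order within each group, so the result does not depend on how ties would otherwise be broken.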
I also found this post (https://unix.stackexchange.com/questions/99582/sorting-blocks-of-lines), where the second answer uses "split" to break the file into chunks of four lines:
split -a 6 -l 4 input_file my_prefix_
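split -l 4 writes each run of four lines to a separate file. Inside a script, the same chunking can be done in memory, which avoids juggling temporary files for every data block. Below is a small Python sketch; chunks is a hypothetical helper name. As with split, the last piece may be shorter.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunks(items: Sequence[T], size: int) -> List[List[T]]:
    """Return consecutive pieces of `size` items; the last piece may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]
```

For one 21-line block, chunks([line for line in records if line.split()[0] == "C"], 4) would give the three four-line C chunks.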
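But I can't figure out how to move the four lines within each data block/iteration. If anyone knows of a resource that explains this, that would be great.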
Solutions
Using any awk in any shell on every Unix box:
$ cat tst.awk
# Buffer A/B/C records; any other line (count or header) flushes them reordered
$1 ~ /^[ABC]$/ {
    vals[++numVals] = $0
    next
}
{
    prtVals()
    print
}
END { prtVals() }

# Extra parameters after the real ones are awk's idiom for local variables
function prtVals(    row, valNr, blocks, numBlocks, blockNr, numCs) {
    if ( numVals != 0 ) {
        for (valNr=1; valNr<=numVals; valNr++) {
            row = vals[valNr]
            split(row,f)
            if ( f[1] == "A" ) {
                ++numBlocks                 # each A starts a new output group
            }
            if ( f[1] == "C" ) {
                if ( (++numCs % 4) == 1 ) {
                    blockNr++               # every 4th C starts filling the next group
                }
                blocks[blockNr] = blocks[blockNr] row ORS
            }
            else {
                blocks[numBlocks] = blocks[numBlocks] row ORS   # A/B stay with their A
            }
        }
        for (blockNr=1; blockNr<=numBlocks; blockNr++) {
            printf "%s",blocks[blockNr]
        }
        delete vals
        numVals = 0
    }
}
$ awk -f tst.awk file
21 ! <-- This is the number of lines of data in the data block/iteration. It never changes.
Linkages. Iteration:1_1010 ! <-- This number does not always increase by 5 like in this example, but always increases.
A 1.010 -3.582 -3.135
B 0.730 -4.428 -3.854
B -3.883 4.671 0.010
C 3.944 2.513 -5.172
C -4.669 1.056 2.747
C 0.645 0.001 -3.737
C -2.875 -1.233 -0.538
A -0.223 2.522 -4.893
B 2.769 4.634 0.179
B -2.024 -3.640 -1.032
C 4.279 -5.187 -2.820
C 1.067 -2.279 2.021
C 2.667 -1.558 0.588
C 3.628 -0.025 2.464
A 4.613 3.914 1.567
B 2.746 -0.545 1.430
B -0.532 3.380 -2.107
C -0.023 1.717 1.175
C 0.925 -1.548 2.273
C 1.152 2.914 1.039
C 0.878 -0.445 -0.948
21
Linkages. Iteration:1_1015
A 1.010 -3.582 -3.135
B 0.730 -4.428 -3.854
B -3.883 4.671 0.010
C 3.944 2.513 -5.172
C -4.669 1.056 2.747
C 0.645 0.001 -3.737
C -2.875 -1.233 -0.538
A -0.223 2.522 -4.893
B 2.769 4.634 0.179
B -2.024 -3.640 -1.032
C 4.279 -5.187 -2.820
C 1.067 -2.279 2.021
C 2.667 -1.558 0.588
C 3.628 -0.025 2.464
A 4.613 3.914 1.567
B 2.746 -0.545 1.430
B -0.532 3.380 -2.107
C -0.023 1.717 1.175
C 0.925 -1.548 2.273
C 1.152 2.914 1.039
C 0.878 -0.445 -0.948
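For readers more comfortable outside awk, the buffering logic of tst.awk can be ported almost line for line:

- A/B/C records are buffered.
- Any other line (the count or the Linkages header) flushes the buffer in reordered form and is then printed.
- At the end, whatever is left is flushed.

Below is a Python sketch of that port. The function names are hypothetical. Like the awk version, it outputs only as many groups as there are A lines.

```python
from typing import Dict, Iterable, Iterator, List

RECORD_LABELS = {"A", "B", "C"}


def _reordered(buffered: List[str]) -> List[str]:
    """Port of prtVals(): regroup the buffered A/B/C records."""
    blocks: Dict[int, List[str]] = {}
    num_blocks = 0  # awk: numBlocks, one per A line
    block_nr = 0    # awk: blockNr, advances every 4 C lines
    num_cs = 0      # awk: numCs
    for row in buffered:
        label = row.split()[0]
        if label == "A":
            num_blocks += 1
        if label == "C":
            num_cs += 1
            if num_cs % 4 == 1:
                block_nr += 1
            blocks.setdefault(block_nr, []).append(row)
        else:
            blocks.setdefault(num_blocks, []).append(row)
    # like the awk printf loop: emit groups 1..numBlocks in order
    return [row for nr in range(1, num_blocks + 1) for row in blocks.get(nr, [])]


def reorder_stream(lines: Iterable[str]) -> Iterator[str]:
    """Port of the main awk rules: buffer records, flush on any other line."""
    buffered: List[str] = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] in RECORD_LABELS:  # awk: $1 ~ /^[ABC]$/
            buffered.append(line)
            continue
        yield from _reordered(buffered)  # awk: prtVals()
        buffered = []                    # awk: delete vals; numVals = 0
        yield line                       # awk: print
    yield from _reordered(buffered)      # awk: END { prtVals() }
```

For example, after import sys, sys.stdout.writelines(reorder_stream(open("file"))) would write the reordered file. The lines keep their original newlines, which the sketch passes through unchanged.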
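Here's one in GNU awk. It's only tested with your data. If the data isn't perfect, I don't know what will happen; you may get empty lines. Also, it uses getline, which can be problematic, and I didn't check its return value or check pretty much anything else. On the bright side, it's a good starting point for practicing your awk skills... :D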
$ gawk '
BEGIN {
    o="A,B,B,C,C,C,C"                 # the predefined output order for one group
    n=split(o,p,/,/)                  # split into the p array to be fetched
}
{
    c=d=$1                            # count of data lines in this block
    while (c-- >= 0) {                # read the header line plus d data lines
        getline
        a[$1][++i[$1]]=$0             # hash records into a 2-D array by first field
    }
    print d
    print a["Linkages."][1]           # this may help in understanding the a array
    do {                              # once the required amount is hashed
        for(j=1;j<=n;j++) {           # use the index order defined in BEGIN
            print a[p[j]][++k[p[j]]]  # and output
        }
    } while((d-=n)>0)
    delete a;delete i;delete k        # reset for the next batch
}' file
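The gawk answer hashes every record by its first field and then emits records in a fixed label order. The same idea can be written with one first-in-first-out queue per label.

The Python sketch below uses the seven-label order A,B,B,C,C,C,C that one output group needs; reorder_by_pattern and ORDER are hypothetical names. It also adds the count checks the answer says it skips: the record count must be a multiple of the order length, and no queue may run dry.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Sequence

# One output group: an A record, two B records, then four C records.
ORDER = ("A", "B", "B", "C", "C", "C", "C")


def reorder_by_pattern(records: List[str], order: Sequence[str] = ORDER) -> List[str]:
    """Emit records in `order` repeatedly, taking each label's records first-in-first-out."""
    if len(records) % len(order):
        raise ValueError("record count is not a multiple of the pattern length")
    queues: Dict[str, Deque[str]] = defaultdict(deque)
    for line in records:
        queues[line.split()[0]].append(line)  # like a[$1][++i[$1]] = $0
    out: List[str] = []
    for _ in range(len(records) // len(order)):
        for label in order:
            if not queues[label]:
                raise ValueError(f"ran out of {label!r} records")
            out.append(queues[label].popleft())  # like a[p[j]][++k[p[j]]]
    return out
```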
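This solution works by splitting each 21-line input data block into a two-dimensional array, where each entry of the first dimension holds 7 lines (A,B,B,C,C,C,C):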
blocks[1][ 1] = A record
blocks[1][2-3] = B records
blocks[1][4-7] = C records
blocks[2][ 1] = A record
blocks[2][2-3] = B records
blocks[2][4-7] = C records
blocks[3][ 1] = A record
blocks[3][2-3] = B records
blocks[3][4-7] = C records
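The same three-by-seven layout can be built explicitly as a grid. A and B records fill the first three columns row by row. The C records then backfill the remaining four columns, four per row.

Below is a Python sketch that assumes exactly the sample's shape (three groups of seven). Python indexes from 0, so the layout's columns 1, 2-3 and 4-7 become 0, 1-2 and 3-6 here. The name backfill is hypothetical.

```python
from typing import List, Optional

ROWS, COLS = 3, 7  # assumption: the sample's 3 groups x (A + 2 B + 4 C)


def backfill(records: List[str]) -> List[str]:
    """Fill a ROWS x COLS grid (A/B first, then C), then read it row by row."""
    grid: List[List[Optional[str]]] = [[None] * COLS for _ in range(ROWS)]
    row, b_col, c_seen = -1, 1, 0
    for line in records:
        label = line.split()[0]
        if label == "A":
            row += 1
            grid[row][0] = line                      # layout: blocks[n][1]
            b_col = 1
        elif label == "B":
            grid[row][b_col] = line                  # layout: blocks[n][2-3]
            b_col += 1
        elif label == "C":
            grid[c_seen // 4][3 + c_seen % 4] = line  # layout: blocks[n][4-7]
            c_seen += 1
    if any(cell is None for cells in grid for cell in cells):
        raise ValueError("input did not fill the 3 x 7 layout")
    return [cell for cells in grid for cell in cells if cell is not None]
```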
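An awk solution using this 2-D array idea: we populate the array while processing the A and B records, then backfill the array with the C records (i.e., fill in the blanks):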
awk '
# function to print current array contents to stdout, then reset data structures for next block of lines
function print_blocks() {
    for (i=1; i<=a; i++)                # loop through first dimension indices
        for (j=1; j<=7; j++)            # loop through second dimension indices
            print blocks[i][j]
    delete blocks                       # clear array
    a=0                                 # reset first dimension index
    cblock=0                            # reset C block processing flag
}
NF == 1 {                               # single field on line, e.g., "21" ?
    print_blocks()                      # flush previously populated array
    print                               # print current line
    next                                # skip to next line
}
/Linkages/ {
    print                               # print current line
    next                                # skip to next line
}
$1 == "A" {                             # "A" record?
    blocks[++a][1]=$0                   # store current line in array
                                        # and reset second dimension indexes ...
    b=2                                 # for B records
    c=4                                 # for C records
    next
}
$1 == "B" {                             # "B" record?
    blocks[a][b++]=$0                   # store current line in array
    next
}
$1 == "C" {                             # "C" record?
    if (cblock==0) {                    # if first C record then:
        a=0                             # reset first dimension index
        cblock=1                        # set flag to skip this logic for rest of C records
    }
    if (c==4) a++                       # for each new set of 4x C records increment first dimension index
    blocks[a][c]=$0                     # store current line in array
    c++                                 # increment second dimension index but ...
    if (c>7) c=4                        # make sure second dimension index is always in the range 4-7
    next
}
END { print_blocks() }                  # flush the last set of array data to stdout
' data.txt
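Notes:
- relies on the input data matching the sample data (i.e., three A,B,B groups followed by twelve C records); if the input order differs, this code may not generate the desired output
- this is obviously hardcoded for the given sample (i.e., 21 lines of input data, and output blocks of 7 lines each in the order A,B,B,C,C,C,C)
- the code could be modified to handle a more dynamic set of input data (but isn't that true of all code?)
- the comments can be removed to declutter the code
- requires GNU awk for 2-D arrays (aka arrays of arrays)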
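The first note says the code depends on the input following the sample's order, so it can be worth checking that order before reordering. Below is a Python sketch of such a guard; matches_sample_layout is a hypothetical name. It assumes the sample's shape generalizes as k groups of A,B,B followed by four C lines per group, with no blank lines among the records.

```python
from typing import List


def matches_sample_layout(records: List[str], c_per_group: int = 4) -> bool:
    """True if the labels read A,B,B repeated k times, then k * c_per_group C's."""
    labels = [line.split()[0] for line in records]  # assumes no blank record lines
    k = labels.count("A")
    expected = ["A", "B", "B"] * k + ["C"] * (k * c_per_group)
    return k > 0 and labels == expected
```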
Run against the given sample data (data.txt), the above produces:
21
Linkages. Iteration:1_1010
A 1.010 -3.582 -3.135
B 0.730 -4.428 -3.854
B -3.883 4.671 0.010
C 3.944 2.513 -5.172
C -4.669 1.056 2.747
C 0.645 0.001 -3.737
C -2.875 -1.233 -0.538
A -0.223 2.522 -4.893
B 2.769 4.634 0.179
B -2.024 -3.640 -1.032
C 4.279 -5.187 -2.820
C 1.067 -2.279 2.021
C 2.667 -1.558 0.588
C 3.628 -0.025 2.464
A 4.613 3.914 1.567
B 2.746 -0.545 1.430
B -0.532 3.380 -2.107
C -0.023 1.717 1.175
C 0.925 -1.548 2.273
C 1.152 2.914 1.039
C 0.878 -0.445 -0.948
21
Linkages. Iteration:1_1015
A 1.010 -3.582 -3.135
B 0.730 -4.428 -3.854
B -3.883 4.671 0.010
C 3.944 2.513 -5.172
C -4.669 1.056 2.747
C 0.645 0.001 -3.737
C -2.875 -1.233 -0.538
A -0.223 2.522 -4.893
B 2.769 4.634 0.179
B -2.024 -3.640 -1.032
C 4.279 -5.187 -2.820
C 1.067 -2.279 2.021
C 2.667 -1.558 0.588
C 3.628 -0.025 2.464
A 4.613 3.914 1.567
B 2.746 -0.545 1.430
B -0.532 3.380 -2.107
C -0.023 1.717 1.175
C 0.925 -1.548 2.273
C 1.152 2.914 1.039
C 0.878 -0.445 -0.948