Processing files with the Julia language

I am a beginner in Julia (and in scripting in general).

I have a text file with 4 columns:

1 5.4 9.5 19.5
2 5.4 9.4 20.6
2 6.2 9.6 18.3
1 9.1 0.5 17.2
2 8.5 1.4 19.6
2 8.4 0.6 24.1
etc.

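I don't know how, in Julia, to replace certain values in a line or add new ones based on the existing column pattern 1 2 2, 1 2 2. For example, I would like to add a column with the letters C and O (C when the first column is 1, O when it is 2). After the column with C and O, I want to add another new column in which the first 1 2 2 group is labeled with the number 4, the next group with the number 5, and so on. This is the result I have in mind: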

C 4 1 5.4 9.5 19.5
O 4 2 5.4 9.4 20.6
O 4 2 6.2 9.6 18.3
C 5 1 9.1 0.5 17.2
O 5 2 8.5 1.4 19.6
O 5 2 8.4 0.6 24.1

Thank you for your help.

Kasia

Solution

String processing in Julia is quite simple. You could write a function that takes the input and output file names, like this:

function munge_file(input_path::AbstractString, output_path::AbstractString)
    # argument names avoid shadowing the built-in `in` function
    # open the output file for writing
    open(output_path, "w") do out_io
        # open the input file for reading
        open(input_path, "r") do in_io
            # and process the contents
            munge_file(in_io, out_io)
        end
    end
end
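The `open(...) do ... end` form used in the function above closes the file automatically once the block finishes, even if the body throws. As a minimal sketch of what that buys you, the snippet below writes one line with the do-block form and appends a second with the roughly equivalent explicit `try`/`finally` form. The temporary file and the names `path`, `io`, and `lines` are illustrative assumptions, not part of the original answer.

```julia
# Illustrative temporary file; any writable path would do.
path = tempname()

# do-block form: the file is closed when the block exits, even on error.
open(path, "w") do io
    println(io, "1 5.4 9.5 19.5")
end

# Roughly equivalent explicit form, here appending a second line.
io = open(path, "a")
try
    println(io, "2 5.4 9.4 20.6")
finally
    close(io)   # always runs, so the handle is not leaked
end

lines = readlines(path)
foreach(println, lines)
rm(path)        # clean up the temporary file
```

The do-block form is usually preferable because the cleanup cannot be forgotten.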

Now the inner call to munge_file has to do the actual work (this is not particularly optimized, but it should be fairly straightforward):

function munge_file(input::IO,io::IO = IOBuffer())
    # initialize the pattern index
    pattern_index = 3
    # iterate over each line of the input
    for line in eachline(input)
        # skip empty lines
        isempty(line) && continue
        # split the current line on whitespace
        # (unlike split(line,' '), this tolerates trailing or repeated spaces)
        parts = split(line)
        # this line doesn't conform to the specified input pattern
        # might be better to throw an error here
        length(parts) == 4 || continue
        # this line starts a new pattern if the first field is a 1
        is_start = parse(Int,parts[1]) == 1
        # increment the counter (for the second output column)
        pattern_index += is_start
        # first column depends on whether a 1 2 2 pattern starts here or not
        print(io,is_start ? 'C' : 'O')
        print(io,' ')
        # print the pattern counter
        print(io,pattern_index)
        print(io,' ')
        # print the original line
        println(io,line)
    end
    return io
end
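Because the inner method works on any `IO` and defaults its output to an `IOBuffer`, the logic can be tried without touching the file system. The sketch below also follows up on the comment in the code that throwing an error might be better than silently skipping malformed lines. `munge_strict` is a hypothetical name for that variant, and the choice of `ArgumentError` and the error message are assumptions.

```julia
# Hypothetical strict variant of the inner munge_file: same labeling logic,
# but a line without exactly 4 fields raises an error instead of being skipped.
function munge_strict(input::IO, io::IO = IOBuffer())
    pattern_index = 3                  # the first "1" line bumps this to 4
    for (lineno, line) in enumerate(eachline(input))
        parts = split(line)            # split on any run of whitespace
        isempty(parts) && continue     # skip blank lines
        length(parts) == 4 ||
            throw(ArgumentError("line $lineno: expected 4 fields, got $(length(parts))"))
        is_start = parse(Int, parts[1]) == 1
        pattern_index += is_start      # a Bool adds as 0 or 1
        println(io, is_start ? 'C' : 'O', ' ', pattern_index, ' ', join(parts, ' '))
    end
    return io
end

# In-memory check: read from one IOBuffer, collect the result from another.
sample = IOBuffer("1 5.4 9.5 19.5\n2 5.4 9.4 20.6\n\n1 9.1 0.5 17.2\n")
result = String(take!(munge_strict(sample)))
print(result)
```

Whether to skip or reject malformed lines depends on how much you trust the input. Failing loudly tends to be safer when the file feeds later computations.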

Using the code in the REPL produces the expected output:

shell> cat input.txt
1 5.4 9.5 19.5
2 5.4 9.4 20.6
2 6.2 9.6 18.3
1 9.1 0.5 17.2
2 8.5 1.4 19.6
2 8.4 0.6 24.1 

julia> munge_file("input.txt","output.txt")
IOStream(<file output.txt>)

shell> cat output.txt
C 4 1 5.4 9.5 19.5
O 4 2 5.4 9.4 20.6
O 4 2 6.2 9.6 18.3
C 5 1 9.1 0.5 17.2
O 5 2 8.5 1.4 19.6
O 5 2 8.4 0.6 24.1

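Assuming your file is input.txt, you could do this:

open("output.txt","w") do f
    println.(Ref(f), replace.(replace.(readlines("input.txt"), r"^1 " => "C "), r"^2 " => "O "))
end;

The dot (.) in the code above vectorizes a call, so that a function works on vectors rather than scalars. The replace function takes a String, a regular expression, and the new value. The ^ in the regular expression means "line starts with".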
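To see what the broadcast dots do step by step, here is a small in-memory sketch of the same idea. It uses a hard-coded vector in place of `readlines("input.txt")` and an `IOBuffer` in place of the output file. The names `lines`, `step1`, `step2`, `buf`, and `output` are illustrative assumptions.

```julia
# Stand-in for readlines("input.txt").
lines = ["1 5.4 9.5 19.5", "2 5.4 9.4 20.6", "2 6.2 9.6 18.3"]

# replace.(...) applies replace to every element of the vector;
# the regex and the pair are treated as scalars by broadcasting.
step1 = replace.(lines, r"^1 " => "C ")   # a leading "1 " becomes "C "
step2 = replace.(step1, r"^2 " => "O ")   # a leading "2 " becomes "O "

# Ref(buf) marks the stream as a scalar, so each string is printed to the same buffer.
buf = IOBuffer()
println.(Ref(buf), step2)
output = String(take!(buf))
print(output)
```

Note that this approach relabels the first column in place. It does not add the extra C/O column or the group counter from the question.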
