How to assign each column value to its name?


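I have a MetaData.csv file that contains a number of values used to run an analysis. What I want is to:
1- Read the column names and create variables named after them.
2- Put each column's value into the corresponding variable so that other commands can read it as a number: column_name=Its_value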

MetaData.csv:

MAF,HWE,Geno_Missing,Inds_Missing
0.05,1E-06,0.01,0.01

I wrote the following code, but it does not work well:

#!/bin/bash
Col_Names=$(head -n 1 MetaData.csv) # Cut header (comma sep)
Col_Names=$(echo ${Col_Names//,/ }) # Convert header to space sep
Col_Names=($Col_Names) # Convert header to an array 

for i in $(seq 1 ${#Col_Names[@]}); do
N="$(head -1 MetaData.csv | tr ',' '\n' | nl |grep -w "${Col_Names[$i]}" | tr -d " " | awk -F " " '{print $1}')";
${Col_Names[$i]}="$(cat MetaData.csv | cut -d"," -f$N | sed '1d')";
done

Output:

HWE=1E-06: command not found
Geno_Missing=0.01: command not found
Inds_Missing=0.01: command not found
cut: 2: No such file or directory
cut: 3: No such file or directory
cut: 4: No such file or directory
=: command not found

Expected output:

MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01

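Problems:

1- I want to use the array length (${#Col_Names[@]}), which is 5, as the last iteration, but array indices start at 0 (0-4), so the MAF column is not picked up by the loop. The loop also runs twice (once over 0-4 and again over 2-4!).
2- When I try to read the values from the variables (echo $MAF), they are empty!

Any solution is greatly appreciated.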
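Both problems come from the same loop. Bash arrays are zero-indexed, so seq 1 ${#Col_Names[@]} skips index 0 (MAF) and also visits index 4, which is empty (probably the source of the "=: command not found" line). In addition, a word such as ${Col_Names[$i]}=... is never treated as an assignment, because Bash only recognizes assignments whose name is written literally; after expansion, MAF=0.05 is just a word, so Bash tries to run it as a command. The following minimal sketch illustrates both points and one way around them using declare; it inlines the sample values as an assumption instead of reading MetaData.csv.

```shell
#!/bin/bash
# Sketch: why "${Col_Names[$i]}=value" does not assign, and one fix.
# Assumption: the sample header and values are inlined instead of read from MetaData.csv.
Col_Names=(MAF HWE Geno_Missing Inds_Missing)
Values=(0.05 1E-06 0.01 0.01)

# The expanded word "MAF=0.05" is run as a command, not treated as an assignment.
${Col_Names[0]}=${Values[0]} 2>/dev/null || echo "not an assignment: ${Col_Names[0]}=${Values[0]}"

# Iterate over the real indices (0..n-1) instead of 1..n.
for i in "${!Col_Names[@]}"; do
    # declare accepts a computed name, so this really creates MAF, HWE, ...
    declare "${Col_Names[i]}=${Values[i]}"
done

echo "MAF=$MAF HWE=$HWE Geno_Missing=$Geno_Missing Inds_Missing=$Inds_Missing"
```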

Solution

This produces the expected output you posted from the sample input you posted:

$ awk -F, -v OFS='=' 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i],$i}' MetaData.csv
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01

If this is not everything you need, please edit your question to clarify your requirements.
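The awk command prints name=value lines, while the question asks for actual shell variables. One possible way to get there, sketched below, is to read those lines back in Bash and assign each one with printf -v. It assumes the column names are valid Bash identifiers and the values contain no newlines; the temporary file created with mktemp only stands in for MetaData.csv so the example is self-contained.

```shell
#!/bin/bash
# Sketch: turn the awk "name=value" lines into real shell variables.
# Assumptions: column names are valid identifiers; values contain no newlines.
csv=$(mktemp)
printf '%s\n' 'MAF,HWE,Geno_Missing,Inds_Missing' '0.05,1E-06,0.01,0.01' > "$csv"

# Process substitution keeps the while loop in the current shell,
# so the variables still exist after the loop (a pipe would run it in a subshell).
while IFS='=' read -r name value; do
    printf -v "$name" '%s' "$value"     # assign $value to the variable named by $name
done < <(awk -F, -v OFS='=' 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i],$i}' "$csv")

rm -f "$csv"
echo "MAF=$MAF HWE=$HWE Geno_Missing=$Geno_Missing Inds_Missing=$Inds_Missing"
```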

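I don't think you can implement a robust CSV reader/parser in Bash, but you can write one that works, to some extent, with simple CSV files. For example, a very simple CSV reader implemented in Bash might look like this: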

#!/bin/bash

set -e

ROW_NUMBER='0'
HEADERS=()
while IFS=',' read -ra ROW; do
    if test "$ROW_NUMBER" == '0'; then
        for (( I = 0; I < ${#ROW[@]}; I++ )); do
            HEADERS["$I"]="${ROW[I]}"
        done
    else
        declare -A DATA_ROW_MAP
        for (( I = 0; I < ${#ROW[@]}; I++ )); do
            DATA_ROW_MAP[${HEADERS["$I"]}]="${ROW[I]}"
        done
# DEMO {
        echo -e "${DATA_ROW_MAP['Fnames']}\t${DATA_ROW_MAP['Inds_Missing']}"
# } DEMO
        unset DATA_ROW_MAP
    fi
    ROW_NUMBER=$((ROW_NUMBER + 1))
done

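Note that it has several shortcomings:

it only works with fields separated by , (truly "C"SV);
it cannot handle multi-line records;
it cannot handle escaped fields;
it always treats the first line as the header row.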
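The escaping limitation is easy to see with a quoted field that contains a comma. In the sketch below (the sample line is made up for illustration), IFS=, read splits inside the quotes, so a record with three CSV fields comes back as four array elements.

```shell
#!/bin/bash
# Sketch: IFS=, read ignores CSV quoting, so a quoted comma splits the field.
# The sample record is made up for illustration.
line='name,"Smith, John",42'
IFS=',' read -ra fields <<< "$line"

echo "field count: ${#fields[@]}"       # 4, although the CSV record has 3 fields
for f in "${fields[@]}"; do
    printf '[%s]\n' "$f"                # brackets show where each element starts and ends
done
```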

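This is why many commands can produce and consume NUL-delimited (\0) data: that control character cannot appear in file names or in Bash strings, so it is much easier to handle than commas and quotes. As for external commands, test is a Bash builtin, so this script does not actually start any external process (the check could also be rewritten with case if you prefer).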
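As a rough illustration of why NUL-delimited data is convenient, the sketch below reads records separated by \0 with read -d ''. Records containing commas or even embedded newlines come through intact; the three records are made-up examples.

```shell
#!/bin/bash
# Sketch: NUL-delimited records survive commas and embedded newlines.
# The records are made-up examples.
records=()
while IFS= read -r -d '' rec; do        # -d '' makes read stop at each NUL byte
    records+=("$rec")
done < <(printf '%s\0' 'plain' 'has,comma' $'two\nlines')

echo "record count: ${#records[@]}"     # 3
printf '<%s>\n' "${records[@]}"
```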

Usage example (with demo output):

./read-csv.sh < MetaData.csv

19.vcf.gz    0.01
20.vcf.gz
21.vcf.gz
22.vcf.gz

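I don't recommend using this parser at all; use a more CSV-oriented tool instead. Python is probably the easiest choice, or, if your favorite language is R as you mentioned, that may be another option for you: Run R script from command line.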

If I understand your requirements correctly, could you please try the following:

#!/bin/bash

nr=1                                    # initialize input line number to 1
while IFS=, read -r -a ary; do          # split the line on "," then assign "ary" to the fields
    if (( nr == 1 )); then              # handle the header line
        col_names=("${ary[@]}")         # assign column names
    else                                # handle the body lines
        for (( i = 0; i < ${#ary[@]}; i++ )); do
            printf -v "${col_names[i]}" '%s' "${ary[i]}"
                                        # assign the variable "${col_names[i]}" to the input field
        done
        # now you can access the values via its column name
        echo "Fnames=$Fnames"
        echo "MAF=$MAF"
        fname_list+=("$Fnames")         # create a list of Fnames
    fi
    (( nr++ ))                          # increment the input line number
done < MetaData.csv
echo "${fname_list[@]}"                 # print the list of Fnames

Output:

Fnames=19.vcf.gz
MAF=0.05
Fnames=20.vcf.gz
MAF=
Fnames=21.vcf.gz
MAF=
Fnames=22.vcf.gz
MAF=
19.vcf.gz 20.vcf.gz 21.vcf.gz 22.vcf.gz

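The statement IFS=, read -r -a ary is roughly equivalent to your first three lines: it splits the input on "," and assigns the field values to the array variable ary.
There are several ways to use the value of a variable as a variable name (indirect variable reference); printf -v VarName '%s' Value is one of them.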
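The two points above can be sketched together: a single read builds the header array, and the stored name is then used for assignment in a few different ways. The header string is inlined here as an assumption instead of reading MetaData.csv, and the nameref variant needs Bash 4.3 or later.

```shell
#!/bin/bash
# Sketch: read the header in one step, then assign through a computed variable name.
# Assumption: the header is inlined instead of read from MetaData.csv.
IFS=, read -r -a col_names <<< 'MAF,HWE,Geno_Missing,Inds_Missing'
name=${col_names[0]}              # "MAF"

printf -v "$name" '%s' '0.05'     # '%s' stops a "%" in the value from being read as a format
echo "printf -v: ${!name}"        # ${!name} expands to the value of the variable named by $name

declare "$name=0.1"               # declare also accepts a computed name
echo "declare:   ${!name}"

declare -n ref="$name"            # nameref (Bash 4.3+): ref now refers to MAF
ref='0.2'
echo "nameref:   $MAF"
```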

[EDIT]

Based on the OP's updated input file, here is another version:

#!/bin/bash

nr=1                                    # initialize input line number to 1
while IFS=, read -r -a ary; do          # split the line on "," then assign "ary" to the fields
    if (( nr == 1 )); then              # handle the header line
        col_names=("${ary[@]}")         # assign column names
    else                                # handle the body lines
        for (( i = 0; i < ${#ary[@]}; i++ )); do
            printf -v "${col_names[i]}" '%s' "${ary[i]}"
                                        # assign the variable "${col_names[i]}" to the input field
        done
    fi
    (( nr++ ))                          # increment the input line number
done < MetaData.csv

for n in "${col_names[@]}"; do          # iterate over the variable names
    echo "$n=${!n}"                     # print variable name and its value
done

# you can also specify the variable names literally as follows:
echo "MAF=$MAF HWE=$HWE Geno_Missing=$Geno_Missing Inds_Missing=$Inds_Missing"

Output:

MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
MAF=0.05 HWE=1E-06 Geno_Missing=0.01 Inds_Missing=0.01

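As for the output, the first four lines are printed by echo "$n=${!n}" and the last line by echo "MAF=$MAF ...". You can choose either statement depending on how you use the variables in the code that follows.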
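One caveat with creating variables from header names is that printf -v rejects names that are not valid Bash identifiers (for example a column called Geno-Missing) with an error. A possible guard, sketched below, checks each name against the identifier pattern first and skips the rest. The header with Geno-Missing is a hypothetical example, and the temporary file only stands in for MetaData.csv; you may also want to refuse names that would overwrite existing variables such as PATH.

```shell
#!/bin/bash
# Sketch: only create variables for header names that are valid Bash identifiers.
# Assumption: a hypothetical header containing "Geno-Missing" shows the check in action.
csv=$(mktemp)
printf '%s\n' 'MAF,HWE,Geno-Missing,Inds_Missing' '0.05,1E-06,0.01,0.01' > "$csv"

{
    IFS=, read -r -a col_names          # header line
    IFS=, read -r -a values             # first data line
} < "$csv"
rm -f "$csv"

for i in "${!col_names[@]}"; do
    n=${col_names[i]}
    if [[ $n =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
        printf -v "$n" '%s' "${values[i]}"
    else
        echo "skipping invalid variable name: $n" >&2
    fi
done

echo "MAF=$MAF HWE=$HWE Inds_Missing=$Inds_Missing"
```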
