How to write a script that reads many CSV file names and data and writes them into another CSV file?

Question

I have many CSV files, and I need to write all of their file names and the data they contain into another CSV file.

Example:

File 1: less bonding_err_bond0-if_eth2-d.rrd.csv

1617613500,0.0000000000e+00

File 2: less bonding_err_bond0-if_eth3-d.rrd.csv

1617613500,0.0000000000e+00

Expected output

Final file: less bonding.csv

bonding_err_bond0-if_eth2-d.rrd,bonding_err_bond0-if_eth3-d.rrd.csv
0.0000000000e+00,0.0000000000e+00

Note: the script can be either a Python or a Bash script.

Answer 1

So basically you want a table with a header row of file names and a row of data? Here is a snippet that may help:

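#!/bin/bash
# HEADER collects the file names, DATA collects each file's second field, both comma-separated
HEADER=''
DATA=''
# Read NUL-delimited paths from find, so unusual characters in file names are handled safely
while IFS= read -r -d '' CSV
do
  # Append the file name without its directory (use basename -s .csv "$CSV" to drop the extension)
  HEADER="${HEADER}$(basename "$CSV"),"
  # Append the second comma-separated field of the file
  DATA="${DATA}$(cut -d "," -f 2 "$CSV"),"
done < <(find ./ -name "*.csv" -type f -print0)
# Print both rows with the trailing comma removed
echo "${HEADER%,}"
echo "${DATA%,}"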

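First we initialize two empty variables: HEADER will hold all our file names, and DATA will hold the second field of each file, where fields are separated by the , character.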

After that comes a while loop. It may look complicated, but the reason for writing it this way is explained here: https://github.com/koalaman/shellcheck/wiki/SC2044

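The TL;DR version is that we want to handle any unusual characters in file names (such as spaces or newlines) that would break a for loop over the output of find.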
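
For comparison, the same concern does not arise in Python if each path is kept as a Path object instead of being split out of a text stream. This is a minimal sketch, assuming Python 3; find_csv_files is a hypothetical helper name, not part of the answer above. Like find ./ -name "*.csv" -type f, it searches the directory recursively:

```python
from pathlib import Path
import tempfile


def find_csv_files(root):
    """Hypothetical helper: recursively collect *.csv files under root.

    Each result is a Path object, so names with spaces or newlines stay
    intact; no word splitting happens as with `for f in $(find ...)`.
    """
    return sorted(p for p in Path(root).rglob("*.csv") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "sub").mkdir()
        for name in ["a b.csv", "sub/c.csv", "notes.txt"]:
            (Path(tmp) / name).write_text("1617613500,0.0000000000e+00\n")
        # Only the two .csv files are listed; "a b.csv" keeps its space
        for path in find_csv_files(tmp):
            print(path.relative_to(tmp))
```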

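Inside the loop, we append the file name stored in the CSV variable to the HEADER variable. basename gives us only the file name part, without the folder. If you don't want the .csv extension, you can use basename -s .csv "$CSV" as the command there.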

DATA is built the same way, except that we split the file contents on , and print only the second field.
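
As an illustration of what cut -d "," -f 2 extracts, here is a Python sketch using the standard csv module; second_fields is a hypothetical name. One difference worth knowing: csv.reader understands quoted fields, so a value like "1,5" is not split on its inner comma, whereas cut would split it. Rows with fewer than two fields are skipped here (cut would print such lines whole unless given -s):

```python
import csv
import io


def second_fields(lines):
    """Hypothetical helper: return field 2 of every row, similar to cut -d "," -f 2."""
    # Skip rows that have no second field
    return [row[1] for row in csv.reader(lines) if len(row) > 1]


sample = io.StringIO('1617613500,0.0000000000e+00\n1617613800,"1,5"\n')
print(second_fields(sample))  # ['0.0000000000e+00', '1,5']
```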

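Once both strings are built, we echo them with the trailing comma removed: ${HEADER%,} deletes a single trailing comma. This technique is called bash parameter substitution; see https://www.cyberciti.biz/tips/bash-shell-parameter-substitution-2.html for more.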
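
The same "append with a comma, then strip the last one" pattern looks like this in Python. This is only a sketch for comparison; strip_one_trailing_comma is a hypothetical helper that mirrors ${VAR%,}, and joining the parts with "," avoids the trailing comma altogether:

```python
def strip_one_trailing_comma(s):
    # Mirrors bash "${VAR%,}": removes a single trailing comma, if there is one
    return s[:-1] if s.endswith(",") else s


names = ["bonding_err_bond0-if_eth2-d.rrd.csv", "bonding_err_bond0-if_eth3-d.rrd.csv"]

# Build the header the way the bash loop does: append "name," each time
header = ""
for name in names:
    header += name + ","

print(strip_one_trailing_comma(header))
# Joining with "," never produces the trailing comma in the first place
print(",".join(names))
```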

This script processes all CSV files in the current directory and its subdirectories.

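To write the result to a file, just redirect the script's output: save this script as merge_csv.sh and run it as below. Put the output file outside the searched directory, because the shell creates bonding.csv before the script starts, and find would otherwise pick up that empty file as one more input column.

bash merge_csv.sh > ../bonding.csv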

Test:

Generate 5 files with similar content:

for i in $(seq 1 5); do echo "0.0000000000e+00,$i.0000000000e+00" > "$i.csv"; done

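Running this script in that folder gives the output below. This output corresponds to the basename -s .csv variant (plain basename would put 1.csv,2.csv,... in the header), and the column order follows whatever order find lists the files in: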

1,2,3,4,5
1.0000000000e+00,2.0000000000e+00,3.0000000000e+00,4.0000000000e+00,5.0000000000e+00
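
For readers who prefer Python, here is a standard-library-only sketch of the same merge. It is an illustration, not the answer's script: merge_csv_files is a hypothetical name, it assumes the data sits in the second field of each file's first row, uses the file name without .csv as the header (like basename -s .csv), orders columns lexically by path, and skips the output file itself so it can live in the searched directory:

```python
import csv
import tempfile
from pathlib import Path


def merge_csv_files(root, out_path):
    """Hypothetical helper: write one header row and one data row, one column per *.csv file."""
    out_path = Path(out_path).resolve()
    header, data = [], []
    # sorted() consumes the generator before the output file is written
    for path in sorted(Path(root).rglob("*.csv")):
        # Skip directories and our own output file
        if not path.is_file() or path.resolve() == out_path:
            continue
        with path.open(newline="") as f:
            first_row = next(csv.reader(f), [])
        header.append(path.stem)  # "1.csv" -> "1"
        data.append(first_row[1] if len(first_row) > 1 else "")
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(header)
        writer.writerow(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Same five test files as above
        for i in range(1, 6):
            Path(tmp, f"{i}.csv").write_text(f"0.0000000000e+00,{i}.0000000000e+00\n")
        out = Path(tmp, "bonding.csv")
        merge_csv_files(tmp, out)
        print(out.read_text())
```

Because the output file is excluded by path, running it a second time in the same folder does not add a bonding column.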

Answer 2

The pandas Python library is great for working with CSV files.

import os
import pandas as pd

out_file_name = './less bonding.csv'

# Create an empty pandas DataFrame to collect one column per input file
output = pd.DataFrame()

# Remove any output file we might have made previously, so it is not read as input
if os.path.isfile(out_file_name):
    os.remove(out_file_name)

# Get all the files in the current directory (sorted for a stable column order)
file_names = sorted(os.listdir())

# Loop through our file_names
for file_name in file_names:

    # Only process .csv files
    if file_name.endswith('.csv'):

        # Read our csv into a DataFrame
        # To preserve our data rather than have it converted to floats, use dtype=str
        data = pd.read_csv(file_name, header=None, dtype=str)

        # Put column 1 (the second field) of the csv into column [file_name] of our output DataFrame
        output[file_name] = data[1]

# Write it as a csv; index=False leaves out pandas' row index (0, 1, ...)
output.to_csv(out_file_name, index=False)

Here is the output:

less bonding_err_bond0-if_eth2-d.rrd.csv,less bonding_err_bond0-if_eth3-d.rrd.csv
0.0000000000e+00,0.0000000000e+00
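
The dtype=str argument is what keeps values such as 0.0000000000e+00 spelled exactly as in the input: if pandas infers a float column instead, the number is written back in a different textual form. The effect can be seen with plain Python and the csv module, as a small illustration that does not use pandas itself:

```python
import csv
import io

raw = "1617613500,0.0000000000e+00\n"

# Parsing the value as a number loses its original spelling
as_float = float(next(csv.reader(io.StringIO(raw)))[1])
print(str(as_float))  # 0.0

# Keeping it as a string preserves the text exactly
as_text = next(csv.reader(io.StringIO(raw)))[1]
print(as_text)  # 0.0000000000e+00
```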

Answer 3

By the way, the pandas Python library is useful here as well.

For example:

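import os
import pandas as pd

def finalFile(fname):
    # Collect one column per input file whose name starts with fname
    output = pd.DataFrame()

    file_names = os.listdir()

    for file_name in file_names:
        if file_name.startswith(fname):
            # dtype=str keeps values such as 0.0000000000e+00 exactly as written
            data = pd.read_csv(file_name, header=None, dtype=str)
            # 'xxx.test.test-bonding_err_bond0-if_eth3-d.rrd.csv' -> 'test-bonding_err_bond0-if_eth3-d'
            output[file_name.rsplit('.', 4)[2]] = data[1]

    # 'xxx.test.test-bonding' -> 'test-bonding.csv'; index=False leaves out pandas' row index
    output.to_csv(fname.rsplit('.', 2)[2] + ".csv", index=False)


finalFile('xxx.test.test-bonding')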

Final result

test-bonding_err_bond0-if_eth3-d,test-bonding_err_bond0-if_eth2-d
0.0000000000e+00,0.0000000000e+00
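
Note that file_name.rsplit('.', 4)[2] only yields the intended column name when the file name has exactly the prefix.prefix.NAME.rrd.csv shape assumed above; for a name like the question's bonding_err_bond0-if_eth2-d.rrd.csv it returns 'csv'. A sketch of one alternative, under the assumption that every input ends in .rrd.csv (both function names are hypothetical):

```python
def column_name_rsplit(file_name):
    # What the answer does: assumes the shape prefix.prefix.NAME.rrd.csv
    return file_name.rsplit('.', 4)[2]


def column_name_suffix(file_name, suffix=".rrd.csv"):
    # Alternative: strip the known suffix, then keep the part after the last dot
    base = file_name[:-len(suffix)] if file_name.endswith(suffix) else file_name
    return base.rsplit(".", 1)[-1]


for name in ["xxx.test.test-bonding_err_bond0-if_eth3-d.rrd.csv",
             "bonding_err_bond0-if_eth2-d.rrd.csv"]:
    print(column_name_rsplit(name), column_name_suffix(name))
```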


Tags: python, csv, bash, shell, script