Delete rows by date and add a file name column for multiple csv files

Question

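I have multiple ","-delimited csv files containing recorded water pipe pressure sensor data, already sorted by date from older to newer. In all of the original files, the first column always contains dates formatted as YYYYMMDD. I've looked at similar discussion threads but couldn't find what I need.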

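I need a Python script that adds a new column to every csv file in a directory, where each row of the new column, titled "Pipe", contains the file name without the file extension.
Optionally, a cutoff date can be specified as YYYYMMDD in order to delete rows from the original input file. For example, if a file has dates from 20140101 to 20140630, I'd like to delete the rows of data with a date < 20140401.
Optionally, after these modifications, the script should either overwrite the original files or save each file to a different directory under the same file name as the original.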

Input: PipeRed.csv; Headers: Date, Pressure1, Pressure2, Temperature1, Temperature2, etc.

Output: PipeRed.csv; Headers: Pipe, Date, Pressure1, Pressure2, Temperature1, Temperature2, etc.

I found some code and modified it a little, but it doesn't delete rows as described above, and it adds the file name column last rather than first.

import csv
import sys
import glob
import re

for filename in glob.glob(sys.argv[1]):
#def process_file(filename):
    # Read the contents of the file into a list of lines.
    f = open(filename, 'r')
    contents = f.readlines()
    f.close()

    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)

    # Open the output and create a CSV writer for it.
    f = open(filename, 'wb')
    writer = csv.writer(f)

    # Process the header.
    writer = csv.writer(f)
    writer.writerow( ('Date','Pressure1','Pressure2','Pressure3','Pressure4','Pipe') )
    header = reader.next()
    header.append(filename.replace('.csv',""))
    writer.writerow(header)

    # Process each row of the body.
    for row in reader:
        row.append(filename.replace('.csv',""))
        writer.writerow(row)

    # Close the file and we're done.
    f.close()

Answer 1

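This function should be close to what you want. I've tested it in both Python 2.7.9 and 3.4.2. The initial version I posted had some problems because, as I mentioned at the time, it was untested. I'm not sure whether you're using Python 2 or 3, but this works properly in either one.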

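Another change from the previous version is that the optional keyword date argument has been renamed from cutoff_date to start_date to better reflect what it is. A cutoff date usually means the last date on which something can be done, which is the opposite of the way you used it in your question. Also note that any date provided should be a string, i.e. start_date='20140401', not an integer, because it is compared directly against the strings read from the first column.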
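
As a minimal sketch of why a string date works here: fixed-width YYYYMMDD strings compare lexicographically in the same order as the dates they represent, so no conversion is needed for the filter. The `validate_start_date` helper and the sample rows below are hypothetical illustrations, not part of the function in this answer; the helper only rejects obviously wrong input before any file is touched.

```python
from datetime import datetime

def validate_start_date(start_date):
    """Return start_date unchanged if it is a valid YYYYMMDD string.

    Hypothetical helper: raises TypeError for non-strings and ValueError
    for strings that are not an 8-digit calendar date.
    """
    if not isinstance(start_date, str):
        raise TypeError("start_date must be a string such as '20140401'")
    if len(start_date) != 8 or not start_date.isdigit():
        raise ValueError("start_date must have exactly 8 digits (YYYYMMDD)")
    datetime.strptime(start_date, '%Y%m%d')  # ValueError if not a real date.
    return start_date

# Made-up sample rows whose first column is a YYYYMMDD date string.
rows = [['20140331', '1.0'], ['20140401', '1.2'], ['20140630', '1.1']]

start = validate_start_date('20140401')
# Plain string comparison keeps the rows dated on or after the start date.
kept = [row for row in rows if row[0] >= start]
print(kept)  # [['20140401', '1.2'], ['20140630', '1.1']]
```

Passing the integer 20140401 instead would fail in Python 3, since a str cannot be compared with an int, which is why the string form matters.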

One enhancement is that it will now create the output directory if one is specified but doesn't already exist.

import csv
import os
import sys

def open_csv(filename, mode='r'):
    """ Open a csv file in proper mode depending on Python version. """
    return (open(filename, mode=mode+'b') if sys.version_info[0] == 2 else
            open(filename, mode=mode, newline=''))

def process_file(filename, start_date=None, new_dir=None):
    # Read the entire contents of the file into memory skipping rows before
    # any start_date given (assuming row[0] is a date column).
    with open_csv(filename, 'r') as f:
        reader = csv.reader(f)
        header = next(reader)  # Save first row.
        contents = [row for row in reader
                    if not start_date or row[0] >= start_date]

    # Create different output file path if new_dir was specified.
    basename = os.path.basename(filename)  # Remove dir name from filename.
    output_filename = os.path.join(new_dir, basename) if new_dir else filename
    if new_dir and not os.path.isdir(new_dir):  # Create directory if necessary.
        os.makedirs(new_dir)

    # Open the output file and create a CSV writer for it.
    with open_csv(output_filename, 'w') as f:
        writer = csv.writer(f)

        # Add name of new column to header.
        header = ['Pipe'] + header  # Prepend new column name.
        writer.writerow(header)

        # Data for new column is the base filename without extension.
        new_column = [os.path.splitext(basename)[0]]

        # Process each row of the body by prepending data for new column to it.
        writer.writerows((new_column+row for row in contents))
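
The function above handles one file at a time, so something still has to call it for every csv in a directory. A rough sketch of such a driver follows; `process_directory` is a hypothetical name, and it takes the per-file function as an argument (here assumed to accept the same `start_date` and `new_dir` keywords as `process_file`) so that the sketch stays self-contained.

```python
import glob

def process_directory(pattern, process, start_date=None, new_dir=None):
    """Call process() on every file matching a glob pattern.

    Hypothetical helper: `process` is assumed to be a callable with the
    signature process(filename, start_date=None, new_dir=None).
    Returns the list of file names that were handled.
    """
    # Collect the matches first so files written during processing
    # cannot be picked up by the same run.
    filenames = sorted(glob.glob(pattern))
    for filename in filenames:
        process(filename, start_date=start_date, new_dir=new_dir)
    return filenames
```

With the function from this answer, a call might look like process_directory('data/*.csv', process_file, start_date='20140401', new_dir='filtered'), where the data and filtered directory names are placeholders. Leaving out new_dir would overwrite the originals in place.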
