Splitting rows in csv with comma separated values

I have a CSV file where each row holds an ID followed by one or more text values, as in this example:

1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3, 
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys

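The output I want has one row per ID/text pair, so each ID is repeated once for every text value (this is for a database). The real CSV file is very large, so here is just a fragment to show what I mean: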

| ID             | Features   
+----------------+-------------
| 1              | Šildomos grindys
| 2              | Šildomos grindys
| 2              | Rekuperacinė sistema
| 3              | null
| 4              | Skalbimo mašina
| 4              | Su baldais
| 4              | Šaldytuvas
| 4              | Šildomos grindys

How can I do this in Python? Thanks!

You can read the CSV with the csv module and append one [ID, value] pair per value to a new list.

data.csv:

1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3,
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys

Code:

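import csv

data = []
# newline='' is what the csv module expects; utf-8 is needed for the Lithuanian text
with open('data.csv', newline='', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # row[0] is the ID; every later column is one feature
        for i in range(1, len(row)):
            value = row[i].strip()
            if value == "":
                value = "null"
            data.append([int(row[0]), value])
print(data)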

Output:

[[1, 'Šildomos grindys'], [2, 'Šildomos grindys'], [2, 'Rekuperacinė sistema'], [3, 'null'], [4, 'Skalbimo mašina'], [4, 'Su baldais'], [4, 'Šaldytuvas'], [4, 'Šildomos grindys']]
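Since the pairs are meant for a database, here is a minimal sketch of loading them directly with Python's built-in sqlite3 module. It assumes SQLite is an acceptable target (the question does not name a database), and the table name `features` and its schema are hypothetical. An empty feature is stored as a real SQL NULL (Python `None`) rather than the string "null", and `io.StringIO` with the sample rows stands in for the file.

```python
import csv
import io
import sqlite3

# Stand-in for open('data.csv', newline='', encoding='utf-8'); same sample rows as above.
SAMPLE = """1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3,
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys
"""

def iter_pairs(csvfile):
    """Yield (id, feature) pairs; an empty feature becomes None (SQL NULL)."""
    for row in csv.reader(csvfile, skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        row_id = int(row[0])
        for value in row[1:]:
            yield row_id, value.strip() or None

# Hypothetical table name and schema; adjust to your own database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER, feature TEXT)")
conn.executemany("INSERT INTO features VALUES (?, ?)", iter_pairs(io.StringIO(SAMPLE)))
conn.commit()

rows = conn.execute("SELECT id, feature FROM features ORDER BY rowid").fetchall()
for r in rows:
    print(r)
```

Passing the generator to `executemany` means the pairs are never collected into one big list, which may help with a very large file.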


Here is one way to do what you asked:

with open('infoo.txt', 'r', encoding="utf-8") as f:
    records = []
    # split each line on commas and strip surrounding whitespace
    rows = [[x.strip() for x in row.split(',')] for row in f]
    for row in rows:
        # pair the ID (row[0]) with each feature; empty features become 'null'
        for i in range(1, len(row)):
            records.append([row[0], row[i] if row[i] else 'null'])
    with open('outfoo.txt', 'w', encoding="utf-8") as g:
        g.write('ID,Features\n')
        for record in records:
            g.write(','.join(record) + '\n')

# check the output file:
with open('outfoo.txt', 'r', encoding="utf-8") as f:
    print('contents of output file:')
    for row in f:
        print(row.rstrip('\n'))

Output:

contents of output file:
ID,Features
1,Šildomos grindys
2,Šildomos grindys
2,Rekuperacinė sistema
3,null
4,Skalbimo mašina
4,Su baldais
4,Šaldytuvas
4,Šildomos grindys
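One caveat with splitting lines using `str.split(',')`, assuming the real file could contain a feature that itself includes a comma (the sample above does not): such a value would normally be quoted in a CSV file, and a plain split breaks it in two, while `csv.reader` understands the quoting. The input line below is hypothetical.

```python
import csv
import io

# Hypothetical input line: the second feature contains a comma and is quoted.
line = '5, "Balkonas, terasa", Su baldais\n'

# Plain str.split(',') cuts the quoted value into two pieces.
naive = [x.strip() for x in line.split(',')]

# csv.reader honours the quotes; skipinitialspace=True lets it recognise
# the opening quote that follows the space after each comma.
parsed = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(naive)   # ['5', '"Balkonas', 'terasa"', 'Su baldais']
print(parsed)  # ['5', 'Balkonas, terasa', 'Su baldais']
```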

Update:

Another approach is to use pandas (docs). Pandas offers many powerful tools for working with tabular data, although it comes with a bit of a learning curve:

import pandas as pd

with open('infoo.txt', 'r', encoding="utf-8") as f:
    rows = [[x.strip() for x in row.split(',')] for row in f]

# one row per ID, with all of its features collected in a list
df = pd.DataFrame([[row[0], row[1:]] for row in rows], columns=['ID', 'Feature'])
print('Dataframe read from input file:')
print(df)

# explode() gives each list element its own row, repeating the ID
df = df.explode('Feature').reset_index(drop=True)
print('Dataframe with one Feature per row:')
print(df)
df.to_csv('outfoo.txt', index=False)

# check the output file:
df2 = pd.read_csv('outfoo.txt')
print('Dataframe re-read from output file:')
print(df2)

Output:

Dataframe read from input file:
  ID                                            Feature
0  1                                 [Šildomos grindys]
1  2           [Šildomos grindys, Rekuperacinė sistema]
2  3                                                 []
3  4  [Skalbimo mašina, Su baldais, Šaldytuvas, Šild...
Dataframe with one Feature per row:
  ID               Feature
0  1      Šildomos grindys
1  2      Šildomos grindys
2  2  Rekuperacinė sistema
3  3
4  4       Skalbimo mašina
5  4            Su baldais
6  4            Šaldytuvas
7  4      Šildomos grindys
Dataframe re-read from output file:
   ID               Feature
0   1      Šildomos grindys
1   2      Šildomos grindys
2   2  Rekuperacinė sistema
3   3                   NaN
4   4       Skalbimo mašina
5   4            Su baldais
6   4            Šaldytuvas
7   4      Šildomos grindys

Links to the pandas documentation: