Splitting rows in csv with comma separated values
I have a CSV file whose columns hold an id and some text, as in the example below:
1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3,
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys
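The output I want puts each ID on its own row for every piece of text associated with it (this is for a database). Because the CSV file is really large, I'm only giving you a portion of it so you can see what I want: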
| ID | Features
+----------------+-------------
| 1 | Šildomos grindys
| 2 | Šildomos grindys
| 2 | Rekuperacinė sistema
| 3 | null
| 4 | Skalbimo mašina
| 4 | Su baldais
| 4 | Šaldytuvas
| 4 | Šildomos grindys
How can I do this with Python? Thanks!
You can read the CSV and then append each (ID, feature) pair to a new list.
data.csv:
1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3,
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys
Code:
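import csv

data = []
# utf-8 so the Lithuanian characters are decoded correctly; newline='' as the csv docs recommend
with open('data.csv', encoding='utf-8', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        if not row:  # skip blank lines
            continue
        # every column after the first is a feature belonging to the ID in row[0]
        for i in range(1, len(row)):
            value = row[i].strip()
            if value == "":
                value = "null"
            data.append([int(row[0]), value])
print(data)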
Output:
[[1, 'Šildomos grindys'], [2, 'Šildomos grindys'], [2, 'Rekuperacinė sistema'], [3, 'null'], [4, 'Skalbimo mašina'], [4, 'Su baldais'], [4, 'Šaldytuvas'], [4, 'Šildomos grindys']]
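Since the question says the result is going into a database, here is a minimal sketch that streams the pairs straight into SQLite instead of first building one big list, which may help with a very large file. It assumes Python's built-in `sqlite3` module and a hypothetical table named `features`, and it stores a missing feature as SQL `NULL` (Python `None`) rather than the string "null"; the function names `iter_pairs` and `load_into_sqlite` are also illustrative.

```python
import csv
import io
import sqlite3


def iter_pairs(lines):
    """Yield (id, feature) tuples from CSV lines; an empty feature becomes None (SQL NULL)."""
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        row_id = int(row[0])
        for value in row[1:]:
            value = value.strip()
            yield row_id, (value or None)


def load_into_sqlite(lines, conn):
    # hypothetical table name and schema, for illustration only
    conn.execute("CREATE TABLE IF NOT EXISTS features (id INTEGER, feature TEXT)")
    # executemany consumes the generator lazily, so the whole file is never held in a list
    conn.executemany("INSERT INTO features (id, feature) VALUES (?, ?)", iter_pairs(lines))
    conn.commit()


if __name__ == "__main__":
    sample = io.StringIO(
        "1, Šildomos grindys\n"
        "2, Šildomos grindys, Rekuperacinė sistema\n"
        "3,\n"
        "4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys\n"
    )
    conn = sqlite3.connect(":memory:")
    load_into_sqlite(sample, conn)
    for row in conn.execute("SELECT id, feature FROM features ORDER BY rowid"):
        print(row)
```

With the real file you would pass an open file object instead of the in-memory sample, e.g. `with open('data.csv', encoding='utf-8', newline='') as f: load_into_sqlite(f, sqlite3.connect('features.db'))`, where `features.db` is likewise a placeholder name.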
Here is one way to do what you're asking:
# read the input and split each line on commas
with open('infoo.txt', 'r', encoding="utf-8") as f:
    rows = [[x.strip() for x in row.split(',')] for row in f]

records = []
for row in rows:
    # pair the ID (first field) with each remaining field; an empty field becomes 'null'
    for i in range(1, len(row)):
        records.append([row[0], row[i] if row[i] else 'null'])

with open('outfoo.txt', 'w', encoding="utf-8") as g:
    g.write('ID,Features\n')
    for record in records:
        g.write(','.join(record) + '\n')

# check the output file:
with open('outfoo.txt', 'r', encoding="utf-8") as f:
    print('contents of output file:')
    for row in f:
        print(row.rstrip('\n'))
Output:
contents of output file:
ID,Features
1,Šildomos grindys
2,Šildomos grindys
2,Rekuperacinė sistema
3,null
4,Skalbimo mašina
4,Su baldais
4,Šaldytuvas
4,Šildomos grindys
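The `','.join(...)` approach above writes each field verbatim, so a feature that itself contains a comma or a double quote would corrupt the output file. Below is a sketch of the same transformation using the standard library's `csv.reader` and `csv.writer`, which handle quoting automatically and process the input one line at a time instead of collecting every record first. The function name `explode_csv` and its `missing` parameter are hypothetical; `skipinitialspace=True` is used because the sample data has a space after each comma.

```python
import csv
import io


def explode_csv(src, dst, missing="null"):
    """Read 'id, feat1, feat2, ...' rows from src; write one 'id,feature' row per feature to dst."""
    # skipinitialspace drops the blank after each comma, so quoted fields like ' "a, b"' still parse
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst)  # quotes fields containing commas or quotes (QUOTE_MINIMAL)
    writer.writerow(["ID", "Features"])
    for row in reader:
        if not row:  # skip blank lines
            continue
        row_id = row[0].strip()
        for feature in row[1:]:
            writer.writerow([row_id, feature.strip() or missing])


if __name__ == "__main__":
    sample = io.StringIO(
        "1, Šildomos grindys\n"
        "2, Šildomos grindys, Rekuperacinė sistema\n"
        "3,\n"
    )
    out = io.StringIO()
    explode_csv(sample, out)
    print(out.getvalue())
```

For the real files, open both with `newline=''` as the `csv` module documentation recommends: `with open('infoo.txt', encoding='utf-8', newline='') as src, open('outfoo.txt', 'w', encoding='utf-8', newline='') as dst: explode_csv(src, dst)`. Note that `csv.writer` ends lines with `\r\n` by default; pass `lineterminator='\n'` if you prefer Unix line endings.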
Update:
Another approach is to use pandas (docs). Pandas offers many powerful ways to work with tabular data, but it also has something of a learning curve:
import pandas as pd

with open('infoo.txt', 'r', encoding="utf-8") as f:
    rows = [[x.strip() for x in row.split(',')] for row in f]

# one row per input line: the ID plus a list of all its features
df = pd.DataFrame([[row[0], row[1:]] for row in rows], columns=['ID', 'Feature'])
print('Dataframe read from input file:')
print(df)

# explode() puts each list element on its own row, repeating the ID
df = df.explode('Feature').reset_index(drop=True)
print('Dataframe with one Feature per row:')
print(df)
df.to_csv('outfoo.txt', index=False)

# check the output file:
df2 = pd.read_csv('outfoo.txt')
print('Dataframe re-read from output file:')
print(df2)
Output:
Dataframe read from input file:
ID Feature
0 1 [Šildomos grindys]
1 2 [Šildomos grindys, Rekuperacinė sistema]
2 3 []
3 4 [Skalbimo mašina, Su baldais, Šaldytuvas, Šild...
Dataframe with one Feature per row:
ID Feature
0 1 Šildomos grindys
1 2 Šildomos grindys
2 2 Rekuperacinė sistema
3 3
4 4 Skalbimo mašina
5 4 Su baldais
6 4 Šaldytuvas
7 4 Šildomos grindys
Dataframe re-read from output file:
ID Feature
0 1 Šildomos grindys
1 2 Šildomos grindys
2 2 Rekuperacinė sistema
3 3 NaN
4 4 Skalbimo mašina
5 4 Su baldais
6 4 Šaldytuvas
7 4 Šildomos grindys