How to read the first 100 rows of a CSV file in Python, adding a comma, a serial number and a full stop to each row?
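Suppose I have 2 columns and 3000 rows in a .csv file. I want to read only the first 100 rows of the file. I need to add a comma (,) after the first column and insert a full stop (.) at the end of each row. Is there a way to achieve this? I also need to put a serial number in front of each row. How can I do that?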
Input format:
question            answer
what is your name   i am maxi
are you happy       yes i am
what you do         i am a student
Expected output:
1. what is your name, i am maxi.
2. are you happy, yes i am.
3. what you do, i am a student.
The code I tried is below:
import csv
import itertools
with open('data.csv', 'r') as f:
    mycsv = csv.reader(f)
    next(mycsv, None)
    for row in itertools.islice(mycsv, 100):
        row = ('"{}."'.format(', '.join(row)) for row in mycsv)
        raw_text = ', '.join(row)
        print(raw_text)
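A likely reason the attempt misbehaves: the generator expression inside the loop iterates over `mycsv` again, so the first pass consumes every remaining row and the loop then ends. The sketch below reproduces that with an in-memory iterator and shows one possible fix using `enumerate`. The names `data`, `printed` and `fixed` are illustrative, and the rows are assumed to be already split into fields, as `csv.reader` would return them for a comma-delimited file.

```python
from itertools import islice

data = [["q1", "a1"], ["q2", "a2"], ["q3", "a3"]]

# Reproduce the problem: the inner generator re-iterates the same iterator.
rows = iter(data)
printed = []
for row in islice(rows, 100):
    inner = ('"{}."'.format(", ".join(r)) for r in rows)  # drains `rows`
    printed.append(", ".join(inner))

# One possible fix: format each row once and number it with enumerate.
fixed = [
    "{}. {}.".format(n, ", ".join(r))
    for n, r in enumerate(islice(iter(data), 100), 1)
]

print(printed)
print(fixed)
```

In the broken loop the first row is taken by `islice` and never printed, while all later rows end up in a single line.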
Simply use the pandas library:
import numpy as np
import pandas as pd

# load the data from the file
df = pd.read_csv("data.csv")

# test data (stands in for the file contents)
df = pd.DataFrame({"question": ['what is your name', 'are you happy', 'what you do'],
                   "answer": ["i am maxi", "yes i am", "i am a student"]})

# keep the first 100 rows
df = df[:100]

# serial numbers, starting at 1
df['number'] = np.arange(1, len(df) + 1).astype(str)
df['summary'] = df['number'] + '. ' + df['question'] + ', ' + df['answer'] + '.'
Output:
            question          answer number                           summary
0  what is your name       i am maxi      1  1. what is your name, i am maxi.
1      are you happy        yes i am      2       2. are you happy, yes i am.
2        what you do  i am a student      3   3. what you do, i am a student.
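Assuming the fields are separated by multiple spaces:
import re

with open('test.csv', 'r') as f:
    next(f)  # skip the header line
    pat = re.compile(r'\s{2,}')
    for i, row in enumerate(f, 1):
        # replace only the first run of 2+ whitespace characters with ', '
        print('{}. {}.'.format(i, pat.sub(', ', row.strip(), 1)))
        if i == 100:
            break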
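Details of the regex \s{2,}:
\s - a whitespace character
{2,} - a quantifier of the form {n,m}, where n >= 0 and m >= n. It repeats the previous item between n and m times. It is greedy, so m repetitions are tried first before backing off towards n. Ex.: a{2,4} matches aaaa, aaa or aa. With m omitted, as in {2,}, there is no upper limit, so \s{2,} matches a run of two or more whitespace characters.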
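To make the quantifier behaviour concrete, here is a small sketch with made-up sample strings. The variable names (`gap`, `greedy`, `replaced`, `untouched`, `fields`) are illustrative only.

```python
import re

# Greedy: a{2,4} takes as many 'a's as it can, up to four.
greedy = re.match(r"a{2,4}", "aaaaa").group()

gap = re.compile(r"\s{2,}")  # a run of two or more whitespace characters

# count=1 replaces only the first run, so the rest of the row is left alone.
replaced = gap.sub(", ", "what is your name  i am maxi", 1)

# Single spaces inside a field are never matched.
untouched = gap.sub(", ", "yes i am", 1)

# The same pattern can also split a row into its two fields.
fields = gap.split("are you happy    yes i am", maxsplit=1)

print(greedy, replaced, untouched, fields, sep="\n")
```

Because only runs of at least two whitespace characters match, the single spaces between words in the question and the answer survive.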
Sample output:
1. what is your name, i am maxi.
2. are you happy, yes i am.
3. what you do, i am a student.
A variant without regular expressions:
Create demo data:
with open("data.csv", "w") as f:
    f.write("""question  answer
what is your name  i am maxi
are you happy  yes i am
what you do  i am a student
""")
    for i in range(10):  # 30 more lines
        f.write("""what is your name  i am maxi
are you happy  yes i am
what you do  i am a student
""")
Process the data:
with open('data.csv', 'r') as f:
    next(f)  # skip header
    skipped = 0
    for number, line in enumerate(f, 1):
        if line.strip():
            a, b = line.split("  ", 1)  # split at the first 2 spaces
            print(f"{number-skipped}. {a.strip()}, {b.strip()}.")
        else:
            skipped += 1  # blank line: keep the numbering contiguous
        if number == 10:  # reduced to 10 due to output length
            break
Output:
1. what is your name, i am maxi.
2. are you happy, yes i am.
3. what you do, i am a student.
4. what is your name, i am maxi.
5. are you happy, yes i am.
6. what you do, i am a student.
7. what is your name, i am maxi.
8. are you happy, yes i am.
9. what you do, i am a student.
10. what is your name, i am maxi.
This even handles empty lines in the data gracefully.
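One caveat with that loop: `number` counts every line read, blank ones included, so the `break` fires after 10 lines read rather than after 10 rows printed. Below is a minimal sketch that caps the number of printed rows instead, assuming the same two-space separator. The helper name `numbered_rows` and its `limit` and `sep` parameters are hypothetical, not part of the answer above.

```python
from itertools import islice

def numbered_rows(lines, limit=100, sep="  "):
    """Yield 'N. question, answer.' for the first `limit` non-blank rows.

    `lines` is any iterable of text lines with the header already skipped.
    A non-blank line without `sep` raises ValueError on unpacking.
    """
    non_blank = (line for line in lines if line.strip())
    for number, line in enumerate(islice(non_blank, limit), 1):
        question, answer = line.split(sep, 1)
        yield f"{number}. {question.strip()}, {answer.strip()}."

sample = [
    "what is your name  i am maxi\n",
    "\n",  # blank line, ignored and not counted
    "are you happy  yes i am\n",
    "what you do  i am a student\n",
]
result = list(numbered_rows(sample, limit=2))
for text in result:
    print(text)
```

For a real file, open it, call `next(f)` to skip the header, and pass the file object as `lines`.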