How to read the first 100 rows of a csv file in Python, adding a comma, a serial number and a full stop to each row?

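Suppose I have 2 columns and 3000 rows in a .csv file. I want to read only the first 100 rows of the csv file, append a comma (,) after the first column, and insert a full stop (.) at the end of each row. Is there any way to achieve this? I also need to put a serial number at the start of each row. How can this be done?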

Input format:

question              answer
what is your name     i am maxi
are you happy         yes i am
what you do           i am a student

Output:

1. what is your name, i am maxi.
2. are you happy, yes i am.
3. what you do, i am a student.

The code I tried is as follows:

import csv
import itertools

with open('data.csv', 'r') as f:
   mycsv = csv.reader(f)
   next(mycsv, None)
   for row in itertools.islice(mycsv, 100):
       row = ('"{}."'.format(', '.join(row)) for row in mycsv)

       raw_text = ', '.join(row)
       print(raw_text)

Simply use the pandas library:

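import numpy as np   # needed for np.arange below
import pandas as pd

# to load data from a file
df = pd.read_csv("data.csv")
# test data (replaces the loaded data for this demo)
df = pd.DataFrame({"question": ['what is your name', 'are you happy', 'what you do '],
                   "answer": ["i am maxi", "yes i am", "i am a student"]})

# get the first 100 rows (copy so that adding columns does not touch a slice)
df = df[:100].copy()
# set numbers, starting at 1, as strings so they can be concatenated
df['number'] = np.arange(1, len(df) + 1).astype(str)

df['summary'] = df['number'] + '. ' + df['question'] + ', ' + df['answer'] + '.'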

Output:

            question          answer number                          summary
0  what is your name       i am maxi      1  1. what is your name, i am maxi.
1      are you happy        yes i am      2       2. are you happy, yes i am.
2       what you do   i am a student      3  3. what you do , i am a student.

Assuming the key fields are separated by multiple spaces:

import re

with open('test.csv', 'r') as f:
    next(f)
    pat = re.compile(r'\s{2,}')

    for i, row in enumerate(f, 1):
        print('{}. {}.'.format(i, pat.sub(', ', row.strip(), 1)))
        if i == 100: break

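Details of the regular expression \s{2,}:

  • \s - a whitespace character
  • {2,} - a form of {n,m}, where n >= 0 and m >= n: repeat the previous item between n and m times. Greedy, so it tries m repetitions before reducing the count to n. E.g. a{2,4} matches aaaa, aaa or aa. With the upper bound omitted, {2,} matches two or more repetitions.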
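
To make the effect of the pattern and of the count argument concrete, here is a small sketch. The sample strings and variable names are made up for illustration; only re.compile and the sub method of the compiled pattern are used.

```python
import re

pat = re.compile(r'\s{2,}')

# A run of two or more whitespace characters is replaced;
# single spaces between words are left alone.
line = "what is your name     i am maxi"
converted = pat.sub(', ', line, count=1)
print(converted)

# count=1 replaces only the first run; without it every run is replaced.
multi = "a   b   c"
first_only = pat.sub(', ', multi, count=1)
all_runs = pat.sub(', ', multi)
print(first_only)
print(all_runs)
```

Limiting the substitution to one replacement keeps any later double spaces inside the answer text untouched.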

Example output:

1. what is your name, i am maxi.
2. are you happy, yes i am.
3. what you do, i am a student.

A variant without regular expressions:

Create demo data:

with open("data.csv","w") as f: 
    f.write(f"""question              answer
what is your name     i am maxi
are you happy         yes i am
what you do           i am a student
""") 
    for i in range(10): # 30 more lines
        f.write(f"""what is your name     i am maxi
are you happy         yes i am
what you do           i am a student
""") 

Process the data:

with open('data.csv', 'r') as f:
    next(f) # skip header
    skipped = 0
    for number, line in enumerate(f,1):
        if line.strip():
            a,b = line.split("  ",1) # split at 2 spaces
            print(f"{number-skipped}. {a.strip()}, {b.strip()}.")
        else: 
            skipped += 1
        if number == 10: # reduced to 10 due to output length
            break

Output:

1. what is your name, i am maxi.
2. are you happy, yes i am.
3. what you do, i am a student.
4. what is your name, i am maxi.
5. are you happy, yes i am.
6. what you do, i am a student.
7. what is your name, i am maxi.
8. are you happy, yes i am.
9. what you do, i am a student.
10. what is your name, i am maxi.

This even handles blank lines in the data gracefully.
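
Note that the loop above stops after a fixed number of lines read, so blank lines still count toward that limit. If the limit should apply to non-blank rows only, one possible sketch uses itertools.islice. The helper name numbered_rows and its limit parameter are hypothetical, and the code assumes every non-blank line contains at least two consecutive spaces between the fields.

```python
import itertools

def numbered_rows(lines, limit=100):
    """Yield 'N. question, answer.' for the first `limit` non-blank lines."""
    # Drop blank lines first so they count toward neither the numbering nor the limit.
    non_blank = (line for line in lines if line.strip())
    for number, line in enumerate(itertools.islice(non_blank, limit), 1):
        # Split once at the first pair of spaces, then strip leftover padding.
        question, answer = line.split("  ", 1)
        yield f"{number}. {question.strip()}, {answer.strip()}."

# Made-up sample lines, including one blank line.
sample = [
    "what is your name     i am maxi\n",
    "\n",
    "are you happy         yes i am\n",
    "what you do           i am a student\n",
]
result = list(numbered_rows(sample, limit=2))
print("\n".join(result))
```

With a real file, open it, call next(f) to skip the header, and pass the file object itself as lines.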