Rows are lost when reading this tab-separated file with pandas read_csv
I have a .text file in the following format, where the fields (index number, name, and message) are separated by \t (tab-separated):
712 ben Battle of the Books
713 james i used to be in TOM
714 tomy i was in BOB once
715 ben Tournaments of Minds
716 tommy Also the Lion in the upcoming school play
717 tommy Can you guess
718 tommy P
...
I read it into a DataFrame with read_csv:
chat = pd.read_csv("f.text", sep="\t", header=None, usecols=[2])
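But the DataFrame has only 9812 rows, while the raw file has more than 12428 lines (only 21 of them are blank). This is strange. Do you have any idea what is going on? Thanks.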
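I think you need to add the quoting parameter. By default read_csv treats " as a quote character, so a message that starts with an unmatched double quote can probably swallow the following lines into one field until the next quote appears. quoting=csv.QUOTE_NONE turns that handling off: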
import csv
import pandas as pd
chat = pd.read_csv("f.text", sep="\t", header=None, usecols=[2], quoting=csv.QUOTE_NONE)
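To see the effect in isolation, here is a minimal sketch on in-memory data. The sample rows are made up for illustration (they are not taken from your file), and the stray quotes on lines 713 and 715 are an assumption about what your messages might contain:

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the message on line 713 starts with a double quote
# and the next double quote only shows up at the end of line 715.
data = (
    '712\tben\tBattle of the Books\n'
    '713\tjames\t"i used to be in TOM\n'
    '714\ttomy\ti was in BOB once\n'
    '715\tben\tTournaments of Minds"\n'
)

# Default quoting: lines 713 to 715 are read as one quoted field.
default = pd.read_csv(io.StringIO(data), sep="\t", header=None, usecols=[2])

# QUOTE_NONE: quotes are ordinary characters, so every line is its own row.
unquoted = pd.read_csv(
    io.StringIO(data), sep="\t", header=None, usecols=[2],
    quoting=csv.QUOTE_NONE,
)

print(len(default), len(unquoted))  # 2 4
```

With QUOTE_NONE the quote characters stay in the message text, so strip them afterwards if you do not want them.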