Python detection of delimiter/separator in a csv file

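I have a function that reads *.csv files and processes them into several dataframes.

However, not all of the CSV files use the same separator. How can Python detect which separator a csv file uses, so that it can be passed to read_csv() when reading the file in pandas?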

df = pd.read_csv(path, sep='xxx', header=None, index_col=0)

Update

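Actually, just pass engine='python' as an argument to read_csv. With sep=None it will try to detect the correct delimiter automatically. From the documentation: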

sep : str, default ','

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
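
To illustrate the documentation excerpt above, here is a minimal sketch in which one call reads two inputs with different separators by passing sep=None together with engine='python'. The sample data and the helper name read_any_delimiter are made up for this example and are not part of pandas.

```python
import io

import pandas as pd


def read_any_delimiter(source):
    """Read a CSV source and let pandas sniff the separator.

    Hypothetical helper name. sep=None with engine='python' is the
    combination described in the read_csv documentation.
    """
    return pd.read_csv(source, sep=None, engine="python", header=None, index_col=0)


# Made-up sample data: same content, two different separators.
comma_data = "a,1,2\nb,3,4\nc,5,6\n"
semicolon_data = "a;1;2\nb;3;4\nc;5;6\n"

df_comma = read_any_delimiter(io.StringIO(comma_data))
df_semicolon = read_any_delimiter(io.StringIO(semicolon_data))

# Both frames should hold the same values if detection worked for each input.
print(df_comma.equals(df_semicolon))
```

Passing engine='python' explicitly should also avoid the warning pandas emits when sep=None forces it to fall back from the C engine.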

Using csv.Sniffer:

import csv

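def find_delimiter(filename):
    """Guess the delimiter from the first 5000 characters of the file."""
    sniffer = csv.Sniffer()
    # newline='' is what the csv module recommends when opening csv files
    with open(filename, newline='') as fp:
        delimiter = sniffer.sniff(fp.read(5000)).delimiter
    return delimiter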

Demo:

>>> find_delimiter('data.csv')
','

>>> find_delimiter('data.txt')
' ' 
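
If the sniffer may be handed odd files, the approach above can be made a little more defensive. The sketch below works on a text sample rather than a filename, restricts the guess to a set of candidate characters through the delimiters argument of sniff(), and falls back to a default when csv.Sniffer raises csv.Error. The function name sniff_delimiter, the candidate set and the default are assumptions chosen for illustration.

```python
import csv


def sniff_delimiter(sample, candidates=",;\t|", default=","):
    """Return the delimiter guessed from a text sample.

    candidates and default are illustrative assumptions, not part of
    the original answer.
    """
    try:
        # The delimiters argument restricts which characters may be chosen.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer raises csv.Error when it cannot determine a delimiter.
        return default


print(repr(sniff_delimiter("a;b;c\n1;2;3\n4;5;6\n")))
print(repr(sniff_delimiter("a\tb\tc\n1\t2\t3\n")))
print(repr(sniff_delimiter("")))
```

The returned value can then be passed as sep to read_csv, which keeps the faster C engine available for the actual parsing.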

As Reda El Hail said in a comment earlier, set the parameter sep=None, like this:

pandas.read_csv('data.csv', sep=None, engine='python')

If you use the awswrangler library to read csv files from S3, you can do the same:

awswrangler.s3.read_csv('s3://bucket/prefix', sep = None)