Python CSV reader TypeError: string pattern on bytes object

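This is my first time using the Python CSV reader. I have a method that asks the user to select the file they want to parse, and then passes that file path to the parse method: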

def parse(filename):
    parsedFile = []
    with open(filename, 'rb') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,|')
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)

        for line in reader:
            parsedFile.append(line)
        return parsedFile

def selectFile():
    print('start selectFile method')
    # Build the path with os.path.join instead of a hand-written '\Files' suffix
    localPath = os.path.join(os.getcwd(), 'Files')
    print(localPath)
    for fileA in os.listdir(localPath):
        print(fileA)

    # Keep asking until the user names a file that exists in localPath
    test = False
    while not test:
        fileB = input('which file would you like to DeID? \n')
        conjoinedPath = os.path.join(localPath, fileB)
        test = os.path.isfile(conjoinedPath)

    userInput = input('Please enter the number corresponding to which client ' + fileB + ' belongs to. \n\nAcceptable options are: \n1.A \n2.B \n3.C \n4.D \n5.E \n')
    client = ''
    if userInput == '1':
        client = 'A'
    elif userInput == '2':
        client = 'B'
    elif userInput == '3':
        client = 'CServices'
    elif userInput == '4':
        client = 'D'
    elif userInput == '5':
        client = 'E'
    return client, conjoinedPath


def main():
    x, y = selectFile()
    parse(y)


if __name__ == '__main__':
    main()

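Everything seems to work as expected, but I get:

TypeError: can't use a string pattern on a bytes-like object

when it tries to open the file (line 3 of the code). I have tried converting the file name to a string type and to a bytes type, but neither seems to work.

Here is the output: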

>>> 
start selectFile method
C:\PythonScripts\DeID\Files
89308570_201601040630verifyppn.txt
89339985_201601042316verifyppn.txt
which file would you like to DeID? 
89339985_201601042316verifyppn.txt
Please enter the number corresponding to which client 89339985_201601042316verifyppn.txt belongs to. 

Acceptable options are: 
1.Client A
2.Client B
3.Client C
4.Client D
5.Client E
3
Traceback (most recent call last):
  File "C:\PythonScripts\DeID\DeIDvA1.py", line 107, in <module>
    main()
  File "C:\PythonScripts\DeID\DeIDvA1.py", line 103, in main
    parse(y)
  File "C:\PythonScripts\DeID\DeIDvA1.py", line 63, in parse
    dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,|')
  File "C:\Python34\lib\csv.py", line 183, in sniff
    self._guess_quote_and_delimiter(sample, delimiters)
  File "C:\Python34\lib\csv.py", line 224, in _guess_quote_and_delimiter
    matches = regexp.findall(data)
TypeError: can't use a string pattern on a bytes-like object
>>> 

I'm not sure what I'm doing wrong.

The problem is not the file name, but the way you are opening the file:

with open(filename, 'rb') as csvfile:

Here the 'rb' mode specifies that the file is opened in binary mode, that is, the file contents are treated as bytes objects. From the documentation:

'b' appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects. This mode should be used for all files that don’t contain text.

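You then try to search through that data with csv.Sniffer().sniff(), which uses a string pattern internally, and, as the TypeError gracefully points out, that is not allowed.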
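The mismatch can be reproduced without any file at all. The following is a minimal sketch: the pattern and the sample data are made up for illustration, and the exact wording of the exception message may differ between Python versions.

```python
import csv
import re

# A str pattern, similar in kind to the ones csv.Sniffer compiles internally.
text_pattern = re.compile(r"[;,|]")

# Searching str data with a str pattern works.
matches = text_pattern.findall("a;b|c")

# Searching bytes data with the same str pattern raises TypeError.
pattern_error = None
try:
    text_pattern.findall(b"a;b|c")
except TypeError as exc:
    pattern_error = exc

# The same happens when bytes (what a file opened with 'rb' returns) reach the sniffer.
sniff_error = None
try:
    csv.Sniffer().sniff(b"a;b;c\n1;2;3\n", delimiters=";,|")
except TypeError as exc:
    sniff_error = exc

print(matches)
print(pattern_error)
print(sniff_error)
```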

Removing the b from the mode and simply using r will do the trick.
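A sketch of the parse function with that change applied might look like the following. The newline='' argument is not part of the original code; it is added here because the csv module documentation recommends it for files passed to csv.reader in Python 3.

```python
import csv


def parse(filename):
    """Return every row of a delimited text file as a list of strings."""
    # Text mode ('r') makes read() return str, which is what csv.Sniffer expects.
    # newline='' is an addition to the original code, following the csv docs.
    with open(filename, 'r', newline='') as csvfile:
        # Guess the delimiter from the file contents, restricted to ; , and |
        dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,|')
        # Rewind so the reader starts at the first row again
        csvfile.seek(0)
        return list(csv.reader(csvfile, dialect))
```

Like the original, this reads the whole file into memory for sniffing. For large files, passing only a sample such as csvfile.read(4096) to sniff() may be enough.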


Note: Python 2.x does not exhibit this behaviour on Unix machines. It is a result of bytes and str objects being separated into distinct types in 3.x.
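To illustrate the separation: in Python 3 a bytes object has to be decoded explicitly before it can be used where str is expected. The sketch below assumes the data is UTF-8 encoded, which the question does not state, and uses a made-up in-memory sample rather than a real file.

```python
import csv
import io

# What a file opened with 'rb' would hand back: a bytes object.
raw = b"a|b|c\n1|2|3\n"

# Assumption: the bytes are UTF-8 encoded text. Decoding yields a str.
text = raw.decode("utf-8")

# bytes and str are distinct types in Python 3 and never compare equal.
same_value = (raw == text)

# With str data the sniffer and the reader both work.
dialect = csv.Sniffer().sniff(text, delimiters=";,|")
rows = list(csv.reader(io.StringIO(text, newline=""), dialect))

print(type(raw).__name__, type(text).__name__, same_value)
print(dialect.delimiter, rows)
```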