How to use pandas read_csv to read a CSV file containing backslashes and double quotes
I have a CSV file like this (comma-separated):
ID, Name,Context, Location
123,"John","{\"Organization\":{\"Id\":12345,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}","Road 1"
234,"Mike","{\"Organization\":{\"Id\":23456,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}","Road 2"
I want to create a DataFrame like this:
ID | Name |Context |Location
123| John |{\"Organization\":{\"Id\":12345,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}|Road 1
234| Mike |{\"Organization\":{\"Id\":23456,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}|Road 2
Could you help me do this with pandas read_csv?
Answer: if you are willing to accept the \ characters being stripped:
pd.read_csv(your_filepath, escapechar='\\')
ID Name Context Location
0 123 John {"Organization":{"Id":12345,"IsDefault":false}... Road 1
1 234 Mike {"Organization":{"Id":23456,"IsDefault":false}... Road 2
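To try this without touching a real file, here is a minimal sketch that feeds a shortened version of the sample rows through `io.StringIO`. The `csv_text` contents and the variable names are made up for illustration; only `escapechar` comes from the answer above.

```python
import io

import pandas as pd

# Shortened, hypothetical sample in the same shape as the question's file.
# The raw string keeps the backslashes exactly as they would appear on disk.
csv_text = r'''ID,Name,Context,Location
123,"John","{\"Organization\":{\"Id\":12345,\"IsDefault\":false},\"VersionNumber\":-1}","Road 1"
234,"Mike","{\"Organization\":{\"Id\":23456,\"IsDefault\":false},\"VersionNumber\":-1}","Road 2"
'''

# escapechar tells the parser that \" inside a quoted field is a literal quote,
# so the backslashes are consumed while parsing.
df = pd.read_csv(io.StringIO(csv_text), escapechar='\\')

first_context = df.loc[0, 'Context']
print(first_context)
```

The Context values come back as plain JSON text with no backslashes, which is usually the more convenient form if the column is going to be parsed further.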
If you really want to keep the backslashes, use a custom converter:
def backslash_it(x):
    # Put a backslash back in front of every double quote
    return x.replace('"', '\\"')
pd.read_csv(your_filepath, escapechar='\\', converters={'Context': backslash_it})
ID Name Context Location
0 123 John {\"Organization\":{\"Id\":12345,\"IsDefault\":... Road 1
1 234 Mike {\"Organization\":{\"Id\":23456,\"IsDefault\":... Road 2
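The escapechar argument of read_csv is what lets the CSV be parsed correctly in the first place; the custom converter then puts the backslashes back where they were.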
Note that I adjusted the header row to make the column names easier to match:
ID,Name,Context,Location
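If editing the file is not an option, one possible alternative is to keep the original header (`ID, Name,Context, Location`) and strip the whitespace from the column names after loading. This is only a sketch with a made-up, shortened row; it is not part of the answer above.

```python
import io

import pandas as pd

# Hypothetical sample that keeps the question's original header,
# including the spaces after some of the commas.
csv_text = r'''ID, Name,Context, Location
123,"John","{\"Id\":12345}","Road 1"
'''

df = pd.read_csv(io.StringIO(csv_text), escapechar='\\')

# Remove leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()
print(list(df.columns))
```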