Remove non-ASCII characters from CSV using pandas
I am querying a table in a SQL Server database and exporting it to CSV using pandas:
import pandas as pd
df = pd.read_sql_query(sql, conn)
df.to_csv(csvFile, index=False)
Is there a way to remove non-ASCII characters while exporting the CSV file?
You can read the file back in and then strip the non-ASCII characters with a regular expression:
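import re

df.to_csv(csvFile, index=False)
# Read the exported file back in and drop every non-ASCII character
with open(csvFile) as f:
    new_text = re.sub(r'[^\x00-\x7F]+', '', f.read())
# Overwrite the file with the cleaned text
with open(csvFile, 'w') as f:
    f.write(new_text)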
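If a second pass over the file is undesirable, the values could be cleaned before they are written. The following is only a minimal sketch of that idea: `strip_non_ascii`, `rows`, and `cleaned` are hypothetical names that do not appear in the question, and the sample row is made up for illustration. It relies on `str.encode("ascii", errors="ignore")`, which drops the same characters the regex above matches.

```python
def strip_non_ascii(value):
    """Return value with non-ASCII characters dropped; non-strings pass through unchanged."""
    if isinstance(value, str):
        # errors="ignore" silently discards anything that has no ASCII encoding
        return value.encode("ascii", errors="ignore").decode("ascii")
    return value


# Made-up sample data standing in for the query result
rows = [{"name": "Zo\u00eb", "city": "S\u00e3o Paulo", "qty": 3}]
cleaned = [{key: strip_non_ascii(val) for key, val in row.items()} for row in rows]
```

With a DataFrame, the same helper could presumably be applied to each string column (for example with `df[col].map(strip_non_ascii)`) before calling `to_csv`, though that step is an assumption and not something the answer above shows.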
I ran into this same situation. Here is what worked for me:
import re

# Matches runs of non-ASCII characters
regex = re.compile(r'[^\x00-\x7F]+')
with open(csvFile, 'r') as infile, open('myfile.csv', 'w') as outfile:
    for line in infile:  # iterate line by line until EOF
        # Write each input line to the output file with any non-ASCII matches removed
        outfile.write(regex.sub('', line))
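One caveat with opening the file in text mode is that decoding depends on the platform's default encoding, so unexpected bytes can raise a `UnicodeDecodeError` before the regex ever runs. A possible workaround, sketched here under the assumption that the file uses an ASCII-compatible encoding such as UTF-8 or Latin-1 (it would not be safe for UTF-16), is to filter at the byte level instead. The function name `strip_non_ascii_bytes` and the `chunk_size` parameter are hypothetical.

```python
import re

# Any byte outside 0x00-0x7F is part of a non-ASCII character in UTF-8 or Latin-1
NON_ASCII_BYTES = re.compile(rb"[^\x00-\x7F]+")


def strip_non_ascii_bytes(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path, dropping every byte above 0x7F."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes object means EOF
                break
            dst.write(NON_ASCII_BYTES.sub(b"", chunk))
```

Because each byte is judged on its own, a multi-byte character that happens to straddle two chunks is still removed in full, and the file is never loaded into memory all at once.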