String in set gives weird results

Question

My code reads the header of a CSV file and converts it into a lookup table of column_name => column_index:

import csv

class CSVOutput:
    def __init__(self, csv_file, required_columns):
        csv_reader = csv.reader(csv_file)

        # Construct lookup table for header
        self.header = {}
        for idx, column in enumerate(next(csv_reader)):
            print(f"{column.lower().strip()} == key: {column.lower().strip() == 'key'}")
            print(f"{column.lower().strip()} is key: {column.lower().strip() is 'key'}")
            self.header[column.lower().strip()] = idx

        print(self.header)

        # Load the row data into memory/index it against key
        key_idx = self.header['key']

with open("test.csv") as csv_file:
    data = CSVOutput(csv_file, {})

When I run this, I get the following output and error:

{'key': 0, 'col1': 1, 'col2': 2}

key == key: False
key is key: False
col1 == key: False
col1 is key: False
col2 == key: False
col2 is key: False

Traceback (most recent call last):
  File "D:\compare.py", line 74, in <module>
    actual_data = CSVOutput(act_csv, required_columns)
  File "D:\compare.py", line 40, in __init__
    key_idx = self.header['key']
KeyError: 'key'

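Basically, the literal 'key' and the 'key' loaded from the file do not seem to be equal. I tried viewing the source file in Notepad++ with all symbols shown, but I could not see any difference. I also just looked at the CSV file in a hex editor, and I can see that it begins with ï»¿Key, where ï»¿ is EF BB BF. I'm not sure whether this is the root of my problem, but if it is, why doesn't strip() get rid of it, and how should I handle it?

Any ideas?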

Answer 1

EF BB BF

This is the UTF-8 BOM (byte order mark). You can use the utf-8-sig encoding to deal with such files. Pass it through the encoding argument of the open function, as follows:

with open("test.csv", encoding="utf-8-sig") as csv_file:
    data = CSVOutput(csv_file, {})
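
As for why strip() does not help: called without arguments, it removes only whitespace, and the characters the BOM decodes to are not whitespace. The following is a minimal sketch of that behaviour, not the asker's actual file. The sample bytes and the read_header helper are made up for illustration, and cp1252 stands in for whatever legacy default encoding the platform happens to use.

```python
import codecs
import csv
import io

# Hypothetical sample: a small CSV file that starts with the UTF-8 BOM (EF BB BF).
raw = codecs.BOM_UTF8 + b"Key,Col1,Col2\n1,a,b\n"

def read_header(data: bytes, encoding: str) -> list:
    """Decode `data` with `encoding` and return the lower-cased, stripped header."""
    text = io.StringIO(data.decode(encoding), newline="")
    return [column.lower().strip() for column in next(csv.reader(text))]

plain = read_header(raw, "utf-8")      # BOM survives as the single character U+FEFF
legacy = read_header(raw, "cp1252")    # BOM bytes become the three characters "ï»¿"
fixed = read_header(raw, "utf-8-sig")  # the codec drops the BOM while decoding

# ascii() makes the otherwise invisible or odd leading characters visible.
print(ascii(plain[0]))
print(ascii(legacy[0]))
print(fixed[0] == "key")
```

In the first two cases the first column name still carries the leading BOM characters after lower() and strip(), so it does not compare equal to 'key' and the dictionary lookup raises KeyError. Only the utf-8-sig decoding yields a clean 'key'.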

Tags: python, comparison, set, string-comparison