Python help needed: unable to read the contents of a TSV file and populate a dictionary as desired

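What I am trying to parse:

I have a TSV file that looks like this: https://i.stack.imgur.com/yxsXD.png

What the end goal is:

My goal is to read the TSV file without using a csv parser and populate a dictionary of nested lists with the file's contents.

In the end, the in_memory_table structure should look like this (with more than two rows, of course):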

{
    "header": [
        "STATION",
        "STATION_ID",
        "ELEVATION",
        "LAT",
        "LONG",
        "DATE",
        "MNTH_MIN",
        "MNTH_MAX"
    ],
    "rows": [
        [
            "Tukwila",
            "12345afbl",
            "10",
            "47.5463454",
            "-122.34234234",
            "2016-01-01",
            "10",
            "41"
        ],
        [
            "Tukwila",
            "12345afbl",
            "10",
            "47.5463454",
            "-122.34234234",
            "2016-02-01",
            "5",
            "35"
        ],
    ]
}

My code looks like this:

in_memory_table = {
    'header': [],
    'rows': []
}

with open('fahrenheit_monthly_readings.tsv') as f:
    in_file = f.readlines()

i = 0
for line in in_file:
    temp_list = [line.split('\t')]

    if (i == 0):
        in_memory_table['header'] = line

    elif (i != 0):
        in_memory_table['rows'].append(line)

    i += 1

print("\n", in_memory_table)

Code output:

C:\Users\svats\AppData\Local\Programs\Python\Python36-32\python.exe C:/Users/svats/PycharmProjects/BrandNew/module4_lab2/module4_lab2.py

 {'header': 'STATION\tSTATION_ID\tELEVATION\tLAT\tLONG\tDATE\tMNTH_MIN\tMNTH_MAX\n', 'rows': ['Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-01-01\t10\t41\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-02-01\t5\t35\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-03-01\t32\t47\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-04-01\t35\t49\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-05-01\t41\t60\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-06-01\t50\t72\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-07-01\t57\t70\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-08-01\t68\t79\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-09-01\t55\t71\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-10-01\t47\t77\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-11-01\t32\t66\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-12-01\t27\t55\n']}

Help needed:

I am very close to a solution. I have two questions:

1. How do I get rid of the \t in the output?
2. My output is a little different from the desired output. How do I get the desired one?

If you rewrite the code as:

for line in in_file:
    print('repr(line) before :', repr(line) )
    temp_list = [line.split()]
    #line = line.split()
    print('temp_list :',temp_list)
    print('repr(line) after  :', repr(line) )
    print(' %s -----------------' % i)

    if ........

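and uncomment the line #line = line.split(), you will see why you get the wrong result.

The reason is that line.split() does not change the object named line; it creates a new object (the list you want). If you want the name to refer to that list, the name line must be re-assigned.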
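
To illustrate the point, here is a minimal sketch using a made-up sample line rather than the real file (the names fields and unchanged are only for illustration). Strings are immutable, so split() cannot modify line; in the question's code the split result went into temp_list, which was never used, and the unsplit string was what got stored.

```python
# Hypothetical sample line, standing in for one line of the TSV file.
line = 'Tukwila\t12345afbl\t10\n'

# split() returns a new list; the string bound to `line` is unchanged.
fields = line.split('\t')
unchanged = line

# Re-assigning the name makes `line` refer to the list from now on.
line = line.split('\t')

print(repr(unchanged))
print(fields)
print(line)
```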

Note that the method str.split([sep[, maxsplit]]) uses a different algorithm depending on whether the parameter sep is None or not None; see the documentation https://docs.python.org/2/library/stdtypes.html#str.split on this point.
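
A small sketch of that difference, on a made-up row (not taken from the real file) that has a space inside the first field and an empty third field. The variable names are only for illustration.

```python
# Hypothetical row: a space inside the first field and an empty third field.
row = 'Des Moines\t12345afbl\t\t47.5\n'

# sep=None (the default): any run of whitespace counts as one separator,
# leading/trailing whitespace is dropped, and empty fields disappear.
by_whitespace = row.split()

# sep='\t': every single tab is a separator, so empty fields are kept,
# but the trailing newline stays attached to the last field.
by_tab = row.split('\t')

# Stripping the newline first gives clean tab-separated fields.
clean = row.rstrip('\n').split('\t')

print(by_whitespace)
print(by_tab)
print(clean)
```

For the file in the question no field is empty or contains spaces, so both forms give the same fields once the newline is handled; the difference only matters for data like the row above.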

That said, there are better ways.

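# Option 1: the first line becomes the header; a list comprehension splits the rest.
with open('fahrenheit_monthly_readings.tsv', 'r') as f:
    in_memory_table = {'header': next(f).split()}
    in_memory_table['rows'] = [line.split() for line in f]

# Option 2: same result, using map instead of a list comprehension.
with open('fahrenheit_monthly_readings.tsv', 'r') as f:
    in_memory_table = {'header': next(f).split()}
    in_memory_table['rows'] = list(map(str.split, f))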
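
If fields might contain spaces or be empty, a variant of the same idea that splits on tabs explicitly may be safer. The following is only a sketch under assumptions: read_tsv is a hypothetical helper name, and an io.StringIO object with two sample rows stands in for the real file so the example runs on its own.

```python
import io

# Hypothetical in-memory stand-in for fahrenheit_monthly_readings.tsv,
# so the sketch runs without the real file.
sample = io.StringIO(
    'STATION\tSTATION_ID\tELEVATION\tLAT\tLONG\tDATE\tMNTH_MIN\tMNTH_MAX\n'
    'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-01-01\t10\t41\n'
    'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-02-01\t5\t35\n'
)


def read_tsv(f):
    """Build {'header': [...], 'rows': [[...], ...]} from an open text file."""
    # Drop only the line ending, then split on tabs so empty fields survive.
    header = next(f).rstrip('\n').split('\t')
    # Skip lines that are entirely blank (for example a trailing empty line).
    rows = [line.rstrip('\n').split('\t') for line in f if line.strip()]
    return {'header': header, 'rows': rows}


in_memory_table = read_tsv(sample)
print(in_memory_table)
```

With the real file, the same helper could be called as read_tsv(f) inside a with open('fahrenheit_monthly_readings.tsv') as f: block.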