Python help needed: unable to read the contents of a TSV file and populate a dictionary as desired
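What I am trying to parse:
I have a TSV file that looks like this:
https://i.stack.imgur.com/yxsXD.png
What the end goal is:
My goal is to read the TSV file without using a csv parser and populate a dictionary and nested lists with the contents of the file.
In the end, the in_memory_table structure should look like this (with more than two rows, of course):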
{
    "header": [
        "STATION",
        "STATION_ID",
        "ELEVATION",
        "LAT",
        "LONG",
        "DATE",
        "MNTH_MIN",
        "MNTH_MAX"
    ],
    "rows": [
        [
            "Tukwila",
            "12345afbl",
            "10",
            "47.5463454",
            "-122.34234234",
            "2016-01-01",
            "10",
            "41"
        ],
        [
            "Tukwila",
            "12345afbl",
            "10",
            "47.5463454",
            "-122.34234234",
            "2016-02-01",
            "5",
            "35"
        ],
    ]
}
My code looks like this:
in_memory_table = {
    'header': [],
    'rows': [] }
with open('fahrenheit_monthly_readings.tsv') as f:
    in_file = f.readlines()
i = 0
for line in in_file:
    temp_list = [line.split('\t')]
    if (i == 0):
        in_memory_table['header']= line
    elif(i != 0):
        in_memory_table['rows'].append(line)
    i += 1
print("\n",in_memory_table)
Output of the code:
C:\Users\svats\AppData\Local\Programs\Python\Python36-32\python.exe C:/Users/svats/PycharmProjects/BrandNew/module4_lab2/module4_lab2.py
{'header': 'STATION\tSTATION_ID\tELEVATION\tLAT\tLONG\tDATE\tMNTH_MIN\tMNTH_MAX\n', 'rows': ['Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-01-01\t10\t41\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-02-01\t5\t35\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-03-01\t32\t47\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-04-01\t35\t49\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-05-01\t41\t60\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-06-01\t50\t72\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-07-01\t57\t70\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-08-01\t68\t79\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-09-01\t55\t71\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-10-01\t47\t77\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-11-01\t32\t66\n', 'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-12-01\t27\t55\n']}
Help needed:
I am very close to a solution. I have 2 questions:
1. How do I get rid of the \t in the output?
2. My output is a little different from the desired output. How do I get the desired one?
If you rewrite the code as:
for line in in_file:
    print('repr(line) before :', repr(line) )
    temp_list = [line.split()]
    #line = line.split()
    print('temp_list :',temp_list)
    print('repr(line) after :', repr(line) )
    print(' %s -----------------' % i)
    if ........
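and uncomment the line #line = line.split(), you will see why you get the bad result.
The reason is that line.split() does not change the object named line; it creates a new object (the list you want). If you want the name line to refer to that list, the name must be re-assigned.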
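To make the re-assignment point concrete, here is a minimal sketch of the loop from the question with the result of split() bound to a name and that name stored. The two sample lines are copied from the output shown in the question and stand in for f.readlines() (an assumption, so the snippet runs without the .tsv file); the name fields is introduced here for illustration only.

```python
# Sample lines standing in for f.readlines(); copied from the question's output.
in_file = [
    'STATION\tSTATION_ID\tELEVATION\tLAT\tLONG\tDATE\tMNTH_MIN\tMNTH_MAX\n',
    'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-01-01\t10\t41\n',
]

in_memory_table = {'header': [], 'rows': []}

for i, line in enumerate(in_file):
    # str.split returns a new list and leaves the string bound to `line`
    # untouched, so the list must be bound to a name and that name stored.
    fields = line.split()
    if i == 0:
        in_memory_table['header'] = fields
    else:
        in_memory_table['rows'].append(fields)

print(in_memory_table)
```

The original loop computed the split into temp_list but then stored line, which is why the tabs and newlines survived in the output.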
Note that the method str.split([sep[, maxsplit]]) uses a different algorithm depending on whether the parameter sep is None or not; see the documentation at https://docs.python.org/2/library/stdtypes.html#str.split for details.
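The difference between the two algorithms can matter for tab-separated data. The sketch below compares the calls on a made-up line with an empty second field; the line is hypothetical and not taken from the file in the question.

```python
# Hypothetical line with an empty second field, for illustration only.
line = 'Tukwila\t\t10\n'

# sep omitted (None): runs of whitespace act as a single separator and
# leading/trailing whitespace is dropped, so the empty field disappears.
no_sep = line.split()

# sep='\t': every tab delimits a field, so the empty field is kept,
# but the trailing newline stays attached to the last field.
tab_sep = line.split('\t')

# Removing the newline first gives clean, tab-delimited fields.
clean = line.rstrip('\n').split('\t')

print(no_sep)   # ['Tukwila', '10']
print(tab_sep)  # ['Tukwila', '', '10\n']
print(clean)    # ['Tukwila', '', '10']
```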
That said, there is a better way.
with open('fahrenheit_monthly_readings.tsv','r') as f:
    in_memory_table = {'header':next(f).split()}
    in_memory_table['rows'] = [line.split() for line in f]
or
with open('fahrenheit_monthly_readings.tsv','r') as f:
    in_memory_table = {'header':next(f).split()}
    in_memory_table['rows'] = list(map(str.split, f))
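Both versions call split() without a separator, which works for the file in the question because no field is empty or contains spaces. If that cannot be guaranteed, a tab-aware variant might look like the following sketch. The function name read_tsv is hypothetical, and io.StringIO with three lines taken from the question's output stands in for the real file so that the snippet runs on its own.

```python
import io


def read_tsv(lines):
    """Build {'header': [...], 'rows': [[...], ...]} from an iterable of lines."""
    lines = iter(lines)
    # Strip only the line ending, then split on tabs, so empty fields
    # and fields containing spaces are preserved.
    header = next(lines).rstrip('\r\n').split('\t')
    rows = [
        line.rstrip('\r\n').split('\t')
        for line in lines
        if line.rstrip('\r\n')  # skip completely empty lines
    ]
    return {'header': header, 'rows': rows}


# Stand-in for the real file (assumption: same layout as in the question).
sample = io.StringIO(
    'STATION\tSTATION_ID\tELEVATION\tLAT\tLONG\tDATE\tMNTH_MIN\tMNTH_MAX\n'
    'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-01-01\t10\t41\n'
    'Tukwila\t12345afbl\t10\t47.5463454\t-122.34234234\t2016-02-01\t5\t35\n'
)

in_memory_table = read_tsv(sample)
print(in_memory_table['header'])
print(in_memory_table['rows'][1])
```

With the real file, the same function could be called as read_tsv(f) inside the with open(...) block, since a file object is itself an iterable of lines.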