Shove giving KeyError when assigning value
I'm using shove to avoid loading a huge dictionary into memory.
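import string
import sys

from shove import Shove

# File-backed store: each entry is written under the 'storage' directory.
lemmaDict = Shove('file://storage')
with open(str(sys.argv[1])) as lemmaCPT:
    for line in lemmaCPT:
        line = line.rstrip('\n')
        # Fields are separated by ' ||| ': source, target, scores.
        lineAr = string.split(line, ' ||| ')
        lineKey = lineAr[0] + ' ||| ' + lineAr[1]
        lineValue = lineAr[2]
        print lineValue
        lemmaDict[lineKey] = lineValue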
However, I run into the following KeyError and traceback while reading lemmaCPT. What's going on?
Traceback (most recent call last):
  File "./stemmer.py", line 19, in <module>
    lemmaDict[lineKey] = lineValue
  File "/opt/Python-2.7.6/lib/python2.7/site-packages/shove/core.py", line 44, in __setitem__
    self.sync()
  File "/opt/Python-2.7.6/lib/python2.7/site-packages/shove/core.py", line 74, in sync
    self._store.update(self._buffer)
  File "/opt/Python-2.7.6/lib/python2.7/_abcoll.py", line 542, in update
    self[key] = other[key]
  File "/opt/Python-2.7.6/lib/python2.7/site-packages/shove/base.py", line 123, in __setitem__
    raise KeyError(key)
KeyError: '! ! ! \xd1\x87\xd0\xb8\xd1\x82\xd0\xb0\xd0\xb5\xd1\x82\xd1\x81\xd1\x8f \xd1\x82\xd1\x80\xd0\xbe\xd0\xb5\xd0\xba\xd1\x80\xd0\xb0\xd1\x82\xd0\xbd\xd1\x8b\xd0\xbc \xd0\xbf\xd0\xbe\xd0\xb2\xd1\x82\xd0\xbe\xd1\x80\xd0\xb5\xd0\xbd\xd0\xb8\xd0\xb5\xd0\xbc \xd0\xbb\xd1\x8e\xd0\xb1\xd0\xbe\xd0\xb3\xd0\xbe ||| ! ! ! is pronounced by'
Sample input:
! ! ! читается троекратным повторением ||| ! ! ! is pronounced by repeating ||| 0.00744887 8.53148e-39 0.00989281 8.53148e-39
! ! ! читается троекратным повторением ||| ! ! ! is pronounced by ||| 0.00744887 8.53148e-39 0.00989281 8.53148e-39
! ! ! читается троекратным повторением ||| ! ! ! is pronounced ||| 0.00744887 8.53148e-39 0.00989281 8.53148e-39
! ! ! читается троекратным повторением ||| ! ! ! is ||| 0.00819374 8.53148e-39 0.00989281 0.0128612
! ! ! читается троекратным повторением ||| ! ! ! ||| 0.000119622 8.53148e-39 0.0098932 0.590703
! ! ! читается троекратным повторением ||| , ! ! ! is pronounced by ||| 0.00819374 8.53148e-39 0.00989281 8.53148e-39
! ! ! читается троекратным повторением ||| , ! ! ! is pronounced ||| 0.00819374 8.53148e-39 0.00989281 8.53148e-39
! ! ! читается троекратным повторением ||| , ! ! ! is ||| 0.00819374 8.53148e-39 0.00989281 0.00154241
! ! ! читается троекратным повторением ||| , ! ! ! ||| 0.0074488 8.53148e-39 0.00989281 0.070842
! ! ! читается троекратным повторением любого ||| ! ! ! is pronounced by repeating ||| 0.00744887 8.53148e-39 0.00989281 8.53148e-39
! ! ! читается троекратным повторением любого ||| ! ! ! is pronounced by ||| 0.00744887 8.53148e-39 0.00989281 8.53148e-39
Running code.py sampleinput produces the KeyError and traceback shown above.
Well, if this is the actual input, then the problem lies in the length of lemmaDict versus the length of the input:
aftnix@dev:~⟫ cat input | wc -l
11
My modified code:
from shove import Shove
import sys
import string

lemmaDict = Shove('file://storage')
i = 0
with open(str(sys.argv[1])) as lemmaCPT:
    for line in lemmaCPT:
        line = line.rstrip('\n')
        lineAr = string.split(line, ' ||| ')
        lineKey = lineAr[0] + ' ||| ' + lineAr[1]
        lineValue = lineAr[2]
        print lineValue
        print len(lemmaDict)
        #print len(lemmaCPT)
        i += 1
        print i
        #lemmaDict[lineKey] = lineValue
It gives the following output:
0.00744887 8.53148e-39 0.00989281 8.53148e-39
9
1
0.00744887 8.53148e-39 0.00989281 8.53148e-39
9
2
0.00744887 8.53148e-39 0.00989281 8.53148e-39
9
3
0.00819374 8.53148e-39 0.00989281 0.0128612
9
4
0.000119622 8.53148e-39 0.0098932 0.590703
9
5
0.00819374 8.53148e-39 0.00989281 8.53148e-39
9
6
0.00819374 8.53148e-39 0.00989281 8.53148e-39
9
7
0.00819374 8.53148e-39 0.00989281 0.00154241
9
8
0.0074488 8.53148e-39 0.00989281 0.070842
9
9
0.00744887 8.53148e-39 0.00989281 8.53148e-39
9
10
0.00744887 8.53148e-39 0.00989281 8.53148e-39
9
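So you are simply running past the end of the dict.
If you remove two lines from the input, it stops throwing the exception.
I don't know shove, but a quick check in the shell tells me it always returns a dictionary with the same number of keys. There must be a way to grow it, maybe a method or something similar. You should dig through its documentation more carefully.
I just think you are using Shove the wrong way.
Edit: This is a bit strange. After looking at the Shove code, it turns out it is supposed to sync its in-memory contents to the store once the buffer limit is reached: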
def __setitem__(self, key, value):
    self._cache[key] = self._buffer[key] = value
    # when buffer reaches self._limit, write buffer to store
    if len(self._buffer) >= self._sync:
        self.sync()
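The buffering behaviour in that snippet can be sketched in isolation. The class below is a hypothetical stand-in written for illustration, not Shove's real implementation; the names BufferedStore and sync_every are assumptions. It shows why an error from the backing store can surface inside sync() for a key that was assigned on an earlier iteration, which would explain the key reported in the traceback.

```python
class BufferedStore(object):
    """Hypothetical write-buffered mapping; NOT Shove's actual class."""

    def __init__(self, store, sync_every=2):
        self._store = store            # backing mapping (a plain dict here)
        self._buffer = {}              # writes that have not been flushed yet
        self._sync_every = sync_every  # assumed flush threshold

    def __setitem__(self, key, value):
        self._buffer[key] = value
        # Flush once enough writes have piled up.
        if len(self._buffer) >= self._sync_every:
            self.sync()

    def sync(self):
        # Any error raised by the backing store surfaces here, possibly
        # for a key that was assigned several iterations earlier.
        self._store.update(self._buffer)
        self._buffer.clear()


backing = {}
buffered = BufferedStore(backing, sync_every=2)
buffered["a"] = 1
pending_after_first = dict(backing)    # "a" is only buffered so far
buffered["b"] = 2
flushed_after_second = dict(backing)   # both keys written in one batch
```

With a real file-backed store the flush is where the file writes happen, so that is where a failing write would be noticed.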
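Edit 2
OK, my earlier point was completely wrong, but I did get some interesting pointers. One problem is that shove raises a confusing exception.
The real exception happens because of this: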
def __setitem__(self, key, value):
    # (per Larry Meyn)
    try:
        with open(self._key_to_file(key), 'wb') as item:
            item.write(self.dumps(value))
    except (IOError, OSError):
        raise KeyError(key)  # line 123 of shove/base.py in the traceback
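So the exception actually comes from the open call, which means it had trouble writing the file. That gave me a new suspicion about the length of the strings.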
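To see the underlying error instead of the KeyError that masks it, one can attempt the same kind of file creation directly. This is a minimal sketch under assumptions: try_create is a hypothetical helper, and the 300-character name is just an arbitrary value longer than common file name limits.

```python
import errno
import os
import shutil
import tempfile


def try_create(directory, name):
    """Try to create an empty file; return the error if it fails, else None."""
    try:
        with open(os.path.join(directory, name), "wb"):
            pass
    except (IOError, OSError) as exc:
        return exc
    return None


workdir = tempfile.mkdtemp()
short_result = try_create(workdir, "k" * 10)    # an ordinary short name
long_result = try_create(workdir, "k" * 300)    # longer than common limits

if long_result is not None:
    # errno.errorcode maps the numeric code to a symbolic name.
    print(errno.errorcode.get(long_result.errno), long_result.strerror)

shutil.rmtree(workdir)
```

On a typical Linux filesystem the long name is expected to be rejected with a "file name too long" error, which is the detail that the KeyError hides.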
This is what the storage folder looks like:
aftnix@dev:~⟫ ls -l storage/
total 36
-rw-rw-r-- 1 aftnix aftnix 49 ডিসে 4 01:35 %21+%21+%21+%D1%87%D0%B8%D1%82%D0%B0%D0%B5%D1%82%D1%81%D1%8F+%D1%82%D1%80%D0%BE%D0%B5%D0%BA%D1%80%D0%B0%D1%82%D0%BD%D1%8B%D0%BC+%D0%BF%D0%BE%D0%B2%D1%82%D0%BE%D1%80%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC+%7C%7C%7C+%21+%21+%21
-rw-rw-r-- 1 aftnix aftnix 52 ডিসে 4 01:35 %21+%21+%21+%D1%87%D0%B8%D1%82%D0%B0%D0%B5%D1%82%D1%81%D1%8F+%D1%82%D1%80%D0%BE%D0%B5%D0%BA%D1%80%D0%B0%D1%82%D0%BD%D1%8B%D0%BC+%D0%BF%D0%BE%D0%B2%D1%82%D0%BE%D1%80%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC+%7C%7C%7C+%2C+%21+%21+%21+is+pronounced
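So shove is using the key as the file name. That can get very ugly, because your strings are very long in the last two entries, especially the second-to-last one. To test this, I removed some characters from the last two lines of the input, and the code ran without any exception.
The Linux kernel has a limit on file name length: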
aftnix@dev:~⟫ cat /usr/include/linux/limits.h
#ifndef _LINUX_LIMITS_H
#define _LINUX_LIMITS_H
#define NR_OPEN 1024
#define NGROUPS_MAX 65536 /* supplemental group IDs are available */
#define ARG_MAX 131072 /* # bytes of args + environ for exec() */
#define LINK_MAX 127 /* # links a file may have */
#define MAX_CANON 255 /* size of the canonical input queue */
#define MAX_INPUT 255 /* size of the type-ahead buffer */
#define NAME_MAX 255 /* # chars in a file name */
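The keys can be checked against that limit before they are stored. The sketch below assumes the file names are produced by something equivalent to quote_plus, which is consistent with the names in the ls output above but is not confirmed from shove's source; it is written for Python 3, where quote_plus lives in urllib.parse.

```python
from urllib.parse import quote_plus  # Python 3 location of quote_plus

NAME_MAX = 255  # value taken from the limits.h output above

keys = [
    u"! ! ! читается троекратным повторением ||| ! ! !",
    u"! ! ! читается троекратным повторением любого ||| ! ! ! is pronounced by",
]

encoded_lengths = {}
for key in keys:
    # Each Cyrillic letter is 2 bytes in UTF-8 and each byte becomes "%XX",
    # so one letter turns into 6 characters of file name.
    file_name = quote_plus(key)
    encoded_lengths[key] = len(file_name)
    print(len(file_name), len(file_name) > NAME_MAX)
```

The first key stays under the limit, while the key from the traceback grows past 255 characters once encoded, even though the raw string is far shorter.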
So to get around this you have to do something else. You can't put the keys into Shove exactly as they are parsed.
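One possible workaround, offered as a sketch rather than as the recommended fix, is to hash each key down to a fixed length before storing it. Here short_key is a hypothetical helper and a plain dict stands in for the Shove store; whether tuple values suit a particular Shove backend is an assumption to verify.

```python
import hashlib


def short_key(key):
    """Map an arbitrary text key to a fixed 40-character hex digest."""
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


# A plain dict stands in for the Shove store in this sketch.
store = {}
line = (u"! ! ! читается троекратным повторением любого ||| "
        u"! ! ! is pronounced by ||| "
        u"0.00744887 8.53148e-39 0.00989281 8.53148e-39")
source, target, scores = line.split(u" ||| ")
original_key = source + u" ||| " + target

# Keep the original key next to the value so it can still be recovered.
store[short_key(original_key)] = (original_key, scores)
```

The digest is always 40 characters, so the file name stays well under the limit regardless of how long the phrase pair is. The cost is that the original key can no longer be read back from the file name, which is why it is stored alongside the value here.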