happybase crashes when trying to scan a very large HBase column
My code is as follows:
count = 0
for key, data in table.scan(columns=["raw:dataInfo"]):
    count += 1
    ...
The raw:dataInfo column can be as large as 50 MB. When I run the code above, happybase crashes and throws the following exception:
Traceback (most recent call last):
  File "happybasetestscan.py", line 8, in <module>
    for key,data in table.scan(columns=["raw:sample"],limit=10):
  File "/usr/lib/python2.6/site-packages/happybase/table.py", line 374, in scan
    self.name, scan, {})
.......
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
Any ideas on how to handle such a large column? Thanks!
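My guess is that the Thrift server did not respond correctly. happybase is reporting (via the Thrift library) that it could not read any data from the socket.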
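One thing that might be worth trying, offered as an assumption rather than a diagnosis: Table.scan() fetches rows from the Thrift server in batches (the batch_size argument, 1000 rows by default), so with cells around 50 MB a single batch can become very large. A minimal sketch that asks for one row per round trip follows; scan_large_column is a hypothetical helper name, and table is assumed to behave like a happybase Table.

```python
def scan_large_column(table, column, batch_size=1):
    """Yield (row_key, total_value_bytes) pairs for one column.

    A small batch_size keeps each Thrift response small, at the cost of
    more round trips. Whether this avoids the crash is an assumption.
    """
    for row_key, data in table.scan(columns=[column], batch_size=batch_size):
        # data maps column names to cell values; it may be empty for a row.
        yield row_key, sum(len(value) for value in data.values())
```

If the error persists even with batch_size=1, the Thrift server's own logs are probably the next place to look.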
In any case, if you want a full table scan just to count rows (inefficient, but it works), use a filter when scanning:
# Scan, get only keys (data will be empty)
scanner = table.scan(
    row_start=b'aaa',
    row_stop=b'bbb',
    filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()',
)
for row_key, data in scanner:
    pass  # do something with row_key
See https://github.com/wbolster/happybase/issues/12#issuecomment-12754400 for more information.
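For the counting use case in the question, the key-only scan can be wrapped in a small function. This is only a sketch: count_rows is a hypothetical name, and table is assumed to be a happybase Table (or anything with a compatible scan() method).

```python
def count_rows(table, row_start=None, row_stop=None):
    """Count rows with a key-only scan so cell values are not transferred."""
    scanner = table.scan(
        row_start=row_start,
        row_stop=row_stop,
        # Return only the first key of each row, with an empty value.
        filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()',
    )
    return sum(1 for _row_key, _data in scanner)
```

With a real connection, table would come from something like happybase.Connection('hbase-host').table('mytable'), where both names are placeholders.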