Unable to upload PDF files larger than 10 MB into HBase via Python happybase (HDP 3)
We are using HDP 3. We are trying to insert PDF files into one of the columns of a specific column family in an HBase table. The development environment is Python 3.6 and the HBase connector is happybase 1.1.0.
We are unable to upload any PDF file larger than 10 MB into HBase.
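For context, this is roughly how the insert is done; a minimal sketch, where the host, table, and column names are placeholders rather than values from the original post:

import happybase

# Connect to the HBase Thrift server (host/port are placeholders)
connection = happybase.Connection('thrift-host.example.com', port=9090)
table = connection.table('documents')

# Read the PDF as raw bytes and store it in a single cell
with open('report.pdf', 'rb') as f:
    pdf_bytes = f.read()

# Works for small files, but fails with DoNotRetryIOException
# once the cell exceeds the 10 MB server-side limit
table.put(b'row-1', {b'cf1:pdf': pdf_bytes})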
In HBase we have set the following parameters:
We get the following error:
IOError(message=b'org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 1 action: org.apache.hadoop.hbase.DoNotRetryIOException: Cell
with size 80941994 exceeds limit of 10485760 bytes\n\tat
org.apache.hadoop.hbase.regionserver.RSRpcServices.checkCellSizeLimit(RSRpcServices.java:937)\n\tat
org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1010)\n\tat
org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:959)\n\tat
org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:922)\n\tat
org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2683)\n\tat
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService.callBlockingMethod(ClientProtos.java:42014)\n\tat
org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)\n\tat
org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)\n\tat
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)\n\tat
You have to check the HBase source code to understand what is happening:
private void checkCellSizeLimit(final HRegion r, final Mutation m) throws IOException {
  if (r.maxCellSize > 0) {
    CellScanner cells = m.cellScanner();
    while (cells.advance()) {
      int size = PrivateCellUtil.estimatedSerializedSizeOf(cells.current());
      if (size > r.maxCellSize) {
        String msg = "Cell with size " + size + " exceeds limit of " + r.maxCellSize + " bytes";
        if (LOG.isDebugEnabled()) {
          LOG.debug(msg);
        }
        throw new DoNotRetryIOException(msg);
      }
    }
  }
}
According to the error message, you exceeded r.maxCellSize.
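The numbers in the message line up exactly with the defaults; a quick check:

# 80941994 bytes is the estimated serialized size of the cell holding the PDF
print(80941994 / (1024 * 1024))       # ~77.19 MB, far above the limit
# 10485760 is the default limit: exactly 10 MiB
print(10485760 == 10 * 1024 * 1024)   # True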
Note that the function PrivateCellUtil.estimatedSerializedSizeOf used above is deprecated and will be removed in a future version. This is its description:
Estimate based on keyvalue's serialization format in the RPC layer.
Note that there is an extra SIZEOF_INT added to the size here that
indicates the actual length of the cell for cases where cell's are
serialized in a contiguous format (For eg in RPCs).
You have to check where this value is set. First check the "ordinary" value in HRegion.java:
this.maxCellSize = conf.getLong(HBASE_MAX_CELL_SIZE_KEY, DEFAULT_MAX_CELL_SIZE);
So there must be an HBASE_MAX_CELL_SIZE_KEY and a DEFAULT_MAX_CELL_SIZE defined somewhere:
public static final String HBASE_MAX_CELL_SIZE_KEY = "hbase.server.keyvalue.maxsize";
public static final int DEFAULT_MAX_CELL_SIZE = 10485760;
Here is your 10485760 limit, the one shown in your error message. If you need to, you can try raising this limit to the value you require. I recommend testing it properly before relying on it (the limit probably exists for a reason).
Edit: adding information on how to change the hbase.server.keyvalue.maxsize value. Check the config files, where you can read:
hbase.client.keyvalue.maxsize (Description)
Specifies the combined maximum allowed size of a KeyValue instance. This is to set an upper boundary for a single entry saved in a storage file. Since they cannot be split it helps avoiding that a region cannot be split any further because the data is too large. It seems wise to set this to a fraction of the maximum region size. Setting it to zero or less disables the check.
Default: 10485760

hbase.server.keyvalue.maxsize (Description)
Maximum allowed size of an individual cell, inclusive of value and all key components. A value of 0 or less disables the check. The default value is 10MB. This is a safety setting to protect the server from OOM situations.
Default: 10485760
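As a sketch of what that change could look like in hbase-site.xml (on HDP 3 this would typically be done through Ambari as a custom hbase-site property, followed by a RegionServer restart; the 100 MiB value below is only an example, not a recommendation):

<property>
  <name>hbase.server.keyvalue.maxsize</name>
  <value>104857600</value> <!-- example: 100 MiB; 0 or less disables the check -->
</property>
<property>
  <name>hbase.client.keyvalue.maxsize</name>
  <value>104857600</value>
</property>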