Google Cloud Dataflow job breaking mysteriously
I have repeatedly tried to run a set of Google Cloud Dataflow jobs that worked fine until recently but now tend to crash. The error is especially confusing because I can't tell what code it is referring to; it appears to be internal GCP code.
My job ID here is: 2019-02-26_13_27_30-16974532604317793751
I run these jobs on n1-standard-96 instances.
For reference, the full trace:
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 642, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 156, in execute
op.start()
File "dataflow_worker/shuffle_operations.py", line 49, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
def start(self):
File "dataflow_worker/shuffle_operations.py", line 50, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
with self.scoped_start_state:
File "dataflow_worker/shuffle_operations.py", line 65, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
with self.scoped_process_state:
File "dataflow_worker/shuffle_operations.py", line 66, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
with self.shuffle_source.reader() as reader:
File "dataflow_worker/shuffle_operations.py", line 68, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
for key_values in reader:
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 433, in __iter__
for entry in entries_iterator:
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 272, in next
return next(self.iterator)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 230, in __iter__
chunk, next_position = self.reader.Read(start_position, end_position)
File "third_party/windmill/shuffle/python/shuffle_client.pyx", line 133, in shuffle_client.PyShuffleReader.Read
IOError: Shuffle read failed: DATA_LOSS: Missing last fragment of a large value.
Perhaps the input data has grown larger and Dataflow can no longer handle it?
My jobs had shuffle problems too. They started working when I switched to the optional "shuffle service". You may want to try it. Just add the following to your job command:
--experiments shuffle_mode=service
Reference: see the "Using Cloud Dataflow Shuffle" section of this page.
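For context, a minimal sketch of what the full launch command might look like with that flag added. Only the `--experiments shuffle_mode=service` part comes from this answer; the script name, project, and region are hypothetical placeholders you would replace with your own values.

```shell
# Hypothetical Dataflow launch command (Python SDK). Only the last flag
# is the fix suggested above; everything else is illustrative.
python my_pipeline.py \
  --runner DataflowRunner \
  --project my-project \
  --region us-central1 \
  --temp_location gs://my-bucket/tmp \
  --experiments shuffle_mode=service
```

With this flag, the shuffle step runs in the managed Dataflow Shuffle service instead of on the worker VMs, which sidesteps the worker-side shuffle reader that raised the `DATA_LOSS` error in the trace.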