TensorFlowException: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /mnt/yarn/usercache
I am trying to run the "onto_electra_base_uncased" model on some data stored in a Hive table. Before saving the data back to a Hive table, I ran count() on the DataFrame and got this exception.
Spark shell launch configuration:
spark-shell --conf spark.ui.port="4052" --driver-memory 20g --executor-memory 45g --conf spark.driver.memoryOverhead=4g --conf spark.executor.memoryOverhead=4g --conf spark.driver.extraClassPath="spark-nlp-assembly-2.7.5.jar,bdl-voltage.jar,vibesimplejava.jar,voltage-hadoop-5.0.0.jar,vsconfig.jar" --conf spark.executor.extraClassPath="spark-nlp-assembly-2.7.5.jar,bdl-voltage.jar,vibesimplejava.jar,voltage-hadoop-5.0.0.jar,vsconfig.jar" --jars "spark-nlp-assembly-2.7.5.jar,bdl-voltage.jar,vibesimplejava.jar,voltage-hadoop-5.0.0.jar,vsconfig.jar"
Exception:
scala> nonNullPerson.count()
[Stage 9:> (0 + 8) / 200]21/08/23 10:07:23 WARN TaskSetManager: Lost task 4.0 in stage 9.0 (TID 309, ip-10-237-133-245.ec2.internal, executor 2): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$dfAnnotate: (array<array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>>) => array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.serializefromobject_doConsume_1$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.mapelements_doConsume_1$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.deserializetoobject_doConsume_1$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.serializefromobject_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.mapelements_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.deserializetoobject_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$$anon.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon.hasNext(Iterator.scala:440)
at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:212)
at scala.collection.Iterator$$anon.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$$anon.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$$anonfun.apply(ObjectHashAggregateExec.scala:107)
at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$$anonfun.apply(ObjectHashAggregateExec.scala:105)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$$anonfun.apply(RDD.scala:823)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$$anonfun.apply(RDD.scala:823)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.tensorflow.TensorFlowException: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /mnt/yarn/usercache/hadoop/appcache/application_1629712606577_0001/container_1629712606577_0001_01_000003/tmp/7e8b5bbda7f3_ner3970436258867963111/variables
[[{{node save/RestoreV2}}]]
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1333)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at com.johnsnowlabs.nlp.embeddings.BertEmbeddings.getModelIfNotSet(BertEmbeddings.scala:172)
at com.johnsnowlabs.nlp.embeddings.BertEmbeddings.annotate(BertEmbeddings.scala:223)
at com.johnsnowlabs.nlp.AnnotatorModel$$anonfun$dfAnnotate.apply(AnnotatorModel.scala:35)
at com.johnsnowlabs.nlp.AnnotatorModel$$anonfun$dfAnnotate.apply(AnnotatorModel.scala:34)
... 34 more
Caused by: org.tensorflow.TensorFlowException: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /mnt/yarn/usercache/hadoop/appcache/application_1629712606577_0001/container_1629712606577_0001_01_000003/tmp/7e8b5bbda7f3_ner3970436258867963111/variables
[[{{node save/RestoreV2}}]]
at org.tensorflow.Session.run(Native Method)
at org.tensorflow.Session.access0(Session.java:48)
at org.tensorflow.Session$Runner.runHelper(Session.java:326)
at org.tensorflow.Session$Runner.run(Session.java:276)
at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:325)
at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper.readObject(TensorflowWrapper.scala:248)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2295)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2404)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun.apply(TorrentBroadcast.scala:308)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:309)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$$anonfun$apply.apply(TorrentBroadcast.scala:235)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock.apply(TorrentBroadcast.scala:211)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
... 43 more
I am running this on an EMR cluster with the following configuration:
- Master (1) -> r5.2xlarge
- Core (2) -> r5.4xlarge
- Task (1) -> r5.4xlarge
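The solution to this problem is to use Kryo serialization. By default, spark-shell and spark-submit use Java serialization, which is why JavaSerializer appears in the stack trace when the broadcast model is read back on the executor. The Annotate class in spark-nlp is implemented to use Kryo serialization, so Kryo should be enabled when running any spark-nlp job.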
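As a rough sketch of what that change could look like on the command line: the snippet below adds the standard Spark setting `spark.serializer` to the launch command. The `spark.kryoserializer.buffer.max=1000M` value is an assumption (a starting point, not something stated in the question), and the memory and jar options are trimmed from the original command for brevity. The script only prints the command so it can be reviewed before it is run.

```shell
# Serializer settings to add to the launch command.
# spark.serializer switches Spark from Java serialization to Kryo.
# The 1000M buffer limit is an assumed value; tune it for your models.
KRYO_CONF="--conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.kryoserializer.buffer.max=1000M"

# Print the full command for review; paste it into a terminal on the
# EMR master node (and append the other jars from the question) to launch.
echo "spark-shell $KRYO_CONF --driver-memory 20g --executor-memory 45g --jars spark-nlp-assembly-2.7.5.jar"
```

Once the shell is up, `sc.getConf.get("spark.serializer")` should report the Kryo serializer class if the setting was picked up.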