java.lang.OutOfMemoryError related to Spark Graphframe bfs
After calling bfs like this 20+ times, I get an OutOfMemoryError:
list_locals = []

# g is the graphframe with > 3 million nodes and > 15 million edges.
def fn(row):
    arg1 = "id = '%s'" % row.arg1
    arg2 = "id = '%s'" % row.arg2
    results = g.bfs(arg1, arg2, maxPathLength=4)
    list_locals.append(results.rdd.collect())
    results = None

# t is a list of row objects
for i in range(101):
    fn(t[i])
    print i
From the logs, I can see that bfs creates many broadcast variables and tries to clean them up. I wonder whether the cleanup of the broadcast variables is not fully completing? I have attached the latest error message below. Thanks!
16/07/11 09:44:28 INFO storage.BlockManagerInfo: Removed broadcast_922_piece0 on dsg-cluster-server-s06.xxx:40047 in memory (size: 8.1 KB, free: 3.0 GB)
16/07/11 09:44:38 INFO storage.MemoryStore: Block broadcast_924 stored as values in memory (estimated size 24.4 KB, free 2.8 MB)
Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
This is a driver-side exception; you should increase your driver memory.
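A minimal sketch of how the driver memory could be raised at submit time (`--driver-memory` and `spark.driver.maxResultSize` are standard Spark options; the `8g`/`4g` values and the script name `your_bfs_script.py` are placeholders, not values from this question):

```shell
# Give the driver JVM a larger heap and allow bigger collected results.
# Tune the sizes to your cluster; these are example values only.
spark-submit \
  --driver-memory 8g \
  --conf spark.driver.maxResultSize=4g \
  your_bfs_script.py
```

Separately, note that each `results.rdd.collect()` pulls the full BFS result onto the driver, and `list_locals` keeps every one of those lists alive, so collecting only the columns you actually need would also reduce driver heap pressure.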