Which type of Spark memory should I increase on Java out of memory error?
So, I have a pattern that looks like this:
def someFunction(...) : ... =
{
// Somewhere here some large string (still < 1 GB) is made ...
// ... and sometimes I get java.lang.OutOfMemoryError while building that string
}
....
val RDDb = RDDa.map(x => someFunction(...))
So in someFunction, a large string is built in one place. It is still not that big (< 1 GB), but while building it I sometimes get a java.lang.OutOfMemoryError: Java heap space error. This happens even though my executor memory is large (8 GB).

According to this article, there is User memory and Spark memory. In my case, which part should I increase, User memory or Spark memory?
P.S.: I am using Spark version 2.0.
A 1 GB raw string can easily use more than 8 GB of memory, so it is better to use streaming, e.g. XMLEventReader for XML.

See the estimates in the book Algorithms by Robert Sedgewick and Kevin Wayne: each string carries about 56 bytes of overhead.
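A minimal sketch of that arithmetic (assuming a 64-bit, pre-Java-9 JVM where a String is backed by a 2-bytes-per-char array; the object and method names here are my own illustration):

```scala
object StringMemoryEstimate {
  // Rough per-object numbers in the spirit of Sedgewick & Wayne's
  // "Algorithms" (64-bit JVM, String backed by a char[]):
  // ~56 bytes of object overhead, plus 2 bytes per char.
  def estimatedBytes(chars: Long): Long = 56L + 2L * chars

  def main(args: Array[String]): Unit = {
    val oneGChars = 1L << 30 // 2^30 characters
    val base = estimatedBytes(oneGChars)
    println(s"1G chars as one String: ~${base / (1024 * 1024)} MB")
    // A StringBuilder grows by reallocating a larger backing array;
    // during a resize the old and new arrays coexist, so peak usage
    // can be a small multiple of the string's own size.
    println(s"peak during a resize: ~${3 * base / (1024 * 1024)} MB")
  }
}
```

This is why a "< 1 GB" string can plausibly exhaust an 8 GB heap: the characters alone cost twice the raw byte count, and growth/copying multiplies the transient footprint.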
I wrote a simple test program and ran it with -Xmx8G:
object TestStringBuilder {
  val m = 1024 * 1024

  def memUsage(): Unit = {
    val runtime = Runtime.getRuntime
    println(
      s"""max: ${runtime.maxMemory() / m} M
         |allocated: ${runtime.totalMemory() / m} M
         |free: ${runtime.freeMemory() / m} M""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val builder = new StringBuilder()
    val size = 10 * m
    try {
      while (true) {
        builder.append(Math.random())
        if (builder.length % size == 0) {
          println(s"len is ${builder.length / m} M")
          memUsage()
        }
      }
    } catch {
      case ex: OutOfMemoryError =>
        println(s"OutOfMemoryError len is ${builder.length / m} M")
        memUsage()
      case ex: Throwable =>
        println(ex)
    }
  }
}
The output may look like this:
len is 140 M
max: 7282 M allocated: 673 M free: 77 M
len is 370 M
max: 7282 M allocated: 2402 M free: 72 M
len is 470 M
max: 7282 M allocated: 1479 M free: 321 M
len is 720 M
max: 7282 M allocated: 3784 M free: 314 M
len is 750 M
max: 7282 M allocated: 3784 M free: 314 M
len is 1020 M
max: 7282 M allocated: 3784 M free: 307 M
OutOfMemoryError len is 1151 M
max: 7282 M allocated: 3784 M free: 303 M
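The streaming alternative mentioned above can be sketched with the JDK's built-in StAX API (javax.xml.stream.XMLEventReader), which pulls one parse event at a time instead of materializing the whole document as a string. The object name and the counted element are my own illustration:

```scala
import java.io.StringReader
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.events.StartElement

object StaxSketch {
  // Count occurrences of an element without ever holding the full
  // document in memory; heap usage stays bounded regardless of input size.
  def countElements(xml: String, name: String): Int = {
    val reader = XMLInputFactory.newInstance()
      .createXMLEventReader(new StringReader(xml))
    var count = 0
    while (reader.hasNext) {
      reader.nextEvent() match {
        case e: StartElement if e.getName.getLocalPart == name => count += 1
        case _ => ()
      }
    }
    reader.close()
    count
  }

  def main(args: Array[String]): Unit = {
    val xml = "<root><item/><item/><item/></root>"
    println(countElements(xml, "item")) // 3
  }
}
```

In a real Spark job the Reader would wrap the record's input stream rather than an in-memory string; the point is that per-event processing avoids ever building the giant intermediate string at all.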