How to parse very large json array with Moshi in Android?
I have a very large json file that contains the words of a particular language's dictionary. The file contains more than 348,000 words, and each object has a number of properties.
Here is a sample of the json array:
[
...
{"id":"57414","form":"t'est","formNoAccent":"test","formUtf8General":"test","reverse":"tset","number":null,"description":"","noAccent":"0","consistentAccent":"1","frequency":"0.98","hyphenations":null,"pronunciations":null,"stopWord":"0","compound":"0","modelType":"N","modelNumber":"1","restriction":"","staleParadigm":"0","notes":"","hasApheresis":"0","hasApocope":"1","createDate":"1196798482","modDate":"1637245287"},
{"id":"57415","form":"ț'est","formNoAccent":"țest","formUtf8General":"țest","reverse":"tseț","number":null,"description":"","noAccent":"0","consistentAccent":"1","frequency":"0.93","hyphenations":null,"pronunciations":null,"stopWord":"0","compound":"0","modelType":"N","modelNumber":"24","restriction":"","staleParadigm":"0","notes":"","hasApheresis":"0","hasApocope":"1","createDate":"1196798482","modDate":"1637245213"},
...
]
I want to insert these entries into Room and keep them there. The problem is that I have never done anything like this before, and when I try to convert everything into a list of objects with Moshi, I get an out-of-memory error. A solution would be to load each item separately, but I don't think that is possible.
So far it looks like this:
val archive = context.assets.open("table_lexeme.zip")
val destination = File.createTempFile("table_lexeme", ".zip")
val jsonFile = File.createTempFile("lexeme", ".json")

archive.use {
    destination.writeBytes(it.readBytes())
}

ZipFile(destination).use { zip ->
    zip.entries().asSequence().forEach { zipEntry ->
        if (zipEntry.name == "dex_table_lexeme.json") {
            zip.getInputStream(zipEntry).use { inputStream ->
                // Copy the zip entry to a temp file in chunks;
                // use {} guarantees the stream is closed even if copying fails
                BufferedOutputStream(FileOutputStream(jsonFile)).use { bos ->
                    val bytesIn = ByteArray(BUFFER_SIZE)
                    var read: Int
                    while (inputStream.read(bytesIn).also { read = it } != -1) {
                        bos.write(bytesIn, 0, read)
                    }
                }
            }
        }
    }
}

val jsonReader = JsonReader(InputStreamReader(jsonFile.inputStream(), Charsets.UTF_8))
jsonReader.beginArray()
The literal solution is to switch to a streaming JSON parser, so that your entire dataset is not loaded into RAM at once. JsonReader in the Android SDK works that way, and Gson has a streaming mode. I do not recall Moshi offering this, but I have not looked recently.
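For what it's worth, Moshi does ship a streaming reader (com.squareup.moshi.JsonReader), so the array can be decoded one element at a time. A sketch under assumptions: the LexemeEntity class and the LexemeDao with a transactional insertAll() are hypothetical names standing in for your Room entity and DAO.

import com.squareup.moshi.JsonReader
import com.squareup.moshi.Moshi
import okio.buffer
import okio.source
import java.io.File

fun importLexemes(jsonFile: File, dao: LexemeDao) {
    val adapter = Moshi.Builder().build().adapter(LexemeEntity::class.java)
    val batch = ArrayList<LexemeEntity>(500)

    jsonFile.source().buffer().use { source ->
        val reader = JsonReader.of(source)
        reader.beginArray()
        while (reader.hasNext()) {
            // Decode one array element at a time; only the current object
            // (plus the pending batch) is ever held in memory
            adapter.fromJson(reader)?.let { batch += it }
            if (batch.size == 500) {
                dao.insertAll(batch)   // one transaction per 500 rows
                batch.clear()
            }
        }
        reader.endArray()
        if (batch.isNotEmpty()) dao.insertAll(batch)
    }
}

Batching the inserts matters here: one transaction per row would make 348,000 inserts far slower than a few hundred larger transactions.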
The realistic solution is to not ship the JSON. Even with transaction batching (e.g., inserting 100 rows per batch), importing the data will be slow. You are already packaging the data as an asset, so you would be better off (IMHO) generating the SQLite database on your development machine and packaging that instead. Room has built-in support for copying a packaged database out of assets and putting it in place for use. Your database file will be large, but copying it is faster than building the database on the fly from imported data.
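Room's packaged-database support is a one-line addition to the database builder. A minimal sketch, assuming a LexemeDatabase class and a prebuilt database shipped at assets/database/lexeme.db (both names are placeholders for your project):

val db = Room.databaseBuilder(context, LexemeDatabase::class.java, "lexeme.db")
    .createFromAsset("database/lexeme.db")  // copied out of assets on first open
    .build()

The copy happens once, the first time the database is opened; subsequent opens use the already-installed file.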