Transporting a Sparse Matrix from Python to R

Question

I'm doing some text analysis in Python. Unfortunately, I need to switch to R in order to use a particular package, which can't easily be replicated in Python.

Currently the text is parsed into bigram counts, reduced to a vocabulary of about 11,000 bigrams, and then stored as a dictionary:

{id1: {'bigrams': [(bigram1, count), (bigram2, count), ...]},
 id2: {'bigrams': ...}}

I need to get this into a dgCMatrix in R, where the rows are id1, id2, ... and the columns are the distinct bigrams, so that each cell holds the 'count' for that id-bigram pair.

Any suggestions? I considered expanding it into one huge CSV, but that seems very inefficient and may not be feasible given memory constraints.

Answer 1

Could you write the matrix out in MatrixMarket format using scipy's mmwrite, and then read it into R using readMM from the Matrix package?
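
In case it helps, here is a rough sketch of the Python side. It assumes the dictionary has the shape shown in the question. The sample data, the helper name build_count_matrix, and the file name bigrams.mtx are all made up for illustration. It builds a scipy COO matrix from the (bigram, count) pairs and writes it with scipy.io.mmwrite.

```python
import os
import tempfile

from scipy.io import mmread, mmwrite
from scipy.sparse import coo_matrix

# Hypothetical sample data in the shape described in the question.
docs = {
    "id1": {"bigrams": [("new york", 3), ("york city", 1)]},
    "id2": {"bigrams": [("new york", 2), ("san francisco", 5)]},
}


def build_count_matrix(docs):
    """Return (sparse matrix, row labels, column labels) for the bigram counts."""
    row_ids = sorted(docs)
    vocab = sorted({bg for d in docs.values() for bg, _ in d["bigrams"]})
    col_index = {bg: j for j, bg in enumerate(vocab)}

    rows, cols, vals = [], [], []
    for i, doc_id in enumerate(row_ids):
        for bg, count in docs[doc_id]["bigrams"]:
            rows.append(i)
            cols.append(col_index[bg])
            vals.append(count)

    matrix = coo_matrix((vals, (rows, cols)), shape=(len(row_ids), len(vocab)))
    matrix.sum_duplicates()  # merge repeated (id, bigram) pairs, if any
    return matrix, row_ids, vocab


matrix, row_ids, vocab = build_count_matrix(docs)

# Write the matrix in MatrixMarket format (path is an arbitrary choice).
out_dir = tempfile.mkdtemp()
mtx_path = os.path.join(out_dir, "bigrams.mtx")
mmwrite(mtx_path, matrix)

# MatrixMarket stores no labels, so save row and column names separately.
with open(os.path.join(out_dir, "rows.txt"), "w", encoding="utf-8") as f:
    f.write("\n".join(row_ids))
with open(os.path.join(out_dir, "cols.txt"), "w", encoding="utf-8") as f:
    f.write("\n".join(vocab))
```

Only the nonzero entries are written, so the file stays small compared with a dense CSV. On the R side, readMM should give you a sparse matrix in triplet form. As far as I know you can convert it to column-compressed form with as(m, "CsparseMatrix") and then set the dimnames from the two label files, but check this against your version of the Matrix package.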
