JNIUS & TIKA - error trying to parseToString
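I am trying to run tika-app through jnius but ran into a problem (macOS Sierra, Java 1.8 JDK, Python 2.7 and Python 3.6).
Everything works up to the parseToString call (the output of tika.detect is fine). When that call runs, a popup window seems to appear
(I also tested the same call from a Java program, and it works there). When run through jnius, however, execution stops at that point with no output and no error.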
import os
os.environ['CLASSPATH'] = "tika-app-1.14.jar"
from jnius import autoclass
from jnius import JavaException
# Import the Java classes
Tika = autoclass('org.apache.tika.Tika')
Metadata = autoclass('org.apache.tika.metadata.Metadata')
File = autoclass('java.io.File')
# Catch the exception and continue if parsing fails
try:
    file = File('./source/test.doc')
    tika = Tika()
    meta = Metadata()
    detectText = tika.detect(file)
    print(detectText)  # works; the output is: application/msword
    contentText = tika.parseToString(file)  # execution stops here; nothing after this line runs
    print(contentText)
except (JavaException, UnicodeDecodeError) as e:
    print("ERROR: %s" % (e))
I finally found the solution. A JVM option was missing:
the one that tells the tika-app jar to run in headless mode.
# The config has to be set before importing jnius
import jnius_config
jnius_config.add_options('-Djava.awt.headless=true')
import os
os.environ['CLASSPATH'] = "tika-app-1.14.jar"
from jnius import autoclass
## Import the Java classes we are going to need
Tika = autoclass('org.apache.tika.Tika')
Metadata = autoclass('org.apache.tika.metadata.Metadata')
FileInputStream = autoclass('java.io.FileInputStream')
tika = Tika()
meta = Metadata()
text = tika.parseToString(FileInputStream("./source/test.doc"), meta)
print(text)
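Since the fix depends on import order, a small guard can make the requirement explicit. The following is only a sketch: `jnius_already_loaded` and `configure_headless` are hypothetical helper names, not part of pyjnius, and the sketch assumes that checking `sys.modules` for `jnius` is a good enough signal that the JVM may already have been started.

```python
import sys

# JVM flag from the answer above; it runs the JVM in headless mode.
HEADLESS_OPTION = "-Djava.awt.headless=true"


def jnius_already_loaded(modules=None):
    """Return True if `jnius` has already been imported (hypothetical helper).

    `modules` defaults to sys.modules; passing a dict keeps this testable.
    """
    modules = sys.modules if modules is None else modules
    return "jnius" in modules


def configure_headless(modules=None):
    """Add the headless flag via jnius_config, refusing to run too late."""
    if jnius_already_loaded(modules):
        raise RuntimeError(
            "jnius is already imported; JVM options must be set before that"
        )
    import jnius_config  # imported lazily so the check above runs first
    if HEADLESS_OPTION not in jnius_config.get_options():
        jnius_config.add_options(HEADLESS_OPTION)


# Intended use (needs pyjnius and a JVM, so it is left commented out):
# configure_headless()
# from jnius import autoclass
```

Calling `configure_headless()` at the very top of the script, before any `from jnius import ...` line, keeps the ordering mistake from going unnoticed.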