Python can't import tika
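I'm having trouble importing tika in a Python file. I've spent a lot of time searching on Google and couldn't find anything. Here is the IPython command, import tika, and the resulting stack trace.
It occurred to me that there might be a problem with one of the modules tika depends on, such as requests or urllib3. However, when I try to install them with pip, it says the requirement is already satisfied. I've also double-checked the PYTHONHOME directory, and I'm 99% sure it is correct.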
$ ipython
Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.
IPython 4.2.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
WARNING: Readline services not available or not loaded.
WARNING: Proper color support under MS Windows requires the pyreadline library.
You can find it at:
http://ipython.org/pyreadline.html
Defaulting color scheme to 'NoColor'
In [1]: import tika
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
C:\cygwin64\lib\python3.6\site-packages\requests\packages\__init__.py in <module>()
26 try:
---> 27 from . import urllib3
28 except ImportError:
ImportError: cannot import name 'urllib3'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-1-9f3de0ba3e70> in <module>()
----> 1 import tika
C:\cygwin64\lib\python3.6\site-packages\tika\tika.py in <module>()
18
19 try:
---> 20 __import__('pkg_resources').declare_namespace(__name__)
21 except ImportError:
22 from pkgutil import extend_path
C:\cygwin64\lib\python3.6\site-packages\pkg_resources\__init__.py in declare_namespace(packageName)
2161 # Ensure all the parent's path items are reflected in the child,
2162 # if they apply
-> 2163 _handle_ns(packageName, path_item)
2164
2165 finally:
C:\cygwin64\lib\python3.6\site-packages\pkg_resources\__init__.py in _handle_ns(packageName, path_item)
2096 path = module.__path__
2097 path.append(subpath)
-> 2098 loader.load_module(packageName)
2099 _rebuild_mod_path(path, packageName, module)
2100 return subpath
C:\cygwin64\lib\python3.6\site-packages\tika\tika.py in <module>()
89 open = codecs.open
90
---> 91 import requests
92 import socket
93 import tempfile
C:\cygwin64\lib\python3.6\site-packages\requests\__init__.py in <module>()
50 # Attempt to enable urllib3's SNI support, if possible
51 try:
---> 52 from .packages.urllib3.contrib import pyopenssl
53 pyopenssl.inject_into_urllib3()
54 except ImportError:
C:\cygwin64\lib\python3.6\site-packages\requests\packages\__init__.py in <module>()
27 from . import urllib3
28 except ImportError:
---> 29 import urllib3
30 sys.modules['%s.urllib3' % __name__] = urllib3
31
C:\cygwin64\lib\python3.6\site-packages\urllib3\__init__.py in <module>()
6 import warnings
7
----> 8 from .connectionpool import (
9 HTTPConnectionPool,
10 HTTPSConnectionPool,
C:\cygwin64\lib\python3.6\site-packages\urllib3\connectionpool.py in <module>()
9
10
---> 11 from .exceptions import (
12 ClosedPoolError,
13 ProtocolError,
C:\cygwin64\lib\python3.6\site-packages\urllib3\exceptions.py in <module>()
1 from __future__ import absolute_import
----> 2 from .packages.six.moves.http_client import (
3 IncompleteRead as httplib_IncompleteRead
4 )
5 # Base Exceptions
ValueError: source code string cannot contain null bytes
Are you sure you have that module installed?
If not, just go to the command prompt and type pip install tika
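If pip reports the packages as already installed, the last line of the traceback (ValueError: source code string cannot contain null bytes) may instead point to a damaged file on disk rather than a missing module. Below is a minimal sketch for checking that. The function name find_files_with_null_bytes is made up for illustration, and whether a corrupted file is really the cause here is an assumption.

```python
import os


def find_files_with_null_bytes(root):
    """Return sorted paths of .py files under root that contain a NUL byte."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if b"\x00" in handle.read():
                        hits.append(path)
            except OSError:
                # Unreadable files are skipped rather than reported as corrupt.
                continue
    return sorted(hits)
```

You could call it on the directory shown in the traceback, for example find_files_with_null_bytes(r"C:\cygwin64\lib\python3.6\site-packages"), and reinstall whichever package owns any file it lists.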
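In case anyone else is looking at this, here is how I eventually solved the problem.
I had mistakenly assumed that the python-tika module was a fully packaged, ready-to-run version of tika. In fact, you need to download the Java tika server from Apache, and it must be running whenever you use python-tika (you can easily just run the server on localhost).
The python-tika module then lets you make requests to that server from Python code. I probably should have known this, but for some reason I didn't find it in the documentation.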
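To make the client/server split concrete, here is a rough sketch of the kind of HTTP request involved, written with the standard library only. It is not python-tika's actual code. It assumes a tika server is listening on http://localhost:9998 (the usual default port) and uses the server's /tika endpoint with a PUT request; the helper names build_tika_text_request and extract_text are hypothetical.

```python
import urllib.request


def build_tika_text_request(data, server="http://localhost:9998"):
    """Build a PUT request asking the tika server for plain-text output."""
    return urllib.request.Request(
        server.rstrip("/") + "/tika",
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(data, server="http://localhost:9998", timeout=30):
    """Send document bytes to a running tika server and return the text."""
    request = build_tika_text_request(data, server)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the server running, something like extract_text(open("report.pdf", "rb").read()) should return the document's text (report.pdf is a placeholder file name). If the server is not running, urlopen raises a URLError, which matches the point above that the server has to be up first.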
I think this is a good way to install tika on Windows:
First, install Java SE from this link:
https://www.oracle.com/ca-en/java/technologies/javase-downloads.html
Second, install a specific version of tika:
pip install tika==1.23
Third, download the tika server and tika app files from Apache and run them:
https://archive.apache.org/dist/tika/tika-server-1.23.jar
https://archive.apache.org/dist/tika/tika-app-1.23.jar
That should do it; you should now be able to run tika in your application.
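The third step can also be scripted. The following is only a sketch under a few assumptions: java is on PATH, the server jar has already been downloaded to the path you pass in, and the server listens on localhost port 9998 (the usual default). The helper names are made up for illustration.

```python
import shutil
import socket
import subprocess
import time


def build_server_command(jar_path, java="java"):
    """Return the command line that launches the downloaded server jar."""
    return [java, "-jar", jar_path]


def wait_for_port(host, port, timeout=30.0):
    """Poll until something accepts TCP connections on host:port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False


def start_tika_server(jar_path, host="localhost", port=9998):
    """Start the server as a child process and wait until it is reachable."""
    if shutil.which("java") is None:
        raise RuntimeError("java was not found on PATH; install Java first")
    process = subprocess.Popen(build_server_command(jar_path))
    if not wait_for_port(host, port):
        process.terminate()
        raise RuntimeError("the tika server did not start listening in time")
    return process
```

Usage would look like process = start_tika_server("tika-server-1.23.jar"), followed by process.terminate() once your application is done with it.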