Is there a way to access search results from PyPI without scraping?

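I'm developing a GUI for managing Python virtual environments. So far I've been able to implement most of the features I want to offer users, but I'm stuck on one thing: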

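When creating a virtual environment, users can optionally install packages into it. For that, I'd like to let them run a search like pip search <package> from the command line, with the results shown in a table view. The problem is that I'm not sure what the best way to get the search results is.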

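I tried using the built-in subprocess module to run pip search and populate the table with the results. That works, but it's tricky because I first have to format the output (package name, version, description) to fit the table.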

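Because that requires a lot of nested loops and string manipulation, I looked for a way to access the data directly, ideally without scraping the Python Package Index.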
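For context, the formatting step described above might look roughly like the sketch below. It assumes each pip search result starts with a line shaped like "name (version)  - summary", that wrapped summaries continue on indented lines, and that indented INSTALLED:/LATEST: lines can be skipped. The names ROW and parse_pip_search are made up for illustration, and the real output may differ between pip versions.

```python
import re

# Assumed shape of a result line: "name (version)   - summary"
ROW = re.compile(r"^(?P<name>\S+) \((?P<version>[^)]*)\)\s+- (?P<summary>.*)$")


def parse_pip_search(text):
    """Turn captured `pip search` output into (name, version, summary) rows."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            # A new result starts here.
            rows.append([match["name"], match["version"], match["summary"].strip()])
        elif rows and line.startswith(" ") and line.strip():
            stripped = line.strip()
            # Skip the status lines pip prints for already-installed packages.
            if stripped.startswith(("INSTALLED:", "LATEST:")):
                continue
            # Anything else indented is treated as a wrapped summary line.
            rows[-1][2] = f"{rows[-1][2]} {stripped}".strip()
    return [tuple(row) for row in rows]
```

The text argument would be the stdout captured from subprocess; each returned tuple maps onto one table row.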


Edit:

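I considered using PyPI's XML-RPC API, but there is a note saying that it will be deprecated in the future and is not recommended, so I'm not sure whether I should use it in my project.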

The XML-RPC API will be deprecated in the future. Use of this API is not recommended, and existing consumers of the API should migrate to the RSS and/or JSON APIs instead.

Users of this API are strongly encouraged to subscribe to the pypi-announce mailing list for notices as we begin the process of removing XML-RPC from PyPI.

Is there another way to get search results from PyPI, or is the XML-RPC API currently the only way?

You can use the search method of the PyPI XML-RPC API, which is what pip uses for pip search.
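A minimal sketch of such a call with the standard library's xmlrpc.client follows. The endpoint URL https://pypi.org/pypi, the query spec (a dict of fields combined with "or") and the name/version/summary keys in each hit are assumptions based on how pip search is understood to use the API. The function name search_pypi and its client parameter are hypothetical. Since the API is slated for deprecation, the call may fail with a Fault.

```python
import xmlrpc.client

PYPI_XMLRPC_URL = "https://pypi.org/pypi"  # assumed XML-RPC endpoint


def search_pypi(query, client=None):
    """Return (name, version, summary) tuples for packages matching query.

    `client` is injectable so the function can be exercised without network
    access; by default a ServerProxy for the assumed endpoint is created.
    """
    if client is None:
        client = xmlrpc.client.ServerProxy(PYPI_XMLRPC_URL)
    # Match the query against either the package name or its summary.
    hits = client.search({"name": query, "summary": query}, "or")
    return [
        (hit["name"], hit["version"], hit.get("summary") or "")
        for hit in hits
    ]
```

The returned tuples are already in table order (name, version, description), so no string parsing is needed.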

There is no corresponding JSON API for search, but there are plans to add one.
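The JSON API can still fill in a table row once the exact project name is known, although it is not a search. The sketch below assumes the per-project URL pattern https://pypi.org/pypi/<project>/json and the info.name, info.version and info.summary fields; fetch_json and package_row are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_json(url):
    """Download a URL and decode the body as JSON."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def package_row(name, fetch=fetch_json):
    """Return (name, version, summary) for one exactly named project.

    This is a lookup, not a search: an unknown project name makes the
    default fetcher raise urllib.error.HTTPError.
    """
    data = fetch(f"https://pypi.org/pypi/{quote(name)}/json")
    info = data["info"]
    return info["name"], info["version"], info["summary"] or ""
```

Passing a different fetch callable makes it possible to add caching or to test the function offline.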

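The XML-RPC search endpoint was temporarily disabled in mid-December 2020 because of the ever-increasing request load on it. As of now, there is no way at all to search for packages on pypi.org through an API.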

To emphasize Antti Haapala's answer:

  • As of Jan. 2020, https://status.python.org provides some meaningful information about the ongoing pip error. See the quote below.
  • Anyone running a pip search command (tested with pip 20.3.3) is likely to get the following error message: xmlrpc.client.Fault: <Fault -32500: "RuntimeError: PyPI's XMLRPC API has been temporarily disabled due to unmanageable load and will be deprecated in the near future. See https://status.python.org/ for more information.">
  • pip install your_package still works
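A GUI that calls the XML-RPC API directly would see the same xmlrpc.client.Fault, so it may be worth catching it and showing a message instead of crashing. The sketch below is one way to do that; run_search and the search callable it wraps are hypothetical names, not part of pip or PyPI.

```python
import xmlrpc.client


def run_search(search, query):
    """Call search(query) and return (rows, error_text).

    On an XML-RPC Fault, such as the -32500 fault quoted above, an empty
    result list and a readable message are returned instead of raising.
    """
    try:
        return search(query), None
    except xmlrpc.client.Fault as fault:
        message = f"PyPI search unavailable ({fault.faultCode}): {fault.faultString}"
        return [], message
```

The error text can then go into a status bar or dialog while the table simply stays empty.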

An important part of the error message above is (emphasis mine):

PyPI's XMLRPC API [...] will be deprecated in the near future


Quoting https://status.python.org: (I don't intend to update this post any further; this is just to provide some context.)

Update - We are continuing to monitor for any further issues.

Dec 28, 13:51 UTC

Update - The XMLRPC Search endpoint remains disabled due to ongoing request volume. As of this update, there has been no reduction in inbound traffic to the endpoint from abusive IPs and we are unable to re-enable the endpoint, as it would immediately cause PyPI service to degrade again.

Dec 28, 13:50 UTC

Update - The XMLRPC Search endpoint is still disabled due to ongoing request volume. As of this update, there has been no reduction in inbound traffic to the endpoint from abusive IPs and we are unable to re-enable the endpoint, as it would immediately cause PyPI service to degrade again. We are working with the abuse contact at the owner of the IPs and trying to make contact with the maintainers of whatever tool is flooding us via other channels.

Dec 23, 14:54 UTC

Update - The XMLRPC Search endpoint is still disabled due to ongoing request volume. As of this update, there has been no reduction in inbound traffic to the endpoint from abusive IPs and we are unable to re-enable the endpoint, as it would immediately cause PyPI service to degrade again. We are working with the abuse contact at the owner of the IPs and trying to make contact with the maintainers of whatever tool is flooding us via other channels.

Dec 15, 20:59 UTC

Monitoring - With the temporary disabling of XMLRPC we are hoping that the mass consumer that is causing us trouble will make contact. Due to the huge swath of IPs we were unable to make a more targeted block without risking more severe disruption, and were not able to receive a response from their abuse contact or direct outreach in an actionable time frame.

Dec 14, 17:46 UTC

Update - Due to the overwhelming surges of inbound XMLRPC search requests (and growing) we will be temporarily disabling the XMLRPC search endpoint until further notice.

Dec 14, 17:30 UTC

Identified - We've identified that the issue is with excess volume to our XLMRPC search endpoint that powers pip search among other tools. We are working to try to identify patterns and prohibit abusive clients to retain service health.

Dec 14, 15:09 UTC

Investigating - PyPI's search backends are experiencing an outage causing the backends to timeout and fail, leading to degradation of service for the web app. Uploads and installs are currently unaffected but logged in actions and search via the web app and API access via XMLRPC are currently experiencing partial outages.

Dec 14, 09:41 UTC