Output of feedparser in python unexpectedly truncated

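I am writing code to parse information from an RSS feed and store the parsed information for later research. In this case, I want to store details such as [first name, last name, type of insider trade, price, ...].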

My problem

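The string I am trying to parse has more than 1800 characters, but the string my parser outputs has only about 330 characters and ends with "...". My question is: how do I adjust the maximum length of strings parsed by feedparser in Python? Why is my output truncated instead of being listed in full, whether I print it or store it?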

What I tried

import feedparser
InsiderFeed = feedparser.parse("https://www.finanztreff.de/rdf_news_category-insidertrades.rss")
summary = InsiderFeed.entries[0].summary # just to give one example here instead of looping through full list
print(summary)

Output

It looks like:

Notification and public disclosure of transactions by persons discharging managerial responsibilities and persons closely associated with them 23.06.2020 / 18:37 The issuer is solely responsible for the content of this announcement. *1. Details of the person discharging managerial responsibilities / person closely associated*...

But it should look like this (ignore the line breaks; feedparser seems to strip \n by default):

Notification and public disclosure of transactions by persons discharging
managerial responsibilities and persons closely associated with them

23.06.2020 / 18:37
The issuer is solely responsible for the content of this announcement.

*1. Details of the person discharging managerial responsibilities / person
closely associated*

a) Name

+++
|Name and legal form:|Krüper + Krüper Hochallee 60 GbR|
+++
*2. Reason for the notification*

a) Position / status

+++
|Person closely associated with: |
+++
|Title: |Dr. |
+++
|First name: |Manfred |
+++
|Last name(s): |Krüper |
+++
|Position: |Member of the administrative or supervisory |
| |body |
+++
b) Initial notification

*3. Details of the issuer, emission allowance market participant, auction
platform, auctioneer or auction monitor*

a) Name

++
|ENCAVIS AG|
++
b) LEI

++
|391200ECRGNL09Y2KJ67|
++
*4. Details of the transaction(s)*

a) Description of the financial instrument, type of instrument,
identification code

+++
|Type:|Share |
+++
|ISIN:|DE0006095003|
+++
b) Nature of the transaction

++
|Erwerb von neuen Aktien durch die Ausübung von 10.363 |
|Bezugsrechten im Rahmen der Aktiendividende der Encavis AG. |
|10.363 : 60,25 = 172 neue Aktien. |
++
c) Price(s) and volume(s)

+++
|Price(s) |Volume(s) |
+++
|10.845 EUR|1865.34 EUR|
+++
d) Aggregated information

+++
|Price |Aggregated volume|
+++
|10.8450 EUR|1865.3400 EUR |
+++
e) Date of the transaction

++
|2020-06-19; UTC+2|
++
f) Place of the transaction

++
|Outside a trading venue|
++

23.06.2020 The DGAP Distribution Services include Regulatory Announcements,
Financial/Corporate News and Press Releases.
Archive at www.dgap.de
Language: English
Company: ENCAVIS AG
Große Elbstraße 59
22767 Hamburg
Germany
Internet: www.encavis.com

End of News DGAP News Service

60877 23.06.2020



(END) Dow Jones Newswires

June 23, 2020 12:38 ET ( 16:38 GMT) 

Using this example: http://www.finanztreff.de/news/dgap-dd-encavis-ag-english/20845911

I also tried to find a suitable flag/keyword in the feedparser documentation for setting the maximum length of the parsed string, but without success.

I look forward to your help; any assistance is much appreciated!
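For the goal stated at the start (storing first name, last name, transaction type, and so on), the "|label:|value|" rows in the expected output above could be collected into a dict once the full text is available. This is a minimal sketch, assuming the text keeps the line breaks and pipe layout shown above; parse_label_rows is a hypothetical helper, and continuation rows such as "| |body |" are appended to the previous value. Header-style tables like "Price(s) / Volume(s)" have no "label:" cells and would need separate handling.

```python
import re

# Rows such as "|First name: |Manfred |" (first cell ends with a colon).
ROW = re.compile(r"^\|\s*([^|]*?):\s*\|\s*([^|]*?)\s*\|\s*$")
# Continuation rows such as "| |body |" (empty first cell).
CONT = re.compile(r"^\|\s*\|\s*([^|]*?)\s*\|\s*$")


def parse_label_rows(text):
    """Collect '|label:|value|' rows into a dict (hypothetical helper)."""
    fields = {}
    last_label = None
    for line in text.splitlines():
        line = line.strip()
        m = ROW.match(line)
        if m:
            last_label = m.group(1).strip()
            fields[last_label] = m.group(2)
            continue
        c = CONT.match(line)
        if c and last_label is not None:
            # Join a wrapped value onto the previous label's value
            fields[last_label] = (fields[last_label] + " " + c.group(1)).strip()
            continue
        # Any other line (e.g. a "+++" border) ends a multi-line value
        last_label = None
    return fields


# Usage on a fragment of the expected output above
sample = """\
+++
|Title: |Dr. |
+++
|First name: |Manfred |
+++
|Last name(s): |Krüper |
+++
|Position: |Member of the administrative or supervisory |
| |body |
+++
|ISIN:|DE0006095003|
+++
"""
fields = parse_label_rows(sample)
print(fields)
```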

Got it

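It turns out there is no problem with feedparser. The website's RSS feed only contains a truncated version of the content shown on the website, as the feed excerpt below clearly shows: the item's <description> already ends with "...".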

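So it seems I have to fetch the link that comes with each RSS entry to get the full content, and parse that to extract the information I need.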

<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet href='https://www.w3.org/2000/08/w3c-synd/style.css' type='text/css'?>
<rss version='2.0' xmlns:media="https://search.yahoo.com/mrss/">
  <channel>
    <title>finanztreff.de / INSIDERTRADES </title>
    <description>News und Berichte aus der Finanzwelt von finanztreff.de</description>
    <language>de-de</language>
    <copyright>Copyright 2020 vwd netsolutions GmbH</copyright>
    <lastBuildDate>2020-06-25T12:26:48+02:00</lastBuildDate>
    <link>https://www.finanztreff.de</link>
    <image>
      <title>finanztreff.de-Logo</title>
      <url>https://www.finanztreff.de/images/finanztreff.jpg</url>
      <link>https://www.finanztreff.de</link>
    </image>
  <item>
    <title>EANS-DD: Oberbank AG / Mitteilung über Eigengeschäfte von Führungskräften gemäß Artikel 19 MAR - ANHANG</title>
    <link>http://www.finanztreff.de/news/eans-dd-oberbank-ag+mitteilung-ueber-eigengeschaefte-von-fuehrungskraeften-gemaess-artikel/20867797</link>
    <description>Directors&apos; Dealings-Mitteilung gemäß Artikel 19 MAR übermittelt durch euro adhoc mit dem Ziel einer europaweiten Verbreitung. Für den Inhalt ist der Emittent verantwortlich. Personenbezogene Daten: Mitteilungspflichtige Person: Name: Elfriede Höchtel (Natürliche Person) Grund der Mitteilungspflicht: Grund: Meldepflichtige...</description>
    <enclosure url='https:' length='' type='image/' />
    <media:keywords></media:keywords>
    <media:thumbnail url='https:' width='' height='' />
    <media:thumbnail url='https:' width='' height='' />
    <pubDate>2020-06-25T11:59:05+02:00</pubDate>
    <guid>20867797</guid>
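One way to confirm that the cut-off happens in the feed and not in feedparser is to inspect the raw XML without feedparser at all. The sketch below uses only the standard library's xml.etree.ElementTree on a trimmed, inline copy of the item above, so it runs offline; the helper name truncated_items is hypothetical. Checking the live feed would mean passing the downloaded feed bytes instead of FEED_XML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <item> from the feed excerpt above; the media:* elements
# are left out so that no namespace declaration is needed.
FEED_XML = """<rss version='2.0'>
  <channel>
    <title>finanztreff.de / INSIDERTRADES </title>
    <item>
      <title>EANS-DD: Oberbank AG / Mitteilung über Eigengeschäfte von Führungskräften gemäß Artikel 19 MAR - ANHANG</title>
      <link>http://www.finanztreff.de/news/eans-dd-oberbank-ag+mitteilung-ueber-eigengeschaefte-von-fuehrungskraeften-gemaess-artikel/20867797</link>
      <description>Directors&apos; Dealings-Mitteilung gemäß Artikel 19 MAR übermittelt durch euro adhoc mit dem Ziel einer europaweiten Verbreitung. Für den Inhalt ist der Emittent verantwortlich. Personenbezogene Daten: Mitteilungspflichtige Person: Name: Elfriede Höchtel (Natürliche Person) Grund der Mitteilungspflicht: Grund: Meldepflichtige...</description>
      <guid>20867797</guid>
    </item>
  </channel>
</rss>"""


def truncated_items(xml_bytes):
    """Return (title, link) for every <item> whose raw <description> ends with '...'."""
    root = ET.fromstring(xml_bytes)
    result = []
    for item in root.iter("item"):
        description = (item.findtext("description") or "").strip()
        if description.endswith("..."):
            result.append((item.findtext("title"), item.findtext("link")))
    return result


truncated = truncated_items(FEED_XML.encode("utf-8"))
for title, link in truncated:
    print("Truncated in the feed itself:", title)
    print("  full text at:", link)
```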

Edit 1: Solution

The code below retrieves the full text from the website for an entry whose content is truncated in the RSS feed.

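import requests
from bs4 import BeautifulSoup

# Fetch the full article page that the RSS entry links to
response = requests.get("http://www.finanztreff.de/news/dgap-dd-encavis-ag-english/20845911", timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
# On this page, the complete announcement text is inside the element with id="newsSource56"
print(soup.find(id="newsSource56").text)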
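If bs4 is not available, or to process every entry rather than one hard-coded URL, the same idea can be sketched with the standard library's html.parser. This is a minimal sketch under stated assumptions: TextById, text_by_id and full_texts are hypothetical names; fetch is any callable that returns a page's HTML (for example lambda url: requests.get(url, timeout=10).text); and newsSource56 is the id from the single example page, which may differ on other articles.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the element with a given id attribute."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0  # > 0 while inside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html_text, element_id):
    parser = TextById(element_id)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


def full_texts(entries, fetch, element_id="newsSource56"):
    """Yield (link, full text) for each feed entry by fetching its article page."""
    for entry in entries:
        link = entry["link"]
        yield link, text_by_id(fetch(link), element_id)


# Offline usage with a stand-in page instead of a real HTTP request
fake_page = ('<html><body><div id="nav">Menu</div>'
             '<div id="newsSource56"><p>Name: <b>Manfred</b></p><br>'
             '<p>ISIN: DE0006095003</p></div></body></html>')
entries = [{"link": "http://example.invalid/news/1"}]
results = list(full_texts(entries, fetch=lambda url: fake_page))
print(results)
```

With feedparser, entries would be InsiderFeed.entries, whose items also support entry["link"]. Unlike BeautifulSoup, this parser assumes the markup inside the target element is properly nested; an unclosed tag such as a bare <p> would make it keep collecting text past the element's end.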