Output of feedparser in Python unexpectedly truncated
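I am writing code to parse information from an RSS feed and store it for later research. In the case at hand, I want to store fields such as [first name, last name, type of insider trade, price, ...].
My problem
The string I am trying to parse has more than 1,800 characters, but my parser outputs a string of only about 330 characters that ends in "...".
My questions: how can I adjust the maximum length of the strings that feedparser parses in Python? Or why is the output truncated instead of being printed or stored in full?
What I tried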
import feedparser
InsiderFeed = feedparser.parse("https://www.finanztreff.de/rdf_news_category-insidertrades.rss")
summary = InsiderFeed.entries[0].summary # just to give one example here instead of looping through full list
print(summary)
Output
It looks like this:
Notification and public disclosure of transactions by persons discharging managerial responsibilities and persons closely associated with them 23.06.2020 / 18:37 The issuer is solely responsible for the content of this announcement. *1. Details of the person discharging managerial responsibilities / person closely associated*...
But it should look like this (ignore the line breaks; feedparser seems to strip \n by default):
Notification and public disclosure of transactions by persons discharging
managerial responsibilities and persons closely associated with them
23.06.2020 / 18:37
The issuer is solely responsible for the content of this announcement.
*1. Details of the person discharging managerial responsibilities / person
closely associated*
a) Name
+++
|Name and legal form:|Krüper + Krüper Hochallee 60 GbR|
+++
*2. Reason for the notification*
a) Position / status
+++
|Person closely associated with: |
+++
|Title: |Dr. |
+++
|First name: |Manfred |
+++
|Last name(s): |Krüper |
+++
|Position: |Member of the administrative or supervisory |
| |body |
+++
b) Initial notification
*3. Details of the issuer, emission allowance market participant, auction
platform, auctioneer or auction monitor*
a) Name
++
|ENCAVIS AG|
++
b) LEI
++
|391200ECRGNL09Y2KJ67|
++
*4. Details of the transaction(s)*
a) Description of the financial instrument, type of instrument,
identification code
+++
|Type:|Share |
+++
|ISIN:|DE0006095003|
+++
b) Nature of the transaction
++
|Erwerb von neuen Aktien durch die Ausübung von 10.363 |
|Bezugsrechten im Rahmen der Aktiendividende der Encavis AG. |
|10.363 : 60,25 = 172 neue Aktien. |
++
c) Price(s) and volume(s)
+++
|Price(s) |Volume(s) |
+++
|10.845 EUR|1865.34 EUR|
+++
d) Aggregated information
+++
|Price |Aggregated volume|
+++
|10.8450 EUR|1865.3400 EUR |
+++
e) Date of the transaction
++
|2020-06-19; UTC+2|
++
f) Place of the transaction
++
|Outside a trading venue|
++
23.06.2020 The DGAP Distribution Services include Regulatory Announcements,
Financial/Corporate News and Press Releases.
Archive at www.dgap.de
Language: English
Company: ENCAVIS AG
Große Elbstraße 59
22767 Hamburg
Germany
Internet: www.encavis.com
End of News DGAP News Service
60877 23.06.2020
(END) Dow Jones Newswires
June 23, 2020 12:38 ET ( 16:38 GMT)
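This example is taken from http://www.finanztreff.de/news/dgap-dd-encavis-ag-english/20845911
I also searched the feedparser documentation for a flag or keyword that sets the maximum length of the parsed strings, but without success.
Looking forward to your help, much appreciated!
Got it
It turns out there is nothing wrong with feedparser. The site's RSS feed itself contains only a truncated version of what is displayed on the website, as the excerpt of the feed below shows for each headline.
It looks like I have to follow the link that comes with each RSS item to get the full content, and then parse that page for the information I need.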
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet href='https://www.w3.org/2000/08/w3c-synd/style.css' type='text/css'?>
<rss version='2.0' xmlns:media="https://search.yahoo.com/mrss/">
<channel>
<title>finanztreff.de / INSIDERTRADES </title>
<description>News und Berichte aus der Finanzwelt von finanztreff.de</description>
<language>de-de</language>
<copyright>Copyright 2020 vwd netsolutions GmbH</copyright>
<lastBuildDate>2020-06-25T12:26:48+02:00</lastBuildDate>
<link>https://www.finanztreff.de</link>
<image>
<title>finanztreff.de-Logo</title>
<url>https://www.finanztreff.de/images/finanztreff.jpg</url>
<link>https://www.finanztreff.de</link>
</image>
<item>
<title>EANS-DD: Oberbank AG / Mitteilung über Eigengeschäfte von Führungskräften gemäß Artikel 19 MAR - ANHANG</title>
<link>http://www.finanztreff.de/news/eans-dd-oberbank-ag+mitteilung-ueber-eigengeschaefte-von-fuehrungskraeften-gemaess-artikel/20867797</link>
<description>Directors' Dealings-Mitteilung gemäß Artikel 19 MAR übermittelt durch euro adhoc mit dem Ziel einer europaweiten Verbreitung. Für den Inhalt ist der Emittent verantwortlich. Personenbezogene Daten: Mitteilungspflichtige Person: Name: Elfriede Höchtel (Natürliche Person) Grund der Mitteilungspflicht: Grund: Meldepflichtige...</description>
<enclosure url='https:' length='' type='image/' />
<media:keywords></media:keywords>
<media:thumbnail url='https:' width='' height='' />
<media:thumbnail url='https:' width='' height='' />
<pubDate>2020-06-25T11:59:05+02:00</pubDate>
<guid>20867797</guid>
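One way to double-check that the truncation comes from the publisher rather than from feedparser is to read the raw <description> text with the standard library only. The following is a minimal sketch under assumptions: SAMPLE_RSS is a shortened, hand-made sample modeled on the excerpt above (a real check would download the feed XML), and truncated_items is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Assumption: a shortened sample modeled on the feed excerpt above.
# A real check would load the downloaded feed XML instead.
SAMPLE_RSS = """<rss version='2.0'>
<channel>
<title>finanztreff.de / INSIDERTRADES</title>
<item>
<title>EANS-DD: Oberbank AG</title>
<link>http://www.finanztreff.de/news/eans-dd-oberbank-ag+mitteilung-ueber-eigengeschaefte-von-fuehrungskraeften-gemaess-artikel/20867797</link>
<description>Directors' Dealings-Mitteilung gemäß Artikel 19 MAR übermittelt durch euro adhoc. Grund: Meldepflichtige...</description>
</item>
</channel>
</rss>"""


def truncated_items(rss_xml):
    """Return (link, description) pairs whose raw description ends with '...'."""
    root = ET.fromstring(rss_xml)
    result = []
    for item in root.iter("item"):
        description = (item.findtext("description") or "").strip()
        if description.endswith("..."):
            result.append((item.findtext("link"), description))
    return result


# Print the raw description length and the link to follow for the full text.
for link, description in truncated_items(SAMPLE_RSS):
    print(len(description), link)
```

If the raw description already ends with "...", no parser setting can recover the missing text; the link of each item is the only route to the full announcement.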
Edit 1: Solution
The code below fetches the full text from the website, which the RSS feed carries only in truncated form.
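import requests
from bs4 import BeautifulSoup
# Download the article page that the RSS item links to
html_text = requests.get("http://www.finanztreff.de/news/dgap-dd-encavis-ag-english/20845911", timeout=10).text
soup = BeautifulSoup(html_text, 'html.parser')
# On this page, the element with id "newsSource56" holds the announcement text
print(soup.find(id="newsSource56").text)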
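Once the full text is available, the labelled rows of the announcement (for example "|First name: |Manfred |") could be turned into fields for storage. The following is only a sketch, assuming the text keeps the pipe-delimited "label: | value" rows shown in the expected output of the question; extract_fields and ROW_PATTERN are hypothetical names. Rows without a label, such as the price table or the continuation line of "Position", are skipped and would need separate handling.

```python
import re

# Matches rows such as "|First name: |Manfred |": a label ending in a colon,
# followed by exactly one value cell, all delimited by pipes.
ROW_PATTERN = re.compile(r"^\|\s*([^|:]+?)\s*:\s*\|\s*([^|]*?)\s*\|\s*$")


def extract_fields(text):
    """Collect 'label: value' rows from the announcement text into a dict."""
    fields = {}
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match:
            label, value = match.groups()
            fields[label] = value
    return fields


# Sample rows copied from the expected output in the question.
sample = "\n".join([
    "|Title: |Dr. |",
    "|First name: |Manfred |",
    "|Last name(s): |Krüper |",
    "|ISIN:|DE0006095003|",
    "|10.845 EUR|1865.34 EUR|",  # no label, so this row is ignored
])

fields = extract_fields(sample)
print(fields.get("First name"), fields.get("Last name(s)"), fields.get("ISIN"))
```

A dictionary keeps the sketch short; repeated labels in one announcement would overwrite each other, so a list of (label, value) pairs may be the safer choice for real data.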