How do I scrape ONLY <div class="quoteText"> from a website using Python?

I'm trying to import Albert Einstein quotes from this website:

https://www.goodreads.com/author/quotes/9810.Albert_Einstein

I only want the quote text. Not his name, or anything else. Just the text, to help build a Markov chain chatbot.

Here is my code:

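from lxml import html
import requests

page = requests.get('https://www.goodreads.com/author/quotes/9810.Albert_Einstein')
tree = html.fromstring(page.content)

# text() returns every text node directly inside each div, so the
# ' \u2015' separators and the blank '\n' nodes come back as well
quotes = tree.xpath('//div[@class="quoteText"]/text()')

print(quotes)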

Here is the output:

[u"\n \u201cTwo things are infinite: the universe and human stupidity; and I'm not sure about the universe.\u201d\n ", u' \u2015\n ', '\n', u'\n \u201cThere are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cI am enough of an artist to draw freely upon my imagination. Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world.\u201d\n ', u' \u2015\n ', '\n', u"\n \u201cIf you can't explain it to a six year old, you don't understand it yourself.\u201d\n ", u' \u2015\n ', '\n', u'\n \u201cIf you want your children to be intelligent, read them fairy tales. If you want them to be more intelligent, read them more fairy tales.\u201d\n ', u' \u2015\n ', '\n', u'\n
\u201cLogic will get you from A to Z; imagination will get you everywhere.\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cLife is like riding a bicycle. To keep your balance, you must keep moving.\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cAnyone who has never made a mistake has never tried anything new.\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cI speak to everyone in the same way, whether he is the garbage man or the president of the university.\u201d\n ', u' \u2015\n ', '\n', u"\n \u201cWhen you are courting a nice girl an hour seems like a second. When you sit on a red-hot cinder a second seems like an hour. That's relativity.\u201d\n ", u' \u2015\n ', '\n', u'\n \u201cNever memorize something that you can look up.\u201d\n ', u' \u2015\n
', '\n', u'\n \u201cA clever person solves a problem. A wise person avoids it.\u201d\n ', u' \u2015\n ', '\n', u'\n
\u201cScience without religion is lame, religion without science is blind.\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cReality is merely an illusion, albeit a very persistent one.\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cIf we knew what it was we were doing, it would not be called research, would it?\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cI have no special talents. I am only passionately curious.\u201d\n ', u' \u2015\n ', '\n', u'\n
\u201cIf a cluttered desk is a sign of a cluttered mind, of what, then, is an empty desk a sign?\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cThe important thing is to not stop questioning. Curiosity has its own reason for existence. One cannot help but be in awe when he contemplates the mysteries of eternity, of life, of the marvelous structure of reality. It is enough if one tries merely to comprehend a little of this mystery each day.', u'\xe2\x80\x94"Old Man\'s Advice to Youth: \'Never Lose a Holy Curiosity.\'" ', u' (2 May 1955) p. 64\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cTry not to become a man of success. Rather become a man of value.\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cAny fool can know. The point is to understand.\u201d\n ', u' \u2015\n ', '\n', u'\n
\u201cA human being is a part of the whole called by us universe, a part limited in time and space. He experiences himself, his thoughts and feeling as something separated from the rest, a kind of optical delusion of his consciousness. This delusion is a kind of prison for us, restricting us to our personal desires and to affection for a few persons nearest to us. Our task must be to free ourselves from this prison by widening our circle of compassion to embrace all living creatures and the whole of nature in its beauty.\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cOnce you can accept the universe as matter expanding into nothing that is something, wearing stripes with plaid comes easy.\u201d\n ', u' \u2015\n ', '\n', u'\n
\u201cIf I were not a physicist, I would probably be a musician. I often think in music. I live my daydreams in music. I see my life in terms of music.\u201d\n ', u' \u2015\n ', '\n', u'\n
\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cI know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones.\u201d\n ', u' \u2015\n ', '\n', u'\n
\u201cYou never fail until you stop trying.\u201d\n ', u' \u2015\n
', '\n', u'\n \u201cGreat spirits have always encountered violent opposition from mediocre minds.\u201d\n ', u' \u2015\n ', '\n', u'\n \u201cThe most beautiful experience we can have is the mysterious. It is the fundamental emotion that stands at the cradle of true art and true science.\u201d\n ', u' \u2015\n ', ',\n ', '\n \n\n \n', '\n\n\n', '\n\n', u'\n \u201cGravitation is not responsible for people falling in love.\u201d\n ', u' \u2015\n ', '\n', u"\n \u201cIt is not that I'm so smart. But I stay with the questions much longer.\u201d\n ", u' \u2015\n ', '\n']

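I feel there must be a better way to do this, because the result prints as a list and contains all this extra text, but I keep hitting a wall. Any help would be greatly appreciated!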

Thanks
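Staying with the lxml code from the question, one way to drop the extra entries is to post-process the list: keep only the text nodes that begin with an opening curly quote (U+201C) and strip the surrounding whitespace and quote marks. This is a Python 3 sketch; it assumes every quote's first text node starts with that curly quote, as the nodes in the output above do. A quote split across several text nodes (like the "The important thing is to not stop questioning" one) keeps only its first piece. Alternatively, the XPath `//div[@class="quoteText"]/text()[1]` selects just the first text node of each div. The function name `clean_quotes` is illustrative.

```python
# Curly quotation marks that wrap each quote in the scraped text nodes
OPEN_QUOTE = "\u201c"
CLOSE_QUOTE = "\u201d"


def clean_quotes(raw_nodes):
    """Keep only text nodes that start a quotation, minus whitespace and curly quotes.

    Assumption: each quote's text node begins with an opening curly quote;
    separator nodes such as ' \u2015\n ' and '\n' are therefore dropped.
    """
    quotes = []
    for node in raw_nodes:
        text = node.strip()
        if text.startswith(OPEN_QUOTE):
            # strip() with a character set removes the quote marks at either end
            quotes.append(text.strip(OPEN_QUOTE + CLOSE_QUOTE).strip())
    return quotes


if __name__ == "__main__":
    # A few nodes copied from the output shown in the question
    sample = [
        "\n \u201cLogic will get you from A to Z; imagination will get you everywhere.\u201d\n ",
        " \u2015\n ",
        "\n",
        "\n \u201cNever memorize something that you can look up.\u201d\n ",
    ]
    for quote in clean_quotes(sample):
        print(quote)
```

In the question's script this would be used as `quotes = clean_quotes(tree.xpath('//div[@class="quoteText"]/text()'))`, run under Python 3.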

A Python 2.x script using the BeautifulSoup module:

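from __future__ import print_function
from re import sub
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2 only
from urllib2 import urlopen

urlpage = urlopen("https://www.goodreads.com/author/quotes/9810.Albert_Einstein").read()
bswebpage = BeautifulSoup(urlpage)
# every <div class="quoteText"> on the page
results = bswebpage.findAll("div", {'class': "quoteText"})
for result in results:
    print("\nQuotes\n")
    # contents[0] is the first text node of the div (the quote itself);
    # the regex removes the &ldquo; entity, and &rdquo; together with the
    # character just before it, which is why the final period is missing below
    print(sub("&ldquo;|.&rdquo;", "", "".join(result.contents[0:1]).strip()))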
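A Python 3 version of the same idea, sketched with only the standard library (`html.parser` and `urllib.request`) in place of BeautifulSoup 3 and `urllib2`, neither of which exists on Python 3. Like the script above, it assumes each quote is the first text node inside `<div class="quoteText">`. Because `HTMLParser` decodes `&ldquo;` and `&rdquo;` itself, no regex is needed and the closing punctuation is kept. The names `QuoteTextParser`, `extract_quotes` and `fetch_quotes` are illustrative, and the User-Agent header is an assumption about what the site accepts.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

OPEN_QUOTE, CLOSE_QUOTE = "\u201c", "\u201d"


class QuoteTextParser(HTMLParser):
    """Collect the first text node inside every <div class="quoteText">."""

    def __init__(self):
        # convert_charrefs defaults to True, so &ldquo; arrives as \u201c
        super().__init__()
        self.quotes = []
        self._collecting = False
        self._buffer = []

    def _finish(self):
        # Store the buffered text (if any) and stop collecting
        if self._collecting:
            text = "".join(self._buffer).strip().strip(OPEN_QUOTE + CLOSE_QUOTE).strip()
            if text:
                self.quotes.append(text)
        self._collecting = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Any tag (<br>, <span>, ...) ends the first text node of the current div
        self._finish()
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "quoteText" in classes:
            self._collecting = True

    def handle_endtag(self, tag):
        self._finish()

    def handle_data(self, data):
        if self._collecting:
            self._buffer.append(data)


def extract_quotes(html_text):
    """Return the quote strings found in an HTML document."""
    parser = QuoteTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.quotes


def fetch_quotes(url):
    """Download a page and extract its quotes (assumes UTF-8 encoded HTML)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        html_text = response.read().decode("utf-8", errors="replace")
    return extract_quotes(html_text)
```

Calling `fetch_quotes("https://www.goodreads.com/author/quotes/9810.Albert_Einstein")` should return a plain list of quote strings for the first page only; the parser will need adjusting if Goodreads changes its markup.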

The output on my machine:

Quotes

Two things are infinite: the universe and human stupidity; and I'm not sure about the universe

Quotes

There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle

Quotes

I am enough of an artist to draw freely upon my imagination. Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world
..............................................
..............................................