Python comparing get_text item against a list item

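As my Python project moves along, I have hit another frustrating stage.

I have a code snippet that finds the last post dates on the forum and saves them in a temporary variable (which I want to use for checking each date) and in a public/global one for further use throughout the whole scope.

The approach I am trying is to grab all the last post dates from the forum and compare them against the dates already in the .csv file, to see whether any new posts have been made. If not, the data should not be scraped/mined.

That is exactly where I am stuck: I cannot compare the element I mined (get_text) against the items in the .csv list.

Any ideas would be appreciated. I have tried several approaches; the last one, which still does not work, is left below.

Code: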

#Preparing csv file to be read through to check if dates match
storedDates = open(os.path.expanduser("PostDates.csv"))
csv_storedDates = csv.reader(storedDates)
dateRow = list(csv_storedDates) #Storing all the dates as a "List" object
listLength = len(dateRow) #Grabbing the csv List length
startingDate = 0 #Variable for looping through each date for each post.

lPostDate = lPostDate2 = ""

#Looping through 6 times (As that's how many pages each forum has, and collecting Next Page Link,Each Thread Title, It's Link
#.. last post date (To know how recent it is) and assigning next page link to current url, and continuing loop.
while number < 6:
    for postDate in soup.find_all(title=re.compile("^Replies:")):
        tempData = ""
        tempData += (postDate.get_text("\n", strip=True)[0:10] + "\n")
        lPostDate += (postDate.get_text("\n", strip=True)[0:10] + "\n")
        if any(tempData in s for s in dateRow[startingDate]):
            print("Matched a date" + tempData + "to one from database" + dateRow[startingDate])
            startingDate +=1
        else :
            startingDate += 1
            print("Date " + tempData + "was not matched to anything" + str(dateRow[startingDate]))

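This is only part of the code, but it is the only part I am currently struggling with. Assume PostDates.csv already contains data. This is what the output looks like: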

Date 02-11-2017
was not matched to anything['02-11-2017']
Date 01-10-2017
was not matched to anything['01-10-2017']
Date 02-12-2017
was not matched to anything['02-12-2017']
Date 10-01-2016
was not matched to anything['10-01-2016']
Date 09-30-2016
was not matched to anything['09-30-2016']
Date 08-10-2016
was not matched to anything['08-10-2016']
Date 10-01-2015
was not matched to anything['10-01-2015']
Date 10-01-2015
was not matched to anything['10-01-2015']
Date 08-29-2015
was not matched to anything['08-29-2015']
Date 03-16-2015
was not matched to anything['03-16-2015']
Date 07-16-2014
was not matched to anything['07-16-2014']
Date 07-13-2014
was not matched to anything['07-13-2014']
Date 02-11-2014
was not matched to anything['02-11-2014']
Date 07-02-2013
was not matched to anything['07-02-2013']
Date 06-28-2013
was not matched to anything['06-28-2013']
Date 04-22-2013
was not matched to anything['04-22-2013']
Date 05-28-2012
was not matched to anything['05-28-2012']
Date 05-25-2012
was not matched to anything['05-25-2012']
Date 05-09-2012
was not matched to anything['05-09-2012']
Date 06-10-2010
was not matched to anything['06-10-2010']
Date 01-18-2010
was not matched to anything['01-18-2010']
Date 01-18-2010
was not matched to anything['01-18-2010']
Date 12-29-2009
was not matched to anything['12-29-2009']
Date 06-08-2009
was not matched to anything['06-08-2009']
Date 02-02-2009
was not matched to anything['02-02-2009']
Date 11-24-2008
was not matched to anything['11-24-2008']
Date 09-02-2008
was not matched to anything['09-02-2008']
Date 08-07-2008
was not matched to anything['08-07-2008']
Date 06-05-2008
was not matched to anything['06-05-2008']
Date 05-22-2008
was not matched to anything['05-22-2008']
Date 04-21-2008
was not matched to anything['04-21-2008']
Date 03-29-2008
was not matched to anything['03-29-2008']
1
Date 02-11-2017
was not matched to anything['02-11-2017']
Date 01-10-2017
was not matched to anything['01-10-2017']
Date 11-07-2007
was not matched to anything['11-07-2007']
Date 11-07-2007
was not matched to anything['11-07-2007']
Date 09-19-2007
was not matched to anything['09-19-2007']
Date 09-01-2007
was not matched to anything['09-01-2007']
Date 08-31-2007
was not matched to anything['08-31-2007']
Date 08-31-2007
was not matched to anything['08-31-2007']
Date 08-30-2007
was not matched to anything['08-30-2007']
Date 08-24-2007
was not matched to anything['08-24-2007']
Date 08-19-2007
was not matched to anything['08-19-2007']
Date 08-08-2007
was not matched to anything['08-08-2007']
Date 08-03-2007
was not matched to anything['08-03-2007']
Date 07-29-2007
was not matched to anything['07-29-2007']
Date 07-18-2007
was not matched to anything['07-18-2007']
Date 06-26-2007
was not matched to anything['06-26-2007']
Date 06-26-2007
was not matched to anything['06-26-2007']
Date 01-12-2007
was not matched to anything['01-12-2007']
Date 12-05-2006
was not matched to anything['12-05-2006']
Date 11-16-2006
was not matched to anything['11-16-2006']
Date 11-05-2006
was not matched to anything['11-05-2006']
Date 11-05-2006
was not matched to anything['11-05-2006']
Date 11-03-2006
was not matched to anything['11-03-2006']
Date 09-19-2006
was not matched to anything['09-19-2006']
Date 09-19-2006
was not matched to anything['09-19-2006']
Date 09-19-2006
was not matched to anything['09-19-2006']
Date 09-12-2006
was not matched to anything['09-12-2006']
Date 08-17-2006
was not matched to anything['08-17-2006']
Date 08-07-2006
was not matched to anything['08-07-2006']
Date 08-02-2006
was not matched to anything['08-02-2006']
Date 07-16-2006
was not matched to anything['07-16-2006']
Date 07-07-2006
was not matched to anything['07-07-2006']

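I stopped pasting the output after page 2, because it runs to 6 pages and that is a lot of data.

This is what the data looked like when it was scraped earlier and stored in the .csv file (the dateRow variable):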

Date,
02-11-2017
01-10-2017
02-12-2017
10-01-2016
09-30-2016
08-10-2016
10-01-2015
10-01-2015
08-29-2015
03-16-2015
07-16-2014
07-13-2014
02-11-2014
07-02-2013
06-28-2013
04-22-2013
05-28-2012
05-25-2012
05-09-2012
06-10-2010
01-18-2010
01-18-2010
12-29-2009
06-08-2009
02-02-2009
11-24-2008
09-02-2008
08-07-2008
06-05-2008
05-22-2008
04-21-2008
03-29-2008
02-11-2017
01-10-2017
11-07-2007
11-07-2007
09-19-2007
09-01-2007
08-31-2007
08-31-2007

Any advice on how to handle this so that the matching dates are found would be much appreciated, thanks!

To summarize our conversation in the comments: you wrote any(tempData in s for s in dateRow[startingDate]), and my first suspicion was a type mismatch. To check that, start from how any() is defined:

any(iterable) Return True if any element of the iterable is true. If the iterable is empty, return False. Equivalent to:

def any(iterable):
    for element in iterable:
        if element:
            return True
    return False

Split apart, your code gives the following:

>>> # Parentheses make it syntactically correct
>>> iterable = (tempData in s for s in dateRow[startingDate])
>>> any(iterable)
False

But is it really iterable? Let's see:

>>> type(iterable)
<class 'generator'>
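
The type name by itself does not say whether any() can consume the object. One way to check is sketched below. The values temp_data and row are made-up stand-ins for tempData and dateRow[startingDate], modelled on the question's output, so treat this as an illustration rather than the asker's actual data.

```python
from collections.abc import Iterable

# Hypothetical sample values standing in for tempData and dateRow[startingDate]
temp_data = "02-11-2017\n"   # the question appends "\n" when building tempData
row = ["02-11-2017"]

gen = (temp_data in s for s in row)   # generator expression
lst = [temp_data in s for s in row]   # list comprehension

# Both kinds of object implement the iterable protocol
gen_is_iterable = isinstance(gen, Iterable)
lst_is_iterable = isinstance(lst, Iterable)

# any() consumes either one and sees the same sequence of booleans
gen_result = any(gen)
lst_result = any(lst)

print(gen_is_iterable, lst_is_iterable)   # True True
print(gen_result, lst_result)             # False False
```

With these sample values both forms are iterable and any() returns the same result for each, which points at the values being compared rather than at the container type.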

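It is: a generator is iterable, so any() consumes it directly. Compare it with the list version:

>>> type([tempData in s for s in dateRow[startingDate]])
<class 'list'>

That is iterable too, and the same check passes on the generator:

>>> hasattr((tempData in s for s in dateRow[startingDate]), '__iter__')
True

So adding brackets around the generator changes nothing for any(). The False comes from the data: tempData is built with a trailing "\n", and '02-11-2017\n' is never a substring of '02-11-2017'. Remove the "\n" (or strip it) before comparing.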
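
A minimal sketch of a comparison that strips the scraped text before looking it up. It rests on several assumptions: the stored dates sit one per line under a Date header, as shown in the question; load_stored_dates and is_known_date are hypothetical helper names; io.StringIO stands in for the real PostDates.csv so the example runs on its own; and set membership replaces the position-by-position comparison used in the question.

```python
import csv
import io


def load_stored_dates(csv_file):
    """Return the set of dates in the first column, skipping the header row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the "Date," header line
    return {row[0].strip() for row in reader if row and row[0].strip()}


def is_known_date(scraped_text, stored_dates):
    """Strip whitespace, keep the first 10 characters, and test membership."""
    return scraped_text.strip()[:10] in stored_dates


# io.StringIO stands in for open("PostDates.csv", newline="")
sample_csv = io.StringIO("Date,\n02-11-2017\n01-10-2017\n02-12-2017\n")
stored = load_stored_dates(sample_csv)

# The trailing "\n" mimics how tempData is built in the question
for text in ["02-11-2017\n", "03-01-2017\n"]:
    status = "already stored" if is_known_date(text, stored) else "new"
    print(text.strip(), "->", status)
```

In the real script, is_known_date would receive postDate.get_text("\n", strip=True). Also note that the else branch in the question increments startingDate before printing, so the row shown in the output is the one after the row that was actually compared. If the file starts with the Date header, the first scraped date is therefore compared against the header row.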