Don't save to MongoDB if the data already exists (Python)

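Every 10 minutes I use Python to pull data from a news site into MongoDB. Sometimes the same data gets recorded more than once, because nothing checks for duplicates. If incoming data matches what is already stored, it should not be saved to MongoDB.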

import feedparser
import datetime
import threading
import pymongo
from pymongo import MongoClient


client = pymongo.MongoClient("mongodb+srv://xxx:xxx@cluster0-ogqg8.mongodb.net/rss_feed?retryWrites=true&w=majority")
db = client["rss_feed"]
collection=db["rss_collection"]


def mynet():
    NewsFeedMynet = feedparser.parse("http://www.mynet.com/haber/rss/sondakika")
    entry = NewsFeedMynet.entries[1]

    post_mynet = {"baslik": entry.title, "kisa_bilgi": entry.summary, "link": entry.link, "zaman": entry.published, "saglayici": "Mynet"}
    collection.insert_one(post_mynet)

You can solve this in two ways:

One is an upsert: if a record with the same title exists, update its body. If the record does not exist, insert it.

post_mynet = {"baslik":entry.title,"kisa_bilgi":entry.summary,"link":entry.link,"zaman":entry.published,"saglayici":"Mynet"}
# first parameter is "what to match", aka the query
# second parameter is the update; update_one requires an operator such as $set
# third is the flag to upsert
collection.update_one({"baslik": entry.title}, {"$set": post_mynet}, upsert=True)

The other is to check whether the record exists, and skip the insert if it does:

post_mynet = {"baslik":entry.title,"kisa_bilgi":entry.summary,"link":entry.link,"zaman":entry.published,"saglayici":"Mynet"}
if not collection.find_one({"baslik": entry.title}):
    collection.insert_one(post_mynet)
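
If the goal is to leave an existing record untouched rather than overwrite it, the two approaches above can possibly be combined into a single call. The following is only a sketch: `save_if_new` is a hypothetical helper name, and matching on `link` instead of `baslik` is an assumption (a link is probably a more stable identifier than a title). It relies on the `$setOnInsert` operator, which writes fields only when the upsert actually inserts a new document.

```python
def save_if_new(collection, post, key="link"):
    """Insert `post` only if no stored document has the same `key` value.

    `collection` is expected to be a pymongo Collection.
    `key` is an assumed choice of the field that identifies a duplicate.
    Returns True when a new document was inserted, False otherwise.
    """
    result = collection.update_one(
        {key: post[key]},        # what to match
        {"$setOnInsert": post},  # applied only when the upsert inserts
        upsert=True,
    )
    # upserted_id is None when an existing document matched the filter
    return result.upserted_id is not None
```

Inside `mynet()` this would replace the `insert_one` call, for example `save_if_new(collection, post_mynet)`. Because the check and the insert happen in one operation, there is less room for a second run to slip in between them than with a separate `find_one` followed by `insert_one`.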