MongoDB: delete data using regex
I was able to remove the unwanted data with pandas using the following:
import re
repl = {r'<[^>]+>': '',
r'\r\n': ' ',
r'Share to facebook|Share to twitter|Share to linkedin|Share on Facebook|Share on Twitter|Share on Messenger|Share on Whatsapp': ''}
articles['content'] = articles['content'].replace(repl, regex=True)
How can I do the same on the actual database in Atlas?
My data structure is:
_id:
title:
url:
description:
author:
publishedAt:
content:
source_id:
urlToImage:
summarization:
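MongoDB does not have a built-in operator for regex replacement (yet).
You can loop over the matching documents with a regex find in the programming language of your choice, apply the replacements there, and write the result back.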
from pymongo import MongoClient
import re

m_client = MongoClient("<MONGODB-URI-STRING>")
db = m_client["<DB-NAME>"]
collection = db["<COLLECTION-NAME>"]

replace_dictionary = {
    r'<[^>]+>': '',
    r'\r\n': ' ',
    r'Share to facebook|Share to twitter|Share to linkedin|Share on Facebook|Share on Twitter|Share on Messenger|Share on Whatsapp': ''
}

count = 0
for it in collection.find({
    # Match documents whose content hits any of the regex patterns
    "$or": [{"content": re.compile(x, re.IGNORECASE)} for x in replace_dictionary.keys()]
}, {
    # Project only the field to be replaced for faster execution of the script
    "content": 1
}):
    # Iterate over the patterns and replacements and apply each with `re.sub`
    for k, v in replace_dictionary.items():
        it["content"] = re.sub(
            pattern=k,
            repl=v,
            string=it["content"],
            # Same flag as the query, so every matched document is actually changed
            flags=re.IGNORECASE,
        )
    # Write the cleaned string back to the document
    collection.update_one({
        "_id": it["_id"]
    }, {
        "$set": {
            "content": it['content']
        }
    })
    # Count to keep track of progress
    count += 1
    print("\r", count, end='')
print("DONE!!!")
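Before running this against a live collection, it may help to dry-run the replacement logic on plain dictionaries. The following is a minimal sketch, not part of the original answer: `clean_content` and `build_updates` are hypothetical helper names, the sample documents are made up, and matching is case-sensitive by default as in the pandas version (pass `flags=re.IGNORECASE` to match the way the query above does). It also skips documents whose content would not change, which avoids unnecessary writes.

```python
import re

# Same patterns as above; dicts keep insertion order,
# so the substitutions run top to bottom.
REPLACEMENTS = {
    r'<[^>]+>': '',
    r'\r\n': ' ',
    r'Share to facebook|Share to twitter|Share to linkedin|Share on Facebook|Share on Twitter|Share on Messenger|Share on Whatsapp': '',
}


def clean_content(text, replacements=REPLACEMENTS, flags=0):
    """Apply every regex replacement to `text`, in order (hypothetical helper)."""
    for pattern, repl in replacements.items():
        text = re.sub(pattern, repl, text, flags=flags)
    return text


def build_updates(docs, replacements=REPLACEMENTS, flags=0):
    """Return (filter, update) pairs only for documents whose content changes."""
    updates = []
    for doc in docs:
        original = doc.get("content")
        if not isinstance(original, str):
            continue  # skip documents with missing or non-string content
        cleaned = clean_content(original, replacements, flags)
        if cleaned != original:
            updates.append(({"_id": doc["_id"]}, {"$set": {"content": cleaned}}))
    return updates


# Made-up sample documents standing in for the result of collection.find(...)
sample_docs = [
    {"_id": 1, "content": "<p>Hello</p>\r\nShare to facebook"},
    {"_id": 2, "content": "Nothing to clean"},
]
updates = build_updates(sample_docs)
print(updates)  # [({'_id': 1}, {'$set': {'content': 'Hello '}})]
```

With pymongo, each pair can then be wrapped as `UpdateOne(filter, update)` and sent in a single `collection.bulk_write(...)` call instead of one `update_one` round trip per document.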