How to convert a string built up of a list of Mongo ObjectIds into a Python list of just ids
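I have a dataframe with a column that contains the string representation of a list of ObjectIds, i.e.:
"[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"
I want to convert it from a string literal into a Python list containing just the ids, like:
['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']
Both json.loads and ast.literal_eval fail because the string contains ObjectId(...) calls, which are neither valid JSON nor Python literals.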
Here is a regex for this: https://regex101.com/r/m5rW2q/1
You can click on the code generator there, which produces, for example:
import re
regex = r"ObjectId\('(\w+)'\)"
test_str = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"
# Iterate over every ObjectId('...') occurrence in the string
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    # Group 1 is the id captured between the quotes
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
Output:
Match 1 was found at 1-37: ObjectId('5d28938629fe749c7c12b6e3')
Group 1 found at 11-35: 5d28938629fe749c7c12b6e3
Match 2 was found at 39-75: ObjectId('5caf4522a30528e3458b4579')
Group 1 found at 49-73: 5caf4522a30528e3458b4579
For example, to collect just the ids into a list:
import re
regex = r"ObjectId\('(\w+)'\)"
test_str = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"
matches = re.finditer(regex, test_str, re.MULTILINE)
[i.groups()[0] for i in matches]
Output:
['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']
Everything about regular expressions in Python is documented here: https://docs.python.org/3/library/re.html
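The regex approach above can be wrapped in a small helper so it is easy to reuse per cell. This is only a sketch: the name extract_object_ids is hypothetical, and the stricter pattern assumes the ids are standard 24-character hexadecimal ObjectId strings (the original \w+ pattern would work as well).

```python
import re

# Assumption: every id is a 24-character hex string wrapped as ObjectId('...').
OBJECT_ID_PATTERN = re.compile(r"ObjectId\('([0-9a-fA-F]{24})'\)")

def extract_object_ids(text):
    """Return the ids found in a string such as "[ObjectId('...'), ObjectId('...')]"."""
    # With a single capture group, findall returns the captured ids directly.
    return OBJECT_ID_PATTERN.findall(text)

test_str = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"
ids = extract_object_ids(test_str)
print(ids)
```

For a dataframe column, something like my_df["my_column"].apply(extract_object_ids) should then yield one list per row, assuming every cell in that column is a string.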
Well, you can use replace:
a = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"
a.replace('ObjectId(', '').replace(")","")
#Output:
"['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']"
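Note that the result of replace is still a string, not a list. Once the ObjectId( and ) wrappers are gone, though, the remaining text is a plain Python list literal, so ast.literal_eval should be able to parse it. A minimal sketch, assuming the ids themselves never contain parentheses (the helper name ids_via_replace is made up for illustration):

```python
import ast

def ids_via_replace(text):
    """Strip the ObjectId(...) wrappers, then parse what is left as a list literal."""
    # After the two replaces the text looks like "['id1', 'id2']".
    stripped = text.replace("ObjectId(", "").replace(")", "")
    return ast.literal_eval(stripped)

a = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"
id_list = ids_via_replace(a)
print(id_list)
```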
Locate the rows, split on ', and select items 1 and 3 of the resulting list (shown here for the row labelled 0):
my_df.loc[my_df["my_column"].str.contains("ObjectId"),"my_column"].str.split("'")[0][1:4:2]
This gives exactly the two-element list:
['5d28938629fe749c7c12b6e3', '5caf4522a30528e3458b4579']
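To see why the slice works, it may help to look at the split on a single string. Splitting on ' puts the ids at the odd positions of the result, so [1:4:2] picks positions 1 and 3. The sketch below uses [1::2] instead, which should handle any number of ids; it assumes single quotes appear only around the ids.

```python
text = "[ObjectId('5d28938629fe749c7c12b6e3'), ObjectId('5caf4522a30528e3458b4579')]"

# Splitting on the quote character alternates between wrapper text and ids:
# ["[ObjectId(", id1, "), ObjectId(", id2, ")]"]
parts = text.split("'")

# Every odd-indexed piece is an id, however many ids the string holds.
ids_from_split = parts[1::2]
print(ids_from_split)
```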