How to remove text between two delimiters in Python
I am trying to remove all the text between the [] brackets that follow the key "segmentation". See the file excerpt below for context.
"annotations": [
{
"id": 1,
"image_id": 1,
"segmentation": [
[
621.63,
1085.67,
621.63,
1344.71,
841.66,
1344.71,
841.66,
1085.67
]
],
"iscrowd": 0,
"bbox": [
621.63,
1085.67,
220.02999999999997,
259.03999999999996
],
"area": 56996,
"category_id": 1124044
},
{
"id": 2,
"image_id": 1,
"segmentation": [
[
887.62,
1355.7,
887.62,
1615.54,
1114.64,
1615.54,
1114.64,
1355.7
]
],
"iscrowd": 0,
"bbox": [
887.62,
1355.7,
227.0200000000001,
259.8399999999999
],
"area": 58988,
"category_id": 1124044
},
{
"id": 3,
"image_id": 1,
"segmentation": [
[
1157.61,
1411.84,
1157.61,
1661.63,
1404.89,
1661.63,
1404.89,
1411.84
]
],
"iscrowd": 0,
"bbox": [
1157.61,
1411.84,
247.2800000000002,
249.7900000000002
],
"area": 61768,
"category_id": 1124044
},
........... and so on.....
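Ultimately I just want to delete everything inside the square brackets that follow each occurrence of "segmentation". In other words, the output should look like this (for the first instance):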
"annotations": [
{
"id": 1,
"image_id": 1,
"segmentation": [],
"iscrowd": 0,
"bbox": [
621.63,
1085.67,
220.02999999999997,
259.03999999999996
],
"area": 56996,
"category_id": 1124044
},
I tried the code below, but have had no luck so far. Am I going wrong somewhere because of the newlines?
import re
f = open('samplfile.json')
text = f.read()
f.close()
clean = re.sub('"segmentation":(.*)\]', '', text)
print(clean)
f = open('cleanedfile.json', 'w')
f.write(clean)
f.close()
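I appreciate that my exact placement of the [s in the clean line may not be quite right, but at the moment this code is not removing anything.
Python has a built-in json module for parsing and modifying JSON. Regular expressions can be brittle here and cause more headaches than they are worth.
You can do the following: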
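import json

with open('samplfile.json') as input_file, open('output.json', 'w') as output_file:
    # Parse the whole file into Python dicts and lists
    data = json.load(input_file)
    # Replace every annotation's "segmentation" value with an empty list
    for annotation in data['annotations']:
        annotation['segmentation'] = []
    # Write the modified structure back out as indented JSON
    json.dump(data, output_file, indent=4)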
output.json then contains:
{
"annotations": [
{
"id": 1,
"image_id": 1,
"segmentation": [],
"iscrowd": 0,
"bbox": [
621.63,
1085.67,
220.02999999999997,
259.03999999999996
],
"area": 56996,
"category_id": 1124044
},
{
"id": 2,
"image_id": 1,
"segmentation": [],
"iscrowd": 0,
"bbox": [
887.62,
1355.7,
227.0200000000001,
259.8399999999999
],
"area": 58988,
"category_id": 1124044
},
{
"id": 3,
"image_id": 1,
"segmentation": [],
"iscrowd": 0,
"bbox": [
1157.61,
1411.84,
247.2800000000002,
249.7900000000002
],
"area": 61768,
"category_id": 1124044
}
]
}
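The same idea can be tried on an in-memory string before touching any files. The following is only a minimal sketch: the helper name clear_segmentation and the shortened sample data are assumptions made for illustration and are not part of the answer above.

```python
import json


def clear_segmentation(data):
    """Set every annotation's "segmentation" value to an empty list, in place."""
    # .get() keeps the helper from raising if "annotations" is missing
    for annotation in data.get("annotations", []):
        annotation["segmentation"] = []
    return data


# Shortened, hypothetical sample in the same shape as the file in the question
sample = '''
{
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "segmentation": [[621.63, 1085.67, 621.63, 1344.71]],
            "iscrowd": 0,
            "bbox": [621.63, 1085.67, 220.02999999999997, 259.03999999999996]
        }
    ]
}
'''

cleaned = clear_segmentation(json.loads(sample))
print(json.dumps(cleaned, indent=4))
```

Because the data is parsed rather than pattern-matched, the nesting depth and line breaks inside "segmentation" do not matter, and the other keys such as "bbox" are left untouched.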
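Your approach is mostly right, but in a Python regex . does not match \n by default. To fix that, pass flags=re.DOTALL as an argument to re.sub().
By the way, you may need to use \" instead of " in the regular expression.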
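A hedged sketch of what the regex route could look like with re.DOTALL. Two things go beyond the answer above and are assumptions of this sketch: the match is made non-greedy (.*?), because a greedy .* with DOTALL would run on to the last ] in the whole file, and the pattern assumes every "segmentation" value is a non-empty list of flat lists, as in the sample. Note also that inside a single-quoted Python string a plain " needs no escaping. The sample text is shortened for illustration.

```python
import json
import re

# Shortened, hypothetical sample laid out like the file in the question
text = '''{
"annotations": [
{
"id": 1,
"segmentation": [
[
621.63,
1085.67
]
],
"iscrowd": 0,
"bbox": [
621.63,
1085.67
]
}
]
}'''

# Assumption: each "segmentation" value is a non-empty list of flat lists,
# so it always ends with "]" + optional whitespace + "]".
# re.DOTALL lets "." match newlines; ".*?" stops at the first such ending.
pattern = re.compile(r'("segmentation":\s*)\[.*?\]\s*\]', flags=re.DOTALL)

# Keep the key (group 1) and replace the bracketed value with an empty list
clean = pattern.sub(r'\1[]', text)
print(clean)

# Sanity check: the result should still parse as JSON
parsed = json.loads(clean)
```

If a "segmentation" value were already empty, or nested more deeply, this pattern would match the wrong span, which is the kind of brittleness that makes the json module the safer choice.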