Deleting a text block out of a text file by shell script
I have a large LDIF file (OpenLDAP) and want to delete one attribute from every entry.
Example:
dn: uid=axx,ou=People,dc=myfirma,dc=net
uid: axx
jpegPhoto:: /9j/4AAQSkZJRgABAQEAAAAAAAD/4QBaRXhpZgAATU0AKgAAAAgAAYKYAAIAAAA3
AAAAGgAAAABUaW1vQ29tICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA
gICAgICAAAP/sABFEdWNreQABAAQAAABkAAD/4QRuaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS
4wLwA8P3hwYWNrZXQgYmVnaW49Iu+7vyIgaWQ9Ilc1TTBNcENlaGlIenJlU3pOVGN6a2M5ZCI/P
g0KPHg6eG1wbWV0YSB4bWxuczp4PSJhZG9iZTpuczptZXRhLyIgeDp4bXB0az0iQWRvYmUgWE1Q
IENvcmUgNS4zLWMwMTEgNjYuMTQ1NjYxLCAyMDEyLzAyLzA2LTE0OjU2OjI3ICAgICAgICAiPg0
KCTxyZGY6UkRGIHhtbG5zOnJkZj0iaHR0cDovL3d3dy53My5vcmcvMTk5OS8wMi8yMi1yZGYtc3
ludGF4LW5zIyI+DQoJCTxyZGY6RGVzY3JpcHRpb24gcmRmOmFib3V0PSIiIHhtbG5zOnhtcE1NP
pRRX7d7JHn7s/RD/gnRrMcnwyhXKqqxnGT719IrerjgjJ96KK/Hc+iljqiXc9Cn8KIL67xYytlf
umvzT/ao1dG+MVznjbEMnPP3jxmiivQ4VivrL9Cah5wb8T3e75tpyDzwa9n/Y7vIn+LenkbwuGG
OvbvRRX2mawSws/QKe6P0w0uPFjDt/uL79q0AcHPtRRX4vLc66a6ngn7fT4+E90N2GaFx+lfnjc
Q7oEUsrLtAI9TRRX3nCOtGXqYVirf2MUso28bcc+9foj+wJEYvhXb7ufkGPbmiit+Kb/VkvMVM+
g8YUVT18f8S+bt8h5/Ciivzpbmp+XH7VVqqfGvUNrN80a5HYda83u7aOe02q37wfhRRX7Flb/2a
n6I5am56h+xrcfZ/jHaxsNy7D26Gv0406INYQnH8A/lRRXyPFn8eL8jSDZ//9k=
postalCode: 12345
mail: xxxx@myfirma.com
sn: Hotzenplotz
c: DE
street: Waldstr 2-4
givenName: Maik
o: myfirma Soft- und Hardware GmbH
l: Entenhausen
telephoneNumber: +49 1234 5678 5678
facsimileTelephoneNumber: +49 1234 5678 5676
roomNumber: 03.034
co: Deutschland
employeeActive: TRUE
cn: Maik Hotzenplotz
description: 9400
st: NRW
objectClass: top
objectClass: person
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: extperson
displayName: Maik Hotzenplotz
structuralObjectClass: inetOrgPerson
entryUUID: c43b735c-2d85-1037-85c2-45672a55bec29
createTimestamp: 20170914104654Z
labeledURI: ldap:///ou=Organization,dc=myfirma,dc=net??sub?(&(objectClass=or
ganizationalRole)(roleOccupant=uid=axx,ou=People,dc=myfirma,dc=net))
userPassword:: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
employeeID: 3564
entryCSN: 20170914105425.815554Z#000000#000#000000
modifiersName: cn=Manager,dc=myfirma,dc=net
modifyTimestamp: 20170914105425Z
dn: uid=yyy,ou=People,dc=myfirma,dc=net
uid: yyy
jpegPhoto:: /9j/4AAQSkZJRgABAQEAAAAAAAD/4QBaRXhpZgAATU0AKgAAAAgAAYKYAAIAAAA3
AAAAGgAAAABUaW1vQ29tICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA
gICAgICAAAP/sABFEdWNreQABAAQAAABkAAD/4QRuaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS
4wLwA8P3hwYWNrZXQgYmVnaW49Iu+7vyIgaWQ9Ilc1TTBNcENlaGlIenJlU3pOVGN6a2M5ZCI/P
g0KPHg6eG1wbWV0YSB4bWxuczp4PSJhZG9iZTpuczptZXRhLyIgeDp4bXB0az0iQWRvYmUgWE1Q
IENvcmUgNS4zLWMwMTEgNjYuMTQ1NjYxLCAyMDEyLzAyLzA2LTE0OjU2OjI3ICAgICAgICAiPg0
KCTxyZGY6UkRGIHhtbG5zOnJkZj0iaHR0cDovL3d3dy53My5vcmcvMTk5OS8wMi8yMi1yZGYtc3
jV8JtK/wARKja+D2z0ooq+a71FOV2yK9lNurOqBi3GRw+B79P8c1jwSq7OzBvM5YgHaw9BRRW0G
eZitZJdDH1S2a/J87EXmEZ8sfdwOMiseG3Z7iK3kXdJI24cjhR0NFFdlPY8rEL925DvE+qzaZ4f
uhJ8y3jiD5h2zkmvnn4g+JW1LxpJZ28MO2P5iUGDjPrRRW8VdHwedVGotGlJq8M/iFbFdPhF3fT
IfOVjuZVAHSvbrLxNc2FlDCsLbY0AGIh6UUVMYo+c9tLlSP/Z
postalCode: 12345
mail: NDimpfelmoser@myfirma.com
sn: Dimpfelmoser
c: DE
street: Waldstr 2-4
givenName: Nadine
o: myfirma Soft- und Hardware GmbH
l: Entenhausen
telephoneNumber: +49 1234 5678 5672
facsimileTelephoneNumber: +49 1234 5678 5673
co: Deutschland
employeeActive: TRUE
cn: Nadine Dimpfelmoser
description: 9800
st: NRW
objectClass: top
objectClass: person
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: extperson
displayName: Nadine Dimpfelmoser
structuralObjectClass: inetOrgPerson
entryUUID: c4c664da-2d85-1437-85c3-4122a55bec29
createTimestamp: 20170914104654Z
labeledURI: ldap:///ou=Organization,dc=myfirma,dc=net??sub?(&(objectClass=or
ganizationalRole)(roleOccupant=uid=yyy,ou=People,dc=myfirma,dc=net))
userPassword:: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
employeeID: 3978
entryCSN: 20170914105425.922291Z#000000#000#000000
modifiersName: cn=Manager,dc=myfirma,dc=net
modifyTimestamp: 20170914105425Z
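The jpegPhoto attribute should be deleted.
I tried this expression in a regex tester:
jpegPhoto::[^:]*(?=\n\w*:)
and it works there,
but I can't get it to work with sed.
Or should I use another tool?
Thanks for any hints.
Bye
Peter Schütt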
sed does not support lookaheads. You can do it with a loop instead (GNU sed syntax):
sed '/^jpegPhoto::/{:a;N;/\n[^:]*:/!ba;s/.*\n//;}' file
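Here is the same loop split into separate `-e` expressions, one command each, with the steps commented. Putting the label `:a` and the closing `}` in their own expressions should also help with BSD/macOS sed, which ends a label name only at a newline. Like the one-liner, it assumes jpegPhoto is never the last attribute of an entry. If it were, the blank line between entries would be consumed. `strip_jpeg_sed` is a hypothetical name.

```shell
# Hypothetical helper: strip_jpeg_sed < in.ldif > out.ldif
# :a             loop label
# N              append the next input line to the pattern space
# /\n[^:]*:/!ba  no appended line contains a colon yet, so loop again
# s/.*\n//       drop everything up to the last newline, keeping only the
#                line that starts the next attribute
strip_jpeg_sed() {
  sed -e '/^jpegPhoto::/{' \
      -e ':a' \
      -e 'N' \
      -e '/\n[^:]*:/!ba' \
      -e 's/.*\n//' \
      -e '}'
}
```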
Yes, there is another powerful tool called awk:
awk '/jpegPhoto/{ f=1 }f && /^[^:]+: /{ f=0 }f{next}1' file
Output:
dn: uid=axx,ou=People,dc=myfirma,dc=net
uid: axx
postalCode: 12345
mail: xxxx@myfirma.com
sn: Hotzenplotz
c: DE
street: Waldstr 2-4
givenName: Maik
o: myfirma Soft- und Hardware GmbH
l: Entenhausen
telephoneNumber: +49 1234 5678 5678
facsimileTelephoneNumber: +49 1234 5678 5676
roomNumber: 03.034
co: Deutschland
employeeActive: TRUE
cn: Maik Hotzenplotz
description: 9400
st: NRW
objectClass: top
objectClass: person
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: extperson
displayName: Maik Hotzenplotz
structuralObjectClass: inetOrgPerson
entryUUID: c43b735c-2d85-1037-85c2-45672a55bec29
createTimestamp: 20170914104654Z
labeledURI: ldap:///ou=Organization,dc=myfirma,dc=net??sub?(&(objectClass=or
ganizationalRole)(roleOccupant=uid=axx,ou=People,dc=myfirma,dc=net))
userPassword:: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
employeeID: 3564
entryCSN: 20170914105425.815554Z#000000#000#000000
modifiersName: cn=Manager,dc=myfirma,dc=net
modifyTimestamp: 20170914105425Z
dn: uid=yyy,ou=People,dc=myfirma,dc=net
uid: yyy
postalCode: 12345
mail: NDimpfelmoser@myfirma.com
sn: Dimpfelmoser
c: DE
street: Waldstr 2-4
givenName: Nadine
o: myfirma Soft- und Hardware GmbH
l: Entenhausen
telephoneNumber: +49 1234 5678 5672
facsimileTelephoneNumber: +49 1234 5678 5673
co: Deutschland
employeeActive: TRUE
cn: Nadine Dimpfelmoser
description: 9800
st: NRW
objectClass: top
objectClass: person
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: extperson
displayName: Nadine Dimpfelmoser
structuralObjectClass: inetOrgPerson
entryUUID: c4c664da-2d85-1437-85c3-4122a55bec29
createTimestamp: 20170914104654Z
labeledURI: ldap:///ou=Organization,dc=myfirma,dc=net??sub?(&(objectClass=or
ganizationalRole)(roleOccupant=uid=yyy,ou=People,dc=myfirma,dc=net))
userPassword:: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
employeeID: 3978
entryCSN: 20170914105425.922291Z#000000#000#000000
modifiersName: cn=Manager,dc=myfirma,dc=net
modifyTimestamp: 20170914105425Z
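Here is the awk one-liner written out with comments. This is a sketch and not the answer's exact program. It takes the attribute name as a parameter (`strip_attr_awk` is a hypothetical name), and it differs in two places:
- The skipped block ends at any line that starts with an attribute name and a colon, so a following base64 value such as `userPassword::` is kept.
- The block also ends at an empty line, so the blank line separating LDIF entries survives when the photo is the last attribute of an entry.

Matching of the attribute name is case-sensitive.

```shell
# Hypothetical helper: drop one attribute (and its continuation lines)
# from LDIF read on stdin. Usage: strip_attr_awk jpegPhoto < in.ldif > out.ldif
strip_attr_awk() {
  awk -v attr="$1" '
    # line starts with "<attr>:" (also matches "<attr>::"): start skipping
    index($0, attr ":") == 1 { skip = 1; next }
    # a new "name:" line or an empty entry separator ends the skipped block
    skip && (/^[A-Za-z][A-Za-z0-9;-]*:/ || /^$/) { skip = 0 }
    # still inside the block: drop the line
    skip { next }
    # everything else passes through unchanged
    { print }
  '
}
```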
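If you know that postalCode is always the attribute that follows jpegPhoto, something like this works. It deletes every line from jpegPhoto up to, but not including, postalCode, and it does not print postalCode twice in entries without a photo:
sed '/^jpegPhoto/,/^postalCode/{/^postalCode/!d;}' infile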
I had given up on sed and awk,
but RomanPerekhrest's answer works well.
For the record, here is my own solution.
It works, but it is slow on large files:
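#!/bin/bash
# Reads LDIF on stdin and writes it to stdout without the jpegPhoto attribute.
# Needs bash, not plain sh, because of the ${line:0:9} substring syntax.
inJpeg=0
while IFS= read -r line || [ -n "$line" ]; do
    if [ "${line:0:9}" = "jpegPhoto" ]; then
        # first line of the photo attribute: drop it
        inJpeg=1
    elif [ "$inJpeg" -eq 1 ]; then
        # continuation lines start with a space: drop them as well
        if [ "${line:0:1}" != " " ]; then
            inJpeg=0
            printf '%s\n' "$line"
        fi
    else
        printf '%s\n' "$line"
    fi
done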
If you have a better solution, I would appreciate it.
Thanks for any hints.
Bye
Peter Schütt
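The shell loop is most likely slow because the shell interprets the loop body once for every input line. Below is a sketch of the same logic as a single awk process. Like the script, it assumes that continuation lines of a folded LDIF value start with a single space. That is the LDIF folding rule, and the sample above appears to have lost those spaces. `strip_jpeg_folded` is a hypothetical name.

```shell
# Hypothetical helper: same logic as the while-read loop, in one awk process.
# Assumes folded LDIF: continuation lines begin with a single space.
strip_jpeg_folded() {
  awk '
    /^jpegPhoto:/ { skip = 1; next }   # first line of the attribute: drop it
    skip && /^ /  { next }             # folded continuation line: drop it
    { skip = 0; print }                # any other line ends the block and is kept
  '
}
```

Usage: `strip_jpeg_folded < in.ldif > out.ldif`. Other folded attributes, such as the labeledURI continuation line, pass through unchanged because the skip flag is only set by jpegPhoto.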