How to remove consecutive repeating characters from every line?

Question

I have the following lines in a file:

Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;;Profilicollis;Profilicollis_altmani;
Acanthocephala;Eoacanthocephala;Neoechinorhynchida;Neoechinorhynchidae;;;;
Acanthocephala;;;;;;;
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;;Polymorphus;;

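and I want to remove the repeated semicolon characters from all lines so that the result looks like this (note: some of the lines above also have repeated semicolons in the middle):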

Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;Profilicollis;Profilicollis_altmani;
Acanthocephala;Eoacanthocephala;Neoechinorhynchida;Neoechinorhynchidae;
Acanthocephala;
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;Polymorphus;

I would appreciate it if someone could share a bash one-liner to accomplish this.

Answer 1

If you want to edit the file itself:

printf "%s\n" 'g/;;/s/;\{2,\}/;/g' w | ed -s foo.txt

If you want to pipe a modified copy of the file to something else and leave the original untouched:

sed 's/;\{2,\}/;/g' foo.txt | whatever

Both commands replace every run of 2 or more semicolons with a single semicolon.
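As a quick check of the sed form, here is a minimal sketch that runs it on two of the sample lines from the question. The input is held in a shell variable (the names `input` and `squeezed` are just for illustration), so no file is needed.

```shell
# Two sample lines from the question, kept in a variable instead of a file.
input='Acanthocephala;;;;;;;
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;;Polymorphus;;'

# \{2,\} is a POSIX basic-regex interval: "two or more of the preceding item".
squeezed=$(printf '%s\n' "$input" | sed 's/;\{2,\}/;/g')

printf '%s\n' "$squeezed"
```

Single semicolons are left alone because the interval only matches runs of at least two.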

Answer 2

perl -p -e 's/;+/;/g' myfile   # writes output to stdout

or

perl -p -i -e 's/;+/;/g' myfile   # does an in-place edit

Answer 3

This is easily solved with a substitution. I'm adding an awk solution that works through the FS/OFS variables instead:

awk -F';+' -v OFS=';' '$1=$1' file

or

awk -F';+' -v OFS=';' '($1=$1)||1' file
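The awk approach relies on the idiom of assigning a field to itself (`$1=$1`), which makes awk rebuild the record with OFS between fields. Used as a pattern, the assignment's value also decides whether the line is printed, which is why the `||1` variant exists. The sketch below illustrates the difference on a made-up line that starts with semicolons, so its first field is empty; the variable names are hypothetical.

```shell
# Hypothetical input whose first field is empty because the line starts with ';'.
line=';;Polymorphida;;Polymorphidae;'

# Pattern '$1=$1' yields the empty string here, which is false, so the line is dropped.
short=$(printf '%s\n' "$line" | awk -F';+' -v OFS=';' '$1=$1')

# '($1=$1)||1' is always true, so the rebuilt record is printed regardless.
safe=$(printf '%s\n' "$line" | awk -F';+' -v OFS=';' '($1=$1)||1')

printf 'short=[%s]\nsafe=[%s]\n' "$short" "$safe"
```

For the data in the question every line starts with a non-empty name, so both forms should behave the same; the second is the safer choice when that is not guaranteed.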

Answer 4

You can use tr with its "squeeze" option (-s):

tr -s ';' < infile
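Since tr only reads standard input and writes standard output, changing the file itself needs a temporary file. A minimal sketch, assuming `mktemp -d` is available and using a throwaway directory with a hypothetical `infile`:

```shell
# Work in a throwaway directory so no real file is touched.
tmpdir=$(mktemp -d)
printf '%s\n' 'Acanthocephala;Eoacanthocephala;;;;' > "$tmpdir/infile"

# -s (squeeze) collapses each run of ';' into a single ';'.
# Write to a temp file first: redirecting straight back to infile
# would truncate it before tr gets to read it.
tr -s ';' < "$tmpdir/infile" > "$tmpdir/infile.tmp" &&
    mv "$tmpdir/infile.tmp" "$tmpdir/infile"

result=$(cat "$tmpdir/infile")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```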

Answer 5

Here is a sed version:

sed 's/;\+/;/g' myfile  # Write output to stdout

or

sed -i 's/;\+/;/g' myfile  # Edit the file in-place
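Note that `\+` in a basic regular expression is a GNU extension, so the commands above may not work with other sed implementations. A hedged alternative, assuming your sed accepts `-E` (GNU and BSD sed both do), is to use extended regular expressions, or to fall back on the interval form from Answer 1:

```shell
line='Acanthocephala;;;;;;;'

# -E switches sed to extended regular expressions, where + needs no backslash.
ere=$(printf '%s\n' "$line" | sed -E 's/;+/;/g')

# The interval form from Answer 1 uses only POSIX basic-regex syntax.
bre=$(printf '%s\n' "$line" | sed 's/;\{2,\}/;/g')

printf '%s\n%s\n' "$ere" "$bre"
```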

bash

text-processing