How to rename duplicate lines with awk?
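I have a file with 1 million lines, some of which are duplicates. I want to rename the duplicate lines by appending "variant" plus a number.
The file looks like this: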
I am a test line
She is beautiful
need for speed
Nice day today
I am a test line
stack overflow is fun
I am a test line
stack overflow is fun
I have more sentences
I am a test line
She is beautiful
Speed for need
stack overflow is fun
Let's stop here
Expected result:
I am a test line
She is beautiful
need for speed
Nice day today
I am a test line variant 1
stack overflow is fun
I am a test line variant 2
stack overflow is fun variant 1
I have more sentences
I am a test line variant 3
She is beautiful variant 1
Speed for need variant 1
stack overflow is fun variant 2
Let's stop here
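#!/usr/bin/env python3
# Append " variant N" to the N-th repeat of each line in xy.txt.
d = {}  # line -> number of times it has been seen so far
with open("xy.txt") as f:
    for line in f:
        line = line.strip()  # drop the newline (and surrounding spaces)
        if not line:
            continue  # skip blank lines
        cnt = d.get(line, 0)
        if not cnt:
            print(line)  # first occurrence: print unchanged
        else:
            print(" ".join([line, "variant %d" % cnt]))
        d[line] = cnt + 1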
OK, this isn't awk, but it's easy to read. (Well, that's my opinion...)
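A minimal sketch of the same counting idea wrapped in a generator, assuming Python 3. This makes it easy to run as a filter over a million-line file or to test on a small list. The name `variant_lines` is hypothetical. It follows the script's rules: surrounding whitespace is stripped and blank lines are dropped.

```python
def variant_lines(lines):
    """Yield each line, suffixing repeats with ' variant N'.

    Same rules as the script above: surrounding whitespace is stripped
    and blank lines are dropped. The dict holds one counter per
    *distinct* line, so memory grows with the number of unique lines,
    not with the total number of lines.
    """
    seen = {}  # line -> how many times it has been emitted so far
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank line: skip, as in the script
        cnt = seen.get(line, 0)
        seen[line] = cnt + 1
        # First occurrence (cnt == 0) is yielded unchanged.
        yield line if cnt == 0 else f"{line} variant {cnt}"
```

To use it as a stdin-to-stdout filter, loop over `variant_lines(sys.stdin)` and `print` each result. The input is consumed lazily, one line at a time.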
$ awk 'cnt[$0]++{$0=$0" variant "cnt[$0]-1} 1' file
I am a test line
She is beautiful
need for speed
Nice day today
I am a test line variant 1
stack overflow is fun
I am a test line variant 2
stack overflow is fun variant 1
I have more sentences
I am a test line variant 3
She is beautiful variant 1
Speed for need
stack overflow is fun variant 2
Let's stop here
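The awk one-liner relies on post-increment. `cnt[$0]++` evaluates to the count *before* it is bumped, so the pattern is false (0) for the first occurrence of a line and true afterwards. By the time the action runs, the counter has already been incremented, which is why the suffix uses `cnt[$0]-1`. The trailing `1` is an always-true pattern whose default action prints the (possibly modified) record.

The sketch below mirrors that logic in Python; the name `awk_style_variants` is hypothetical. Unlike the Python script above, it compares lines exactly as read, so surrounding spaces matter and blank lines are counted too. As in the awk output, "Speed for need" gets no suffix, because it is a different string from "need for speed".

```python
from collections import defaultdict


def awk_style_variants(lines):
    """Python mirror of: awk 'cnt[$0]++{$0=$0" variant "cnt[$0]-1} 1'

    Lines are compared exactly as read (only the trailing newline is
    removed, like awk's $0), so leading/trailing spaces matter and
    blank lines are counted as duplicates of each other.
    """
    cnt = defaultdict(int)  # like an awk array: unseen keys start at 0
    for raw in lines:
        line = raw.rstrip("\n")
        before = cnt[line]  # value of the expression cnt[$0]++
        cnt[line] += 1      # the post-increment side effect
        if before:          # pattern is true from the 2nd occurrence on
            # cnt[$0]-1 after the increment equals `before`
            line = f"{line} variant {before}"
        yield line          # the trailing `1` prints every record
```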