bash: capture lines with the same specific number of characters from the beginning

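I want to capture lines whose first n characters are identical and output only one of those lines, no matter what comes after the nth character. If a line has fewer than n characters, it should be sent to the output as it is.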

I tried using grep to capture the first specific number of characters, but it drops the rest of each line!

cat myfile.txt | grep -o -P '^{0,41}' or cat myfile.txt | grep -o -P '.{0,0}http.{0,41}'

Here I have a file, and I want to capture the lines whose first 41 characters are the same and show only one of them:

https://example.com/first/second/blahblah/?alsda=asldfaalafowiorie
https://example.com/first/second/blahblah/?oriwo=asldkjalkdjf2kasd
https://example.com/first/second/blahblah/some/more/dir
https://example.com/another/one
https://example.com/third/fourth/something/?cldl=aosijfoiret
https://example.com/third/fourth/something/?cldl=5145652
https://example.com/third/fourth/something/?hfdg=156569&wuew=8428
https://example.com/first/second/blahblah/

Expected output:

https://example.com/first/second/blahblah/?alsda=asldfaalafowiorie
https://example.com/another/one
https://example.com/third/fourth/something/?cldl=aosijfoiret

Thanks.

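Just the usual sort and uniq pair.

sort file | uniq -w41

You may want something like sort -s -k1.1,1.41 file to make the sort stable, so that within each group the line that came first in the input is the one that survives.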
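The sort and uniq pipeline can be wrapped in a small function. This is only a sketch under a few assumptions: GNU coreutils are installed (uniq -w is a GNU extension), the input is plain ASCII, and dedupe_prefix is a hypothetical helper name that reads from standard input. The sample data is the file from the question.

```shell
# dedupe_prefix N: hypothetical helper, reads lines on stdin and keeps one
# line per distinct N-character prefix. Assumes GNU sort and GNU uniq.
dedupe_prefix() {
  n=$1
  # -s (stable) keeps the input order among lines with equal keys, so uniq
  # keeps the line that appeared first in the input for each prefix.
  # LC_ALL=C makes both tools compare plain bytes.
  LC_ALL=C sort -s -k "1.1,1.$n" | LC_ALL=C uniq -w "$n"
}

dedupe_prefix 41 <<'EOF'
https://example.com/first/second/blahblah/?alsda=asldfaalafowiorie
https://example.com/first/second/blahblah/?oriwo=asldkjalkdjf2kasd
https://example.com/first/second/blahblah/some/more/dir
https://example.com/another/one
https://example.com/third/fourth/something/?cldl=aosijfoiret
https://example.com/third/fourth/something/?cldl=5145652
https://example.com/third/fourth/something/?hfdg=156569&wuew=8428
https://example.com/first/second/blahblah/
EOF
```

The result comes out in sorted order rather than in the original file order. Lines shorter than N are compared in full, so identical short lines also collapse into one.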


"... only output one of those lines no matter what comes after the first nth character. If the line has less than nth chars, then send it to output as it is."

Besides that, there is the almighty awk.

awk -v N=41 '
   # Store lines of at least N characters in an associative array,
   # keyed by their first N characters, unless that key is already there
   length($0) >= N { i = substr($0,1,N); if (!(i in a)) a[i] = $0 }
   # Print lines shorter than N characters as they are
   length($0) < N {print}
   # At the end, print the stored lines (awk does not guarantee the order)
   END{ for (i in a) print a[i] } ' file
awk '!seen[substr($0,1,41)]++' file
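
The one-liner works because seen[key]++ evaluates to 0 (false) the first time a key appears, so the negated expression is true only for the first line with a given prefix. It also drops repeated short lines, since substr returns the whole line when the line is shorter than 41 characters. Below is a minimal sketch of a variant that passes every short line through untouched. It assumes a hypothetical function name, first_per_prefix, and input on standard input.

```shell
# first_per_prefix N: hypothetical helper, reads lines on stdin.
first_per_prefix() {
  # Lines shorter than N are always printed; longer lines are printed only
  # the first time their N-character prefix is seen. Input order is kept.
  awk -v N="$1" 'length($0) < N || !seen[substr($0, 1, N)]++'
}

first_per_prefix 41 <<'EOF'
https://example.com/first/second/blahblah/?alsda=asldfaalafowiorie
https://example.com/first/second/blahblah/?oriwo=asldkjalkdjf2kasd
https://example.com/first/second/blahblah/some/more/dir
https://example.com/another/one
https://example.com/third/fourth/something/?cldl=aosijfoiret
https://example.com/third/fourth/something/?cldl=5145652
https://example.com/third/fourth/something/?hfdg=156569&wuew=8428
https://example.com/first/second/blahblah/
EOF
```

On the question's sample this prints the three lines of the expected output, in the same order, because nothing is sorted and nothing is deferred to an END block.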