How to grep only the directories without any extensions in bash
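Suppose I have a list of URLs in a file named URL.txt, and I only want to output the directories, not files or anything with an extension such as .html or .php. If the script finds an extension or a file in a URL, it should move on to the next URL.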
- https://example.com/tradings/trade/trading?currency=usdt&dest=btc&tab=limit
- https://example.com/account/signup/accounts/signin/account.html
I want a result like this:
- https://example.com/tradings/
- https://example.com/tradings/trade/
- https://example.com/account/
- https://example.com/account/signup/
- https://example.com/account/signup/accounts/
- https://example.com/account/signup/accounts/signin/
I tried this command, but it does not give me the full URL endpoint. I want the full URL endpoint without any extension.
cat Urls.txt | rev | cut -d'/' -f 2 | sort -u | rev
Perl to the rescue!
perl -lne '@parts = split m{/}; print join "/", @parts[0 .. $_] for 3 .. $#parts - 1' < URL.txt
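For readers without Perl, the same prefix-walking idea can be sketched in plain bash. This is only an illustration: `url_dirs` is a made-up helper name, and dropping the query string and emitting a trailing slash are assumptions chosen to match the output requested in the question.

```shell
#!/usr/bin/env bash
# url_dirs is a hypothetical helper, not part of the original answers.
# It prints every directory prefix of one URL, each with a trailing slash.
url_dirs() {
  local url=${1%%[?#]*}   # assumption: drop the query string and fragment
  local re='^([a-zA-Z][a-zA-Z0-9+.-]*://[^/]+)/(.*)$'
  local rest prefix part
  # Split "scheme://host" from the path; anything else is silently skipped.
  [[ $url =~ $re ]] || return 0
  prefix=${BASH_REMATCH[1]}/
  rest=${BASH_REMATCH[2]}
  # A component followed by a slash is a directory; the final one is not.
  while [[ $rest == */* ]]; do
    part=${rest%%/*}
    rest=${rest#*/}
    prefix+=$part/
    printf '%s\n' "$prefix"
  done
}
```

To process the whole list, something like `while IFS= read -r u; do url_dirs "$u"; done < URL.txt | sort -u` should work, with `sort -u` removing prefixes shared by several URLs.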
I suggest using awk:
awk 'BEGIN{FS=OFS="/"}{$NF=""}!seen[$0]++' URLS.txt
Explanation:
# Set the input field separator (FS) and the
# output field separator (OFS) to a forward slash /
BEGIN{
    FS=OFS="/"
}
{
    # NF is a special variable that contains the number of fields.
    # Therefore $NF is the last field. Assign an empty string to it.
    $NF=""
}
# The variable 'seen' is an associative array, initialized on demand
# upon first usage. We are using it as a lookup to prevent printing
# the same URL path twice.
!seen[$0]++
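The `seen` lookup is easiest to understand in isolation. The sketch below wraps it in a function with a made-up name, `dedup_keep_order`, and feeds it a few sample lines. Unlike `sort -u`, it keeps the first occurrence of each line in its original position.

```shell
# dedup_keep_order is a hypothetical wrapper name for the idiom.
dedup_keep_order() {
  # seen[$0]++ is 0 (false) the first time a line appears, so the negated
  # pattern is true and awk's default action prints the line.
  # Later copies see a non-zero count and are skipped.
  awk '!seen[$0]++'
}

printf '%s\n' b a b c a | dedup_keep_order   # prints b, a, c on separate lines
```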
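PS: Your initial command almost works; only the cut command is wrong. You are using cut -f2, which prints only the second field, but you want cut -f2-, which prints the second through the last field: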
rev Urls.txt | cut -d'/' -f 2- | sort -u | rev
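The difference between the two `cut` forms is easy to see on a single URL from the question. The function names `last_dir` and `parent_path` are invented for this demonstration, and it assumes `rev` is installed (it ships with util-linux on most Linux systems).

```shell
# last_dir and parent_path are hypothetical names used only for this demo.
last_dir()    { rev | cut -d'/' -f 2  | rev; }   # one field: the last directory name
parent_path() { rev | cut -d'/' -f 2- | rev; }   # field 2 to the end: all but the file name

url='https://example.com/account/signup/accounts/signin/account.html'
printf '%s\n' "$url" | last_dir      # signin
printf '%s\n' "$url" | parent_path   # https://example.com/account/signup/accounts/signin
```

Note that neither form emits the intermediate directories, and the result has no trailing slash.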
If you want to make it a one-liner,
[gnm]awk 'BEGIN {OFS=FS="/"} (1<NF) && _==__[$(_^--NF)]++'
Let me help decipher this:
- awk can error out or misbehave when NF is decremented too far (for example, gawk rejects a negative NF on an empty line), so (1 < NF) is a safety check. Shortening that to a check on $NF has a pitfall: if the last column of the input looks like a numeric zero, the condition would inadvertently evaluate to false.
- _ is a variable that is never initialized, so it is the same as 0/false. I write it this way because my shell scripts occasionally act up over the "!" character, which bash is too eager to expand.
- __ is the seen array.
- --NF automatically clears out the right-most column, aka the basename.
- Since we have already ensured NF >= 2, $(_^--NF) evaluates to $(0) regardless of input, because zero raised to any non-zero power is always zero.
The rest is the same as what others have already explained in detail above.
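For comparison, here is a more readable sketch of what the golfed one-liner appears to do. The function name `parent_dirs` is made up. It relies on awk rebuilding $0 when NF is decremented, which gawk and mawk do but which is not guaranteed for every awk implementation.

```shell
# parent_dirs is a hypothetical name; it reads URLs on stdin.
parent_dirs() {
  awk 'BEGIN { FS = OFS = "/" }
       # Only touch lines with at least two fields, so NF never drops to 0.
       NF > 1 {
         NF--                        # drop the last field (the basename)
         if (!seen[$0]++) print      # print each resulting path only once
       }'
}

printf '%s\n' 'https://example.com/a/b/c.html' 'https://example.com/a/b/d.php' | parent_dirs
```

Both sample URLs share the same parent, so only one line is printed, without a trailing slash.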