Substring extraction in bash

Question

iamnewbie: This code is inefficient, but it should extract a substring. The problem is in the last echo statement, and I need some insight.

    function regex {


    #this function gives the regular expression needed



    echo -n \'

    for (( i = 1 ; i <= $1 ; i++ ))
         do
           echo -n . 
        done

    echo -n '\('

    for (( i = 1 ; i <= $2 ; i++ ))
         do
           echo -n . 
         done

     echo -n '\)'
     echo -n \'

     }
     # regex function ends


      echo "Enter the string:"

      read stg
      #variable stg holds the string entered

      if [ -z "$stg" ] ; then

           echo "Null string"

      exit

      else

           echo "Length of the $stg is:"

           z=`expr "$stg" : '.*' `

           #variable z holds the length of given string

           echo $z

      fi

      echo "Enter the number of trailing characters to be extracted from  $stg:"

      read n

      m=`expr $z - $n `
      #variable m holds an integer value which is equal to total length - length of characters to be extracted

      x=$(regex $m $n)

      echo ` expr "$stg" : "$x" `
      #the echo statement(above) is just printing a newline!! But not the result

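What I intend this code to do is: if I enter "racecar" and give "3", it should display "car", the last three characters. Instead of displaying "car", it just prints a newline. Please correct this code rather than providing better code.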

Answer 1

How about this:

$ n=3
$ string="racecar"
$ [[ "$string" =~ (.{$n})$ ]]
$ echo ${BASH_REMATCH[1]}
car

This looks for the last n characters at the end of the line. In a script:

#!/bin/bash

read -p "Enter a string: " string
read -p "Enter the number of characters you want from the end: " n
[[ "$string" =~ (.{$n})$ ]]
echo "These are the last $n characters: ${BASH_REMATCH[1]}"

You would probably want to add more error handling, but this does the job.
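The extra error handling could look something like the sketch below. The function name `last_n` and the two specific checks are assumptions made for illustration, not part of the answer above.

```shell
#!/bin/bash
# Hypothetical helper: print the last n characters of a string using =~.
last_n() {
    local string=$1 n=$2
    # Reject anything that is not a non-negative integer.
    if ! [[ $n =~ ^[0-9]+$ ]]; then
        echo "not a number: $n" >&2
        return 1
    fi
    n=$((10#$n))    # force base 10 so that leading zeros are harmless
    # Reject a count larger than the string itself.
    if (( n > ${#string} )); then
        echo "string is shorter than $n characters" >&2
        return 1
    fi
    # Keep the pattern in a variable so the quoting stays simple.
    local re="(.{$n})\$"
    [[ $string =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

last_n racecar 3    # prints car
```

Keeping the check in a function means an invalid count is reported instead of silently printing an empty BASH_REMATCH entry.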

Answer 2

I'm not sure you need loops for this task. I wrote a small example that reads two parameters from the user and cuts the word accordingly.

#!/bin/bash
read -p "Enter some word? " -e stg
#variable stg holds the string entered
if [ -z "$stg" ] ; then
  echo "Null string"
  exit 1
fi

read -p "Enter some number to set word length? " -e cutNumber
# check that cutNumber is a number; -eq fails on non-integers
if ! [ "$cutNumber" -eq "$cutNumber" ] 2>/dev/null; then
  echo "Not a number!"
  exit 1
fi
echo "Cut first n characters:"
echo "${stg:$cutNumber}"
echo
echo "Show first n characters:"
echo "${stg:0:$cutNumber}"

echo "Alternative get last n characters:"
echo -n "$stg" | tail -c "$cutNumber"
echo

Example:

Enter some word? TheRaceCar
Enter some number to set word length? 7
Cut first n characters:
Car

Show first n characters:
TheRace
Alternative get last n characters:
RaceCar
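The number check in the script above compares the value with itself using -eq, which may look odd at first. A minimal sketch of why it works, with the helper name `is_integer` being my own invention:

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when its argument is an integer.
# [ a -eq b ] reports an error (non-zero status) if either side is not an
# integer, so comparing a value with itself doubles as a validity check.
is_integer() {
    [ "$1" -eq "$1" ] 2>/dev/null
}

for value in 7 -3 abc 1.5; do
    if is_integer "$value"; then
        echo "$value: integer"
    else
        echo "$value: not an integer"
    fi
done
```

Note that negative numbers pass this check, so a script that needs a character count may still want to reject values below zero.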

Answer 3

Although you didn't ask for a better solution, it's worth mentioning:

$ n=3
$ stg=racecar
$ echo "${stg: -n}"
car

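Note that the space after the : in ${stg: -n} is required. Without the space, the parameter expansion is a default-value expansion rather than a substring expansion. With the space, it is a substring expansion; -n is interpreted as an arithmetic expression (which means that n is interpreted as $n), and since the result is negative, it specifies how many characters from the end the substring starts. See the Bash manual for details.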
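The difference between the two expansions is easy to see side by side. This is only a small demonstration; the variable name `empty` is made up for the example.

```shell
#!/bin/bash
n=3
stg=racecar

echo "${stg: -n}"     # substring expansion: last n characters -> car
echo "${stg:(-n)}"    # parentheses also avoid the :- form -> car
echo "${stg:-n}"      # default-value expansion; stg is set -> racecar

unset empty
echo "${empty:-n}"    # variable is unset, so the literal default -> n
```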

Your solution is based on evaluating the equivalent of:

expr "$stg" : '......\(...\)'

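with the appropriate number of dots. It's important to understand what the bash syntax above actually means. It invokes the command expr, passing it three arguments: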

arg 1: the contents of the variable stg

arg 2: :

arg 3: ......\(...\)

Note that there are no visible quotes. That's because the quotes are part of bash syntax, not part of the argument value.

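If the value of stg has enough characters, the result of the above expr invocation is to print out the 7th, 8th and 9th characters of the value of stg. Otherwise, it prints a blank line and fails.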
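Both outcomes can be checked directly. The sample strings below are arbitrary, and the variable names `short` and `status` exist only for this sketch.

```shell
#!/bin/bash
# Three arguments reach expr: the string, a colon, and the pattern.
# The quotes are consumed by bash and never seen by expr.
expr "abcdefghi" : '......\(...\)'     # prints ghi (characters 7, 8 and 9)

# Too short to match: expr prints an empty line and exits with status 1.
status=0
short=$(expr "racecar" : '......\(...\)') || status=$?
echo "output='$short' status=$status"  # prints output='' status=1
```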

But that's not what you're doing. You're creating the regular expression:

'......\(...\)'

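which has single quotes in it. Since single quotes are not special characters in a regular expression, they match themselves; in other words, this pattern matches a string that starts with a single quote, followed by nine arbitrary characters, followed by another single quote. If the string matches, it prints the three characters before the second single quote.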
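To see the literal quotes at work, here is a small sketch using a shorter pattern of the same shape (four dots before the group instead of six); the variable names are chosen only for the example.

```shell
#!/bin/bash
# The pattern contains two literal single-quote characters,
# just like the output of the regex function in the question.
pattern="'....\(...\)'"

# A target without quotes cannot match.
status=0
out=$(expr "racecar" : "$pattern") || status=$?
echo "unquoted target: output='$out' status=$status"   # output='' status=1

# A target that really contains the quotes does match.
expr "'racecar'" : "$pattern"     # prints car
```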

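Of course, since the regular expression you create has a . for every character in the target string, it won't match the target even if the target starts with a single quote, because the regular expression has too many dots to match.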
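Since the question asked for a correction rather than a replacement, here is a sketch of the same helper with the two quote-emitting echo lines removed. Using printf instead of echo -n and hard-coding the inputs instead of reading them are my own simplifications.

```shell
#!/bin/bash
# Sketch of the asker's helper without the literal single quotes.
# $1 = number of characters to skip, $2 = number of characters to capture.
regex() {
    local i
    for (( i = 1; i <= $1; i++ )); do printf '.'; done
    printf '%s' '\('
    for (( i = 1; i <= $2; i++ )); do printf '.'; done
    printf '%s' '\)'
}

stg=racecar
n=3
m=$(( ${#stg} - n ))     # total length minus the characters to extract
x=$(regex "$m" "$n")     # ....\(...\)
expr "$stg" : "$x"       # prints car
```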

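If you don't put the single quotes into the regular expression, your program will work, but I have to say that I have rarely seen such a convoluted implementation of a substring function. Unless you're trying to win an obfuscated bash contest (a tough challenge, since most production bash code is obfuscated by nature), I'd suggest using normal bash features rather than trying to do everything with regular expressions.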

One of them is the syntax for determining the length of a string:

$ stg=racecar
$ echo ${#stg}
7

(Although, as shown at the beginning, you don't actually even need it.)
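If you do want to use the length explicitly, it combines with substring expansion without any call to expr. A minimal sketch; the variable name `len` is just for illustration.

```shell
#!/bin/bash
stg=racecar
n=3

len=${#stg}              # 7, no expr needed
echo "$len"

# The offset is an arithmetic expression, so names can be used directly.
echo "${stg:len-n}"      # same result as "${stg: -n}" -> car
```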
