Parse a markdown file in bash to get all indented lines and their position in the file

Question

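I'm trying to get all the indented lines of a markdown file in bash. I need their position in the file so that I can later extract them again or insert them back at their original position.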

Below is an example of a markdown file from which I want to get all the indented lines.

# Example bloc code

This is a bloc code

    function display_results() {
        awk '{print $0; system("sleep .5");}' 
        rm 
    }

This code displays results.

below an other example of bloc code

    echo "------------------------------------------"
    echo "              TEST RESULTS"
    echo "------------------------------------------"

Or just one line:

    System.out.println("foo");

blablablab

Because I want the position of each block, I parse the file line by line and use a regex to check whether each line is indented.

Note: It has been mentioned here that regex is not the right tool for extracting code blocks, because code blocks can be nested. I don't have to handle that case; getting only ordinary code blocks, like the ones in the example above, is sufficient.
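For reference, in plain Markdown (CommonMark) an ordinary indented code block consists of lines indented by at least four spaces (a tab also counts). The sketch below only illustrates that per-line rule; the function name is_code_line is hypothetical, and the extra CommonMark conditions (for example, that an indented code block cannot interrupt a paragraph, or list-item indentation) are deliberately ignored.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the line starts with 4 spaces or a tab.
# Simplification: paragraph continuation and list items are not considered.
is_code_line() {
  local re=$'^(    |\t)'   # $'...' turns \t into a literal tab
  [[ $1 =~ $re ]]
}

# Try a few sample lines.
for sample in '    echo "hi"' $'\tfoo' '  two spaces' '# Heading'; do
  if is_code_line "$sample"; then
    printf 'CODE: %s\n' "$sample"
  else
    printf 'TEXT: %s\n' "$sample"
  fi
done
```

Note that the test must be applied to the line exactly as it appears in the file, including its leading whitespace.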

My code is:

# One of the regexes I have tested
regex='^[[:blank:]]+'  # Does not match any line

while read line; do
  # Try to find indented lines by using regex
  if [[ $line =~ $regex ]]; then
      echo "INDENTED: $line"
  else
      echo "TEXT: $line"
  fi
done < "$testFile"

where $testFile is the markdown file I am parsing.

The best regexes I have written so far (based on this and this) match only some of the lines, not all of them.

For example, with the following regexes I get only some of the lines, not all of them:

regexblank="[^a-zA-Z#]+[[:blank:]]"
regexspace="[^a-zA-Z#]+[[:space:]]"
blank="[^a-zA-Z#]+[[:blank:]]"

With the regexes above, the result is:

TEXT: # Example bloc code
TEXT:
TEXT: This is a bloc code
TEXT:
INDENTED: function display_results() {
INDENTED: awk '{print main.sh; system("sleep .5");}'
TEXT: rm
TEXT: }
TEXT:
TEXT: This code displays results.
TEXT:
TEXT: below an other example of bloc code
TEXT:
TEXT: echo "------------------------------------------"
INDENTED: echo "              TEST RESULTS"
TEXT: echo "------------------------------------------"
TEXT:
TEXT: Or just one line:
TEXT:
TEXT: System.out.println("foo");
TEXT:
TEXT: blablablab

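As you can see, in all three regexes I had to specify that the line must not start with a letter or #; otherwise some lines, such as headings, are detected as indented.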
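Why only some lines match: if read strips the leading blanks (see the answer below), an anchored regex such as ^[[:blank:]]+ has nothing left to match, and [[ ... =~ ... ]] is unanchored unless ^ is used, so the other regexes can only match somewhere inside the line, for example a quote or parenthesis followed by a space. A small sketch reproducing this with regexblank from above, on lines as they look after read has stripped them:

```bash
#!/usr/bin/env bash
# Reproduce the "random" matches: the regex is not anchored, so it can
# match in the middle of a line whose leading blanks were already removed.
regexblank='[^a-zA-Z#]+[[:blank:]]'

for line in 'echo "              TEST RESULTS"' 'echo "----------"' 'rm'; do
  if [[ $line =~ $regexblank ]]; then
    # BASH_REMATCH[0] holds the part of the line that actually matched.
    printf 'MATCH    %s   matched text: [%s]\n' "$line" "${BASH_REMATCH[0]}"
  else
    printf 'NO MATCH %s\n' "$line"
  fi
done
```

The first line matches only because of the quote and the spaces inside the string literal, which is consistent with the INDENTED/TEXT pattern in the output above.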

Using awk as follows gives me all the indented lines:

awk '/^(\t|\s)+/' $mdFile

But awk works on the file as a whole, and I need to know the position of each block.
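For what it's worth, awk can report positions: its built-in NR variable holds the number of the current input line. A minimal sketch, assuming "indented" means four or more spaces or a tab; the function name indented_with_numbers is hypothetical, and the sample file exists only to keep the example self-contained:

```bash
#!/usr/bin/env bash
# Print each indented line prefixed with its line number (NR).
indented_with_numbers() {
  awk '/^(    |\t)/ { print NR ": " $0 }' "$1"
}

# Demo on a throwaway sample file.
mdFile=$(mktemp)
printf '%s\n' 'Some text' '' '    echo "a"' '    echo "b"' '' 'More text' > "$mdFile"
indented_with_numbers "$mdFile"
rm -f "$mdFile"
```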

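How can I parse the file and get all the indented lines? As I explained, I am trying to use regexes, but any solution that gets the indented lines and their respective positions in the file would be great.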
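Since the goal is the position of each block rather than of each line, one possible approach is to collapse runs of consecutive indented lines into start-end line ranges. This is a sketch under a stated assumption: a blank or non-indented line ends a block, so a code block containing blank lines (which CommonMark allows) would be reported as two blocks. The function name list_blocks is hypothetical.

```bash
#!/usr/bin/env bash
# Print "start-end" line ranges for each run of indented lines.
# Assumption: any blank or non-indented line closes the current block.
list_blocks() {
  awk '
    /^(    |\t)/ { if (!start) start = NR; end = NR; next }
    start         { print start "-" end; start = 0 }
    END           { if (start) print start "-" end }
  ' "$1"
}

sample=$(mktemp)
printf '%s\n' '# Title' '' '    line 1' '    line 2' '' 'text' '' '    only one' > "$sample"
list_blocks "$sample"   # prints 3-4 and then 8-8

# A block can then be extracted again by its range, e.g. with sed:
sed -n '3,4p' "$sample"
rm -f "$sample"
```

Keeping plain line numbers as the position makes it easy to hand the ranges to other line-oriented tools later.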

You can find the code I wrote and all the regexes here.

Answer 1

Look at what the variable line actually contains for each line of the file:

$ cat infile
line
    indented
line
$ while read line; do echo "<$line>"; done < infile
<line>
<indented>
<line>

This is because of the following behaviour of read (emphasis mine):

One line is read from the standard input [...], split into words as described above in Word Splitting, and the first word is assigned to the first name, [...]
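To make the quoted behaviour concrete, a small sketch: with the default IFS, read drops leading and trailing blanks and, given several names, splits the line into words.

```bash
#!/usr/bin/env bash
# Word splitting done by read with the default IFS.
line='    indented text here'

# Two names: first word to the first name, the rest to the last name.
read -r first rest <<< "$line"
printf '[%s] [%s]\n' "$first" "$rest"   # [indented] [text here]

# One name: the whole line, but leading/trailing blanks are stripped.
read -r whole <<< "$line"
printf '[%s]\n' "$whole"                # [indented text here]
```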

To prevent this, set IFS to the empty string (and add -r to avoid backslash interpretation):

$ while IFS= read -r line; do echo "<$line>"; done < infile
<line>
<    indented>
<line>
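Combining this fix with a line counter gives both the indented lines and their positions. A minimal sketch, assuming the question's regex ^[[:blank:]]+ is good enough (it also flags lines with fewer than four leading blanks); the function name classify_lines is hypothetical:

```bash
#!/usr/bin/env bash
# Classify each line as TEXT or INDENTED and report its line number.
classify_lines() {
  local regex='^[[:blank:]]+' line lineno=0
  # IFS= keeps leading blanks, -r keeps backslashes; the -n test also
  # handles a last line that has no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    lineno=$((lineno + 1))
    if [[ $line =~ $regex ]]; then
      printf 'INDENTED %d: %s\n' "$lineno" "$line"
    else
      printf 'TEXT %d: %s\n' "$lineno" "$line"
    fi
  done < "$1"
}

testFile=$(mktemp)
printf '%s\n' 'This is a bloc code' '' '    function f() {' '        echo hi' '    }' 'Done.' > "$testFile"
classify_lines "$testFile"
rm -f "$testFile"
```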
