Yet another regex: getting images from markdown breaks when there is markdown inside the description

I'm trying to get image information from a wiki. I have a regex that works, but it fails when the description also contains markdown.

Image format in the markdown:

//[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]
//[[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]
//[[Image:CNT-armoured-car-factory.jpg|right|thumb|270px|[[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.]]
[[Image:CNT_tu_votar_y_ellos_deciden.jpg|thumb|175px|CNT propaganda from April 2004.  Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.]]
[[Image:Flag of Anarcho syndicalism.svg|thumb|175px|The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.]]
[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]

{{Main articles|[[Christian anarchism]] and [[Anarchism and religion]]}}

Here is my attempt: https://regex101.com/r/pD6nF8/1

This is what I'm trying:

// \[\[Image:(.*?)\|(.*?)\|(.*?)\|(.*?)\|\[*(.*?)\|*(.*?)\]*
$re = "/\[\[Image:(.*?)\|(.*?)\|(.*?)\|(.*?)\|\[*(.*?)\|*(.*?)\]*/i"; 

It should find 14 matches for this test, but so far I get 11; or, when I do get 14, I also get some noise such as ]] or only part of the description...

How can I include an optional case like [[(.*?)]] in the last part?

You can define the nested parts beforehand, using this syntax:

$pattern = '~
# definitions
(?(DEFINE)
     (?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
     (?<part>   [^][|]*+ (?: \g<nested> [^][|]* )*+             )
)
# main pattern
\[\[ Image: (\g<part>) \| (\g<part>) \| (\g<part>) \| (\g<part>) \| (\g<part>) ]]
~ix';
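To see the pattern in action, here is a minimal PHP sketch that runs it with preg_match_all() over three sample tags copied from the question. The group names file, p1, p2, p3 and caption are illustrative additions, and the DEFINE block recurses with a bare \g<nested> so that a link nested inside another link stays balanced. Because the main pattern requires exactly five |-separated fields, only two of the three tags are expected to match.

```php
<?php
// Minimal sketch: the DEFINE/subroutine approach run with preg_match_all().
// Group names file/p1/p2/p3/caption are illustrative, not from the answer.
$pattern = '~
(?(DEFINE)
     (?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
     (?<part>   [^][|]*+ (?: \g<nested> [^][|]* )*+            )
)
\[\[ Image: (?<file>\g<part>) \| (?<p1>\g<part>) \| (?<p2>\g<part>)
         \| (?<p3>\g<part>) \| (?<caption>\g<part>) ]]
~ix';

$text = "[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]\n"
      . "[[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]\n"
      . "[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]\n";

// Exactly five fields are required, so the Tolstoy tag (four fields) is skipped.
$count = preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m['file'], ' -> ', $m['caption'], "\n";
}
```

Tags with fewer or more fields need one of the more flexible variants.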


Obviously, you can be more precise. If you already know that the 4th part is the size, you can replace it:

\[\[ Image: (\g<part>) \| (\g<part>) \| (\g<part>) \| (\d+ px) \| (\g<part>) ]]
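A quick PHP check of this size-constrained variant, assuming the same DEFINE block (with a bare \g<nested> recursion); the two tags come from the sample text in this thread. Under the x flag the space in \d+ px is ignored, so it matches 150px. Note also that the two DEFINE'd groups take the numbers 1 and 2, so the visible captures start at 3.

```php
<?php
// Sketch: the variant that requires the 4th field to be a size.
// Because of the x flag, the space in "\d+ px" is ignored ("\d+px").
$pattern = '~
(?(DEFINE)
     (?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
     (?<part>   [^][|]*+ (?: \g<nested> [^][|]* )*+            )
)
\[\[ Image: (\g<part>) \| (\g<part>) \| (\g<part>) \| (\d+ px) \| (\g<part>) ]]
~ix';

$godwin   = '[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]';
$proudhon = '[[Image:Pierre_Joseph_Proudhon.jpg|110px|thumb|left|Pierre Joseph Proudhon]]';

// "nested" and "part" are capture groups 1 and 2 (inside DEFINE),
// so the five captures of the main pattern are numbered 3 to 7.
$godwinMatched   = preg_match($pattern, $godwin, $m);  // 1; $m[6] is '150px'
$proudhonMatched = preg_match($pattern, $proudhon);    // 0: the size follows the file name directly
```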

If needed, you are also free to make some parts optional (for example, the alignment parameter can be omitted):

\[\[ Image: (\g<part>) \| (\g<part>) (?:\| (\g<part>) )? \| (\d+ px) \| (\g<part>) ]]
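A minimal PHP sketch of the optional-field variant, run on one tag that has the extra field and one that does not (both from the question). The optional field is the third capture of the main pattern, i.e. group 5 once the two DEFINE'd groups are counted; the array keys file, optional and size are made up for illustration.

```php
<?php
// Sketch: the variant where one field before the size may be omitted.
$pattern = '~
(?(DEFINE)
     (?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
     (?<part>   [^][|]*+ (?: \g<nested> [^][|]* )*+            )
)
\[\[ Image: (\g<part>) \| (\g<part>) (?:\| (\g<part>) )? \| (\d+ px) \| (\g<part>) ]]
~ix';

$lines = [
    '[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]',
    '[[Image:Flag of Anarcho syndicalism.svg|thumb|175px|The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.]]',
];

$rows = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        // Group 5 is the optional field; PHP reports it as '' when absent.
        $rows[] = ['file' => $m[3], 'optional' => $m[5], 'size' => $m[6]];
    }
}
print_r($rows);
```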

Or you can make every parameter optional and accept them in any order, but in that case you need to describe each one precisely:

~
(?(DEFINE)
     (?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
     (?<part>   [^][|]*+ (?: \g<nested> [^][|]* )*+               )
)

\[\[Image: (?<name> [^]|]* )
(?:
   \| 
   (?: (?<align>       left|right|center ) |
       (?<type>        thumb             ) |
       (?<size>        \d+[a-z]{0,3}     ) |
       (?<description> \g<part>          )
   )
)*
]]
~ix
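The named-group version can be consumed directly from PHP. This sketch uses the pattern above (with a bare \g<nested> recursion) and collects each tag into an associative array keyed by the group names name, align, type, size and description; the two sample tags are from the question. A parameter that does not occur comes back as an empty string, or is missing entirely when it is the last group, hence the ?? ''.

```php
<?php
// Sketch: the "all optional, any order" pattern with named groups.
$pattern = '~
(?(DEFINE)
     (?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
     (?<part>   [^][|]*+ (?: \g<nested> [^][|]* )*+               )
)
\[\[Image: (?<name> [^]|]* )
(?:
   \|
   (?: (?<align>       left|right|center ) |
       (?<type>        thumb             ) |
       (?<size>        \d+[a-z]{0,3}     ) |
       (?<description> \g<part>          )
   )
)*
]]
~ix';

$lines = [
    '[[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]',
    '[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]',
];

$images = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        $image = [];
        foreach (['name', 'align', 'type', 'size', 'description'] as $key) {
            // Unused parameter: '' in the middle, missing if it is the last group.
            $image[$key] = $m[$key] ?? '';
        }
        $images[] = $image;
    }
}
print_r($images);
```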


OK, if I'm reading this correctly, you only want the images with their styling, without the description.

So I think this might work for you:

\[\[Image:.*?\.(?:jpe?g|svg)[^\s]+(?=\|)

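Then just append ]] to each of your matches. Use the case-insensitive flag if extensions such as .JPG should match as well.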
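A small PHP sketch of this approach, assuming the extension is spelled out as the alternation \.(?:jpe?g|svg) and the i flag is set so that .JPG matches too. The two sample tags are from the question; the arrow function needs PHP 7.4 or later.

```php
<?php
// Sketch of the "styles only" approach: stop at the last | that is
// followed directly by another |, i.e. just before the description.
$pattern = '~\[\[Image:.*?\.(?:jpe?g|svg)[^\s]+(?=\|)~i';

$text = "[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]\n"
      . "[[Image:Murray Rothbard Smile.JPG|thumb|left|150px|[[Murray Rothbard]] (1926-1995)]]\n";

preg_match_all($pattern, $text, $m);

// Append the closing ]] to every match, as suggested above.
$tags = array_map(fn($s) => $s . ']]', $m[0]);
print_r($tags);
```

The lookahead relies on the description starting with a word followed by a space; a description with no whitespace at all would need a different stopping rule.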

Description

This regex is spread over several lines and uses the following flags: ignore whitespace (x), global (g), and case-insensitive (i).

[[]{2}Image:
([^|]*\.(?:jpe?g|svg))[|]
([^|]*)[|]
   ((?:[[]{2}[^\]]*\]\]|[^|[])*)[|]
(?:((?:[[]{2}[^\]]*\]\]|[^|[])*)[|])?
   ((?:[[]{2}[^\]]*\]\]|(?:(?!\]|\|).))*)
(?:[|]|\]\])

This regex will do the following:

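  • Find the [[Image:...]] substrings in the sample text
  • Require the image name to end in .jpg, .jpeg or .svg. You can remove this requirement by deleting the \.(?:jpe?g|svg) construct.
  • Parse the various |-delimited fields
  • Avoid the difficult edge cases where the last few fields may contain additional markup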
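For reference, a hedged PHP sketch of how the pattern might be used. PHP has no g flag, so preg_match_all() does the global search, while x and i are passed as pattern modifiers. The two sample tags are from the question; the second has only four fields, so the optional fourth group is expected to come back empty.

```php
<?php
// Sketch: the multi-line pattern used from PHP with the x and i modifiers.
$pattern = '~
[[]{2}Image:
([^|]*\.(?:jpe?g|svg))[|]
([^|]*)[|]
   ((?:[[]{2}[^\]]*\]\]|[^|[])*)[|]
(?:((?:[[]{2}[^\]]*\]\]|[^|[])*)[|])?
   ((?:[[]{2}[^\]]*\]\]|(?:(?!\]|\|).))*)
(?:[|]|\]\])
~xi';

$text = "[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]\n"
      . "[[Image:Hakim Bey.jpeg|thumb|right|[[Hakim Bey]]]]";

$count = preg_match_all($pattern, $text, $m, PREG_SET_ORDER);

foreach ($m as $set) {
    // Group 4 is optional and is reported as '' when the tag has only four fields.
    printf("%s | %s | %s | %s | %s\n", $set[1], $set[2], $set[3], $set[4], $set[5]);
}
```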

Example

Live demo

https://regex101.com/r/kI2wE5/2

Sample text

I took the liberty of pulling out all 14 image tags; the live demo also contains your original text.

[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]
[[Image:Pierre_Joseph_Proudhon.jpg|110px|thumb|left|Pierre Joseph Proudhon]]
[[Image:BenjaminTucker.jpg|thumb|150px|left|[[Benjamin Tucker]]]]
[[Image:Bakuninfull.jpg|thumb|150px|right|[[Bakunin|Mikhail Bakunin 1814-1876]]]]
[[Image:PeterKropotkin.jpg|thumb|150px|right|Peter Kropotkin]]
[[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]
[[Image:Flag of Anarcho syndicalism.svg|thumb|175px|The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.]]
[[Image:CNT_tu_votar_y_ellos_deciden.jpg|thumb|175px|CNT propaganda from April 2004.  Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.]]
[[Image:CNT-armoured-car-factory.jpg|right|thumb|270px|[[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.]]
[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]
[[Image:Goldman-4.jpg|thumb|left|150px|[[Emma Goldman]]]]
[[Image:Murray Rothbard Smile.JPG|thumb|left|150px|[[Murray Rothbard]] (1926-1995)]]
[[Image:Hakim Bey.jpeg|thumb|right|[[Hakim Bey]]]]
[[Image:Noam_chomsky.jpg|thumb|150px|right| [[Noam Chomsky]] (1928–)]]

Sample matches

[0][0] = [[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]
[0][1] = WilliamGodwin.jpg
[0][2] = thumb
[0][3] = right
[0][4] = 150px
[0][5] = William Godwin

[1][0] = [[Image:Pierre_Joseph_Proudhon.jpg|110px|thumb|left|Pierre Joseph Proudhon]]
[1][1] = Pierre_Joseph_Proudhon.jpg
[1][2] = 110px
[1][3] = thumb
[1][4] = left
[1][5] = Pierre Joseph Proudhon

[2][0] = [[Image:BenjaminTucker.jpg|thumb|150px|left|[[Benjamin Tucker]]]]
[2][1] = BenjaminTucker.jpg
[2][2] = thumb
[2][3] = 150px
[2][4] = left
[2][5] = [[Benjamin Tucker]]

[3][0] = [[Image:Bakuninfull.jpg|thumb|150px|right|[[Bakunin|Mikhail Bakunin 1814-1876]]]]
[3][1] = Bakuninfull.jpg
[3][2] = thumb
[3][3] = 150px
[3][4] = right
[3][5] = [[Bakunin|Mikhail Bakunin 1814-1876]]

[4][0] = [[Image:PeterKropotkin.jpg|thumb|150px|right|Peter Kropotkin]]
[4][1] = PeterKropotkin.jpg
[4][2] = thumb
[4][3] = 150px
[4][4] = right
[4][5] = Peter Kropotkin

[5][0] = [[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]
[5][1] = JohannMost.jpg
[5][2] = left
[5][3] = 150px
[5][4] = thumb
[5][5] = [[Johann Most]] was an outspoken advocate of violence

[6][0] = [[Image:Flag of Anarcho syndicalism.svg|thumb|175px|The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.]]
[6][1] = Flag of Anarcho syndicalism.svg
[6][2] = thumb
[6][3] = 175px
[6][4] = 
[6][5] = The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.

[7][0] = [[Image:CNT_tu_votar_y_ellos_deciden.jpg|thumb|175px|CNT propaganda from April 2004.  Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.]]
[7][1] = CNT_tu_votar_y_ellos_deciden.jpg
[7][2] = thumb
[7][3] = 175px
[7][4] = 
[7][5] = CNT propaganda from April 2004.  Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.

[8][0] = [[Image:CNT-armoured-car-factory.jpg|right|thumb|270px|[[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.]]
[8][1] = CNT-armoured-car-factory.jpg
[8][2] = right
[8][3] = thumb
[8][4] = 270px
[8][5] = [[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.

[9][0] = [[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]
[9][1] = LeoTolstoy.jpg
[9][2] = thumb
[9][3] = 150px
[9][4] = 
[9][5] = [[Leo Tolstoy|Leo Tolstoy]] 1828-1910

[10][0] = [[Image:Goldman-4.jpg|thumb|left|150px|[[Emma Goldman]]]]
[10][1] = Goldman-4.jpg
[10][2] = thumb
[10][3] = left
[10][4] = 150px
[10][5] = [[Emma Goldman]]

[11][0] = [[Image:Murray Rothbard Smile.JPG|thumb|left|150px|[[Murray Rothbard]] (1926-1995)]]
[11][1] = Murray Rothbard Smile.JPG
[11][2] = thumb
[11][3] = left
[11][4] = 150px
[11][5] = [[Murray Rothbard]] (1926-1995)

[12][0] = [[Image:Hakim Bey.jpeg|thumb|right|[[Hakim Bey]]]]
[12][1] = Hakim Bey.jpeg
[12][2] = thumb
[12][3] = right
[12][4] = 
[12][5] = [[Hakim Bey]]

[13][0] = [[Image:Noam_chomsky.jpg|thumb|150px|right| [[Noam Chomsky]] (1928–)]]
[13][1] = Noam_chomsky.jpg
[13][2] = thumb
[13][3] = 150px
[13][4] = right
[13][5] =  [[Noam Chomsky]] (1928–)

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  [[]{2}                   any character of: '[' (2 times)
----------------------------------------------------------------------
  Image:                   'Image:'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^|]*                    any character except: '|' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      jp                       'jp'
----------------------------------------------------------------------
      e?                       'e' (optional (matching the most
                               amount possible))
----------------------------------------------------------------------
      g                        'g'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      svg                      'svg'
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  [|]                      any character of: '|'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [^|]*                    any character except: '|' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  [|]                      any character of: '|'
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      [[]{2}                   any character of: '[' (2 times)
----------------------------------------------------------------------
      [^\]]*                   any character except: '\]' (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
      \]                       ']'
----------------------------------------------------------------------
      \]                       ']'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [^|[]                    any character except: '|', '['
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  [|]                      any character of: '|'
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    (                        group and capture to \4:
----------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more
                               times (matching the most amount
                               possible)):
----------------------------------------------------------------------
        [[]{2}                   any character of: '[' (2 times)
----------------------------------------------------------------------
        [^\]]*                   any character except: '\]' (0 or
                                 more times (matching the most amount
                                 possible))
----------------------------------------------------------------------
        \]                       ']'
----------------------------------------------------------------------
        \]                       ']'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        [^|[]                    any character except: '|', '['
----------------------------------------------------------------------
      )*                       end of grouping
----------------------------------------------------------------------
    )                        end of \4
----------------------------------------------------------------------
    [|]                      any character of: '|'
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      [[]{2}                   any character of: '[' (2 times)
----------------------------------------------------------------------
      [^\]]*                   any character except: '\]' (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
      \]                       ']'
----------------------------------------------------------------------
      \]                       ']'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      (?:                      group, but do not capture:
----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
          \]                       ']'
----------------------------------------------------------------------
         |                        OR
----------------------------------------------------------------------
          \|                       '|'
----------------------------------------------------------------------
        )                        end of look-ahead
----------------------------------------------------------------------
        .                        any character except \n
----------------------------------------------------------------------
      )                        end of grouping
----------------------------------------------------------------------
    )*                       end of grouping
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    [|]                      any character of: '|'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \]                       ']'
----------------------------------------------------------------------
    \]                       ']'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------

What if you just match them with this regex: \[\[Image\:(.*)\]\] and then split each result on |? I'm not sure it's a good idea, but trying it does no harm.
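A minimal PHP sketch of this suggestion, using two sample tags from the question. It also shows the catch: a plain split on | cuts piped links such as [[Leo Tolstoy|Leo Tolstoy]] in two. The second split is an added variant, not part of the suggestion: it uses the PCRE verbs (*SKIP)(*FAIL) to step over [[...]] while splitting, and it only handles one level of nesting. The arrow functions need PHP 7.4 or later.

```php
<?php
// Capture everything between "[[Image:" and the last "]]" on each line.
$text = "[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]\n"
      . "[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]\n";

preg_match_all('~\[\[Image\:(.*)\]\]~', $text, $m);

// Plain split: the Tolstoy caption link is cut into two pieces (5 in total).
$naive = array_map(fn($inner) => explode('|', $inner), $m[1]);

// Added variant: [[...]] is matched and then skipped, so only | outside
// a link acts as a separator. One level of nesting only.
$safe = array_map(
    fn($inner) => preg_split('~\[\[.*?\]\](*SKIP)(*FAIL)|\|~', $inner),
    $m[1]
);
print_r($safe);
```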