Regex Base64 image with headers

Question

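I'm running into a problem where base64 images sometimes fail to convert correctly. I need a way to test whether an image is in valid base64 format before converting it, so that I can investigate the problem further. I found some regular expressions online, but I think they expect the string without headers. My strings have headers. I tried adding the header to the pattern, but it always breaks.

Original: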

^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

With the header added (this does not work):

^([data:image/png;base64,][A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

Thanks

Answer 1

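Notice how the original regex uses [square brackets]: they create a character class that matches any single character listed inside it, so [data:image/png;base64,] matches one of d, a, t, :, ..., 6, 4 or a comma. Instead, you probably want a non-capturing group, since I assume you are trying to make the header optional, like so: (?:data:image/png;base64,)? Put it directly after ^ and outside the repeated group, so that the header can appear only once, at the start of the string:

^(?:data:image/png;base64,)?([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

^                                 # Anchors to the beginning of the string.
(?:data:image/png;base64,         # Opens NCG1
                                    # Literal data:image/png;base64,
)?                                # Closes NCG1
                                    # ? repeats zero or one times
(                                 # Opens CG1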
 [A-Za-z0-9+/]                    # Character class (any of the characters within)
                                    # Anything between A and Z
                                    # Anything between a and z
                                    # Anything between 0 and 9
                                    # Any of: +/
 {4}                              # Repeats 4 times.
)*                                # Closes CG1
                                    # * repeats zero or more times
(                                 # Opens CG2
 [A-Za-z0-9+/]                    # Character class (any of the characters within)
                                    # Anything between A and Z
                                    # Anything between a and z
                                    # Anything between 0 and 9
                                    # Any of: +/
 {4}                              # Repeats 4 times.
 |                                # Alt (CG2)
 [A-Za-z0-9+/]                    # Character class (any of the characters within)
                                    # Anything between A and Z
                                    # Anything between a and z
                                    # Anything between 0 and 9
                                    # Any of: +/
 {3}                              # Repeats 3 times.
 =                                # Literal =
 |                                # Alt (CG2)
 [A-Za-z0-9+/]                    # Character class (any of the characters within)
                                    # Anything between A and Z
                                    # Anything between a and z
                                    # Anything between 0 and 9
                                    # Any of: +/
 {2}                              # Repeats 2 times.
 ==                               # Literal ==
)                                 # Closes CG2
$                                 # Anchors to the end of the string.
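
To make the difference between a character class and a group concrete, here is a minimal Python sketch. Python's re module is used only for illustration (the question itself is tagged coldfusion), and the variable names are hypothetical.

```python
import re

# A character class matches exactly ONE character taken from the listed set.
char_class = re.compile(r"[data:image/png;base64,]")
# A non-capturing group matches the whole literal sequence, in order.
literal_group = re.compile(r"(?:data:image/png;base64,)")

header = "data:image/png;base64,"

class_match = char_class.match(header)
group_match = literal_group.match(header)

print(repr(class_match.group()))   # only the first character of the header
print(repr(group_match.group()))   # the complete header

# The class can never cover the full header in a single match.
print(char_class.fullmatch(header))  # None
```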

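However, if the header is required, drop the non-capturing group and the ? quantifier and keep the literal header directly after ^, outside the repeated group (inside it, the header would have to precede every four-character block):

^data:image/png;base64,([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$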
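
One detail worth checking is where the header sits relative to the repeated four-character group. The Python sketch below compares two placements on a short sample payload. The names header_outside and header_inside are made up for this comparison, and Python stands in for whatever regex engine you actually use.

```python
import re

BODY = (
    r"(?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)"
)

# Optional header matched once, before the repeated 4-character blocks.
header_outside = re.compile(r"^(?:data:image/png;base64,)?" + BODY + r"$")

# Optional header placed INSIDE the repeated group: it may then show up
# in front of any 4-character block, not only at the start.
header_inside = re.compile(
    r"^((?:data:image/png;base64,)?[A-Za-z0-9+/]{4})*"
    r"([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$"
)

good = "data:image/png;base64,iVBORw0KGgo="
misplaced = "iVBOdata:image/png;base64,Rw0KGgo="

for name, pattern in (("outside", header_outside), ("inside", header_inside)):
    print(name, bool(pattern.match(good)), bool(pattern.match(misplaced)))
```

With the header outside the repeated group, the string with a header in the middle is rejected. With the header inside the group, it is accepted.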

Answer 2

The regular expression

^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

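What the characters mean:

^ ... searches for a string starting at the beginning of a line or of the string buffer.

( ... ) ... defines a capturing (marking) group, used to back-reference the string found by the expression inside the parentheses, or to apply a multiplier as done here. When an expression is grouped only to apply a multiplier, it is usually better to use a non-capturing group, i.e. (?: ... ), where the question mark and colon immediately after the opening parenthesis make the group non-capturing.

[ ... ] ... defines a positive character class, meaning that for a positive match any one of the characters inside the square brackets must be found once. [^ ... ] would be a negative character class, meaning that any one character except those listed inside the square brackets must be found.

[A-Za-z0-9+/] ... the character can be an uppercase or lowercase ASCII letter, a digit, a plus sign or a slash.

{4} ... is a multiplier meaning the preceding expression or character exactly four times.

* ... is also a multiplier, meaning the preceding expression or character zero or more times.

| ... means OR.

$ ... means the end of a line, without matching the line terminator, or the end of the string buffer.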

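So the expression means:

Find a string starting at the beginning of a line or of the string buffer,
consisting of 0 or more substrings of exactly 4 characters each, every character being a letter, digit, plus sign or slash,
where the last substring before the end of the line or string buffer is
- also a string of 4 letters, digits, plus signs or slashes,
- OR a string of only 3 letters, digits, plus signs or slashes followed by an equals sign as the fourth character,
- OR a string of only 2 letters, digits, plus signs or slashes followed by two equals signs as the third and fourth characters.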
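
The three possible endings described above can be observed by encoding inputs of different lengths. This is a small Python sketch using the standard base64 module. The helper final_group is a hypothetical name introduced here, not part of the original answer.

```python
import base64
import re

BASE64_RE = re.compile(
    r"^([A-Za-z0-9+/]{4})*"
    r"([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$"
)


def final_group(raw: bytes) -> str:
    """Encode raw bytes and return the text captured by the last group."""
    encoded = base64.b64encode(raw).decode("ascii")
    match = BASE64_RE.match(encoded)
    return match.group(2)


print(final_group(b"abc"))  # input length divisible by 3: no padding
print(final_group(b"ab"))   # one byte short: one '=' as fourth character
print(final_group(b"a"))    # two bytes short: '==' as third and fourth
```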

To allow an optional header string at the beginning of the line or string buffer, the expression should be changed to:

^(?:data:image/png;base64,)?(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

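The question mark after the non-capturing group (?:data:image/png;base64,) means that the preceding expression (here just a fixed string) occurs zero times or once.

As you can see, I also changed the 2 capturing groups into 2 non-capturing groups by inserting ?: after each opening parenthesis.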
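
As a rough sketch of how this pattern could serve as a pre-check before conversion, the Python function below validates the string first and only then decodes it. The function name decode_png_data_uri and the choice to return None on failure are assumptions made for illustration. The original question concerns ColdFusion, where the same pattern would be passed to that platform's own regex functions.

```python
import base64
import binascii
import re

HEADER = "data:image/png;base64,"

# Same pattern as above, without ^ and $ because fullmatch() already
# requires the whole string to match (and rejects a trailing newline).
DATA_URI_RE = re.compile(
    r"(?:data:image/png;base64,)?"
    r"(?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)"
)


def decode_png_data_uri(text):
    """Return the decoded bytes, or None if the text is not well formed."""
    if DATA_URI_RE.fullmatch(text) is None:
        return None
    # Strip the optional header before handing the payload to the decoder.
    payload = text[len(HEADER):] if text.startswith(HEADER) else text
    try:
        return base64.b64decode(payload, validate=True)
    except binascii.Error:
        return None


print(decode_png_data_uri("data:image/png;base64,YWJj"))
print(decode_png_data_uri("data:image/png;base64,YWJ"))
```

The regex only checks the shape of the string. It cannot tell whether the decoded bytes are actually a valid PNG image.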

regex

coldfusion