Most simplified form of the following regex / Extracting all values from nvidia-smi output

Question

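I'm trying to parse a very large text string containing nvidia-smi output in Python, but I'd really rather spend my time analyzing the data than working on my regex skills. I came up with the regex below, but it takes a very long time on some lines (possibly because of variations in the input data on those lines), and I suspect the pattern itself is also computationally expensive.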

extracted_line1 = r'[=]*[+][=]*[+][=]*\|\n\|(\s+(.*?)\|)+\n\|(\s+(.*?)\|)(\s+(.*?)\|)(\s+(.*?)\|)\n\|'

This pattern matches the third row of the table.

That is, this one ⬇️

 ===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:04:00.0 Off |                  N/A |
| 27%   20C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |

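It works for most lines but randomly hangs on some of them. What is a more simplified version of this regex? Or, perhaps the better question: what is the best way to get every value in this table for each row (the corresponding metrics for each GPU)?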
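A likely (not confirmed) reason for the random hangs: in `(\s+(.*?)\|)+`, `\s` also matches newlines and `.` also matches `|`, so one repetition can swallow several columns or run onto the next line, and the same text can be split in a very large number of ways. When a block does not fit the rest of the pattern, the engine may try many of those splits before giving up (catastrophic backtracking). Below is a minimal sketch of a tighter variant of the same pattern, assuming the column layout shown in the sample: each column becomes `[^|\n]*`, so it can only be split one way. `SAMPLE`, `TIGHT` and `cols` are illustrative names, not part of the original code.

```python
import re

# First GPU block from the question's nvidia-smi output.
SAMPLE = """\
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:04:00.0 Off |                  N/A |
| 27%   20C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
"""

# Same idea as the question's pattern, but each column is [^|\n]* (anything
# except a pipe or a newline) instead of \s+(.*?), so a column cannot swallow
# a neighboring column or continue onto the next line.
TIGHT = re.compile(
    r"^\|=+(?:\+=+)+\|\n"                      # the |===+===+===| separator
    r"\|[^\n]*\n"                              # first GPU line, skipped
    r"\|([^|\n]*)\|([^|\n]*)\|([^|\n]*)\|$",   # second GPU line, 3 columns
    re.MULTILINE,
)

match = TIGHT.search(SAMPLE)
cols = [col.strip() for col in match.groups()]
print(cols)
# ['27%   20C    P8     6W / 180W', '2MiB /  8119MiB', '0%   E. Process']
```

Like the original, this only finds the GPU directly under the `===` line; the later GPUs follow a `+---` line instead.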

Here is the truncated input string:

... bunch of text
nvidia-smi:
Tue Jun  8 15:00:02 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:04:00.0 Off |                  N/A |
| 27%   20C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 00000000:05:00.0 Off |                  N/A |
| 27%   23C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

... bunch of text

P.S. I'm trying to extract the following values:

        gpu_index = [processed result of regex output here]
        gpu_model_name = [processed result of regex output here]
        persistance_mode = [processed result of regex output here]
        bus_id = [processed result of regex output here]
        display_active = [processed result of regex output here]
        volatile_ecc = [processed result of regex output here]
        fan = [processed result of regex output here]
        temperature = [processed result of regex output here]
        perf = [processed result of regex output here]
        power_usage =  [processed result of regex output here]
        max_power = [processed result of regex output here]
        memory_usage = [processed result of regex output here] 
        available_mem = [processed result of regex output here] 
        gpu_utilization = [processed result of regex output here]
        compute_mode = [processed result of regex output here]
        multiple_instance_gpu_mode = [processed result of regex output here]

Answer 1

I suggest using a different pattern, one that is much less demanding on your machine's resources.

Pattern

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB)\s+\|\s+(\d+\%)\s+(.*?)\s+\|
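A minimal usage sketch, assuming the nvidia-smi text is already in a Python string: compile the pattern once and call `re.findall`. Because the pattern has several groups, `findall` returns one tuple of the nine captured values per GPU. The `sample` string is copied from the question; `PATTERN` and `rows` are illustrative names.

```python
import re

# The answer's pattern, unchanged; compiled once so it can be reused.
PATTERN = re.compile(
    r"(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB)\s+\|\s+(\d+\%)\s+(.*?)\s+\|"
)

# GPU rows copied from the nvidia-smi output in the question.
sample = """\
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:04:00.0 Off |                  N/A |
| 27%   20C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 00000000:05:00.0 Off |                  N/A |
| 27%   23C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
"""

# One tuple of nine captured groups per GPU, in table order.
rows = PATTERN.findall(sample)
for row in rows:
    print(row)
# ('27%', '20C', 'P8', '6W', '180W', '2MiB', '8119MiB', '0%', 'E. Process')
# ('27%', '23C', 'P8', '6W', '180W', '2MiB', '8119MiB', '0%', 'E. Process')
```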

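First, I dropped the whole leading part of the pattern that looks for = and + characters: the regex engine will search for whatever you tell it to find, so there is no need for these 'helper' anchors.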

Next, I noticed that you only need word characters \w, digits \d and whitespace \s, so the whole pattern is very easy to write.

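Explanation

I build up the full pattern one matching group at a time until I reach the final result. Note that each explanation applies only to the last matching group, i.e. the last regex expression in parentheses.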

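(\d+%) will match any number of digits followed by %

(\d+%)\s+(\d+C) will match any number of digits after an unknown amount of whitespace, followed by the letter C

(\d+%)\s+(\d+C)\s+(\w\d) will match any single word character (a letter, digit or underscore) followed by a single digit

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W) will match any number of digits after an unknown amount of whitespace, followed by the letter W

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W) Knowing there should be some whitespace, then a /, then more whitespace, this expression will match any number of digits followed by W

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB) Knowing there should be some whitespace, then a |, then more whitespace, this expression will match any number of digits followed by MiB

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB) Knowing there should be some whitespace, then a /, then more whitespace, this expression will match any number of digits followed by MiB

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB)\s+\|\s+(\d+\%) Knowing there should be some whitespace, then a |, then more whitespace, this expression will match any number of digits followed by %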

Final step

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB)\s+\|\s+(\d+\%)\s+(.*?)\s+\| Knowing there should be some whitespace, this expression will lazily match any characters until it reaches some whitespace followed by a |

This finally covers the following variables:

gpu_index = not implemented
gpu_model_name = not implemented
persistance_mode = not implemented
bus_id = not implemented
display_active = not implemented
volatile_ecc = not implemented
fan = (\d+%)
temperature = (\d+C)
perf = (\w\d)
power_usage =  (\d+W)
max_power = (\d+W)
memory_usage = (\d+MiB)
available_mem = (\d+MiB)
gpu_utilization = (\d+\%)
compute_mode = (.*?)
multiple_instance_gpu_mode = not implemented
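For the fields marked "not implemented", one possible extension (a sketch, not part of the answer as posted) uses named groups across all three table lines of each GPU, so `groupdict()` returns every variable from the question at once. It assumes the exact column layout of the 460.80 output shown above; `GPU_BLOCK` and `gpus` are hypothetical names, and `(P\d+)` replaces `(\w\d)` so that a two-digit performance state such as P12 would also match. Note that the second MiB figure in the Memory-Usage column is the total memory, even though the question names it `available_mem`.

```python
import re

# GPU rows copied from the question's nvidia-smi output (driver 460.80 layout).
sample = """\
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:04:00.0 Off |                  N/A |
| 27%   20C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 00000000:05:00.0 Off |                  N/A |
| 27%   23C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
"""

# Hypothetical pattern: one named group per variable in the question.
# [ ] matches a literal space (plain whitespace is ignored in VERBOSE mode),
# and [^|\n] keeps every value on its own line, between its own pipes.
GPU_BLOCK = re.compile(
    r"""
    # line 1: index, name, persistence mode, bus id, display active, ECC
    ^\|[ ]+(?P<gpu_index>\d+)[ ]+(?P<gpu_model_name>[^|\n]+?)[ ]+
    (?P<persistance_mode>On|Off)[ ]*
    \|[ ]*(?P<bus_id>[0-9A-Fa-f:.]+)[ ]+(?P<display_active>On|Off)[ ]*
    \|[ ]*(?P<volatile_ecc>[^|\n]*?)[ ]*\|\n
    # line 2: fan, temperature, perf state, power draw and cap
    \|[ ]*(?P<fan>\d+%)[ ]+(?P<temperature>\d+C)[ ]+(?P<perf>P\d+)[ ]+
    (?P<power_usage>\d+W)[ ]*/[ ]*(?P<max_power>\d+W)[ ]*
    # memory used and total, GPU utilization, compute mode
    \|[ ]*(?P<memory_usage>\d+MiB)[ ]*/[ ]*(?P<available_mem>\d+MiB)[ ]*
    \|[ ]*(?P<gpu_utilization>\d+%)[ ]+(?P<compute_mode>[^|\n]*?)[ ]*\|\n
    # line 3: only the last column (MIG mode) holds a value
    \|[ ]*\|[ ]*\|[ ]*(?P<multiple_instance_gpu_mode>[^|\n]*?)[ ]*\|
    """,
    re.VERBOSE | re.MULTILINE,
)

# One dict per GPU, keyed by the variable names from the question.
gpus = [m.groupdict() for m in GPU_BLOCK.finditer(sample)]
for gpu in gpus:
    print(gpu["gpu_index"], gpu["gpu_model_name"], gpu["bus_id"], gpu["temperature"])
# 0 GeForce GTX 1080 00000000:04:00.0 20C
# 1 GeForce GTX 1080 00000000:05:00.0 23C
```

Because every column is bounded by `[ ]` and `[^|\n]` rather than `\s` and `.`, a block that does not fit this layout fails on its own line instead of letting the match spill into the rest of the text.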
