Nsight Compute can't profile Waveglow (PyTorch application)
I tried to profile https://github.com/NVIDIA/waveglow with this command:
nv-nsight-cu-cli --export ./nsight_output ~/.virtualenvs/waveglow/bin/python3 inference.py -f <(ls mel_spectrograms/*.pt) -w waveglow_256channels.pt -o . --is_fp16 -s 0.6
The Python command comes from the instructions at https://github.com/NVIDIA/waveglow#generate-audio-with-our-pre-existing-model.
It works with Nsight Systems, but not with Nsight Compute.
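The profiling never finishes and keeps printing this log, so I pressed Ctrl+C.
Also, it only profiles one kernel, but my application has more kernels (I checked with Nsight Systems).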
...
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 286: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 287: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 288: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 289: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 290: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 291: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 292: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 293: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 294: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 295: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 296: 0%....50%...^C
==PROF== Received signal, trying to shutdown target application
- 43 passes
==ERROR== Failed to profile kernel "weight_norm_fwd_first_dim_ker..." in process
==ERROR== An error occurred while trying to profile.
==ERROR== An error occurred while trying to profile
==PROF== Report: nsight_compute_result.nsight-cuprof-report
OS: CentOS Linux 7, Nsight Compute (2019.3.1, Build 26317742), GPU: Tesla V100-PCIE-32GB
How can I fix this?
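I don't think anything is wrong here; the tool is behaving as expected. It is not profiling only one kernel: according to your log output it had already profiled 296 kernel launches (they all seem to come from the same kernel function). By default every kernel launch is profiled, and each launch is replayed for many passes (48 per kernel in your log) to collect the metrics, so an application that launches many kernels can take a very long time to finish.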
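To see that the profiled kernels are many launches of one kernel function rather than a single kernel, the `==PROF== Profiling "..."` lines of the CLI output can be tallied per kernel name. A minimal sketch, assuming the console output was saved to a file; here a temporary file stands in for it, filled with three lines copied from the log above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Temporary stand-in for a saved Nsight Compute console log (assumption).
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 293: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 294: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 295: 0%....50%....100% - 48 passes
EOF

# Pull the (possibly truncated) kernel name out of each "Profiling" line,
# then count how many launches were profiled for each name.
counts=$(sed -n 's/^==PROF== Profiling "\([^"]*\)".*/\1/p' "$log" | sort | uniq -c)
echo "$counts"
```

The CLI truncates long kernel names (the trailing `...`), so different kernels that share a long prefix would be counted together by this sketch.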
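You can control the number or type of kernels that are profiled with, for example, the --launch-count or --kernel-regex options. You can also control which metrics are collected for each kernel with --metrics and --section; collecting fewer metrics reduces the tool's overhead.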
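As an illustration, the command from the question could be restricted as follows. The filter values (the `weight_norm` regex, 10 launches, the `SpeedOfLight` section) are assumptions chosen only for the example; substitute a regex for the kernel you actually care about, and run `nv-nsight-cu-cli --list-sections` to see which sections your version provides. If the CLI is not on `PATH`, this sketch only prints the command it would run:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative filter values (assumptions, not taken from the question).
ncu_args=(
  --export ./nsight_output
  --kernel-regex weight_norm   # profile only kernels whose name matches this regex
  --launch-count 10            # stop after 10 profiled kernel launches
  --section SpeedOfLight       # collect one section instead of the default set
)

python_bin=~/.virtualenvs/waveglow/bin/python3

if command -v nv-nsight-cu-cli >/dev/null 2>&1; then
  # Same application command as in the question.
  nv-nsight-cu-cli "${ncu_args[@]}" "$python_bin" inference.py \
    -f <(ls mel_spectrograms/*.pt) -w waveglow_256channels.pt -o . --is_fp16 -s 0.6
else
  # Dry run: show the command that would be executed.
  echo "nv-nsight-cu-cli ${ncu_args[*]} $python_bin inference.py -f <(ls mel_spectrograms/*.pt) -w waveglow_256channels.pt -o . --is_fp16 -s 0.6"
fi
```

Combining a kernel filter with a launch limit keeps the number of replayed launches small, and a single section reduces the passes needed per launch.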
For more command-line options, see https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#command-line-options