Bash: how to send my own custom function through ssh

Question

My goal is to do the following:

1) Check how much memory each GPU on a specific server is using. I did this with (nvidia-smi --query-gpu=memory.free --format=csv).

2) Find the GPU with the most free memory. I did this with my_cmd(). It works on the remote server I am currently logged into.

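3) If the largest free memory on the remote server I am logged into is less than 1000 MiB, SSH into each of the other GPU servers in the cluster to find the largest free memory available. These servers are listed in tocheck.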

My current problem:

The code below works when scriptuse is set to something like a cd command.

The code below fails when scriptuse is set to my_cmd. It gives me the error:

bash: my_cmd: command not found

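I think there is more than one problem here. First, I don't think I am passing my_cmd to the ssh command correctly. Second, when I use my_cmd, I don't think I am actually connecting to the other servers.

Can anyone point out what is wrong and how to fix it?

The full bash script is below.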

#/bin/bash

#

my_cmd()
{
max_idx=0
max_mem=0
idx=0
{
  read _;                         # discard first line (header)
  while read -r mem _; do         # for each subsequent line, read first word into mem
    if (( mem > max_mem )); then  # compare against maximum mem value seen
      max_mem=$mem                # ...if greater, then update both that max value
      max_idx=$idx                # ...and our stored index value.
    fi
    ((++idx))
  done
} < <(nvidia-smi --query-gpu=memory.free --format=csv)
echo "Maximum memory seen is $max_mem, at processor $idx"
}

tocheck=('4' '5' '6' '7' '8')  #The GPUs to check
it1=1

#scriptuse="my_cmd" 
scriptuse= "cd ~/spatial; pwd; echo $gpuval"

while [ $it1 -lt ${#tocheck[@]} ] ; do #While we still don't have enough free memory
        echo $it1 
        gpuval=${tocheck[$it1]}
        ssh gpu${gpuval} "${scriptuse}"
        it1=$[it1+1]
done

Edit

Thank you very much for the help, but my problem is not solved yet. I did the following:

1) Removed my_cmd from my bash script. It now looks like this:

#/bin/bash

#

tocheck=('4' '5' '6' '7' '8')  #The GPUs to check
it1=1

scriptuse= "cd ~/spatial; echo $gpuval"

while [ $it1 -lt ${#tocheck[@]} ] ; do #While we still don't have enough free memory
        echo $it1 
        gpuval=${tocheck[$it1]}
        ssh gpu${gpuval} "${scriptuse}" /my_script.sh
        it1=$[it1+1]
done

2) Created a separate bash script called my_script.sh, which contains the body of my_cmd:

#/bin/bash

#
max_idx=0
max_mem=0
idx=0
{
  read _;                         # discard first line (header)
  while read -r mem _; do         # for each subsequent line, read first word into mem
    if (( mem > max_mem )); then  # compare against maximum mem value seen
      max_mem=$mem                # ...if greater, then update both that max value
      max_idx=$idx                # ...and our stored index value.
    fi
    ((++idx))
  done
} < <(nvidia-smi --query-gpu=memory.free --format=csv)
echo "Maximum memory seen is $max_mem, at processor $idx"

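3) Ran chmod to make sure both files are executable.

4) Made sure both files exist on all GPU servers in the cluster (they share common storage).

5) Ran ./test_run.sh, which is the bash script from step 1.

I get the error: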

./test_run.sh: line 8: cd ~/spatial; echo : No such file or directory
1
bash: /my_script.sh: No such file or directory
2
bash: /my_script.sh: No such file or directory
3
bash: /my_script.sh: No such file or directory
4
bash: /my_script.sh: No such file or directory

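Edit: final solution

Thanks to the accepted answer below and the discussion in the comments, this is what finally worked:

1) Keep my_script.sh the same as in the previous edit.

2) The file test_run should look like this: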

#/bin/bash

tocheck=('4' '5' '6' '7' '8')  #The GPUs to check
it1=1

while [ $it1 -lt ${#tocheck[@]} ] ; do #While we still don't have enough free memory
        echo $it1 
        gpuval=${tocheck[$it1]}
        ssh gpu${gpuval} ~/spatial/my_script.sh
        it1=$[it1+1]
done

I think this works because all the GPU servers in the cluster share common storage, so they can all access /user/spatial.
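For reference, here is a minimal sketch of the parsing logic from my_script.sh, wrapped in a function that reads from standard input so it can be tried without a GPU. The function name find_max_free and the sample numbers are made up for illustration. Note that the script above prints $idx (the loop counter after the last line) in its final message; the sketch prints max_idx instead, which appears to be the intent.

```bash
#!/bin/bash
# Reads nvidia-smi style CSV on stdin (one header line, then lines such as
# "1234 MiB") and prints "<index> <free MiB>" for the GPU with the most free memory.
# find_max_free is a hypothetical name, not part of the original script.
find_max_free() {
  local max_idx=0 max_mem=0 idx=0 mem
  read -r _                        # discard the header line
  while read -r mem _; do          # first word of each line is the number
    if (( mem > max_mem )); then   # remember the largest value seen so far
      max_mem=$mem
      max_idx=$idx                 # ...and the index it was seen at
    fi
    ((++idx))
  done
  echo "$max_idx $max_mem"
}

# Canned sample input (made-up numbers) so the sketch runs anywhere.
sample=$'memory.free [MiB]\n512 MiB\n8123 MiB\n300 MiB'
best=$(find_max_free <<< "$sample")
echo "Best GPU (index, free MiB): $best"

# On a real GPU server the input would come from nvidia-smi instead:
#   find_max_free < <(nvidia-smi --query-gpu=memory.free --format=csv)
```

Reading from stdin rather than calling nvidia-smi inside the function keeps the parsing testable on machines without a GPU.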

Answer 1

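The environment your script runs in (your shell) is completely unrelated to the environment on the remote host (the remote shell). If you define a function my_cmd in your shell, it is not transmitted over the network to the remote host's shell.

Try a simpler example: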

$ foo() { echo foo; }
$ foo
foo
$ ssh remote-host foo
bash: foo: command not found

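That is simply not how SSH, Bash, and Linux/POSIX are designed. ssh does update certain parts of the remote environment (see man ssh for details), but this is limited to certain environment variables, not functions.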

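Notably, the remote shell might not even be the same kind of shell as yours (for example, yours could be Bash while the remote shell is Zsh), so it is not generally possible to transfer shell functions across ssh.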
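That said, when both ends are known to run Bash, a common workaround is to send the function's source text as part of the command string. The following is a minimal sketch, assuming a hypothetical function foo and a placeholder host name remote-host; bash -c is used as a local stand-in for the remote shell so the example runs without a network.

```bash
#!/bin/bash
# Hypothetical function, used only for illustration.
foo() { echo "foo ran with arg: $1"; }

# declare -f prints the function's source text. Prepending it to the command
# string makes the receiving shell define the function before calling it.
payload="$(declare -f foo); foo hello"

# Local stand-in for the remote shell:
result=$(bash -c "$payload")
echo "$result"

# Over ssh the same payload would be sent like this (remote-host is a
# placeholder, and the remote shell is assumed to be Bash-compatible):
#   ssh remote-host "$payload"
```

This only carries the function text itself. Anything the function depends on (other functions, variables, files, programs such as nvidia-smi) still has to exist on the remote side.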

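A simpler and more reliable option is to create a shell script (rather than a function) that you intend to run on the remote shell, and to make sure that script exists on the remote machine. For example: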

# Copy the script to the remote host's /tmp directory
scp my_cmd.sh remote-host:/tmp
# Invoke the script on the remote host
ssh remote-host /tmp/my_cmd.sh
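If copying the file first is inconvenient, another possibility is to feed a local script to a remote bash on standard input with bash -s. The sketch below is illustrative only: the script contents and temporary path are made up, and a local bash stands in for the remote one so that it runs anywhere.

```bash
#!/bin/bash
# Create a throwaway script (a stand-in for my_cmd.sh) in a temp directory.
tmpdir=$(mktemp -d)
script="$tmpdir/my_cmd.sh"
printf '%s\n' 'echo "script ran with $# argument(s)"' > "$script"

# bash -s reads commands from stdin; words after -- become the script's
# positional parameters.
out=$(bash -s -- one two < "$script")
echo "$out"

rm -rf "$tmpdir"

# Over ssh (remote-host is a placeholder, and bash is assumed to be
# installed on the remote side):
#   ssh remote-host 'bash -s' < my_cmd.sh
```

Because the script arrives on stdin, this form is a poor fit for scripts that themselves need to read from standard input.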

Edit:

./test_run.sh: line 8: cd ~/spatial; echo : No such file or directory

Are you sure ~/spatial exists on the remote host?

bash: /my_script.sh: No such file or directory

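Are you sure /my_script.sh exists on the remote host?

Again, your remote host is a completely different environment. Just because a file or directory exists on your local machine does not mean it exists on the remote host, unless you put it there.

Try ssh [remote-host] 'ls ~' and ssh [remote-host] 'ls /'; I bet you will see that the directory and the file are not there.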
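One more thing that may be worth checking: the first message is prefixed with ./test_run.sh: line 8, which suggests it was printed by the local script rather than by the remote shell. A likely cause is the space after = in the scriptuse assignment. With that space, Bash treats the quoted string as the name of a command to run, with scriptuse set to the empty string in that command's environment. In addition, $gpuval inside double quotes is expanded at assignment time, before the loop has set it, which would explain the empty value after echo in the message. A small sketch of the difference, using harmless stand-in commands:

```bash
#!/bin/bash
# Broken form: the space after '=' makes Bash try to run a command literally
# named "cd ~/spatial; echo " (with scriptuse empty in its environment).
# The lookup fails; 2>/dev/null hides the error message here.
broken_status=0
scriptuse= "cd ~/spatial; echo " 2>/dev/null || broken_status=$?
echo "broken form exit status: $broken_status"
echo "scriptuse after broken form: '${scriptuse-<unset>}'"

# Fixed form: no space around '='. Single quotes would additionally keep
# any $variables from expanding until the string is actually executed.
scriptuse="echo hi"
fixed_output=$(bash -c "$scriptuse")   # local stand-in for: ssh host "$scriptuse"
echo "fixed form output: $fixed_output"
```

Exit status 127 is what Bash reports when the command it was asked to run cannot be found.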
