How to resume RMI communication after a program shuts down and restarts while the other remains running in Java?

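I currently have a lot of code, and it's hard to boil it all down to an SSCCE, but I may try to do so later if necessary.

Anyway, here's the gist: I have two processes that communicate via RMI. It works. However, I want communication to be able to continue if the host process (JobViewer) exits and then returns, all within the lifetime of the client process (Job).

Currently, each time a Job starts I save its bound name to a file, and the JobViewer opens that file on startup. That part works well, and the correct bound name is recovered. However, I get a NotBoundException every time I try to resume communication with a Job that I know for a fact is still running when the JobViewer restarts.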

My JobViewer implements an interface that extends Remote, with the following methods:

public void registerClient(String bindedName, JobStateSummary jobSummary) throws RemoteException, NotBoundException;
public void giveJobStateSummary(JobStateSummary jobSummary) throws RemoteException;
public void signalEndOfClient(JobStateSummary jobSummary) throws RemoteException;

My Job also implements a different interface that extends Remote, with the following methods:

public JobStateSummary getJobStateSummary() throws RemoteException;
public void killRemoteJob() throws RemoteException;
public void stopRemoteJob() throws RemoteException;
public void resumeRemoteJob() throws RemoteException;

How can I achieve this? Here is some of my current code that initializes RMI, in case it helps...

JobViewer side:

private Registry _registry;
// Set up RMI
_registry = LocateRegistry.createRegistry(2002);
_registry.rebind("JOBVIEWER_SERVER", this);

Job side:

private NiceRemoteJobMonitor _server;

Registry registry = LocateRegistry.getRegistry(hostName, port);
registry.rebind(_bindedClientName, this);
Remote remoteServer = registry.lookup(masterName);

_server = (NiceRemoteJobMonitor)remoteServer;
_server.registerClient(_bindedClientName, _jobStateSummary);

I get a NotBoundException every time I try to resume communication with a Job that I know for fact is still running when the JobViewer restarts.

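That can only happen if the JobViewer didn't rebind itself when it started up. What you usually get when you use a stale stub, i.e. a stub whose remote object has exited, is a NoSuchObjectException. In that case you should reacquire the stub, i.e. redo the lookup().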
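A minimal sketch of "redo the lookup()", under stated assumptions: the JobMonitor interface, the MonitorHandle wrapper, and the in-process "restart" are all hypothetical and exist only to keep the example self-contained. In the real setup the registry would come from LocateRegistry.getRegistry(hostName, port). The handler catches RemoteException broadly, since a stale stub usually surfaces as NoSuchObjectException while an unreachable server shows up as a different subclass.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

class RelookupSketch {

    /** Hypothetical stand-in for the JobViewer's remote interface. */
    public interface JobMonitor extends Remote {
        String ping() throws RemoteException;
    }

    /** Hypothetical server object; the label tells the two "incarnations" apart. */
    static class JobViewer implements JobMonitor {
        private final String label;
        JobViewer(String label) { this.label = label; }
        @Override public String ping() { return label; }
    }

    /** Holds the current stub and redoes the lookup when a call on it fails. */
    static class MonitorHandle {
        private final Registry registry;
        private final String name;
        private JobMonitor stub;

        MonitorHandle(Registry registry, String name) throws Exception {
            this.registry = registry;
            this.name = name;
            this.stub = (JobMonitor) registry.lookup(name);
        }

        String ping() throws Exception {
            try {
                return stub.ping();
            } catch (RemoteException staleOrUnreachable) {
                // Reacquire the stub and retry once; a second failure propagates.
                stub = (JobMonitor) registry.lookup(name);
                return stub.ping();
            }
        }
    }

    /** Simulates a JobViewer restart inside one JVM and returns both ping results. */
    static String runDemo() throws Exception {
        // Demo-only assumption: force stubs to point at the loopback address.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        Registry registry = LocateRegistry.createRegistry(0); // 0 = any free port
        JobViewer first = new JobViewer("first");
        JobViewer second = new JobViewer("second");
        try {
            registry.rebind("JOBVIEWER_SERVER", UnicastRemoteObject.exportObject(first, 0));
            MonitorHandle handle = new MonitorHandle(registry, "JOBVIEWER_SERVER");
            String before = handle.ping();

            // "Restart": the old object goes away, a new one is exported and rebound.
            UnicastRemoteObject.unexportObject(first, true);
            registry.rebind("JOBVIEWER_SERVER", UnicastRemoteObject.exportObject(second, 0));

            String after = handle.ping(); // the old stub fails, so the lookup is redone
            return before + "," + after;
        } finally {
            quietUnexport(first);
            quietUnexport(second);
            quietUnexport(registry);
        }
    }

    private static void quietUnexport(Remote obj) {
        try {
            UnicastRemoteObject.unexportObject(obj, true);
        } catch (NoSuchObjectException alreadyGone) {
            // Nothing left to clean up for this object.
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo());
    }
}
```

The retry only helps once the restarted server has rebound itself under the same name; until then the second lookup() fails as well.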

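Why is the client binding itself to a Registry? If you want to register a callback, just pass this to the registerClient() method instead of the bind name, and adjust its signature accordingly (using the client's remote interface as the parameter type). There is no need for the server to look the client up in a Registry. There is no need for a client Registry at all.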
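The callback approach might look roughly like this. It is a sketch, not the asker's real code: RemoteJob and JobMonitor are hypothetical, simplified versions of the two interfaces (the summary is reduced to a String), and both sides run in one JVM just so the example is runnable on its own. The point it illustrates is that the Job exports itself and is passed directly as an argument, so only the JobViewer is ever bound in a Registry.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CallbackSketch {

    /** Hypothetical, simplified version of the Job's remote interface. */
    public interface RemoteJob extends Remote {
        String getJobStateSummary() throws RemoteException;
    }

    /** Hypothetical server interface: the parameter is the client's remote interface, not a bind name. */
    public interface JobMonitor extends Remote {
        void registerClient(RemoteJob job) throws RemoteException;
    }

    static class JobViewer implements JobMonitor {
        final List<RemoteJob> jobs = new CopyOnWriteArrayList<>();
        @Override public void registerClient(RemoteJob job) { jobs.add(job); }
    }

    static class Job implements RemoteJob {
        @Override public String getJobStateSummary() { return "RUNNING"; }
    }

    /** Registers a Job as a callback and has the viewer call back through it. */
    static String runDemo() throws Exception {
        // Demo-only assumption: force stubs to point at the loopback address.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        Registry registry = LocateRegistry.createRegistry(0); // 0 = any free port
        JobViewer viewer = new JobViewer();
        Job job = new Job();
        try {
            // Server side: only the JobViewer is bound.
            registry.rebind("JOBVIEWER_SERVER", UnicastRemoteObject.exportObject(viewer, 0));

            // Client side: export the Job, but do not bind it anywhere.
            UnicastRemoteObject.exportObject(job, 0);
            JobMonitor server = (JobMonitor) registry.lookup("JOBVIEWER_SERVER");
            server.registerClient(job); // RMI sends the Job's stub in place of the object

            // Server side again: call the Job through the stub it received.
            return viewer.jobs.get(0).getJobStateSummary();
        } finally {
            quietUnexport(job);
            quietUnexport(viewer);
            quietUnexport(registry);
        }
    }

    private static void quietUnexport(Remote obj) {
        try {
            UnicastRemoteObject.unexportObject(obj, true);
        } catch (NoSuchObjectException alreadyGone) {
            // Nothing left to clean up for this object.
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo());
    }
}
```

With this shape a restarted JobViewer has no stubs for running Jobs until each Job calls registerClient() again, so the Job still needs some way to notice the restart.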

My solution was to have the Job ping the JobViewer every so often:

  while (true) {

    try {
      _server.ping();
      // If control reaches here we were able to successfully ping the job monitor.
    } catch (Exception e) {

      System.out.println("Job lost contact with the job monitor at " + new Date().toString() + " ...");

      // If control reaches here we were unable to ping the job monitor. Now we loop until it presumably comes back to life.
      boolean foundServer = false;
      while (!foundServer) {

        try {
          // Attempt to register again.
          Registry registry = LocateRegistry.getRegistry(_hostName, _port);
          registry.rebind(_bindedClientName, NiceSupervisor.this);
          Remote remoteServer = registry.lookup(_masterName);
          _server = (NiceRemoteJobMonitor)remoteServer;
          _server.registerClient(_bindedClientName, _jobStateSummary);

          // Ping the server for good measure.
          _server.ping();

          System.out.println("Job reconnected with the job monitor at " + new Date().toString() + " ...");

          // If control reaches here we were able to reconnect to the job monitor and ping it again.
          foundServer = true;
        } catch (Exception x) {
          System.out.println("Job still cannot contact the job monitor at " + new Date().toString() + " ...");
        }

        // Sleep for 1 minute before we try to locate the registry again.
        try {
          Thread.sleep(PING_WAIT_TIME);
        } catch (InterruptedException x) {
          // Ignored on purpose: keep retrying.
        }

      } // End of endless loop until we find the server again.

    }

    // Sleep for 1 minute after we ping the server before we try again.
    try {
      Thread.sleep(PING_WAIT_TIME);
    } catch (InterruptedException e) {
      // Ignored on purpose: this loop never exits.
    }

  }  // End of endless loop that we never exit.
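Since the loop above never exits and swallows interrupts, one possible variation is to run the same ping-then-reconnect logic on a scheduler that can be shut down. This is only a sketch under assumptions: the Link interface is hypothetical and stands for "_server.ping()" plus the re-registration steps, and the watchdog assumes the initial registration has already succeeded.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PingWatchdog {

    /** Hypothetical abstraction over the ping call and the re-registration steps. */
    public interface Link {
        void ping() throws Exception;
        void reconnect() throws Exception;
    }

    private final Link link;
    private boolean connected = true; // assumes the first registration already happened

    PingWatchdog(Link link) {
        this.link = link;
    }

    /** One round: reconnect first if contact was lost, then ping. Returns the new state. */
    synchronized boolean tick() {
        try {
            if (!connected) {
                link.reconnect();
            }
            link.ping();
            connected = true;
        } catch (Exception e) {
            // Any failure means "not connected"; the next tick tries again.
            connected = false;
        }
        return connected;
    }

    /** Runs tick() every periodMillis on a daemon thread; call shutdownNow() on the result to stop. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "job-ping");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(this::tick, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        // A link that always succeeds, just to show the call shape.
        Link alwaysUp = new Link() {
            @Override public void ping() { }
            @Override public void reconnect() { }
        };
        System.out.println(new PingWatchdog(alwaysUp).tick());
    }
}
```

Because tick() catches every exception itself, the scheduled task is never cancelled by a failed ping, and the daemon thread does not keep the Job's JVM alive on its own.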