How to check if a folder exists at a regular interval in Java

Question

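My goal is to check whether a Hadoop path exists and, if it does, download some files. I want to check every five minutes whether the file exists at the Hadoop path. Currently my code checks whether the file exists, but only once rather than at a regular interval.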

public boolean checkPathExistance(final org.apache.hadoop.fs.Path hdfsAbsolutePath)
{
    final Configuration configuration = getHdfsConfiguration();
    try
    {
        final FileSystem fileSystems = hdfsAbsolutePath.getFileSystem(configuration);
        if (fileSystems.exists(hdfsAbsolutePath))
        {
            return true;
        }
    }
    catch (final IOException e)
    {
        e.printStackTrace();
    }
    return false;
}

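The requirement is that the checkPathExistance method is called every five minutes to check whether the file exists. When it returns true, the file should be downloaded.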

public void download(final String hdfs, final Path outputPath)
{
    final org.apache.hadoop.fs.Path hdfsAbsolutePath = getHdfsFile(hdfsLocalPath).getPath();
    logger.info("path check {}", hdfsAbsolutePath.getName());
    final boolean isPathExist =  checkPathExistance(hdfsAbsolutePath);
    downloadFromHDFS(hdfsAbsolutePath, outputPath);
}

Could I get some help with this?

Answer 1

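For copying a file (rather than a folder, if I understand your question correctly), you can simply use the copyToLocalFile method of FileSystem. It takes a boolean that specifies whether the source file should be deleted, followed by the input (HDFS) path and the output (local) path.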

As for periodically checking whether the file exists in HDFS, you can use a ScheduledExecutorService (see the Java 8 docs) and schedule the function you want to run every 5 minutes.

The following program takes two arguments: the path of the input file in HDFS and the path of the local output file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RegularFileCheck
{
    // Returns true if the given path exists in the file system described by conf.
    public static boolean checkPathExistence(Path inputPath, Configuration conf) throws IOException
    {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(inputPath);
    }

    public static void download(Path inputPath, Path outputPath, Configuration conf) throws IOException
    {
        FileSystem fs = FileSystem.get(conf);
        fs.copyToLocalFile(false, inputPath, outputPath);   // don't delete the source input file
        System.out.println("File copied!");
    }

    public static void main(String[] args)
    {
        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);

        Configuration conf = new Configuration();

        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);

        Runnable task = () ->
        {
            System.out.println("New Timer!");

            try
            {
                if(checkPathExistence(inputPath, conf))
                    download(inputPath, outputPath, conf);
            }
            catch (IOException e)
            {
                e.printStackTrace();
            }
        };

        executor.scheduleWithFixedDelay(task, 0, 5, TimeUnit.MINUTES);
    }
}
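
A side note on the program above: per the ScheduledExecutorService documentation, if any execution of a periodic task throws an exception, subsequent executions are suppressed. The task above only catches IOException, so an unchecked exception would silently end the polling. A minimal sketch of a guard against that, where SafeTask and wrap are hypothetical names chosen for illustration and not part of any library:

```java
final class SafeTask
{
    // Wraps a task so that no unchecked exception escapes to the scheduler.
    static Runnable wrap(Runnable task)
    {
        return () ->
        {
            try
            {
                task.run();
            }
            catch (RuntimeException e)
            {
                // Report and swallow, so the next scheduled run still happens.
                e.printStackTrace();
            }
        };
    }
}
```

With this in place, the scheduling call could become executor.scheduleWithFixedDelay(SafeTask.wrap(task), 0, 5, TimeUnit.MINUTES). Whether swallowing the exception is the right policy depends on your use case.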

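The console output is of course continuous (test.txt is the file stored in HDFS, and test1.txt is the local copy). If you want to stop re-executing once the file has been found and copied, or stop checking for the file after a certain amount of time, you can modify the code above accordingly.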
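
For the time-limit case, one possible approach is to schedule a second, one-shot task that cancels the periodic check. This is only a sketch: TimeLimitedPolling, start, and the limit parameter are hypothetical names introduced here, and the task passed in would be the Runnable from the program above.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class TimeLimitedPolling
{
    // Runs task repeatedly with a fixed delay and stops it once limit has elapsed.
    // Returns the handle of the periodic task so callers can inspect or cancel it.
    static ScheduledFuture<?> start(ScheduledExecutorService executor, Runnable task,
                                    long delay, long limit, TimeUnit unit)
    {
        ScheduledFuture<?> handle = executor.scheduleWithFixedDelay(task, 0, delay, unit);

        // One-shot task: cancel the periodic check and release the worker thread.
        executor.schedule(() ->
        {
            handle.cancel(false);   // let a run that is in progress finish
            executor.shutdown();
        }, limit, unit);

        return handle;
    }
}
```

For example, TimeLimitedPolling.start(executor, task, 5, 60, TimeUnit.MINUTES) would check every 5 minutes and give up after an hour.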

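To stop searching and copying once the file is there, replace the task in the RegularFileCheck program with the following snippet: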

Runnable task = () ->
{
    System.out.println("New Timer!");

    try
    {
        // Requires: import java.io.File;
        // Exit as soon as the local copy exists.
        if(new File(String.valueOf(outputPath)).exists())
            System.exit(0);
        else if(checkPathExistence(inputPath, conf))
            download(inputPath, outputPath, conf);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
};

The program will then stop after the file has been copied.
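
If you would rather not call System.exit(0), which terminates the whole JVM, one alternative is to shut down the executor once the work is done. The following is a rough sketch under that assumption: PollUntilDone, start, and attempt are hypothetical names, and attempt is expected to return true once the file has been copied.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class PollUntilDone
{
    // Calls attempt immediately and then after every delay, until it returns true.
    // Once it does, the executor is shut down and no further runs are scheduled.
    static void start(ScheduledExecutorService executor, BooleanSupplier attempt,
                      long delay, TimeUnit unit)
    {
        executor.scheduleWithFixedDelay(() ->
        {
            if (attempt.getAsBoolean())
                executor.shutdown();
        }, 0, delay, unit);
    }
}
```

In the program above, attempt could be a lambda that returns false when checkPathExistence is false, and otherwise calls download and returns true, catching IOException inside the lambda and returning false so that the next run retries.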
