WORM principle of Hadoop: what exactly does it mean?

Hadoop works on the WORM (Write Once, Read Many) principle. So why does Hue let me edit a file? I created a file, employee.txt, in HDFS (CDH). My impression was that employee.txt should not be editable, per the WORM principle. But when I open the file with Hue -> Edit file, I can edit the existing content. So what is the idea behind the WORM principle?

That is because Hue does the following:

  1. Writes the content to a temporary file
  2. Deletes the old file
  3. Renames the temporary file to the original file name

This works around the WORM principle: no existing file is ever modified in place.
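The three steps above can be sketched against a local filesystem (a simplified analogy using Python's `os` module; the real Hue code goes through its HDFS client, and the `._hue_new` staging suffix is taken from that code):

```python
import os

def overwrite_via_staging(path, new_content):
    """Mimic Hue's overwrite trick: write to a staging file,
    delete the original, then rename the staging file into place."""
    staging = path + "._hue_new"      # staging suffix, as in the Hue code
    with open(staging, "w") as f:     # 1. write content to a temporary file
        f.write(new_content)
    os.remove(path)                   # 2. delete the old file
    os.rename(staging, path)          # 3. rename the temp file to the original name

# The original file is never modified in place, so each individual
# file is still "written once"; only the directory entry changes.
with open("employee.txt", "w") as f:
    f.write("old data")
overwrite_via_staging("employee.txt", "new data")
print(open("employee.txt").read())  # → new data
```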

The code at https://github.com/cloudera/hue/blob/master/desktop/libs/hadoop/src/hadoop/fs/fsutils.py:

import logging
import stat as stat_module  # needed for S_IMODE below

def _do_overwrite(fs, path, copy_data):
    """
    Atomically (best-effort) save the specified data to the given path
    on the filesystem.
    """
    # TODO(todd) Should probably do an advisory permissions check here to
    # see if we're likely to fail (eg make sure we own the file
    # and can write to the dir)

    # First write somewhat-kinda-atomically to a staging file
    # so that if we fail, we don't clobber the old one
    path_dest = path + "._hue_new"

    # Copy the data to destination
    copy_data(path_dest)

    # Try to match the permissions and ownership of the old file
    cur_stats = fs.stats(path)
    try:
        fs.do_as_superuser(fs.chmod, path_dest, stat_module.S_IMODE(cur_stats['mode']))
    except Exception:
        logging.exception("Could not chmod new file %s to match old file %s" % (path_dest, path))
        # but not the end of the world - keep going

    try:
        fs.do_as_superuser(fs.chown, path_dest, cur_stats['user'], cur_stats['group'])
    except Exception:
        logging.exception("Could not chown new file %s to match old file %s" % (path_dest, path))
        # but not the end of the world - keep going

    # Now delete the old - nothing we can do here to recover
    fs.remove(path, skip_trash=True)

    # Now move the new one into place
    # If this fails, then we have no reason to assume
    # we can do anything to recover, since we know the
    # destination shouldn't already exist (we just deleted it above)
    fs.rename(path_dest, path)
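Note the gap in the last two steps: between `fs.remove` and `fs.rename` the path briefly does not exist. On a local POSIX filesystem that gap can be avoided with `os.replace`, which swaps the staging file in atomically. HDFS needs the delete-first dance because (as an assumption based on the default `FileSystem.rename` semantics) its rename refuses to overwrite an existing destination. A local-filesystem sketch:

```python
import os

def overwrite_atomic(path, new_content):
    """Write to a staging file, then swap it in with os.replace,
    which is atomic on POSIX: at no point is the path missing."""
    staging = path + ".tmp"        # hypothetical staging suffix for this sketch
    with open(staging, "w") as f:
        f.write(new_content)
    os.replace(staging, path)      # atomically replaces the old file
```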