WORM principle of Hadoop: what exactly does it mean?
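Hadoop works on the WORM (write once, read many) principle. So why does Hue let me edit files? I created a file, employee.txt, in HDFS (CDH). My understanding was that, according to the WORM principle, employee.txt should not be editable. But when I open it with Hue -> Edit file, I can edit the existing content. So what is the idea behind the WORM principle?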
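This is because Hue does the following:
- writes the new content to a temporary file
- deletes the old file
- renames the temporary file to the original file name

This sidesteps the WORM principle: the bytes of the existing file are never modified in place; the old file is simply replaced by a new file with the same name.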
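The three steps above can be illustrated with a toy model. The sketch below uses a hypothetical in-memory `WormStore` (not an HDFS API) that accepts a file's bytes only when the file is created and offers no way to modify an existing file, while still allowing delete and rename. It also assumes, as the comments in Hue's code suggest, that rename does not replace an existing destination. The file contents are made-up sample data.

```python
class WormStore:
    """Hypothetical in-memory store with write-once file contents.

    Bytes can be supplied only when a file is created; there is no method
    that changes an existing file's contents. Namespace operations
    (delete, rename) are still allowed.
    """

    def __init__(self):
        self._files = {}

    def create(self, path, data):
        if path in self._files:
            raise FileExistsError(path)
        self._files[path] = bytes(data)

    def read(self, path):
        return self._files[path]

    def exists(self, path):
        return path in self._files

    def delete(self, path):
        del self._files[path]

    def rename(self, src, dst):
        # Assumption: rename refuses to replace an existing destination.
        if dst in self._files:
            raise FileExistsError(dst)
        self._files[dst] = self._files.pop(src)


def edit_by_replacement(store, path, new_data):
    """'Edit' a write-once file by replacing it, as described above."""
    staging = path + "._hue_new"     # same suffix Hue uses for its staging file
    store.create(staging, new_data)  # 1. write the new content to a temporary file
    store.delete(path)               # 2. delete the old file
    store.rename(staging, path)      # 3. rename the temporary file to the original name


store = WormStore()
store.create("employee.txt", b"id,name\n1,Alice\n")
edit_by_replacement(store, "employee.txt", b"id,name\n1,Alicia\n")
print(store.read("employee.txt"))
```

The content behind the name `employee.txt` changes, yet no file was ever written to after its creation: WORM restricts modifying a file's data, not replacing the file in the namespace.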
Code at https://github.com/cloudera/hue/blob/master/desktop/libs/hadoop/src/hadoop/fs/fsutils.py:
```python
# Imports the function body relies on
import logging
import stat as stat_module


def _do_overwrite(fs, path, copy_data):
  """
  Atomically (best-effort) save the specified data to the given path
  on the filesystem.
  """
  # TODO(todd) Should probably do an advisory permissions check here to
  # see if we're likely to fail (eg make sure we own the file
  # and can write to the dir)

  # First write somewhat-kinda-atomically to a staging file
  # so that if we fail, we don't clobber the old one
  path_dest = path + "._hue_new"

  # Copy the data to destination
  copy_data(path_dest)

  # Try to match the permissions and ownership of the old file
  cur_stats = fs.stats(path)
  try:
    fs.do_as_superuser(fs.chmod, path_dest, stat_module.S_IMODE(cur_stats['mode']))
  except:
    logging.exception("Could not chmod new file %s to match old file %s" % (path_dest, path))
    # but not the end of the world - keep going

  try:
    fs.do_as_superuser(fs.chown, path_dest, cur_stats['user'], cur_stats['group'])
  except:
    logging.exception("Could not chown new file %s to match old file %s" % (path_dest, path))
    # but not the end of the world - keep going

  # Now delete the old - nothing we can do here to recover
  fs.remove(path, skip_trash=True)

  # Now move the new one into place
  # If this fails, then we have no reason to assume
  # we can do anything to recover, since we know the
  # destination shouldn't already exist (we just deleted it above)
  fs.rename(path_dest, path)
```
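To make the quoted steps concrete without an HDFS cluster, here is a minimal sketch that ports the same sequence to the local filesystem using only the Python standard library. `overwrite_via_staging` is a hypothetical name, the chown step is omitted (it normally needs superuser rights), and the file contents are made up; it illustrates the order of operations and is not Hue's code.

```python
import os
import shutil
import stat
import tempfile


def overwrite_via_staging(path, data):
    """Local-filesystem sketch of the steps in Hue's _do_overwrite (no chown)."""
    staging = path + "._hue_new"
    # Write the new content to a staging file first, so a failure here
    # leaves the original file untouched.
    with open(staging, "wb") as f:
        f.write(data)
    # Best-effort copy of the old file's permission bits (Hue's chmod step).
    try:
        shutil.copymode(path, staging)
    except OSError:
        pass  # not fatal, mirroring Hue's "keep going"
    # Delete the old file, then move the staging file into place.
    # Between these two calls, nothing exists at `path`.
    os.remove(path)
    os.rename(staging, path)


# Demo in a throwaway directory; the file name and contents are illustrative.
demo_dir = tempfile.mkdtemp()
target = os.path.join(demo_dir, "employee.txt")
with open(target, "wb") as f:
    f.write(b"id,name\n1,Alice\n")
os.chmod(target, 0o640)

overwrite_via_staging(target, b"id,name\n1,Alicia\n")

with open(target, "rb") as f:
    print(f.read())
print(oct(stat.S_IMODE(os.stat(target).st_mode)))  # permission bits carried over
```

On POSIX systems `os.rename` (like `os.replace`) already replaces an existing destination atomically, so the `os.remove` call is kept here only to mirror Hue's order of operations. Hue presumably deletes first because the HDFS rename it calls does not overwrite an existing destination, as the comment about the destination not already existing suggests. That is why the operation is only "best-effort" atomic: if the rename fails after the delete, the new data survives only in the `._hue_new` staging file.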