Do users need to exist across all nodes to be recognized by the hadoop cluster / HDFS?

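In MapR Hadoop, for users to be able to access HDFS or run programs on YARN, they need to exist on every node in the cluster (with the same uid and gid). This includes client nodes that act as neither data nodes nor control nodes (MapR doesn't really have the concept of a name node). Is the same true for Hortonworks HDP?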

I found this answer on the Hortonworks Community site:

User should not have account on all the nodes of the cluster. He should only have account on edge node.

For a new user there are 2 types of directories we need to create before the user accesses the cluster.

1- User home directory [directory created on Linux Filesystem i.e. /home/]

2- User HDFS directory [directory created on HDFS filesystem i.e. /user/]

...you only need to create HDFS home directory [i.e. /user/] on edge node [I'm not sure what this means, since HDFS does not seem to be tied to any particular edge node]. You can still run jobs with the new user on cluster, even if you haven't created his home directory in linux.
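As far as I can tell, "on edge node" just means the commands are issued from a machine with the Hadoop client installed (such as the edge node); the directory itself lives in HDFS, not on that node's local disk. A minimal sketch, assuming the standard `hdfs dfs -mkdir -p` and `hdfs dfs -chown OWNER[:GROUP]` shell commands and an HDFS superuser account named `hdfs` (the usual default on HDP, but check your install). The helpers `hdfs_home_commands` and `create_hdfs_home`, and the user `alice`, are hypothetical:

```python
import shlex
import subprocess
from typing import List, Optional


def hdfs_home_commands(user: str, group: Optional[str] = None) -> List[List[str]]:
    """Return the `hdfs dfs` commands that create /user/<user> and hand it over."""
    home = f"/user/{user}"
    owner = f"{user}:{group}" if group else user
    return [
        ["hdfs", "dfs", "-mkdir", "-p", home],   # created in HDFS, not on a local disk
        ["hdfs", "dfs", "-chown", owner, home],  # make the new user the owner
    ]


def create_hdfs_home(user: str, group: Optional[str] = None,
                     run_as: str = "hdfs", dry_run: bool = True) -> None:
    """Print (dry run) or execute the commands as the HDFS superuser via sudo.

    Assumption: `run_as` is the HDFS superuser; "hdfs" is only a common default.
    """
    for cmd in hdfs_home_commands(user, group):
        full = ["sudo", "-u", run_as, *cmd]
        if dry_run:
            print(shlex.join(full))
        else:
            subprocess.run(full, check=True)
```

For example, `create_hdfs_home("alice", "hadoop")` only prints the two `sudo -u hdfs hdfs dfs ...` commands; passing `dry_run=False` on a machine with the Hadoop client would actually run them. Nothing in this step touches `/home/alice` on any Linux host.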

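** Update: Based on a comment from user @cricket_007, it seems the user must also exist on the NameNode server. The closest documentation I could find that states this explicitly says: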

Each file or directory operation passes the full path name to the NameNode, and the permissions checks are applied along the path for each operation. The client framework will implicitly associate the user identity with the connection to the NameNode, reducing the need for changes to the existing client API. [...] For instance, when the client first begins reading a file, it makes a first request to the NameNode to discover the location of the first blocks of the file.
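A rough model of what the quoted passage describes: the client sends only a user name along with the full path, and the NameNode checks permissions on every component of that path. My understanding (hedged) is that the NameNode also resolves the user's groups itself through its configured group mapping, which would explain why the account needs to exist on the NameNode host even if it is absent from the DataNodes. This is a simplified sketch, not HDFS's actual code: `Inode`, `check_access`, `lookup_groups`, and the superuser name `hdfs` are hypothetical, and ACLs, the sticky bit, and write checks on the parent directory are ignored.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class Inode:
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o755


def _allowed(inode: Inode, user: str, groups: Iterable[str], want: int) -> bool:
    # Exactly one class applies (owner, then group, then other), POSIX-style.
    if user == inode.owner:
        bits = (inode.mode >> 6) & 7
    elif inode.group in groups:
        bits = (inode.mode >> 3) & 7
    else:
        bits = inode.mode & 7
    return (bits & want) == want


def check_access(namespace: Dict[str, Inode], path: str, user: str,
                 lookup_groups: Callable[[str], Iterable[str]],
                 want: int, superuser: str = "hdfs") -> bool:
    """Hypothetical NameNode-side check for `user` doing `want` on `path`."""
    if user == superuser:
        return True  # the superuser bypasses permission checks
    # Groups are looked up here, i.e. on the NameNode side, from the bare user name.
    groups = set(lookup_groups(user))
    parts = [p for p in path.split("/") if p]
    # Every ancestor directory ("/", "/user", ...) must grant EXECUTE (traverse).
    for i in range(len(parts)):
        ancestor = "/" + "/".join(parts[:i])
        if not _allowed(namespace[ancestor], user, groups, EXECUTE):
            return False
    return _allowed(namespace[path], user, groups, want)
```

With a namespace where `/user/alice` is `alice`'s directory with mode `0o700`, `alice` can read her files, while another user is stopped at the `/user/alice` component even if the file itself were world-readable.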