HDFS Corrupt Files after Spark Hana Connector Install
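Since installing the SAP HANA Spark connector, I have been having major problems with my cloud-based Hadoop cluster (HDP 2.3). Corrupt blocks keep forcing the NameNode into safe mode.
hdfs fsck gives me the following: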
FSCK started by hdfs (auth:SIMPLE) from /10.97.20.236 for path / at Wed Nov 18 13:49:31 UTC 2015
.
/amshbase/data/default/METRIC_AGGREGATE/.tabledesc/.tableinfo.0000000001: CORRUPT blockpool BP-1656641573-10.97.31.53-1446206026510 block blk_1073741852

/amshbase/data/default/METRIC_AGGREGATE/.tabledesc/.tableinfo.0000000001: MISSING 1 blocks of total size 911 B....................
/amshbase/data/default/METRIC_AGGREGATE_DAILY/6e87af3b3351ba6f55092465a59053b8/.regioninfo: CORRUPT blockpool BP-1656641573-10.97.31.53-1446206026510 block blk_1073741857

/amshbase/data/default/METRIC_AGGREGATE_DAILY/6e87af3b3351ba6f55092465a59053b8/.regioninfo: MISSING 1 blocks of total size 57 B.........................................
/amshbase/data/default/SYSTEM.CATALOG/167feb5a405a77b26fcaea5d560c84b1/.regioninfo: CORRUPT blockpool BP-1656641573-10.97.31.53-1446206026510 block blk_1073741837

/amshbase/data/default/SYSTEM.CATALOG/167feb5a405a77b26fcaea5d560c84b1/.regioninfo: MISSING 1 blocks of total size 49 B..
/amshbase/data/default/SYSTEM.CATALOG/167feb5a405a77b26fcaea5d560c84b1/0/b6a59d053baa46b1875e6506d01ebd12: CORRUPT blockpool BP-1656641573-10.97.31.53-1446206026510 block blk_1073741922

/amshbase/data/default/SYSTEM.CATALOG/167feb5a405a77b26fcaea5d560c84b1/0/b6a59d053baa46b1875e6506d01ebd12: MISSING 1 blocks of total size 40519 B........
/amshbase/data/default/SYSTEM.STATS/.tabledesc/.tableinfo.0000000001: CORRUPT blockpool BP-1656641573-10.97.31.53-1446206026510 block blk_1073741842

/amshbase/data/default/SYSTEM.STATS/.tabledesc/.tableinfo.0000000001: MISSING 1 blocks of total size 838 B......................
/app-logs/ambari-qa/logs/application_1446206072803_0002/node-a09295f36.Domain_45454: CORRUPT blockpool BP-1656641573-10.97.31.53-1446206026510 block blk_1073741887

/app-logs/ambari-qa/logs/application_1446206072803_0002/node-a09295f36.Domain_45454: MISSING 1 blocks of total size 11733 B......
/app-logs/ambari-qa/logs/application_1446206072803_0005/node-60c160a97.Domain_45454: CORRUPT blockpool BP-1656641573-10.97.31.53-1446206026510 block blk_1073741912

/app-logs/ambari-qa/logs/application_1446206072803_0005/node-60c160a97.Domain_45454: MISSING 1 blocks of total size 6691 B.......
.............
/hdp/apps/2.3.2.0-2950/tez/tez.tar.gz: CORRUPT blockpool BP-1656641573-10.97.31.53-1446206026510 block blk_1073741827

/hdp/apps/2.3.2.0-2950/tez/tez.tar.gz: MISSING 1 blocks of total size 56926645 B..................................
/user/ambari-qa/DistributedShell/application_1446206072803_0004/AppMaster.jar: CORRUPT blockpool BP-1656641573-10.97.31.53-1446206026510 block blk_1073741897

/user/ambari-qa/DistributedShell/application_1446206072803_0004/AppMaster.jar: MISSING 1 blocks of total size 46057 B...........Status: CORRUPT
Total size: 2217611677 B (Total open files size: 166 B)
Total dirs: 188
Total files: 156
Total symlinks: 0 (Files currently being written: 4)
Total blocks (validated): 133 (avg. block size 16673772 B) (Total open file blocks (not validated): 4)
********************************
UNDER MIN REPL'D BLOCKS: 9 (6.766917 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 9
MISSING BLOCKS: 9
MISSING SIZE: 57033500 B
CORRUPT BLOCKS: 9
********************************
Minimally replicated blocks: 124 (93.233086 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.7969925
Corrupt blocks: 9
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Wed Nov 18 13:49:31 UTC 2015 in 29 milliseconds


The filesystem under path '/' is CORRUPT
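The thing is, there is no real "data" on the cluster. Some of the affected files appear to be log files, but for the others I am not sure whether I would be deleting required system files (e.g. AppMaster.jar). How can I restore at least the important files without setting up the whole system again?
Thanks for your help,
Sascha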
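To judge which of the nine files matter, it can help to pull the affected paths out of the fsck report and group them by top-level directory. The following is only a rough sketch: it assumes the per-file "MISSING" line format shown in the report above, assumes paths contain no spaces, and the function names `parse_missing` and `group_by_top_dir` are made up for illustration. The sample lines are taken from the report with the dot runs shortened.

```python
import re
from collections import defaultdict

# Matches per-file lines such as:
#   /some/path: MISSING 1 blocks of total size 911 B....
# Assumption: paths contain no whitespace.
MISSING_RE = re.compile(
    r"^(?P<path>/\S+): MISSING (?P<blocks>\d+) blocks of total size (?P<size>\d+) B"
)


def parse_missing(report):
    """Return (path, missing_blocks, size_in_bytes) for every MISSING line."""
    found = []
    for line in report.splitlines():
        match = MISSING_RE.match(line.strip())
        if match:
            found.append(
                (match.group("path"), int(match.group("blocks")), int(match.group("size")))
            )
    return found


def group_by_top_dir(entries):
    """Group parsed entries by their first path component, e.g. /app-logs."""
    groups = defaultdict(list)
    for path, _blocks, size in entries:
        top = "/" + path.split("/")[1]
        groups[top].append((path, size))
    return dict(groups)


# Three lines copied from the report above (dot runs shortened).
SAMPLE = """\
/amshbase/data/default/SYSTEM.STATS/.tabledesc/.tableinfo.0000000001: MISSING 1 blocks of total size 838 B......
/hdp/apps/2.3.2.0-2950/tez/tez.tar.gz: MISSING 1 blocks of total size 56926645 B........
/user/ambari-qa/DistributedShell/application_1446206072803_0004/AppMaster.jar: MISSING 1 blocks of total size 46057 B......Status: CORRUPT
"""

if __name__ == "__main__":
    grouped = group_by_top_dir(parse_missing(SAMPLE))
    for top, files in sorted(grouped.items()):
        print(top, sum(size for _, size in files), "bytes missing")
        for path, size in files:
            print("   ", path, size)
```

Grouping this way separates what is probably disposable (for example `/app-logs`) from files that some service may still need (for example the Tez archive under `/hdp/apps`). `hdfs fsck / -list-corruptfileblocks` prints a more compact list of the affected files if parsing the full report is not needed.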
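The Chef script we use to set up the cluster nodes in our cloud environment configured the VM's own storage as the DataNode's primary storage volume. As a result, HDFS ran out of space, but only on one of the three attached volumes. The problem had nothing to do with the Hana Spark connector in particular; writing any other files would sooner or later have led to the same problem.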
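A misconfiguration like this can be spotted by checking, on each DataNode, how much space is left under every directory listed in `dfs.datanode.data.dir` and whether any of them sits on the root filesystem. The sketch below is not what we actually ran; the example directories, the 10 % threshold and the helper names `parse_data_dirs` and `volume_report` are assumptions for illustration only.

```python
import os
import re
import shutil
from urllib.parse import urlparse


def parse_data_dirs(value):
    """Split a dfs.datanode.data.dir value into plain local paths."""
    paths = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Drop an optional storage-type tag such as [DISK] or [SSD].
        item = re.sub(r"^\[\w+\]", "", item)
        # Entries may be given as file: URIs; keep only the path part.
        if item.startswith("file:"):
            item = urlparse(item).path
        paths.append(item)
    return paths


def volume_report(paths, min_free_fraction=0.10):
    """Report free space per data directory and whether it is on the root filesystem."""
    root_dev = os.stat("/").st_dev
    rows = []
    for path in paths:
        if not os.path.isdir(path):
            rows.append({"path": path, "status": "missing"})
            continue
        usage = shutil.disk_usage(path)
        free_fraction = usage.free / usage.total if usage.total else 0.0
        rows.append({
            "path": path,
            "free_fraction": free_fraction,
            # Same device id as "/" means the directory is not on a separate volume.
            "on_root_fs": os.stat(path).st_dev == root_dev,
            "status": "low" if free_fraction < min_free_fraction else "ok",
        })
    return rows


if __name__ == "__main__":
    # Hypothetical value; take the real one from hdfs-site.xml on the DataNode.
    example = "[DISK]file:///grid/0/hdfs/data,/grid/1/hdfs/data"
    for row in volume_report(parse_data_dirs(example)):
        print(row)
```

A data directory reported with `on_root_fs` set to true, or with status `low` while the other volumes are `ok`, would point to the situation described above.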