mount_workspace_dir notebook magic not working in EMR Studio
In an EMR Studio Python3 notebook, I run the following:
%mount_workspace_dir .
and get the following error:
UsageError: Line magic function `%mount_workspace_dir` not found.
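I set up the EMR cluster for the Studio with a CloudFormation template, which the Studio accesses through Service Catalog. The template specifies a bootstrap script that installs s3fs-fuse, and a step that runs at cluster startup to install emr-notebooks-magics with pip.
Once the cluster is up, I run the %mount_workspace_dir command above and get the error shown. I have also tried restarting the kernel with Kernel->Restart Kernel from the menu.
Here is the CloudFormation template (with the subnet and bucket names replaced):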
---
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  SubnetId:
    Type: "String"
Resources:
  EmrCluster:
    Type: AWS::EMR::Cluster
    Properties:
      Applications:
        - Name: Spark
        - Name: Livy
        - Name: JupyterEnterpriseGateway
        - Name: Hive
        - Name: Presto
      EbsRootVolumeSize: '50'
      Name: !Join ['-', ['emr-studio-', !Select [4, !Split ['-', !Select [2, !Split ['/', !Ref AWS::StackId]]]]]]
      JobFlowRole: emr-studio-instance-role
      ServiceRole: EMR_DefaultRole
      ReleaseLabel: "emr-6.3.0"
      VisibleToAllUsers: true
      LogUri:
        Fn::Sub: 's3://<my-bucket>/'
      Instances:
        TerminationProtected: false
        Ec2SubnetId: '<my-subnet>'
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: "m5.xlarge"
      BootstrapActions:
        - Name: Auto-Termination
          ScriptBootstrapAction:
            Path: "s3://<my-bucket>/scripts/bootstrap-actions/install-s3fs-fuse.sh"
      Steps:
        - Name: Enable-Notebooks-Magics
          ActionOnFailure: CONTINUE
          HadoopJarStep:
            Jar: command-runner.jar
            Args:
              - "sudo"
              - "/mnt/notebook-env/bin/pip"
              - "install"
              - "emr-notebooks-magics"
Outputs:
  ClusterId:
    Value:
      Ref: EmrCluster
    Description: The ID of the EMR Cluster
Here are the contents of the install-s3fs-fuse.sh script:
sudo amazon-linux-extras install epel -y
sudo yum install s3fs-fuse -y
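I have also tried EMR 6.5.0.
Is there a step I'm missing?
There appears to be a bug in the setup.py script for emr-notebooks-magics: it does not copy the 001-setup-emr-notebook-magics.py script to the correct location. I had to add a second step, which copies that script into the emr-notebook user's IPython startup directory, to get it working: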
Steps:
  - Name: Enable-Notebooks-Magics
    ActionOnFailure: CONTINUE
    HadoopJarStep:
      Jar: command-runner.jar
      Args:
        - "sudo"
        - "/mnt/notebook-env/bin/pip3"
        - "install"
        - "emr-notebooks-magics"
  - Name: Copy-Magics-Script
    ActionOnFailure: CONTINUE
    HadoopJarStep:
      Jar: command-runner.jar
      Args:
        - "sudo"
        - "cp"
        - "/mnt/notebook-env/bin/001-setup-emr-notebook-magics.py"
        - "/home/emr-notebook/.ipython/profile_default/startup/"
Alternatively, you can update the EMR step to install the package as the emr-notebook user, so that the startup file is copied to the correct directory:
sudo -u emr-notebook /mnt/notebook-env/bin/pip install emr-notebooks-magics
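Expressed as a step in the CloudFormation template, that command might look like the sketch below. This is an untested translation of the one-liner into the Args list format used in the question. It assumes the emr-notebook user has write access to /mnt/notebook-env, which nothing in this thread confirms.

```yaml
Steps:
  - Name: Enable-Notebooks-Magics
    ActionOnFailure: CONTINUE
    HadoopJarStep:
      Jar: command-runner.jar
      Args:
        # Run pip as emr-notebook rather than root, so the package's
        # startup file should land under that user's home directory.
        - "sudo"
        - "-u"
        - "emr-notebook"
        - "/mnt/notebook-env/bin/pip"
        - "install"
        - "emr-notebooks-magics"
```

If this works on your release, the separate Copy-Magics-Script step should no longer be needed. You would still have to restart the notebook kernel so that IPython picks up the new startup file.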