mount_workspace_dir notebook magic not working in EMR Studio

In an EMR Studio Python3 notebook, I run the following:

%mount_workspace_dir .

and get the following error:

UsageError: Line magic function `%mount_workspace_dir` not found.

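I set up the EMR cluster for the Studio using a CloudFormation template, which the Studio can access through Service Catalog. The template specifies a bootstrap script that installs s3fs-fuse. It also specifies a step, run when the cluster starts, that installs emr-notebooks-magics with pip.

Once the cluster is up, I run the %mount_workspace_dir command above and get the error shown. I have also tried restarting the kernel with Kernel->Restart Kernel from the menu.

Here is the CloudFormation template (with the subnet and bucket names replaced):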

---
AWSTemplateFormatVersion: 2010-09-09

Parameters:
  SubnetId:
    Type: "String"

Resources:
  EmrCluster:
    Type: AWS::EMR::Cluster
    Properties:
      Applications:
        - Name: Spark
        - Name: Livy
        - Name: JupyterEnterpriseGateway
        - Name: Hive
        - Name: Presto
      EbsRootVolumeSize: '50'
      Name: !Join ['-', ['emr-studio-', !Select [4, !Split ['-', !Select [2, !Split ['/', !Ref AWS::StackId]]]]]]
      JobFlowRole: emr-studio-instance-role
      ServiceRole: EMR_DefaultRole
      ReleaseLabel: "emr-6.3.0"
      VisibleToAllUsers: true
      LogUri:
        Fn::Sub: 's3://<my-bucket>/'
      Instances:
        TerminationProtected: false
        Ec2SubnetId: '<my-subnet>'
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType: "m5.xlarge"
      BootstrapActions:
      - Name: Auto-Termination
        ScriptBootstrapAction:
          Path: "s3://<my-bucket>/scripts/bootstrap-actions/install-s3fs-fuse.sh"
      Steps:
      - Name: Enable-Notebooks-Magics
        ActionOnFailure: CONTINUE
        HadoopJarStep:
          Jar: command-runner.jar
          Args:
          - "sudo"
          - "/mnt/notebook-env/bin/pip"
          - "install"
          - "emr-notebooks-magics"

Outputs:
  ClusterId:
    Value:
      Ref: EmrCluster
    Description: The ID of the EMR Cluster

Here are the contents of the install-s3fs-fuse.sh script:

sudo amazon-linux-extras install epel -y
sudo yum install s3fs-fuse -y

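I have also tried EMR 6.5.0.

Is there a step I'm missing?

There appears to be a bug in the setup.py script of emr-notebooks-magics: it does not copy the 001-setup-emr-notebook-magics.py script to the correct location. I had to add the following steps to get it working: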

  Steps:
  - Name: Enable-Notebooks-Magics
    ActionOnFailure: CONTINUE
    HadoopJarStep:
      Jar: command-runner.jar
      Args:
      - "sudo"
      - "/mnt/notebook-env/bin/pip3"
      - "install"
      - "emr-notebooks-magics"
  - Name: Copy-Magics-Script
    ActionOnFailure: CONTINUE
    HadoopJarStep:
      Jar: command-runner.jar
      Args:
      - "sudo"
      - "cp"
      - "/mnt/notebook-env/bin/001-setup-emr-notebook-magics.py"
      - "/home/emr-notebook/.ipython/profile_default/startup/"

You can update the EMR step to install the package as the emr-notebook user, so that the startup file is copied to the correct directory:

sudo -u emr-notebook /mnt/notebook-env/bin/pip install emr-notebooks-magics
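
In the CloudFormation template, that command could be written as a step along these lines. This is only a sketch: it reuses the step name and pip path from the question, and it assumes the emr-notebook user already exists when the step runs and that the step is allowed to use sudo -u.

```yaml
  Steps:
  # Sketch only: runs pip as the emr-notebook user instead of root.
  # Assumes the emr-notebook user exists on the node when this step executes.
  - Name: Enable-Notebooks-Magics
    ActionOnFailure: CONTINUE
    HadoopJarStep:
      Jar: command-runner.jar
      Args:
      - "sudo"
      - "-u"
      - "emr-notebook"
      - "/mnt/notebook-env/bin/pip"
      - "install"
      - "emr-notebooks-magics"
```

With this variant, the separate Copy-Magics-Script step should not be needed if the install places the startup file in the emr-notebook user's profile as described.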