AWSGLUE python package - ls cannot access dir

I am trying to install the awsglue package locally on my machine for development purposes (Windows + Git Bash):

https://github.com/awslabs/aws-glue-libs/tree/glue-1.0

https://support.wharton.upenn.edu/help/glue-debugging

The Spark directory and the py4j file mentioned in the error below do exist, yet the error still appears.

The directory from which I run the .sh script:

user@machine xxxx64~/Desktop/lm_aws_glue/aws-glue-libs-glue-1.0/bin
$ ./glue-setup.sh
ls: cannot access 'C:\Spark\spark-3.1.1-bin-hadoop2.7/python/lib/py4j-*-src.zip': No such file or directory
rm: cannot remove 'PyGlue.zip': No such file or directory
./glue-setup.sh: line 14: zip: command not found

ls output:

$ ls -l
total 7
-rwxr-xr-x 1 n1543781 1049089 135 May  5  2020 gluepyspark*
-rwxr-xr-x 1 n1543781 1049089 114 May  5  2020 gluepytest*
-rwxr-xr-x 1 n1543781 1049089 953 Mar  5 11:10 glue-setup.sh*
-rwxr-xr-x 1 n1543781 1049089 170 May  5  2020 gluesparksubmit*

The original setup script runs fine with only minor tweaks. You do still need the zip command installed (a possible alternative is noted as a comment inside the script below).

The workaround:
#!/usr/bin/env bash

#original code
#ROOT_DIR="$(cd $(dirname "$0")/..; pwd)"
#cd $ROOT_DIR

#re-written
ROOT_DIR="$(cd /c/aws-glue-libs; pwd)" 
cd $ROOT_DIR

SPARK_CONF_DIR=$ROOT_DIR/conf
GLUE_JARS_DIR=$ROOT_DIR/jarsv1

#original code
#PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH"
#PYTHONPATH=`ls $SPARK_HOME/python/lib/py4j-*-src.zip`:"$PYTHONPATH"

#re-written
PYTHONPATH="/c/Spark/spark-3.1.1-bin-hadoop2.7/python/:$PYTHONPATH"
PYTHONPATH=`ls /c/Spark/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-*-src.zip`:"$PYTHONPATH"

# Generate the zip archive for glue python modules
rm PyGlue.zip
zip -r PyGlue.zip awsglue
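# NOTE (assumption, not part of the original workaround): if the zip command is
# missing from Git Bash, Python's built-in zipfile CLI can build the same archive:
#   python -m zipfile -c PyGlue.zip awsglue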
GLUE_PY_FILES="$ROOT_DIR/PyGlue.zip"
export PYTHONPATH="$GLUE_PY_FILES:$PYTHONPATH"

# Run mvn copy-dependencies target to get the Glue dependencies locally
#mvn -f $ROOT_DIR/pom.xml -DoutputDirectory=$ROOT_DIR/jarsv1 dependency:copy-dependencies
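# Note: with the mvn step above left commented out, jarsv1 must already contain the
# Glue dependency jars (e.g. from an earlier dependency:copy-dependencies run);
# otherwise the classpath entries written below point at an empty directory.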

export SPARK_CONF_DIR=${ROOT_DIR}/conf
mkdir $SPARK_CONF_DIR
rm $SPARK_CONF_DIR/spark-defaults.conf
# Generate spark-defaults.conf
echo "spark.driver.extraClassPath $GLUE_JARS_DIR/*" >> $SPARK_CONF_DIR/spark-defaults.conf
echo "spark.executor.extraClassPath $GLUE_JARS_DIR/*" >> $SPARK_CONF_DIR/spark-defaults.conf

# Restore present working directory
cd -
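
Once the script runs cleanly, the launchers visible in the ls output above should pick up PyGlue.zip and the generated Spark conf. A quick smoke test, assuming SPARK_HOME still points at the same Spark install (a sketch, not verified on this exact setup):

cd /c/aws-glue-libs/bin
./gluepyspark   # expected to source glue-setup.sh itself and start a Glue-enabled PySpark shell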