How to locate openjdk in a Docker container?
I am trying to run a pyspark application. First I installed pyspark from pip, then pulled openjdk:8 so that I could set the JAVA_HOME variable.
Dockerfile:
FROM python:3
ADD my_script.py /
COPY requirements.txt ./
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
RUN pip install --no-cache-dir -r requirements.txt
CMD [ "python", "./my_script.py" ]
my_script.py :
from pyspark import SparkContext
from pyspark import SparkConf
#spark conf
conf1 = SparkConf()
conf1.setMaster("local[*]")
conf1.setAppName('hamza')
print(conf1)
sc = SparkContext(conf = conf1)
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
print(sqlContext)
requirements.txt :
pyspark
numpy
I get this error:
C:\Users\hrafiq\Desktop\sample>docker run -it --rm --name data2 my-python-app
<pyspark.conf.SparkConf object at 0x7f4bd933ba58>
/usr/local/lib/python3.7/site-packages/pyspark/bin/spark-class: line 71: /usr/lib/jvm/java-8-openjdk-amd64//bin/java: No such file or directory
Traceback (most recent call last):
  File "./my_script.py", line 14, in <module>
    sc = SparkContext(conf = conf1)
  File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 115, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/usr/local/lib/python3.7/site-packages/pyspark/context.py", line 298, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/usr/local/lib/python3.7/site-packages/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
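So the question is: if the java file cannot be found, how do I locate it? I know it is stored on some virtual hard disk that we don't have access to.
Any help would be appreciated.
Thanks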
Setting the JAVA_HOME environment variable is not enough. You need to actually install openjdk inside the Docker image.
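One way to confirm this from inside the container is to check whether the path that JAVA_HOME implies really holds a java launcher. The following is a minimal sketch, not part of pyspark; the helper name `java_binary_from_home` is made up for illustration. It builds the same `$JAVA_HOME/bin/java` path that appears in the error message above and reports whether an executable file exists there.

```python
import os


def java_binary_from_home(java_home):
    """Return the java launcher path implied by JAVA_HOME, or None if it is missing.

    Hypothetical helper for illustration; it is not a pyspark API.
    """
    if not java_home:
        return None
    # Same layout as in the error message: $JAVA_HOME/bin/java
    candidate = os.path.join(java_home, "bin", "java")
    # The variable only helps if an executable file really sits at that path.
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    home = os.environ.get("JAVA_HOME")
    found = java_binary_from_home(home)
    if found is None:
        print("JAVA_HOME=%r has no bin/java; is a JDK installed in the image?" % home)
    else:
        print("java launcher found at", found)
```

Running this inside the image from the question should report the launcher as missing, since python:3 ships without a JDK and the ENV line only sets a string.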
Your base image (python:3) is itself based on a Debian Stretch image, so you can use apt-get install to get the JDK:
FROM python:3
RUN apt-get update && \
    apt-get install -y openjdk-8-jdk-headless && \
    rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY my_script.py ./
CMD [ "python", "./my_script.py" ]
(In the above I have also optimized the layer order, so that you don't need to rebuild the pip install layer when your script changes.)
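To answer the "how do I locate it" part once a JDK is installed: the sketch below finds the `java` launcher on PATH, resolves any symlinks, and strips the trailing `bin/java` to get a JAVA_HOME candidate. The function names are hypothetical, and the approach assumes the package put `java` on PATH, which Debian packages normally do through a symlink in /usr/bin.

```python
import os
import shutil


def java_home_from_launcher(launcher_path):
    """Derive a JAVA_HOME candidate from a java launcher path.

    Symlinks are resolved first, then the trailing bin/java is stripped.
    Hypothetical helper for illustration.
    """
    real = os.path.realpath(launcher_path)
    return os.path.dirname(os.path.dirname(real))


def locate_java_home():
    """Return a JAVA_HOME candidate for the java found on PATH, or None."""
    launcher = shutil.which("java")
    if launcher is None:
        # No java on PATH: most likely no JDK is installed in the image.
        return None
    return java_home_from_launcher(launcher)


if __name__ == "__main__":
    print(locate_java_home())
```

With JDK 8 packages the resolved launcher may sit under a `jre/` subdirectory, so the printed path can differ from the one in the Dockerfile above. Either is usable as long as it contains `bin/java`.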