Docker (HDFS, Spark, Shiny R)
I have three containers on the same Docker network: a Hadoop container, a Spark container, and a Shiny R container.
I want to read a folder on HDFS from my Shiny app.
If Hadoop, Spark, and Shiny R were on the same server (no Docker containers), I could use this:
system(paste0("hdfs dfs -ls ", "/"), intern = TRUE)
With Docker containers, where Hadoop and Shiny R run in separate containers, I cannot do this:
system(paste0("hdfs dfs -ls ", "/"), intern = TRUE)
because they are isolated from each other.
Do you know how I could do this?
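As far as I can tell, the Hadoop client is simply not available inside the Shiny container, so the command cannot even be found there. A quick check from R (this is just my assumption about the cause):

Sys.which("hdfs")   # "" if no Hadoop client is on the PATH of the Shiny container
# in that case system("hdfs dfs -ls /") fails before it ever reaches the namenode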
I tried using sparklyr's invoke functions, but it did not work.
> library(sparklyr)
>
> conf = spark_config()
>
> sc <- spark_connect(master = "local[*]", config = conf)
Re-using existing Spark connection to local[*]
>
> hconf <- sc %>% spark_context() %>% invoke("hadoopConfiguration")
>
> path <- 'hdfs://namenode:9000/user/root/input2/'
>
> spath <- sparklyr::invoke_new(sc, 'org.apache.hadoop.fs.Path', path)
> spath
<jobj[30]>
org.apache.hadoop.fs.Path
hdfs://namenode:9000/user/root/input2
> fs <- invoke_static(sc, "org.apache.hadoop.fs.FileSystem", "get", hconf)
> fs
<jobj[32]>
org.apache.hadoop.fs.LocalFileSystem
org.apache.hadoop.fs.LocalFileSystem@788cf1b0
> lls <- invoke(fs, "globStatus", spath)
Error: java.lang.IllegalArgumentException: Wrong FS: hdfs://namenode:9000/user/root/input2, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
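From the output above, FileSystem.get(hconf) gave back a LocalFileSystem, so fs.defaultFS in the Hadoop configuration still points at file:///. A sketch of what I expected to work, asking the Path for its own file system instead (untested, and it assumes hdfs://namenode:9000 is reachable from the Shiny container):

library(sparklyr)
sc    <- spark_connect(master = "local[*]")
hconf <- sc %>% spark_context() %>% invoke("hadoopConfiguration")
spath <- invoke_new(sc, "org.apache.hadoop.fs.Path", "hdfs://namenode:9000/user/root/input2/")
fs    <- invoke(spath, "getFileSystem", hconf)   # the Path picks the FileSystem matching hdfs://
lls   <- invoke(fs, "globStatus", spath)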
Thanks for your help.
I solved this using /var/run/docker.sock.
So I changed my docker-compose. My shiny service is:
shiny:
  image: anaid/shiny:1.1
  volumes:
    - 'shiny_logs:/var/log/shiny-server'
    - '/var/run/docker.sock:/var/run/docker.sock'
  ports:
    - "3838:3838"
My full docker-compose is:
version: "2"
services:
  namenode:
    image: anaid/hadoop-namenode:1.1
    container_name: namenode
    volumes:
      - hadoop_namenode:/hadoop/dfs/name
      - hadoop_namenode_files:/hadoop/dfs/files
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    ports:
      - 9899:9870
  datanode:
    image: anaid/hadoop-datanode:1.1
    container_name: datanode
    depends_on:
      - namenode
    environment:
      SERVICE_PRECONDITION: "namenode:9870"
    volumes:
      - hadoop_datanode1:/hadoop/dfs/data
      - hadoop_namenode_files1:/hadoop/dfs/files
    env_file:
      - ./hadoop.env
  mongodb:
    image: mongo
    container_name: mongodb
    ports:
      - "27020:27017"
  shiny:
    image: anaid/shiny:1.1
    volumes:
      - 'shiny_logs:/var/log/shiny-server'
      - /Users/anaid/Docker/hadoop_spark/hadoop-spark-master/shiny:/srv/shiny-server/
      - '/var/run/docker.sock:/var/run/docker.sock'
    ports:
      - "3838:3838"
  nodemanager:
    image: anaid/hadoop-nodemanager:1.1
    container_name: nodemanager
    depends_on:
      - namenode
      - datanode
    env_file:
      - ./hadoop.env
  historyserver:
    image: anaid/hadoop-historyserver:1.1
    container_name: historyserver
    depends_on:
      - namenode
      - datanode
    volumes:
      - hadoop_historyserver:/hadoop/yarn/timeline
    env_file:
      - ./hadoop.env
  spark-master:
    image: anaid/spark-master:1.1
    container_name: spark-master
    ports:
      - "9090:8080"
      - "7077:7077"
    volumes:
      - ./apps:/opt/spark-apps
      - ./data:/opt/spark-data
    environment:
      - "SPARK_LOCAL_IP=spark-master"
  spark-worker-1:
    image: anaid/spark-worker:1.1
    container_name: spark-worker-1
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
      - SPARK_WORKER_CORES=1
      - SPARK_WORKER_MEMORY=30G
      - SPARK_DRIVER_MEMORY=15G
      - SPARK_EXECUTOR_MEMORY=15G
    volumes:
      - ./apps:/opt/spark-apps
      - ./data:/opt/spark-data
    ports:
      - "8083:8081"
  spark-worker-2:
    image: anaid/spark-worker:1.1
    container_name: spark-worker-2
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
      - SPARK_WORKER_CORES=1
      - SPARK_WORKER_MEMORY=30G
      - SPARK_DRIVER_MEMORY=15G
      - SPARK_EXECUTOR_MEMORY=15G
    volumes:
      - ./apps:/opt/spark-apps
      - ./data:/opt/spark-data
    ports:
      - "8084:8081"
volumes:
  hadoop_namenode:
  hadoop_datanode1:
  hadoop_namenode_files:
  hadoop_namenode_files1:
  hadoop_historyserver:
  shiny_logs:
  mongo-config:
Then I had to install Docker inside my Shiny container, so I added the commands to my Dockerfile.
My Shiny Dockerfile is:
# get shiny server plus tidyverse packages image
FROM rocker/shiny:3.6.1
# system libraries of general use
RUN apt-get update && apt-get install -y \
    sudo
# Anaid added for V8 and sparklyr library
RUN apt-get install -y \
    r-cran-xml \
    openjdk-8-jdk \
    libv8-dev \
    libxml2 \
    libxml2-dev \
    libssl-dev \
    libcurl4-openssl-dev \
    libcairo2-dev \
    libsasl2-dev \
    libssl-dev \
    vim
RUN sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg2 \
    software-properties-common
# For docker inside the container
# Add Docker’s official GPG key:
RUN curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
RUN sudo apt-key fingerprint 0EBFCD88
RUN sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/debian \
    $(lsb_release -cs) \
    stable"
RUN sudo apt-get update
# Install the latest version of Docker Engine
RUN sudo apt-get install -y \
    docker-ce \
    docker-ce-cli \
    containerd.io
RUN sudo apt-get install -y \
    docker-ce=5:19.03.2~3-0~debian-stretch \
    docker-ce-cli=5:19.03.2~3-0~debian-stretch \
    containerd.io
# Download and install R packages; they are saved in /usr/local/lib/R/site-library
RUN R -e "install.packages(c('shiny', 'Rcpp', 'pillar', 'git2r', 'compiler', 'dbplyr', 'r2d3', 'base64enc', 'devtools', 'zeallot', 'digest', 'jsonlite', 'tibble', 'pkgconfig', 'rlang', 'DBI', 'cli', 'rstudioapi', 'yaml', 'parallel', 'withr', 'dplyr', 'httr', 'generics', 'htmlwidgets', 'vctrs', 'askpass', 'rprojroot', 'tidyselect', 'glue', 'forge', 'R6', 'fansi', 'purrr', 'magrittr', 'backports', 'htmltools', 'ellipsis', 'assertthat', 'config', 'utf8', 'openssl', 'crayon', 'shinydashboard', 'BBmisc', 'ggfortify', 'cluster', 'stringr', 'DT', 'plotly', 'ggplot2', 'shinyjs', 'dplyr', 'stats', 'graphics', 'grDevices', 'utils', 'datasets', 'methods', 'base', 'Rtools', 'XML', 'data.table', 'jsonlite', 'yaml'))"
RUN R -e "install.packages(c('devtools', 'XML', 'data.table', 'jsonlite', 'yaml', 'rlist', 'V8', 'sparklyr'), repos='http://cran.rstudio.com/')"
RUN R -e "install.packages(c('lattice', 'nlme', 'broom', 'sparklyr', 'shinyalert', 'mongolite', 'jtools'), repos='http://cran.rstudio.com/')"
## create directories
## RUN mkdir -p /myScripts
## copy files
## COPY /myScripts/installMissingPkgs.R /myScripts/installMissingPkgs.R
## COPY /myScripts/packageList /myScripts/packageList
## install R-packages
## RUN Rscript /myScripts/installMissingPkgs.R
# copy the app to the image
COPY app.R /srv/shiny-server/
# select port
EXPOSE 3838
# allow permission
RUN sudo chown -R shiny:shiny /srv/shiny-server
# run app
CMD ["/usr/bin/shiny-server.sh"]
Using the R system() function and docker commands inside the Docker container
Then I had some problems using the R system() function in my app. This is the error:
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/namenode/json: dial unix /var/run/docker.sock: connect: permission denied
Warning in system(paste0("docker exec -it namenode hdfs dfs -ls ", dir), :
running command 'docker exec -it namenode hdfs dfs -ls /' had status 1
I solved it by running this (inside the Shiny container):
sudo chmod 666 /var/run/docker.sock
Then I added USER=root in my app:
system("USER=root")
system("docker exec namenode hdfs dfs -ls /", intern = TRUE)
The code of my simple app using system():
library(shiny)
library(tools)
library(stringi)

ui <- fluidPage(
  h3(textOutput("system"))
)

server <- function(input, output, session) {
  rv <- reactiveValues(syst = NULL)

  observe({
    # pwd
    # docker ps working
    system("USER=root")
    # list the HDFS root through the namenode container, next to the local working directory
    rv$syst <- paste(system("docker exec namenode hdfs dfs -ls /", intern = TRUE),
                     system("ls", intern = TRUE))
  })

  output$system <- renderText({
    rv$syst
  })
}

shinyApp(ui, server)
[Screenshot: my Shiny app running, using system()]
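To do something with the listing inside the app, a small helper along these lines parses the command output (same container name as above; only a sketch, not tested beyond my setup):

# list an HDFS directory through the namenode container and drop the
# "Found N items" header so each remaining element is one entry
ls_hdfs <- function(dir = "/") {
  out <- system(paste("docker exec namenode hdfs dfs -ls", dir), intern = TRUE)
  out[!grepl("^Found", out)]
}
# example: ls_hdfs("/user/root/input2")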