How to install a GUI on Amazon AWS EC2 or EMR with the Amazon AMI

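I need to run an application that requires a GUI to start and configure. I also need to be able to run this application on Amazon's EC2 service and on its EMR service. The EMR requirement means it must run on Amazon's Linux AMI.

After extensive searching I have not been able to find any ready-made solution, particularly given the requirement to run on Amazon's AMI. The closest match, and the most often cited solution, is here. Unfortunately it was developed on a RHEL6 instance, which differs enough from Amazon's AMI that the solution does not work.

I am posting my solution below. Hopefully it will save others from spending hours of experimentation to arrive at the right recipe.

Here is my solution for getting a GUI running on Amazon's AMI. I used this post as a starting point, but had to make many changes to get it working on Amazon's AMI. I also added information to make this work in a reasonably automated way, so that anyone who needs to bring up this environment more than once can do so easily.

Note: I have included a lot of commentary in this post. I apologize in advance, but I think it may help anyone who needs to make modifications to understand why the various choices were made along the way.

The scripts included below install a number of files along the way. See Step 4 for the list of files and the directory structure these scripts use.

Step 1. Install the desktop

After running 'yum update', most solutions include the line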

sudo yum groupinstall -y "Desktop"

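This deceptively simple step takes considerably more effort on the Amazon AMI. The group is not configured in the Amazon AMI (AAMI from here on). The AAMI has Amazon's own repositories installed and enabled by default. The epel repository is also installed, but it is disabled by default. After enabling epel I found the Desktop group, but it was not populated with packages. I also found Xfce (another desktop alternative), which was populated. In the end I decided to install Xfce rather than Desktop. Even that was not straightforward, but it eventually led to the solution.

It is worth noting that the first thing I tried was installing the centos repository and installing the Desktop group from there. This initially looked promising: the group was fully populated with packages. After some effort, though, I concluded that there were too many version conflicts between its dependencies and the packages already installed on the AAMI.

That led me to choose Xfce from the epel repository. Since the epel repository is already installed on the AAMI, I figured its dependency versions would be better coordinated with the Amazon repositories. This turned out to be generally true. Many of the dependencies were found in either the epel or the Amazon repositories. Those that were not I was able to find in the centos repository, and in most cases they were leaf dependencies. So most of the trouble came from a handful of dependencies in the centos repository whose sub-dependencies conflicted with the amazon or epel repositories. In the end, a few hacks were required to get around the dependency conflicts. I tried to keep those to a minimum. Here is the script that installs Xfce: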

installXfce.sh

#!/bin/bash

# echo each command
set -x

# assumes RSRC_DIR and IS_EMR set by parent script
YUM_RSRC_DIR=$RSRC_DIR/yum

sudo yum -y update

# Most info I've found on installing a GUI on AWS suggests to install using
#> sudo yum groupinstall -y "Desktop"
# This group is not available by default on the Amazon Linux AMI.  The group
# is listed if the epel repo is enabled, but it is empty.  I tried installing
# the centos repo, which does have support for this group, but it simply ended
# up having too many dependency version conflicts with packages already installed
# by the Amazon repos.
#
# I found the path of least resistance to be installing the group Xfce from
# the epel repo. The epel repo is already included in amazon image, just not enabled.
# So I'm guessing there was at least some consideration by Amazon to align
# the dependency versions of this repo with the Amazon repos.
#
# My general approach to this problem was to start with the last command:
#> sudo yum groupinstall -y Xfce
# which will generate a list of missing dependencies.  The script below
# essentially works backwards through that list to eliminate all the
# missing dependencies.
#
# In general, many of the dependencies required by Xfce are found in either
# the epel repo or the Amazon repos.  Most of the remaining dependencies can be
# found in the centos repo, and either don't have any further dependencies, or if they
# do those dependencies are satisfied with the centos repo with no collisions
# in the epel or amazon repo.  Then there are a couple of oddball dependencies
# to clean up.

# if yum-config-manager is not found then install yum-utils
#> sudo yum install yum-utils
sudo yum-config-manager --enable epel

# install centos repo
# place the repo config @  /etc/yum.repos.d/centos.repo
sudo cp $YUM_RSRC_DIR/yum.repos.d/centos.repo /etc/yum.repos.d/

# The config centos.repo specifies the key with a URL.  If for some reason the key
# must be in a local file, it can be found here: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
# It can be installed to the right location in one step:
#> wget -O /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6 https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
# Note, a key file must also be installed in the system key ring.  The docs are a bit confusing
# on this, I found that I needed to run both gpg AND then followed by rpm, eg:
#> sudo gpg --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
#> sudo rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

# I found there are a lot of version conflicts between the centos, Amazon and epel repos.
# So I did not enable the centos repo generally.  Instead I used the --enablerepo switch
# enable it explicitly for each yum command that required it.  This only works for yum.  If
# rpm must be used, then yum-config-manager must be used to enable/disable repos as a
# separate step.
#
# Another problem I ran into was yum installing the 32-bit (*.i686) package rather than
# the 64-bit (*.x86_64) version of the package.  I never figured out why.  So I had
# to specify the *.x86_64 package explicitly.  The search tools (eg. 'whatprovides')
# did not list the 64 bit package either even though a manual search through the
# package showed the 64 bit components were present.
#
# Sometimes it is difficult to determine which package must be installed to satisfy
# a particular dependency.  'whatprovides' is a very useful tool for this
#> yum --enablerepo centos whatprovides libgdk_pixbuf-2.0.so.0
#> rpm -q --whatprovides libgdk_pixbuf

sudo yum --enablerepo centos install -y gdk-pixbuf2.x86_64
sudo yum --enablerepo centos install -y gtk2.x86_64
sudo yum --enablerepo centos install -y libnotify.x86_64
sudo yum --enablerepo centos install -y gnome-icon-theme
sudo yum --enablerepo centos install -y redhat-menus
sudo yum --enablerepo centos install -y gstreamer-plugins-base.x86_64

# problem when we get to libvte, installing libvte requires expat, which conflicts with amazon lib
# the centos package version was older and did not install right lib version
# but … the expat dependency was coming from a dependency on python-libs.
# the easiest workaround was to install python using the amazon repo, that in turn
# installs a version of python libs that is compatible with the version of libexpat on the system.

sudo yum install -y python
sudo yum --enablerepo centos install -y vte.x86_64

sudo yum --enablerepo centos install -y libical.x86_64
sudo yum --enablerepo centos install -y gnome-keyring.x86_64

# another sticky point, xfdesktop requires desktop-backgrounds-basic, but ‘whatprovides’ does not 
# provide any packages for this query (not sure why).  It turns out this is provided by the centos 
# repo, installing ‘desktop-backgrounds-basic’ will try to install the package redhat-logos, but 
# unfortunately this is obsoleted by Amazon’s generic-logos package
# The only way I could find to get around this was to erase the generic logos package.
# This doesn't seem too risky since this is just images for the desktop and menus.
#
sudo yum erase -y generic-logos

# Amazon repo must be disabled to prevent interference with the install
# of redhat-logos
sudo yum --disablerepo amzn-main --enablerepo centos install -y redhat-logos

# next problem is a dependency on dbus.  The dependency comes from dbus-x11 in 
# centos repo.  It requires dbus version 1.2.24, the amazon image already has
# version 1.6.12 installed.  Since the dbus-x11 is only used by the GUI package,
# easiest way around this is to install dbus-x11 with no dependency checks.
# So it will use the newer version of dbus (should be OK).  The main thing that could be a problem
# here is if it skips some other dependency.  When doing this manually, it's possible to run the install until
# the only error left is the dbus dependency.  It's a bit risky running in a script since it basically assumes
# all the dependencies are already in place.
yumdownloader --enablerepo centos dbus-x11.x86_64
sudo rpm -ivh --nodeps dbus-x11-1.2.24-8.el6_6.x86_64.rpm
rm dbus-x11-1.2.24-8.el6_6.x86_64.rpm

sudo yum install -y xfdesktop.x86_64

# We need the version of poppler-glib from centos repo, but it is found in several repos.
# Disable the other repos for this step.
# On EMR systems a newer version of poppler is already installed.  So move up 1 level
# in dependency chain and force install of tumbler.

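# Default to the non-EMR path if the parent script did not set IS_EMR.
if [ "${IS_EMR:-0}" -eq 1 ]
then
    yumdownloader --enablerepo centos tumbler.x86_64
    sudo rpm -ivh --nodeps tumbler-0.1.21-1.el6.x86_64.rpm
    # clean up the downloaded rpm, as is done for dbus-x11 above
    rm tumbler-0.1.21-1.el6.x86_64.rpm
else
    sudo yum --disablerepo amzn-main --disablerepo amzn-updates --disablerepo epel --enablerepo centos install -y poppler-glib
fi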


sudo yum --enablerepo centos install -y polkit-gnome.x86_64
sudo yum --enablerepo centos install -y control-center-filesystem.x86_64

sudo yum groupinstall -y Xfce
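
When adapting the Step 1 script, it can help to preview each per-command repo override before running it. The following is only a minimal sketch: the function name yum_from and the DRY_RUN variable are hypothetical and are not part of the scripts in this post. It wraps the same "yum --enablerepo <repo> install -y" pattern used above and, when DRY_RUN=1, prints the command instead of executing it.

```shell
#!/bin/bash
# Sketch only: yum_from and DRY_RUN are hypothetical names for illustration.

# Install packages with one extra repo enabled for this command only.
# With DRY_RUN=1 the command line is printed instead of executed.
yum_from() {
    local repo="$1"
    shift
    local cmd=(sudo yum --enablerepo "$repo" install -y "$@")
    if [ "${DRY_RUN:-0}" -eq 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Preview one of the installs from the script without touching the system.
DRY_RUN=1 yum_from centos gtk2.x86_64
```

Without DRY_RUN set, the same call runs the install for real, so the centos repo still stays disabled for every other yum command.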

Here are the contents of the centos repository config file:

centos.repo

[centos]
name=CentOS mirror
baseurl=http://repo1.ash.innoscale.net/centos/6/os/x86_64/
failovermethod=priority
enabled=0
gpgcheck=1
gpgkey=https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
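
Because the whole approach depends on the centos repo staying disabled except where --enablerepo is given, it may be worth checking the installed config. This is a rough sketch, not part of the original scripts: the helper name repo_enabled is made up, and it assumes the simple key=value layout shown above (no spaces around "=").

```shell
#!/bin/bash
# Sketch only: repo_enabled is a hypothetical helper.

# repo_enabled FILE SECTION
# Prints the value of the "enabled" key inside [SECTION] of a yum repo file.
repo_enabled() {
    awk -F= -v section="[$2]" '
        /^\[/ { in_section = ($0 == section) }      # track which section we are in
        in_section && $1 == "enabled" { print $2 }  # emit the value for that section
    ' "$1"
}

# Example call (not executed here):
# repo_enabled /etc/yum.repos.d/centos.repo centos
```

For the centos.repo file shown above, the call in the comment should print 0.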

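If all you need is a recipe for installing a desktop package on the Amazon AMI, you are done. The rest of this post covers how to configure VNC to access the desktop through an SSH tunnel, and how to package it all so that an instance can be launched easily from a script.

Step 2. Install and configure VNC

Below is the top-level script I use to install the GUI. After configuring a few variables, the first thing it does is call the script from Step 1 above. The script carries some extra baggage because I built it to work on a regular EC2 instance or on EMR, and as either root or ec2-user. The basic steps are

  1. Install libXfont
  2. Install tigervnc-server
  3. Install the VNC server config file
  4. Create a .vnc directory in the user's home directory
  5. Install the xstartup file in the .vnc directory
  6. Install a dummy passwd file in the .vnc directory
  7. Start the VNC server

A few key points to note:

This assumes you will access the VNC server through an SSH tunnel. In the end this really seemed to be the simplest and most reliably secure approach. Since you probably already have a port open for SSH in your security group specification, you will not have to change it. In addition, the encryption configuration for VNC clients/servers is not straightforward. It seems easy to make a mistake and leave your communications unencrypted. The settings for this are in the vncservers file. The -localhost switch tells VNC to accept only local connections. '-nolisten tcp' tells the associated X server module not to accept connections from the network either. Finally, the '-SecurityTypes None' switch lets you open a VNC session without entering a password; since the only way onto the machine is through ssh, an additional password check seemed redundant.

The xstartup file determines what is launched when a VNC session is first started. I have noticed that many posts on this topic skip this. If you do not tell it to start the Xfce desktop, you will just get a blank window when you start VNC. The configuration I have here is very simple.

Even though I mentioned above that the VNC server is configured not to prompt for a password, it still requires a passwd file in the .vnc directory in order to start the server. The first time you run the script, it will fail when it tries to start the server. Log in to the machine via ssh and run 'vncpasswd'. It will create a passwd file in the .vnc directory, which you can save to use as part of these scripts during installation. Note that I have read that VNC does nothing sophisticated to secure the passwd file, so I do not recommend using a password that you use for other, more important accounts.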
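
Since a missing passwd file only shows up as a failed server start, a small pre-check can make the first run easier to diagnose. The sketch below is an assumption-laden illustration, not part of the original scripts: the function name require_vnc_passwd is hypothetical, and tightening the mode to 600 is my own precaution given that the file is only lightly protected.

```shell
#!/bin/bash
# Sketch only: require_vnc_passwd is a hypothetical helper.

# require_vnc_passwd HOME_DIR
# Succeeds if HOME_DIR/.vnc/passwd exists and is non-empty, and restricts
# it to the owner.  Otherwise prints a hint and returns 1.
require_vnc_passwd() {
    local passwd_file="$1/.vnc/passwd"
    if [ ! -s "$passwd_file" ]; then
        echo "missing $passwd_file: run vncpasswd once and save the result" >&2
        return 1
    fi
    chmod 600 "$passwd_file"
}

# Example call (not executed here):
# require_vnc_passwd /home/ec2-user || exit 1
```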

installGui.sh

#!/bin/bash

# echo each command
set -x

BIN_DIR="${BASH_SOURCE%/*}"
ROOT_DIR=$(dirname $BIN_DIR)
RSRC_DIR=$ROOT_DIR/rsrc
VNC_DIR=$RSRC_DIR/vnc

# Install user config files into ec2-user home directory
# if it is available.  In practice, this should always
# be true

if [ -d "/home/ec2-user" ]
then
   USER_ACCT=ec2-user
else
   USER_ACCT=hadoop
fi

HOME_DIR="/home"

# Use existence of hadoop home directory as proxy to determine if
# this is an EMR system.  Can be used later to differentiate
# steps on EC2 system vs EMR.
if [ -d "/home/hadoop" ]
then
    IS_EMR=1
else
    IS_EMR=0
fi


# execute Xfce desktop install
. "$BIN_DIR/installXfce.sh"

# now roughly follow the following from step 3: https://devopscube.com/setup-gui-for-amazon-ec2-linux/

sudo yum install -y pixman pixman-devel libXfont

sudo yum -y install tigervnc-server


# install the user account configuration file.
# This setup assumes the user will always connect to the VNC server
# through an SSH tunnel.  This is generally more secure, easier to
# configure and easier to get correct than trying to allow direct
# connections via TCP.
# Therefore, config VNC server to only accept local connections, and
# no password required.
sudo cp $VNC_DIR/vncservers-$USER_ACCT /etc/sysconfig/vncservers

# install the user account, vnc config files

sudo mkdir $HOME_DIR/$USER_ACCT/.vnc
sudo chown $USER_ACCT:$USER_ACCT $HOME_DIR/$USER_ACCT/.vnc

# need xstartup file to tell vncserver to start the window manager
sudo cp $VNC_DIR/xstartup $HOME_DIR/$USER_ACCT/.vnc/
sudo chown $USER_ACCT:$USER_ACCT $HOME_DIR/$USER_ACCT/.vnc/xstartup

# Even though the VNC server is config'd to not require a passwd, the
# server still looks for the passwd file when it starts the session.
# It will fail if the passwd file is not found.
# The first time these scripts are run, the final step will fail.
# Then manually run
#> vncpasswd
# It will create the file ~/.vnc/passwd.  Then save this file to persistent
# storage so that it can be installed to the user account during
# server initialization.

# passwd is stored with the other VNC resources (see the file list in Step 4)
sudo cp $VNC_DIR/passwd $HOME_DIR/$USER_ACCT/.vnc/
sudo chown $USER_ACCT:$USER_ACCT $HOME_DIR/$USER_ACCT/.vnc/passwd

# This script will be running as root if called from the EC2 launch
# command.  VNC server needs to be started as the user that
# you will connect to the server as (eg. ec2-user, hadoop, etc.)
sudo su -c "sudo service vncserver start" -s /bin/sh $USER_ACCT

# how to stop vncserver
# vncserver -kill :1

# On the remote client
# 1. start the ssh tunnel
#> ssh -i ~/.ssh/<YOUR_KEY_FILE>.pem -L 5901:localhost:5901 -N ec2-user@<YOUR_SERVER_PUBLIC_IP>
#    for debugging connection use -vvv switch
# 2. connect to the vnc server using client on the remote machine.  When
#    prompted for the IP address, use 'localhost:5901'
#    This connects to port 5901 on your local machine, which is where the ssh
#    tunnel is listening.

vncservers

# The VNCSERVERS variable is a list of display:user pairs.
#
# Uncomment the lines below to start a VNC server on display :2
# as my 'myusername' (adjust this to your own).  You will also
# need to set a VNC password; run 'man vncpasswd' to see how
# to do that.  
#
# DO NOT RUN THIS SERVICE if your local area network is
# untrusted!  For a secure way of using VNC, see this URL:
# http://kbase.redhat.com/faq/docs/DOC-7028

# Use "-nolisten tcp" to prevent X connections to your VNC server via TCP.

# Use "-localhost" to prevent remote VNC clients connecting except when
# doing so through a secure tunnel.  See the "-via" option in the
# `man vncviewer' manual page.

# Use "-SecurityTypes None" to allow session login without a password.
# This should only be used in combination with "-localhost"
# Note: VNC server still looks for the passwd file in ~/.vnc directory
# when the session starts regardless of whether the user is
# required to enter a passwd.

# VNCSERVERS="2:myusername"
# VNCSERVERARGS[2]="-geometry 800x600 -nolisten tcp -localhost"
VNCSERVERS="1:ec2-user"
VNCSERVERARGS[1]="-geometry 1280x1024 -nolisten tcp -localhost -SecurityTypes None"
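
The comments above say that "-SecurityTypes None" should only be used together with "-localhost". As a rough illustration of that rule (the function name args_are_tunnel_only is hypothetical, and the check is plain string matching rather than anything VNC itself provides), an argument string could be vetted before it is installed:

```shell
#!/bin/bash
# Sketch only: args_are_tunnel_only is a hypothetical helper.

# args_are_tunnel_only "ARGS"
# Returns 0 unless passwordless login (-SecurityTypes None) is requested
# without -localhost, in which case it returns 1.
args_are_tunnel_only() {
    local args=" $1 "   # pad so every option is surrounded by spaces
    case "$args" in
        *" -SecurityTypes None "*)
            case "$args" in
                *" -localhost "*) return 0 ;;
                *) return 1 ;;
            esac
            ;;
        *) return 0 ;;
    esac
}

# The arguments used in the file above pass the check.
if args_are_tunnel_only "-geometry 1280x1024 -nolisten tcp -localhost -SecurityTypes None"; then
    echo "arguments restrict passwordless access to local connections"
fi
```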

xstartup

#!/bin/sh

unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
# exec /etc/X11/xinit/xinitrc
/usr/share/vte/termcap/xterm &
/usr/bin/startxfce4 &

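Step 3. Connect to your instance

Once the VNC server is installed and running on EC2, you can try connecting to it. First open an SSH tunnel to your instance. 5901 is the port on which the VNC server listens for display 1, as set in the vncservers file. It listens for display 2 on port 5902, and so on. This command creates a tunnel from port 5901 on your local machine to port 5901 on the instance.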

ssh -i ~/.ssh/<YOUR_KEY_FILE>.pem -L 5901:localhost:5901 -N ec2-user@<YOUR_SERVER_PUBLIC_IP>

Now open your preferred VNC client. Where it prompts for the server IP address, enter:

localhost:5901
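
The display-to-port rule described above (display 1 on 5901, display 2 on 5902, and so on) is simply 5900 plus the display number. A small sketch of that arithmetic follows; the function names vnc_port and tunnel_cmd are hypothetical and only print the tunnel command rather than running it.

```shell
#!/bin/bash
# Sketch only: vnc_port and tunnel_cmd are hypothetical helpers.

# vnc_port DISPLAY -> TCP port the VNC server uses for that display
vnc_port() {
    echo $((5900 + $1))
}

# tunnel_cmd DISPLAY KEY_FILE USER@HOST -> prints the matching ssh tunnel command
tunnel_cmd() {
    local display="$1" key="$2" host="$3"
    local port
    port=$(vnc_port "$display")
    echo "ssh -i $key -L $port:localhost:$port -N $host"
}

# Prints the same command as shown above for display 1.
tunnel_cmd 1 '~/.ssh/<YOUR_KEY_FILE>.pem' 'ec2-user@<YOUR_SERVER_PUBLIC_IP>'
```

For display 2 the client address would then be localhost:5902.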

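If nothing happens at all, then either there was a problem starting the VNC server, or there is a connectivity problem preventing the client from reaching the server, or possibly a problem with the vncservers config file.

If a window comes up but it is just blank, check that the Xfce install completed successfully and that the xstartup file is installed.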
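
To narrow down the "nothing happens" case, one option is to test whether anything accepts connections on the forwarded port. This is only a sketch under a few assumptions: it relies on bash's /dev/tcp redirection and the coreutils timeout command being available, and port_open is a made-up name. Run on the local machine it only shows that the local end of the tunnel is listening; run on the instance it shows whether the VNC server is listening.

```shell
#!/bin/bash
# Sketch only: port_open is a hypothetical helper.

# port_open HOST PORT
# Returns 0 if a TCP connection to HOST:PORT succeeds within 2 seconds.
port_open() {
    local host="$1" port="$2"
    timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

if port_open localhost 5901; then
    echo "port 5901 accepts connections"
else
    echo "nothing is accepting connections on port 5901"
fi
```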

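Step 4. Simplify

If you only need to do this once, then transferring the scripts to your instance with sftp and running them manually is fine. Otherwise you will want to automate as much as possible, so that launching an instance with a GUI is faster and less error prone when you do need it.

The first step toward automation is to create an EFS volume containing the scripts and config files, which can be mounted when the instance starts. Amazon has plenty of info on creating a network file system. There are a few things to watch when creating the volume. If you do not want your volume open to the world, you will probably want to create a custom security group to use with your EFS volume. I created a security group for my EFS volume (call it NFS_Mount) that allows inbound TCP traffic on port 2049 only from one of my other security groups, call it MasterVNC. Then when you create an instance, make sure to associate the MasterVNC security group with that instance. Otherwise the EFS volume will not allow your instance to connect to it.

Now mount the EFS volume: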

sudo mkdir /mnt/YOUR_MOUNT_POINT_DIR
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-YOUR_EFS_ID.efs.us-east-1.amazonaws.com:/ /mnt/YOUR_MOUNT_POINT_DIR
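
If the mount step may run more than once on the same instance, it can be wrapped so that a second run is harmless. The following is a sketch under stated assumptions: efs_dns_name and mount_efs are hypothetical names, the host name pattern is copied from the mount command above, and the mountpoint utility is assumed to be installed.

```shell
#!/bin/bash
# Sketch only: efs_dns_name and mount_efs are hypothetical helpers.

# efs_dns_name FS_ID REGION -> host name in the form used by the mount command above
efs_dns_name() {
    echo "$1.efs.$2.amazonaws.com"
}

# mount_efs FS_ID REGION MOUNT_POINT
# Mounts the volume unless something is already mounted at MOUNT_POINT.
mount_efs() {
    local fs_id="$1" region="$2" mount_point="$3"
    if mountpoint -q "$mount_point" 2>/dev/null; then
        echo "$mount_point is already mounted"
        return 0
    fi
    sudo mkdir -p "$mount_point"
    sudo mount -t nfs4 \
        -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
        "$(efs_dns_name "$fs_id" "$region"):/" "$mount_point"
}

# Example call (not executed here):
# mount_efs fs-YOUR_EFS_ID us-east-1 /mnt/YOUR_MOUNT_POINT_DIR
```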

Now populate /mnt/YOUR_MOUNT_POINT_DIR with the 6 files mentioned in Steps 1 and 2, using the following directory structure. Recall that the passwd file must be created the first time with the command 'vncpasswd'. It will create the file at ~/.vnc/passwd.

/mnt/YOUR_MOUNT_POINT_DIR/bin/installGui.sh
/mnt/YOUR_MOUNT_POINT_DIR/bin/installXfce.sh

/mnt/YOUR_MOUNT_POINT_DIR/rsrc/vnc/vncservers-ec2-user
/mnt/YOUR_MOUNT_POINT_DIR/rsrc/vnc/xstartup
/mnt/YOUR_MOUNT_POINT_DIR/rsrc/vnc/passwd

/mnt/YOUR_MOUNT_POINT_DIR/rsrc/yum/yum.repos.d/centos.repo
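
Before launching anything from the volume, it may be worth confirming that all six files are in place. This is a minimal sketch; check_layout is a hypothetical name, and the relative paths are taken from the list above.

```shell
#!/bin/bash
# Sketch only: check_layout is a hypothetical helper.

# check_layout ROOT
# Reports every expected file missing under ROOT; returns 1 if any is missing.
check_layout() {
    local root="$1" missing=0 f
    for f in \
        bin/installGui.sh \
        bin/installXfce.sh \
        rsrc/vnc/vncservers-ec2-user \
        rsrc/vnc/xstartup \
        rsrc/vnc/passwd \
        rsrc/yum/yum.repos.d/centos.repo
    do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (not executed here):
# check_layout /mnt/YOUR_MOUNT_POINT_DIR
```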

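At this point, setting up an instance with a GUI should be quite easy. Create your instance as you normally would (making sure to include the MasterVNC security group), ssh to the instance, mount the EFS volume, and run the installGui.sh script.

Step 5. Automate

You can take this a step further and launch your instance in one step using the AWS CLI tools on your local machine. To do this, you need to mount the EFS volume and run the installGui.sh script by way of arguments to the AWS CLI command. This just requires creating a top-level script and passing it to the CLI command.

Of course there are a few complications. EC2 and EMR use different switches and mechanisms to attach scripts. Also, on EMR I only want the GUI installed on the master node (not on the core or task nodes).

Launching an EC2 instance requires embedding the script in the command with the --user-data switch. This is easily done by specifying the absolute path to the script file on your local machine.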

aws ec2 run-instances --user-data file:///PATH_TO_YOUR_SCRIPT/top.sh  ... other options

EMR launch does not support embedding a script from a local file. Instead, you can specify an S3 URI in the bootstrap actions.

aws emr create-cluster --bootstrap-actions '[{"Path":"s3://YOUR_BUCKET/YOUR_DIR/top.sh","Name":"Custom action"}]' ... other options

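Finally, you will see in top.sh below that most of the script is a function to determine whether the machine is a basic EC2 instance or an EMR master. If not for that, the script could be as short as 3 lines. You may wonder why I did not just use the built-in 'run-if' bootstrap action instead of writing my own function. The built-in 'run-if' script has a bug and does not properly run scripts located in S3.

Debugging these can be a challenge once they are placed in the init sequence. One thing that can help is the log file /var/log/cloud-init-output.log. It captures all console output from the scripts run during bootstrap initialization.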
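
Because the install scripts run with 'set -x', each executed command should appear in that log prefixed with '+' (bash's default trace prefix), which makes it possible to pull out just the command trace. A rough sketch, assuming the default prefix has not been changed; trace_lines is a hypothetical name:

```shell
#!/bin/bash
# Sketch only: trace_lines is a hypothetical helper.

# trace_lines [LOG_FILE] [COUNT]
# Prints the last COUNT (default 20) traced commands, with their line
# numbers in the log, so the last command before a failure is easy to find.
trace_lines() {
    local log_file="${1:-/var/log/cloud-init-output.log}"
    grep -n '^+' "$log_file" | tail -n "${2:-20}"
}

# Example call (not executed here):
# trace_lines /var/log/cloud-init-output.log 40
```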

top.sh

#!/bin/bash

# note: conditional bootstrap function run-if has a bug, workaround ...
# this function adapted from https://forums.aws.amazon.com/thread.jspa?threadID=222418
# Determine if we are running on the master node.
# Exit status (note: the reverse of the usual shell success/failure convention):
# 1 - running on master, or non EMR node
# 0 - running on a task or core node

check_if_master_or_non_emr() {
    python - <<'__SCRIPT__'
import sys
import json

instance_file = "/mnt/var/lib/info/instance.json"

try:
    with open(instance_file) as f:
        props = json.load(f)
    is_master_or_non_emr = props.get('isMaster', False)

except IOError as ex:
    is_master_or_non_emr = True   # file will not exist when testing on a non-emr machine

if is_master_or_non_emr:
    sys.exit(1)
else:
    sys.exit(0)
__SCRIPT__
}

check_if_master_or_non_emr
IS_MASTER_OR_NON_EMR=$?

# If this machine is part of EMR cluster, then ONLY install on the MASTER node

if [ $IS_MASTER_OR_NON_EMR -eq 1 ]
then
    sudo mkdir /mnt/YOUR_MOUNT_POINT_DIR

    sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-YOUR_EFS_ID.efs.us-east-1.amazonaws.com:/ /mnt/YOUR_MOUNT_POINT_DIR

    . /mnt/YOUR_MOUNT_POINT_DIR/bin/installGui.sh
fi

exit 0
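
For comparison, the same master-node test can be approximated without Python. This is not the method used in top.sh, only an alternative sketch: is_master_or_non_emr is a hypothetical name, it matches the "isMaster" key with a plain grep instead of parsing the JSON, and it follows the usual shell convention (status 0 means "install here"), which is the opposite of the exit codes used by the Python helper above.

```shell
#!/bin/bash
# Sketch only: is_master_or_non_emr is a hypothetical helper.

# is_master_or_non_emr [INFO_FILE]
# Returns 0 on an EMR master node or on a machine without the EMR info
# file, and non-zero on core or task nodes.
is_master_or_non_emr() {
    local info_file="${1:-/mnt/var/lib/info/instance.json}"
    # No info file means this is not an EMR node.
    [ -f "$info_file" ] || return 0
    grep -Eq '"isMaster"[[:space:]]*:[[:space:]]*true' "$info_file"
}

# Example use (not executed here):
# if is_master_or_non_emr; then . /mnt/YOUR_MOUNT_POINT_DIR/bin/installGui.sh; fi
```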