Fastest way to install Tesseract on Elastic Beanstalk

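I'm currently using Tika to extract text from files uploaded to my Rails app, which runs on AWS Elastic Beanstalk (64bit Amazon Linux 2016.03 v2.1.2 running Ruby 2.2). I'd also like to index scanned images, so I need to install Tesseract.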

I was able to get it working by installing it from source like this, but it adds 10 minutes to every deploy to a new instance. Is there a faster way?

.ebextensions/02-tesseract.config

packages:
  yum:
    autoconf: []
    automake: []
    libtool: []
    libpng-devel: []
    libtiff-devel: []
    zlib-devel: []

container_commands:
  01-command:
    command: mkdir -p install
    cwd: /home/ec2-user
  02-command:
    command: cp .ebextensions/scripts/install_tesseract.sh /home/ec2-user/install/
  03-command:
    command: bash install/install_tesseract.sh
    cwd: /home/ec2-user

.ebextensions/scripts/install_tesseract.sh

#!/usr/bin/env bash

cd_to_install () {
  cd /home/ec2-user/install
}

cd_to () {
  # Change into the named subdirectory of the install directory
  cd "/home/ec2-user/install/$1"
}

if ! [ -x "$(command -v tesseract)" ]; then
  # Add `/usr/local/bin` to PATH
  echo 'pathmunge /usr/local/bin' > /etc/profile.d/usr_local.sh
  chmod +x /etc/profile.d/usr_local.sh

  # Install leptonica
  cd_to_install
  wget http://www.leptonica.org/source/leptonica-1.73.tar.gz
  tar -zxvf leptonica-1.73.tar.gz
  cd_to leptonica-1.73
  ./configure
  make
  make install
  rm -rf /home/ec2-user/install/leptonica-1.73.tar.gz
  rm -rf /home/ec2-user/install/leptonica-1.73

  # Install tesseract ~ the jewel of Odin's treasure room
  cd_to_install
  wget https://github.com/tesseract-ocr/tesseract/archive/3.04.01.tar.gz
  tar -zxvf 3.04.01.tar.gz
  cd_to tesseract-3.04.01
  ./autogen.sh
  ./configure
  make
  make install
  ldconfig
  rm -rf /home/ec2-user/install/3.04.01.tar.gz
  rm -rf /home/ec2-user/install/tesseract-3.04.01

  # Install tessdata
  cd_to_install
  wget https://github.com/tesseract-ocr/tessdata/archive/3.04.00.tar.gz
  tar -zxvf 3.04.00.tar.gz
  cp /home/ec2-user/install/tessdata-3.04.00/eng.* /usr/local/share/tessdata/
  rm -rf /home/ec2-user/install/3.04.00.tar.gz
  rm -rf /home/ec2-user/install/tessdata-3.04.00
fi

Short answer

.ebextensions/02-tesseract.config

commands:
  01-libwebp:
    command: "yum --enablerepo=epel --disablerepo=amzn-main -y install libwebp"
  02-tesseract:
    command: "yum --enablerepo=epel -y install tesseract"

Long answer

I'm not that familiar with non-Ubuntu package managers or ebextensions, so after some digging, I found that there are precompiled binaries that can be installed on Amazon Linux in the stable EPEL repo.

The first hurdle was figuring out how to use the EPEL repo. The easiest way is to use the enablerepo option on the yum command.

That gets us here:

yum --enablerepo=epel install tesseract
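If you want to confirm that the instance actually has an EPEL repo definition before relying on --enablerepo, something like the sketch below might help. The helper name repo_listed is made up for illustration, and it assumes that `yum repolist all` prints the repo id at the start of each line (optionally prefixed by `!` or `*`), which may not hold for every yum version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original answer): read the output of
# `yum repolist all` on stdin and succeed if the given repo id is listed.
repo_listed() {
  # $1 is the repo id to look for, e.g. "epel".
  # Assumption: the id starts the line and is followed by "/", whitespace,
  # or end of line.
  grep -Eqi "^[!*]*$1(/|[[:space:]]|\$)"
}

# Example usage on the instance:
#   yum repolist all | repo_listed epel && echo "epel repo is defined"
```

This only tells you the repo is defined, not that it is enabled, which is why the commands here still pass --enablerepo=epel explicitly.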

Next, I had to resolve this dependency error:

[root@ip-10-0-1-193 ec2-user]# yum install --enablerepo=epel tesseract
Loaded plugins: priorities, update-motd, upgrade-helper
951 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package tesseract.x86_64 0:3.04.00-3.el6 will be installed
--> Processing Dependency: liblept.so.4()(64bit) for package: tesseract-3.04.00-3.el6.x86_64
--> Running transaction check
---> Package leptonica.x86_64 0:1.72-2.el6 will be installed
--> Processing Dependency: libwebp.so.5()(64bit) for package: leptonica-1.72-2.el6.x86_64
--> Finished Dependency Resolution
Error: Package: leptonica-1.72-2.el6.x86_64 (epel)
           Requires: libwebp.so.5()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
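When the yum output is long, the unmet requirement can be pulled out with a small filter. This is only a sketch: the function name missing_requires is hypothetical, and it assumes yum reports unmet dependencies on indented "Requires:" lines as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original answer): read yum output on
# stdin and print only the capabilities named on "Requires:" lines.
missing_requires() {
  # Strip leading whitespace and the "Requires:" label, print what is left.
  sed -n 's/^[[:space:]]*Requires:[[:space:]]*//p'
}

# Example usage on the instance:
#   yum --enablerepo=epel install tesseract 2>&1 | missing_requires
```

For the error above this would print libwebp.so.5()(64bit), which is the library that has to come from somewhere other than the default repo priority.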

I found the solution here:

Just adding the epel repo doesn't solve it, as the packages in the amzn-main repository seem to overrule those in the epel repository. If the libwebp package in the amzn-main repo are excluded it should work

The Tesseract install has some dependencies in the amzn-main repo. That's why I install libwebp first, with --disablerepo=amzn-main:

yum --enablerepo=epel --disablerepo=amzn-main install libwebp
yum --enablerepo=epel install tesseract

Finally, this is how you can install yum packages on Elastic Beanstalk with options:

.ebextensions/02-tesseract.config

commands:
  01-libwebp:
    command: "yum --enablerepo=epel --disablerepo=amzn-main -y install libwebp"
  02-tesseract:
    command: "yum --enablerepo=epel -y install tesseract"

Happily, this also turns out to be the easiest way to install Tesseract on Elastic Beanstalk!
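To check that the install actually worked on a fresh instance, a small smoke check along these lines could be run over SSH or from a later deploy step. It is a sketch, not something from my original setup: the function names have_cmd and check_tesseract are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical smoke check (not part of the original answer).

# Succeed if the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print the installed Tesseract version and its available languages,
# or report an error and return non-zero if the binary is missing.
check_tesseract() {
  if ! have_cmd tesseract; then
    echo "tesseract not found on PATH" >&2
    return 1
  fi
  tesseract --version
  tesseract --list-langs
}

# Example usage on the instance:
#   check_tesseract
```

The --list-langs output shows which language data the package shipped with, which is worth a look before pointing Tika at scanned images.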