在 Elastic Beanstalk 上安装 Tesseract 的最快方法
Fastest way to install Tesseract on Elastic Beanstalk
我目前在 AWS Elastic Beanstalk(64 位亚马逊 Linux 2016.03 v2.1.2 运行 上使用 Tika 从上传到我的 Rails 应用程序 运行 的文件中提取文本Ruby 2.2).我也想索引扫描图像,所以我需要安装 Tesseract。
我能够通过像这样从源代码安装它来让它工作,但是它增加了 10 分钟的时间来部署到一个新的实例。有更快的方法吗?
.ebextensions/02-tesseract.config
packages:
yum:
autoconf: []
automake: []
libtool: []
libpng-devel: []
libtiff-devel: []
zlib-devel: []
container_commands:
01-command:
command: mkdir -p install
cwd: /home/ec2-user
02-command:
command: cp .ebextensions/scripts/install_tesseract.sh /home/ec2-user/install/
03-command:
command: bash install/install_tesseract.sh
cwd: /home/ec2-user
.ebextensions/scripts/install_tesseract.sh
#!/usr/bin/env bash
cd_to_install () {
cd /home/ec2-user/install
}
cd_to () {
cd /home/ec2-user/install/
}
if ! [ -x "$(command -v tesseract)" ]; then
# Add `usr/local/bin` to PATH
echo 'pathmunge /usr/local/bin' > /etc/profile.d/usr_local.sh
chmod +x /etc/profile.d/usr_local.sh
# Install leptonica
cd_to_install
wget http://www.leptonica.org/source/leptonica-1.73.tar.gz
tar -zxvf leptonica-1.73.tar.gz
cd_to leptonica-1.73
./configure
make
make install
rm -rf /home/ec2-user/install/leptonica-1.73.tar.gz
rm -rf /home/ec2-user/install/leptonica-1.73
# Install tesseract ~ the jewel of Odin's treasure room
cd_to_install
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.01.tar.gz
tar -zxvf 3.04.01.tar.gz
cd_to tesseract-3.04.01
./autogen.sh
./configure
make
make install
ldconfig
rm -rf /home/ec2-user/install/3.04.01.tar.gz
rm -rf /home/ec2-user/install/tesseract-3.04.01
# Install tessdata
cd_to_install
wget https://github.com/tesseract-ocr/tessdata/archive/3.04.00.tar.gz
tar -zxvf 3.04.00.tar.gz
cp /home/ec2-user/install/tessdata-3.04.00/eng.* /usr/local/share/tessdata/
rm -rf /home/ec2-user/install/3.04.00.tar.gz
rm -rf /home/ec2-user/install/tessdata-3.04.00
fi
简答
.ebextensions/02-tesseract.config
commands:
01-libwebp:
command: "yum --enablerepo=epel --disablerepo=amzn-main -y install libwebp"
02-tesseract:
command: "yum --enablerepo=epel -y install tesseract"
长答案
我不熟悉非 Ubuntu 包管理器或 ebextensions,所以在 some digging, I found that there are precompiled binaries that can be installed on Amazon Linux in the stable EPEL repo.
之后
第一个障碍是弄清楚 how to use the EPEL repo。最简单的方法是在 yum
命令上使用 enablerepo
选项。
这让我们来到这里:
yum --enablerepo=epel install tesseract
接下来,我必须解决这个依赖错误:
[root@ip-10-0-1-193 ec2-user]# yum install --enablerepo=epel tesseract
Loaded plugins: priorities, update-motd, upgrade-helper
951 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package tesseract.x86_64 0:3.04.00-3.el6 will be installed
--> Processing Dependency: liblept.so.4()(64bit) for package: tesseract-3.04.00-3.el6.x86_64
--> Running transaction check
---> Package leptonica.x86_64 0:1.72-2.el6 will be installed
--> Processing Dependency: libwebp.so.5()(64bit) for package: leptonica-1.72-2.el6.x86_64
--> Finished Dependency Resolution
Error: Package: leptonica-1.72-2.el6.x86_64 (epel)
Requires: libwebp.so.5()(64bit)
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
我找到了解决方案here
Just adding the epel repo doesn't solve it, as the packages in the
amzn-main repository seem to overrule those in the epel repository. If
the libwebp package in the amzn-main repo are excluded it should work
Tesseract 安装在 amzn-main
存储库中有一些依赖项。这就是为什么我首先安装 libwebp
和 --disablerepo=amzn-main
。
yum --enablerepo=epel --disablerepo=amzn-main install libwebp
yum --enablerepo=epel install tesseract
最后,您可以通过以下方式 install yum packages on Elastic Beanstalk with options:
.ebextensions/02-tesseract.config
commands:
01-libwebp:
command: "yum --enablerepo=epel --disablerepo=amzn-main -y install libwebp"
02-tesseract:
command: "yum --enablerepo=epel -y install tesseract"
幸运的是,这也是在 Elastic Beanstalk 上安装 Tesseract 最简单的方法!
我目前在 AWS Elastic Beanstalk(64 位亚马逊 Linux 2016.03 v2.1.2 运行 上使用 Tika 从上传到我的 Rails 应用程序 运行 的文件中提取文本Ruby 2.2).我也想索引扫描图像,所以我需要安装 Tesseract。
我能够通过像这样从源代码安装它来让它工作,但是它增加了 10 分钟的时间来部署到一个新的实例。有更快的方法吗?
.ebextensions/02-tesseract.config
packages:
yum:
autoconf: []
automake: []
libtool: []
libpng-devel: []
libtiff-devel: []
zlib-devel: []
container_commands:
01-command:
command: mkdir -p install
cwd: /home/ec2-user
02-command:
command: cp .ebextensions/scripts/install_tesseract.sh /home/ec2-user/install/
03-command:
command: bash install/install_tesseract.sh
cwd: /home/ec2-user
.ebextensions/scripts/install_tesseract.sh
#!/usr/bin/env bash
cd_to_install () {
cd /home/ec2-user/install
}
cd_to () {
cd /home/ec2-user/install/
}
if ! [ -x "$(command -v tesseract)" ]; then
# Add `usr/local/bin` to PATH
echo 'pathmunge /usr/local/bin' > /etc/profile.d/usr_local.sh
chmod +x /etc/profile.d/usr_local.sh
# Install leptonica
cd_to_install
wget http://www.leptonica.org/source/leptonica-1.73.tar.gz
tar -zxvf leptonica-1.73.tar.gz
cd_to leptonica-1.73
./configure
make
make install
rm -rf /home/ec2-user/install/leptonica-1.73.tar.gz
rm -rf /home/ec2-user/install/leptonica-1.73
# Install tesseract ~ the jewel of Odin's treasure room
cd_to_install
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.01.tar.gz
tar -zxvf 3.04.01.tar.gz
cd_to tesseract-3.04.01
./autogen.sh
./configure
make
make install
ldconfig
rm -rf /home/ec2-user/install/3.04.01.tar.gz
rm -rf /home/ec2-user/install/tesseract-3.04.01
# Install tessdata
cd_to_install
wget https://github.com/tesseract-ocr/tessdata/archive/3.04.00.tar.gz
tar -zxvf 3.04.00.tar.gz
cp /home/ec2-user/install/tessdata-3.04.00/eng.* /usr/local/share/tessdata/
rm -rf /home/ec2-user/install/3.04.00.tar.gz
rm -rf /home/ec2-user/install/tessdata-3.04.00
fi
简答
.ebextensions/02-tesseract.config
commands:
01-libwebp:
command: "yum --enablerepo=epel --disablerepo=amzn-main -y install libwebp"
02-tesseract:
command: "yum --enablerepo=epel -y install tesseract"
长答案
我不熟悉非 Ubuntu 包管理器或 ebextensions,所以在 some digging, I found that there are precompiled binaries that can be installed on Amazon Linux in the stable EPEL repo.
之后第一个障碍是弄清楚 how to use the EPEL repo。最简单的方法是在 yum
命令上使用 enablerepo
选项。
这让我们来到这里:
yum --enablerepo=epel install tesseract
接下来,我必须解决这个依赖错误:
[root@ip-10-0-1-193 ec2-user]# yum install --enablerepo=epel tesseract
Loaded plugins: priorities, update-motd, upgrade-helper
951 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package tesseract.x86_64 0:3.04.00-3.el6 will be installed
--> Processing Dependency: liblept.so.4()(64bit) for package: tesseract-3.04.00-3.el6.x86_64
--> Running transaction check
---> Package leptonica.x86_64 0:1.72-2.el6 will be installed
--> Processing Dependency: libwebp.so.5()(64bit) for package: leptonica-1.72-2.el6.x86_64
--> Finished Dependency Resolution
Error: Package: leptonica-1.72-2.el6.x86_64 (epel)
Requires: libwebp.so.5()(64bit)
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
我找到了解决方案here
Just adding the epel repo doesn't solve it, as the packages in the amzn-main repository seem to overrule those in the epel repository. If the libwebp package in the amzn-main repo are excluded it should work
Tesseract 安装在 amzn-main
存储库中有一些依赖项。这就是为什么我首先安装 libwebp
和 --disablerepo=amzn-main
。
yum --enablerepo=epel --disablerepo=amzn-main install libwebp
yum --enablerepo=epel install tesseract
最后,您可以通过以下方式 install yum packages on Elastic Beanstalk with options:
.ebextensions/02-tesseract.config
commands:
01-libwebp:
command: "yum --enablerepo=epel --disablerepo=amzn-main -y install libwebp"
02-tesseract:
command: "yum --enablerepo=epel -y install tesseract"
幸运的是,这也是在 Elastic Beanstalk 上安装 Tesseract 最简单的方法!