Stirling pdf
- 手动搭建
- docker搭建
官网:https://github.com/Stirling-Tools/Stirling-PDF
手动搭建
Ubuntu2404环境
安装所需软件包
apt install -y git automake autoconf libtool libleptonica-dev pkg-config zlib1g-dev make g++ openjdk-21-jdk python3 python3-pip
克隆并构建 jbig2enc,如果拉取失败,可以去gitlab下载,再上传
mkdir ~/.git
cd ~/.git &&\
git clone https://github.com/agl/jbig2enc.git &&\
cd jbig2enc &&\
./autogen.sh &&\
./configure &&\
make &&\
sudo make install
安装 LibreOffice 用于转换,安装 ocrmypdf 用于 OCR,以及安装 opencv 用于模式识别功能
apt install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper ocrmypdf pip3 install uno opencv-python-headless unoconv pngquant WeasyPrint --break-system-packages
克隆并构建 Stirling-PDF
cd ~/.git &&\
git clone https://github.com/Stirling-Tools/Stirling-PDF.git &&\
cd Stirling-PDF &&\
chmod +x ./gradlew &&\
./gradlew build
如果报超时:就手动下载/gradle-8.7-bin.zip,然后上传到Stirling-PDF,执行./gradlew build
Exception in thread “main” java.io.IOException: Downloading from https://services.gradle.org/distributions/gradle-8.7-bin.zip failed: timeout
构建过程完成后,目录.jar中将生成一个文件build/libs。可以将此文件移动到所需位置,例如/opt/Stirling-PDF/。还必须将下载的 Stirling-PDF 存储库中的脚本文件夹移动到此目录。使用 OpenCV 的 Python 脚本需要此文件夹
mkdir /opt/Stirling-PDF
mv ./build/libs/Stirling-PDF-*.jar /opt/Stirling-PDF/
mv scripts /opt/Stirling-PDF/
安装语言包
apt install -y 'tesseract-ocr-*'
运行 Stirling-PDF,两者皆可
./gradlew bootRun
java -jar /opt/Stirling-PDF/Stirling-PDF-*.jar
如果出现
[Thread-7] INFO s.s.SPDF.utils.ProcessExecutor - mkdir: cannot create directory ‘/run/user/1501’: Permission denied
则配置
mkdir temp
export DBUS_SESSION_BUS_ADDRESS="unix:path=./temp"
重新启动
java -jar ./Stirling-PDF-*.jar
界面访问:IP:8080
测试
可选:将 Stirling-PDF 作为服务运行
创建一个.env 文件,可以在其中存储环境变量
touch /opt/Stirling-PDF/.env
vim /etc/systemd/system/stirlingpdf.service
[Unit]
Description=Stirling-PDF service
After=syslog.target network.target
[Service]
SuccessExitStatus=143
User=root
Group=root
Type=simple
EnvironmentFile=/opt/Stirling-PDF/.env
WorkingDirectory=/opt/Stirling-PDF
ExecStart=/usr/bin/java -jar Stirling-PDF-0.17.2.jar
ExecStop=/bin/kill -15 $MAINPID
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start stirlingpdf.service
systemctl stop stirlingpdf.service
systemctl restart stirlingpdf.service
docker搭建
Ubuntu配置docker环境
apt -y install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
apt-get -y install docker-ce
设置开机自启
systemctl enable --now docker
拉取镜像
docker pull frooodle/s-pdf:latest
docker直接run
docker run -d \
-p 8080:8080 \
-v ./trainingData:/usr/share/tessdata \
-v ./extraConfigs:/configs \
-v ./logs:/logs \
-e DOCKER_ENABLE_SECURITY=false \
-e INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false \
-e LANGS=en_GB \
--name stirling-pdf \
frooodle/s-pdf:latest
或者compose也可
version: '3.3'
services:
stirling-pdf:
image: frooodle/s-pdf:latest
ports:
- '8080:8080'
volumes:
- ./trainingData:/usr/share/tessdata #Required for extra OCR languages
- ./extraConfigs:/configs
# - ./customFiles:/customFiles/
# - ./logs:/logs/
environment:
- DOCKER_ENABLE_SECURITY=false
- INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
- LANGS=en_GB
界面直接访问:IP:8080