pom.xml dependency:
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>5.11.0</version>
</dependency>
Part of the code:
package com.zy.datapickcli.sys.controller;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

import java.io.File;

public class TempTest {

    public static void main(String[] args) throws TesseractException {
        File file = new File("D:\\1.png");
        System.out.println(recognizeText(file));
    }

    public static String recognizeText(File imageFile) throws TesseractException {
        Tesseract tesseract = new Tesseract();
        // Set the location of the trained data files (for standard English recognition this step can be omitted)
        // This folder must contain chi_sim.traineddata, matching the language set below
        tesseract.setDatapath("D:\\tessdata");
        tesseract.setLanguage("chi_sim");
        return tesseract.doOCR(imageFile);
    }
}
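The example above hardcodes `D:\\tessdata`. Here is a minimal sketch for avoiding that path, assuming you want to configure it. It first checks a hypothetical system property `tessdata.path`. It then checks the `TESSDATA_PREFIX` environment variable, which Tesseract itself reads. Only then does it fall back to the original path. `TessdataPath` and `resolve` are hypothetical names for illustration. Note that older Tesseract 3.x setups used `TESSDATA_PREFIX` for the parent of the tessdata folder, so check which convention your installation follows.

```java
import java.util.Objects;

/** Hypothetical helper: picks the tessdata directory from configuration instead of a hardcoded path. */
public class TessdataPath {

    // Fallback taken from the example above
    static final String DEFAULT_PATH = "D:\\tessdata";

    /** Returns the first non-blank candidate, or the fallback when all candidates are null or blank. */
    public static String resolve(String fromProperty, String fromEnv, String fallback) {
        if (fromProperty != null && !fromProperty.isBlank()) {
            return fromProperty;
        }
        if (fromEnv != null && !fromEnv.isBlank()) {
            return fromEnv;
        }
        return Objects.requireNonNull(fallback, "fallback");
    }

    /** Reads -Dtessdata.path (hypothetical property name), then TESSDATA_PREFIX, then DEFAULT_PATH. */
    public static String resolve() {
        return resolve(System.getProperty("tessdata.path"), System.getenv("TESSDATA_PREFIX"), DEFAULT_PATH);
    }
}
```

With this helper, the hardcoded call in `recognizeText` would become `tesseract.setDatapath(TessdataPath.resolve());`.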
Download location for the tessdata (traineddata) files:
https://gitcode.com/tesseract-ocr/tessdata/tree/main
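Tesseract looks for a file named `<language>.traineddata` inside the data path. With `setLanguage("chi_sim")`, the tessdata folder must therefore contain `chi_sim.traineddata` downloaded from the repository above. Several languages can be combined with `+`, as in `chi_sim+eng`. The sketch below reports which of those files are missing before `doOCR` is ever called. `TrainedDataCheck` and `missingLanguages` are hypothetical names, not part of tess4j.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for the tessdata directory. */
public class TrainedDataCheck {

    /**
     * Returns the language codes from a Tesseract language string such as "chi_sim+eng"
     * whose "<code>.traineddata" file is missing from the given tessdata directory.
     */
    public static List<String> missingLanguages(Path tessdataDir, String language) {
        List<String> missing = new ArrayList<>();
        for (String code : language.split("\\+")) {
            String trimmed = code.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray "+" separators
            }
            if (!Files.isRegularFile(tessdataDir.resolve(trimmed + ".traineddata"))) {
                missing.add(trimmed);
            }
        }
        return missing;
    }
}
```

For the example above, you could call `TrainedDataCheck.missingLanguages(Paths.get("D:\\tessdata"), "chi_sim")`. If the returned list is non-empty, throw an exception naming the files instead of starting OCR.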
Other reference code:
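import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

@Service
public class OcrService {

    public String recognizeText(File imageFile) throws TesseractException {
        Tesseract tesseract = new Tesseract();
        // Set the location of the trained data files (for standard English recognition this step can be omitted)
        tesseract.setDatapath("path to your tessdata directory containing the language packs");
        tesseract.setLanguage("chi_sim");
        return tesseract.doOCR(imageFile);
    }

    public String recognizeTextFromUrl(String imageUrl) throws Exception {
        // Use a unique temp file instead of a fixed "downloaded.jpg" so concurrent requests don't overwrite each other
        Path tempFile = Files.createTempFile("ocr-", ".jpg");
        // try-with-resources closes the download stream; finally removes the temp file
        try (InputStream in = new URL(imageUrl).openStream()) {
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            return recognizeText(tempFile.toFile());
        } finally {
            Files.deleteIfExists(tempFile);
        }
    }
}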
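`recognizeTextFromUrl` always saves the download under a `.jpg` name, even when the server returns a PNG. As far as we understand tess4j's `doOCR(File)`, it chooses the image reader from the file extension. If so, a mismatched extension can make reading fail. Below is one possible workaround; it is our own sketch, not part of tess4j. It inspects the first bytes of the download, matching the PNG, JPEG, GIF, BMP and TIFF signatures, and picks the temp file's suffix to match. `ImageDownload`, `suffixFor`, and `downloadToTempFile` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: downloads an image and names the temp file after its actual format. */
public class ImageDownload {

    /** Picks a suffix from well-known file signatures; falls back to ".jpg" like the original code. */
    public static String suffixFor(byte[] head) {
        if (startsWith(head, 0x89, 0x50, 0x4E, 0x47)) return ".png";   // "\x89PNG"
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) return ".jpg";         // JPEG start-of-image marker
        if (startsWith(head, 0x47, 0x49, 0x46, 0x38)) return ".gif";   // "GIF8"
        if (startsWith(head, 0x42, 0x4D)) return ".bmp";               // "BM"
        if (startsWith(head, 0x49, 0x49, 0x2A, 0x00)                   // little-endian TIFF
                || startsWith(head, 0x4D, 0x4D, 0x00, 0x2A)) {         // big-endian TIFF
            return ".tif";
        }
        return ".jpg";
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    /** Downloads the URL into a new temp file whose suffix matches the detected format; the caller deletes it. */
    public static Path downloadToTempFile(String imageUrl) throws IOException {
        byte[] data;
        try (InputStream in = new URL(imageUrl).openStream()) {
            data = in.readAllBytes(); // Java 9+; images are assumed small enough to buffer in memory
        }
        Path tempFile = Files.createTempFile("ocr-", suffixFor(data));
        Files.write(tempFile, data);
        return tempFile;
    }
}
```

This helper could replace the download step in `recognizeTextFromUrl`. Call `Path tmp = ImageDownload.downloadToTempFile(imageUrl)`, then `recognizeText(tmp.toFile())`, and call `Files.deleteIfExists(tmp)` in a `finally` block.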
Execution result: