1. Download
1. Go to the official website download page
https://sourceforge.net/projects/tess4j/
2. Click download
3. Unzip the archive after downloading. Three of the extracted folders are needed: dist, lib, and tessdata
2. Use Tess4J
1. Add the JAR files under dist and lib to the Java project's build path
2. Copy the tessdata folder into the root directory of the project
3. The demo code is as follows
import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class OCRDemo {
    public static void main(String[] args) {
        try {
            long start = System.currentTimeMillis();
            File imageFile = new File("C:\\Users\\dan\\Desktop\\12345.png"); // Image location
            ITesseract instance = new Tesseract();
            // instance.setDatapath(""); // Set the tessdata location
            instance.setLanguage("chi_sim"); // Select the language data file
            String result = instance.doOCR(imageFile); // Run recognition
            long end = System.currentTimeMillis();
            System.out.println(result); // Print the recognized text
            System.out.println("Elapsed time: " + (end - start) / 1000.0 + "s");
        } catch (TesseractException e) {
            e.printStackTrace();
        }
    }
}
Notes:
① If tessdata is not placed in the project root directory, be sure to set the location of tessdata
instance.setDatapath(""); // Set tessdata location
② The language is selected by name, without the file extension (chi_sim, not chi_sim.traineddata). The Simplified Chinese data chi_sim may not be included in the bundled tessdata folder, in which case you need to download it yourself
https://github.com/tesseract-ocr/tessdata
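Both notes above come down to one question: is the language data file where Tesseract will look for it? A minimal sketch of a check that can be run before the demo is shown here. The class TessdataCheck and its methods are hypothetical helpers written for illustration, not part of Tess4J, and the sketch assumes that the data for a language named chi_sim is stored as chi_sim.traineddata directly inside the tessdata folder.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tess4J: checks that the language data
// file is present before OCR is attempted.
final class TessdataCheck {

    // Builds the expected file path, e.g. tessdata/chi_sim.traineddata.
    static Path trainedDataFile(Path tessdataDir, String language) {
        return tessdataDir.resolve(language + ".traineddata");
    }

    // Returns true only if the language data file exists as a regular file.
    static boolean hasLanguage(Path tessdataDir, String language) {
        return Files.isRegularFile(trainedDataFile(tessdataDir, language));
    }

    public static void main(String[] args) {
        // Assumption: tessdata sits in the project root, as in step 2.
        Path tessdataDir = Paths.get("tessdata");
        String language = "chi_sim";
        if (hasLanguage(tessdataDir, language)) {
            System.out.println("Found " + trainedDataFile(tessdataDir, language));
        } else {
            System.out.println("Missing " + trainedDataFile(tessdataDir, language)
                    + ", download it and copy it into " + tessdataDir.toAbsolutePath());
        }
    }
}
```

If the folder lives somewhere else, change the path passed to Paths.get and pass the same location to instance.setDatapath(...) in the demo.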
3. Results
The recognition rate with the official language data is still low. If you need high accuracy, you will have to train your own language data.