macOS: OCR and translate in the console

Received a purple envelope from Belastingdienst.nl with some important information about the 30% ruling, so I needed to get it into English.

That made for a good reason to figure this process out.


Tesseract-OCR looks like the best open source OCR package I could find. The whole project is at https://github.com/tesseract-ocr/tesseract, and Wikipedia credits Google as the developer: https://en.wikipedia.org/wiki/Tesseract_(software)

For the translation we will use Translate Shell, found at https://www.soimort.org/translate-shell/. This lovely little script supports a number of translation engines, so one of them should work for you.

1. Scan the letter.



Many options here, so I'm not going to waste time on how to get this done. If your scanner has a text mode, use it, and go for 300 DPI. We need TIFF on the other end, so if you can save as TIFF, just do that.
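
If the scanner only saves PNG or JPEG, one way to get a TIFF is sips, the image tool that ships with macOS. A minimal sketch, assuming a scan called letter.png; the function names tiff_name and to_tiff are made up for illustration:

```shell
#!/bin/sh
# Sketch: convert a scan to TIFF with sips (bundled with macOS).
# File and function names here are placeholders, not from the original post.

# Derive the TIFF name from the scan name, e.g. letter.png -> letter.tiff.
# Assumes the file name has an extension.
tiff_name() {
    printf '%s.tiff\n' "${1%.*}"
}

# Convert one scan to TIFF, written next to the original file.
to_tiff() {
    sips -s format tiff "$1" --out "$(tiff_name "$1")"
}
```

Calling `to_tiff letter.png` should then write letter.tiff next to the original.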

2. Install the software

brew install tesseract

brew install translate-shell
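
One thing to check for a Dutch letter: as far as I know, Homebrew's tesseract formula ships only a small set of language data (English included), and the rest comes from the separate tesseract-lang formula. A hedged sketch that checks for a language and installs the extra data when it is missing; has_lang and ensure_lang are hypothetical helper names:

```shell
#!/bin/sh
# Succeeds if Tesseract lists the given language code (nld is Dutch).
# 2>&1 because older Tesseract versions print the list on stderr.
has_lang() {
    tesseract --list-langs 2>&1 | grep -qx "$1"
}

# Install the extra language data only when the language is missing.
# Assumption: the Homebrew formula tesseract-lang provides it.
ensure_lang() {
    has_lang "$1" || brew install tesseract-lang
}
```

Running `ensure_lang nld` should be a no-op when Dutch is already available.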


3. Run the software

tesseract input.tiff output

trans -i output.txt
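
Both commands above run with their defaults: Tesseract assumes English unless told otherwise, and as far as I can tell trans translates into the language of your locale. For a Dutch letter it may help to name the languages explicitly. A sketch, assuming the Dutch language data (nld) is installed, for example via brew install tesseract-lang; ocr_translate is a made-up function name:

```shell
#!/bin/sh
# OCR a Dutch scan and print a brief English translation.
ocr_translate() {
    scan=$1
    base=${scan%.*}    # input.tiff -> input
    # -l nld selects the Dutch model; the text lands in "$base.txt".
    tesseract "$scan" "$base" -l nld || return 1
    # -b prints only the translation; nl:en means Dutch to English.
    trans -b -i "$base.txt" nl:en
}
```

Something like `ocr_translate input.tiff > letter-en.txt` would then keep the English text in a file.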