Data Science and Machine Learning projects

OCR pipeline

Implementation of a generic and configurable OCR web service for text extraction from scanned documents.
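As a rough illustration of what "generic and configurable" could mean here, the sketch below wires preprocessing, OCR engine, and post-processing steps together from a plain configuration dictionary. It is not the project's actual code: the registries, the step names (`binarize`, `invert`, `word_correction`), the `dummy` engine, and the config layout are all hypothetical, and the engine is a stand-in that returns fixed text so the example runs without an OCR library.

```python
from difflib import get_close_matches

# Hypothetical step registries; every name below is illustrative only.
PREPROCESSORS = {}
ENGINES = {}
POSTPROCESSORS = {}


def register(registry, name):
    """Decorator that stores a function in a registry under the given name."""
    def decorator(func):
        registry[name] = func
        return func
    return decorator


@register(PREPROCESSORS, "binarize")
def binarize(image, threshold=128):
    """Map each grayscale pixel (0-255) to 0 or 255."""
    return [[255 if px >= threshold else 0 for px in row] for row in image]


@register(PREPROCESSORS, "invert")
def invert(image):
    """Invert a grayscale image given as a list of pixel rows."""
    return [[255 - px for px in row] for row in image]


@register(ENGINES, "dummy")
def dummy_engine(image, text=""):
    """Stand-in for a real OCR engine: ignores the image, returns fixed text."""
    return text


@register(POSTPROCESSORS, "word_correction")
def word_correction(text, vocabulary=(), cutoff=0.8):
    """Replace each word with its closest vocabulary entry, if one is close enough."""
    corrected = []
    for word in text.split():
        match = get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)


def run_pipeline(image, config):
    """Run the configured preprocessing steps, engine, and post-processing steps."""
    for step in config.get("preprocess", []):
        image = PREPROCESSORS[step["name"]](image, **step.get("params", {}))
    engine = config["engine"]
    text = ENGINES[engine["name"]](image, **engine.get("params", {}))
    for step in config.get("postprocess", []):
        text = POSTPROCESSORS[step["name"]](text, **step.get("params", {}))
    return text


# Example configuration: a per-language vocabulary drives the word correction.
config = {
    "preprocess": [{"name": "binarize", "params": {"threshold": 100}}],
    "engine": {"name": "dummy", "params": {"text": "invoce total"}},
    "postprocess": [
        {"name": "word_correction", "params": {"vocabulary": ["invoice", "total"]}}
    ],
}
result = run_pipeline([[10, 200], [130, 90]], config)
```

With a layout like this, swapping the OCR engine or changing the language vocabulary only means editing the configuration, not the pipeline code.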

Configurable image-processing-based preprocessing modules, configurable modules for different OCR engines, and language-sensitive word-correction post-processing modules.

Garnishment document classification and enrichment

Testing and evaluation of different enrichment models.

Implementation of deep learning techniques for document classification and named-entity recognition.

Dockerization, integration, and testing of the web services.

Stamp recognition and information extraction

Recognition of specific stamps in scanned documents, with date and time extraction.

Implemented with Keras, scikit-learn, OpenCV.

Achieved 83% accuracy.
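The date and time extraction step might look something like the following once the stamp region has been turned into text. This is only a sketch under assumptions: the stamp layout (`DD.MM.YYYY`, optionally followed by `HH:MM`), the pattern, and the function name `extract_stamp_datetime` are hypothetical and not taken from the project.

```python
import re
from datetime import datetime

# Assumed stamp layout: day, month, four-digit year separated by ".", "/" or "-",
# optionally followed by whitespace and a time as HH:MM.
STAMP_PATTERN = re.compile(
    r"(?P<day>\d{1,2})[./-](?P<month>\d{1,2})[./-](?P<year>\d{4})"
    r"(?:\s+(?P<hour>\d{1,2}):(?P<minute>\d{2}))?"
)


def extract_stamp_datetime(text):
    """Return the first valid date (and time, if present) in text, else None."""
    for m in STAMP_PATTERN.finditer(text):
        try:
            return datetime(
                int(m["year"]), int(m["month"]), int(m["day"]),
                int(m["hour"] or 0), int(m["minute"] or 0),
            )
        except ValueError:
            # Skip impossible dates, e.g. a month of 13 caused by an OCR misread.
            continue
    return None
```

Validating through `datetime` rather than the regular expression alone means a misread such as `31.02.2020` is rejected instead of being passed on as a date.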

Arnab Dey