Data Science and Machine Learning projects

OCR pipeline
    Implemented a generic, configurable OCR web service for extracting text from scanned documents.
    Built configurable image-processing preprocessing modules, configurable modules for different OCR engines, and language-sensitive word-correction post-processing modules.
Garnishment document classification and enrichment
    Tested and evaluated different enrichment models.
    Implemented deep learning techniques for document classification and named-entity recognition.
    Dockerized, integrated, and tested the web services.
Stamp recognition and information extraction
    Recognition of specific stamps in scanned documents, with date and time extraction.
    Implemented with Keras, scikit-learn, and OpenCV.
    Achieved 83% accuracy.