A System and Method for Image-to-Text Conversion
Technology #ua18-094
Title: A System and Method for Image-to-Text Conversion
Invention: This technology is software for converting scanned printed and handwritten text into Unicode text. Combining machine learning algorithms with image processing, the system improves on existing optical character recognition (OCR) technology.
Background: OCR technologies provide a simpler alternative to entering metadata manually. By digitizing documents, they have revolutionized document management and turned many offices into paperless workspaces. Current OCR algorithms are designed for modern layouts, so they struggle with older documents whose layout structures differ. Additionally, existing OCR software lacks language support for Pashto and Traditional Chinese. The technology presented here was developed to annotate large library collections and make them available in accessible formats, particularly the Pashto documents whose originals were destroyed in a fire in Kabul.
Applications:
- Document preservation
- Transport and logistics
Advantages:
- Annotates documents of any size, from small to large
- Provides additional language support (Pashto, Traditional Chinese)
- Converts existing documents into Unicode text