Multilingual Automatic Document Classification Analysis and Translation (MADCAT)
Program Manager: Dr. Joseph Olive
Objectives:
MADCAT will develop and apply computer software technologies to classify, analyze, and interpret analog Arabic text images. Automatic processing "engines" will convert images from hard-copy documents, PDF images, or camera-captured pictures (e.g., signs, graffiti, or televised text images) into English text for use by English-speaking military personnel and analysts. MADCAT will develop an integrated system capable of:
- Accepting an image,
- Analyzing the image to determine the language and type of script,
- Classifying the image to determine the kind of material that is being presented (a picture, a newspaper article, a technical memo, a ledger, etc.),
- Segmenting the image and interpreting the different text zones, and finally,
- Producing an accurate English translation of the source language text.
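The capabilities listed above read as a sequence of stages, each consuming the output of the previous one. The following is a minimal sketch of that flow, not a description of any actual MADCAT engine. Every name in it (`Document`, `run_pipeline`, and the four stage functions) is hypothetical, and the stage bodies are stubs that return fixed placeholder values rather than performing real recognition or translation.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Document:
    """Hypothetical record carried through the stages; fields fill in as it goes."""
    image: bytes
    language: str = "unknown"
    script: str = "unknown"
    kind: str = "unknown"
    zones: Tuple[str, ...] = ()
    translation: str = ""


Stage = Callable[[Document], Document]


def run_pipeline(image: bytes, stages: List[Stage]) -> Document:
    """Accept an image and apply each stage in order."""
    doc = Document(image=image)
    for stage in stages:
        doc = stage(doc)
    return doc


# Stub stages: each returns a copy with placeholder values (assumptions only).
def identify_language(doc: Document) -> Document:
    return replace(doc, language="Arabic", script="placeholder-script")


def classify_material(doc: Document) -> Document:
    return replace(doc, kind="memo")


def segment_zones(doc: Document) -> Document:
    return replace(doc, zones=("zone-0",))


def translate(doc: Document) -> Document:
    text = " ".join(f"<English for {zone}>" for zone in doc.zones)
    return replace(doc, translation=text)


STAGES: List[Stage] = [identify_language, classify_material, segment_zones, translate]

result = run_pipeline(b"raw image bytes", STAGES)
print(result.kind, result.zones, result.translation)
```

Keeping each stage as an independent function with the same signature is one way to make a stage replaceable without touching the others, for example swapping in a different segmenter for a different medium.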
MADCAT engines must be able to process documents/images of all of the following types: newspapers, letters, scientific articles, memos, ledgers, maps, diagrams, graffiti, and signs.
MADCAT engines must be:
- Robust, scalable, and portable,
- Able to deal with the full range of source data described above,
- Adaptable to different media and languages (not point solutions specialized to particular languages, scripts or media),
- Domain-independent, and
- Demonstrably language-independent.
