Solicitation: BAA 07-38 (Closed)



Multilingual Automatic Document Classification Analysis and Translation (MADCAT)

Program Manager: Dr. Joseph Olive

Objectives:

MADCAT will develop and apply computer software technologies to classify, analyze, and interpret analog Arabic text images. Automatic processing "engines" will convert images from hard-copy documents, PDF images, or camera-captured pictures (e.g., signs, graffiti, or televised text images) into English text for use by English-speaking military personnel and analysts. MADCAT will develop an integrated system capable of:
  • Accepting an image,
  • Analyzing the image to determine the language and type of script,
  • Classifying the image to determine the kind of material that is being presented (a picture, a newspaper article, a technical memo, a ledger, etc.),
  • Segmenting the image and interpreting the different text zones, and finally,
  • Producing an accurate English translation of the source language text.
MADCAT engines must be able to process documents/images of all of the following types: newspapers, letters, scientific articles, memos, ledgers, maps, diagrams, graffiti, and signs.
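
The five capabilities listed above read naturally as a staged pipeline. The following is a minimal sketch of that flow, not MADCAT's actual design: every name in it (Pipeline, TextZone, Result, DocumentType, and the four stage callables) is hypothetical, and the stages are stubs that return canned values instead of performing real recognition or translation. The document types in the enumeration are the ones named in the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class DocumentType(Enum):
    """Document/image types the program description says engines must handle."""
    NEWSPAPER = "newspaper"
    LETTER = "letter"
    SCIENTIFIC_ARTICLE = "scientific article"
    MEMO = "memo"
    LEDGER = "ledger"
    MAP = "map"
    DIAGRAM = "diagram"
    GRAFFITI = "graffiti"
    SIGN = "sign"


@dataclass
class TextZone:
    """Hypothetical container for one segmented text region."""
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    source_text: str                 # recognized source-language text


@dataclass
class Result:
    language: str
    script: str
    doc_type: DocumentType
    zones: List[TextZone]
    translations: List[str]          # one English string per zone


@dataclass
class Pipeline:
    """Chains the stages in the order given in the list above.

    Each stage is injected as a callable so that a real engine could
    replace a stub without changing the control flow (an assumption of
    this sketch, not something the source specifies).
    """
    identify: Callable[[bytes], Tuple[str, str]]    # -> (language, script)
    classify: Callable[[bytes], DocumentType]
    segment: Callable[[bytes], List[TextZone]]
    translate: Callable[[str, str], str]            # (text, language) -> English

    def run(self, image: bytes) -> Result:
        # Step 1: accept an image.
        if not image:
            raise ValueError("empty image")
        # Step 2: determine the language and type of script.
        language, script = self.identify(image)
        # Step 3: determine the kind of material.
        doc_type = self.classify(image)
        # Step 4: segment the image into text zones.
        zones = self.segment(image)
        # Step 5: produce an English translation for each zone.
        translations = [self.translate(z.source_text, language) for z in zones]
        return Result(language, script, doc_type, zones, translations)


# Stub stages with canned outputs, for illustration only.
demo = Pipeline(
    identify=lambda img: ("Arabic", "Arabic script"),
    classify=lambda img: DocumentType.SIGN,
    segment=lambda img: [TextZone((0, 0, 100, 40), "source text of zone 1")],
    translate=lambda text, lang: f"[English placeholder for {lang}: {text}]",
)

result = demo.run(b"fake-image-bytes")
print(result.doc_type.value, result.translations)
```

Passing the stages in as callables is one way to reflect the stated goal of avoiding point solutions: a stage for a different language, script, or medium could be substituted without touching the rest of the flow.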

MADCAT engines must be:
  • Robust, scalable, and portable,
  • Able to deal with the full range of source data described above,
  • Adaptable to different media and languages (not point solutions specialized to particular languages, scripts or media),
  • Domain-independent, and
  • Demonstrably language-independent.

