Total Information Awareness

Slide 1
Good afternoon. My name is Brian Sharkey, the Deputy Director of ISO. Today I would like to present to you some concepts in one of ISO's major thrust areas, which we call Total Information Awareness.

Slide 2
In the ISO overview, Dr. Mularie presented aspects of warfare termed force-on-force and asymmetric. This chart represents the sources of the two warfare types. Here we have labeled force-on-force warfare "symmetric". Symmetric warfare refers to a more traditional conflict that usually involves a nation state and large force structures. The Desert War was an example of symmetric warfare. Asymmetric warfare refers to warfare activities with fewer and less easily specified objectives; it usually involves smaller numbers of actors and/or force participants, using unconventional tactics that often have high impact (political or material) relative to the force level involved. But its source can be either Nation-State or Trans-National in nature.

Slide 3
Given these two descriptions of warfare, let's examine the environment we must operate within. This chart represents both the symmetric and asymmetric warfare environments. The traditional force-on-force aspects are represented by the white battlefield in the background, while the asymmetric warfare environment is shown by the three bubbles labeled near field, transition zone, and far field. In the context of the problem domain, the "Near Field" refers to both physical and temporal nearness, while the "Far Field" refers to events that are more distant in time and space. The "Transition Zone" encompasses the interface between the two regions and is considered the area of intense human involvement needed to determine the validity of data and supporting hypotheses. It is important to note that some of the same technology solutions may apply to both warfare environments. It is also important to recognize that some may not.
Specific technologies might be applied to asymmetric warfare that would not normally be needed for a force-on-force engagement.

Slide 4
This chart represents our understanding of the differences among the three fields. The vertical axis represents a notional value metric for the four categories of asymmetric warfare measures: situation understanding, targets being attacked, offensive options, and response time. You can see that in the far field our understanding of what is happening is poorer, because the "indications data" could be associated with a potentially larger target set. But our options are just the opposite: there are more of them. In the Near Field, however, while there is less target uncertainty, there are also fewer response options and a greatly reduced response time.

Slide 5
The requirements for total information awareness can be defined through four layers of technology. The bottom layer is the data gathering layer, where sensors and databases collect data from events occurring in the real world. The information discovery layer involves the retrieval and conversion of these data into information that is relevant to specific inferencing objectives, which are driven by models of potential situations from a higher abstraction layer. This search for information may involve the use of agents, web crawlers, and other familiar methods for finding evidence that supports specific arguments. The third layer is focused on the problem of representing the information space within a well-defined semantic structure that is an abstract representation of the problem being solved. This abstract representation is most easily understood through the use of models of intent and/or behavior of people and things (model elements), which help to focus the information search on specific arguments being postulated about the model elements.
The final layer is defined as collective reasoning, wherein complex arguments from (often ambiguous) data are "vetted" among humans using collaborative tools and "truth maintenance" decision aids. The above processes are data intensive, and thus we are driven to explore automated methods of processing and are most interested in the boundary interface that defines the limits of machine-driven and human-driven processes.

Slide 6
One can view these technology layers symbolically as shown on this slide. The vertical axis represents semantic content, or the degree to which data can be abstracted to represent information centered around specific arguments being postulated. The horizontal axis represents the continuum of the data space, which provides the necessary evidence to support or refute specific argument constructs. Data is gathered from events occurring in the real world as measured by sensors (to include humans) and as available in various open sources, including the World Wide Web, and in classified databases. These data contain inherent semantic value that is largely derived from the language used to describe them. I have termed this the "index space". You might think of "key words" as one form of "index" derived from an understanding of the meaning of symbols used in the English language as represented in a word document. At a higher level of abstraction, you might define the "index space" using "concepts". I've represented one document as the large white bar in the information space and shown its key words linked into the index space. Above the index space are evidence models, which are derived from intent models and argument models for the hypothesis under consideration and which enable the association of ("index") evidence with the arguments being postulated. These models help both to drive the search for supporting evidence and to provide the metric for scoring the value of new evidence found.
A document that has high semantic content has more keywords that are included in the index space than one with lower semantic content and thus is easier to find during search. In general, documents with low semantic content require more knowledge and tailoring of search terms in order to guarantee successful retrieval (template match). The index space provides the basic evidence layer that can be mapped onto projections of the arguments through the use of intent models or other such constructs. These evidence data are then the basis for supporting or refuting the arguments being postulated and reasoned about. I will now delve deeper into the technologies of each layer.

Slide 7
Collecting data that are relevant to arguments is a fundamental first step in reasoning about one's environment and situation. In this slide I've represented the events and/or sources of data in the near field and far field. Near field data sources include identification of objects (people, things, and events) that have influence over our immediate environment in the near term. Thus elements of perimeter security and the interpretation of "alert" messages form the basic data sets that are relevant. We are interested in technologies associated with the recognition of people and things and the interpretation of documents that may have bearing on an understanding of threats to our immediate environment. In the far field data space, we are interested in technologies that assist in the discovery of relevant information and the organization of large multimedia databases. These technologies include data mining and the retrieval of information from heterogeneous data sources.

Slide 8
I discussed some of the linkages between the information space and the index space in the overview chart. Here I've added the concept of automation to the search process through the use of agents that specialize in searching for data.
These data search agents may be spawned from evidence and intent models or by human analysts laying out hypotheses or argument conditions. We are interested in technologies that make it simple for users to describe evidence models to agents and launch them.

Slide 9
Similarly, inference agents with higher levels of abstraction, using knowledge bases and reasoning logic, can be used to evaluate the mapping of evidence that supports the instantiation of specific intent models. Agent technology plays an important role in the automated development of alternative hypotheses and evidential reasoning processes. Likewise, the development of intent and behavioral models is an equally important technology development area.

Slide 10
In order to support a conclusion, one has to generate various hypotheses and then collect evidence that supports or refutes each of them. The evidence can be based on the probability that an event has occurred or is about to occur, as represented here by various types of models. Inference agents could be spawned to provide the link between the models and evidence. Genoa is an existing ISO project that focuses on collective reasoning, especially the higher-level aspects of this process.

Slide 11
The premise of Project Genoa is that the earlier a crisis situation is identified and understood, the greater the probability that mitigative or preemptive strategies can be developed. Genoa focuses on collaborative environments, knowledge discovery, structured argumentation, and corporate memory. In the problem domain of crisis management there is often great uncertainty, and thus there will be multiple hypotheses given the same set of evidence. It is essential to have a collaborative environment for collective reasoning in which humans can work quickly and effectively either to resolve the differences of opinion in the interpretation of the evidence or to spell out the differences to the decision maker in a clear and succinct manner.
In the Total Information Awareness thrust, Genoa is at the heart of the Collective Reasoning function.

Slide 12
Putting it all back together, the four layers are shown here mapped against the issue of automated versus human processes.

Slide 13
The same four layers can also be mapped across the asymmetric environment zones. The near and far fields are centered more around the processes of data gathering and information discovery, while the transition zone is more heavily focused on human processes associated with collective reasoning about evidence through the exploitation of models that describe threat situations of interest.

Slide 14
Thus one of the fundamental issues for TIA is the boundary between the automated and human-driven processes. The primary question is: “How far can we push automation, through the development of intelligent search and inference agents, to take on the burden of finding relevant evidence that the human can reason about within a collaborative decision environment?” Total Information Awareness is NOT an approved program initiative. Rather, it is a technology focus for information-science-related projects in the Information Systems Office and a starting point for reshaping the direction of existing programs and launching new efforts under the auspices of future approved BAAs. We encourage you to submit ideas and technical concepts in support of TIA to ISO.