Rapid Knowledge Formation

Slide 1
The objective of the RKF program is to enable distributed teams of subject matter experts to enter knowledge directly and easily, without the need for knowledge engineers to act as intermediaries. In addition, we are looking to create knowledge bases that exceed 1 million axioms in less than 1 year's time, through a variety of techniques including:
• reuse of knowledge bases and problem solving methods,
• theory composition and merging of component elements,
• parallel entry,
• well-structured upper ontologies and theories.
RKF is a follow-on to the High Performance Knowledge Bases program.

Slide 2
Our vision is that scientific, technical, and military experts would encode massive amounts of knowledge into reusable knowledge bases, distributed throughout the Web. These knowledge bases would be available to provide specific answers to questions and could be applied in many different problem-solving situations. We would like to seed this activity with tools, techniques, and an initial critical mass of reusable knowledge. Once appropriately seeded, we would expect that the technical communities themselves would take over the task of developing, extending, and expanding this knowledge infrastructure. Hence, RKF is intended to provide an initial technical foundation for a knowledge-based infrastructure for the next generation internet.

Slide 3
The most difficult challenges involve bridging the chasm between human thinking and the formal logic of machines. Humans naturally reason spatially in three dimensions, possess vast amounts of common sense acquired through years of experience, employ analogies, and use natural language to think and express themselves. Machines are constrained to techniques of mathematical logic, require precise representations, and usually employ deductive theorem proving. The complexity of the knowledge bases required for difficult problems can grow exponentially. I will now turn to the activities and technologies we see as addressing these challenges.

Slide 4
A coherent system of axioms about a topic is called a theory. In order to author a new theory, a human expert must be provided with tools that enable him or her to comprehend what is already in the knowledge base, enter new knowledge, and correct errors. We have categorized the technology into four areas:
• human-knowledge base interaction,
• knowledge formation,
• theory manipulation,
• knowledge base content.
I will go into each area in the following slides.

Slide 5
Technology R&D is required to translate natural language into statements of logic, as well as to generate natural language from logic. Discourse understanding technology is required to extract input information from multiple input sentences. Additional dialog technologies are necessary for user-knowledge base interaction, to clarify input statements and repair input errors. Sketching and diagram input are important and natural ways to provide knowledge input and are required as well. Explanation technologies are required to generate succinct summaries of the logical theories contained within a knowledge base, in order to help the user identify and locate relevant knowledge as well as understand the line of reasoning employed in a formal deduction.

Slide 6
Knowledge formation includes the technology to automatically elaborate partial user input into a more extensive set of axioms, including functionality to find theories that are structurally similar to the input and between which similar components can be mapped. Other functionality needed includes capabilities to infer general axioms from positive and negative examples, and techniques to manage the incremental knowledge formation process by holding representations of partial theories in a temporary store until they are sufficiently complete and consistent to be added to the knowledge base.
Slide 7
Theory manipulation includes functionality to decompose theories into primitives, as well as to map and merge sub-theories. It also includes technology to generate test cases and evaluate theories as they are formed, detect and resolve conflicts between new and old theories, maintain consistency within subsets of axioms, and represent and organize alternative belief structures.

Slide 8
We partition the knowledge base hierarchically into component elements associated with problem solving and reasoning methods, general ontologies, middle-level theories, domain specific theories, and databases of individual instances or ground facts. Many of the more general components were developed or enhanced within the High Performance Knowledge Bases program. Problem solving methods are libraries of general purpose techniques that can be instantiated, related to specific ontologies, and applied to numerous specific applications. The upper ontology consists of the most general terms and concepts and their definitions. The mid-level theories include a core set of basic theories for space, time, causality, motion, and objects. These, too, are applicable to almost any domain. We believe the authoring of reusable but domain specific theories, directly by subject matter experts, to be a key leverage point for RKF technology. This knowledge should be reusable for answering a broad range of questions and solving a large number of problems within the specific domain of interest. Our goal is to enable subject matter experts to extend the ontologies, build and reuse domain specific theories, and enter ground fact instances within this knowledge infrastructure. We also believe natural language and multi-media extraction technology can rapidly and substantially extend the scope of domain facts and data available to the knowledge base.
Slide 9
An overall problem relevant to the Department of Defense, related to chemical and biological warfare, has been selected and specified to drive the technology development and knowledge base construction. It was selected to ensure the relevance and scalability of the technology. The problem has been structured in such a way that no single technology alone will solve it; rather, teams of developers will have to collaborate in order to produce a solution. The solutions will require technology from all the previously described technology areas. There will be annual formal evaluations to empirically validate our claims of scientific progress. Metrics will be collected on knowledge base size, development rates, and competency. The problems to be solved will be made more challenging each year in order to "raise the bar" in knowledge base technology.

Slide 10
The problem selected deals with chemical and biological weapons. An extensive amount of knowledge, information, and data exists about this topic in open sources such as textbooks, field manuals, and the WWW. Subject matter experts are also available within government, industry, and academia. A draft specification of the problem will be made available with the BAA.

Slide 11
The critical milestones for the program are as follows: BAA this summer. Awards in the first quarter of FY 00. The first year will be dedicated to technology development, with individual component evaluation experiments planned for the end of the fiscal year. At the end of the program's second year, we envision testing a single-user knowledge entry capability with all the components previously described. We hope to achieve a rate of knowledge entry of 2,000 axioms a month. At the end of the program's third year, we envision testing a parallel knowledge entry capability enabling 25 to 50 individuals to work asynchronously in a distributed fashion. In the program's final year, we envision the construction of a very large and competent chemical and biological weapons knowledge base by teams of domain experts who are AI novices.

This concludes my briefing. Thank you for your interest.