Process and Human Factors Engineering
Research and Technology 2002
 
Searchable Answer-Generating Environment (SAGE) Expert-Finder
 

SAGE is a repository of information about experts within the State of Florida. The SAGE Expert-Finder Knowledge Management (KM) System creates an integrated database by presenting multiple databases as if they were one.


SAGE has been on-line since August 1999 and can be found at http://sage.fiu.edu. (See figure 1.) Originally, SAGE unified the researcher databases from the State of Florida University System. Currently, SAGE has access to sponsored research data from private institutions as well, such as Florida Institute of Technology, Florida Memorial College, and Embry-Riddle Aeronautical University.


SAGE gives university researchers more visibility and simultaneously allows interested parties to identify available expertise within Florida universities. The application helps identify a researcher’s proficiency within a discipline and provides a point of contact.


Benefits that SAGE provides include the following:

  • SAGE helps locate researchers in Florida for collaboration with industry and Federal agencies, thereby increasing potential research funding to the Florida universities.
  • SAGE combines and unifies existing data from multiple sources into one Web-accessible interface.
  • SAGE incorporates a File Transfer Protocol (FTP) client, an application that will be resident at each of the participating universities to automate data transfer to the SAGE server.


In October 2001, the SAGE graphical user interface was completely redesigned to maintain uniformity with the other KM Lab-hosted Web sites. The new interface provides an enhanced Flash version as well as a basic HTML version. SAGE also includes a thesaurus, a collection of concepts forming an ontology, that upon request can search on words similar to the keyword in use. A consistent taxonomy is applied to improve upon the usual keyword- and full-text-based techniques. It allows an end user to retrieve information using appropriate terminology and avoids the poor selectivity and low-quality results caused by missing, inconsistent, or conflicting vocabulary.


The use of a thesaurus can extend the capability of the Web site by generating new keywords from an existing input provided by the user. The thesaurus provides a standardized means of organizing many kinds of information, including both conceptual and taxonomic. In other words, the thesaurus is a tool designed to aid users in finding their way around a vocabulary database. In addition to its traditional use as an authority for the terms used in indexing the database, it offers suggestions of terms the user might not even have considered.
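The keyword-generation idea described above can be sketched in a few lines. This is a minimal illustration in Python, not the SAGE implementation: the `THESAURUS` dictionary and the `expand_keywords` function are hypothetical names, and the entries are made-up sample data rather than content from the actual thesaurus.

```python
# Hypothetical miniature thesaurus; the entries are illustrative sample data.
THESAURUS = {
    "robot": ["automaton", "android"],
    "corrosion": ["rust", "oxidation"],
}


def expand_keywords(query, thesaurus=THESAURUS):
    """Return the user's keywords followed by related terms, without duplicates."""
    expanded = []
    for word in query.lower().split():
        # Keep the original word first, then append any related terms.
        for term in [word] + thesaurus.get(word, []):
            if term not in expanded:
                expanded.append(term)
    return expanded


if __name__ == "__main__":
    print(expand_keywords("corrosion control"))
    # ['corrosion', 'rust', 'oxidation', 'control']
```

Words with no thesaurus entry pass through unchanged, so the expanded query is never narrower than what the user typed.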


The thesaurus is built using the Perl programming language because of its powerful text-processing capabilities. The script uses an existing pool of information from the Wordsmyth Educational Dictionary-Thesaurus (WEDT), which includes over 50,000 headwords and precisely defined, hyperlinked synonyms. It retrieves an extended set of related terms or a set of synonyms. ColdFusion 4.5 is used to cache the results of the query. In addition, the new output is used in conjunction with the Verity Search Engine, which utilizes a stoplist and stemming.
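To make the terms "stoplist," "stemming," and "cache" concrete, here is a rough Python sketch. It is an assumption-laden stand-in: the stoplist contents, the suffix-stripping rule in `naive_stem`, and the `RELATED` sample data are all hypothetical, and they do not reproduce how Verity or ColdFusion actually behave. The cache here is Python's standard `functools.lru_cache`.

```python
from functools import lru_cache

# Illustrative stoplist; a real search engine ships a much longer one.
STOPLIST = frozenset({"a", "an", "and", "for", "in", "of", "the"})

# Hypothetical sample data standing in for a thesaurus lookup.
RELATED = {"sensor": ("detector", "transducer")}

LOOKUP_LOG = []  # records every cache miss so the effect of caching is visible


def naive_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word


def normalize(query):
    """Lowercase the query, drop stoplist words, and stem what remains."""
    return [naive_stem(w) for w in query.lower().split() if w not in STOPLIST]


@lru_cache(maxsize=256)
def related_terms(term):
    """Look up related terms; repeated calls for a term are served from the cache."""
    LOOKUP_LOG.append(term)
    return RELATED.get(term, ())


if __name__ == "__main__":
    print(normalize("the modeling of sensors"))  # ['model', 'sensor']
    related_terms("sensor")
    related_terms("sensor")
    print(LOOKUP_LOG)  # ['sensor']  (the second call hit the cache)
```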


To achieve results, we use interprocess communication (IPC). When a user submits a search, the script issues an HTTP request to a remote server by communicating through a socket (connection). The HTTP request queries an external search engine that resides on the remote server. The script retrieves the HTML document generated by the search engine, parses the document using regular expressions, and extracts the desired information. Because it issues the request, the script acts as a Web client.
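The request step can be illustrated with a short socket-level sketch. The original script is written in Perl; this Python version is only an approximation of the same idea, and the function names (`build_request`, `split_response`, `fetch`), the use of HTTP/1.0, and the example host and path are assumptions made for illustration.

```python
import socket


def build_request(host, path):
    """Compose a minimal HTTP/1.0 GET request as bytes."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Separate the status line from the HTML body of a raw HTTP response."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    return status_line, body.decode("utf-8", "replace")


def fetch(host, path, port=80, timeout=10.0):
    """Open a socket, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return split_response(b"".join(chunks))


if __name__ == "__main__":
    # Shows the bytes a client would send; no network access is needed here.
    print(build_request("example.org", "/search?q=robot"))
```

Keeping request construction and response splitting separate from the network call makes both pieces easy to check without contacting a server.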

 

Current SAGE Architecture


 

The script relies on the structure of the document, an approach known as Web structure mining. In this case the document is a raw HTML document. To retrieve data from an HTML document efficiently, the document must have a uniform format or a “cue” for where to start looking for the necessary data. Given such a cue, the programming task is trivial.
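A cue-driven extraction of this kind might look like the following Python sketch. The sample page, the choice of `<h2>Synonyms</h2>` as the cue, and the function name `extract_after_cue` are all hypothetical; the real search engine's markup is not shown in this article.

```python
import re

# Hypothetical result page; the real markup will differ.
SAMPLE_PAGE = """<html><body>
<h1>Results</h1>
<h2>Synonyms</h2>
<ul>
<li><a href="/word/rust">rust</a></li>
<li><a href="/word/oxidation">oxidation</a></li>
</ul>
<h2>Other</h2>
<ul><li><a href="/about">about</a></li></ul>
</body></html>"""

# Captures the visible text of each hyperlink.
LINK_TEXT = re.compile(r"<a\s[^>]*>([^<]+)</a>", re.IGNORECASE)


def extract_after_cue(html, cue="<h2>Synonyms</h2>", end="</ul>"):
    """Return link texts found between the cue and the next end marker."""
    start = html.find(cue)
    if start == -1:
        return []  # cue missing: the page layout is not the expected one
    start += len(cue)
    stop = html.find(end, start)
    section = html[start:] if stop == -1 else html[start:stop]
    return [text.strip() for text in LINK_TEXT.findall(section)]


if __name__ == "__main__":
    print(extract_after_cue(SAMPLE_PAGE))  # ['rust', 'oxidation']
```

The approach is simple precisely because it depends on the page layout staying the same; if the cue disappears, the function returns an empty list rather than guessing.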

To make it easier to keep the SAGE database updated, the latest development is the SAGE Automatic FTP Client version 3.0 (figure 2). SAGE Auto FTP Client version 3.0 allows the universities that are part of the SAGE community to transfer their researchers’ information files directly from their servers to the SAGE server residing at the KM Lab in an easy and efficient manner. This data can be received in a variety of formats, such as Excel or Access files or a tab-delimited text file. Once this data is obtained, the KM Lab members convert it to a standard format to be displayed on the SAGE Web site.
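As an illustration of the conversion step for the tab-delimited case, the sketch below maps one university's column headings onto a common set of field names. Everything specific in it is assumed: the column headings, the standard field names in `COLUMN_MAP`, and the function `normalize_rows` are hypothetical, since the article does not describe the actual SAGE record layout.

```python
import csv
import io

# Hypothetical mapping from one university's headings to standard field names.
COLUMN_MAP = {
    "Investigator": "researcher",
    "Dept": "department",
    "Award Amount": "funding",
}


def normalize_rows(tab_text, column_map=COLUMN_MAP):
    """Convert tab-delimited text into a list of records with standard field names."""
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    records = []
    for row in reader:
        record = {}
        for source_name, standard_name in column_map.items():
            # Missing columns become empty strings instead of raising an error.
            record[standard_name] = (row.get(source_name) or "").strip()
        records.append(record)
    return records


if __name__ == "__main__":
    sample = "Investigator\tDept\tAward Amount\nJ. Doe\tChemistry\t125000\n"
    print(normalize_rows(sample))
```

Each participating institution would need its own mapping, which is one plausible reason the conversion is handled centrally after the files arrive.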


The SAGE Auto FTP Client version 3.0 offers a simple, user-friendly interface. It also contains a help file where the user can find answers to frequently asked questions. Every university is provided with a unique password to ensure the accuracy of the transfer. The client is intended to keep the SAGE database current with information on researchers, their projects, and their funding throughout the year.
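The kind of password-protected transfer described here can be sketched with Python's standard `ftplib` module. This is not the SAGE Auto FTP Client itself; the function names, the host, the account name, and the file name in the usage comment are placeholders, and the real client's behavior is not documented in this article.

```python
import io
from ftplib import FTP


def upload_file(ftp, remote_name, data):
    """Send bytes to the server under remote_name over a logged-in FTP session."""
    ftp.storbinary(f"STOR {remote_name}", io.BytesIO(data))


def send_to_server(host, user, password, remote_name, data):
    """Connect, authenticate with the university's credentials, and upload."""
    with FTP(host, timeout=30) as ftp:
        ftp.login(user=user, passwd=password)
        upload_file(ftp, remote_name, data)


# Example call (placeholders only; requires a reachable FTP server):
# send_to_server("ftp.example.edu", "university_a", "secret",
#                "researchers.txt", b"Investigator\tDept\n")
```

Passing the session object into `upload_file` keeps the transfer logic separate from connection handling, so it can be exercised with a stand-in object instead of a live server.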


The current version of SAGE includes a newsletter that keeps users up to date on current and upcoming SAGE news. To subscribe, the user clicks on the SAGE Newsletter icon on the SAGE home page and provides a name and e-mail address. The user will then periodically receive the latest information concerning SAGE (figure 3).


SAGE is hosted at the KM Lab and is available 24 hours a day, 7 days a week. Currently SAGE receives an average of 40 users per day. About 54 percent of the total hits that SAGE gets daily originate from commercial sites (.com and .net), 39 percent are from educational institutions (.edu), and 2 percent are from Government and military organizations (.gov and .mil). Visitors originate from the United States and around the world, including Japan, France, Austria, Switzerland, the Bahamas, Mexico, and the United Kingdom.


Contacts: Dr. O. Melendez (Orlando.Melendez-1@ksc-nasa.gov), YA-F2-C, (321) 867-9407; and S.H. Chance, BA-C, (321) 867-4194
Participating Organization: Florida International University (Dr. I. Becerra-Fernandez)

     