Enterprise wide information retrieval system LO1321

John Conover (john@johncon.com)
Fri, 19 May 95 21:48 PDT

Attached is a description of an enterprise-wide information retrieval
and documentation system (e.g., "memo machine," "information robot,"
"infobot," or "mailbot") that I have used with good results for many
years. It has been used for context management of projects in
geographically and ethnically dispersed organizations that are
involved in the very dynamic electronic component market
(the microprocessor market, to be more specific). The "C" sources to the
system are available at no charge by sending an email with the
following format:

To: info-request@johncon.johncon.com
Subject: archive

get rel

The sources constitute about 400K bytes, and will be returned to you,
by return email, in two sections. (You will be mailing into the system
described below, BTW; in this case it is functioning only as a source
repository, so the search functions are disabled.)

You may want to skip the 5 numbered paragraphs, below, to get to the
description of the conceptual application of the system. There are
several attachments that describe the technical evolution of the
system. I think it might be interesting reading for those involved in
modern organizational theory and informatics, particularly if the
marketplace served is very dynamic (the revenues of the
microprocessor industry have a time series that is fractal, BTW). It
should be apparent that this system is not meant to replace the
traditional content database systems used in MIS organizations, but is
complementary to them. The system is capable of operating in a
distributed, interoperable client-server environment.

The second paragraph of numbered paragraph 5, below, concerning the
concept of "moving orthogonally in information space" may be of interest
to those involved in OD when doing organizational effectiveness
evaluations.

john@johncon.com (John Conover)

John Conover, 631 Lamont Ct., Campbell, CA., 95008, USA.
VOX 408.370.2688, FAX 408.379.9602
john@johncon.com

______________________________________________________________________________

From the README in /rel/example.app in the sources:
______________________________________________________________________________

The objective of this application of the rel program, in conjunction
with the procmail/smartlist programs, is to construct an enterprise-wide,
full text information retrieval system that uses the Unix MTA
(Message Transfer Agent) as a delivery, query, and distribution
system; in a sense, what is currently termed "groupware."

There is one very powerful operation that can be performed with the
system. Suppose that when examining the messages in order of
relevance, some interest is generated in one of the messages, and it
is desired to find out more about the "context" of the message. The
messages can be rearranged and ordered by author, to find out more
about the "context" of the messages from the author's point of
view. The messages can also be rearranged by date, so that a short
"context" window can be derived around the temporal issues of the
message. These operations are termed "moving orthogonally in
information space," e.g., you were moving through the documents in
order of relevance, then investigated the concerns of a specific
author, then investigated the author's concerns in relation to the
rest of the group over a period of time, etc.
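
The reordering described above can be sketched in a few lines of
Python. This is only an illustration, not the rel implementation: the
Message record, its integer relevance score, and the sample data are
all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Message:
    author: str
    sent: date
    relevance: int  # hypothetical score, e.g. a count of query-word hits
    subject: str


def by_relevance(msgs):
    """First axis: most relevant message first."""
    return sorted(msgs, key=lambda m: m.relevance, reverse=True)


def by_author(msgs, author):
    """Second axis: one author's messages, in chronological order."""
    return sorted((m for m in msgs if m.author == author), key=lambda m: m.sent)


def around_date(msgs, center, days):
    """Third axis: a short window of messages around a given date."""
    window = (m for m in msgs if abs((m.sent - center).days) <= days)
    return sorted(window, key=lambda m: m.sent)


# Hypothetical sample data.
MESSAGES = [
    Message("alice", date(1995, 3, 1), 5, "die size estimate"),
    Message("bob", date(1995, 3, 3), 9, "schedule slip"),
    Message("alice", date(1995, 4, 20), 2, "test plan"),
    Message("carol", date(1995, 3, 4), 1, "customer feedback"),
]

top = by_relevance(MESSAGES)[0]                      # start with the best match
same_author = by_author(MESSAGES, top.author)        # then follow its author
nearby = around_date(MESSAGES, top.sent, days=2)     # then look around its date
```

Each function is a different ordering of the same set of messages, so
moving from one to another never requires the messages to have been
linked in advance.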

Note that the process outlined really constitutes nothing more than an
electronic literature search, and is a similar process to what a
historian would do when researching a subject (presumably because an
understanding of the "context" of the subject was desired). The only
difference is that the process is highly automated. The proposed
system can be thought of as an electronic filing cabinet that can be
searched electronically. The concept is not new; it was proposed by
Vannevar Bush in the 1940's (the Memex Machine), and later modified by
Douglas Engelbart in the early 1970's.

Note that what is being proposed is a new administrative paradigm, one
that addresses context as opposed to traditional content issues in
organizations, and is compatible with the contemporary concepts of
"Empowerment," and "Total Quality Management." Administration is the
mechanization of the flow of information through an organization. What is
being proposed is to use an "information machine," i.e., a computer, to
search, collate, and distribute information. In some sense, it is a
"memo machine" that can transcend organizational and parochial
boundaries. Note that the memos do not have to be structured; how
information is structured will be specified at query time, not at the
time of composition of the memo. (Note that it is not a "hypertext"
system, since the "links" are constructed, dynamically, at the time of
the query, not at the time a document is composed.)

A key issue in automating the process is the capability for the
uninitiated to create "context" queries that are representative of the
information desired. The rel syntax is powerful, intuitive, and easy to
use on an operational basis; most managers already understand how to
use email, and the rel query syntax is similar to the one used in
algebraic calculators. In point of fact, it is identical to the syntax
used in calculators from Texas Instruments and Casio, except that
numbers are replaced by words, and mathematical operators are replaced
by boolean operators: a shorthand natural language query.
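
To make the calculator analogy concrete, here is a minimal Python
sketch of an evaluator for such queries. It is not the rel parser. The
operator symbols (& for and, | for or, ! for "and not") are taken from
the example query in the attachment further down; the precedence
(& and ! binding tighter than |, as multiplication binds tighter than
addition on a calculator), the function names, and the sample
documents are assumptions.

```python
import re

OPERATORS = {"(", ")", "&", "|", "!"}
TOKEN = re.compile(r"\s*([()&|!]|[^\s()&|!]+)")


def parse(query):
    """Parse an infix boolean query into a nested tuple tree."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return tok

    def factor():  # a word, or a parenthesized expression
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in OPERATORS:
            raise ValueError("unexpected operator: " + tok)
        return ("word", tok.lower())

    def term():  # assumed: '&' and '!' bind tighter than '|'
        node = factor()
        while peek() in ("&", "!"):
            op = take()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() == "|":
            take()
            node = ("|", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise ValueError("unexpected token: " + peek())
    return tree


def matches(tree, words):
    """Evaluate a parsed query against the set of words in one document."""
    if tree[0] == "word":
        return tree[1] in words
    left, right = matches(tree[1], words), matches(tree[2], words)
    if tree[0] == "&":
        return left and right
    if tree[0] == "|":
        return left or right
    return left and not right  # '!' read as "and not"


def search(query, documents):
    """Return the names of the documents that satisfy the query."""
    tree = parse(query)
    return [name for name, text in documents.items()
            if matches(tree, set(re.findall(r"\w+", text.lower())))]


# Hypothetical documents.
DOCS = {
    "m1": "John Conover wrote this in April",
    "m2": "John Conover, January status",
    "m3": "Minutes from march, by Smith",
}
hits = search("((john & conover) ! (January | march))", DOCS)
```

With these sample documents only m1 satisfies the query: m2 mentions
January, and m3 mentions neither john nor conover.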

As a concluding remark, note that the system cannot be a substitute
for good organizational practices and disciplines. As Doug Engelbart
stated, "if you automate a big mess, you end up with a very fast big
mess."

Attachment:
______________________________________________________________________________

Attached is a brief synopsis of an asynchronous conferencing system
(also known as an information retrieval system, electronic literature
search system, or corporate repository,) that I used in cross
functional program management, in another life, a long time ago. The
objective was to find a methodology to relate the corporate
information repository to the management structure, (we did not
consider the technical issues to be significant.) The general concept
was to add sufficient functionality to the Unix email system to turn
it into an electronic literature search system.

The attached is a "cut and stick" from some of the reports on the
system's development. The project/program team supported by this
system consisted of a little over a hundred professionals, from
approximately 20 specialties, and 4 core corporate functions. They
were geographically and ethnically dispersed.

Information systems are used in program management, which must
coordinate the various activities of the corporate functions (i.e.,
engineering, marketing, sales, etc.) involved in development
projects. After researching the issues (see below), we concluded that
a distributed full text system that uses the mail (MTA) system as a
communication medium is the desirable direction to pursue. Our
reasoning is as follows:

1) The Unix MTA is almost universal, and will operate
effectively over uucp and/or ethernet connectivities in a
non-homogeneous hardware environment.

2) Each transaction is logged, with a date/time stamp, and who
created the transaction.

3) The MTA already has rudimentary file storage capabilities,
which can be used to query/respond to transactions at a later
date.

4) Most(?) computers are already connected together, and users
are familiar with how to use the system.

5) The MTA database can be NFS'ed to conserve machine
resources.

6) It is a text based system.

We discounted the "hypertext" type of systems, because the links must be
established before the document is stored, which is fine if you know what
you are going to query for. In a general management application, this is
seldom the case. We set up a prototype system, using the following
(readily available) programs:

1) elm, because it has a slightly more sophisticated file
storage structure, and a very powerful aliasing capability
that can alias team members as a group. Additionally, it has
limited query capabilities, and can, through its forms
capabilities, send mail transactions in a structured format.
(Which is advantageous if the transactions are used for
notification of schedule milestone completion, etc.) Eudora
was used on the PCs and Macs, using POP3 as the
communications environment between the PCs and the Unix MTA.

2) The dbm library to build an extensible hash query system
into the file storage structure made by elm. This was
operated in two ways: by a direct RPC call, and by a mail daemon
that "read" incoming mail (to a query "account") and returned
(via mail) all transactions that satisfied boolean
conditionals on requested words. (A data dictionary was added
later, so that the dictionary could be scanned for matches to
regular expressions, which were then passed to the extensible
hash system, but for some reason, this was seldom used.) The
query was made through a very simple natural language
interface, i.e.,

send john and c.*r not January

would return all transactions containing john and a word matching
the regular expression c.*r (for example, conover), excepting those
written in January. (We did not attempt phrases because it looked
complicated; this is ill advised according to Tenopir, etc., below.)
This program contained approximately 350 lines of C code. A
soundex algorithm was added later to overcome spelling
errors: the full text database contained the soundex of the
words in a document, and any words searched for were converted
to soundex prior to the query. (See the works by Knuth for
details of the soundex algorithm.) Also a parser was added so
that the boolean search words could be grouped in infix
expressions, e.g., ((john & conover) ! (January | march)).
Documents were returned in order of relevance.
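
The soundex step in item 2 can be illustrated with a short Python
sketch. It uses the common four-character Soundex coding rules and a
plain in-memory dictionary in place of the dbm hash files; the
prototype's own C code may differ in detail, and the function names
and sample documents here are hypothetical.

```python
import re

# Consonant groups 1..6 of the common Soundex coding.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the four-character Soundex code of a word ('' if no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (first.upper() + "".join(digits) + "000")[:4]


def build_index(documents):
    """Map each Soundex code to the set of documents containing such a word."""
    index = {}
    for name, text in documents.items():
        for word in re.findall(r"[A-Za-z]+", text):
            index.setdefault(soundex(word), set()).add(name)
    return index


def lookup(index, word):
    """Convert the search word to Soundex first, then consult the index."""
    return sorted(index.get(soundex(word), set()))


# Hypothetical documents.
DOCS = {
    "m1": "status from John Conover",
    "m2": "marketing data from Dataquest",
}
INDEX = build_index(DOCS)
found = lookup(INDEX, "Connover")  # misspelled on purpose
```

Because both the stored words and the search word go through the same
coding, the misspelled "Connover" still finds m1. The price is lower
precision, since unrelated words can share a code.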

This prototype was well received, and was used as follows:

1) Management "decreed" that the system would be used as a
management tool, and all data had to be entered, or
transcribed into the system (including the minutes of
meetings, etc.) If it didn't exist in the system, it did not
exist. All discussions, and reasons for decisions had to be
placed in the system. ALL team members and upper management
had identical access to ALL transactions. (Mail could be used
for private correspondence, such as politicking, etc. but all
decisions, and the reasons for the decisions had to be placed
in the system.) The guiding rule was that at the end of the
project, the system contained a complete play by play
chronology and history of all decisions, and reasoning
concerning the project, and, by the way, who was responsible
for the decisions. Each Monday, everyone entered his/her
objectives for the week into the system, and when each
objective was finished, she/he mailed the milestone into the
system; all group members and management could thus find
out the exact status of the project at any time (i.e., a
"social contract" was made with management and the rest of the
members of the team). In some sense, it is really nothing more
than an automated, real-time MBO system. At any time, a
discussion could be initiated on problems/decisions in the
system by anyone. The project manager was assigned the
responsibility of "moderator," or chairperson, for his/her
section of the project. Each Friday, the system was queried
for project status, and the status was piped to TeX for
formatting and printed for official documentation. This
document was discussed at a late Friday people-to-people staff
meeting. (The reason for setting things up this way can be
found in Davidow, below.)

2) Marketing was responsible for acquiring all market data on
magnetic media (from services like Dataquest, the Department
of Commerce, etc.), and each document was "mailed" into the
system so that the information was available for retrieval by
anyone. All had access to the progress made by engineering,
and could contribute information on issues as the program
developed; i.e., this was a "concurrent engineering" environment.

3) Engineering was responsible for maintaining schedules, and
reflecting those schedules in the system; if slippages occurred,
the situation could be addressed immediately by management,
and a suitable cross functional resolution could be arrived
at.

4) Sales was responsible for adding customer inputs,
concerning the project, into the system, so customer
definitions could be retrieved by all project members. This
included the customer data, such as who has buying authority
in the customer's organization, who has signature, etc.
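
The Monday objective / Friday status cycle in item 1 above amounts to
a simple reduction over the week's transactions. The Python sketch
below is only illustrative: the (person, kind, item) tuple layout and
the "objective" and "milestone" labels are hypothetical, since the
post does not describe the actual message format.

```python
from collections import defaultdict


def weekly_status(transactions):
    """Summarize one week's transactions, given in chronological order.

    Each transaction is a hypothetical (person, kind, item) tuple, where
    kind is "objective" (entered on Monday) or "milestone" (mailed in
    when the objective is finished).
    """
    status = defaultdict(lambda: {"done": [], "open": []})
    for person, kind, item in transactions:
        if kind == "objective":
            status[person]["open"].append(item)
        elif kind == "milestone" and item in status[person]["open"]:
            # A milestone closes the matching objective; unmatched
            # milestones are ignored in this sketch.
            status[person]["open"].remove(item)
            status[person]["done"].append(item)
    return dict(status)


# Hypothetical sample week.
WEEK = [
    ("alice", "objective", "finish layout review"),
    ("alice", "objective", "update test plan"),
    ("bob", "objective", "collect customer inputs"),
    ("alice", "milestone", "finish layout review"),
]
report = weekly_status(WEEK)
```

A Friday query would then list, per person, what was completed and
what remains open, which is the material the staff meeting discussed.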

The results were very impressive not only by productivity standards, but
also by "correctness to fit and form" standards (i.e., the right product
was in the market at the right time, the first time). This has become a
central agenda, as outlined in Davidow, below.

Bibliography:

"Computer-Supported Cooperative Work," Irene Greif
"A Model for Distributed Campus Computing," George A. Champine
"Enterprise Networking," Ray Grenier and George Metes
"Connections," Lee Sproull and Sara Kiesler
"5th Generation Management," Charles M. Savage
"Intellectual Teamwork," Jolene Galegher, Robert E. Kraut and Carmen Egido
"In the Age of the Smart Machine," Shoshana Zuboff
"The Virtual Corporation," William H. Davidow and Michael S. Malone
"Accelerating Innovation," Marvin L. Patterson
"Paradigm Shift," Don Tapscott and Art Caston
"Developing Products in Half the Time," Preston G. Smith and Donald G. Reinertsen
"Full Text Databases," Carol Tenopir and Jung Soon Ro
"Text and Context," Susan Jones
"From Memex to Hypertext," James M. Nyce and Paul Kahn
"The Corporation of the 1990's," Michael S. Scott Morton
"Computer Augmented Teamwork," Robert P. Bostrom, Richard T. Watson, Susan T. Kinney
"Engineering Information Management Systems," John Stark
"CE Concurrent Engineering," Donald E. Carter and Barbara Stilwell Baker
"Information Retrieval," William B. Frakes and Ricardo Baeza-Yates
"Text Information Retrieval Systems," Charles T. Meadow
"Leading Self-Directed Work Teams," Kimball Fisher

______________________________________________________________________________