Information Storage and Retrieval: From MEDLARS to Twitter
Each decade, technology breakthroughs update our lexicon with new words and phrases. In some instances, we remember the timeline in which we learned about new technology. I recall learning about SIMSCRIPT, "reservoir modeling", cybernetics, "real-time", "information storage and retrieval", APL, and "database" in roughly that order. Of course, it's taken years for me to be able to use real-time and database in the same sentence when describing an application.
An exploration of modern information storage and retrieval encompasses data models, data stores, query techniques, data mining, search engines, automated indexing and classification, machine learning, and a host of topics that could keep a blog busy for years. In looking at information storage and retrieval, one cannot help but take note of MEDLARS and Twitter, two systems born 50 years apart.
One of the most notable achievements in information storage and retrieval was the creation of a centralized database of health sciences bibliographic information. The US National Library of Medicine (NLM), part of the US National Institutes of Health (NIH), has been indexing and abstracting medical literature for decades. The Medical Literature Analysis and Retrieval System (MEDLARS) database was developed at NLM. To publish a variety of documents, including quarterly editions of Index Medicus, NLM linked MEDLARS and the GRACE photocomposition system. Perhaps because of need and good design principles, this information retrieval system has served multiple generations of users.
The MEDLARS example is interesting because it represents an information storage and retrieval technology that has stored data and answered queries for decades. Interactive access to the MEDLARS database became available with MEDLARS Online (MEDLINE). Today, web access to 21 million journal articles is available through the PubMed portal. The original medical literature database has been augmented with database services related to toxicology data, clinical trials, chemical identification, genetic taxonomies, genome mapping, molecular biology, and other knowledge domains.
Figure 1 shows the number of articles related to "Heart" that have been added to the database each year. The current total exceeds 1 million bibliographic citations since 1950.
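For readers who want to gather per-year counts of this kind themselves, here is a minimal sketch that only builds a query URL for NCBI's E-utilities ESearch service. It is not how Figure 1 was produced. The function name `yearly_count_url`, the search term, and the use of publication date (`pdat`) as a stand-in for the year a citation was added are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def yearly_count_url(term, year):
    """Build an ESearch URL asking only for the number of PubMed hits in one year.

    Hypothetical helper: it constructs the URL and does not send a request.
    """
    params = {
        "db": "pubmed",        # search the PubMed database
        "term": term,          # query text, e.g. a tagged subject heading
        "datetype": "pdat",    # assumption: filter by publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",    # ask for the hit count rather than a list of IDs
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"

url = yearly_count_url("heart[MeSH Terms]", 1975)
```

Fetching one such URL per year and plotting the returned counts would give a chart in the spirit of Figure 1, though the exact numbers depend on the query and date field chosen.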
Vocabulary for Searching
Since the 1960s, NLM has operated computer systems that provide online searches using a controlled, domain-specific vocabulary known as MeSH (Medical Subject Headings). In recent years, the information retrieval capabilities have been augmented to encompass semantic search techniques: finding information based on concepts, not just strict matches against search criteria.
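The difference between strict matching and concept-based lookup can be sketched in a few lines. This is a toy illustration, not NLM's implementation: the `ENTRY_TERMS` table, the `CITATIONS` records, and both function names are hypothetical, made up to show the idea of mapping a user's phrase to a controlled descriptor before searching.

```python
# Hypothetical, hand-made table mapping everyday phrases to controlled descriptors.
ENTRY_TERMS = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarct": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
}

# Made-up citation records; each carries the descriptors an indexer assigned.
CITATIONS = [
    {"title": "Outcomes after acute myocardial infarction",
     "descriptors": {"Myocardial Infarction"}},
    {"title": "Diet and hypertension in adults",
     "descriptors": {"Hypertension"}},
]

def strict_search(query):
    """Return titles that contain the query text verbatim (case-insensitive)."""
    q = query.lower()
    return [c["title"] for c in CITATIONS if q in c["title"].lower()]

def concept_search(query):
    """Map the query to a descriptor first, then match on assigned descriptors."""
    descriptor = ENTRY_TERMS.get(query.lower().strip())
    if descriptor is None:
        return []
    return [c["title"] for c in CITATIONS if descriptor in c["descriptors"]]

strict_hits = strict_search("heart attack")
concept_hits = concept_search("heart attack")
```

In this sketch the strict search for "heart attack" finds nothing because no title contains that phrase, while the concept search finds the citation indexed under the matching descriptor.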
Because the vocabulary is extensive, NLM provides a MeSH Browser for finding descriptors, qualifiers, and other concepts of interest. Today's MeSH vocabulary is organized hierarchically, and its top-level categories form this tree structure:
- Anatomy [A]
- Organisms [B]
- Diseases [C]
- Chemicals and Drugs [D]
- Analytical, Diagnostic, and Therapeutic Techniques and Equipment [E]
- Psychiatry and Psychology [F]
- Phenomena and Processes [G]
- Disciplines and Occupations [H]
- Anthropology, Education, Sociology, and Social Phenomena [I]
- Technology, Industry, Agriculture [J]
- Humanities [K]
- Information Science [L]
- Named Groups [M]
- Health Care [N]
- Publication Characteristics [V]
- Geographicals [Z]
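The category list above can be treated as the root level of a tree addressed by dotted codes. The following sketch assumes tree numbers take the shape of a category letter, two digits, and optional dot-separated three-digit groups (for example "A07.541"). The regular expression, the example codes, and the function names are assumptions for illustration, not an NLM-provided API.

```python
import re

# Top-level categories from the list above, keyed by letter code.
MESH_CATEGORIES = {
    "A": "Anatomy",
    "B": "Organisms",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
    "E": "Analytical, Diagnostic, and Therapeutic Techniques and Equipment",
    "F": "Psychiatry and Psychology",
    "G": "Phenomena and Processes",
    "H": "Disciplines and Occupations",
    "I": "Anthropology, Education, Sociology, and Social Phenomena",
    "J": "Technology, Industry, Agriculture",
    "K": "Humanities",
    "L": "Information Science",
    "M": "Named Groups",
    "N": "Health Care",
    "V": "Publication Characteristics",
    "Z": "Geographicals",
}

# Assumed tree-number shape: letter, two digits, then ".ddd" groups.
TREE_NUMBER = re.compile(r"^[A-Z]\d{2}(\.\d{3})*$")

def category_of(tree_number):
    """Return the top-level category name for a tree number."""
    if not TREE_NUMBER.match(tree_number) or tree_number[0] not in MESH_CATEGORIES:
        raise ValueError(f"not a recognized tree number: {tree_number!r}")
    return MESH_CATEGORIES[tree_number[0]]

def ancestors(tree_number):
    """List tree numbers from the top of the branch down to the given node."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]

print(category_of("A07.541"))    # Anatomy
print(ancestors("A07.541.358"))  # ['A07', 'A07.541', 'A07.541.358']
```

Because each code carries its full path, broadening a search to a parent concept amounts to trimming the last group from the code.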