Ken North

Dr. Dobb's Bloggers

Information Storage and Retrieval: From MEDLARS to Twitter

March 01, 2012

Each decade, technology breakthroughs update our lexicon with new words and phrases. In some instances, we remember the timeline in which we learned about new technology. I recall learning about SIMSCRIPT, "reservoir modeling", cybernetics, "real-time", "information storage and retrieval", APL, and "database" in roughly that order. Of course, it's taken years for me to be able to use real-time and database in the same sentence when describing an application.

An exploration of modern information storage and retrieval encompasses data models, data stores, query techniques, data mining, search engines, automated indexing and classification, machine learning, and a host of topics that could keep a blog busy for years. In looking at information storage and retrieval, one cannot help but take note of MEDLARS and Twitter, two systems born 50 years apart.

One of the most notable achievements in information storage and retrieval was the creation of a centralized database of health sciences bibliographic information. The US National Library of Medicine (NLM), part of the US National Institutes of Health (NIH), has been indexing and abstracting medical literature for decades. The Medical Literature Analysis and Retrieval System (MEDLARS) database was developed at NLM. To publish a variety of documents, including quarterly editions of Index Medicus, NLM linked MEDLARS and the GRACE photocomposition system. Perhaps because of need and good design principles, this information retrieval system has served multiple generations of users.

The MEDLARS example is interesting because it represents an information storage and retrieval technology that has stored data and answered queries for decades. Interactive access to the MEDLARS database became available with MEDLARS Online (MEDLINE). Today, web access to 21 million journal articles is available through the PubMed portal. The original medical literature database has been augmented with database services related to toxicology data, clinical trials, chemical identification, genetic taxonomies, genome mapping, molecular biology, and other knowledge domains.

Figure 1 shows the number of articles related to "Heart" that have been added to the database each year. The current total exceeds 1 million bibliographic citations since 1950.

Vocabulary for Searching

Since the 1960s, NLM has operated computer systems that provide online searches using a controlled, domain-specific vocabulary known as MeSH (Medical Subject Headings). In recent years, the information retrieval capabilities have been augmented to encompass semantic search techniques: finding information based on concepts and not just strict matches against search criteria.
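
To make the idea of a controlled vocabulary concrete, here is a minimal Python sketch of how a searcher's wording might be resolved to a single preferred heading before matching. It is not NLM's implementation. The ENTRY_TERMS table is a tiny hand-typed sample, and the function names (build_lookup, to_descriptor) are hypothetical; real descriptor records would come from NLM's MeSH data.

```python
# Illustrative only: a hand-typed sample, not NLM's MeSH data files.
# Each preferred heading (descriptor) maps to alternate wordings a searcher might type.
ENTRY_TERMS = {
    "Myocardial Infarction": ["heart attack", "myocardial infarct"],
    "Neoplasms": ["cancer", "tumors"],
}


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def build_lookup(entry_terms):
    """Map every normalized wording, including the heading itself, to its heading."""
    lookup = {}
    for descriptor, synonyms in entry_terms.items():
        lookup[normalize(descriptor)] = descriptor
        for synonym in synonyms:
            lookup[normalize(synonym)] = descriptor
    return lookup


def to_descriptor(term, lookup):
    """Return the preferred heading for a term, or None if it is not in the vocabulary."""
    return lookup.get(normalize(term))


LOOKUP = build_lookup(ENTRY_TERMS)

if __name__ == "__main__":
    for query in ["Heart  Attack", "cancer", "sore elbow"]:
        print(query, "->", to_descriptor(query, LOOKUP))
```

Because every wording resolves to one heading, a search for "heart attack" and a search for "myocardial infarction" would retrieve the same indexed citations, which a strict string match against the query text would not do.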

Because the vocabulary is extensive, NLM provides a MeSH Browser for finding descriptors, qualifiers, and other concepts of interest. Today's MeSH vocabulary is organized as a hierarchy under these top-level categories:

  • Anatomy [A]
  • Organisms [B]
  • Diseases [C]
  • Chemicals and Drugs [D]
  • Analytical, Diagnostic, and Therapeutic Techniques and Equipment [E]
  • Psychiatry and Psychology [F]
  • Phenomena and Processes [G]
  • Disciplines and Occupations [H]
  • Anthropology, Education, Sociology, and Social Phenomena [I]
  • Technology, Industry, Agriculture [J]
  • Humanities [K]
  • Information Science [L]
  • Named Groups [M]
  • Health Care [N]
  • Publication Characteristics [V]
  • Geographicals [Z]
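
The bracketed letters in the list above can serve as the root of a simple tree lookup. The Python sketch below assumes that each heading carries a dotted tree number whose first letter names its top-level category and whose prefixes name its broader headings. The sample tree numbers and the function names (top_category, ancestors, is_narrower) are illustrative assumptions and should be checked against the MeSH Browser.

```python
# Top-level MeSH categories, copied from the list above.
MESH_CATEGORIES = {
    "A": "Anatomy",
    "B": "Organisms",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
    "E": "Analytical, Diagnostic, and Therapeutic Techniques and Equipment",
    "F": "Psychiatry and Psychology",
    "G": "Phenomena and Processes",
    "H": "Disciplines and Occupations",
    "I": "Anthropology, Education, Sociology, and Social Phenomena",
    "J": "Technology, Industry, Agriculture",
    "K": "Humanities",
    "L": "Information Science",
    "M": "Named Groups",
    "N": "Health Care",
    "V": "Publication Characteristics",
    "Z": "Geographicals",
}


def top_category(tree_number):
    """Return the top-level category name for a dotted tree number."""
    letter = tree_number.strip()[:1].upper()
    if letter not in MESH_CATEGORIES:
        raise ValueError(f"unknown MeSH category in {tree_number!r}")
    return MESH_CATEGORIES[letter]


def ancestors(tree_number):
    """List the broader nodes implied by a tree number, broadest first."""
    parts = tree_number.strip().split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def is_narrower(candidate, broader):
    """True if candidate sits strictly below broader in the hierarchy."""
    return candidate.startswith(broader + ".")


if __name__ == "__main__":
    sample = "A07.541"  # assumed sample tree number, for illustration only
    print(top_category(sample), ancestors(sample))
```

Storing the hierarchy as dotted prefixes means a search on a broad heading can include all narrower headings with a single prefix test, without walking an explicit tree of objects.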
