The relational database

It’s hard to imagine the modern world without relational databases. The ease and efficiency they afford businesses have made them a multibillion-dollar industry and the standard for processing financial records, personnel data and logistical information. They underpin everything from bank accounts and credit card data to travel reservations, online purchases and stock trading.

But in 1970, this foundational method of data storage and retrieval was only a theory. That’s when Edgar F. “Ted” Codd, an Oxford-educated mathematician working at the IBM San Jose Research Laboratory in California, published a paper describing a system that could store and access information in large databases without requiring users to know how the data was physically organized or exactly where it was stored.

An acute business problem

Managing huge volumes of data

Codd’s notion was born of an acute business problem. In the 1960s, information systems were beginning to manage huge volumes of data. Retrieving that data required advanced technical mastery, not to mention time and money. Mainframe computers, which were relatively new on the scene, cost hundreds of dollars per minute to operate, largely due to the complexity of database management. The early databases used rigid hierarchical structures and convoluted navigational plans to indicate the physical linking or nesting of the data on magnetic tapes. As a result, computer specialists often needed to write entire programs just to retrieve a single piece of information.

In his 1970 paper “A Relational Model of Data for Large Shared Data Banks,” Codd envisioned a software architecture that would enable users to access information without expertise in, or even knowledge of, the database’s physical blueprint. He proposed organizing information into linkable (or related) tables based on common characteristics, so that users could retrieve an entirely new table from data in one or more tables with a single query. The model also gave businesses a better way to understand the relationships among all of their data, helping them gain insights for making decisions and identifying opportunities.

“Ted’s basic idea was that relationships between data items should be based on the items’ values, and not on separately specified linking or nesting,” said Don Chamberlin, best known as co-inventor with Raymond Boyce of Structured Query Language (SQL). “This greatly simplified the specification of queries and allowed unprecedented flexibility to exploit existing data sets in new ways. He believed that computer users should be able to work at a more natural language level and not be concerned about the details of where or how the data was stored.”
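
To make the value-based idea concrete, here is a minimal sketch using Python’s built-in sqlite3 module, a modern SQL engine that has nothing to do with Codd’s paper or IBM’s systems. The table names, columns and rows are hypothetical. The two tables are related only because they share a dept_id value, and a single declarative query derives a new table from them without saying anything about where or how the rows are stored.

```python
import sqlite3

# Hypothetical example data: two tables related only by a shared value (dept_id),
# with no pointers or physical links between rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")
conn.executemany("INSERT INTO departments VALUES (?, ?)", [(10, "Research"), (20, "Sales")])
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Alice", 10), (2, "Bob", 20), (3, "Carol", 10)],
)

# One query builds a new table from two existing ones by matching values;
# the engine, not the user, decides how to find the rows.
rows = conn.execute(
    """
    SELECT e.name, d.name
    FROM employees AS e
    JOIN departments AS d ON e.dept_id = d.dept_id
    WHERE d.name = 'Research'
    ORDER BY e.name
    """
).fetchall()
print(rows)  # [('Alice', 'Research'), ('Carol', 'Research')]
```

Nothing in the query mentions storage locations or navigation paths; the relationship between an employee and a department exists purely because both rows carry the same dept_id value.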

Overcoming market skepticism

The database industry takes off

Codd’s novel concept had to be tested to address skepticism and prove its power and scalability. In 1973, a group of programmers undertook an industrial-strength implementation: the System R project. The team included Chamberlin and Boyce, as well as Patricia Selinger, who developed a cost-based optimizer (software that estimates the cost of alternative ways to execute a query and chooses the cheapest) that made relational databases more efficient. Raymond Lorie, who invented a compiler that could save database query plans for later use, also contributed.
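
The core idea behind a cost-based optimizer can be illustrated with a toy: use statistics about a table to estimate the cost of each candidate way to run a query, then keep the cheapest. The function name choose_plan, its parameters, the page-read cost unit and the formulas below are simplified assumptions for illustration only; they are not System R’s actual cost formulas, which were considerably richer and also covered choices such as join order.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    estimated_cost: float  # assumed unit: pages read from disk


def choose_plan(table_pages: int, table_rows: int, selectivity: float, index_height: int) -> Plan:
    """Hypothetical toy: pick the cheaper of two access paths for a filtered read."""
    # Full table scan: read every page of the table once.
    full_scan = Plan("full table scan", float(table_pages))
    # Index lookup: walk down the index, then (pessimistically) read one page per matching row.
    matching_rows = table_rows * selectivity
    index_scan = Plan("index lookup", index_height + matching_rows)
    # Keep whichever candidate has the lowest estimated cost.
    return min([full_scan, index_scan], key=lambda p: p.estimated_cost)


# A highly selective filter favors the index; an unselective one favors the scan.
selective = choose_plan(table_pages=1_000, table_rows=100_000, selectivity=0.0001, index_height=3)
broad = choose_plan(table_pages=1_000, table_rows=100_000, selectivity=0.2, index_height=3)
print(selective.name)  # index lookup
print(broad.name)      # full table scan
```

The point of the design is that the same query text can be executed in different ways depending on the data, and the system, rather than the programmer, makes that choice.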

Concurrently, Chamberlin and Boyce developed SQL and systems for automatically translating high-level queries into efficient execution plans. The successful effort led to a host of IBM products, including the IBM DB2 database management system. (Larry Ellison’s company Relational Software, later renamed Oracle, released the first commercially available SQL-based relational database in 1979.)

DB2 was first shipped in 1983 on the MVS mainframe platform. It became widely recognized as the premier database management product for mainframes and spread to the worlds of parallel processors and desktop operating systems. Today, it is used on everything from handheld devices to supercomputers and remains a foundational component for countless data transactions, including at ATMs and for online purchases.

Codd was named an IBM Fellow in 1976 and received the Turing Award from the Association for Computing Machinery in 1981. He died in 2003 at age 79. At the time of his passing, Janet Perna, then general manager of IBM Information Management and responsible for IBM’s relational database products, summarized his accomplishments: “His remarkable vision and intellectual genius ushered in a whole new realm of innovation that has shaped the world of technology today, but perhaps his greatest achievement is inspiring generations of people who continue to build on the foundations he laid.”