We enable business and digital transformation decisions through the delivery of cutting-edge ICT solutions and products...
A database is a collection of data that is stored for a specific purpose and organized in a manner that allows its contents to be easily accessed, managed, and updated. Although this definition includes stored data collections such as libraries, file cabinets, and address books, when we talk about databases we almost invariably mean a collection of data that is stored on a computer. There are two basic categories of a database. The most commonly encountered category is the transactional database, used to store dynamic data, such as inventory contents, which is subject to change on an ongoing basis. The other category is the analytical database, used to store static data, such as geographical or chemical test results, which is rarely altered.
Strictly speaking, a database is just the stored data itself, although the term is often used, erroneously, to refer to a database and its management system (DBMS).
The first attempts at computer databases arose around the mid-twentieth century. Early versions were file-oriented. A database file became known as a table because its structure was the same as a paper-based data table. For the same reason, the columns within a table were called fields and the rows were called records. Computers were evolving during that same time period, and their potential for data storage and retrieval was becoming recognized.
The earliest computer databases were based on a flat file model, in which records were stored in text format. In this model, no relationships are defined between records. Without defining such relationships, records can only be accessed sequentially. For example, if you wanted to find the record for the fiftieth customer, you would have to go through the first 49 customer records in sequence first. The flat file model works well for situations in which you want to process all the records, but not for situations in which you want to find specific records within the database.
The hierarchical model, widely used in mainframe environments, was designed to allow structured relationships that would facilitate data retrieval. Within an inverted tree structure, relationships in the hierarchical model are parent-child and one-to-many. Each parent table may be related to multiple child tables, but each child table can only be related to a single parent table. Because table structures are permanently and explicitly linked in this model, data retrieval was fast. However, the model’s rigid structure causes some problems. For example, you can’t add a child table that is not linked to a parent table: if the parent table was “Doctors” and the child table was “Patients,” you could not add a patient record independently. That would mean that if a new patient came into a community’s health care system, under this system, their record could not be added until they had a doctor. The hierarchical structure also means that if a record is deleted in a parent table, all the records linked to it in child tables will be deleted as well.
Also based on an inverted tree structure, the next approach to database design was the network model. The network model allowed more complex connections than the hierarchical model: several inverted trees might share branches, for example. The model connected tables in sets, in which a record in an owner table could link to multiple records in a member table. Like the hierarchical model, the network model enabled very fast data retrieval. However, it also had a number of problems. For example, a user would need a clear understanding of the database structure to be able to get information from the data. Furthermore, if a set structure was changed, any reference to it from an external program would have to be changed as well.
In the 1970s, the relational database was developed to deal with data in more complex ways. The relational model eventually dominated the industry and has continued to do so through to the present day. We’ll explore the relational database in some detail in the next segment.
In the relational database model, data is stored in relations, more commonly known as tables. Tables, records (sometimes known as tuples), and fields (sometimes known as attributes) are the basic components. Each individual piece of data, such as the last name or a telephone number, is stored in a table field and each record comprises a complete set of field data for a particular table. In the following example, the table maintains customer shipping address information. Last_Name and other column headings are the fields. A record, or row, in the table, comprises the complete set of field data in that context: all the address information that is required to ship an order to a specific customer. Each record can be identified by, and accessed through, a unique identifier called a primary key. In the Customer_Shipping table, for example, the Customer_ID field could serve as a primary key because each record has a unique value for that field’s data.
ID | First | Last | Apt. | Address | City | State | Zip |
101 | John | Smith | 147 | 123 1st Street | Chicago | IL | 60635 |
102 | Jane | Doe | 13 C | 234 2nd Street | Chicago | IL | 60647 |
103 | June | Doe | 14A | 243 2nd Street | Chicago | IL | 60647 |
104 | George | Smith | N/A | 345 3rd Street | Chicago | IL | 60625 |
The term relational comes from set theory, rather than the concept that relationships between data drive the database. However, the model does, in fact, work through defining and exploiting the relationships between table data. Table relationships are defined as one-to-one (1:1), one- to-many (1: N), or (uncommonly) many-to-many (N: M):
If a pair of tables has a one-to-one relationship, each record in Table A relates to a single record in Table B. For example, in a table pairing consisting of a table of customer shipping addresses and a table of customer account balances, each single customer ID number would be related to a single identifier for that customer’s account balance record. The one-to-one relationship reflects the fact that each individual customer has a single account balance.
If a pair of tables has a one-to-many relationship, each individual record in Table A relates to one or more records in Table B. For example, in a table pairing consisting of a table of university courses (Table A) and a table of student contact information (Table B), each single course number would be related to multiple records of student contact information. The one-to-many relationship reflects the fact that each individual course has multiple students enrolled in it.
If a pair of tables has a many-to-many relationship, each individual record in Table A relates to one or more records in Table B, and each individual record in Table B relates to one or more records in Table A. For example, in a table pairing consisting of a table of employee information and a table of project information, each employee record could be related to multiple project records and each project record could be related to multiple employee records. The many-to-many relationship reflects the fact that each employee may be involved in multiple projects and that each project involves multiple employees.
The relational database model developed from the proposals in “A Relational Model of Data for Large Shared Databanks,” a paper presented by Dr. E. F. Codd in 1970. Codd, a research scientist at IBM, was exploring better ways to manage large amounts of data than were currently available. The hierarchical and network models of the time tended to suffer from problems with data redundancy and poor data integrity. By applying relational calculus, algebra, and logic to data storage and retrieval, Codd enabled the development of a more complex and fully articulated model than had previously existed.
One of Codd’s goals was to create an English-like language that would allow non-technical users to interact with a database. Based on Codd’s article, IBM started their System R research group to develop a relational database system. The group developed SQL/DS, which eventually became DB2. The system’s language, SQL, became the industry’s de facto standard. In 1985, Dr. Codd published a list of twelve rules for an ideal relational database. Although the rules may never have been fully implemented, they have provided a guideline for database developers for the last several decades.
Although the relational model is by far the most prevalent one, there are a number of other models that are better suited to particular types of data. Alternatives to the relational model include:
5. What languages are used to interact with databases?
SQL (Structured Query Language) is by far the most common language used to interact with relational databases. Originally developed for use with IBM’s DB2, the standard — often pronounced “sequel” — is promoted in various formats by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO).
SQL commands are fairly straightforward and easy to understand. For example, if you wanted a list of all your customers within a specific zip code area, the following command (based on the table in the response to question #2, above) for example, will return that information, which in this case would be “George Smith.”
select First_Name, Last_Name from Customer_Shipping where Zip = '60625';
Most databases use SQL, although many use proprietary extensions specific to their own products.
6. How can I ensure a good database design?
Hands down, the most important thing you can do to ensure a successful database design is to put enough resources into the planning stage. The proliferation of off- the-shelf databases and database applications has led many people to a number of erroneous conclusions, such as:
Databases came into being because of the computer, and the two have enjoyed a mutually beneficial symbiotic relationship ever since, each helping the other grow by leaps and bounds. Somewhat ironically, however, the best way to start a plan for database development is to take out paper, a pencil — and a big eraser.
Database development is a three-phase process. In the first phase, you should create the logical design for the database, based solely on the data you want to store, rather than thinking of the specific software that will be used to create it, or the types of reports that will be created from it. In this phase, you define tables and fields and establish primary and foreign keys and integrity constraints. In the second phase, you implement your plan within the database software program, and in the third phase, you develop the end-user application that will allow your user(s) to interact with the database.
Create your design on paper first, as completely as possible.
In short:
Well, normalized data makes programming (relatively) easy, and works very well in multi-platform, enterprise wide environments. Non-normalized data leads to heartbreak. — Steve Litt
Normalization is a guiding process for database table design that ensures, at four levels of stringency, increasing confidence that results of using the database are unambiguous and as intended. Basically a refinement process, normalization tests a table design for the way it stores data, so that it will not lead to the unintentional deletion of records, for example, and that it will reliably return the data requested.
First normal form (1NF). This is the “basic” level of normalization and generally corresponds to the definition of any database, namely:
Second normal form (2NF). In addition to 1NF rules, each field in a table that does not determine the contents of another field must itself be a function of the other fields in the table. For example, in a table with three fields for customer ID, the product sold, and price of the product when sold, the price would be a function of the customer ID (entitled to a discount) and the specific product.
Third normal form (3NF). In addition to 2NF rules, each field in a table must depend on the primary key. For example, using the customer table just cited, removing a record describing a customer purchase (because of a return perhaps) will also remove the fact that the product has a certain price. In the third normal form, these tables would be divided into two tables so that product pricing would be tracked separately. The customer information would depend on the primary key of that table, Customer_ID, and the pricing information would depend on the primary key of that table, which might be Invoice_Number.
Domain/key normal form (DKNF). In addition to 3NF rules, a key, which is a field used for sorting, uniquely identifies each record in a table. A domain is the set of permissible values for a field. By enforcing key and domain restrictions, the database is assured of being freed from modification anomalies. DKNF is the normalization level that most designers aim to achieve.
Browse through database vocabulary in a handy printable glossary.
Contact Musato Technologies today to learn more and to find out how we can help technologically your business.