Integrity Constraints and Data Validation

Gerd Wagner

4.73/5 (15 votes)

Apr 3, 2014

CPOL

19 min read

42500

What are the most important types of integrity constraints to be checked in any data management app?

Introduction

This article is an excerpt from JavaScript Frontend Web Apps Tutorial Part 2: Adding Constraint Validation, which is part of the book Building Frontend Web Apps with Plain JavaScript.

For detecting non-admissible and inconsistent data and for preventing such data to be added to an application's database, we need to define suitable integrity constraints that can be used by the application's data validation mechanisms for catching these cases of flawed data. Integrity constraints are logical conditions that must be satisfied by the data of the business objects stored in the application's database. For instance, if an application is managing data about persons including their birth dates and their death dates, if they have already died, then we must make sure that for any person object with a death date, this date is not before that person object's birth date.

Integrity constraints may take many different forms. For instance, property constraints define conditions on the admissible property values of an object of a certain type. We focus on the most important kinds of property constraints:

String Length Constraints: require that the length of a string value for an attribute has a maximum, or a minimum.
Mandatory Value Constraints: require that a property must have a value. For instance, a person must have a name, so the name attribute must not be empty.
Range Constraints: require that a property must have a value from the value space of the type that has been defined as its range. For instance, an integer attribute must not have the value "aaa".
Interval Constraints: require that an attribute's value must be in a specific interval.
Pattern Constraints: require that a string attribute's value must match a certain pattern, typically defined by a regular expression.
Cardinality Constraints: apply to multi-valued properties, only, and require that the cardinality of a multi-valued property's value set is not less than a given minimum cardinality or not greater than a given maximum cardinality.
Uniqueness Constraints: require that a property's value is unique among all instances of the given object type.
Referential Integrity Constraints: require that the values of a reference property refer to an existing object in the range of the reference property.
Frozen Value Constraints: require that the value of a property with this constraint must not be changed after it has been assigned initially.

The visual language of UML class diagrams supports defining integrity constraints with the help of invariants expressed either in plain English or in the Object Constraint Language (OCL) and shown in a special type of rectangle attached to the model element concerned. We use UML class diagrams for modeling constraints in design models that are independent of a specific programming language or technology platform.

UML class diagrams provide special support for expressing multiplicity (or cardinality) constraints. This type of constraint allows to specify a lower multiplicity ( minimum cardinality) or an upper multiplicity (maximum cardinality), or both, for a property or an association end. In UML, this takes the form of a multiplicity expression l..u where the lower multiplicity l is either a non-negative integer or the special value *, standing for unbounded, and the upper multiplicity u is either a positive integer not smaller than l or the special value *. For showing property multiplicity constrains in a class diagram, multiplicity expressions are enclosed in brackets and appended to the property name in class rectangles, as shown in the Person class rectangle below.

Since integrity maintenance is fundamental in database management, the data definition language part of the relational database language SQL supports the definition of integrity constraints in various forms. On the other hand, however, integrity constraints and data validation are not supported at all in common programming languages such as PHP, Java, C# or JavaScript. It is therefore important to choose an application development framework that provides sufficient support for constraint validation. Notice that in HTML5, there is some support for validation of user input data in form fields.

In the following sections, we discuss the different types of property constraints listed above in more detail. We also show how to express them in UML class diagrams, in SQL table creation statements, in JavaScript model class definitions, as well as in the annotation-based frameworks Java Persistency API (JPA) and ASP.NET's DataAnnotations.

String Length Constraints

The length of a string value for a property such as the title of a book may have to be constrained, typically rather by a maximum length, but possibly also by a minimum length. In an SQL table definition, a maximum string length can be specified in parenthesis appended to the SQL datatype CHAR or VARCHAR, as in VARCHAR(50).

Mandatory Value Constraints

A mandatory value constraint requires that a property must have a value. This can be expressed in a UML class diagram with the help of a multiplicity constraint expression where the lower multiplicity is 1. For a single-valued property, this would result in the multiplicity expression 1..1, or the simplified expression 1, appended to the property name in brackets. For example, the following class diagram defines a mandatory value constraint for the property name:

Whenever a class rectangle does not show a multiplicity expression for a property, the property is mandatory (and single-valued), that is, the multiplicity expression 1 is the default for properties.

In an SQL table creation statement, a mandatory value constraint is expressed in a table column definition by appending the key phrase NOT NULL to the column definition as in the following example:

CREATE TABLE persons(
  name  VARCHAR(30) NOT NULL,
  age   INTEGER
)

According to this table definition, any row of the persons table must have a value in the column name, but not necessarily in the column age.

In JavaScript, we can encode a mandatory value constraint by a class-level check function that tests if the provided argument evaluates to a value, as illustrated in the following example:

Person.checkName = function (n) {
  if (n === undefined) {
    return "A name must be provided!"; // constraint violation error message
  } else {
    return ""  // no constraint violation
  }
};

In JPA, the mandatory property name is annotated with NotNull in the following way:

@Entity
public class Person {
    @NotNull
    private String name;
    private int age;
}

The equivalent ASP.NET's Data Annotation is Required as shown in

public class Person{
    [Required]
    public string name { get; set; }
    public int age { get; set; }
}

Range Constraints

A range constraint requires that a property must have a value from the value space of the type that has been defined as its range. This is implicitly expressed by defining a type for a property as its range. For instance, the attribute age defined for the object type Person in the diagram above has the range Integer, so it must not have a value like "aaa", which does not denote an integer. However, it may have values like -13 or 321, which also do not make sense as the age of a person. In a similar way, since its range is String, the attribute name may have the value "" (the empty string), which is a valid string that does not make sense as a name.

We can avoid allowing negative integers like -13 as age values, and the empty string as a name, by assigning more specific datatypes to these attributes, such as NonNegativeInteger to age, and NonEmptyString to name. Notice that such more specific type definitions are neither predefined in SQL nor in common programming languages, so we have to implement them either by user-defined types, as supported in SQL-99 database management systems such as PostgreSQL, or by using suitable additional constraints such as interval constraints, which are discussed in the next section.

In JavaScript, we can encode the range constraint NonEmptyString as illustrated in the following example:

Person.checkName = function (n) {
  if (!(typeof(n) === "string") || n.trim() === "") {
    return "Name must be a non-empty string!";
  } else return "";
};

This check function detects and reports a constraint violation if the given value for the name property is not of type "string" or is an empty string.

In JPA, for declaring empty strings as non-admissible we must set the context parameter

javax.faces.INTERPRET_EMPTY_STRING_SUBMITTED_VALUES_AS_NULL

to true in the web deployment descriptor file web.xml.

In ASP.NET's Data Annotations, empty strings are non-admissible by default.

Interval Constraints

An interval constraint requires that an attribute's value must be in a specific interval, which is specified by a minimum value or a maximum value, or both. Such a constraint can be defined for any attribute having an ordered type, but normally we define them only for numeric datatypes or calendar datatypes. For instance, we may want to define an interval constraint requiring that the age attribute value must be in the interval [0,120]. In a class diagram, we can define such a constraint as an "invariant" and attach it to the Person class rectangle, as shown in the following diagram:

In an SQL table creation statement, an interval constraint is expressed in a table column definition by appending a suitable CHECK clause to the column definition as in the following example:

CREATE TABLE persons(
  name  VARCHAR(30) NOT NULL,
  age   INTEGER CHECK (age >= 0 AND age <= 120)
)

In JavaScript, we can encode an interval constraint in the following way:

Person.checkAge = function (a) {
  if (a < 0 || a > 120) {
    return "Age must be between 0 and 120!";
  } else return "";
};

In JPA, we express this interval constraint by adding the annotations Min(0) and Max(120) to the property age in the following way:

@Entity
public class Person {
    @NotNull
    private String name;
    @Min(0)
    @Max(120)
    private int age;
}

The equivalent ASP.NET's Data Annotation is Range(0,120) as shown in

public class Person{
    [Required]
    public string name { get; set; }
    [Range(0,120)]
    public int age { get; set; }
}

Pattern Constraints

A pattern constraint requires that a string attribute's value must match a certain pattern as defined by a regular expression. For instance, for the object type Book we define an isbn attribute with the datatype String as its range and add a pattern constraint requiring that the isbn attribute value must be a 10-digit string or a 9-digit string followed by "X" to the Book class rectangle shown in

In an SQL table creation statement, a pattern constraint is expressed in a table column definition by appending a suitable CHECK clause to the column definition as in the following example:

CREATE TABLE books(
  isbn   VARCHAR(10) NOT NULL CHECK (isbn ~ '^\d{9}(\d|X)$'),
  title  VARCHAR(50) NOT NULL
)

The ~ (tilde) symbol denotes the regular expression matching predicate and the regular expression ^\d{9}(\d|X)$ follows the syntax of the POSIX standard (see, e.g. the PostgreSQL documentation).

In JavaScript, we can encode a pattern constraint by using, the built-in regular expression function test, as illustrated in the following example:

Book.checkIsbn = function (id) {
  if (!/\b\d{9}(\d|X)\b/.test( id)) {
    return "The ISBN must be a 10-digit string or a 9-digit string followed by 'X'!";
  } else return "";
};

In JPA, this pattern constraint for isbn is expressed with the annotation Pattern in the following way:

@Entity
public class Book {
    @NotNull
    @Pattern(regexp="^\\(\d{9}(\d|X))$")
    private String isbn;
    @NotNull
    private String title;
}

The equivalent ASP.NET's Data Annotation is RegularExpression as shown in

public class Book{
    [Required]
    [RegularExpression(@"^(\d{9}(\d|X))$")]
    public string isbn { get; set; }
    public string title { get; set; }
}

Cardinality Constraints

A cardinality constraint requires that the cardinality of a multi-valued property's value set is not less than a given minimum cardinality or not greater than a given maximum cardinality. In UML, cardinality constraints are called multiplicity constraints, and minimum and maximum cardinalities are expressed with the lower bound and the upper bound of the multiplicity expression, as shown in the following class diagram, which contains two examples of properties with cardinality constraints:

The attribute definition nickNames[0..3] in the class Person specificies a minimum cardinality of 0 and a maximum cardinality of 3, with the meaning that a person may have no nickname or at most 3 nicknames. The reference property definition members[3..5] in the class Team specificies a minimum cardinality of 3 and a maximum cardinality of 5, with the meaning that a team must have at least 3 and at most 5 members.

It's not obvious how cardinality constraints could be checked in an SQL database, as there is no explicit concept of cardinality constraints in SQL, and the generic form of constraint expressions in SQL, assertions, are not supported by available DBMSs. However, it seems that the best way to implement a minimum (resp. maximum) cardinality constraint is an on-delete (resp. on-insert) trigger that tests the number of rows with the same reference as the deleted (resp. inserted) row.

In JavaScript, we can encode a cardinality constraint validation by testing the size of the property's value set, as illustrated in the following example:

Person.checkNickNames = function (nickNames) {
  if (nickNames.length > 3) {
    return "There must be no more than 3 nicknames!";
  } else return "";
};

With Java Bean Validation annotations, we can specify

@Size( max=3) 
List<String> nickNames
@Size( min=3, max=5) 
List<Person> members

Uniqueness Constraints

A uniqueness constraint (or key constraint) requires that a property's value is unique among all instances of the given object type.

For instance, for the object type Book we should define the isbn attribute to be a key (unique). We can express this in a UML class diagram by appending the (user-defined) property modifier keyword key in curly braces to the attribute's definition in the Book class rectangle shown in the following diagram:

In an SQL table creation statement, a uniqueness constraint is expressed by appending the keyword UNIQUE to the column definition as in the following example:

CREATE TABLE books(
  isbn   VARCHAR(10) NOT NULL UNIQUE,
  title  VARCHAR(50) NOT NULL
)

In JavaScript, we can encode this uniqueness constraint by a check function that tests if there is already a book with the given isbn value in the books table of the app's database.

In JPA, this uniqueness constraint for isbn is expressed with the annotation Column(unique=true) in the following way:

@Entity
public class Book {
    @NotNull
    @Column(unique=true)
    private String isbn;
    @NotNull
    private String title;
}

Apparently, there is no equivalent ASP.NET's Data Annotation. Consequently, a custom ValidationAttribute would have to be defined.

Frozen Value Constraints

A frozen value constraint defined for a property requires that the value of this property must not be changed after it has been assigned initially. Typical examples of properties with a frozen value constraint are event properties, since the semantic principle that the past cannot be changed prohibits that the property values of past events can be changed.

Standard Identifiers (Primary Keys)

An attribute (or, more generally, a combination of attributes) can be declared to be the standard identifier for objects of a given type, if it is mandatory and unique. We can indicate this in a UML class diagram with the help of the property modifier {id} appended to the declaration of the attribute isbn as shown in the following diagram:

Notice that a standard identifier declaration implies both a mandatory value and a uniqueness constraint on the attribute concerned.

Standard identifiers are called primary keys in relational databases. We can declare an attribute to be the primary key in an SQL table creation statement by appending the phrase PRIMARY KEY to the column definition as in the following example:

CREATE TABLE books(
  isbn   VARCHAR(10) PRIMARY KEY,
  title  VARCHAR(50) NOT NULL
)

In JavaScript, we cannot easily encode a standard identifier declaration, because this would have to be part of the metadata of the class definition, and there is no standard support for such metadata in JavaScript. However, we should at least check if the given argument violates the implied mandatory value or uniqueness constraints by invoking the corresponding check functions discussed above.

In JPA, a standard identifier declaration for isbn is expressed with the annotation Id in the following way:

@Entity
public class Book {
    @Id
    private String isbn;
    @NotNull
    private String title;
}

The equivalent ASP.NET's Data Annotation is Key as shown in

public class Book{
    [Key]
    public string isbn { get; set; }
    public string title { get; set; }
}

Referential Integrity Constraints

References are important information items in an application's database. However, they are only meaningful, when their referential integrity is maintained by the app. This requires that for any reference, there is a referenced object in the database.

A reference can be either human-readable or an internal object references. Human-readable references refer to identifiers that are used in human communication, such as the unique names of astronomical bodies, the ISBN of books and the employee numbers of the employees of a company. Internal object references refer to the memory addresses of objects, thus providing an efficient mechanism for accessing objects in the main memory of a computer.

Some languages, like SQL and XML, support only human-readable, but not internal references. Human-readable references are called foreign keys, and the identifiers they refer to are called primary keys, in SQL. In XML, human-readable references are called ID references and the corresponding attribute type is IDREF.

Objects can be referenced either with the help of human-readable references (such as integer codes) or with internal object references, which are preferable for accessing objects efficiently in main memory. Following the XML terminology, we also call human-readable references ID references and use the suffix IdRef for the names of human-readable reference properties. When we store persistent objects in the form of records or table rows, we need to convert internal object references, stored in properties like chair, to ID references, stored in properties like chairIdRef. This conversion is performed as part of the serialization of the object by assigning the standard identifier value of the referenced object to the ID reference property of the referencing object.

In object-oriented languages, a property is defined for an object type, or class, which is its domain. The values of a property are either data values from some datatype, in which case the property is called an attribute, or they are object references referencing an object from some class, in which case the property is called a reference property. For instance, the class Committee shown in the following class diagram contains the reference property chair, which references objects of type ClubMember:

The object reference property chair could be replaced by an ID reference property chairIdRef which would need to be annotated as referencing the standard identifier attribute of the referenced class as in the following diagram:

Both ways of expressing the reference property are informationally equivalent. In an SQL table definition, we append the keyword REFERENCES to the column definition of an ID reference property, as in the following example:

CREATE TABLE committees(
  name   VARCHAR(20) PRIMARY KEY,
  chairIdRef  INTEGER NOT NULL REFERENCES club_members
)

In JPA, we use one of the annotations OneToOne, OneToMany, ManyToOne or ManyToMany for annotating a reference property and the functionality type of the association represented by it. In ASP.NET we use one of the annotations ForeignKey or InverseProperty.

Any reference property p with domain class C and range class D implies a referential integrity constraint that has to be checked whenever

a new object of type C is created,
the value of p is changed for some object of type C,
an object of type D is destroyed.

A referential integrity constraint also implies two change dependencies:

An object creation dependency: an object with a reference to another object can only be created after the referenced object has been created.
An object destruction dependency: an object that is referenced by another object can only be destroyed after
1. the referencing object is destroyed first, or
2. the reference in the referencing object is either deleted or replaced by a another reference.

For every reference property in our app's model classes we have to choose, which of these two possible deletion policies applies.

In certain cases, we may want to relax this strict regime and allow creating objects that have non-referencing values for an ID reference property.

Typically, object creation dependencies are managed in the user interface by not allowing the user to enter a value of an ID reference property, but only to select one from a list of all existing target objects.

Defining and Using Constraint Violation Classes

While JPA and ASP.NET annotations support (almost) all important cases of property constraints, many other, especially JavaScript- and PHP-based, frameworks do not provide much support for standard constraints. In these cases it is helpful to define your own constraint violation classes that allow you to throw corresponding errors whenever a constraint has been violated. The following list of JavaScript class definitions exemplifies this approach:

function NoConstraintViolation() {
  this.message = "";
};
function ConstraintViolation( msg, culprit) {
  this.message = msg;
  if (culprit) this.culprit = culprit;
};

function MandatoryValueConstraintViolation( msg, culprit) {
  ConstraintViolation.call( this, msg, culprit);
};
MandatoryValueConstraintViolation.prototype = new ConstraintViolation();
MandatoryValueConstraintViolation.prototype.constructor = MandatoryValueConstraintViolation;

function RangeConstraintViolation( msg, culprit) {
  ConstraintViolation.call( this, msg, culprit);
};
RangeConstraintViolation.prototype = new ConstraintViolation();
RangeConstraintViolation.prototype.constructor = RangeConstraintViolation;

function IntervalConstraintViolation( msg, culprit) {
  ConstraintViolation.call( this, msg, culprit);
};
IntervalConstraintViolation.prototype = new ConstraintViolation();
IntervalConstraintViolation.prototype.constructor = IntervalConstraintViolation;

function PatternConstraintViolation( msg, culprit) {
  ConstraintViolation.call( this, msg, culprit);
};
PatternConstraintViolation.prototype = new ConstraintViolation();
PatternConstraintViolation.prototype.constructor = PatternConstraintViolation;

function UniquenessConstraintViolation( msg, culprit) {
  ConstraintViolation.call( this, msg, culprit);
};
UniquenessConstraintViolation.prototype = new ConstraintViolation();
UniquenessConstraintViolation.prototype.constructor = UniquenessConstraintViolation;

function OtherConstraintViolation( msg, culprit) {
  ConstraintViolation.call( this, msg, culprit);
};
OtherConstraintViolation.prototype = new ConstraintViolation();
OtherConstraintViolation.prototype.constructor = OtherConstraintViolation;

An example of how these classes can be used in a function for checking the mandatory value constraint and the range constraint for the property title of a model class Book is the following:

Book.checkTitle = function (t) {
  if (!t) {
    return new MandatoryValueConstraintViolation("A title must be provided!");
  } else if (typeof(t) !== "string" || t.trim() === "") {
    return new RangeConstraintViolation("The title must be a non-empty string!");
  } else {
    return new NoConstraintViolation();
  }
};

Constraint Validation in MVC Applications

Unfortunately, many MVC application development frameworks do not provide sufficient support for integrity constraints and data validation.

Integrity constraints should be defined in the model classes of an MVC app since they are part of the business semantics of a model class (representing a business object type). However, a more difficult question is where to perform data validation? In the database? In the model classes? In the controller? Or in the user interface? Or in all of them?

A relational database management system (DBMS) performs data validation whenever there is an attempt to change data in the database, provided that all relevant integrity constraints have been defined in the database. This is essential since we want to avoid, under all circumstances, that invalid data enters the database. However, it requires that we somehow duplicate the code of each integrity constraint, because we want to have it also in the model class to which the constraint belongs.

Also, if the DBMS would be the only application component that validates the data, this would create a latency, and hence usability, problem in distributed applications because the user would not get immediate feedback on invalid input data. This problem is well-known from classical web applications where the frontend component submits the user input data via HTML form submission to a backend component running on a remote web server. Only this backend component validates the data and returns the validation results in the form of a set of error messages to the front end, Only then, typically several seconds later, and in the hard-to-digest form of a bulk message does the user get the validation feedback. This approach is no longer considered acceptable today. Rather, the user should get immediate validation feedback on each single input data item.

So, we need a validation mechanism in the user interface (UI). Fortunately, the new HTML5 form validation API provides help (also the jQuery Validation Plugin supports form validation). But it is not sufficient to perform data validation in the user interface. We also need to do it in the model classes, and in the DBMS, for making sure that no flawed data enters the application's persistent data store. This creates the problem of how to maintain the constraint definitions in one place (the model), but use them in two or three places (at least in the model classes and in the UI, and possibly also in the database). I have called this the multiple validation problem in this stackoverflow post where I also sketch a solution for JavaScript frontend apps. The multiple validation problem can be solved in different ways:

Define the constraints in a declarative language (such as Java Persistency Annotations or ASP.NET Data Annotations) and generate the backend/model and frontend/UI validation code (typically, in a backend application programming language such as Java or C#, and in JavaScript).
Keep your validation functions in the model on the backend, and invoke them from the UI via XHR.
Use JavaScript as your backend application programming language (such as with NodeJS), then you can encode your validations in JavaScript and execute them on your backend or in the user interface on the frontend.

The simplest, and most responisve, approach is the third one, using only JavaScript both in the backend and frontend components.

You can find more web engineering resources, including open access books, on web-engineering.info.

History

3^rd April, 2014
- Version 1 created
4^th April, 2014
- Improved English
- Added SQL example for a reference property
- Added links to a related stackoverflow post and to web-engineering.info
- Replaced SVG image links with uploaded PNG images
- Improved the content of the last subsection

9^th April, 2014
- Added JPA and ASP.NET annotation examples
- Added section on "Defining and Using Constraint Violation Classes"
- Added links to HTML5 form validation and to the jQuery Validation Plugin
- Added a short discussion on how to approach the multiple validation problem
23^rd June 2014
- Added String Length Constraints, updated the code of Constraint Violation Classes
1st October 2014
- Added a discussion of Cardinality Constraints and Frozen Value Constraints
22 September 2015
- Updated hyperlinks
- Updated diagrams (using the UML property modifiers "key" and "id")
- Improved JS code fragments