Entity-Relationship Modelling - Developing a conceptual data model
Establishing requirements
There is no single correct model. The final model will be the result of choices, although there are guidelines and rules that help to limit the choices available.
- A statement of data requirements is a document written in natural language, so it is open to ambiguity and interpretation. Subjects have to be distinguished from the data to be recorded about them, and connections have to be found between subjects.
- The variety of sources of requirements has an influence: there may be many users with varying requirements. How the data is passed between users, and the files used for storing it, can vary as well.
Language can also present a barrier:
- One group of users may use one word to mean one thing and a second group may use the same word with a different meaning. (Homonym)
- Likewise, different people may use different words for the same subject. (Synonym)
- Properties/identifiers should be examined to establish if synonyms are being used.
- If a single entity type appears to hold data about two different subjects and has multiple candidate identifiers, the candidate identifiers should be examined. They may identify different subjects.
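The homonym and synonym checks above can be sketched in code. This is only an illustration: the glossary entries, the user groups and the function names are all invented for the example, and it assumes each term has already been paired with a plain description of the subject it refers to.

```python
from collections import defaultdict

# Hypothetical glossary entries: (user group, term used, subject meant).
glossary = [
    ("sales", "client", "person who buys goods"),
    ("accounts", "customer", "person who buys goods"),
    ("sales", "order", "request from a customer for goods"),
    ("warehouse", "order", "request to a supplier for stock"),
]

def find_homonyms(entries):
    """Return terms that are used with more than one meaning."""
    meanings = defaultdict(set)
    for _group, term, subject in entries:
        meanings[term].add(subject)
    return {term: subjects for term, subjects in meanings.items() if len(subjects) > 1}

def find_synonyms(entries):
    """Return subjects that are referred to by more than one term."""
    terms = defaultdict(set)
    for _group, term, subject in entries:
        terms[subject].add(term)
    return {subject: names for subject, names in terms.items() if len(names) > 1}

homonyms = find_homonyms(glossary)   # "order" has two meanings
synonyms = find_synonyms(glossary)   # "client" and "customer" name one subject
```

Anything reported here is only a candidate for discussion with the users; the check cannot decide which meaning or which name should be kept.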
A statement of requirements should not be viewed as something 'set in stone'. Assumptions may have to be made.
- The statement of requirements may not reflect the need for flexibility should any future change be required.
- Assumptions reflect the developer's choice and not the users', so they should be avoided where possible.
The users should be consulted, and further discussion undertaken, to resolve difficulties with the data requirements and to clarify any assumptions made. Further discussion may improve the users' overall view of the data and may reveal new data. Developers have to work with users in order to arrive at a conceptual data model.
Database design and development
A conceptual data model is used to form the basis of the database design. Importantly, the conceptual data model should not be influenced by the design.
A conceptual data model is 'a representation of what data a database should contain, expressed in terms that are independent of how it should be realized':
- The data requirements are formally represented in a way that is understandable to people, not computers. The model is simple, avoiding unnecessary complexity. Names of entities and attributes should reflect the subjects and data being modelled.
- Users' requirements that are relevant only to the design and implementation of a database must not be included, in particular storage and processing, data formatting, volumes, usage and access.
A conceptual data model is independent of any changes in the database design, implementation and the DBMS used to format and process the data.
The design of a database is, however, dependent on the data model. The meaning and interpretation of data in a database must always reflect the semantics of the data as expressed by the data model. Any change to a database that affects the meaning of the data is a change to the data requirements and ultimately to the data model.
A good conceptual data model should:
- represent all users' data requirements
- have no duplication
- include all constraints
- be general (not restricted by initial requirements)
- be understandable (simple, no unnecessary complexity)
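Most of these principles need human judgement, but the 'no duplication' principle can be partly checked mechanically. The sketch below is one possible check, not an established method. It assumes that duplication shows up as the same non-identifying attribute name recorded under more than one entity type. The model, the attribute names and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical model: entity type name -> attribute names.
model = {
    "Student": ["StudentId", "Name", "Address"],
    "Course": ["CourseCode", "Title"],
    "Enrolment": ["StudentId", "CourseCode", "Name"],
}

# Identifier attributes are expected to repeat, because they link entity types.
identifiers = {"StudentId", "CourseCode"}

def possible_duplicates(model, identifiers):
    """Return non-identifier attributes recorded under more than one entity type."""
    owners = defaultdict(list)
    for entity, attributes in model.items():
        for attribute in attributes:
            if attribute not in identifiers:
                owners[attribute].append(entity)
    return {name: found for name, found in owners.items() if len(found) > 1}

duplicates = possible_duplicates(model, identifiers)
```

A repeated name is only a prompt for a closer look. It may be genuine duplication, or it may be a homonym, where the same word is used for two different pieces of data.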
Summary
Methods of data modelling vary, and data requirements can be modelled in different ways. An initial solution will probably not be the final solution. Any model should reflect the principles of a good conceptual data model. The steps outlined below are not prescriptive or definitive but form a starting point.
- Establish potential entity types: subjects with a common basis, referred to by a noun (or noun phrase), that have data associated with them.
- Establish potential relationships between these entity types: connections between subjects, referred to by a verb.
- Establish potential attributes of these entity types: data about subjects, particularly data that distinguishes individual subjects from each other.
- Produce an initial E-R model showing entity types with their identifiers and relationships with their degree, and record any assumptions.
- Refine the initial E-R model to include participation conditions for relationships together with constraints and assumptions.
- Review the data model. Eliminate redundant relationships. Resolve m:n relationships. Examine possible complex data. Remove derived data. Consider using entity subtypes.
- Check that the model reflects the principles of a good conceptual data model. Improve the data model where appropriate.
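Some of the steps above can be sketched in code: recording entity types with their identifiers, recording relationships with degree and participation conditions, and resolving an m:n relationship into an intersection entity type with two 1:n relationships. This is a minimal sketch only. The class names, the Student/Course example and the resolve_many_to_many function are assumptions made for illustration, and a real model would normally be drawn as an E-R diagram rather than written as code.

```python
from dataclasses import dataclass

@dataclass
class EntityType:
    name: str
    identifier: tuple        # attribute names that identify one occurrence
    attributes: tuple = ()   # other attribute names

@dataclass
class Relationship:
    name: str
    left: str                # name of the entity type on the "1" or "m" side
    right: str               # name of the entity type on the other side
    degree: str              # "1:1", "1:n" or "m:n"
    left_mandatory: bool     # must every left occurrence take part?
    right_mandatory: bool    # must every right occurrence take part?

def resolve_many_to_many(rel, entities):
    """Replace an m:n relationship by an intersection entity type
    and two 1:n relationships."""
    if rel.degree != "m:n":
        raise ValueError("only m:n relationships need resolving")
    left, right = entities[rel.left], entities[rel.right]
    # The intersection entity type is identified by both original identifiers.
    link = EntityType(rel.name, identifier=left.identifier + right.identifier)
    # Every occurrence of the intersection entity type must relate to one
    # occurrence on each side, so participation at its end is mandatory.
    new_rels = [
        Relationship(f"{left.name}{link.name}", left.name, link.name,
                     "1:n", rel.left_mandatory, True),
        Relationship(f"{right.name}{link.name}", right.name, link.name,
                     "1:n", rel.right_mandatory, True),
    ]
    return link, new_rels

# Hypothetical initial model: students enrol on courses (m:n, optional both ends).
entities = {
    "Student": EntityType("Student", ("StudentId",), ("Name",)),
    "Course": EntityType("Course", ("CourseCode",), ("Title",)),
}
enrolment = Relationship("Enrolment", "Student", "Course", "m:n",
                         left_mandatory=False, right_mandatory=False)

link, new_rels = resolve_many_to_many(enrolment, entities)
```

The original participation conditions are carried over to the Student and Course ends. Whether that is right for a given set of requirements is a modelling decision to confirm with the users.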
Comments, suggestions, ideas to
Stuart Banner
