Integration Component
Integration has two main aspects:
- Format Integration.
- Semantic Integration.
Format Integration
Format integration is best described as ensuring 'domain consistency'. The same attribute may not share the same domain (its data type and set of permitted values) in one system as in another. Examples:
- Bank account or telephone numbers can be stored as type CHAR or INTEGER.
- Sex can be stored as 'male', 'female', 'm', 'f', 'M', 'F', or as 1 or 0.
- Dates can be stored in a variety of ways, including timestamps or integer counts of seconds since a particular reference time.
- Money can be stored as integers or real numbers.
- String values for address details can have different lengths.
Format mismatches are extremely common when data is extracted from source systems that differ in any of the following ways:
- Underlying hardware is different.
- Operating system is different.
- Application software is different.
Integration rules ensure that all data loaded into the data warehouse is standardized into the same formats: the same date format, the same monetary format, and so on. The rules are a form of mapping that specifies how the data must be converted before it is included in the data warehouse.
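As an illustration only, such a mapping could be sketched in Python as below. The function names, the target formats ('M'/'F' codes, ISO 8601 dates, two-decimal amounts) and the parameters are assumptions made for this example, not part of any particular warehouse product.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical mapping: every textual source encoding maps to one warehouse code.
SEX_CODES = {"male": "M", "m": "M", "female": "F", "f": "F"}


def normalize_sex(value, numeric_codes=None):
    """Map a source value to 'M' or 'F'.

    Whether 1 means male or female differs between source systems, so a
    numeric source must supply its own mapping, e.g. {1: "M", 0: "F"}.
    """
    if isinstance(value, int):
        if numeric_codes is None:
            raise ValueError("numeric sex codes need a per-source mapping")
        return numeric_codes[value]
    return SEX_CODES[value.strip().lower()]


def normalize_date(value, fmt="%d/%m/%Y"):
    """Return an ISO 8601 date string (YYYY-MM-DD).

    Numbers are treated as seconds since 1970-01-01 UTC (an assumption);
    strings are parsed with the source system's format `fmt`.
    """
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc).date().isoformat()
    return datetime.strptime(value, fmt).date().isoformat()


def normalize_money(value, in_minor_units=False):
    """Return a Decimal with two decimal places.

    Set in_minor_units=True when the source stores integer pence or cents.
    """
    amount = Decimal(str(value))
    if in_minor_units:
        amount = amount / 100
    return amount.quantize(Decimal("0.01"))


def normalize_account(value):
    """Store account or telephone numbers as text, whatever the source type."""
    return str(value).strip()
```

For example, normalize_sex("Female"), normalize_sex("f") and normalize_sex(0, {1: "M", 0: "F"}) all yield 'F', so records from three differently coded sources end up with the same value in the warehouse.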
Semantic Integration
Semantics is the meaning of data. Without semantics the data is meaningless. This is particularly important in a data warehouse, because its data will be used to support decisions that may have far-reaching consequences. A wrong decision based on inaccurate semantics may be very damaging to a business.
To support semantic integration, a data warehouse includes a catalogue that describes each attribute in the warehouse precisely. The catalogue is metadata: data about data.
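A catalogue of this kind can be pictured as a lookup table keyed by attribute name. The Python sketch below is a minimal illustration; the fields recorded for each attribute and the sample entries are hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Metadata describing one warehouse attribute (hypothetical fields)."""
    attribute: str   # name of the attribute in the warehouse
    data_type: str   # standardized storage type
    meaning: str     # precise business definition


# Illustrative entries only; a real catalogue would cover every attribute.
CATALOGUE = {
    entry.attribute: entry
    for entry in [
        CatalogueEntry("order_total", "DECIMAL(12,2)",
                       "Total value of an order, including tax"),
        CatalogueEntry("order_date", "DATE",
                       "Date on which the order was placed"),
    ]
}


def describe(attribute):
    """Return a one-line description of an attribute from the catalogue."""
    entry = CATALOGUE[attribute]
    return f"{entry.attribute} ({entry.data_type}): {entry.meaning}"
```

Here describe("order_total") tells an analyst that the figure includes tax, which is exactly the kind of meaning that cannot be recovered from the stored number alone.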
Comments, suggestions, ideas to
Stuart Banner
