Distributed Database
Location Independence
A user process does not need to know where the data is located, so it does not have to manage connections to individual sites. An additional layer of software, a DDBMS (Distributed Database Management System), is required to provide this location independence.
Two further requirements are a Global Logical Schema and a Distribution Schema. The Global Schema fulfils the role of a logical schema, describing all data for a distributed user. The Distribution Schema fulfils the role of a storage schema, defining where data is located in a distributed database.
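As a rough illustration of how the two schemas divide the work, the sketch below keeps them as two plain dictionaries. The table names, column names, site names and the `locate` helper are all hypothetical and are not taken from any particular DDBMS.

```python
# Hypothetical global logical schema: the tables a distributed user sees.
GLOBAL_SCHEMA = {
    "customer": ["id", "name", "region"],
    "orders": ["id", "customer_id", "total"],
}

# Hypothetical distribution schema: the sites at which each table is stored.
DISTRIBUTION_SCHEMA = {
    "customer": ["london"],
    "orders": ["london", "leeds"],
}


def locate(table):
    """Return the sites holding `table`, looked up on the user's behalf."""
    if table not in GLOBAL_SCHEMA:
        raise KeyError(f"{table!r} is not in the global schema")
    return DISTRIBUTION_SCHEMA[table]


print(locate("orders"))
```

The user process only ever names a table from the global schema; the lookup of sites stays inside the software layer, which is the point of location independence.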
Properties of a DDBMS
- A single view of data. Users hold privileges on the data and do not need to know where it is stored.
- Support transactions, maintain integrity, recover from network/system failures.
- Security process that prevents unauthorised access while allowing uniform access for users.
- Operate over a wide variety of platforms and networks.
Location of Global and Distribution Schemas
- Replicated copies of schemas in all locations
- One copy in one location
- Different parts in different locations
Advantages and disadvantages of one copy in one location:
- Advantages - Prevention of inconsistencies from duplication and ease of maintenance.
- Disadvantages - All user processes need access to the single copy, which causes processing bottlenecks. If there is a hardware/software failure at that location, user processes will not be able to access any database, including their own local database.
Local Autonomy
Local users can access their own local database directly through their own DBMS, though they cannot access the distributed databases. Distributed users must still go through the DDBMS, even when they are at the same location.
Fragmentation
Data distribution to different locations can be achieved by:
- Horizontal fragmentation - a subset of complete rows is stored in one location. Use unions to reconstruct the table.
- Vertical fragmentation - some columns of a table are stored in one location and some in others. Use joins to reconstruct the table.
- A combination of both the above
Moving data between systems is the slowest part of the process, so place data where most processing can be done locally.
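The two fragmentation styles above can be sketched with Python's built-in sqlite3 module. This is only an illustration: a single in-memory database stands in for several sites, each table stands in for one fragment, and the staff tables and their contents are invented for the example. It also assumes the key column is repeated in every vertical fragment so that the join has something to match on.

```python
import sqlite3

# One in-memory database stands in for all sites; each table is one fragment.
db = sqlite3.connect(":memory:")
db.executescript("""
    -- Horizontal fragments: complete rows, split between two locations.
    CREATE TABLE staff_north (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    CREATE TABLE staff_south (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO staff_north VALUES (1, 'Ann', 30000);
    INSERT INTO staff_south VALUES (2, 'Bob', 28000);

    -- Vertical fragments: columns split, with the key kept in both.
    CREATE TABLE staff_names (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staff_pay (id INTEGER PRIMARY KEY, salary INTEGER);
    INSERT INTO staff_names VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO staff_pay VALUES (1, 30000), (2, 28000);
""")

# A union puts the horizontal fragments back together.
from_horizontal = db.execute(
    "SELECT id, name, salary FROM staff_north "
    "UNION SELECT id, name, salary FROM staff_south ORDER BY id"
).fetchall()

# A join on the shared key puts the vertical fragments back together.
from_vertical = db.execute(
    "SELECT n.id, n.name, p.salary FROM staff_names AS n "
    "JOIN staff_pay AS p ON p.id = n.id ORDER BY n.id"
).fetchall()

print(from_horizontal)
print(from_vertical)
```

Both queries rebuild the same complete table, one from row subsets and one from column subsets.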
Distribution Optimization
- Minimize data transfer over the network
- Do most processing locally
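A small sketch of why local processing reduces network traffic: to compute a grand total, either every row is shipped to one place, or each site sums its own rows and ships a single figure. The site names and amounts are made up, and counting "values sent" is a deliberately crude stand-in for real transfer cost.

```python
# Hypothetical sales amounts held at two remote sites.
SITES = {
    "leeds": [120, 80, 45],
    "york": [200, 15],
}


def ship_all_rows(sites):
    """Copy every row to the coordinator, then aggregate there."""
    rows = [amount for site_rows in sites.values() for amount in site_rows]
    return sum(rows), len(rows)  # (total, values sent over the network)


def aggregate_locally(sites):
    """Each site sums its own rows and ships one partial total."""
    partials = [sum(site_rows) for site_rows in sites.values()]
    return sum(partials), len(partials)  # (total, values sent over the network)


total_a, sent_a = ship_all_rows(SITES)
total_b, sent_b = aggregate_locally(SITES)
print(total_a, sent_a)
print(total_b, sent_b)
```

Both strategies give the same answer, but the second sends one value per site rather than one value per row, and the gap widens as the tables grow.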
Updating
Updating is the responsibility of the DDBMS, not the user.
Replication under a DDBMS
One method is for a single location to hold all the primary copies of data. This can introduce bottlenecks, overloading and, ultimately, failure.
The alternative is to distribute primary copies across the network, which spreads lock co-ordination over several sites and helps to reduce bottlenecks.
Both methods can have back-up sites to aid recovery.
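The difference in lock co-ordination load between the two placements can be sketched as follows. The sketch assumes that a lock request for a data item is always handled by the site holding that item's primary copy; the item names, site names and request sequence are hypothetical.

```python
from collections import Counter

# Hypothetical data items, in the order lock requests arrive for them.
REQUESTS = ["accounts", "stock", "orders", "accounts", "stock", "orders"]

# Placement 1: one site holds every primary copy.
CENTRAL = {"accounts": "site_a", "stock": "site_a", "orders": "site_a"}

# Placement 2: primary copies are spread across the network.
SPREAD = {"accounts": "site_a", "stock": "site_b", "orders": "site_c"}


def lock_load(primary_site, requests):
    """Count the lock requests each site has to co-ordinate."""
    return Counter(primary_site[item] for item in requests)


central_load = lock_load(CENTRAL, REQUESTS)
spread_load = lock_load(SPREAD, REQUESTS)
print(dict(central_load))
print(dict(spread_load))
```

With a single primary site every request lands on one machine, whereas spreading the primary copies divides the same requests among three.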
Comments, suggestions, ideas to
Stuart Banner
