Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 
WelshClear Cookie - decide language by browser settings

Querying distributed heterogeneous structured and semi-structured data sources

Al-Wasil, Fahad M. 2007. Querying distributed heterogeneous structured and semi-structured data sources. PhD Thesis, Cardiff University.

[thumbnail of U584905.pdf] PDF - Accepted Post-Print Version
Download (9MB)

Abstract

The continuing growth and widespread popularity of the internet means that the collection of useful data available for public access is rapidly increasing both in number and size. These data are spread over distributed heterogeneous data sources like traditional databases or sources of various forms containing unstructured and semi-structured data. Obviously, the value of these data sources would in many cases be greatly enhanced if the data they contain could be combined and queried in a uniform manner. The research work reported in this dissertation is concerned with querying and integrating a multiplicity of distributed heterogeneous structured data residing in relational databases and semi-structured data held in well- formed XML documents produced by internet applications or human- coded. In particular, we have addressed the problems of: (1) specifying the mappings between a global schema and the local data sources' schemas, and resolving the heterogeneity which can occur between data models, schemas or schema concepts (2) processing queries that are expressed on a global schema into local queries. We have proposed an approach to combine and query the data sources through a mediation layer. Such a layer is intended to establish and evolve an XML Metadata Knowledge Base (XMKB) incrementally which assists the Query Processor in mediating between user queries posed over the global schema and the queries on the underlying distributed heterogeneous data sources. It translates such queries into sub-queries -called local queries- which are appropriate to each local data source. The XMKB is built in a bottom-up fashion by extracting and merging incrementally the metadata of the data sources. It holds the data source's information (names, types and locations), descriptions of the mappings between the global schema and the participating data source schemas, and function names for handling semantic and structural discrepancies between the representations. To demonstrate our research, we have designed and implemented a prototype system called SISSD (System to Integrate Structured and Semi- structured Databases). The system automatically creates a GUI tool for meta-users (who do the metadata integration) which they use to describe mappings between the global schema and local data source schemas. These mappings are used to produce the XMKB. The SISSD allows the translation of user queries into sub-queries fitting each participating data source, by exploiting the mapping information stored in the XMKB. The major results of the thesis are: (1) an approach that facilitates building structured and semi-structured data integration systems (2) a method for generating mappings between a global and local schemas' paths, and resolving the conflicts caused by the heterogeneity of the data sources such as naming, structural, and semantic conflicts which, may occur between the schemas (3) a method for translating queries in terms of a global schema into sub-queries in terms of local schemas. Hence, the presented approach shows that: (a) mapping of the schemas' paths can only be partially automated, since the logical heterogeneity problems need to be resolved by human judgment based on the application requirements (b) querying distributed heterogeneous structured and semi-structured data sources is possible

Item Type: Thesis (PhD)
Status: Unpublished
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
ISBN: 9781303207846
Date of First Compliant Deposit: 30 March 2016
Last Modified: 09 Jan 2018 19:15
URI: https://orca.cardiff.ac.uk/id/eprint/56144

Actions (repository staff only)

Edit Item Edit Item

Downloads

Downloads per month over past year

View more statistics