Wednesday, June 5, 2019

OLAP Multidimensional Database Concept

OLAP Multidimensional informationbase opinionCHAPTER 2LITERATURE REVIEW2.1 INTRODUCTIONThis chapter is designed to provide background information and reviewing the characteristics of information store, OLAP 4-dimensional entropybase Concept, info tap instance and the application of information mining. Within this research, the concept, design and implementation approaches in developing a complete entropy w arho white plague engine room theoretical account for deploying a supremacyful model with the integration of OLAP Multidimensional Database and selective information mining model. voice 2.2 discussed about the fundamental of selective information w atomic number 18house, entropy w arhouse model and likewise the Extract, Transform and Loading (ETL) of raw informationbase to selective information storage store. It includes research and study on existing data w arehouse models authored by William Inmon, Ralph Kimb in every last(predicate) and non-homogeneous scholars venturing into data storage store models. Section 2.3 introduces background information of OLAP. It includes the studies and research on various OLAP models, OLAP computer architecture and concept on handleing multidimensional databases, multidimensional database schemas design and implementation in this research. It includes studies and research on schema design and implementation. Section 2.4 introduces fundamental information of data mining. It includes studies and research on the available techniques, method and process for OLAP Data Mining. Section 2.5 discussed the product comparisons for data warehouse, data mining and OLAP by Mitch Kramer. It includes the reason why Microsoft is used to design and implement the new proposed model.In this literature review, introduction to the relationships mingled with data warehouse, OLAP multidimensional database and data mining model for deploying four experimental applications for benchmarking. This research also proves th at the new proposed model data warehouse technology framework is ready to transmute any type of raw data into useful information. It will also help us to review the new proposed model of apiece existing data warehouse OLAP Multidimensional database framework.2.2 DATA WAREHOUSEAccording to William Inmon (1999), cognize as the Father of Data Warehousing, data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of the managements decision-making process. Data warehouse is a database containing data that usually represents the production line history of an administration. This historical data is used for analysis that supports muckleiness decisions at umpteen levels, from strategic planning to performance evaluation of a discrete organisational unit.Data store is a type of database carcass aimed at strong integration of in operation(p) databases into an environment that enables strategic use of data (G. Zhou et al., 1995 ). These technologies include relational and multidimensional database management systems, client/ master of ceremonies architecture, meta-data modelling and repositories, graphical user interface and much more (J. Hammer et al., 1995 V. Harinarayan et al., 1996).Data warehouse currently are much a subject of researched is non only vulgarly used in business or finance sector but can buoy be applied appropriately in various sectors. Data warehouse are designed for analyzing or processing of data into useful information using data mining tools for lively decision-making. Data warehouse provides access to difficult environments of an enterprise dataIn these literature studies, two authorised authors are identified as the main contributors and co-founder in the part of Data warehouse, William Inmon (1999 2005) and Ralph Kimball (1996, 2000). Both author perceptions on data warehouse design and architecture differ from unity another. According to Inmon (1996), data warehouse is a dependent data mart structure, whereas Kimball (1999) moldd data warehouse as a bus structure which is a combination of data mart populated together as a data warehouse. put over 2.1 discussed the differences in data warehouse ideology between William Inmon and Ralph Kimball.Table 2.1 William Inmon and Ralph Kimball Data Warehouse DifferencesWilliam InmonRalph KimballParadigmInmons Paradigm An enterprise has one data warehouse, and data marts source their information from the data warehouse. Information is stored in 3rd normal form.Kimballs Paradigm Data warehouse is the collection of heterogeneous data marts within the enterprise. Information is always stored in the dimensional model.ArchitectureArchitecture Using TOP-DOWN approachArchitecture Using Bottom-up approachConceptDatas integration from various systems to centralized repositoryConcept of dimensional modelling (Bridging between Relational and multidimensional DB) stick outThe design pattern dependent on 3rd normalization form, conception is for data granularity.Datas marts are connected in a bus structure. Datas marts are the union of data warehouse. This approach is cognize also as Virtual Data Warehouse.ETL MethodsDatas extraction from operational data sources. Data are feed in staging database area. Data are then change, integrate, and unify and transfer to Operational Data Store database. Data are then load to data mart.Data extracted from legacy system and then consolidated and support in staging database. Data feed into ODS and more data us added/updated. Operational Data Store contains fresh counterpart data that is integrated and qualifyed to the data mart structure.Data martData Marts are available as a subset of the data warehouse.Data Marts can be placed at divers(prenominal) at disparate servers or in geographical locations.Based on this Data Warehouse literature, both Inmon (2005) and Kimball (2000) have different philosophies, but they do have similar agreement on a self-mad e design and implementation of data warehouse and data marts are mainly depending on the effective collection of operational data and validation of data mart. Both approaches having the same database staging concepts and ETL process of data from a database source. They also have a common understanding that independent data marts or data warehouses cannot fulfil the requirements of end users on an enterprise level for precise, timed and relevant data.2.2.1 DATA WAREHOUSE architectureData warehouse architecture is a wide research area. It has many different sub-areas and it can be treated with different approaches in terms or analysis, design and implementation by different enterprise. In this research studies, the aim is to provide a complete view on data warehouse architecture. dickens important scholars Thilini (2005) and Eckerson (2003) from TDWI will discussed in more details on the topic on data warehouse architecture.According to Eckerson (2003), onwards implementing a supr emacyful business intelligence systems where users can use programs like specialized reporting tools, OLAP tools and data mining tools upfront, a data warehouse architecture model mainly slim down on the database staging process from different integrated OLTP systems is responsible for the ETL to the whole process workable. Thilini (2005) conducted a two stage study survey on investigate which factors whitethorn influence the trainion of data warehouse architecture. In Thilini literature study, there are five data warehouse architectures that are practice today as shown in Table 2.2.Table 2.2 Data Warehouse Architectures (Adapted from Thilini, 2005)Data Warehouse ArchitectureIndependent Data MartsIndependent data marts also known as localized and small sized data warehouses. It is mainly used by departments, divisions or regions of company to provide own operational databases. The data marts are different as the structures are different from different location with in concordant database design which makes it difficult to analyze across the data marts. Thilini (2005) cited the work of Winsberg (1996) and Hoss (2002) that It is common for memorial tabletal units to develop their own data marts. Data marts are best used as a prototype for adhoc data warehouse and as for evaluation forrader building a real data warehouse.Data Mart Bus ArchitectureKimball (1996) pioneered the designed and architecture of data warehouse with unions of data marts which are known as the bus architecture. Bus architecture Data Warehouse is derived from the unions of the data marts which are also known as Virtual Data Warehouse.Bus architecture allows data marts not only located in one server but it can be also being located on different server. This allows the data warehouse to functions more as virtual reality mode and gathered all data marts and process as one data warehouse.hub-and-spoke architectureInmon (2005) developed Hub and Spoke architecture. The hub is the central ser ver taking care of information exchange and the spoke handle data transformation for all regional operation data stores. Hub and Spoke mainly boil downed on building a scalable and maintainable infrastructure for data warehouse.Centralized Data Warehouse ArchitectureCentral data warehouse architecture nearly similar to hub-and-spoke architecture without the dependent data marts. This architecture copies and stores heterogeneous operational and out-of-door data to a single and consistent data warehouse. This architecture has only one data model which are consistent and complete from all data sources.According to Inmon (1999) and Kimball (2000), central data warehouse should have Database staging or known as Operational Data Store as an intermediate full point for operational processing of data integration before transform into the data warehouse.Federated ArchitectureAccording to Hackney (2000), Federated Data Warehouse is a integration of multiple heterogeneous data marts, datab ase staging or Operational data store, combination of analytical application and reporting systems. The concept of federated focus on framework of integration to make data warehouse as greatest as possible. Jindal (2004) conclude that federated data warehouse approach are a practical approach for a data warehouse architecture as it is focus on higher reliability and provide excellent evaluate if it is well defined, documented and integrated business rules.Thilini (2005) conclude that hub and spoke and centralized data warehouse architectures are similar and the survey scores are al more or less the same. Hub and spoke is faster and easier to implement because no data mart are required. For centralized data warehouse architecture scored higher than hub and spoke as for spur needs for relatively fast implementation approach.A data warehouse is a read-only data source where end-users are not allow to change the determine or data elements. Inmons (1999) data warehouse architecture st rategy are different from Kimballs (1996). Inmons data warehouse model splits data marts as a copy and distributed as an interface between data warehouse and end users. Kimballs views data warehouse as a unions of data marts. The data warehouse is the collections of data marts harmonize into one central repository. plat 2.1 illustrates the differences between Inmons and Kimballs data warehouse architecture adapted from Mailvaganam, H. (2007)Diagram 2.1 Inmons and Kimballs Data Warehouse Architecture (adapted from Mailvaganam, 2007)In this work, it is very important to identify which data warehouse architecture that is robust and scalable in terms of building and deploying enterprise wide systems. According to Laney (2000) and Watson, H. (2005), it is important to understand and select the appropriate data warehouse architecture and the success of the various architectures acclaimed by Watson. Analysis of this research proved that the most popular data warehouse architecture is hub- and-spoke proposed by Inmon as it is a centralized data warehouse with dependant data marts and second is the data mart bus architecture with dimensional data marts proposed by Kimball. The selection of the new proposed model will use the combination data warehouse architecture of hub-and-spoke and data mart bus architecture as the new proposed model data warehouse architecture are designed with centralized data warehouse and with data marts that can are used for multidimensional database modelling.2.2.2 DATA WAREHOUSE EXTRACT, TRANSFORM, LOADINGData warehouse architecture begins with extract, transform, fill (ETL) process to vouch the data passes the quality threshold. According to Evin (2001), it is prerequisite that right data are important and critical for the success on an enterprise. ETL are an important tool in data warehouse environment to ensure data in the data warehouse are cleansed from various systems and locations. ETLs are also responsible for running scheduled tas ks that extract data from OLTPs. Typically, a data warehouse is populated with historical information from within a particular organization (Bunger, C. J et al., 2001). The complete process descriptions of ETL are discussed in table 2.3.Table 2.3 Extract, Transform, and Load Process in Data Warehouse architectureProcessDescriptionsExtractExtract are the first process which involve in contemptible data from operational databases into database staging area or operational data store before populating into the data warehouse. In this stage, operational databases data need to be examined by extracting into the staging area for handling exceptions and fix all errors before it enters into data warehouse as this will save lots of time when loading into the data warehouse.TransformIn completion of data extraction in database staging area, it is then transform to ensure data integrity within the data warehouse. Transformation of data can be done in several methods such as filed mapping and a lgorithm comparisons.LoadAfter extract and transform of data, it is finally loaded into data warehouse (in Inmons model) or into data marts (in Kimballs model). Data loaded into data warehouse are quality data after the process of extraction where erroneous data are removed and data are transform to ensure integrity of the data.Calvanese, D. et al. (2001) highlight an enterprise data warehouse database tables whitethorn be populated with a wide variety of data sources from different locations and often including data providing information concerning a competitor business. Collecting all the different data and storing it in one central location is an extremely gainsay task where ETL can make it possible. ETL process as depicts in Diagram 2.2 begins with data extract from operational databases where data cleansing and scrubbing are done, to ensure all datas are validated. Then it is transformed to meet the data warehouse standards before it is loaded into data warehouse.Diagram 2.2Ex tract, Transport, Load ProcessG. Zhou et al.(1995) emphasise on data integration in data warehousing stress that ETL can assist in import and export of operational data between heterogeneous data sources using OLE-DB (Object linking and embedding database) based architecture where the data are transform to populate all quality data into data warehouse. This is important to ensure that there are no restrictions on the size of the data warehouse with this approach.In Kimball (2000) data warehouse architecture model depict in Diagram 2.3, the model focus in two important modules, the back room exhibit server and the front room. In the back room process, where the data staging swear outs in charge of gathering all source systems operational databases to perform extraction of data from source systems from different file format from different systems and platforms. Second step is to run the transformation process to ensure all inconsistency are removed to ensure data integrity. Finally, it is loaded into data marts. The ETL processes are commonly executed from a job control via scheduling task. The insertion server is the data warehouse where data marts are stored and process here. Data stored in star schema consist of dimension and fact tables. This is where data are then process of in the front room where it is access by query services such as reporting tools, desktop tools, OLAP and data mining tools.Diagram 2.3 Data Warehouse Architecture (adapted from Kimball, 2000)Nicola, M (2000) explains the process of retrieving data from the warehouse can vary greatly depending on the desired results. There are many form of possible retrieval from a data warehouses and it is flexibility that will drive how this retrieving process can be implemented. There are many tools for retrieving the data warehouse, such as building simple query and reporting done SQL subjectments. The tools may expand to OLAP and data mining, where the structure includes many more ordinal party tools. There are many inherent problems associated with data, which includes the limited amount of portability, and the often-vast amount of data that must be sifted through for from each one query.Essentially, ETL are obligatory for data warehouse to ensure data integrity. There are many factors to be considered such as complexness and scalability are among the two major issues that most enterprise faces by integrating information from different sources in order to have a clean and reliable source of data for mission critical business decisions. One way to achieve a scalable, non-complex solution is to adopt a hub-and-spoke architecture for the ETL process. According to Evin (2001), ETL best operates in hub-and-spoke architecture because of its flexibility and efficiency. Because of its centralized data warehouse design, it can influence the maintaining full access control of ETL processes. Also, empowers the usage of analytical and data mining tools by companionship workers.In this study on ETL for effective data warehouse architecture, it is known that Hub-and-spoke is best for data integration as it has the similarity in Inmon and Kimball architecture. The hub is the data warehouse after processing data from operational database to staging database and the spoke(s) are the data marts for distributing data. Inmon and Kimball also recommend same ETL processes to enable hub-and-spoke architecture. Sherman, R (2005) state that hub-and-spoke approach uses one-to-many interfaces from Data warehouse to many data marts. One-to-many are simpler to implement, cost effective in a long run and ensure consistent dimensions. Compared to many-to-many approach it is more complicated and costly.In this work on the new proposed model, hub-and-spoke architecture are use as Central repository service, as many scholars including Inmon, Kimball, Evin, Sherman and Nicola adopt to this data warehouse architecture. This approach allows locating the hub (data warehouse) and spo kes (data marts) centrally and can be distributed across local or wide area network depending on business requirement. In designing the new proposed model, the hub-and-spoke architecture clearly identifies six important data warehouse components that a data warehouse should have, which includes ETL, Staging Database or operational database store, Data marts, multidimensional database, OLAP and data mining end users applications such as Data query, reporting, analysis, statistical tools. However, this process may differ from organization to organization. Depending on the ETL setup, some data warehouse may overwrite old data with new data and in some data warehouse may only maintain history and audit trial of all changes of the data. Diagram 2.4 depicts the concept of the new proposed model data warehouse architecture.Diagram 2.4 New Proposed Model Data Warehouse Architecture2.2.3 DATA WAREHOUSE FAILURE AND SUCCESS FACTORSBuilding a data warehouse is indeed challenging as data warehou se ejection inheriting a unique characteristic that may impact the overall setup if the analysis, design and implementation phase are not properly done. In this research effort, it discusses the studies on failure and success factors in data warehouse project. Section 2.2.3.1 focuses on the investigation on data warehouse project failure and section 2.2.3.2 discuss and investigate mainly on the success factors by implementing the correct model to support a successful data warehouse project implementation.2.2.3.1 DATA WAREHOUSE FAILURE FACTORSHayen, R.L. (2007) studies shows that implementing a data warehouse project is costly and risky as a data warehouse project can cost over $1 million in the first year. It is estimated that one-half ad two-thirds of the effort of setting up the data warehouse projects attempt will fail eventually. Hayen R.L. (2007) citied on the work of Briggs (2002) and noticed three factors for the failure of data warehouse project that is Environment, Project and technical factors as shown in Diagram 2.5 and table 2.4 discussed the factors in more details.Diagram 2.5 Factors for Data Warehouse Failures (adapted from Briggs, 2002)Table 2.4 Factors for Data Warehouse Failures (adapted from Briggs, 2002)FactorsDescriptionsEnvironmentThis leads to organization changes in business, politics, mergers, takeovers and lack of top management support. Also, including human error, corporate culture, decision making and change management.TechnicalTechnical factors of a data warehouse project complexness and workload are taken too lightly where high expenses involving in hardware/software and people. Problems occurred when assigning a Project manager with lack of intimacy and project interpret in data warehouse costing may lead to impediment of quantifying the return on investment (ROI). Also, failure of managing a data warehouse projects also includes Challenge in setting up a competent operational and development team up plus not having a data warehouse manager or expert that is politically sound. Having an extended timeframe for development and delivery of data warehouse system may due to lack of experience and knowledge for selection of data warehouse products and end-user tools.* Failure to manage the scope of data warehouse project.ProjectPoor knowledge on the requirements of data definitions and data quality on different organization business departments. Also, Running a data warehouse projects with incompetent and insufficient knowledge in what technology to use may lead into problems later on data integration, data warehouse model and data warehouse applications.Vassiliadis (2004) studies shows that data warehouse project failures are an great threat and threatened by factors such as design, technical, procedural and socio-technical as illustrated in Diagram 2.6. These factors of failures are vital in finding any undesirable action for success. apiece factor group is described in table 2.5.Diagram 2.6 Factors for Data Warehouse Failures (adapted from Vassiliadis, 2007)Table 2.5 Factors for Data Warehouse Failures (adapted from Vassiliadis, 2007)FactorsDescriptionsDesignDesign factors in data warehouse project can put up with No Standard techniques or design methodologies. A data warehouse project when doing the analysis and design phase may accept ideas on metadata techniques or languages and data engineering techniques. Also, a proprietary solutions and also recommendations from vendors or in-house experts may define the design of the data warehouse blueprint landscape.TechnicalTechnical factors associate to the lack of know-how experience in evaluation and choices of hardware setup for data warehouse systemsProceduralProcedural factors concerning on the imperfection of data warehouse deployment. This factor focuses on training the end-users extensively on the new technology and the design of data warehouse which are completely different than the conventional IT solutions. users communitie s plays a vital role and are crucial in this factor.Socio-TechnicalSocio-technical factors in a data warehouse project may lead into problems on violation of organization modus operandi where the data warehouse systems will lead into restructuring or reorganization on the way organization operates by introducing changes to the user community.According to Vassiliadis (2007) also, another potence factors for the failure of data warehouse projects are the data ownership and access. This is considered vulnerable within the organization and one mustnt share nor acquire someone else data as this is comparable with losing authority on the data ownership and access. Also, restrict any departmental solution or request to own a total ownership of pure clean and error free data as this might cause potential problem on ownership data rights.Watson (2004) stress that the general factors for the failures in data warehouse project success comprises of weak sponsorship and top management support, shortsighted funding and users participation and organizational politic.2.2.3.2 DATA WAREHOUSE SUCCESS FACTORSData Warehouse Failures can lead into disastrous implementation if careful factors or measures are not taken into serious considerations as discussed in section 2.2.3.1 based on Briggs (2002) and Vassiliadis (2004) studies that may lead into data warehouse failures. According to Hwang M.I. (2007), data warehouse implementations are an important area of research and industrial practices but only few researches made an assessment in the critical success factors for data warehouse implementations. No doubt there is procedure for data warehouse design and implementation but only certain guidelines are subjected for experimental testing. So, it is best to settle down and choose the proper data warehouse model for implementation success.In this study on identifying and filling the gap analysis of the data warehouse success factors, a number of success factors are gathered from d ata warehouse scholars and professionals (Watson Haley, 1997 Chen et al., 2000 Wixom Watson, 2001 Watson et al., 2001 Hwang Cappel, 2002 Shin, 2003) to validate their experimental work and research strength individually on various characteristics of data warehouse success. This study beneficial in planning and implementing data warehouse projects and direct into the success of designing and implementing the new proposed model in this research.There are several success factors in designing and implementing data warehouse solutions and the most important success factors depend on the data warehouse model selection, as different organization may have different scope and road maps in the development of data warehouse. The results of building a successful data warehouse are then used to quantify the factors that are used and also prioritize those factors that are beneficial for continued research purpose to improve and enhanced the data warehouse model success.According to Hayen, R.L. (2007), data warehouse is a complex system which can complicate business procedures. The complexity of data warehouse prevents companies from changing data or transaction which are necessary. It is important then to analyze on which data warehouse model to be used for such complex systems that are sound critical to an organization. Hwang M.I. (2007) conducted a study on data warehousing model and success factors as a critical area of practice and research but only a few studies have been accomplish to measure the data warehouse projects and success factors. umteen scholars had conducted a profound research in the area of data warehouse and may have succeeded or failed due to possible reasons based on each scholars outcomes on the research. It is useful inspect a few case studies on a selected companied data warehouse implementation and to experiment the failure and success factors through survey. (Winter, 2001 Watson et al., 2004)Hwang M. I. (2007) conducted a survey study on six d ata warehouse scholars (Watson Haley, 1997 Chen et al., 2000 Wixom Watson, 2001 Watson et al., 2001 Hwang Cappel, 2002 Shin, 2003) on the success factors in a data warehouse project. Each scholar has different success factors that are measures in a project. Table 2.6 shows the mentioned six scholars survey study on data warehouse, Watson (1997) measures data warehouse success factors, Chen et al. (2000), Watson et al. (2001) and Shin (2003) measures data warehouse implementation factors and Hwang (2002) measures through development and management practices. Only Wixom (2001) as shown in diagram 2.7 measures both Data warehouse implementation and success factors which can be used as a model for a successful data warehouse implementation. Study shown in all 6 scholars review, without having data warehouse implementation and success factors, the consequences of any factors on a data warehouse success cannot be validated.Table 2.6 Factors for Data Warehouse Success (adapted from Hwan g M.I., 2007)StudyData Warehouse Success FactorsData Warehouse Implementation FactorsResults ReportedWatson Haley (1997)Focus on user involvement and support by having a clear and understandable business needs. Using methodology and modelling methods in data warehouse by targeting on clean data. Thus, support from speed management to contribute on the success.N/AOrdered list of successChen et al.(2000)N/AFocused on exactness and clearcutness of User satisfaction by Support and realization of end users needs.Support for end users affects user satisfactionWixom Watson (2001)Implementation factors include management support, resources, User participation, team skills, Source systemsaand development technology which contribute to the implementatio

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.