What is ETL?
ETL stands for Extraction, Transformation and Loading.
ETL tools provide developers with an interface for designing source-to-target mappings, transformations, and job control parameters.
Extraction: Take data from an external source and move it to the warehouse pre-processor database.
Transformation: The transform-data task allows point-to-point generating, modifying, and transforming of data.
Loading: The load-data task adds records to a database table in the warehouse.
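As a minimal sketch of these three steps (using SQLite and made-up table names for illustration, not any particular ETL tool):

```python
import sqlite3

# Hypothetical source: in a real job this would be an external system
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE raw_sales (id INTEGER, amount REAL, country TEXT)")
source.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)",
                   [(1, 10.5, "de"), (2, None, "fr"), (3, 7.25, "us")])

# Extraction: take data from the external source
rows = source.execute("SELECT id, amount, country FROM raw_sales").fetchall()

# Transformation: generate/modify data (drop bad rows, normalize country codes)
transformed = [(rid, round(amount, 2), country.upper())
               for rid, amount, country in rows if amount is not None]

# Loading: add the records to a table in the warehouse
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (id INTEGER, amount REAL, country TEXT)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)
warehouse.commit()
```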
What is the ETL testing process?
Analyzing the requirements – Understanding the business structure and its particular requirements.
Validation and Test Estimation – An estimation of time and expertise required to carry on with the procedure.
Test Planning and Designing the testing environment – Based on the inputs from the estimation, an ETL environment is planned and worked out.
Test Data preparation and Execution – Data for the test is prepared and executed as per the requirement.
Summary Report – Upon completion of the test run, a brief summary report is prepared to capture improvements and conclusions.
What operations does ETL testing include?
ETL testing includes:
- Verifying that data is transformed correctly according to business requirements
- Verifying that the projected data is loaded into the data warehouse without any truncation or data loss (a rough check is sketched after this list)
- Making sure that the ETL application reports invalid data and replaces it with default values
- Making sure that data loads within the expected time frame, to maintain scalability and performance
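A rough illustration of the completeness and truncation checks, comparing hypothetical source and target extracts with pandas:

```python
import pandas as pd

# Hypothetical source and target extracts loaded into DataFrames
source = pd.DataFrame({"id": [1, 2, 3], "city": ["Berlin", "Paris", "Madrid"]})
target = pd.DataFrame({"id": [1, 2, 3], "city": ["Berlin", "Paris", "Madr"]})

# Completeness: every source row should reach the target
assert len(source) == len(target), "row count mismatch - possible data loss"

# Truncation: target values should not be shorter than the source values
merged = source.merge(target, on="id", suffixes=("_src", "_tgt"))
truncated = merged[merged["city_tgt"].str.len() < merged["city_src"].str.len()]
print("Truncated rows:\n", truncated)
```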
What are the different types of ETL testing?
- Accuracy Testing
- Data validation
- Completeness
- Software
- Syntax
- Metadata
- Interface
- Reference
- Performance
What are the various tools used in ETL?
- Cognos Decision Stream
- Oracle Warehouse Builder
- Business Objects XI
- SAS Business Warehouse
- SAS Enterprise ETL Server
What is Fact? What are the types of facts?
It is a central component of a multi-dimensional model which contains the measures to be analysed. Facts are related to dimensions.
Types of facts are
- Additive Facts
- Semi-additive Facts
- Non-additive Facts
Where do we use Semi and Non Additive Facts?
Additive: A measure that can participate in arithmetic calculations across all or any of the dimensions.
Ex: Sales profit
Semi-additive: A measure that can participate in arithmetic calculations across only some of the dimensions.
Ex: Sales amount
Non-additive: A measure that cannot participate in arithmetic calculations across any dimension.
Ex: Temperature
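A small pandas sketch (with invented figures) makes the distinction concrete: an additive measure such as profit can be summed across any dimension, while a non-additive measure such as temperature should only be averaged:

```python
import pandas as pd

facts = pd.DataFrame({
    "store":       ["A", "A", "B", "B"],
    "month":       ["Jan", "Feb", "Jan", "Feb"],
    "profit":      [100, 120, 90, 110],   # additive: sum over any dimension
    "temperature": [4, 6, 3, 5],          # non-additive: summing is meaningless
})

# Additive fact: summing profit by store or by month is both valid
print(facts.groupby("store")["profit"].sum())
print(facts.groupby("month")["profit"].sum())

# Non-additive fact: aggregate temperature with mean, never sum
print(facts.groupby("store")["temperature"].mean())
```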
What are Cubes and OLAP Cubes?
Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multi-dimensional analysis.
OLAP stands for Online Analytical Processing. An OLAP cube stores large amounts of data in multi-dimensional form for reporting purposes. It consists of facts, called measures, categorized by dimensions.
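A pivot over a small, invented fact table gives the flavour of what an OLAP cube pre-computes: a measure summarized by combinations of dimensions:

```python
import pandas as pd

fact = pd.DataFrame({
    "year":    [2022, 2022, 2023, 2023],
    "region":  ["EU", "US", "EU", "US"],
    "product": ["X", "X", "Y", "Y"],
    "sales":   [500, 700, 650, 800],
})

# One "slice" of a cube: the sales measure categorized by two dimensions
cube_slice = fact.pivot_table(index="region", columns="year",
                              values="sales", aggfunc="sum")
print(cube_slice)
```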
What is ODS (Operational Data Store)?
ODS – Operational Data Store.
An ODS sits between the staging area and the data warehouse. The data in the ODS is at a low level of granularity. Once the data is populated in the ODS, the aggregated data is loaded into the EDW from the ODS.
What is tracing level and what are the types?
Tracing level is the amount of data stored in the log files. Tracing levels can be classified into two types: Normal and Verbose. The Normal level logs information in a detailed but summarized manner, while the Verbose level logs information for each and every row that is processed.
What is Grain of Fact?
The grain of a fact can be defined as the level at which the fact information is stored. It is also known as fact granularity.
What is a factless fact table and what are measures?
A fact table without measures is known as a factless fact table. It can be used to view the number of occurring events; for example, it is used to record an event such as employee count in a company. The numeric data based on columns in a fact table is known as measures.
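For illustration, a hypothetical factless fact table recording attendance events holds only dimension keys, and the count of rows acts as the measure:

```python
import pandas as pd

# Factless fact table: only dimension keys, no numeric measure columns
attendance = pd.DataFrame({
    "employee_id": [1, 2, 1, 3, 2],
    "date_id":     [20240101, 20240101, 20240102, 20240102, 20240103],
})

# The event count itself acts as the measure
print(attendance.groupby("date_id").size())  # employees present per day
```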
What are the Modules in Power Mart?
- PowerMart Designer
- Server
- Server Manager
- Repository
- Repository Manager
What is the difference between Power Center & Power Mart?
PowerCenter – Ability to organize repositories into a data mart domain and share metadata across repositories.
PowerMart – only a local repository can be created.
What is transformation?
A transformation is a repository object that generates, modifies, or passes data. Transformations are of two types: Active and Passive.
What are Active Transformation / Passive Transformations?
An active transformation can change the number of rows that pass through it (decrease or increase the row count).
A passive transformation cannot change the number of rows that pass through it.
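A rough analogy in plain Python (not Informatica syntax): a filter-like step is active because it can drop rows, while an expression-like step is passive because every input row still comes out:

```python
rows = [
    {"id": 1, "amount": 250},
    {"id": 2, "amount": -30},
    {"id": 3, "amount": 90},
]

# Active transformation (filter-like): the row count can change
active_out = [r for r in rows if r["amount"] > 0]        # 3 rows in, 2 rows out

# Passive transformation (expression-like): one output row per input row
passive_out = [{**r, "amount_eur": r["amount"] * 0.92} for r in rows]

print(len(rows), len(active_out), len(passive_out))
```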
What is the use of Lookup Transformation?
The Lookup Transformation is useful for
- Getting a related value from a table using a column value
- Updating a slowly changing dimension table
- Verifying whether records already exist in the table
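The "does this record already exist" use of a lookup can be mimicked with a simple keyed dictionary; the table and column names below are invented:

```python
# Hypothetical dimension table already loaded in the warehouse
customer_dim = {
    101: {"customer_id": 101, "name": "Acme"},
    102: {"customer_id": 102, "name": "Globex"},
}

incoming = [{"customer_id": 102, "name": "Globex"},
            {"customer_id": 103, "name": "Initech"}]

# Lookup on the key column decides between update and insert
for row in incoming:
    existing = customer_dim.get(row["customer_id"])
    if existing is None:
        customer_dim[row["customer_id"]] = row      # new record: insert
    else:
        existing.update(row)                        # known record: update
```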
What is partitioning, hash partitioning and round robin partitioning?
To improve performance, transactions are subdivided; this is called partitioning. Partitioning enables the Informatica Server to create multiple connections to various sources. The types of partitions are:
Round-robin partitioning: Informatica distributes the data evenly among all partitions. It is applicable where the number of rows to process in each partition is approximately the same.
Hash partitioning: The Informatica Server applies a hash function to the partitioning keys to group data among partitions. It is used when you need to ensure that groups of rows with the same partitioning key are processed in the same partition.
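The two distribution strategies can be sketched in a few lines of Python (the partition count and keys here are arbitrary, not Informatica internals):

```python
NUM_PARTITIONS = 3
rows = [{"order_id": i, "customer_id": i % 5} for i in range(10)]

# Round-robin: rows are dealt out evenly, regardless of their content
round_robin = {p: [] for p in range(NUM_PARTITIONS)}
for i, row in enumerate(rows):
    round_robin[i % NUM_PARTITIONS].append(row)

# Hash partitioning: rows with the same key always land in the same partition
hashed = {p: [] for p in range(NUM_PARTITIONS)}
for row in rows:
    hashed[hash(row["customer_id"]) % NUM_PARTITIONS].append(row)
```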
What is the advantage of using Data Reader Destination Adapter?
The advantage of using the Data Reader Destination Adapter is that it populates an ADO recordset (consisting of records and columns) in memory and exposes the data from the Data Flow task by implementing the DataReader interface, so that other applications can consume the data.
What is data source view?
A data source view allows you to define the relational schema that will be used in the Analysis Services databases. Dimensions and cubes are created from data source views rather than directly from data source objects.
What is the difference between OLAP tools and ETL tools?
The difference between ETL and OLAP tools is that:
An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some data cleansing applied in the process.
Example: DataStage, Informatica, etc.
An OLAP tool is meant for reporting; in OLAP, the data is available in a multi-dimensional model.
Example: Business Objects, Cognos, etc.
What is a staging area and what is the purpose of a staging area?
A data staging area is where you hold data temporarily on the data warehouse server. Data staging includes the following steps:
- Source data extraction and data transformation (restructuring)
- Data transformation (data cleansing, value transformation)
- Surrogate key assignments
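Surrogate key assignment can be sketched as below, where each incoming natural key is given a warehouse-generated integer key (the column names and counter are assumptions for illustration):

```python
from itertools import count

surrogate_keys = {}          # natural key -> surrogate key already assigned
next_key = count(start=1)    # warehouse-generated sequence

staged_rows = [{"customer_code": "C-100"}, {"customer_code": "C-200"},
               {"customer_code": "C-100"}]

for row in staged_rows:
    natural = row["customer_code"]
    if natural not in surrogate_keys:
        surrogate_keys[natural] = next(next_key)
    row["customer_sk"] = surrogate_keys[natural]

print(staged_rows)
```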
What is Bus Schema?
A bus schema is used to identify the common dimensions across the various business processes. It comes with conformed dimensions along with a standardized definition of information.
What is data purging?
Data purging is the process of deleting data from a data warehouse. It removes junk data such as rows with null values or extra spaces.
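As a minimal illustration (the table and column names are invented), purging rows with null values or whitespace-only values might look like this in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(1, "EU", 100.0), (2, None, 50.0), (3, "  ", None)])

# Purge junk rows: null values or whitespace-only text
conn.execute("DELETE FROM sales WHERE region IS NULL "
             "OR TRIM(region) = '' OR amount IS NULL")
conn.commit()
print(conn.execute("SELECT * FROM sales").fetchall())  # only the clean row remains
```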
What are Schema Objects?
Schema objects are the logical structures that directly refer to the database's data. Schema objects include tables, views, sequences, synonyms, indexes, clusters, functions, packages, and database links.
What is Session, Worklet, Mapplet and Workflow?
Session: It is a set of parameters that tells the server how to move data from sources to targets.
Workflow: It is a set of instructions that tells the server how to execute tasks.
Mapplet: It arranges or creates sets of transformations.
Worklet: It represents a specific set of tasks.
What are the Various Tools?
- Ab Initio
- DataStage
- Informatica
- Cognos Decision Stream
- Oracle Warehouse Builder
- Business Objects XI (Extreme Insight)
- SAP Business Warehouse
- SAS Enterprise ETL Server
What are a few test cases? Explain them.
Mapping Doc Validation – Verifying if the ETL information is provided in the Mapping Doc.
Data Check – Every aspect of the data, such as number checks and null checks, is tested in this case.
Correctness Issues – Misspelled data, inaccurate data, and null data are tested.
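A data check of this kind might be automated along these lines (the column names are hypothetical):

```python
import pandas as pd

loaded = pd.DataFrame({
    "order_id": [1, 2, 3],
    "quantity": [5, None, 2],
    "price":    ["9.99", "12.50", "abc"],
})

# Null check: mandatory columns must be fully populated
null_failures = loaded[loaded["quantity"].isna()]

# Number check: price must be numeric
numeric_failures = loaded[pd.to_numeric(loaded["price"], errors="coerce").isna()]

print("Null check failures:\n", null_failures)
print("Number check failures:\n", numeric_failures)
```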
List few ETL bugs
- Calculation Bug
- User Interface Bug
- Source Bugs
- Load condition bug
- ECP related bug
How can you extract SAP data using Informatica?
With the PowerConnect option, you can extract SAP data using Informatica:
- Install and configure the PowerConnect tool.
- Import the source into the Source Analyzer; PowerConnect acts as a gateway between Informatica and SAP. The next step is to generate the ABAP code for the mapping; only then can Informatica pull data from SAP.
- PowerConnect is used to connect to and import sources from external systems.