Explain Pentaho?
Pentaho addresses the barriers that limit an organization's ability to get value from all of its data. It is designed so that every member of a team, from developers to business users, can easily convert data into value.
Explain the important features of Pentaho?
- Pentaho can create advanced reporting algorithms regardless of the input and output data formats.
- It supports various report formats, including Excel spreadsheets, XML, PDF documents, and CSV files.
- It is professionally certified data integration (DI) software offered by Pentaho, a company headquartered in Florida, United States.
- Offers enhanced functionality and in-Hadoop functionality.
- Allows dynamic drill-down into larger and more detailed data.
- Rapid, interactive response optimization.
- Explore and view multidimensional data.
Define Pentaho BI Project?
The Pentaho BI Project is an ongoing effort by the open source community to provide organizations with best-in-class solutions for their enterprise Business Intelligence (BI) needs.
What major application areas does the Pentaho BI Project comprise?
The Pentaho BI Project encompasses the following major application areas:
- Business Intelligence Platform
- Data Mining
- Reporting
- Dashboards
What is Pentaho Data Integration?
Pentaho Data Integration (PDI) is part of the Pentaho open source Business Intelligence (BI) suite. The suite includes software for all aspects of supporting business decision making: data mining, data warehouse management, data integration, and analytics tools.
What is Pentaho Reporting?
Pentaho Reporting (PR) is a suite of tools for creating pixel-perfect reports. With Pentaho Reporting you can transform data into meaningful information tailored to your customers. You can create HTML, Excel, PDF, text, or printed reports. If you are a developer, you can also produce CSV and XML reports to feed other systems.
Who benefits from the Pentaho BI Project?
- Java developers, who use project components to rapidly assemble custom BI solutions
- ISVs, who can enhance the value and capabilities of their solutions by embedding BI functionality
- End users, who can quickly deploy packaged BI solutions that rival or surpass traditional commercial offerings at a dramatically lower cost
Is Pentaho a Trademark?
Yes, Pentaho is a trademark.
What do you understand by Pentaho Metadata?
Pentaho Metadata is a piece of the Pentaho BI Platform designed to make it easier for users to access information in business terms.
How does Pentaho Metadata work?
With the help of Pentaho’s open source metadata capabilities, administrators can outline a layer of abstraction that presents database information to business users in familiar business terms.
What is the Pentaho Reporting Evaluation?
Pentaho Reporting Evaluation is a particular package of a subset of the Pentaho Reporting capabilities, designed for typical first-phase evaluation activities such as accessing sample data, creating and editing reports, and viewing and interacting with reports.
Explain MDX?
Multidimensional Expressions (MDX) is a query language for OLAP databases, much like SQL is a query language for relational databases. It is also a calculation language, with syntax similar to spreadsheet formulas.
Define Tuple?
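A tuple is a finite ordered list of elements. In MDX, a tuple identifies a cell or slice of a cube by listing one member from each of one or more dimensions.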
What kind of data does a cube contain?
For example, a sales cube might contain the following data:
- 3 fact fields: Sales, Costs, and Discounts
- Time dimension: with the hierarchy Year, Quarter, and Month
- 2 customer dimensions: one with location (Region, Country) and the other with Customer Group and Customer Name
- Product dimension: containing a Product Name
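To make the idea of dimensions, facts, and tuples more concrete, here is a minimal Python sketch of such a cube. It is not how Pentaho Analysis stores data. The dimension members, the numbers, and the `roll_up` helper are all hypothetical and exist only for illustration. Each cell is addressed by a tuple with one member per dimension, and a roll-up sums a fact along one dimension.

```python
from collections import defaultdict

# Toy cube: each key is a tuple (year, quarter, region, product) and each
# value holds the three fact fields. All names and numbers are made up.
cells = {
    (2023, "Q1", "EMEA", "Widget"): {"sales": 120.0, "costs": 80.0, "discounts": 5.0},
    (2023, "Q1", "APAC", "Widget"): {"sales": 90.0, "costs": 60.0, "discounts": 3.0},
    (2023, "Q2", "EMEA", "Gadget"): {"sales": 200.0, "costs": 150.0, "discounts": 10.0},
}


def roll_up(cells, measure, axis):
    """Sum one measure over all cells, grouped by the member at position `axis`."""
    totals = defaultdict(float)
    for coords, measures in cells.items():
        totals[coords[axis]] += measures[measure]
    return dict(totals)


sales_by_quarter = roll_up(cells, "sales", axis=1)
sales_by_region = roll_up(cells, "sales", axis=2)
print(sales_by_quarter)
print(sales_by_region)
```

An OLAP server performs this kind of aggregation for us. The sketch only shows why a tuple of dimension members is enough to locate a value in the cube.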
Differentiate between transformations and jobs?
Transformations move and transform rows from source to target.
Jobs are more about high-level flow control.
How to do a database join with PDI?
If we want to join two tables from the same database, we can use a “Table Input” step and do the join in SQL itself.
If we want to join two tables that are not in the same database, we can use the “Database Join” step.
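As an illustration of the first option, the sketch below runs the kind of SQL join we might paste into a “Table Input” step. It uses Python's built-in `sqlite3` module with an in-memory database so it can run anywhere. The `customers` and `orders` tables and their columns are hypothetical.

```python
import sqlite3

# In-memory database with two hypothetical tables in the SAME database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# The join is done by the database itself, so only joined rows leave it.
JOIN_SQL = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
"""
rows = conn.execute(JOIN_SQL).fetchall()
conn.close()
print(rows)
```

Letting the database perform the join is usually the simplest choice when both tables live in the same database, because only the joined result has to be read into the transformation.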
How can we use database connections from the repository?
We can create a new transformation or job, or close and re-open the ones we have loaded in Spoon.
Why can’t we duplicate field names in a single row?
We can’t: PDI complains in most cases if a row has duplicate field names. Before PDI v2.5.0 we were able to force duplicate fields, but even then only the first value of the duplicate fields could ever be used.
What are the benefits of Pentaho?
- Open Source
- Has a community that supports its users
- Runs well on multiple platforms (Windows, Linux, Macintosh, Solaris, Unix, etc.)
- Offers a complete package: reporting, ETL for data warehouse management, an OLAP server, data mining, and dashboards
Differentiate between Arguments and variables?
Arguments are command line arguments that we would normally specify during batch processing.
Variables are environment or PDI variables that we would normally set in a previous transformation in a job.
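The difference can be sketched in plain Python. This is only an analogy, not PDI code: the `substitute` helper, the argument values, and the variable names are hypothetical. The one PDI detail it borrows is the `${NAME}` syntax that PDI uses to reference a variable.

```python
import re


def substitute(text, variables):
    """Replace each ${NAME} in text with variables[NAME]; unknown names stay as-is."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        text,
    )


# Arguments: positional values supplied on the command line for one batch run.
arguments = ["2024-01-31", "sales.csv"]

# Variables: named values set earlier, for example by a previous
# transformation in a job.
variables = {"INPUT_DIR": "/data/in", "ENV": "test"}

input_path = substitute("${INPUT_DIR}/", variables) + arguments[1]
unresolved = substitute("${MISSING}/x", variables)
print(input_path)
print(unresolved)
```

Arguments are identified by position and belong to a single run, while variables are identified by name and can be shared by everything that runs after they are set.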
What do you understand by the term Pentaho Dashboard?
Pentaho Dashboards give business users the critical information they need to understand and improve organizational performance.
What is the use of Pentaho reporting?
Pentaho Reporting allows organizations to easily access, format and deliver information to employees, customers and partners.
Define Pentaho Schema Workbench?
Pentaho Schema Workbench offers a graphical interface for designing OLAP cubes for Pentaho Analysis.
Define Pentaho Data mining?
Pentaho Data Mining uses the Waikato Environment for Knowledge Analysis (Weka) to search data for patterns. It has functions for data processing, regression analysis, classification methods, etc.
Explain Pentaho Report Designer (PRD)?
PRD is a graphical tool for editing reports: it creates simple and advanced reports and lets users export them as PDF, Excel, HTML, and CSV files. PRD includes a Java-based report engine that offers data integration, portability, and scalability, so it can be embedded in Java web applications and in application servers such as the Pentaho BA Server.
What do you understand by the term ETL?
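ETL stands for Extract, Transform, Load. An ETL tool is an entry-level tool for data manipulation: it extracts data from source systems, transforms it, and loads it into a target.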
What do you understand by hierarchical navigation?
A hierarchical navigation menu allows the user to jump directly to a section of the site several levels below the top.
What are the steps to Decrypt a folder or file?
- Right-click on the folder or file we want to decrypt, and then click on Properties option.
- Click the General tab, and then click Advanced.
- Clear the Encrypt contents to secure data check box, click OK, and then click OK again.
Explain Encrypting File System?
Encrypting File System (EFS) is a Windows technology that enables files to be transparently encrypted, protecting personal data from attackers with physical access to the computer.
What do you mean by repository?
A repository is a storage location where data can be kept safely, without risk of damage or loss.
Explain why we need an ETL tool?
An ETL tool is used to get data from many source systems, such as an RDBMS or SAP, and convert it according to user requirements. It is required when data flows across many systems.
What is the ETL process, and what are its steps?
ETL is the process of extracting, transforming, and loading data. The steps are:
- define the source
- define the target
- create the mapping
- create the session
- create the workflow
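The steps above can be sketched end to end in a few lines of Python. This is a toy illustration using only the standard library, not an ETL tool. The CSV text stands in for the source, the in-memory SQLite table `sales` for the target, the `transform` function for the mapping, and the final calls for the session and workflow that actually run it. All names and data are hypothetical.

```python
import csv
import io
import sqlite3

# Source: a small CSV document (hypothetical data, one row has a bad amount).
SOURCE_CSV = "id,name,amount\n1, alice ,10.5\n2,BOB,n/a\n3,carol,7\n"


def extract(text):
    """Extract: read the source into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert types, and skip rows with a bad amount."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount is not numeric
        clean.append((int(row["id"]), row["name"].strip().title(), amount))
    return clean


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


# Target: an in-memory SQLite database. Running the three stages in order
# plays the role of the session and workflow.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```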
What is metadata?
Metadata is information about data. It is stored in the repository by associating information with individual objects in the repository.
What are snapshots?
Snapshots are read-only copies of a master table located on a remote node which can be periodically refreshed to reflect changes made to the master table.
What is data staging?
- Data staging is actually a group of procedures used to prepare source system data for loading a data warehouse.
- Full Load means completely erasing the contents of one or more tables and refilling them with fresh data.
- Incremental Load means applying ongoing changes to one or more tables based on a predefined schedule.
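The two load styles can be contrasted with a minimal sketch. It assumes a hypothetical `dim_product` table in an in-memory SQLite database and uses SQLite's `INSERT OR REPLACE` as a simple upsert. A real warehouse would typically use its own merge or upsert mechanism and a scheduler.

```python
import sqlite3


def full_load(conn, rows):
    """Full load: erase the table contents, then insert the fresh data."""
    conn.execute("DELETE FROM dim_product")
    conn.executemany("INSERT INTO dim_product (id, name) VALUES (?, ?)", rows)
    conn.commit()


def incremental_load(conn, changes):
    """Incremental load: apply only new or changed rows, keyed on id."""
    conn.executemany(
        "INSERT OR REPLACE INTO dim_product (id, name) VALUES (?, ?)", changes
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT)")

full_load(conn, [(1, "Widget"), (2, "Gadget")])
# Later run: product 2 was renamed and product 3 is new. Product 1 is untouched.
incremental_load(conn, [(2, "Gadget v2"), (3, "Gizmo")])

products = conn.execute("SELECT id, name FROM dim_product ORDER BY id").fetchall()
print(products)
```

The incremental run touches two rows instead of rewriting the whole table, which is the main reason to prefer it once tables grow large.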
Define mapping?
A mapping is the data flow from source to target.
Explain session?
A session is a set of instructions that tells when and how to move data from the respective source to the target.
What is Workflow?
A workflow is a set of instructions that tells the Informatica server how to execute the tasks.
Define mapplet?
A mapplet is a reusable object that creates and configures a set of transformations.
What do you understand by three tier data warehouse?
A data warehouse is said to be a three-tier system when a middle system provides usable data in a secure way to end users. On either side of this middle system are the end users and the back-end data stores.
What is ODS?
ODS stands for Operational Data Store. It sits between the staging area and the data warehouse.
Differentiate between ETL tool and OLAP tool?
An ETL tool is used to extract data from a legacy system and load it into a specified database, with some data cleansing along the way.
An OLAP tool is used for reporting. The data is available in a multidimensional model, so we can write simple queries to extract data from the database.
What is XML?
XML (Extensible Markup Language) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
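As a small illustration of both properties, the snippet below parses a short XML document with Python's standard `xml.etree.ElementTree` module. The `orders` document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A human-readable XML document (hypothetical element and attribute names).
DOC = """<orders>
  <order id="10"><customer>Acme</customer><amount>50.0</amount></order>
  <order id="12"><customer>Globex</customer><amount>40.0</amount></order>
</orders>"""

# The same text is machine-readable: parse it and pull out typed values.
root = ET.fromstring(DOC)
orders = [
    (o.get("id"), o.findtext("customer"), float(o.findtext("amount")))
    for o in root.findall("order")
]
print(orders)
```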
How to perform database join with PDI (Pentaho Data Integration)?
PDI supports joining two tables from the same database with a ‘Table Input’ step, performing the join in the SQL itself.
For joining two tables in different databases, use the ‘Database Join’ step. However, this step executes one query on the target database for each input row from the main stream, so performance drops as the number of queries increases.
To avoid that situation, there is another option: join the rows from two different Table Input steps with the ‘Merge Join’ step, using SQL queries that include an ‘ORDER BY’ clause. Remember that the rows must be sorted on the join key before the merge join is applied.
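The reason the inputs must be sorted becomes clear from how a merge join works. The function below is a generic sort-merge inner join written in plain Python. It is a sketch of the general technique, not PDI's implementation, and the `customers` and `orders` rows are hypothetical. Both inputs are assumed to be already sorted on the join key, as an ‘ORDER BY’ clause would guarantee.

```python
def merge_join(left, right, key):
    """Inner-join two lists of dicts that are BOTH already sorted by `key`."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # left key has no match yet, advance the left side
        elif lk > rk:
            j += 1  # right key has no match yet, advance the right side
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key with every row in that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


customers = [
    {"cust_id": 1, "name": "Acme"},
    {"cust_id": 2, "name": "Globex"},
    {"cust_id": 4, "name": "Initech"},
]
orders = [
    {"cust_id": 1, "amount": 50.0},
    {"cust_id": 1, "amount": 25.0},
    {"cust_id": 3, "amount": 9.0},
    {"cust_id": 4, "amount": 12.0},
]
joined = merge_join(customers, orders, "cust_id")
print(joined)
```

Each input is read once from start to end, with no per-row query. If either input were unsorted, the two cursors would skip past matching rows, which is why sorting is a precondition.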
How to sequentialize transformations?
PDI runs all the steps of a transformation in parallel, so it is not possible to force the steps within a single transformation to execute one after another. Making this possible would require changing the core architecture, which would result in slower processing. To run work in a fixed order, place separate transformations in a job, which executes its entries sequentially.
Explain Hierarchy Flattening?
Hierarchy flattening is the construction of parent-child relationships in a database. It uses both horizontal and vertical formats, which makes sub-elements easy to identify. It also lets users read and understand the main BI hierarchy, and it includes a parent column, a child column, parent attributes, and child attributes.
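One way to picture the two formats is sketched below. It assumes, as an interpretation rather than something stated above, that the vertical format is a parent-child table with one row per node and that the horizontal format has one column per hierarchy level. The `flatten` helper and the sample geography are hypothetical, and the sketch assumes the hierarchy contains no cycles.

```python
def flatten(parent_of):
    """Turn {child: parent} pairs into one root-to-node path per node."""
    paths = {}
    for node in parent_of:
        path = [node]
        # Walk upwards until we reach a node without a parent (the root).
        while parent_of.get(path[-1]) is not None:
            path.append(parent_of[path[-1]])
        paths[node] = list(reversed(path))
    return paths


# Vertical format: one (child, parent) pair per node. The root has no parent.
parent_of = {"World": None, "Europe": "World", "France": "Europe", "Asia": "World"}

paths = flatten(parent_of)

# Horizontal format: one column per level, padded with None for short paths.
depth = max(len(p) for p in paths.values())
flat_rows = [tuple(p + [None] * (depth - len(p))) for p in paths.values()]
print(flat_rows)
```

The horizontal rows are easier to report on because each level is an ordinary column, while the vertical table is easier to maintain because adding a node only adds one row.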
Define Pentaho Report types?
There are several categories of Pentaho reports:
- Transactional Reports: data comes from transactions. The objective is to publish detailed and comprehensive data on an organization's day-to-day activities, such as purchase orders and sales reporting.
- Tactical Reports: data comes from daily or weekly summaries of transactional data. The objective is to present short-term information for immediate decision making, such as replacing merchandise.
- Strategic Reports: data comes from stable and reliable sources. The objective is to create long-term business information reports, such as seasonal sales analysis.
- Helper Reports: data comes from various resources and includes images and videos to present a variety of activities.