Abinitio Interview Questions and Answers

Ab Initio is a fourth-generation, GUI-based parallel processing product for data analysis, batch processing, and data manipulation, and it is commonly used to extract, transform, and load (ETL) data. Ab Initio Software is an American multinational enterprise software corporation based in Lexington, Massachusetts. The company specializes in high-volume data processing applications and enterprise application integration. It was founded in 1995 by Sheryl Handler, the former CEO of Thinking Machines Corporation, together with several other former employees, after that company's bankruptcy.

Can you explain Abinitio?

Ab initio is a Latin phrase meaning "from the beginning". Abinitio is a tool used to extract, transform, and load (ETL) data. It is a BI platform comprising six data processing products: the Co>Operating System, the Component Library, the Data Profiler, Conduct>It, the Graphical Development Environment (GDE), and the Enterprise Meta>Environment (EME). It is also used for data analysis, data manipulation, batch processing, and GUI-based parallel processing.

What is the architecture of Abinitio ETL?

What is the role of Co-operating system in Abinitio?

Can you explain AbInitio GDE?

GDE (Graphical Development Environment) is a graphical application that developers use to design and run AbInitio graphs. The ETL process in AbInitio is represented by graphs, which are formed by components (from the standard component library or custom), flows (data streams), and parameters. The GDE also provides:

A user-friendly frontend for designing Ab Initio ETL graphs

The ability to run and debug Ab Initio jobs and trace execution logs

A graph compilation process that generates a UNIX shell script, which can be executed on a machine without the GDE installed

Can you explain Component Library?

The Ab Initio Component Library is a set of reusable software modules for sorting, data transformation, and high-speed database loading and unloading. It is a flexible and extensible tool that adapts at runtime to the formats of the records it receives, and it allows new components to be created and incorporated from any program, which permits integration and reuse of external legacy code and storage engines.

What layouts does Ab Initio support?

What does dependency analysis mean in Abinitio?

Can you explain EME?

Can you define Data Profiler?

How is the Abinitio EME segregated?

How can you connect EME to Abinitio Server?

What are the file extensions used in Abinitio?

What information does a .dbc file provide to connect to the database?

A .dbc file provides the GDE with the information it needs to connect to the database:

The name and version number of the database to which you want to connect
The name of the computer on which the database instance or server runs, or on which the database remote access software is installed
The name of the server, database instance, or provider to which you want to link

How can you run a graph infinitely in Abinitio?

Can you define SANDBOX?

Can you define multifile system?

What are the components or functions available in ab initio?

The main components in Ab Initio are listed below.

Dedup: Removes duplicate records.

Join: Joins multiple input datasets based on a common key value.

Sort: Reorders the data according to a key and its collation order, holding the data in memory while it sorts.

Filter: Removes records based on a condition.

Replicate: Writes a copy of its input data to each of its output flows, so the same records can feed several downstream branches in parallel.

Merge: Combines multiple input data flows.

Can you explain the different types of parallelism used in Abinitio?

Component parallelism: A graph in which multiple processes execute simultaneously on separate data uses component parallelism.

Data parallelism: A graph that divides data into segments and operates on each segment simultaneously uses data parallelism.

Pipeline parallelism: A graph in which multiple components execute simultaneously on the same data uses pipeline parallelism. Each component in the pipeline continuously reads from the upstream component, processes the data, and writes to the downstream component, so the components operate in parallel.
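
The difference between data and pipeline parallelism can be sketched outside Ab Initio. The Python below is only an illustration, not Ab Initio code: clean, process_partition, read, and transform are hypothetical names. A thread pool stands in for running the same logic on each partition at once, and generators stand in for the record-by-record hand-off between pipeline stages (generators model the streaming, but they run in a single thread).

```python
from concurrent.futures import ThreadPoolExecutor

def clean(record):
    # Hypothetical per-record transform: trim spaces and upper-case.
    return record.strip().upper()

def process_partition(partition):
    # Data parallelism: the same logic runs on every partition independently.
    return [clean(r) for r in partition]

def read(records):
    # Pipeline stage 1: hands records downstream one at a time.
    for r in records:
        yield r

def transform(stream):
    # Pipeline stage 2: starts work as soon as stage 1 yields a record,
    # without waiting for the whole input to be read.
    for r in stream:
        yield clean(r)

partitions = [[" ann ", "bob"], ["cy ", " dee"]]

# One worker per partition; pool.map keeps the partition order.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    data_parallel = list(pool.map(process_partition, partitions))

pipelined = list(transform(read(["eve ", " fay"])))
```

In a real graph the stages of a pipeline run as separate processes, so the upstream component keeps reading while the downstream one is still working.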

Can you explain Sort Component?

What is the difference between dedup-component and replicate component?

Can you define partition and what are the different types of partition components in Abinitio?

In Abinitio, partitioning is the process of dividing a data set into multiple sets for further processing. The partition components include:

Partition by Round-Robin: Distributes data evenly, in block-size chunks, across the output partitions

Partition by Range: Divides data evenly among nodes, based on a key and a set of partitioning ranges

Partition by Percentage: Distributes data so that the output is proportional to fractions of 100

Partition by Load Balance: Distributes data using dynamic load balancing

Partition by Expression: Divides data according to a DML expression

Partition by Key: Groups data by a key
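
Three of these strategies can be sketched in Python to show how records end up in partitions. This is a rough illustration under stated assumptions, not how the Ab Initio components are implemented: the function names are hypothetical, CRC32 stands in for whatever hash the real component uses, and the range splitters are treated as inclusive upper bounds.

```python
import zlib
from bisect import bisect_left

def partition_round_robin(records, n, block_size=1):
    # Deal records out in blocks of block_size, cycling through the n partitions.
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[(i // block_size) % n].append(rec)
    return parts

def partition_by_key(records, n, key):
    # Records with the same key value always land in the same partition.
    parts = [[] for _ in range(n)]
    for rec in records:
        h = zlib.crc32(str(rec[key]).encode("utf-8"))  # assumed hash function
        parts[h % n].append(rec)
    return parts

def partition_by_range(records, splitters, key):
    # splitters are sorted, inclusive upper bounds; larger values go to the last partition.
    parts = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        parts[bisect_left(splitters, rec[key])].append(rec)
    return parts

cities = ["NY", "LA", "NY", "SF", "LA", "NY"]
rows = [{"id": i, "city": c} for i, c in enumerate(cities)]

rr = partition_round_robin(rows, 3)
by_city = partition_by_key(rows, 2, "city")
by_id = partition_by_range(rows, [1, 3], "id")
```

Round robin ignores the record contents, key partitioning keeps equal keys together, and range partitioning keeps neighbouring key values together.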

Can you explain de-partition in Abinitio?

What are the air commands used in Abinitio?

Can you explain Rollup Component?

What is the difference between rollup and scan?

What is the syntax for m_dump in Abinitio?

What is the relation between EME, GDE and co-operating system?

EME stands for Enterprise Meta>Environment and GDE stands for Graphical Development Environment; the Co>Operating System can be thought of as the Abinitio server. The relation between them is as follows.

The Co>Operating System is the Abinitio server. It is installed on a particular operating system platform, which is called the native OS.

The EME is comparable to the repository in Informatica: it holds the metadata, transformations, database configuration files, and source and target information. It sits on the server side.

The GDE is the end-user environment in which graphs are developed (a graph is comparable to a mapping in Informatica). The designer uses the GDE to design graphs and saves them to the EME or to a sandbox. It sits on the user side.

How do you add default rules in a transformer?

Can you explain what a local lookup is?

If your lookup file is a multifile that is partitioned and sorted on a particular key, the local lookup function can be used in place of the regular lookup function call. The lookup is then local to a particular partition, depending on the key.

A lookup file consists of data records that can be held in main memory. This lets the transform function retrieve records much faster than it could from disk, and it allows the transform component to process the data records of multiple files quickly.
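
The idea of a lookup that is local to one partition can be sketched in Python. This is an illustration only: partition_of, build_partitioned_lookup, and local_lookup are hypothetical names, and CRC32 is an assumed stand-in for the partitioning hash. Each partition holds an in-memory table of just the keys that belong to it, and a local lookup searches only that one table.

```python
import zlib

def partition_of(key_value, n):
    # Assumed hash-based placement of a key value into one of n partitions.
    return zlib.crc32(str(key_value).encode("utf-8")) % n

def build_partitioned_lookup(rows, key, n):
    # One in-memory dict per partition, each holding only the keys that hash to it.
    tables = [{} for _ in range(n)]
    for row in rows:
        tables[partition_of(row[key], n)][row[key]] = row
    return tables

def local_lookup(tables, partition, key_value):
    # Searches only the given partition's table; other partitions are never consulted.
    return tables[partition].get(key_value)

rates = [
    {"ccy": "USD", "rate": 1.0},
    {"ccy": "EUR", "rate": 1.1},
    {"ccy": "JPY", "rate": 0.007},
]
tables = build_partitioned_lookup(rates, "ccy", 4)

p = partition_of("EUR", 4)
hit = local_lookup(tables, p, "EUR")             # found in its own partition
miss = local_lookup(tables, (p + 1) % 4, "EUR")  # not present in any other partition
```

This is why the data being processed has to be partitioned on the same key as the lookup file: a record that sits in the wrong partition would not find its match.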

What is the difference between look-up file and look-up, with a relevant example?

Generally, a lookup file represents one or more serial files (flat files). The amount of data is small enough to be held in memory, which allows transform functions to retrieve records much more quickly than they could from disk.

A lookup is a component of an Abinitio graph in which data can be stored and then retrieved by using a key parameter. A lookup file is the physical file in which the data for the lookup is stored.

What is lookup?

A lookup is basically a specific dataset that is keyed. It can be used to map values according to the data present in a particular file (serial or multifile). The dataset can be static or dynamic (for example, when the lookup file is generated in a previous phase and used as a lookup file in the current phase). Sometimes a hash join can be replaced by a reformat with a lookup, if one of the inputs to the join contains a small number of records with a slim record length. AbInitio has built-in functions to retrieve values from the lookup by key.

Can you define outer join?

Can you explain how to improve the performance of a graph?

There are many ways to improve the performance of a graph:

Use a limited number of components in a particular phase
Use optimum max-core values for sort and join components
Minimize the number of sort components
Minimize sorted join components and, if possible, replace them with in-memory (hash) joins
Use only the required fields in sort, reformat, and join components
Use phasing or flow buffers in the case of merges and sorted joins
If both inputs are huge, use a sorted join; otherwise use a hash join with the proper driving port
For large datasets, do not use broadcast as the partitioner
Minimize the use of regular expression functions such as re_index in transform functions
Avoid repartitioning data unnecessarily
Try to keep the graph running in MFS for as long as possible. For this, the input files should be partitioned and, if possible, the output file should also be partitioned.
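
The trade-off between a hash join and a sorted join can be sketched in Python. This is a simplified illustration, not the Join component itself: hash_join and sorted_join are hypothetical names, both perform an inner join, and the "driving" input is taken here to mean the one that is streamed rather than loaded into memory.

```python
def hash_join(driving, small, key):
    # Build an in-memory table from the smaller input,
    # then stream the driving input past it.
    table = {}
    for row in small:
        table.setdefault(row[key], []).append(row)
    for row in driving:
        for match in table.get(row[key], []):
            yield {**match, **row}

def sorted_join(left, right, key):
    # Both inputs must already be sorted on key. Two cursors walk the inputs
    # in key order, so no full in-memory table is ever built.
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows sharing this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    yield {**r, **left[i]}
                i += 1
            j = j_end

orders = [
    {"cust": 1, "amt": 10},
    {"cust": 2, "amt": 20},
    {"cust": 2, "amt": 5},
    {"cust": 4, "amt": 7},
]
customers = [
    {"cust": 1, "name": "Ann"},
    {"cust": 2, "name": "Bob"},
    {"cust": 3, "name": "Cy"},
]

hashed = list(hash_join(orders, customers, "cust"))
merged = list(sorted_join(orders, customers, "cust"))
```

The hash join needs memory proportional to the smaller input but no sorting, while the sorted join needs both inputs sorted but only a small working set, which is why it suits the case where both inputs are huge.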

How do you truncate a table?

What is the difference between a DB config and a CFG file?

Can you explain data mapping?

Data mapping deals with the transformation of the extracted data at the FIELD level; that is, the transformation of a source field to a target field is specified by the mapping defined on the target field. The data mapping is specified during the cleansing of the data to be loaded.

For example:

source;

string(35) name = "interviewgig";

target;

string("01") nm = NULL(""); /* maximum length is string(35) */

Then we can have a mapping like:

Straight move. Trim the leading or trailing spaces.

The above mapping specifies the transformation of the field nm.
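
The same field-level mapping can be sketched in Python as a rough illustration. The function name map_record is hypothetical, and two behaviours are assumptions read from the example rather than stated by it: an empty value is treated as NULL (from NULL("")), and the target is capped at 35 characters (from the comment on the target field).

```python
def map_record(source):
    # Field-level mapping from the source field "name" to the target field "nm".
    name = source.get("name")
    if name is None:
        return {"nm": None}
    # Straight move with leading and trailing spaces trimmed,
    # capped at the assumed maximum length of 35 characters.
    trimmed = name.strip()[:35]
    # Assumption: an empty string stands for NULL in the target.
    return {"nm": trimmed or None}

mapped = map_record({"name": "  interviewgig  "})
blank = map_record({"name": "   "})
```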

Can you explain the Graph parameter?

Can you explain the layouts that Abinitio supports?

Can you explain primary keys and foreign keys?

Can you explain Cartesian joins?

How do you run a graph without the GDE?

How can I run the 2 GUI merge files?

Can you explain .abinitiorc?

Can you explain local and formal parameters?

What is the difference between partitioning with key and round robin?

Partition by key: You have to specify the key on which the partitioning occurs. Records with the same key value go to the same partition, so how well balanced the result is depends on how evenly the key values are distributed. It is useful for key-dependent parallelism.

Partition by round robin: Records are distributed evenly, in block-size chunks, in a sequential way across the output partitions. It is not key based and results in well-balanced data, especially with a block size of 1. It is useful for record-independent parallelism.
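
One practical consequence is how each method behaves when one key value dominates the data. The Python below is a small sketch with hypothetical function names, with CRC32 as an assumed hash; it counts how many records each partition receives under the two methods.

```python
import zlib
from collections import Counter

def sizes_by_key(keys, n):
    # Every record with the same key value goes to the same partition.
    return Counter(zlib.crc32(k.encode("utf-8")) % n for k in keys)

def sizes_by_round_robin(keys, n):
    # Records are dealt out in turn (block size 1), regardless of their key.
    return Counter(i % n for i in range(len(keys)))

keys = ["A"] * 8 + ["B", "C"]  # skewed input: one key value dominates

key_sizes = sizes_by_key(keys, 2)
rr_sizes = sizes_by_round_robin(keys, 2)
```

With this input, key partitioning puts all eight "A" records in one partition, while round robin splits the ten records five and five. Round robin, however, gives no guarantee that equal keys end up together, so it does not suit key-dependent steps.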

Can you explain how to add default rules in a transformer?

1) If it is not already displayed, display the Transform Editor Grid.

2) Click the Business Rules tab if it is not already displayed.

3) Select Edit > Add Default Rules.

Add Default Rules opens the Add Default Rules dialog. Select one of the following:

Match Names: Generates a set of rules that copies input fields to output fields with the same name.

Use Wildcard (.*) Rule: Generates one rule that copies input fields to output fields with the same name.

In the case of a reformat, if the destination field names are the same as, or a subset of, the source fields, there is no need to write anything in the reformat xfr, unless you want a real transform beyond reducing the set of fields or splitting the flow into a number of flows.
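
The two options differ in how many rules they generate. The Python sketch below is only an illustration: the function names are hypothetical, and the rule strings are an assumption about what the generated transform text looks like, not confirmed output of the GDE.

```python
def match_names_rules(in_fields, out_fields):
    # One explicit rule per output field that has a same-named input field.
    return ["out.{0} :: in.{0};".format(f) for f in out_fields if f in in_fields]

def wildcard_rule():
    # A single rule that covers every same-named field (assumed syntax).
    return ["out.* :: in.*;"]

def copy_same_named(record, out_fields):
    # What both options achieve at run time: same-named fields are copied across,
    # and output fields with no matching input are left for other rules.
    return {f: record[f] for f in out_fields if f in record}

in_fields = ["id", "name", "age"]
out_fields = ["id", "name", "total"]

explicit = match_names_rules(in_fields, out_fields)
single = wildcard_rule()
copied = copy_same_named({"id": 1, "name": "x", "age": 30}, out_fields)
```

The explicit rules are easier to edit one field at a time, while the single wildcard rule keeps working when same-named fields are added later.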
