What is Oracle Data Integrator (ODI)?
Oracle acquired Sunopsis along with its ETL tool, “Sunopsis Data Integrator”, and renamed it Oracle Data Integrator (ODI). ODI is an E-LT (Extract, Load and Transform) tool used for high-speed data movement between disparate systems.
Oracle Data Integrator Enterprise Edition (ODI-EE 12c) brings together “Oracle Data Integrator” and “Oracle Warehouse Builder” as separate components of a single product with a single licence.
Explain what ODI is. Why is it different from other ETL tools?
ODI stands for Oracle Data Integrator. It differs from other ETL tools in that it uses an E-LT approach rather than an ETL approach. This approach eliminates the need for a dedicated transformation server between the source and target data servers: the power of the target data server is used to transform the data, i.e. the target data server acts as the staging area in addition to its role as the target database.
The transformation logic is applied while loading the data from the staging area into the target database. An appropriate CKM (Check Knowledge Module) can also be used at this point to implement data quality requirements.
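As a minimal sketch (with hypothetical table and column names, not taken from any real mapping), the transformation in an E-LT flow is plain SQL executed on the target server, so no separate transformation engine is needed:

    -- Hypothetical example: staging and transformation both happen on the target server.
    INSERT INTO dw.customer_dim (customer_id, full_name, country_code)
    SELECT src.cust_id,
           UPPER(src.first_name) || ' ' || UPPER(src.last_name),   -- transformation expressed in target SQL
           NVL(src.country, 'XX')
    FROM   stg.customer_src src;                                   -- staging table held on the target server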
What is E-LT? Or, what is the difference between ODI and other ETL tools?
E-LT is an innovative approach to extracting, loading and transforming data. Typically, ETL application vendors have relied on costly, heavyweight mid-tier servers to perform the transformations required when moving large volumes of data around the enterprise.
ODI delivers unique next-generation Extract, Load and Transform (E-LT) technology that improves performance and reduces data integration costs, even across heterogeneous systems, by pushing the required processing down to the typically large and powerful database servers already in place within the enterprise.
What components make up Oracle Data Integrator?
“Oracle Data Integrator” comprises:
- Oracle Data Integrator (Topology Manager, Designer, Operator and Agent)
- Oracle Data Quality for Data Integrator
- Oracle Data Profiling
What is Oracle Data Integration Suite?
Oracle Data Integration Suite is a set of data management applications for building, deploying, and managing enterprise data integration solutions:
- Oracle Data Integrator Enterprise Edition
- Oracle Data Relationship Management
- Oracle Service Bus (limited use)
- Oracle BPEL (limited use)
- Oracle WebLogic Server (limited use)
Additional product options are:
- Oracle GoldenGate
- Oracle Data Quality for Oracle Data Integrator (Trillium-based data quality)
- Oracle Data Profiling (Trillium-based data profiling)
- ODSI (the former AquaLogic Data Services Platform)
What systems can ODI extract and load data into?
ODI brings true heterogeneous connectivity out of the box: it can connect natively to Oracle, Sybase, MS SQL Server, MySQL, LDAP, DB2, PostgreSQL and Netezza.
It can also connect to any data source supporting JDBC; it is even possible to use the Oracle BI Server as a data source via the JDBC driver that ships with BI Publisher.
What are Knowledge Modules?
Knowledge Modules form the basis of ‘plug-ins’ that allow ODI to generate the relevant execution code, across technologies, to perform tasks in one of six areas. The six types of knowledge module are:
- Reverse-engineering knowledge modules are used for reading the table and other object metadata from source databases
- Journalizing knowledge modules record the new and changed data within either a single table or view or a consistent set of tables or views
- Loading knowledge modules are used for efficient extraction of data from source databases for loading into a staging area (database-specific bulk unload utilities can be used where available)
- Check knowledge modules are used for detecting errors in source data
- Integration knowledge modules are used for efficiently transforming data from the staging area into the target tables, generating optimized native SQL for the given database (see the sketch below)
- Service knowledge modules provide the ability to expose data as Web services
ODI ships with many knowledge modules out of the box; they are also extensible and can be modified within the ODI Designer module.
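As an illustration (a minimal sketch with hypothetical table names, not the exact code generated by any particular KM), an LKM typically stages source rows in a C$ loading table in the staging area, and an IKM then builds an I$ flow table and integrates it into the target:

    -- LKM step (hypothetical): copy source rows into a C$ loading table in the staging area
    CREATE TABLE stg.C$_CUSTOMER AS
    SELECT cust_id, first_name, last_name FROM src.customer;

    -- IKM step (hypothetical): build the I$ flow table with transformations applied
    CREATE TABLE stg.I$_CUSTOMER_DIM AS
    SELECT cust_id, UPPER(first_name) || ' ' || UPPER(last_name) AS full_name
    FROM   stg.C$_CUSTOMER;

    -- IKM step: integrate the flow table into the target
    INSERT INTO dw.customer_dim (customer_id, full_name)
    SELECT cust_id, full_name FROM stg.I$_CUSTOMER_DIM;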
How do ‘Contexts’ work in ODI?
ODI offers a unique design approach through the use of contexts and logical schemas. Imagine a development team: within the ODI Topology Manager, a senior developer can define the system architecture, connections, databases, data servers, schemas (tables etc.) and so forth.
These physical objects are linked through contexts to ‘logical’ architecture objects, which other developers then use to create interfaces. At run time, on specification of a context within which to execute the interfaces, ODI resolves the logical objects used in those interfaces to the correct physical connections, databases and tables (source and target) as defined in the Topology.
Does my ODI infrastructure require an Oracle database?
No. The ODI modular repositories (a Master repository plus one or more Work repositories) can be installed on any database engine that supports ANSI ISO 89 syntax, such as Oracle, Microsoft SQL Server, Sybase ASE, IBM DB2 UDB and IBM DB2/400.
Does ODI support web services?
Yes, ODI is SOA-enabled and its web services can be used in three ways:
- The Oracle Data Integrator Public Web Service, that lets you execute a scenario (a published package) from a web service call
- Data Services, which provide a web service over an ODI data store (i.e. a table, view or other data source registered in ODI)
- The ODIInvokeWebService tool that you can add to a package to request a response from a web service
How to reverse-engineer views (how to load the data from views)?
In the Model, go to the Reverse Engineer tab and select VIEW as the type of object to reverse-engineer.
Is ODI Used by Oracle in their products?
Yes, there are many Oracle products that utilise ODI, but here are just a few:
- Oracle Application Integration Architecture (AIA)
- Oracle Agile products
- Oracle Hyperion Financial Management
- Oracle Hyperion Planning
- Oracle Fusion Governance, Risk & Compliance
- Oracle Business Activity Monitoring
- Oracle BI Applications also uses ODI as its core ETL tool in place of Informatica, but only for one release of OBIA and when using a certain source system.
How will you bring in the different source data into ODI?
You have to create data servers in the Topology Manager for the different sources that you want to use.
How will you bulk load data?
In ODI there are IKMs that are designed for bulk loading of data.
How will you bring in files from remote locations?
We invoke the Service Knowledge Module in ODI; this helps us to access the data through a web service.
How will you handle dataquality in ODI?
There are two ways of handling data quality in ODI: the first method handles incorrect data using the CKM; the second method uses the Oracle Data Quality tool (for advanced quality options).
What are load plans and what are the types of load plans?
A load plan is a process for running or executing multiple scenarios sequentially, in parallel or conditionally. Accordingly, we can speak of three types of load plans: sequential, parallel and condition-based.
What is a profile in ODI?
A profile is a set of object-wise privileges. We can assign a profile to users, and the users inherit their privileges from the profile.
How to write sub-queries in ODI?
We can follow any one of the following approaches to create a sub-query:
- Using a yellow (temporary) interface and the sub-select option, we can create sub-queries in ODI.
- Using a VIEW, we can go for sub-queries (see the sketch below).
- Using an ODI procedure, we can call direct database queries in ODI.
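For instance, with the VIEW approach (a sketch with hypothetical names), the sub-query is wrapped in a database view that is then reverse-engineered into the model and used as an ordinary source datastore:

    -- Hypothetical view wrapping the sub-query; reverse-engineer it and use it as a source datastore
    CREATE OR REPLACE VIEW src.v_high_value_orders AS
    SELECT o.order_id, o.customer_id, o.amount
    FROM   src.orders o
    WHERE  o.amount > (SELECT AVG(amount) FROM src.orders);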
How to write procedures in ODI?
A procedure is a step-by-step sequence of code operations in any technology (see “What is a Procedure?” below).
What are the types of variables?
There are two types: 1) Global and 2) Project. A variable is an object that stores a single value. This value can be a string, a number or a date. The value is stored in Oracle Data Integrator and can be updated at run time. The value of a variable can be updated from the result of a query executed on a logical schema.
For example, it can retrieve the current date and time from a database. A variable can be created as a global variable or in a project. Global variables can be used in all projects, while project variables can only be used within the project in which they are defined.
Where can we use variables?
Variables can be used in all Oracle Data Integrator expressions:
- Mappings
- Filters
- Joins
- Constraints
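For example (a sketch using a hypothetical project variable), a variable is referenced with the #PROJECT.VARIABLE notation inside an expression, and ODI substitutes its value at run time:

    -- Hypothetical filter expression on a source datastore:
    SRC_ORDERS.ORDER_DATE > TO_DATE('#MY_PROJECT.LAST_LOAD_DATE', 'YYYY-MM-DD')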
What is Work Repository?
A work repository is the data structure that stores development objects (projects, models, folders) together with run-time information such as scenarios and sessions. Each work repository is attached to a master repository; therefore, information about the physical connection to a work repository is stored in the master repository it is attached to.
Defining a connection to a work repository consists of defining a connection to a master repository, then selecting one of the work repositories attached to this master repository.
What is Master Repository?
The Master Repository is a data structure containing information on the topology of a company’s IT resources, on security and on version management of projects and data models. This repository is stored on a relational database accessible in client/server mode from the different modules. Generally, only one master repository is necessary. However, in exceptional circumstances, it may be necessary to create several master repositories in one of the following cases:
- Project construction over several sites not linked by a high-speed network (off-site development, for example).
- The necessity to clearly separate the interfaces’ operating environments (development, test, production), including on the database containing the master repository. This may be the case if these environments are on several sites.
What is a Procedure?
A Procedure is a reusable component that allows you to group actions that do not fit in the Interface framework (that is, loading a target datastore from one or more sources). A Procedure is a sequence of commands launched on logical schemas. It has a group of associated options. These options parameterize whether or not a command should be executed, as well as the code of the commands.
What is Model?
An Oracle Model is a set of datastores corresponding to views and tables contained in an Oracle schema. A model is always based on a Logical Schema. In a given Context, the Logical Schema corresponds to a Physical Schema. The Data Schema of this Physical Schema contains the Oracle model’s tables and views.
What is a Package?
The package is the biggest execution unit in Oracle Data Integrator. A package is made of a sequence of steps organized in an execution diagram.
What is User Parameters?
Oracle Data Integrator saves user parameters such as default directories, window positions, etc. User parameters are saved in the userpref.xml file in /bin.
What is a Project?
A project is a group of objects developed using Oracle Data Integrator.
What is Folder?
Certain objects in a project are organized into folders and sub-folders.
What is Sequence?
A sequence is a variable that is automatically incremented when used. Between two uses the value is persistent. Sequences are usable like variables in interfaces, procedures, steps, etc. A sequence can also be defined outside a project (global scope), in order to be used in all projects.
What is User Functions?
User functions enable you to define customized functions or “function aliases”, for which you define technology-dependent implementations. They are usable in interfaces and procedures.
What is Marker?
Elements of a project may be flagged in order to reflect the methodology or organization of the developments. Flags are defined using the markers. These markers are organized into groups and can be applied to most objects in a project.
What is Scenario?
When a package, interface, procedure or variable component is finished, it is compiled into a scenario. A scenario is the execution unit for production and can be scheduled.
What is Context?
A context is a set of resources allowing the operation or simulation of one or more data processing applications. Contexts allow the same jobs (Reverse, Data Quality Control, Package, etc.) to be executed on different databases and/or schemas. In Oracle Data Integrator, a context allows logical objects (logical agents, logical schemas) to be linked with physical objects (physical agents, physical schemas).
What is Memos?
A memo is an unlimited amount of text attached to virtually any object, visible on its Memo tab. When an object has a memo attached, a memo icon appears next to it.
What are Sequences?
A sequence is a variable that increments itself each time it is used. Between two uses, the value can be stored in the repository or managed within an external RDBMS table. Oracle Data Integrator supports two types of sequences:
- Standard sequences, whose last value is stored in the repository.
- Specific sequences, whose last value is stored in an RDBMS table cell. Oracle Data Integrator undertakes to read the value, lock the row (against concurrent updates) and update the row after the last increment (as sketched below).
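Conceptually (a minimal sketch with a hypothetical sequence table and illustrative values, not the exact statements ODI issues), a specific sequence behaves like this:

    -- Hypothetical table cell holding the last value of a specific sequence
    SELECT last_value FROM odi_seq_values WHERE seq_name = 'CUSTOMER_SEQ' FOR UPDATE;   -- read and lock the row
    -- ... the sequence is incremented in memory while the session runs ...
    UPDATE odi_seq_values SET last_value = 1050 WHERE seq_name = 'CUSTOMER_SEQ';        -- write back the final value
    COMMIT;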
What is Session?
A session is an execution (of a scenario, an interface, a package or a procedure, …) undertaken by an execution agent. A session is made up of steps which are made up of tasks.
What are Session Tasks?
The task is the smallest execution unit. It corresponds to a command in a KM or a procedure, the assignment of a variable, etc.
Can I create more than one Master Repository in ODI?
Yes. In general, you need only one master repository. However, it may be necessary to create several master repositories when the project is constructed over several sites not linked by a high-speed network (off-site development, for example), or when it is necessary to clearly separate the interfaces’ operating environments (development, test, production), including on the database containing the master repository.
This may be the case if these environments are on several sites.
What are the types of Knowledge Modules?
- LKM: used to extract data from heterogeneous source systems (files, middleware, databases, etc.) to a staging area.
- IKM: used to integrate (load) data from the staging area into the target tables.
- RKM: used to perform a customized reverse-engineering of data models for a specific technology. It extracts metadata from a metadata provider into the ODI repository. These are used in data models.
- JKM: used to create a journal of data modifications (insert, update and delete) on the source databases to keep track of changes. These are used in data models and for Changed Data Capture.
- CKM: used to check data consistency, i.e. that constraints on the sources and targets are not violated. These are used in data models’ static checks and in interfaces’ flow checks. A static check refers to constraints or rules defined in a data model to verify the integrity of source or application data. A flow check refers to declarative rules defined in interfaces to verify an application’s incoming data before loading it into the target tables.
What is an Interface?
An interface is an object in ODI that maps the sources to the target data marts.
What is a temporary Interface (Yellow Interface)?
A temporary (yellow) interface is an interface whose target is a temporary datastore that does not belong to any model. The advantage of using a yellow interface is to avoid the creation of models each time we need to use its result in another interface. Since the target is temporary, it is not part of the data model and hence does not need to be in a Model.
Explain some differences between ODI 10g and ODI 11g?
ODI 11g provides a Java API to manipulate both the design-time and run-time artifacts of the product. This API allows you, for example, to create or modify interfaces programmatically, create your topology, perform import or export operations, and launch or monitor sessions. It can be used in any Java SE and Java EE application, or in the context of Java-based scripting languages like Groovy or Jython.
External Password Storage allows source/target data server (and context) passwords to be stored in an enterprise credential store. External Authentication allows user/password information to be stored in an enterprise identity store (e.g. LDAP, Oracle Directory, Active Directory), with ODI authenticating against this store.
These two features let you optionally store critical information in dedicated stores rather than within the ODI repository. The ODI Console may also use Oracle’s single sign-on systems with ODI.
What is CKM and when we will use this CKM?
The Check Knowledge Module is used when we create constraints on the target datastore. We can say that the CKM is used for data quality control.
What is SKM and when we will use this SKM?
SKM (Service Knowledge Module) is used to generate code required for data services. These are used in data models. Data Services are specialized web services that enable access to application data in datastores, and to the changes captured for these datastores using Changed Data Capture.
What are the types of data quality control?
There are two types of data quality control:
Static: we run the constraints on existing target data. This is done after loading the data into the target.
Flow: we run the constraints on incoming data. This is done before loading the data into the target (see the sketch below).
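For example (hypothetical names; the real SQL is generated by the CKM), a flow check on a mandatory column moves the offending rows into an E$ error table before the load reaches the target:

    -- Hypothetical flow check: capture rows that violate a mandatory-column rule
    INSERT INTO stg.E$_CUSTOMER_DIM (err_message, cust_id, full_name)
    SELECT 'FULL_NAME is mandatory', cust_id, full_name
    FROM   stg.I$_CUSTOMER_DIM
    WHERE  full_name IS NULL;

    -- Remove the erroneous rows from the flow so only clean rows reach the target
    DELETE FROM stg.I$_CUSTOMER_DIM WHERE full_name IS NULL;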
What is a constraint?
It is a condition which you want to apply while transferring the data from source to target.
What is E$ table in ODI?
The E$ table is a temporary error table created by ODI. It is created by the CKM and holds the rows rejected by the data quality checks.
What is I$ table in ODI?
This is a flow table created by the IKM while integrating data into the data mart. It is a temporary table used by ODI.
What is J$ table in ODI?
The J$ journal table is where all changes are recorded. Journals contain references to the changed records along with the type of change (insert, update or delete).
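As an illustration (a simplified sketch; the exact layout depends on the JKM used), a J$ journal table typically holds the subscriber, a change flag, the change date and the primary key of the changed row:

    -- Hypothetical, simplified journal table for a CUSTOMER datastore
    CREATE TABLE stg.J$CUSTOMER (
      JRN_SUBSCRIBER VARCHAR2(100) NOT NULL,  -- which subscriber should consume the change
      JRN_FLAG       CHAR(1)       NOT NULL,  -- 'I' for insert/update, 'D' for delete
      JRN_DATE       TIMESTAMP     NOT NULL,  -- when the change was recorded
      CUST_ID        NUMBER        NOT NULL   -- primary key of the changed source row
    );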
What is Journalization and why we are using in ODI?
It is the way to implement change data capture in ODI. We use JKM for this purpose.
Explain step by step procedure to enable Journalization?
The first step is to import a proper JKM. After creating the model and reverse-engineering it, we have to add the model to CDC, subscribe to the tables we want, and then start the journal. This enables journalization.
What is the ODI Console?
The ODI Console is a web-based navigator used to access Designer, Operator and Topology information through a browser.
How to remove duplicates in ODI?
Use the DISTINCT option at the IKM level; it removes duplicate rows while loading into the target.
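Conceptually (hypothetical names), enabling the DISTINCT option simply makes the generated load select distinct rows:

    -- With DISTINCT enabled at the IKM level, the generated flow behaves like:
    INSERT INTO dw.customer_dim (customer_id, full_name)
    SELECT DISTINCT cust_id, full_name
    FROM   stg.I$_CUSTOMER_DIM;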
How to implement data validations?
Use filters and mapping expressions in the interface; for data quality validations based on constraints, use the CKM with flow control.
How to handle exceptions?
We can handle exceptions in the Advanced tab of package steps and in the Exception tab of load plans.
How to implement the logic in a procedure so that data deleted on the source side is also removed from the target table?
Use a query like the following in the procedure step’s Command on Target:

    DELETE FROM Target_table
    WHERE NOT EXISTS (SELECT 'X' FROM Source_table WHERE Source_table.ID = Target_table.ID);
Can we call a package from within another package?
Yes, we can call one package from another package.
How to load data from one flat file and one RDBMS table using a join?
Drag and drop both the file and the table into the source area of the interface and define the join in the staging area.
What are the prime responsibilities of a Data Integration Administrator?
- Scheduling and executing the batch jobs.
- Configuring, starting and stopping the real-time services
- Adapters configuration and managing them.
- Repository usage, Job Server configuration.
- Access Server configuration.
- Batch job publishing.
- Real-time services publishing through web services.