What is SSIS?
SSIS stands for SQL Server Integration Services. It was introduced by Microsoft SQL server 2005.It is mainly used for data transformation , ETL operations and data integration solutions.
What are the Data Flow Transformations in SSIS?
Some of the more common transformations
- Aggregate
- Audit
- Conditional Split
- Data Conversion
- Derived Column
- Lookup
- Merge Join
- Merge
- Sort union all
- Script component
- Multicast
- Row count
- Pivot
- Unpivot
- Copy column
What are the different types of files that support SSIS?
- ODBC
- Excel
- XML
- .NetSQL Client
- OLEDB
- Flat File
Explain Audit Transformation?
It allows you to add auditing information as required in auditing world specified by HIPPA and Sarbanes-Oxley (SOX). Auditing options that you can add to transformed data through this transformation are:
- Execution of Instance GUID: ID of execution instance of the package
- Package ID: ID of the package
- Package Name
- VersionID: GUID version of the package
- Execution Start Time
- Machine Name
- Username
- TaskName
- TaskID: uniqueidentifier type of the data flow task that contains audit transformation
What are Advantages of SSIS?
- SSIS can handle data from heterogeneous data sources at a same package. We say Data sources can be diverse, including custom or scripted adapters.
- SSIS consumes data which are difficult like FTP, HTTP, MSMQ, and Analysis services etc.
- SSIS provides transformation functionality.
- Easier to maintain and package configuration
- Tightly integrated with Microsoft Visual Studio and SQL Server
- Use the SQL Server Destination instead of OLE DB; which allows you to load data into SQL faster.
- Remove network as a bottleneck for insertion of data by SSIS into SQL.
- Better for complex transformations, multi-step operations, aggregating data from different data sources or types, and structured exception handling
- Data can be loaded in parallel to many varied destinations.
Explain Copy column Transformation?
This component simply copies a column to another new column. Just like ALIAS Column in T-Sql.
Explain Derived column Transformation?
Derived column creates new column or put manipulation of several columns into new column. You can directly copy existing or create a new column using more than one column also.
What are the different types of Data flow components in SSIS?
There are 3 data flow components in SSIS
- Sources
- Transformations
- Destinations
How can you do an incremental load?
The best and fastest way to do incremental load is by using Timestamp column in the source table and storing the last ETL timestamp.
Can you explain Checkpoint?
A checkpoint is the property in SSIS which enables the project to restart from the point of failure. When we set the property to true package create the checkpoint file which stores the information about package execution and use to restart package from the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created the next time the package runs.
What is Manifest file in SSIS?
Manifiest file is the utility which can be used to deploy the package using wizard on file system and SQL Server database
What is the difference between ActiveX Script and Script Task?
Script Task is the latest version for the deprecated feature ActiveX Script. Both are used to implement extended functionality in SSIS.
ActiveX script supports VBScript and JScript where as Script Task supports VB.Net and C#.Net.
Script Task is preferable as ActiveX Script has been removed in MSSQL 2012.
Script Task is supported with integrated help, IntelliSense, debugging and can reference external Dotnet assembles.
What is the difference between Script Task and Script Component?
Both are used to extend the native functionality of SSIS.
Script Task is to enhance the functionality for control flow where as Script Component is to enhance the functionality for Data flow.
Script Task can handle the execution of parts of the package where as Script Component can handle the data flow and transformations by processing row by row.
Explain how can we connect to Oracle, DB2 and MySQL from SSIS?
Oracle:
Native OLEDB\Microsoft OLEDB Provider for Oracle
Native .Net providers\ or
.Net providers for OLEDB\
DB2:
Native OLEDB\Microsoft OLEDB Provider for DB2
Native .Net providers\ ,
.Net providers\ ODBC OR
.Net providers for OLEDB\
MySQL:
.Net Providers \ MySQL Data Provider Or
.Net Providers \ ODBC
Explain Multicast Transformation?
This transformation sends output to multiple output paths with no conditional as Conditional Split does. Takes ONE Input and makes the COPY of data and passes the same data through many outputs. In simple Give one input and take many outputs of the same data.
What is the difference between Fuzzy Lookup and Fuzzy Grouping?
The Fuzzy grouping task performs the same operations as the Fuzzy Lookup task but instead of evaluating input records against an outside reference table, the input set becomes the reference. Input records are therefore evaluated against other records in the input set and evaluated for similarity and assigned to a group.
What is a Task?
A task is very much like a method of any programming language which represents or carries out an individual unit of work. There are broadly two categories of tasks in SSIS, Control Flow tasks and Database Maintenance tasks. All Control Flow tasks are operational in nature except Data Flow tasks. Although there are around 30 control flow tasks which you can use in your package you can also develop your own custom tasks with your choice of .NET programming language
What is a workflow in SSIS?
Workflow is a set of instructions on to specify the Program Executor on how to execute tasks and containers within SSIS Packages
What is the Control Flow?
 When you start working with SSIS, you first create a package which is nothing but a collection of tasks or package components. The control flow allows you to order the workflow, so you can ensure tasks/components get executed in the appropriate order
What is a Transformation?
A transformation simply means bringing in the data in a desired format. For example, you are pulling data from the source and want to ensure only distinct records are written to the destination, so duplicates are removed. Another example is if you have master/reference data and want to pull only related data from the source and hence you need some sort of lookup. There are around 30 transformation tasks available and this can be extended further with custom built tasks if needed.
Can you explain config file in SSIS?
Config file in SSIS is used to provide inputs to connection manager different properties which package use to assign values at runtime dynamically.
Using config file user need to make changes to the config file which package will take automatically at runtime because of using it you don’t need to every time make changes in the packages in case you are deploying package on multiple servers or locations. There are multiple ways in which configuration values can be stored.
XML configuration file: Store the config file as an XML file..
Environment variable Store the config in on of the environment variables.
Registry entry Store the confi in the registry
Parent package variable Store the config as a variable in the package that contains the tasks.
SQL Server Store the config in t a table in SQL Server
What are the main SSIS tool box widgets?
- CDC Control Task
- Bulk Insert Task
- Backup Database Task
- ActiveX Script Task
- Analysis Service Processing Task
- Analysis Services Execute DDL Task
- Transfer error message
How many difference source and destinations have you used?
It is very common to get all kinds of sources so the more the person worked with the better for you. Common ones are SQL Server, CSV/TXT, Flat Files, Excel, Access, Oracle, MySQL but also Salesforce, web data scrapping.
Can you explain how do you handle errors in SSIS?
When a data flow component applies a transformation to column data, extracts data from sources, or loads data into destinations, errors can occur. Errors frequently occur because of unexpected data values.
Errors typically fall into one the following categories:
Data conversion errors: occurs if a conversion results in loss of significant digits, the loss of insignificant digits, and the truncation of strings. Data conversion errors also occur if the requested conversion is not supported.
Expression evaluation errors: occurs if expressions that are evaluated at run time perform invalid operations or become syntactically incorrect because of missing or incorrect data values.
Lookup errors: occurs if a lookup operation fails to locate a match in the lookup table.
Many data flow components support error outputs, which let you control how the component handles row-level errors in both incoming and outgoing data. You specify how the component behaves when truncation or an error occurs by setting options on individual columns in the input or output.
Can you explain OLE DB Command Transform?
The OLE DB Command Transform is a component designed to execute a SQL Statement for each row in an input stream. This task is analogous to an ADO Command Object being created, prepared, and executed for each row of a result set. The input stream provides that data for parameters that can be set into the SQL Statement that is either an In – line statement or a stored procedure call.
What is precedence constraint?
A precedence constraint is a link between 2 control flow tasks and lays down the condition on which the second task is run. They are used to control the workflow of the package. There are 3 kinds of precedence constraint – success (green arrow), failure (red arrow) or Completion script task (blue arrow). By default, when we add 2 tasks, it links by green arrow. The way the precedence constraint is evaluated can be based on outcome of the initial task. Also, we can add expression to evaluate such outcome. Any expression that can be judged as true or false can be used for such purpose. The precedence constraint is very useful in error handling in SSIS package.
Explain why variables called the most powerful component of SSIS
Variable allows us to dynamically control the package at runtime. Example: You have some custom code or script that determines the query parameter’s value. Now, we cannot have fixed value for query parameter. In such scenarios, we can use variables and refer the variable to query parameter. We can use variables for like:
- updating the properties at runtime,
- populating the query parameter value at runtime,
- used in script task,
- Error handling logic and
- With various looping logic.
Can we add our custom code in SSIS?
We can customize SSIS through code by using Script Task. The main purpose of this task is to control the flow of the package. This is very useful in the scenario where the functionality you want to implement is not available in existing control flow item.
To add your own code:
- In control flow tab, drag and drop Script Task from toolbox.
- Double click on script task to open and select edit to open Script task editor.
- In script task editor, there are 3 main properties
- General – Here you can specify name and description
- Script – through this we can add our code by clicking on Design Script button. The scripting language present is Net only.
- Expression
What are the Data viewers in SSIS?
Grid
Histogram
Scatter Plot
Column Chart
What are the some SSIS components?
- Control Flow Elements
- Data Flow Elements
- Integration Services Projects
- Integration Services Packages
- Integration Services Connections
- Integration Services Variables
- Integration Services Event Handlers
- Integration Services Log Providers
What are the command line tools to execute SQL Server Integration Services packages?
DTSEXECUI: When this command line tool is run a user interface is loaded in order to configure each of the applicable parameters to execute an SSIS package.
DTEXEC: This is a pure command line tool where all of the needed switches must be passed into the command for successful execution of the SSIS package.
What is conditional split?
As the name suggest, this transformation splits the data based on condition and route them to different path. The logic for this transformation is based on CASE statement. The condition for this transformation is an expression. This transformation also provides us with default output, where rows matching no condition are routed. Conditional split is useful in scenarios like Telecom industry data you want to divide the customer data on gender, condition would be:
GENDER == ‘F’
Explain the SSIS Architecture?
Runtime engine: The Integration Services runtime saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions.
API or object model: The Integration Services object model includes managed application programming interfaces (API) for creating custom components for use in packages, or custom applications that create, load, run, and manage packages. Developer can write custom applications or custom tasks or transformations by using any common language runtime (CLR) compliant language.
Integration Services service: It is a Windows service, monitors running SSIS packages and manages the storage of packages.
Data flow: It contains a data flow engine that manages the data flow components. There are 3 types of
Data Flow components: Source components (which extracts the data from a system), Transformation components (performs transformations, modifications onto the extracted data) and Load components (which simply performs the data loading tasks into the destination systems). Besides the available data flow components, we can write our own custom data flow components to accomplish any custom requirements.
What is a breakpoint in SSIS?
A breakpoint is a stopping point in the code. The breakpoint can give the Developer\DBA an opportunity to review the status of the data, variables and the overall status of the SSIS package.
Breakpoints are setup in BIDS. In BIDS, navigate to the control flow interface. Right click on the object where you want to set the breakpoint and select the ‘Edit Breakpoints…’ option.
Explain the use of containers in SSIS and also their types
Containers can be defined as objects that stores one or more tasks. The primary purpose of container is grouping logically related tasks. Once the task is placed into the containers, we can perform various operations such as looping on container level until the desired criterion is met. Nesting of container is allowed. Container is placed inside the control flow. There are 4 types of Container:
- Task Host container- Only one task is placed inside the container. This is default container.
- Sequence Container – This container can be defined as subset of package control flow.
- For loop container – Allows looping based on condition. Runs a control flow till condition is met.
- For each loop container – Loop through container based on enumerator.
What is XMLify component?
It is 3rd party free component used rather frequently to output errors into XML field which saves development time.
Why is the need for data conversion transformations?
This transformation converts the datatype of input columns to different datatype and then route the data to output columns. This transformation can be used to:
- Change the datatype
- If datatype is string then for setting the column length
- If datatype is numeric then for setting decimal precision.
This data conversion transformation is very useful where you want to merge the data from different source into one. This transformation can remove the abnormality of the data. Example à The Company’s offices are located at different part of world. Each office has separate attendance tracking system in place. Some offices stores data in Access database, some in Oracle and some in SQL Server. Now you want to take data from all the offices and merged into one system. Since the datatypes in all these databases vary, it would be difficult to perform merge directly. Using this transformation, we can normalize them into single datatype and perform merge.
Can you explain Error Handling in SSIS?
An error handler allows us to create flows to handle errors in the package in quite an easy way. Through event handler tab, we can name the event on which we want to handle errors and the task that needs to be performed when such an error arises. We can also add sending mail functionality in event of any error through SMTP Task in Event handler. This is quite useful in event of any failure in office non-working hours. In Data flow, we can handle errors for each connection through following failure path or red arrow.
What is the difference between control flow and data flow?
Control flow deals with orderly processing of individual, isolated tasks, these tasks are linked through precedence constraints in random order. Also the output for task has finite outcome i.e., Success, Failure, or Completion. A subsequent task does not initiate unless its predecessor has completed. Data flow, on the other hand, streams the data in pipeline manner from its source to a destination and modifying it in between by applying transformations. Another distinction between them is the absence of a mechanism that would allow direct transfer of data between individual control flow tasks. On the other hand, data flow lacks nesting capabilities provided by containers.
Â
What are different component of SSIS will you use?
In the data flow, we will use one OLE DB source, data conversion transformation and one OLE DB destination or SQL server destination. OLE DB source is data source is useful for reading data from Oracle, SQL Server and Access databases. Data Conversion transformation would be needed to remove datatype abnormality since there is difference in datatype between the two databases (Access and SQL Server) mentioned. If our database server is stored on and package is run from same machine, we can use SQL Server destination otherwise we need to use OLE DB destination. The SQL Server destination is the destination that optimizes the SQL Server.Â