What is Teradata?
Teradata is a fully scalable relational database management system produced by Teradata Corp. It is widely used to manage large data warehousing operations.
- It is compatible with industry standards (ANSI compliant).
- It is completely built on a parallel architecture.
- It is an open system, running on a UNIX MP-RAS or Windows server platform.
- It is capable of supporting many concurrent users from various client platforms.
Why Teradata?
Teradata database is an information repository supported by tools and utilities that make it, as part of Teradata Warehouse, a complete and active relational database management system.
There are plenty of reasons why customers choose Teradata.
- It supports more warehouse data than all its competitors combined.
- It can scale from 100 gigabytes to over 100 petabytes of data on a single system without any loss of performance. This is called scalability.
- It provides a parallel-aware Optimizer, so queries do not need to be tuned in order to run.
- Automatic and even data distribution eliminates complex indexing schemes or time-consuming reorganizations.
- High availability of data because there is no single point of failure – fault tolerance is built-in to the system.
- It can handle the most concurrent users, who are often running multiple, complex queries.
- Designed and built with parallelism.
- Supports ad-hoc queries using SQL.
- Single point of control for the DBA (Teradata Manager).
- Unconditional parallelism (parallel architecture).
- It provides the lowest total cost of ownership (TCO).
Can you explain the architecture of Teradata?
The Teradata architecture consists of three components.
Parsing Engine: Receives the query from the user, parses it, and prepares the execution plan.
BYNET: Receives the execution plan from the Parsing Engine and dispatches it to the appropriate AMPs.
AMP: Responsible for storing and retrieving rows. Each AMP stores data on the virtual disk associated with it. In addition, the AMP handles lock management, space management, sorting, and aggregation.
Can you define Primary Index?
The Primary Index is defined when the table is created. Every table must have at least one column as the Primary Index. There are two reasons you might pick a different Primary Index than your Primary Key: (1) performance and (2) known access paths.
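A minimal sketch, using an illustrative employee table, of how the Primary Index is declared as part of CREATE TABLE:

```
-- EMP_NO serves as both Primary Key and Primary Index here,
-- but a different column (or set of columns) could be chosen
-- as the PI for performance or known access paths.
CREATE TABLE employee
(
    emp_no    INTEGER NOT NULL,
    dept_no   INTEGER,
    last_name VARCHAR(30)
)
UNIQUE PRIMARY INDEX (emp_no);
```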
What is Partitioned Primary Index (PPI) and discuss the Pros of using it in a query?
Partitioned Primary Index (PPI) is an indexing mechanism that helps improve the performance of certain queries. In a table with a PPI defined, rows are assigned to partitions based on the partition definition and sorted by partition number; within each partition, records are arranged by their row hash.
Pros of PPI:
- For several queries, PPI helps avoid a full table scan; only the required partitions are accessed.
- It can remove the need for a secondary index, avoiding the additional input/output of secondary index maintenance.
- Quick access to the subset of a large table.
- Easy to drop old data and add new data.
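A minimal PPI sketch (table, columns, and the date range are illustrative): rows are partitioned by month of order_date, so a query that filters on order_date touches only the relevant partitions.

```
CREATE TABLE orders
(
    order_id   INTEGER NOT NULL,
    order_date DATE NOT NULL,
    amount     DECIMAL(10,2)
)
PRIMARY INDEX (order_id)
PARTITION BY RANGE_N (
    order_date BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
    EACH INTERVAL '1' MONTH
);

-- Partition elimination: only the January partition is scanned.
SELECT SUM(amount)
FROM   orders
WHERE  order_date BETWEEN DATE '2023-01-01' AND DATE '2023-01-31';
```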
Can you explain Unique Primary Index (UPI)?
A Unique Primary Index (UPI) is unique and cannot have any duplicates. If you try to insert a row with a Primary Index value that is already in the table, the row will be rejected. A UPI enforces UNIQUENESS for a column. A Unique Primary Index (UPI) will always spread the rows of the table evenly amongst the AMPs, and UPI access is always a one-AMP operation. For example, if we select EMP_NO to be our Unique Primary Index, there can be no duplicate employee numbers in the table.
Can you explain Non-Unique Primary Index (NUPI)?
A Non-Unique Primary Index (NUPI) means that the values for the selected column can be non-unique, so duplicate values can exist. A Non-Unique Primary Index will almost never spread the table rows evenly, and an all-AMP operation will take longer if the data is unevenly distributed. You might pick a NUPI over a UPI because the NUPI column may be more effective for query access and joins.
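A minimal sketch contrasting the two declarations (names are illustrative):

```
-- UPI: one row per emp_no, even distribution, one-AMP access.
CREATE TABLE employee_upi
(
    emp_no  INTEGER NOT NULL,
    dept_no INTEGER
)
UNIQUE PRIMARY INDEX (emp_no);

-- NUPI: duplicate dept_no values allowed; all rows for a
-- department hash to the same AMP, which can help joins on
-- dept_no but may skew data distribution.
CREATE TABLE employee_nupi
(
    emp_no  INTEGER NOT NULL,
    dept_no INTEGER
)
PRIMARY INDEX (dept_no);
```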
Can you define Secondary Index?
A Secondary Index (SI) is an alternate data access path. It allows you to access the data without having to do a full-table scan. You can drop and recreate secondary indexes dynamically, as they are needed. Secondary indexes are stored in separate subtables, which require additional disk space and maintenance; the maintenance is handled automatically by the system.
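A minimal sketch (column names are illustrative; the ssn column is hypothetical):

```
-- Non-unique secondary index (NUSI) on last_name:
CREATE INDEX (last_name) ON employee;

-- Unique secondary index (USI):
CREATE UNIQUE INDEX (ssn) ON employee;

-- Secondary indexes can be dropped when no longer needed:
DROP INDEX (last_name) ON employee;
```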
Can you define Join Index?
A Join Index is an optional index that a user may create. Join indexes provide additional processing efficiencies:
- Eliminate base table access
- Eliminate aggregate processing
- Reduce joins
- Eliminate redistributions
- Eliminate summary processing
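A minimal sketch, assuming illustrative employee and department tables, of a join index that pre-joins the two tables so qualifying queries need not access or re-join the base tables:

```
CREATE JOIN INDEX emp_dept_ji AS
SELECT  e.emp_no, e.last_name, d.dept_no, d.dept_name
FROM    employee e
INNER JOIN department d
        ON e.dept_no = d.dept_no
PRIMARY INDEX (dept_no);
```

The optimizer decides when to use a join index; queries are not rewritten to reference it directly.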
Can you explain TPUMP?
TPump (Teradata Parallel Data Pump) differs from FastLoad and MultiLoad, which load huge volumes of data in blocks: TPump loads data one row at a time, using row-hash locks. Because it locks at this level, and not at the table level like MultiLoad, TPump can make many simultaneous, or concurrent, updates on a table. TPump performs Inserts, Updates, Deletes and Upserts from flat files into populated Teradata tables at the ROW LEVEL.
TPump supports:
- Secondary Indexes
- Referential Integrity
- Triggers
- Join indexes
- Pumping data in at varying rates.
Can you define TPT?
The Teradata Parallel Transporter (TPT) utility is a combination of the BTEQ, FastLoad, MultiLoad, TPump, and FastExport utilities, so TPT can:
- Insert data to tables
- Export data from tables
- Update tables
Can you define BTEQ? And features of BTEQ?
BTEQ (Basic Teradata Query) was the first utility and query tool for Teradata. BTEQ can be used as a query tool, to load data into Teradata a row at a time, and to export data off of Teradata a row at a time. (BTEQ is pronounced "Bee-Teek".)
Features of BTEQ:
- BTEQ can be used to submit SQL in either a batch or interactive environment.
- BTEQ gives the outputs in a report format, where Queryman outputs data in a format more like a spreadsheet.
- BTEQ is an excellent tool for importing and exporting data.
What are the commonly used BTEQ script commands?
Following is a list of commands commonly used in BTEQ scripts.
- IMPORT: Specifies the input file path.
- EXPORT: Specifies the output file path and initiates the export.
- LOGON: Logs on to the Teradata system.
- ACTIVITYCOUNT: Returns the number of rows affected by the previous query.
- ERRORCODE: Returns the status code of the previous query.
- DATABASE: Sets the default database.
- LABEL: Assigns a label to a set of SQL commands.
- RUN FILE: Executes the queries contained in a file.
- GOTO: Transfers control to a label.
- LOGOFF: Logs off from the database and terminates all sessions.
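A minimal BTEQ script sketch tying these commands together (the logon string, database, table, and file names are placeholders):

```
.LOGON tdpid/username,password
DATABASE sales_db;

/* ACTIVITYCOUNT reflects the rows returned by this query */
SELECT order_id, amount FROM daily_orders;
.IF ACTIVITYCOUNT = 0 THEN .GOTO NODATA

/* Export the rows to a flat file in DATA (record) mode */
.EXPORT DATA FILE = orders.dat
SELECT order_id, amount FROM daily_orders;
.EXPORT RESET

.LABEL NODATA
.LOGOFF
.QUIT
```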
What are the different types of BTEQ Exports?
There are mainly 4 types of BTEQ Exports.
Export DATA: Exports data to a flat file in record (DATA) mode. The output has no headers or white space between the column values, and the data is written to the file in native format. (.EXPORT DATA)
Export INDICDATA: Exports data with extra indicator bytes to flag NULL column values in each row. (.EXPORT INDICDATA)
Export REPORT: In this mode the BTEQ export output includes the column headers for the fields, white space, and expanded packed or binary data. (.EXPORT REPORT)
Export DIF: Data Interchange Format, which allows users to export data from Teradata for direct use in spreadsheet applications like Excel, FoxPro and Lotus. (.EXPORT DIF)
How do you transfer large amount of data in Teradata?
Transferring large amounts of data can be done using the various Teradata application utilities that reside on the host computer (mainframe or workstation), i.e. BTEQ, FastLoad, MultiLoad, TPump and FastExport.
BTEQ (Basic Teradata Query) supports all 4 DMLs: SELECT, INSERT, UPDATE and DELETE. BTEQ also supports IMPORT/EXPORT protocols.
FastLoad, MultiLoad and TPump transfer data from the host to Teradata.
FastExport is used to export data from Teradata to the host.
What are the steps involved after the disk manager passes the request?
The steps involved after the disk manager passes the request are as follows:
- The file system used in Teradata returns the requested data to the database manager.
- The database manager uses the dispatcher to send the messages back; it sends a message to the requesting user as well.
- The BYNET (in a single-node system, the boardless BYNET) returns the sorted response to the requesting interface engine for packaging.
- The dispatcher responds to the messages sent by the clients and routes them through the communication channel driver to the requesting client.
- The TDP receives the information, unpacks the packed response messages, and provides them to the CLI.
- The CLI passes the messages in blocks to the application that requested them, and the application then responds to the client with the data in the form of relational tables.
What is OLTP?
OLTP (Online Transaction Processing) uses normalized tables to quickly record large amounts of transactions while making sure that these updates of data occur in as few places as possible. Consequently, OLTP databases are designed for recording the daily operations and transactions of a business. E.g. a timecard system that supports a large production environment must successfully record a large number of updates during critical periods like lunch hour, breaks, startup and close of work.
What is Multi Insert?
Inserting data records into a table using multiple INSERT statements sent as one multi-statement request. This is achieved by putting a semicolon in front of the keyword INSERT in the next statement rather than terminating the first statement with a semicolon.
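A minimal sketch of a multi-statement insert as it would appear in a BTEQ script (table names are illustrative); the leading semicolons keep all three INSERTs in a single request:

```
INSERT INTO emp_backup SELECT * FROM emp_east
;INSERT INTO emp_backup SELECT * FROM emp_west
;INSERT INTO emp_backup SELECT * FROM emp_south;
```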
What is Caching in Teradata?
Caching works primarily with sources that do not change frequently. The system keeps frequently used data blocks in a separate memory area, and the cache is shared among several applications.
What is a permanent Journal in Teradata?
Permanent journals keep track of data before or after applying changes. This helps to roll back or roll forward the table to a particular state. Permanent journals can be enabled at the table level or the database level.
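A minimal sketch, assuming a journal table has already been defined for the database (for example via CREATE DATABASE ... DEFAULT JOURNAL TABLE); the table-level options shown are illustrative:

```
CREATE TABLE sales_db.orders ,
    FALLBACK ,
    WITH JOURNAL TABLE = sales_db.orders_jrnl ,
    DUAL BEFORE JOURNAL ,
    AFTER JOURNAL
(
    order_id INTEGER NOT NULL,
    amount   DECIMAL(10,2)
)
PRIMARY INDEX (order_id);
```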
What is UPSERT command?
It is used for operations that need Update-else-Insert behavior. This is a specific option that is only available in Teradata.
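A minimal sketch of Teradata's atomic upsert syntax (table and values are illustrative); the WHERE clause must fully specify the Primary Index value:

```
-- Update the row if it exists; otherwise insert it.
UPDATE employee
SET    salary = 55000
WHERE  emp_no = 101
ELSE INSERT INTO employee (emp_no, dept_no, salary)
     VALUES (101, 10, 55000);
```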
What is filler command in Teradata?
While using MultiLoad or FastLoad, if you do not want to load a particular field from the data file into the target table, use the FILLER command to achieve this.
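A minimal MultiLoad layout sketch (field names and types are illustrative): dept_code is read from the input file but never loaded into the target table.

```
.LAYOUT emp_layout;
.FIELD  emp_no    * VARCHAR(10);
.FILLER dept_code * VARCHAR(5);
.FIELD  emp_name  * VARCHAR(30);
```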
Can you define Data Modeling?
A Data model is a conceptual representation of data structures (tables) required for a database and is very powerful in expressing and communicating the business requirements.
Can you define Dimensional Modeling?
Dimensional data modeling comprises one or more dimension tables and fact tables. Good examples of dimensions are location, product, time, promotion, and organization. Dimension tables store records related to that particular dimension; no facts (measures) are stored in these tables.
What are set operators in Teradata?
Set operators are used to combine the result from multiple SELECT statements. This is different from joins because joins combine columns in multiple tables, but set operators combine multiple rows.
The set operators in Teradata are listed below (a sketch follows the list):
- Union
- Union All
- Intersect
- Minus/Except
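A minimal sketch (table names are illustrative):

```
-- All employee numbers from either year, duplicates removed:
SELECT emp_no FROM emp_2022
UNION
SELECT emp_no FROM emp_2023;

-- Employee numbers present in both years:
SELECT emp_no FROM emp_2022
INTERSECT
SELECT emp_no FROM emp_2023;

-- Employee numbers in 2022 but not in 2023:
SELECT emp_no FROM emp_2022
MINUS
SELECT emp_no FROM emp_2023;
```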
How to eliminate product joins in a Teradata SQL query?
- Ensure statistics are collected on join columns; this is especially important if the columns you are joining on are not unique (a sketch follows this list).
- Make sure you are referencing the correct alias.
- Also, if you have an alias, you must always reference it instead of a fully qualified table name.
- Sometimes product joins happen for a good reason: when joining a small table (100 rows) to a large table (1 million rows), a product join can make sense.
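A minimal sketch of collecting statistics on a join column (names are illustrative):

```
COLLECT STATISTICS ON employee COLUMN (dept_no);
```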
What are the types of tables in Teradata?
- Set table
- Multiset table
- Derived table
- Global temporary table (GTT)
- Volatile table
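Minimal sketches of these table types (names are illustrative; a global temporary table sketch appears under the GTT question later in this document):

```
-- SET table: duplicate rows are rejected (Teradata-mode default).
CREATE SET TABLE emp_set (emp_no INTEGER, dept_no INTEGER)
PRIMARY INDEX (emp_no);

-- MULTISET table: duplicate rows are allowed.
CREATE MULTISET TABLE emp_ms (emp_no INTEGER, dept_no INTEGER)
PRIMARY INDEX (emp_no);

-- Volatile table: definition and data last only for the session.
CREATE VOLATILE TABLE emp_vt (emp_no INTEGER)
PRIMARY INDEX (emp_no)
ON COMMIT PRESERVE ROWS;

-- Derived table: exists only for the duration of the query.
SELECT d.dept_no, d.cnt
FROM (SELECT dept_no, COUNT(*) AS cnt
      FROM emp_ms
      GROUP BY dept_no) AS d;
```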
Can you define Logical Data Model?
A Logical data model is the version of a data model that represents the business requirements (entire or part) of an organization and is developed before the physical data model. A sound logical design should streamline the physical design process by clearly defining data structures and the relationships between them. A good data model is created by clearly thinking about the current and future business requirements. Logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.
What is ODS?
An ODS (Operational Data Store) is primarily a “dump” of relevant information from a very small number of systems (often just one) usually with little or no transformation. The benefits are an ad hoc query database, which does not affect the operation of systems required to run the business. ODS’s usually deal with data “raw” and “current” and can answer a limited set of queries as a result.
What is Subject Area?
Subject area means fundamental entities that make up the major components of the business, e.g. customer, product, employee.
What is a Checkpoint?
Checkpoints are entries posted to a restart log table at regular intervals during the data transfer operation. If processing stops while a job is running, you can restart the job at the most recent checkpoint.
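A minimal FastLoad sketch (database, table, and interval are illustrative): with CHECKPOINT set, a restart resumes from the last logged checkpoint of 50,000 rows rather than from the beginning.

```
BEGIN LOADING sales_db.orders
    ERRORFILES sales_db.orders_err1, sales_db.orders_err2
    CHECKPOINT 50000;
```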
Can you define Data Warehousing?
A data warehouse is a subject oriented, integrated, time variant, non-volatile collection of data in support of management’s decision-making process.
What is a three-tier data warehouse?
The three-tier architecture differs from the two-tier architecture by strictly enforcing a logical separation of the graphical user interface, business logic, and data. The three-tier is widely used for data warehousing today. For organizations that require greater performance and scalability, the three-tier architecture may be more appropriate. In this architecture, data extracted from legacy systems is cleansed, transformed, and stored in high-speed database servers, which are used as the target database for front-end data access.
Can you define GTT?
Global Temporary Table is also known as GTT. A global temporary table is not much different from a regular table you create in any database, with a few exceptions concerning the life of its data. The table is global, but not the data. What does that mean to you? The data in a GTT is private, meaning only the session that inserts the data can access it. The table itself is accessible, or visible, to anyone who has proper access to the database and table.
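A minimal GTT sketch (names are illustrative): the definition persists in the data dictionary, while each session works with its own private copy of the rows.

```
CREATE GLOBAL TEMPORARY TABLE work_totals
(
    dept_no INTEGER,
    total   DECIMAL(12,2)
)
PRIMARY INDEX (dept_no)
ON COMMIT PRESERVE ROWS;

-- Rows inserted here are visible only to this session:
INSERT INTO work_totals
SELECT dept_no, SUM(salary)
FROM   employee
GROUP BY dept_no;
```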
Can you define ETL?
Extract, Transform, and Load. ETL refers to the methods involved in accessing and manipulating source data and loading it into a target database. The first step in the ETL process is mapping the data between source systems and the target database (data warehouse or data mart). The second step is cleansing the source data in a staging area. The third step is transforming the cleansed source data and loading it into the target system. Note that ETT (extract, transformation, transportation) and ETM (extraction, transformation, and move) are sometimes used instead of ETL.
What can be achieved by using the Teradata RDBMS?
Users on client systems send requests to the Teradata RDBMS.
- RDBMS requests can be sent through the TDP (Teradata Director Program), which uses the Call Level Interface (CLI) program.
- Transfers can also be done through ODBC (Open Database Connectivity) using ODBC drivers.
- The Teradata RDBMS achieves high scalability and flexibility, running the system and allowing different systems to communicate with each other.
- The level of security in the Teradata RDBMS is quite high.
Can you explain Staging Area?
The data staging area is a system that stands between the legacy systems and the analytics system, usually a data warehouse and sometimes an ODS. The data staging area is considered the “back room” portion of the data warehouse environment. The data staging area is where the extract, transform and load (ETL) takes place and is out of bounds for end users. Some of the functions of the data staging area include:
- Extracting data from multiple legacy systems.
- Cleansing the data, usually with a specialized tool.
- Integrating data from multiple legacy systems into a single data warehouse.
- Transforming legacy system keys into data warehouse keys, usually surrogate keys.
- Transforming disparate codes for gender, marital status, etc., into the data warehouse standard.
What are differences between Teradata and ANSI session modes in Teradata?
Teradata mode:
- Comparisons are not case sensitive.
- CREATE TABLE defaults to SET tables.
- Each transaction is committed implicitly.
- Supports all Teradata commands.
- It follows BTET (Begin and End Transaction) mode.
ANSI mode:
- Comparisons are case sensitive.
- CREATE TABLE defaults to MULTISET tables.
- Each transaction has to be committed explicitly.
- Does not support all Teradata commands.
- It does not follow BTET mode.
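As a sketch, the session mode is chosen in BTEQ before logging on; substitute BTET for ANSI to request Teradata mode (the logon string is a placeholder):

```
.SET SESSION TRANSACTION ANSI
.LOGON tdpid/username,password
```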
What are the features of Teradata?
Teradata is a huge data processing system with a shared-nothing design that eliminates resource competition. Teradata is especially designed to accommodate massive data warehouse implementations and has several data warehousing capabilities embedded into the core of the database. With linear scalability (software scales linearly with hardware), unconditional parallelism, multi-faceted parallelism, intelligent data distribution, and a parallel-aware optimizer, Teradata is capable of handling massive data volumes and sophisticated queries.
- Unlimited Parallelism
- Robust Utilities
- Automatic Distribution
- Mature Optimizer
- SQL
- Linear Scalability
- Shared Nothing Architecture
- Connectivity
What are the basic rules that define how a Primary Index is defined in Teradata?
The following rules govern how Primary Indexes in a Teradata Database must be defined as well as how they function:
- One Primary Index per table.
- A Primary Index value can be unique or non-unique.
- The Primary Index value can be NULL.
- The Primary Index value (the data stored in the PI column) can be modified.
- The Primary Index definition of a populated table cannot be modified.
- A Primary Index has a limit of 64 columns.
What are the functions of a Teradata DBA?
Following are the different functions which a DBA can perform:
User Management: Creation and managing Users, Databases, Roles, Profiles and Accounts.
Space Allocation: Assigning Permanent Space, Spool Space and Temporary Space.
Access of Database Objects: Granting and Revoking Access Rights on different database objects.
Security Control: Handling logon and logoff rules for Users.
System Maintenance: Specification of system defaults, restart etc.
System Performance: Use of Performance Monitor (PMON), Priority Scheduler and Job Scheduling.
Resource Monitoring: Database Query Log (DBQL) and Access Logging.
Data Archives, Restores and Recovery: ARC Utility and Permanent Journals.
Can you explain Teradata database Data Protection?
Several types of data protection are available with the Teradata Database; some are automatic and some are optional. They are:
RAID: Redundant Array of Inexpensive Disks (RAID) is a storage technology that provides data protection at the disk drive level.
Cliques: A clique is a group of nodes that share access to the same disk arrays. Each multi-node system has at least one clique.
Hot Standby Nodes: A Hot Standby Node (HSN) is a node that is a member of a clique that is not configured (initially) to execute any Teradata vprocs.
Fallback: Fallback is a Teradata Database feature that protects data in the case of an AMP vproc failure.
Journaling: Journals are kept on the system to provide data availability in the event of a component or process failure in the system.
Locks: Locking prevents multiple users who are trying to access or change the same data simultaneously from violating data integrity. This concurrency control is implemented by locking the target data.