What is Teradata?
Teradata is a fully scalable relational database management system produced by Teradata Corp. It is widely used to manage large data warehousing operations.
- It is compatible with industry standards (ANSI compliant).
- It is completely built on a parallel architecture.
- It is an open system, running on a UNIX MP-RAS or Windows server platform.
- It is capable of supporting many concurrent users from various client platforms.
Why Teradata?
Teradata database is an information repository supported by tools and utilities that make it, as part of Teradata Warehouse, a complete and active relational database management system.
There are plenty of reasons why customers choose Teradata.
- It supports more warehouse data than all its competitors combined.
- It can scale from 100 gigabytes to over 100 petabytes of data on a single system without any loss of performance. This is called scalability.
- It provides a parallel-aware Optimizer, so queries do not need to be tuned in order to run.
- Automatic and even data distribution eliminates complex indexing schemes or time-consuming reorganizations.
- High availability of data because there is no single point of failure – fault tolerance is built-in to the system.
- It can handle the most concurrent users, who are often running multiple, complex queries.
- Designed and built with parallelism.
- Supports ad-hoc queries using SQL.
- Single point of control for the DBA (Teradata Manager).
- Unconditional parallelism (parallel architecture).
- It provides the lowest total cost of ownership (TCO).
Can you explain the architecture of Teradata?
The Teradata architecture consists of three components.
Parsing Engine: Receives the query from the user, parses it, and prepares the execution plan.
BYNET: Receives the execution plan from the Parsing Engine and dispatches it to the appropriate AMPs.
AMP: Responsible for storing and retrieving rows. Each AMP stores data on the virtual disk associated with it. In addition, the AMP handles lock management, space management, sorting, and aggregation.
Can you define Primary Index?
The Primary Index is defined when the table is created. Every table must have at least one column as the Primary Index. There are two reasons you might pick a different Primary Index than your Primary Key: (1) performance and (2) known access paths.
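A minimal sketch, using an illustrative employee table, of how the Primary Index is declared as part of CREATE TABLE:

```
-- EMP_NO serves as both Primary Key and Primary Index here,
-- but a different column (or set of columns) could be chosen
-- as the PI for performance or known access paths.
CREATE TABLE employee
(
    emp_no    INTEGER NOT NULL,
    dept_no   INTEGER,
    last_name VARCHAR(30)
)
UNIQUE PRIMARY INDEX (emp_no);
```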
What is Partitioned Primary Index (PPI) and discuss the Pros of using it in a query?
Partitioned Primary Index (PPI) is an indexing mechanism that helps improve the performance of certain queries. In a table with a PPI defined, rows are assigned to partitions based on the partition definition and sorted by partition number; within each partition, records are arranged by their row hash.
Pros of PPI:
- For several queries, PPI helps avoid a full table scan; only the required partitions are accessed.
- It can remove the need for a secondary index, avoiding the additional input/output of secondary index maintenance.
- Quick access to the subset of a large table.
- Easy to drop old data and add new data.
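A minimal PPI sketch (table, columns, and the date range are illustrative): rows are partitioned by month of order_date, so a query that filters on order_date touches only the relevant partitions.

```
CREATE TABLE orders
(
    order_id   INTEGER NOT NULL,
    order_date DATE NOT NULL,
    amount     DECIMAL(10,2)
)
PRIMARY INDEX (order_id)
PARTITION BY RANGE_N (
    order_date BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
    EACH INTERVAL '1' MONTH
);

-- Partition elimination: only the January partition is scanned.
SELECT SUM(amount)
FROM   orders
WHERE  order_date BETWEEN DATE '2023-01-01' AND DATE '2023-01-31';
```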
Can you explain Unique Primary Index (UPI)?
A Unique Primary Index (UPI) is unique and cannot have any duplicates. If you try to insert a row with a Primary Index value that is already in the table, the row will be rejected. A UPI enforces UNIQUENESS for a column. A Unique Primary Index (UPI) will always spread the rows of the table evenly amongst the AMPs, and UPI access is always a one-AMP operation. For example, if we select EMP_NO to be our Unique Primary Index, there can be no duplicate employee numbers in the table.
Can you explain Non-Unique Primary Index (NUPI)?
A Non-Unique Primary Index (NUPI) means that the values for the selected column can be non-unique, so duplicate values can exist. A Non-Unique Primary Index will almost never spread the table rows evenly, and an all-AMP operation will take longer if the data is unevenly distributed. You might pick a NUPI over a UPI because the NUPI column may be more effective for query access and joins.
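A minimal sketch contrasting the two declarations (names are illustrative):

```
-- UPI: one row per emp_no, even distribution, one-AMP access.
CREATE TABLE employee_upi
(
    emp_no  INTEGER NOT NULL,
    dept_no INTEGER
)
UNIQUE PRIMARY INDEX (emp_no);

-- NUPI: duplicate dept_no values allowed; all rows for a
-- department hash to the same AMP, which can help joins on
-- dept_no but may skew data distribution.
CREATE TABLE employee_nupi
(
    emp_no  INTEGER NOT NULL,
    dept_no INTEGER
)
PRIMARY INDEX (dept_no);
```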
Can you define Secondary Index?
A Secondary Index (SI) is an alternate data access path. It allows you to access the data without having to do a full-table scan. You can drop and recreate secondary indexes dynamically, as they are needed. Secondary indexes are stored in separate subtables, which require additional disk space and maintenance; the maintenance is handled automatically by the system.
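A minimal sketch (column names are illustrative; the ssn column is hypothetical):

```
-- Non-unique secondary index (NUSI) on last_name:
CREATE INDEX (last_name) ON employee;

-- Unique secondary index (USI):
CREATE UNIQUE INDEX (ssn) ON employee;

-- Secondary indexes can be dropped when no longer needed:
DROP INDEX (last_name) ON employee;
```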
Can you define Join Index?
A Join Index is an optional index that a user may create. Join indexes provide additional processing efficiencies:
- Eliminate base table access
- Eliminate aggregate processing
- Reduce joins
- Eliminate redistributions
- Eliminate summary processing
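A minimal sketch, assuming illustrative employee and department tables, of a join index that pre-joins the two tables so qualifying queries need not access or re-join the base tables:

```
CREATE JOIN INDEX emp_dept_ji AS
SELECT  e.emp_no, e.last_name, d.dept_no, d.dept_name
FROM    employee e
INNER JOIN department d
        ON e.dept_no = d.dept_no
PRIMARY INDEX (dept_no);
```

The optimizer decides when to use a join index; queries are not rewritten to reference it directly.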
Can you explain TPUMP?
TPump (Teradata Parallel Data Pump) differs from FastLoad and MultiLoad, which load huge volumes of data in blocks: TPump loads data one row at a time, using row-hash locks. Because it locks at this level, and not at the table level like MultiLoad, TPump can make many simultaneous, or concurrent, updates on a table. TPump performs Inserts, Updates, Deletes and Upserts from flat files into populated Teradata tables at the ROW LEVEL.
TPump supports:
- Secondary Indexes
- Referential Integrity
- Triggers
- Join indexes
- Pumping data in at varying rates.
Can you define TPT?
The Teradata Parallel Transporter (TPT) utility is a combination of the BTEQ, FastLoad, MultiLoad, TPump, and FastExport utilities, so TPT can:
- Insert data to tables
- Export data from tables
- Update tables
Can you define BTEQ? And features of BTEQ?
BTEQ (Basic Teradata Query) was the first utility and query tool for Teradata. BTEQ can be used as a query tool, to load data into Teradata a row at a time, and to export data off of Teradata a row at a time. (BTEQ is pronounced "Bee-Teek".)
Features of BTEQ:
- BTEQ can be used to submit SQL in either a batch or interactive environment.
- BTEQ gives the outputs in a report format, where Queryman outputs data in a format more like a spreadsheet.
- BTEQ is an excellent tool for importing and exporting data.
What are the commonly used BTEQ script commands?
Following is a list of commands commonly used in BTEQ scripts.
- IMPORT: Specifies the input file path.
- EXPORT: Specifies the output file path and initiates the export.
- LOGON: Logs on to the Teradata system.
- ACTIVITYCOUNT: Returns the number of rows affected by the previous query.
- ERRORCODE: Returns the status code of the previous query.
- DATABASE: Sets the default database.
- LABEL: Assigns a label to a set of SQL commands.
- RUN FILE: Executes the queries contained in a file.
- GOTO: Transfers control to a label.
- LOGOFF: Logs off from the database and terminates all sessions.
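A minimal BTEQ script sketch tying these commands together (the logon string, database, table, and file names are placeholders):

```
.LOGON tdpid/username,password
DATABASE sales_db;

/* ACTIVITYCOUNT reflects the rows returned by this query */
SELECT order_id, amount FROM daily_orders;
.IF ACTIVITYCOUNT = 0 THEN .GOTO NODATA

/* Export the rows to a flat file in DATA (record) mode */
.EXPORT DATA FILE = orders.dat
SELECT order_id, amount FROM daily_orders;
.EXPORT RESET

.LABEL NODATA
.LOGOFF
.QUIT
```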
What are the different types of BTEQ Exports?
There are mainly 4 types of BTEQ Exports.
Export DATA: Exports data to a flat file in record (DATA) mode. The output has no headers or white space between the column values, and the data is written to the file in native format. (.EXPORT DATA)
Export INDICDATA: Exports data with extra indicator bytes to flag NULL column values in each row. (.EXPORT INDICDATA)
Export REPORT: In this mode the BTEQ export output includes the column headers for the fields, white space, and expanded packed or binary data. (.EXPORT REPORT)
Export DIF: Data Interchange Format, which allows users to export data from Teradata for direct use in spreadsheet applications like Excel, FoxPro and Lotus. (.EXPORT DIF)
How do you transfer large amount of data in Teradata?
Transferring large amounts of data can be done using the various Teradata application utilities that reside on the host computer (mainframe or workstation), i.e. BTEQ, FastLoad, MultiLoad, TPump and FastExport.
BTEQ (Basic Teradata Query) supports all 4 DMLs: SELECT, INSERT, UPDATE and DELETE. BTEQ also supports IMPORT/EXPORT protocols.
FastLoad, MultiLoad and TPump transfer data from the host to Teradata.
FastExport is used to export data from Teradata to the host.
What are the steps involved after the disk manager passes the request?
The steps involved after the disk manager passes the request are as follows:
- The file system used in Teradata returns the requested data to the database manager.
- The database manager uses the dispatcher to send the messages back; it sends a message to the requesting user as well.
- The BYNET (in a single-node system, the boardless BYNET) returns the sorted response to the requesting interface engine for packaging.
- The dispatcher responds to the messages sent by the clients and routes them through the communication channel driver to the requesting client.
- The TDP receives the information, unpacks the packed response messages, and provides them to the CLI.
- The CLI passes the messages in blocks to the application that requested them, and the application then responds to the client with the data in the form of relational tables.
What is OLTP?
OLTP (Online Transaction Processing) uses normalized tables to quickly record large amounts of transactions while making sure that these updates of data occur in as few places as possible. Consequently, OLTP databases are designed for recording the daily operations and transactions of a business. E.g. a timecard system that supports a large production environment must successfully record a large number of updates during critical periods like lunch hour, breaks, startup and close of work.
What is Multi Insert?
Inserting data records into a table using multiple INSERT statements sent as one multi-statement request. This is achieved by putting a semicolon in front of the keyword INSERT in the next statement rather than terminating the first statement with a semicolon.
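A minimal sketch of a multi-statement insert as it would appear in a BTEQ script (table names are illustrative); the leading semicolons keep all three INSERTs in a single request:

```
INSERT INTO emp_backup SELECT * FROM emp_east
;INSERT INTO emp_backup SELECT * FROM emp_west
;INSERT INTO emp_backup SELECT * FROM emp_south;
```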
What is Caching in Teradata?
Caching works primarily with sources that do not change frequently. The system keeps frequently used data blocks in a separate memory area, and the cache is shared among several applications.
What is a permanent Journal in Teradata?
Permanent journals keep track of data before or after applying changes. This helps to roll back or roll forward the table to a particular state. Permanent journals can be enabled at the table level or the database level.
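A minimal sketch, assuming a journal table has already been defined for the database (for example via CREATE DATABASE ... DEFAULT JOURNAL TABLE); the table-level options shown are illustrative:

```
CREATE TABLE sales_db.orders ,
    FALLBACK ,
    WITH JOURNAL TABLE = sales_db.orders_jrnl ,
    DUAL BEFORE JOURNAL ,
    AFTER JOURNAL
(
    order_id INTEGER NOT NULL,
    amount   DECIMAL(10,2)
)
PRIMARY INDEX (order_id);
```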
What is UPSERT command?
It is used for operations that need Update-else-Insert behavior. This is a specific option that is only available in Teradata.
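A minimal sketch of Teradata's atomic upsert syntax (table and values are illustrative); the WHERE clause must fully specify the Primary Index value:

```
-- Update the row if it exists; otherwise insert it.
UPDATE employee
SET    salary = 55000
WHERE  emp_no = 101
ELSE INSERT INTO employee (emp_no, dept_no, salary)
     VALUES (101, 10, 55000);
```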
What is filler command in Teradata?
While using MultiLoad or FastLoad, if you do not want to load a particular field from the data file into the target table, use the FILLER command to achieve this.
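A minimal MultiLoad layout sketch (field names and types are illustrative): dept_code is read from the input file but never loaded into the target table.

```
.LAYOUT emp_layout;
.FIELD  emp_no    * VARCHAR(10);
.FILLER dept_code * VARCHAR(5);
.FIELD  emp_name  * VARCHAR(30);
```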
Can you define Data Modeling?
A Data model is a conceptual representation of data structures (tables) required for a database and is very powerful in expressing and communicating the business requirements.
Can you define Dimensional Modeling?
Dimensional data modeling comprises one or more dimension tables and fact tables. Good examples of dimensions are location, product, time, promotion, and organization. Dimension tables store records related to that particular dimension; no facts (measures) are stored in these tables.
What are set operators in Teradata?
Set operators are used to combine the result from multiple SELECT statements. This is different from joins because joins combine columns in multiple tables, but set operators combine multiple rows.
The set operators in Teradata are listed below (a sketch follows the list):
- Union
- Union All
- Intersect
- Minus/Except
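A minimal sketch (table names are illustrative):

```
-- All employee numbers from either year, duplicates removed:
SELECT emp_no FROM emp_2022
UNION
SELECT emp_no FROM emp_2023;

-- Employee numbers present in both years:
SELECT emp_no FROM emp_2022
INTERSECT
SELECT emp_no FROM emp_2023;

-- Employee numbers in 2022 but not in 2023:
SELECT emp_no FROM emp_2022
MINUS
SELECT emp_no FROM emp_2023;
```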
How to eliminate product joins in a Teradata SQL query?
- Ensure statistics are collected on join columns; this is especially important if the columns you are joining on are not unique (a sketch follows this list).
- Make sure you are referencing the correct alias.
- Also, if you have an alias, you must always reference it instead of a fully qualified table name.
- Sometimes product joins happen for a good reason: when joining a small table (100 rows) to a large table (1 million rows), a product join can make sense.
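A minimal sketch of collecting statistics on a join column (names are illustrative):

```
COLLECT STATISTICS ON employee COLUMN (dept_no);
```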
What are the types of tables in Teradata?
- Set table
- Multiset table
- Derived table
- Global temporary table (GTT)
- Volatile table
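Minimal sketches of these table types (names are illustrative; a global temporary table sketch appears under the GTT question later in this document):

```
-- SET table: duplicate rows are rejected (Teradata-mode default).
CREATE SET TABLE emp_set (emp_no INTEGER, dept_no INTEGER)
PRIMARY INDEX (emp_no);

-- MULTISET table: duplicate rows are allowed.
CREATE MULTISET TABLE emp_ms (emp_no INTEGER, dept_no INTEGER)
PRIMARY INDEX (emp_no);

-- Volatile table: definition and data last only for the session.
CREATE VOLATILE TABLE emp_vt (emp_no INTEGER)
PRIMARY INDEX (emp_no)
ON COMMIT PRESERVE ROWS;

-- Derived table: exists only for the duration of the query.
SELECT d.dept_no, d.cnt
FROM (SELECT dept_no, COUNT(*) AS cnt
      FROM emp_ms
      GROUP BY dept_no) AS d;
```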
Can you define Logical Data Model?
A Logical data model is the version of a data model that represents the business requirements (entire or part) of an organization and is developed before the physical data model. A sound logical design should streamline the physical design process by clearly defining data structures and the relationships between them. A good data model is created by clearly thinking about the current and future business requirements. Logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.
What is ODS?
An ODS (Operational Data Store) is primarily a “dump” of relevant information from a very small number of systems (often just one) usually with little or no transformation. The benefits are an ad hoc query database, which does not affect the operation of systems required to run the business. ODS’s usually deal with data “raw” and “current” and can answer a limited set of queries as a result.
What is Subject Area?
Subject area means fundamental entities that make up the major components of the business, e.g. customer, product, employee.
What is a Checkpoint?
Checkpoints are entries posted to a restart log table at regular intervals during the data transfer operation. If processing stops while a job is running, you can restart the job at the most recent checkpoint.
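A minimal FastLoad sketch (database, table, and interval are illustrative): with CHECKPOINT set, a restart resumes from the last logged checkpoint of 50,000 rows rather than from the beginning.

```
BEGIN LOADING sales_db.orders
    ERRORFILES sales_db.orders_err1, sales_db.orders_err2
    CHECKPOINT 50000;
```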
Can you define Data Warehousing?
A data warehouse is a subject oriented, integrated, time variant, non-volatile collection of data in support of management’s decision-making process.
What is a three-tier data warehouse?
The three-tier architecture differs from the two-tier architecture by strictly enforcing a logical separation of the graphical user interface, business logic, and data. The three-tier is widely used for data warehousing today. For organizations that require greater performance and scalability, the three-tier architecture may be more appropriate. In this architecture, data extracted from legacy systems is cleansed, transformed, and stored in high-speed database servers, which are used as the target database for front-end data access.
Can you define GTT?
Global Temporary Table is also known as GTT. A global temporary table is not much different from a regular table you create in any database, with a few exceptions concerning the life of its data. The table is global, but not the data. What does that mean to you? The data in a GTT is private, meaning only the session that inserts the data can access it. The table itself is accessible, or visible, to anyone who has proper access to the database and table.
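A minimal GTT sketch (names are illustrative): the definition persists in the data dictionary, while each session works with its own private copy of the rows.

```
CREATE GLOBAL TEMPORARY TABLE work_totals
(
    dept_no INTEGER,
    total   DECIMAL(12,2)
)
PRIMARY INDEX (dept_no)
ON COMMIT PRESERVE ROWS;

-- Rows inserted here are visible only to this session:
INSERT INTO work_totals
SELECT dept_no, SUM(salary)
FROM   employee
GROUP BY dept_no;
```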
Can you define ETL?
Extract, Transform, and Load. ETL refers to the methods involved in accessing and manipulating source data and loading it into a target database. The first step in the ETL process is mapping the data between source systems and the target database (data warehouse or data mart). The second step is cleansing the source data in a staging area. The third step is transforming the cleansed source data and loading it into the target system. Note that ETT (extract, transformation, transportation) and ETM (extraction, transformation, and move) are sometimes used instead of ETL.
What can be achieved by using the Teradata RDBMS?
Users on client systems send requests to the Teradata RDBMS.
- RDBMS requests can be sent through the TDP (Teradata Director Program), which uses the Call Level Interface (CLI) program.
- Transfers can also be done through ODBC (Open Database Connectivity) using ODBC drivers.
- The Teradata RDBMS achieves high scalability and flexibility, running the system and allowing different systems to communicate with each other.
- The level of security in the Teradata RDBMS is quite high.
Can you explain Staging Area?
The data staging area is a system that stands between the legacy systems and the analytics system, usually a data warehouse and sometimes an ODS. The data staging area is considered the “back room” portion of the data warehouse environment. The data staging area is where the extract, transform and load (ETL) takes place and is out of bounds for end users. Some of the functions of the data staging area include:
- Extracting data from multiple legacy systems.
- Cleansing the data, usually with a specialized tool.
- Integrating data from multiple legacy systems into a single data warehouse.
- Transforming legacy system keys into data warehouse keys, usually surrogate keys.
- Transforming disparate codes for gender, marital status, etc., into the data warehouse standard.
What are differences between Teradata and ANSI session modes in Teradata?
Teradata mode:
- Comparisons are not case sensitive.
- CREATE TABLE defaults to SET tables.
- Each transaction is committed implicitly.
- Supports all Teradata commands.
- It follows BTET (Begin and End Transaction) mode.
ANSI mode:
- Comparisons are case sensitive.
- CREATE TABLE defaults to MULTISET tables.
- Each transaction has to be committed explicitly.
- Does not support all Teradata commands.
- It does not follow BTET mode.
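As a sketch, the session mode is chosen in BTEQ before logging on; substitute BTET for ANSI to request Teradata mode (the logon string is a placeholder):

```
.SET SESSION TRANSACTION ANSI
.LOGON tdpid/username,password
```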
What are the features of Teradata?
Teradata is a huge data processing system with a shared-nothing design that eliminates resource competition. Teradata is especially designed to accommodate massive data warehouse implementations and has several data warehousing capabilities embedded into the core of the database. With linear scalability (software scales linearly with hardware), unconditional parallelism, multi-faceted parallelism, intelligent data distribution, and a parallel-aware optimizer, Teradata is capable of handling massive data volumes and sophisticated queries.
- Unlimited Parallelism
- Robust Utilities
- Automatic Distribution
- Mature Optimizer
- SQL
- Linear Scalability
- Shared Nothing Architecture
- Connectivity
What are the basic rules that define how a Primary Index is defined in Teradata?
The following rules govern how Primary Indexes in a Teradata Database must be defined as well as how they function:
- One Primary Index per table.
- A Primary Index value can be unique or non-unique.
- The Primary Index value can be NULL.
- The Primary Index value (the data stored in the PI column) can be modified.
- The Primary Index definition of a populated table cannot be modified.
- A Primary Index has a limit of 64 columns.
What are the functions of a Teradata DBA?
Following are the different functions which a DBA can perform:
User Management: Creation and managing Users, Databases, Roles, Profiles and Accounts.
Space Allocation: Assigning Permanent Space, Spool Space and Temporary Space.
Access of Database Objects: Granting and Revoking Access Rights on different database objects.
Security Control: Handling logon and logoff rules for Users.
System Maintenance: Specification of system defaults, restart etc.
System Performance: Use of Performance Monitor (PMON), Priority Scheduler and Job Scheduling.
Resource Monitoring: Database Query Log (DBQL) and Access Logging.
Data Archives, Restores and Recovery: ARC Utility and Permanent Journals.
Can you explain Teradata database Data Protection?
Several types of data protection are available with the Teradata Database; some are automatic and some are optional. They are:
RAID: Redundant Array of Inexpensive Disks (RAID) is a storage technology that provides data protection at the disk drive level.
Cliques: A clique is a group of nodes that share access to the same disk arrays. Each multi-node system has at least one clique.
Hot Standby Nodes: A Hot Standby Node (HSN) is a node that is a member of a clique that is not configured (initially) to execute any Teradata vprocs.
Fallback: Fallback is a Teradata Database feature that protects data in the case of an AMP vproc failure.
Journaling: Journals are kept on the system to provide data availability in the event of a component or process failure in the system.
Locks: Locking prevents multiple users who are trying to access or change the same data simultaneously from violating data integrity. This concurrency control is implemented by locking the target data.