What is Netezza?
Netezza is meant to handle extraordinarily complicated business scenario with higher performance. it’s a knowledge warehousing appliance which may method your terabytes of data in glimpse. Netezza is kind of a brand-new Appliance when put next to some already existing big player within the field. it absolutely was founded in 2000, Head office was located in Marlborough. The appliance has evolved with important changes within the hardware in addition as software front in these years. Netezza is a targeted appliance ideal for (but not restricted to) departmental data warehouse solutions and for satellite data marts that extend an enterprise data warehouse to edge applications. Netezza used to get the hardware built from a 3rd party at first. Later because the company expanded, Netezza has evolved and captive fully over to IBM hardware.
What is DWA?
DWA (Data Warehouse Appliance) is an integration of database, server and storage technology. A data warehouse appliance is a combination hardware and software product that is designed specifically for analytical processing. An appliance allows the purchaser to deploy a high-performance data warehouse right out of the box. Data warehouse appliances use Open Database Connectivity (ODBC), Java Database Connectivity (JDBC), and OLE DB interfaces to integrate with other extract-transform-load (ETL) tools and business intelligence (BI) or business analytic (BA) applications.
Can you explain the usage of num_star_planner_rels in Netezza?
When a query is executed, the optimizer come back up with plans to hitch tables within the most effective sequence. during this method, it tries to search out all doable permutation and combination to join the tables. However, optimizer will limit range|the amount|the quantity} of permutations it will contemplate once there are an oversized number of tables involved within the query.
The parameter num_star_planner_rels helps to limit the number of permutations to come up with most efficient plan. Higher the number and more the time optimizer will take to come up with best plan however it helps to generate efficient plan. Lower the value and lower attempt will be made to get the best plan. Default the value of this variable is set to 8 on NPS system.
Can you explain enable_2phase_planner variable in Netezza?
“enable_2phase_planner” basically divide the main plan into two plans namely pre-broadcast and main plan. Optimizer identifies the small tables in the queries and broadcast them so as to join with large tables. Optimizer will collect the information about the broadcasted temporary tables at first stage and used this information to create main plan.
Explain which file can be used to change the value of Netezza optimizer variables?
/nz/data/postgresql.conf OR /nz/data.1.0/postgresql.conf file is used to set / unset the optimizer related flag/parameter for Netezza system. Both the location point to the same file.
We do not necessarily need to restart the NPS to bring the changes in affect. NPS read the file “postgresql.conf” once in a day and implement the changes from file. However one will need to restart the NPS (nzstart/nzstop) if immediate change is required.
Can you explain different Port numbers used by Netezza server?
Netezza system uses the following port numbers or environmental variables for the CLI commands and the NzAdmin tool. Following table lists the default ports, corresponding environment variables and short description:
- Port No:5480NZ_DBMS_PORT The postgres port for the nzsql command, NzAdmin ODBC, and JDBC.
- Port No 5481: NZ_CLIENT_MGR_PORT The port for the CLI and NzAdmin tool messaging.
- Port No 5482: NZ_LOAD_MGR_PORT (Prior to Release 3.1, this port handled loads. As of Release 3.1, this port is not required.)
- Port No 5483: NZ_BNR_MGR_PORT The port for the nzbackup and nzrestore commands.
Netezza SSH Port: By default, you can use the port 22 for SSH connection (putty) if remote service access is granted to connect to Netezza server.
Netezza FTP Ports: By default, you can use the port 20 and 21 for FTP connection if remote service access is granted to connect to Netezza server.
Can you explain Zomemap?
Zonemap is very powerful tool to boost the performance of query and it works very well with FPGA. Conceptually Zonemap appears to be similar as Block indexing in DB2 and partitions in Oracle. For a high-level understanding, zonemap is basically a catalog table which stores the information about minimum and maximum of any block. Zonemaps are most efficient when data is stored in sorted order.
Explain the Undo / recover from a Delete, Update, or Insert in Netezza?
The great thing about Netezza is it does “soft” deletes until the groom process runs. This means your deleted data is really still there. This is of course very important if you just deleted something you didn’t mean too. And since “updates” are really just a delete of the old values and an insert of the new behind the scenes, you can actually “undo” updates too.
So first a little about how Netezza handles transactions. Every insert, update, or delete transaction is assigned a sequential transaction id or xid. You can see this id by querying the column createxid. Also, there is a column to indicate deleted rows called deletexid. This is set to 0 if this is a readable, not deleted row. When the row gets deleted, this column gets populated with the transaction id assigned to the delete or update statement. And as I mentioned above, updates are handled by doing a delete of the old record and inserting the entire new row with the new values.
Under normal circumstances when you run a select you will not see rows that have a deletexid not equal to zero. But there is a simple session variable that can be set that allows you to see these deleted rows.
Can you explain Netezza Implicit?
Netezza Implicit Skew is the Netezza skew that occurs within the database when processing large data sets. The implicit Netezza skew are very difficult to identify. Netezza implicit skew is occurs when data get redistributed or broadcasted on some other column to perform join operations. Data will be redistributed or broadcasted to perform co-located joins. The column on which data get redistributed or joined could be skewed that is, most of the redistributed data get inserted to in few or single disk (SPU) on the system. These types of skews can only occur in run-time when we join the two large tables using SQL.
Can you explain Netezza emulator?
This is an emulated Netezza appliance that runs both a virtual host and a virtual S-Blade. It is a totally functional system to be used for initial development and testing only. it’s solely offered to customers and partners via the IBM Netezza Developer Network (NDN) website on developer Works. as a result of it’s thought of sensitive material a customer or partner has to be vetted in order to download it.
Can you define dynamic session partition?
A dynamic session partition is where the Integration Service scales the number of session partitions at runtime. The number of partitions is based on a number of factors including number of nodes in a grid or source database partitions.
Can you define snippet?
A snippet is a small unit of work that are carried out in SPU.
Can you explain Distribution key?
It is a column based on which data get distributed on the disks. Quite similar to partition key in DB2. System calculate the hash for value of distribution column and later divide it with the total number of disk. Row will be placed on the disk based on the remainder value.
How do I find the timestamp of last time procedure updated?
One will need following SQL to find out the timestamp of last time procedure got modified:
Connect to the database in which procedure resides and execute following SQL:
select OBJCREATED, OBJMODIFIED from _t_object where objid = (select OBJID from _v_procedure where PROCEDURE = ‘NAME’);
Can you explain creating materialized views?
A materialized read reduces the width (number of columns) of data being scanned within the base table by making a thin version (fewer columns) of the bottom table that contains a little set of frequently queried columns.
Explain does enabling of ENABLE_FACTREL_PLANNER variable help on Netezza system?
ENABLE_FACTREL_PLANNER is a variable which TURN ON AND OFF the ability of Netezza system to identify FACT tables based on number of rows.
“FACTREL_SIZE_THRESHOLD” variable define the number of rows and helps the system to identify which tables should be treated as FACT tables, and which tables should be treated as SMALL tables.
What is ERROR: Memory exhausted in AllocSetAlloc?
This error “Memory exhausted on AllocSetAlloc()” is an Out-of-memory condition on the host. Few of the likely reason is a bad plan OR complex queries. In few instances, this error occurs before the system generates a plan file so the query did not actually get to the point where it started to run.
How do you redistribute a table?
Use Create Table As (CTAS) to redistribute the data in a table. While creating the new table specify the distribute on clause to distribute the data on the new columns.
Can you define real-time processing?
Data sources such as JMS, WebSphere MQ, MSMQ, TIBCO, web Methods, SQP, and webservices can publish data in real-time. These real-time sources can be leveraged by Informatica Power Centre to process data on-demand. A session can be specifically configured for real-time processing.
What are the commands of Netezza backup?
Netezza provides following backup utilities for backing up to a disk or tape media:
nzbackup/nzrestore: These commands are the part of NPS. They can help to backup and restore to/from filesystem/SAN, Veritas Netbackup, Tivoli Storage Manager. This is an ideal way to do a backup and restore.
nz_backup/nz_restore: These are provided as a part of the nz-support tools provided by Netezza support team. These tools can handle single tables, multiple tables or entire databases. These tools can also generate ASCII delimited backups for portability to non-Netezza systems. Behind the screen, it is just executing the SQL to unload the table and load the table in another database/system.
What does error SPU Oversubscribed memory and Out of memory refers in Netezza system?
“SPU Oversubscribed memory” occurred when SPU swap space is full i.e. swap does not have enough space to process the load. This can be due to bad stats generating bad plan, skewed intermediate tables during processing etc…”Out of memory” error is observed if Shared memory on SPU is full.
Explain data warehouse appliances you know?
- Netezza
- Greenplum
- Vertica
- Teradata
- Dataupia
- Oracle Exadata
What are the different types of pushdown optimization that can be configured?
Source-side pushdown optimization: This Service pushes as much transformation logic as possible to the source database.
Target-side pushdown optimization: This Service pushes as much transformation logic as possible to the target database.
Full pushdown optimization: This service attempt to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization
Explain the load achieved in Netezza and why is that quick / fast?
Loads by pass a few steps that typically a query would go through (a query goes through plan generation, optimization and transaction management). Loads are done in terms of “sets” and this set is based on underlying table structure (thus loads for two different tables are different as their sets are based on table structures). Data is processed to check format and distribution of records calculated very quickly (in one step), fills into ‘set’ structure and writes to storage structure. Storage also performs space availability and other admin TASKS; all these operations go pretty quick (think of them as UNIX named pipes that streams data and SPU store these records).
Why NOT NULL specification is better in Netezza?
Specifying not null results in better performance as NULL values are tracked at row header level. Having NULL values results in storing references to NULL values in header. If all columns are NOT NULL, then there is no record header.
What is the user data capacity TF-12 (1000-6) Netezza system?
Netezza Tf-12 provides 32 TB user data (uncompressed) storage space. For TF-6, user data space will be 16TB and TF-24 should be able to store 64 TB of data.
List out the t databases are we able to configure pushdown optimization?
- IBM DB2
- IBM Netezza
- Microsoft SQL Server
- Oracle
- Sybase ASE
- Teradata
- Databases that use ODBC drivers
Can you explain we update all columns in a Netezza table.?
No, the column that is used in distribution clause cannot be used for updates. Remember, up to four columns can be used for distribution of data on SPU. From practical sense, updating distribution columns result in redistribution of data; the single most performance hit when large table is involved. This restriction makes sense.
Can a group become a member of another group in Netezza user administration? Can we use same group name for databases?
In Netezza, public group is formed automatically and each one could be a member of this group by default. we are able to produce as several groups and any user are often member of any group(s). group can’t be a member of another group. group names, user names and database names area unit distinctive. That is, we cannot have a database known as sales and a group conjointly called sales.
Can you explain the different states of Netezza?
Online: It is Normal or usual state.
Offline: It is waits for completion of current queries, new queries in queue receives error.
Stopped: Netezza will shut down after completing current queries, no new queries allowed.
Paused: Same as above, but no error displayed. Typically caused during Netezza boot up or startup.
Down: Just plain down, could be due to Netezza server problem or user initiated.
How FPGA can be helpful in improving query performance?
While reading data from the disk, the Field Programmable Gate Array (FPGA) on each SPU filters out unwanted data. This process of data elimination removes IO bottlenecks and frees up downstream components such as the CPU, memory and network from processing extra data.
Can you define nzload?
Nzload utility is used load data from a file into a table. It is used to load bulk data quickly and simultaneously rejects erroneous content
Can you define collocated join?
When you join tables which are distributed on the same key and used these key columns in the join condition, then each SPU in netezza works 100% independent of the other, as the required data is available in itself. This type of joins is called collocated joins.
Can you explain Netezza SQL?
IBM NetezzaSQL is the Netezza Structured Query Language (SQL), which runs on the Netezza data warehouse appliance. Throughout this publication, the term SQL is the Netezza SQL implementation. Several standards relate to the definition of Netezza SQL
Can you explain Netezza Datawarehouse appliance (DWA)?
The IBM Netezza data warehouse appliance includes a highly optimized SQL that is called IBM Netezza Structured Query Language (SQL). You can use the SQL commands to create and manage your Netezza databases, user access and permissions for the databases, and to query the contents of the databases
Netezza provides several components for loading data into the appliance.
The components are as follows:
- External tables
- The nzload command
- File format options
- Backup and restore utilities
- The nz_migrate utility
Can you explain advanced Security in Neetezza?
The advanced security feature on the IBM® Netezza® system has the following areas:
- User login control provides session context and access restrictions.
- Masquerading provides a way for a user to operate as another user, with all the privileges of that user.
- Advanced query history captures details about the user activity on the Netezza system, such as the queries that are run, query plans, table access, column access, session creation, and failed authentication requests.
- Multi-Level Security (MLS) is an abstract security model, which Netezza uses to define rules to control access to table rows.
User-defined functions (UDFs) and user-defined aggregates (UDAs) run from within the Netezza security domain and might gain access to data without database access controls. Carefully control the UDFs and UDAs that are installed on the system.
(Source: Netezza and wiki)