What is Hive?
Hive is data warehouse software that facilitates querying and managing large data sets residing in distributed storage. Its query language, HiveQL, looks almost like SQL. Hive also lets programmers plug in their own custom mappers and reducers, as in traditional MapReduce programs, when it is inconvenient or inefficient to express the logic in HiveQL, and it supports user-defined functions (UDFs) for the same purpose.
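A custom mapper or reducer is usually wired in with HiveQL's `TRANSFORM ... USING` clause: Hive streams each input row to the script's stdin as one tab-separated line and reads tab-separated output lines back from stdout. The minimal sketch below assumes a hypothetical query such as `SELECT TRANSFORM(user_id, url) USING 'python3 extract_domain.py' AS (user_id, domain) FROM visits;`. The script name, table and columns are illustrative only.

```python
import sys
from urllib.parse import urlparse


def transform_line(line: str) -> str:
    """Turn one 'user_id<TAB>url' input row into a 'user_id<TAB>domain' output row."""
    user_id, url = line.rstrip("\n").split("\t", 1)
    # By default Hive serializes NULL as the two characters \N; pass it through unchanged.
    domain = r"\N" if url == r"\N" else urlparse(url).netloc
    return f"{user_id}\t{domain}"


def main() -> None:
    # Hive writes one row per line to stdin and parses one row per line from stdout.
    for line in sys.stdin:
        print(transform_line(line))


if __name__ == "__main__":
    main()
```

Keeping the per-row logic in `transform_line` makes the script testable without Hive; `main` is the only part that depends on the streaming protocol.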
What is Hive Metastore?
The Hive metastore is a database that stores metadata about your Hive tables, such as the table name, column names, data types, table location, number of buckets in the table, etc.
What are the different types of tables available in Hive?
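There are two types: managed tables and external tables. In a managed table, both the data and the schema are under Hive's control; in an external table, only the schema is. As a result, dropping a managed table deletes its data as well as its metadata, while dropping an external table deletes only the metadata and leaves the data files in place.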
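The practical difference shows up on `DROP TABLE`: for a managed table Hive removes both the metastore entry and the data, for an external table only the metastore entry. The toy model below is a sketch of that rule only; `ToyWarehouse` and its methods are hypothetical, and storage is simplified to a dictionary keyed by table name instead of real HDFS directories.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyWarehouse:
    """Toy model: metastore entries plus the data files each table points at."""
    metastore: Dict[str, str] = field(default_factory=dict)      # table -> "MANAGED" / "EXTERNAL"
    storage: Dict[str, List[str]] = field(default_factory=dict)  # table -> data files

    def create_table(self, name: str, files: List[str], external: bool = False) -> None:
        self.metastore[name] = "EXTERNAL" if external else "MANAGED"
        self.storage[name] = list(files)

    def drop_table(self, name: str) -> None:
        table_type = self.metastore.pop(name)  # metadata is always removed
        if table_type == "MANAGED":
            self.storage.pop(name)             # Hive owns the data, so it is deleted too
        # EXTERNAL: the data files are left untouched


w = ToyWarehouse()
w.create_table("clicks", ["part-00000"])
w.create_table("raw_logs", ["log1.txt"], external=True)
w.drop_table("clicks")
w.drop_table("raw_logs")
```

After both drops, the metastore is empty, `clicks` has no data left, and `raw_logs` still has `log1.txt`.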
Why do we need Hive?
Hive is a tool in the Hadoop ecosystem that provides an interface to organize and query data in a database-like fashion using SQL-like queries. It is suitable for accessing and analyzing data in Hadoop with SQL syntax.
How do you set the Hive installation path?
export HIVE_HOME=/home/hadoop/work/hive-x.y.z
export PATH=$PATH:$HIVE_HOME/bin
What is the default location where hive stores table data?
hdfs://namenode_server/user/hive/warehouse (the path is controlled by the hive.metastore.warehouse.dir property)
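Under the warehouse directory, a managed table in the `default` database gets its own subdirectory directly below the warehouse root, while tables in any other database go under a `<database>.db` subdirectory. A small sketch of that layout convention, assuming lowercase database and table names (the function name is hypothetical):

```python
def default_table_location(warehouse_dir: str, db: str, table: str) -> str:
    """Default directory for a managed table under the Hive warehouse (layout sketch)."""
    root = warehouse_dir.rstrip("/")
    if db == "default":
        # Tables in the 'default' database sit directly under the warehouse directory.
        return f"{root}/{table}"
    # Every other database has its own '<name>.db' subdirectory.
    return f"{root}/{db}.db/{table}"
```

For example, `default_table_location("hdfs://namenode_server/user/hive/warehouse", "sales", "orders")` yields `hdfs://namenode_server/user/hive/warehouse/sales.db/orders`.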
How does Facebook use Hadoop, Hive and HBase?
- Facebook stores its data on HDFS; every day millions of photos are uploaded to Facebook and stored with the help of Hadoop.
- Facebook Messages, Likes and status updates run on top of HBase.
- Hive is used to generate reports for third-party developers and advertisers who need to track the success of their applications or campaigns.
What is Apache HCatalog?
HCatalog is built on top of the Hive metastore and incorporates Hive’s DDL. Apache HCatalog is a table and data management layer for Hadoop that lets Apache Pig, MapReduce and Apache Hive process the same data. With HCatalog, users do not need to worry about where the data is stored or in which format it was written. HCatalog presents data stored as RCFile, text files or sequence files in a tabular view. It also provides REST APIs so that external systems can access these tables’ metadata.
What does Hive/HCatalog do?
Hive/HCatalog also enables sharing of data structures with external systems, including traditional data management tools.
What are collection data types in Hive?
There are three collection data types in Hive: ARRAY, MAP and STRUCT. (Hive also has a less commonly used complex type, UNIONTYPE.)
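Hive's three collection types are ARRAY, MAP and STRUCT. In a text-format table using the default delimiters, columns are separated by Ctrl-A (`\x01`), array items, map entries and struct fields by Ctrl-B (`\x02`), and each map key from its value by Ctrl-C (`\x03`). The sketch below parses one row of a hypothetical table `users(tags ARRAY<STRING>, props MAP<STRING,STRING>, addr STRUCT<city:STRING, zip:STRING>)` to show how each collection type is laid out; it is an illustration, not Hive's SerDe.

```python
FIELD_SEP = "\x01"  # Ctrl-A: between columns
ITEM_SEP = "\x02"   # Ctrl-B: between ARRAY items, MAP entries and STRUCT fields
KEY_SEP = "\x03"    # Ctrl-C: between a MAP key and its value


def parse_row(line: str) -> dict:
    """Split one text row of the hypothetical 'users' table into Python values."""
    tags_raw, props_raw, addr_raw = line.split(FIELD_SEP)
    tags = tags_raw.split(ITEM_SEP)                         # ARRAY<STRING>
    props = dict(entry.split(KEY_SEP, 1)                    # MAP<STRING,STRING>
                 for entry in props_raw.split(ITEM_SEP))
    city, zip_code = addr_raw.split(ITEM_SEP)               # STRUCT<city, zip>
    return {"tags": tags, "props": props, "addr": {"city": city, "zip": zip_code}}
```

In HiveQL these values would be read as `tags[0]`, `props['k1']` and `addr.city`.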
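What is a Hive variable? What do we use it for?
A Hive variable is a variable created in the Hive environment that can be referenced by Hive scripts. It is used to pass values into Hive queries when they start executing. For example, a variable can be set with `hive --hivevar name=value` or `SET hivevar:name=value;` and referenced in a query as `${hivevar:name}`.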
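Before a query runs, Hive replaces each `${hivevar:name}` reference with the variable's value. The sketch below imitates that textual substitution; the function name is hypothetical, and raising an error for an undefined variable is a choice made for this sketch, not necessarily what Hive does.

```python
import re
from typing import Dict

# Matches ${hivevar:name}; the name is everything up to the closing brace.
HIVEVAR_REF = re.compile(r"\$\{hivevar:([^}]+)\}")


def substitute_hivevars(query: str, hivevars: Dict[str, str]) -> str:
    """Replace ${hivevar:name} references with their values (simplified)."""
    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in hivevars:
            raise KeyError(f"undefined hivevar: {name}")
        return hivevars[name]

    return HIVEVAR_REF.sub(lookup, query)
```

With `hive --hivevar dt=2020-01-01 -f daily_report.hql` (file name hypothetical), a line such as `WHERE dt = '${hivevar:dt}'` in the script would run as `WHERE dt = '2020-01-01'`.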
What is the importance of .hiverc file?
It is a file containing a list of commands that the Hive CLI runs when it starts, for example setting strict mode with `set hive.mapred.mode=strict;`.
Does the archiving of Hive tables give any space saving in HDFS?
No. Archiving (which packs files into a Hadoop Archive, or HAR) only reduces the number of files, which makes them easier for the NameNode to manage.
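How can Hive avoid MapReduce?
If we set the property hive.exec.mode.local.auto to true, Hive runs sufficiently small queries in local mode on a single machine instead of submitting a MapReduce job to the cluster. For simple queries (for example, `SELECT *` with a `LIMIT`), the hive.fetch.task.conversion property lets Hive read the data directly with a fetch task and skip MapReduce entirely.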
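Even with hive.exec.mode.local.auto enabled, Hive only switches a job to local mode when it looks small. The Hive documentation describes three conditions: the total input size is below hive.exec.mode.local.auto.inputbytes.max (128 MB by default), the number of input files or map tasks is small (4 by default), and the job needs at most one reducer. The sketch below encodes that decision with those defaults; the function name is hypothetical, the property name used for the file limit should be checked against your Hive version, and the exact boundary comparisons may differ.

```python
MAX_INPUT_BYTES = 128 * 1024 * 1024  # default of hive.exec.mode.local.auto.inputbytes.max
MAX_INPUT_FILES = 4                  # default input-file limit (hive.exec.mode.local.auto.input.files.max in recent versions)


def runs_locally(auto_local: bool, input_bytes: int, input_files: int, reducers: int,
                 max_bytes: int = MAX_INPUT_BYTES, max_files: int = MAX_INPUT_FILES) -> bool:
    """Simplified version of the checks Hive makes before choosing local mode."""
    if not auto_local:
        # hive.exec.mode.local.auto=false: always submit the job to the cluster.
        return False
    return input_bytes <= max_bytes and input_files <= max_files and reducers <= 1
```

For example, a 10 MB query over two files with one reducer qualifies, while the same query over 1 GB of input would still go to the cluster.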
What is bucketing?
The values of a column are hashed into a user-defined number of buckets, and each bucket is stored as a separate file. Bucketing is a way to avoid too many partitions or nested partitions while still keeping queries efficient.
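To make "hashed into buckets" concrete, the sketch below assumes the legacy (bucketing version 1) scheme for an INT column: the hash of a Java int is the value itself, and the bucket is `(hash & Integer.MAX_VALUE) % num_buckets`. Hive 3 switched new tables to a Murmur3-based hash, so treat this as an illustration of hash-mod bucketing rather than a byte-exact reimplementation; the function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
JAVA_INT_MAX = 0x7FFFFFFF  # Integer.MAX_VALUE, used to clear the sign bit


def bucket_for_int(value: int, num_buckets: int) -> int:
    """Bucket index for an INT column value under legacy hash-mod bucketing."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError("value must fit in a 32-bit Hive INT")
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    hash_code = value  # the hash of a Java int is the int itself
    return (hash_code & JAVA_INT_MAX) % num_buckets


def assign_buckets(values, num_buckets):
    """Group values by bucket index; in Hive each bucket becomes one file."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[bucket_for_int(v, num_buckets)].append(v)
    return buckets
```

Because equal keys always land in the same bucket, two tables bucketed the same way on a join key can be joined bucket by bucket.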
What is a generic UDF in Hive?
It is a UDF written in Java to serve a specific need not covered by Hive's built-in functions. Unlike a simple UDF, it can inspect the types of its input arguments programmatically and respond appropriately.
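A real generic UDF is a Java class extending `org.apache.hadoop.hive.ql.udf.generic.GenericUDF`: `initialize()` receives the argument ObjectInspectors once and returns the output type, and `evaluate()` is then called for each row. The Python class below is only an analogy of that two-phase pattern, not Hive's API; `DoubleItSketch` and its type strings are hypothetical. It shows why type detection matters: the output type, and the operation applied, depend on the input type.

```python
from typing import Any, Callable, List, Optional


class DoubleItSketch:
    """Analogy of a generic UDF: resolve argument types once, then evaluate per row."""

    def __init__(self) -> None:
        self._impl: Optional[Callable[[Any], Any]] = None

    def initialize(self, arg_types: List[str]) -> str:
        """Check argument types once and return the output type (cf. GenericUDF.initialize)."""
        if len(arg_types) != 1:
            raise ValueError("double_it expects exactly one argument")
        arg_type = arg_types[0]
        if arg_type in ("int", "bigint", "double"):
            self._impl = lambda v: v * 2   # numbers: arithmetic doubling
        elif arg_type == "string" or arg_type.startswith("array<"):
            self._impl = lambda v: v + v   # strings and arrays: repeat the value
        else:
            raise TypeError(f"double_it does not support {arg_type}")
        return arg_type                    # the output type follows the input type

    def evaluate(self, args: List[Any]) -> Any:
        """Apply the implementation chosen in initialize() to one row (cf. GenericUDF.evaluate)."""
        if self._impl is None:
            raise RuntimeError("initialize() must be called first")
        value = args[0]
        return None if value is None else self._impl(value)  # NULL in, NULL out
```

Doing the type check in `initialize` means an unsupported argument fails once, when the query is compiled, rather than on every row.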