1. Why Impala¶
flexibliity for your big data flow high-performance analytics exploratory business intelligence
extrac-transform-load(ETL) BI: Business intelligence
2. Getting Up and Running with Impala¶
Cloudera live demo
A view is an alias for a longer query, and takes no time or storage to set up
3. Impala for the database developer¶
OLTP-style(online transaction processing)
impala implements SQL-92 standard features for queries, with som enhancements from later SQL standards Hadoop Distributed File System(HDFS)
- Impala currently doesn’t have OLTP-style data manipulation language (DML) such as DELETE or UPDATE.
- Impala also does not have indexes, constraints or foreign keys.
- No transactions
impala can very effeciently perform full table scans of large tables.
HDFS Storage Model: CDH: Cloudera Distribution with Hadoop Parquet File Format: binary file format
4. Common Developer Tasks for Impala¶
ETL(Extract-trnasform-load)
Make sure always close query handles when finished(release memory) JDBC or ODBC
with Impala, the biggest I/O savings com from using partitioned tables and choosing the most appropriate file format
Impala partitioned tables are just HDFS directories UDF(user defined functions)