This week Cloudera is officially declaring Impala to be generally available and ready for deployment.
“Impala is a parallel database query engine that runs natively on top of the Hadoop platform,” Charles Zedlewski, VP of Product at Cloudera, explained to Enterprise Apps Today. “It runs natively on Hadoop storage, with the same schema, Hive metastore and file formats.”
Zedlewski stressed that Impala can be deployed in an existing Hadoop cluster without the need for any architectural changes. In that way, Impala can be used as an overlay for existing data sets.
“It is SQL and the first word in SQL is structured, so I don’t want to pretend that you can point Impala at a bunch of video files for a query,” Zedlewski said. “But Impala does not require that you have to massage your structured data into a special format.”
Impala began as an open source project under the Apache license, and it remains one today. Cloudera has added a commercially supported offering called RTQ, which provides additional monitoring and management interfaces.