RealTime IT News

Impala Big Data Hadoop Open Source Query Engine Debuts

This week Cloudera is officially declaring Impala to be generally available and ready for deployment.

"Impala is a parallel database query engine that runs natively on top of the Hadoop platoform," Charles Zedlewski, VP of Product at Cloudera, explained to Enterprise Apps Today. "It runs natively on Hadoop storage, with the same schema, Hive metastore and file formats."

Zedlewski stressed that Impala can be deployed in an existing Hadoop cluster without the need for any architectural changes. In that way, Impala can be used as an overlay for existing data sets.

"It is SQL and the first word in SQL is structured, so I don't want to pretend that you can point Impala at a bunch of video files for a query," Zedlewski said. "But Impala does not require that you have to massage your structured data into a special format."

Impala started as an open source project under the Apache license, and it remains one today. Cloudera has also added a commercially supported offering called RTQ, which provides additional monitoring and management interfaces.

Read the full story at Enterprise Apps Today:
Cloudera Accelerates Big Data with Impala GA

Sean Michael Kerner is a senior editor at InternetNews.com. Follow him on Twitter @TechJournalist.