How-to: Create a CDH Cluster on Amazon EC2 via Cloudera Manager
Editor’s Note (added Feb. 25, 2015): For releases beyond 4.5, Cloudera recommends the use of Cloudera Director for deploying CDH in cloud environments. Cloudera Manager includes a new express...
How-to: Use Vagrant to Set Up a Virtual Hadoop Cluster (updated for CDH 5)
This guest post, which is now updated for CDH 5, comes to us from David Greco. Vagrant is a very nice tool for programmatically managing many virtual machines (VMs) on a single physical machine. It...
Where to Find Cloudera Tech Talks Through June 2013
It’s time for me to give you a quarterly update (here’s the one for Q1) about where to find tech talks by Cloudera employees in 2013. Committers, contributors, and other engineers will travel to...
The Platform for Big Data is Here
It has been an exciting couple of days for new product announcements at Cloudera — exciting especially for me as the edges of the new platform for big data we have been talking about since Strata +...
Cloudera Impala 1.0: It’s Here, It’s Real, It’s Already the Standard for SQL...
In October 2012, we introduced the Impala project, at that time the first known effort to bring a modern, open source, distributed SQL query engine to Apache Hadoop. Our release of source code and a...
Customer Spotlight: Six3 Systems’ Wayne Wheeles Drives Cyber Security...
This week represents quite a milestone for Cloudera and, at least we’d like to believe, the Hadoop ecosystem at large: the general availability release of Cloudera Impala. Since we launched the Impala...
How-to: Automate Your Hadoop Cluster from Java
One of the complexities of Apache Hadoop is the need to deploy clusters of servers, potentially on a regular basis. At Cloudera, which at any time maintains hundreds of test and development clusters in...
HBaseCon 2013: "Operations" Track Preview
As you have probably learned by now, HBaseCon 2013 sessions are organized into four tracks: Operations, Internals, Ecosystem, and Case Studies. In combination, they offer a 360-degree view of Apache...
CDH 4.3 is Released!
I’m pleased to announce that CDH 4.3 is released and available for download. This is the third quarterly update to our GA shipping CDH 4 line and the 17th significant release of our 100% open source...
Welcome, Tom!
We announced a leadership change at Cloudera today. Tom Reilly, formerly CEO at ArcSight, is joining us in my old role – CEO – and I am assuming two new posts: Chief Strategy Officer and Chairman of...
Demo: The New Search App in Hue 2.4
In version 2.4 of Hue, the open source Web UI that makes Apache Hadoop easier to use, a new app was added in addition to more than 150 fixes: Search! Using this app, which is based on Apache Solr, you...
Congrats to Explorys, A Computerworld Honors Laureate for Big Data
The following guest post is courtesy of Doug Meil, Chief Architect at Explorys, Apache HBase Committer/PMC Member, and Champion of Big Data: On June 3, 2013, I represented Explorys at the Computerworld...
How-to: Use the Apache Oozie REST API
Apache Oozie has a Java client and a Java API for submitting and monitoring jobs, but what if you want to use Oozie from another language or a non-Java system? Oozie provides a Web Services API, which...
How HiveServer2 Brings Security and Concurrency to Apache Hive
Apache Hive was one of the first projects to bring higher-level languages to Apache Hadoop. Specifically, Hive enables the legions of trained SQL users to use industry-standard SQL to process their...
How Improved Short-Circuit Local Reads Bring Better Performance and Security...
One of the key principles behind Apache Hadoop is the idea that moving computation is cheaper than moving data — we prefer to move the computation to the data whenever possible, rather than the other...
How Cloudera Ensures HBase Client API Compatibility in CDH
Apache HBase supports three primary client APIs that developers can use to bind applications with HBase: the Java API, the REST API, and the Thrift API. Therefore, as developers build apps against...
What I Learned During My Summer Internship at Cloudera, Part 2
The guest post below is from Wei Yan, a 2013 summer intern at Cloudera. In this post, he helpfully describes his personal projects from this summer. Thanks for your contributions, Wei! As a Ph.D....
How-to: Use the HBase Thrift Interface, Part 1
There are various ways to access and interact with Apache HBase. Most notably, the Java API provides the most functionality. But some people want to use HBase without Java. Those people have two main...
How-to: Use HBase Bulk Loading, and Why
Apache HBase is all about giving you random, real-time, read/write access to your Big Data, but how do you efficiently get that data into HBase in the first place? Intuitively, a new user will try to...
This Month in the Ecosystem (September 2013)
Welcome to our third edition of “This Month in the Ecosystem,” a digest of highlights from September 2013 (never intended to be comprehensive; for completeness, see Hadoop Weekly). Note: there were a...