The Smart Grid: Hadoop at the Tennessee Valley Authority (TVA)
For the last few months, we’ve been working with the TVA to help them manage hundreds of TB of data from America’s power grids. As the Obama administration investigates ways to improve our energy...
Apache Hadoop HA Configuration
Disclaimer: Cloudera no longer approves of the recommendations in this post. Please see this documentation for configuration recommendations. One of the things we get a lot of questions about is how to...
Hadoop World: Security and API Compatibility
Today’s Hadoop World talk comes from Owen O’Malley and covers some of the biggest challenges facing Hadoop: Security and API Compatibility. Over the past several months, Yahoo! has been leading...
CDH3 Beta 1 Now Available
It’s official – Cloudera’s Distribution for Hadoop Version 2, which we often shorthand as CDH2, has been released. CDH2 is the product we recommend to our current production customers. It’s a stable...
Highlights from the First Hadoop Contributors Meeting
While the vast majority of the Hadoop development discussion takes place on the Apache Jira and various project mailing lists, it’s often useful to meet face to face for high bandwidth discussion. To...
What’s New in CDH3b2: Core Hadoop
In this post I’ll cover some of the larger or more significant changes that have gone into core Hadoop in CDH3 beta 2. The Hadoop in CDH3 is based on the latest Apache Hadoop core release – version...
What’s New in CDH3b2: ZooKeeper
CDH3 beta 2 is the first to incorporate Apache ZooKeeper. ZooKeeper is a highly reliable and available coordination service for distributed processes. It is a proven technology and a well established...
What’s New in CDH3b2: Oozie
Hadoop has emerged as an indispensable component of any data-intensive enterprise infrastructure. In many ways, working with large datasets on a distributed computing platform (powered by commodity...
What’s New in CDH3b2: HUE
The HUE (a.k.a. Hadoop User Experience) project [download|installation|manual] started as Cloudera Desktop about a year ago. The old name “Desktop” really refers to a desktop look-and-feel, since HUE is...
Upcoming webinar: 10 Common Hadoop-able Problems
At Cloudera we find that organizations often have trouble recognizing how Hadoop can help solve some of the large scale data problems they may be facing. Through our engagements with customers we have...
Purdue University’s Saptarshi Guha Interviewed Regarding Hadoop, R and Hadoop...
In anticipation of Hadoop World 2010 in New York on October 12th, we continue our Q&A series with Hadoop World presenters to provide a taste of what attendees can expect. We’re excited about the 36...
Using Flume to Collect Apache 2 Web Server Logs
Flume is a flexible, scalable, and reliable system for collecting streaming data. The Flume User Guide describes how to configure Flume, and the new Flume Cookbook contains instructions (called...
What is in our Kitchen?
If there is one thing that chefs are proud of, it’s their kitchens. Whether cavernous top-of-the-line affairs or cramped New York apartments, kitchens are the place where raw ingredients are combined...
One Possible Hadoop World Morning Path
Hadoop World will kick off with keynote presentations by Mike Olson, Cloudera CEO, and Tim O’Reilly, Founder and CEO of O’Reilly Media. Both are engaging speakers who will fill our fresh morning minds...
Hadoop: The Definitive Guide, Second Edition
The second edition of my book “Hadoop: The Definitive Guide”, published by O’Reilly, is now available. The first edition was launched at the Hadoop Summit in June 2009, and has gone on to sell well....
Tackling Large Scale Data in Government
This is a guest post provided by Booz Allen Hamilton data analysis consultant, Aaron Cordova. Aaron specializes in large-scale distributed data processing systems. Working within the U.S. federal...
Better Workflow Management in CDH with Oozie 2
Oozie version 2.2.1 is now bundled with Cloudera Distribution for Hadoop (CDH3 Beta 3). This major upgrade includes new functionality such as time and date-driven workflow jobs, and an embedded Tomcat...
Configuring Security Features in CDH3
Post written by Cloudera Software Engineer Aaron T. Myers. Apache Hadoop has had methods of doing user authorization for some time. The Hadoop Distributed File System (HDFS) has a permissions model...
Setting up CDH3 Hadoop on my new Macbook Pro
This is a guest re-post courtesy of Arun Jacob, Data Architect at Disney; prior to that, he was an engineer at RichRelevance and Evri. For the last couple of years, Arun has been focused on data...
How-to: Include Third-Party Libraries in Your MapReduce Job
“My library is in the classpath but I still get a Class Not Found exception in a MapReduce job” – If you have this problem this blog is for you. Java requires third-party and user-defined classes to be...