Browsing all 166 articles

The Smart Grid: Hadoop at the Tennessee Valley Authority (TVA)

For the last few months, we’ve been working with the TVA to help them manage hundreds of TB of data from America’s power grids. As the Obama administration investigates ways to improve our energy...

View Article


Apache Hadoop HA Configuration

Disclaimer: Cloudera no longer approves of the recommendations in this post. Please see this documentation for configuration recommendations. One of the things we get a lot of questions about is how to...

View Article


Hadoop World: Security and API Compatibility

Today’s Hadoop World talk comes from Owen O’Malley and covers some of the biggest challenges facing Hadoop: Security and API Compatibility. Over the past several months, Yahoo! has been leading...

View Article

CDH3 Beta 1 Now Available

It’s official – Cloudera’s Distribution for Hadoop Version 2, which we often shorthand as CDH2, has been released. CDH2 is the product we recommend to our current production customers. It’s a stable...

View Article

Highlights from the First Hadoop Contributors Meeting

While the vast majority of the Hadoop development discussion takes place on the Apache Jira and various project mailing lists, it’s often useful to meet face to face for high bandwidth discussion. To...

View Article


What’s New in CDH3b2: Core Hadoop

In this post I’ll cover some of the larger or more significant changes that have gone into core Hadoop in CDH3 beta 2. The Hadoop in CDH3 is based on the latest Apache Hadoop core release – version...

View Article

What’s New in CDH3b2: ZooKeeper

CDH3 beta 2 is the first to incorporate Apache ZooKeeper. ZooKeeper is a highly reliable and available coordination service for distributed processes. It is a proven technology and a well established...

View Article

What’s New in CDH3b2: Oozie

Hadoop has emerged as an indispensable component of any data-intensive enterprise infrastructure. In many ways, working with large datasets on a distributed computing platform (powered by commodity...

View Article


What’s New in CDH3b2: HUE

The HUE (a.k.a. Hadoop User Experience) project [download|installation|manual] started as Cloudera Desktop about a year ago. The old name “Desktop” really refers to a desktop look-and-feel, since HUE is...

View Article


Upcoming webinar: 10 Common Hadoop-able Problems

At Cloudera we find that organizations often have trouble recognizing how Hadoop can help solve some of the large scale data problems they may be facing. Through our engagements with customers we have...

View Article

Purdue University’s Saptarshi Guha Interviewed Regarding Hadoop, R and Hadoop...

In anticipation of Hadoop World 2010 in New York – October 12th, we continue our Q&A series with Hadoop World presenters to provide a taste of what attendees can expect. We’re excited about the 36...

View Article

Using Flume to Collect Apache 2 Web Server Logs

Flume is a flexible, scalable, and reliable system for collecting streaming data. The Flume User Guide describes how to configure Flume, and the new Flume Cookbook contains instructions (called...

View Article

What is in our Kitchen?

If there is one thing that chefs are proud of, it’s their kitchens. Whether cavernous top-of-the-line affairs or cramped New York apartments, kitchens are the place where raw ingredients are combined...

View Article



One Possible Hadoop World Morning Path

Hadoop World will kick off with keynote presentations by Mike Olson, Cloudera CEO, and Tim O’Reilly, Founder and CEO of O’Reilly Media. Both are engaging speakers who will fill our fresh morning minds...

View Article

Hadoop: The Definitive Guide, Second Edition

The second edition of my book “Hadoop: The Definitive Guide”, published by O’Reilly, is now available. The first edition was launched at the Hadoop Summit in June 2009, and has gone on to sell well....

View Article


Tackling Large Scale Data in Government

This is a guest post provided by Booz Allen Hamilton data analysis consultant, Aaron Cordova. Aaron specializes in large-scale distributed data processing systems. Working within the U.S. federal...

View Article

Better Workflow Management in CDH with Oozie 2

Oozie version 2.2.1 is now bundled with Cloudera Distribution for Hadoop (CDH3 Beta 3). This major upgrade includes new functionality such as time and date-driven workflow jobs, and an embedded Tomcat...

View Article


Configuring Security Features in CDH3

Post written by Cloudera Software Engineer Aaron T. Myers. Apache Hadoop has had methods of doing user authorization for some time. The Hadoop Distributed File System (HDFS) has a permissions model...

View Article

Setting up CDH3 Hadoop on my new Macbook Pro

This is a guest re-post courtesy of Arun Jacob, Data Architect at Disney; prior to that, he was an engineer at RichRelevance and Evri. For the last couple of years, Arun has been focused on data...

View Article

How-to: Include Third-Party Libraries in Your MapReduce Job

“My library is in the classpath but I still get a Class Not Found exception in a MapReduce job” – If you have this problem this blog is for you. Java requires third-party and user-defined classes to be...

View Article