Meet Cloudera’s Apache Spark Committers
The super-active Apache Spark community is exerting a strong gravitational pull within the Apache Hadoop ecosystem. I recently had the opportunity to ask Cloudera’s Apache Spark committers (Sean Owen,...
How-to: Prepare Unstructured Data in Impala for Analysis
Learn how to build an Impala table around data that comes from non-Impala, or even non-SQL, sources. As data pipelines start to include more aspects such as NoSQL or loosely specified schemas, you...
How-to: Index Scanned PDFs at Scale Using Fewer Than 50 Lines of Code
Learn how to use OCR tools, Apache Spark, and other Apache Hadoop components to process PDF images at scale. Optical character recognition (OCR) technologies have advanced significantly over the last...
New in Cloudera Enterprise 5.9: S3 Integration and SQL Editor Improvements
Cloudera Enterprise 5.9 includes the latest release of Hue (3.11), the web UI that makes Apache Hadoop easier to use. As part of Cloudera’s continuing investments in user experience and productivity,...
How to secure ‘Internet exposed’ Apache Hadoop
You may have heard of the recent (and ongoing) hacks targeting open source database solutions like MongoDB and Apache Hadoop. From what we know, an unknown number of hackers scanned for...
New in Cloudera Enterprise 5.10: Hue SQL Editor and Security Improvements
Cloudera Enterprise 5.10 includes the latest updates of Hue, the intelligent editor for SQL Developers and Analysts. As part of Cloudera’s continuing investments in user experience and productivity,...
Security, Hive-on-Spark, and Other Improvements in Apache Hive 1.2.0
Apache Hive 1.2.0, although not a major release, contains significant improvements. Recently, the Apache Hive community moved to a more frequent, incremental release schedule. So, a little while ago,...
Inside Apache HBase’s New Support for MOBs
Learn about the design decisions behind HBase’s new support for MOBs. Apache HBase is a distributed, scalable, performant, consistent key value database that can store a variety of binary data types....
Thrift Client Authentication Support in Apache HBase 1.0
Thrift client authentication and doAs impersonation, introduced in HBase 1.0, provide more flexibility for your HBase installation. In the two-part blog series “How-to: Use the HBase Thrift Interface”...
How-to: Secure YARN Containers with Cloudera Navigator Encrypt
Learn how Cloudera Navigator Encrypt brings data security to YARN containers. With the introduction of transparent data encryption in HDFS, we are now a big step closer toward a secure platform in the...
What’s New in Cloudera Director 2.5?
Cloudera Director 2.5 brings cluster auto-repair functionality and improved support for AWS Spot instances. Support for Cloudera Manager’s external account feature has been added along with S3Guard...
Cloudera SDX: Under the Hood
What is SDX? Shared Data Experience — SDX — is Cloudera’s secret ingredient that makes it possible to deploy Cloudera’s four core functions (Data Engineering, Data Science, Analytic DB, Operational DB)...
What’s New in Cloudera Director 2.6
Cloudera Director 2.6 introduces support for protecting communications with TLS and SSH host keys. Azure support is enhanced with support for Azure Managed Disks and custom images. Cloudera Director...
Getting Started with Cloudera’s Cybersecurity Solution
A quick conversation with most Chief Information Security Officers (CISOs) reveals they understand they need to modernize their security architecture and the correct answer is to adopt a machine...
Hadoop Delegation Tokens Explained
Apache Hadoop’s security was designed and implemented around 2009, and has been stabilizing since then. However, due to a lack of documentation around this area, it’s hard to understand or debug when...
Automatic TLS Configuration with Cloudera Director 2.6
Cloudera Director 2.6 and Cloudera Manager 5.13 offer a simple way to have TLS configured for Cloudera Manager and CDH clusters. In this blog post, Bill Havanki describes how to use the new feature and...
What’s New in Cloudera Director 2.8?
Cloudera Director 2.8 introduces a simpler way to create clusters in AWS or Microsoft Azure that requires less information to get started than the standard procedure. A new configuration export...
Third-Party Libraries in C6
Cloudera has put a significant amount of work into upgrading the third-party libraries used in our just-released C6 version. This major upgrade of our software has given us the opportunity to upgrade...
Network Security with Cloudera Altus and Apache Spot
Introduction In the last few years, IT security threats to enterprise systems have increased, which has necessitated installing log ingestion and analysis solutions in any enterprise network. This blog...
Protecting Hadoop Clusters From Malware Attacks
Two new strains of malware, XBash and DemonBot, are targeting Apache Hadoop servers for Bitcoin mining and DDoS purposes. This malware is scanning the internet so vigorously for Hadoop clusters that an...