Store

Hadoop’s scalable, flexible architecture, built on the HDFS filesystem, lets organizations store and analyze very large volumes of data of any type, all in a single open source platform running on industry-standard hardware.

Learn more about HDFS

Learn more about Apache Kudu


Process

Quickly integrate with existing systems or applications to move data into and out of Hadoop through bulk load processing (Apache Sqoop) or streaming (Apache Flume, Apache Kafka).


Transform complex data at scale using multiple data access options (Apache Hive, Apache Pig) for batch (MapReduce 2, or MR2) or fast in-memory (Apache Spark™) processing. Process streaming data as it arrives in your cluster via Spark Streaming.
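
To make the batch model concrete, here is a minimal sketch of the map, shuffle, and reduce steps behind a classic word count. It is plain Python running in a single process, not the Hadoop or Spark API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the framework distributes these steps across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (done by the framework on a real cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Illustrative input only; real jobs read from HDFS or another store.
    print(word_count(["store process discover", "process model serve"]))
```

The same three-step shape carries over to in-memory engines: the map and reduce logic stays the same, while the engine decides where intermediate data lives.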

Learn more about Apache Spark™


Discover

Analysts interact with full-fidelity data on the fly with Apache Impala, the data warehouse for Hadoop. With Impala, analysts experience BI-quality SQL performance and functionality plus compatibility with all the leading BI tools.

Using Cloudera Search, an integration of Hadoop and Apache Solr, analysts can discover patterns faster in data of any volume and format, especially when Search is combined with Impala.

Learn more about Apache Impala

Learn more about Cloudera Search


Model

With Hadoop, analysts and data scientists have the flexibility to develop and iterate on advanced statistical models using a mix of partner technologies as well as open source frameworks like Apache Spark™.


Serve

The distributed data store for Hadoop, Apache HBase, supports the fast, random reads/writes (“fast data”) required for online applications.

Learn more about Apache HBase


CDH: Built on Open Source and Open Standards

CDH, the world's most popular Hadoop distribution, is Cloudera’s 100% open source platform. It includes all the leading Hadoop ecosystem components to store, process, discover, model, and serve unlimited data, and it's engineered to meet the highest enterprise standards for stability and reliability.

CDH is based entirely on open standards to support a long-term architecture. As the main curator of open standards in Hadoop, Cloudera has a track record of bringing new open source solutions into its platform (such as Apache Spark™, Apache HBase, and Apache Parquet) that are eventually adopted by the entire ecosystem.

Learn more about key CDH components

Learn more about open source and open standards

Try now
