Cloudera named a market leader in the 2023 GigaOm Radar Report for Data Lakes & Lakehouses
Overview

Streamline and operationalize data pipelines securely at any scale.

CDP Data Engineering is the only cloud-native service purpose-built for enterprise data engineering teams. Built on Apache Spark, Data Engineering is an all-inclusive toolset that automates orchestration with Apache Airflow and provides advanced pipeline monitoring, visual troubleshooting, and comprehensive management tools to streamline ETL processes across enterprise analytics teams.

Data Engineering is fully integrated with Cloudera Data Platform (CDP), providing end-to-end visibility and security through SDX (Shared Data Experience) as well as seamless integration with CDP services such as Data Warehouse and Machine Learning. Data Engineering on CDP powers consistent, repeatable, and automated data engineering workflows anywhere on a hybrid cloud platform.

Use cases

  • Automate data pipelines everywhere
  • Gain ETL visibility and control
  • Maintain data integrity throughout

Automate data pipelines everywhere


Securely deliver quality datasets to CDP Data Warehouse, CDP Machine Learning, or any other analytic tool.

Data Engineering streamlines the delivery of data pipelines to analytics teams, from machine learning to data warehousing and beyond. Speed time to value by orchestrating and automating pipelines that deliver curated, high-quality datasets anywhere, securely and transparently.


Gain ETL visibility and control


Holistically manage your data lifecycle transparently.

Managing the data lifecycle and controlling costs becomes increasingly complex when attempting to operationalize data pipelines across the enterprise at scale.

Data Engineering offers a suite of operational control and visibility features for capacity planning, pipeline automation, automatic lineage capture, and troubleshooting across business use cases.


Maintain data integrity throughout


Full data pipeline visibility to protect your business.

As data quantity and complexity grow, ensuring the ongoing accuracy and fidelity of data for analytical workloads that scale across the business becomes difficult.

Data Engineering offers native data pipeline monitoring and alerting to catch issues early, and visual troubleshooting to quickly resolve problems before they impact your business.

 


Key features

Orchestrate complex data transformation workflows backed by Apache Airflow with hundreds of operators to meet mission-critical analytic requirements.
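Airflow models a pipeline as a directed acyclic graph (DAG) of tasks, where each task runs only after its upstream tasks have finished. The sketch below illustrates that dependency-ordered execution with Python's standard-library `graphlib` (Python 3.9+) rather than the Airflow API; the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(dependencies, tasks):
    """Run each task only after all of its upstream tasks have run.

    dependencies maps a task name to the set of task names it depends on.
    tasks maps every task name to a zero-argument callable.
    Returns the order in which the tasks ran.
    """
    order = []
    # static_order() yields every node, predecessors before successors,
    # and raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical ETL pipeline: two extracts feed a transform, which feeds a load.
deps = {
    "transform": {"extract_orders", "extract_customers"},
    "load": {"transform"},
}
log = []
tasks = {
    name: (lambda n=name: log.append(n))  # default arg binds the current name
    for name in ["extract_orders", "extract_customers", "transform", "load"]
}
order = run_pipeline(deps, tasks)
print(order)
```

The two extracts have no dependency on each other, so their relative order is not fixed. In Airflow itself, the same dependencies would typically be declared on operator instances with the `>>` syntax, and the scheduler, not a simple loop, decides when and where each task runs.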

Data Engineering is containerized, scalable, and portable, with isolated workload environments and guardrails—enabling secure pipeline management with on-demand elastic compute to meet business SLAs cost-effectively.

Visualize performance metrics including CPU, memory, and I/O across all the stages of your Spark jobs to pinpoint performance bottlenecks and identify the needle in the haystack while troubleshooting.
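As a rough illustration of that first triage step, the sketch below ranks a job's stages by wall-clock duration and reports each stage's share of total runtime, which is often the first clue to where a bottleneck sits. The `StageMetrics` record, its fields, `rank_stages`, and the sample timings are hypothetical inputs, not the product's metrics schema.

```python
from dataclasses import dataclass


@dataclass
class StageMetrics:
    stage_id: int
    duration_s: float  # wall-clock time the stage took, in seconds


def rank_stages(stages, top_n=3):
    """Return up to top_n (stage_id, share_of_total) pairs, slowest first."""
    total = sum(s.duration_s for s in stages)
    if total == 0:
        return []
    ranked = sorted(stages, key=lambda s: s.duration_s, reverse=True)
    return [(s.stage_id, s.duration_s / total) for s in ranked[:top_n]]


# Hypothetical per-stage timings for one Spark job (140 s in total).
stages = [
    StageMetrics(0, 12.0),
    StageMetrics(1, 95.0),
    StageMetrics(2, 8.0),
    StageMetrics(3, 25.0),
]
for stage_id, share in rank_stages(stages, top_n=2):
    print(f"stage {stage_id}: {share:.0%} of job time")
# stage 1: 68% of job time
# stage 3: 18% of job time
```

Once the dominant stage is known, its CPU, memory, and I/O metrics can help tell whether it is compute-bound, memory-constrained, or waiting on data.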

Use the rich job management interface, available through a CLI and REST APIs, to automate jobs and integrate with existing workflows such as CI/CD pipelines and third-party tools.

Data Engineering offers a fully integrated Spark on Kubernetes service that automates and streamlines artifact management, security, and resource scheduling, using Apache YuniKorn to provide FIFO and gang scheduling.
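Gang scheduling means a Spark application's driver and executors are admitted together or not at all, so a job never holds part of its resources while waiting for the rest. The sketch below is a simplified model of strict FIFO ordering combined with all-or-nothing admission; it is not YuniKorn's implementation, and `SparkApp`, `gang_schedule_fifo`, and the slot counts are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SparkApp:
    name: str
    pods: int  # driver + executors that must all start together


def gang_schedule_fifo(queue, free_slots):
    """Admit apps from the front of the queue, all-or-nothing.

    An app is admitted only if every one of its pods fits in the remaining
    capacity. Scheduling stops at the first app that does not fit, so a
    later, smaller app never jumps ahead of an earlier one (strict FIFO).
    Admitted apps are removed from the queue.
    Returns (names of admitted apps, remaining free slots).
    """
    admitted = []
    while queue and queue[0].pods <= free_slots:
        app = queue.popleft()
        free_slots -= app.pods
        admitted.append(app.name)
    return admitted, free_slots


queue = deque([SparkApp("etl-a", 4), SparkApp("etl-b", 6), SparkApp("etl-c", 2)])
admitted, left = gang_schedule_fifo(queue, free_slots=8)
print(admitted, left)  # ['etl-a'] 4
```

Without the all-or-nothing check, `etl-b` could claim the remaining 4 slots and sit idle waiting for 2 more, holding capacity that no job can use; gang scheduling avoids that partial allocation. Stopping at the first app that does not fit keeps ordering strictly FIFO, so the smaller `etl-c` waits even though it would fit.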

From a centralized interface, platform administrators can manage access and security, then quickly provision new workloads while easily monitoring capacity and visualizing resource usage over time. SDX also enables full lifecycle lineage tracking to know where data came from and where it’s going.

Ready to take a deeper look?


Experience Data Engineering on Cloudera Data Platform for yourself


CDP demo

Watch an on-demand demo to learn how to accelerate your enterprise data engineering workflows everywhere.


Discover CDP video tour


Look under the hood with a video tour of CDP and discover how secure and optimized data engineering workflows can better serve your business.


CDP technical resources

Save time with a one-stop shop for technical information and resources to develop your skills and build your knowledge of Cloudera Data Engineering.


Free training

Access on-demand training to get up to speed with Data Engineering to enable fast and secure pipeline delivery across the enterprise.


Pricing

Evaluate pricing, billing terms, licensing details, and hourly rates as well as estimate costs with handy calculators.


Product documentation

Get started on the right foot with resource planning, product configuration, and everything you need for data engineering best practices.


Ebook

CDP Data Engineering: Taking your data lifecycle to the next level

Webinar

Cognilytica Webinar: Optimizing Data Engineering Pipelines

Whitepaper

AI Data Engineering Lifecycle Checklist

Webinar

Data Engineering in the enterprise: How to accelerate and scale your data pipelines

World-class training, support, & services
