Streaming Data Analytics

Dataproc

• Open: Run open source data analytics at scale, with enterprise-grade security

• Flexible: Use serverless, or manage clusters on Compute Engine and Google Kubernetes Engine (GKE)

• Intelligent: Enable data users through integrations with Vertex AI, BigQuery, and Dataplex

• Secure: Configure advanced security such as Kerberos, Apache Ranger, and Personal Cluster Authentication

• Cost-effective: With per-second pricing, realize 54% lower TCO compared to on-premises data lakes

Benefits

Modernize your open source data processing

Whether you need VMs or Kubernetes, extra memory for Presto, or even GPUs, Dataproc can help accelerate your data and analytics processing through purpose-built, on-demand clusters or serverless environments.

Intelligent and seamless OSS for data science

Enable data scientists and data analysts to seamlessly perform data science jobs through native integrations with Vertex AI.

Advanced security, compliance, and governance

Manage and enforce user authorization and authentication using existing Kerberos and Apache Ranger policies or Personal Cluster Authentication. Define permissions without having to set up a network node.

Key Features

Fully managed and automated big data open source software

Serverless deployment, logging, and monitoring let you focus on your data and analytics, not on your infrastructure. Reduce the TCO of Apache Spark management by up to 54%. Enable data scientists and engineers to build and train models 5x faster than with traditional notebooks, through integration with Vertex AI Workbench. The Dataproc Jobs API makes it easy to incorporate big data processing into custom applications, while Dataproc Metastore eliminates the need to run your own Hive metastore or catalog service.
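To make the Jobs API point concrete, here is a minimal sketch of how a custom application might assemble a PySpark job submission. It only builds the URL and JSON body for the `jobs.submit` REST method and sends nothing over the network. The helper name `build_pyspark_submit_request`, the project, cluster, and bucket names are all hypothetical, and authentication is left out entirely; check the field names against the current Dataproc REST reference before relying on them.

```python
import json


def build_pyspark_submit_request(project_id, region, cluster_name,
                                 main_python_file_uri, args=None):
    """Hypothetical helper: return (url, body) for a Dataproc jobs.submit call.

    Nothing is sent; the caller is expected to POST `body` as JSON to `url`
    with its own OAuth 2.0 credentials (not shown here).
    """
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/jobs:submit"
    )
    body = {
        "job": {
            # Which existing cluster should run the job.
            "placement": {"clusterName": cluster_name},
            # PySpark job definition: the driver script and its arguments.
            "pysparkJob": {
                "mainPythonFileUri": main_python_file_uri,
                "args": list(args or []),
            },
        }
    }
    return url, body


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    url, body = build_pyspark_submit_request(
        "example-project",
        "us-central1",
        "example-cluster",
        "gs://example-bucket/jobs/wordcount.py",
        args=["gs://example-bucket/input/"],
    )
    print(url)
    print(json.dumps(body, indent=2))
```

Keeping request construction separate from the HTTP call makes the payload easy to inspect and test before any credentials or clusters are involved.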

The best of open source with the best of Google Cloud

Dataproc lets you keep the open source tools, algorithms, and programming languages that you use today, and makes it easy to apply them to cloud-scale datasets. At the same time, Dataproc has out-of-the-box integration with the rest of the Google Cloud analytics, database, and AI ecosystem. Data scientists and engineers can quickly access data and build data applications by connecting Dataproc to BigQuery, Vertex AI, Cloud Spanner, Pub/Sub, or Data Fusion.

Enterprise security integrated with Google Cloud

When you create a Dataproc cluster, you can enable Hadoop Secure Mode via Kerberos by adding a Security Configuration. In addition, the Google Cloud security features most commonly used with Dataproc include default at-rest encryption, OS Login, VPC Service Controls, and customer-managed encryption keys (CMEK).
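As an illustration of what such a Security Configuration can look like, the sketch below assembles the cluster definition that a create-cluster request would carry, with Kerberos enabled and an optional CMEK key for persistent disks. The helper name `build_secure_cluster_body` and every resource name are hypothetical, and the field names reflect the Dataproc v1 cluster resource as best understood here, so verify them against the current API reference. Nothing is sent to Google Cloud.

```python
import json


def build_secure_cluster_body(project_id, cluster_name, root_password_uri,
                              kms_key_uri, disk_kms_key_name=None):
    """Hypothetical helper: return a cluster definition with Kerberos enabled.

    root_password_uri: Cloud Storage URI of the KMS-encrypted root principal
                       password file.
    kms_key_uri:       KMS key used to decrypt that password file.
    disk_kms_key_name: optional CMEK key for persistent disk encryption.
    """
    config = {
        "securityConfig": {
            "kerberosConfig": {
                "enableKerberos": True,
                "rootPrincipalPasswordUri": root_password_uri,
                "kmsKeyUri": kms_key_uri,
            }
        }
    }
    if disk_kms_key_name:
        # Customer-managed encryption key (CMEK) for cluster persistent disks.
        config["encryptionConfig"] = {"gcePdKmsKeyName": disk_kms_key_name}
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": config,
    }


if __name__ == "__main__":
    # Placeholder resource names for illustration only.
    key = ("projects/example-project/locations/global/"
           "keyRings/example-ring/cryptoKeys/example-key")
    body = build_secure_cluster_body(
        "example-project",
        "secure-cluster",
        "gs://example-bucket/kerberos/root-password.encrypted",
        key,
        disk_kms_key_name=key,
    )
    print(json.dumps(body, indent=2))
```

Default at-rest encryption needs no configuration, which is why only the Kerberos and CMEK settings appear in the payload.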

Containerize Apache Spark jobs with Kubernetes

Build your Apache Spark jobs with Dataproc on Kubernetes and run them on Google Kubernetes Engine (GKE) to gain job portability and isolation.
