Dataproc
• Open: Run open source data analytics at scale, with enterprise-grade security
• Flexible: Use serverless, or manage clusters on Compute Engine and Google Kubernetes Engine
• Intelligent: Enable data users through integrations with Vertex AI, BigQuery, and Dataplex
• Secure: Configure advanced security such as Kerberos, Apache Ranger, and Personal Cluster Authentication
• Cost-effective: With per-second pricing, realize 54% lower TCO compared to on-prem data lakes
Benefits
Modernize your open source data processing
Whether you need VMs or Kubernetes, extra memory for Presto, or even GPUs, Dataproc can help accelerate your data and analytics processing through on-demand purpose-built or serverless environments.
Intelligent and seamless OSS for data science
Enable data scientists and data analysts to seamlessly perform data science jobs through native integrations with Vertex AI.
Advanced security, compliance, and governance
Manage and enforce user authorization and authentication using existing Kerberos and Apache Ranger policies or Personal Cluster Authentication. Define permissions without having to set up a network node.
Key Features
Fully managed and automated big data open source software
Serverless deployment, logging, and monitoring let you focus on your data and analytics, not on your infrastructure. Reduce TCO of Apache Spark management by up to 54%. Enable data scientists and engineers to build and train models 5X faster, compared to traditional notebooks, through integration with Vertex AI Workbench. The Dataproc Jobs API makes it easy to incorporate big data processing into custom applications, while Dataproc Metastore eliminates the need to run your own Hive metastore or catalog service.
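As a rough illustration of how a custom application might use the Dataproc Jobs API, the sketch below only builds the URL and JSON body for a PySpark job submission. It assumes the REST `jobs:submit` endpoint and its `placement` / `pysparkJob` fields; the project, region, cluster, and bucket names are placeholders, and authentication and the HTTP call itself are left out.

```python
import json

# Base URL of the Dataproc REST API (v1).
DATAPROC_ENDPOINT = "https://dataproc.googleapis.com/v1"


def build_submit_job_request(project_id, region, cluster_name,
                             main_python_file_uri, args=None):
    """Return (url, body) for a jobs:submit call that runs a PySpark file.

    Nothing is sent over the network; the caller is expected to POST
    `body` as JSON to `url` with valid Google Cloud credentials.
    """
    url = (
        f"{DATAPROC_ENDPOINT}/projects/{project_id}"
        f"/regions/{region}/jobs:submit"
    )
    body = {
        "job": {
            # Which existing cluster should run the job.
            "placement": {"clusterName": cluster_name},
            # The PySpark driver file and its command-line arguments.
            "pysparkJob": {
                "mainPythonFileUri": main_python_file_uri,
                "args": list(args or []),
            },
        }
    }
    return url, body


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    url, body = build_submit_job_request(
        "example-project", "us-central1", "example-cluster",
        "gs://example-bucket/jobs/word_count.py",
        args=["gs://example-bucket/input/"],
    )
    print(url)
    print(json.dumps(body, indent=2))
```

Keeping request construction separate from transport makes the payload easy to inspect and test before wiring in a real HTTP client or the official client library.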
The best of open source with the best of Google Cloud
Dataproc lets you keep the open source tools, algorithms, and programming languages that you use today, and makes it easy to apply them to cloud-scale datasets. At the same time, Dataproc has out-of-the-box integration with the rest of the Google Cloud analytics, database, and AI ecosystem. Data scientists and engineers can quickly access data and build data applications by connecting Dataproc to BigQuery, Vertex AI, Cloud Spanner, Pub/Sub, or Data Fusion.
Enterprise security integrated with Google Cloud
When you create a Dataproc cluster, you can enable Hadoop Secure Mode via Kerberos by adding a Security Configuration. Additionally, the Google Cloud-specific security features most commonly used with Dataproc include default at-rest encryption, OS Login, VPC Service Controls, and customer-managed encryption keys (CMEK).
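To make the Security Configuration more concrete, here is a minimal sketch of a cluster definition with Kerberos enabled and, optionally, a CMEK key for persistent disks. It assumes the field names of the Dataproc REST cluster resource (`securityConfig.kerberosConfig`, `encryptionConfig.gcePdKmsKeyName`); the helper name and every resource name passed to it are hypothetical, and the body would still need to be sent to the cluster-creation endpoint with valid credentials.

```python
import json


def build_secure_cluster(project_id, cluster_name, kms_key_uri,
                         root_password_uri, disk_kms_key_name=None):
    """Return a cluster definition (plain dict) with Kerberos enabled.

    kms_key_uri:        Cloud KMS key used to decrypt the root password.
    root_password_uri:  Cloud Storage URI of the KMS-encrypted password file.
    disk_kms_key_name:  optional CMEK key for persistent disk encryption.
    """
    cluster = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "securityConfig": {
                "kerberosConfig": {
                    # Turns on Hadoop Secure Mode for the cluster.
                    "enableKerberos": True,
                    "kmsKeyUri": kms_key_uri,
                    "rootPrincipalPasswordUri": root_password_uri,
                }
            }
        },
    }
    if disk_kms_key_name is not None:
        # Customer-managed encryption key for cluster persistent disks.
        cluster["config"]["encryptionConfig"] = {
            "gcePdKmsKeyName": disk_kms_key_name
        }
    return cluster


if __name__ == "__main__":
    # Placeholder resource names, for illustration only.
    key = ("projects/example-project/locations/global/"
           "keyRings/example-ring/cryptoKeys/example-key")
    cluster = build_secure_cluster(
        "example-project", "secure-cluster", key,
        "gs://example-bucket/secrets/root-password.encrypted",
        disk_kms_key_name=key,
    )
    print(json.dumps(cluster, indent=2))
```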
Containerize Apache Spark jobs with Kubernetes
Build your Apache Spark jobs with Dataproc on Kubernetes to run them on Google Kubernetes Engine (GKE), which provides job portability and isolation.
Solution Partners