6 min read·Lesson 10 of 10

Cloud Data Platforms Compared

AWS, Azure, GCP, Databricks, Snowflake — how the major data platforms stack up for data engineering workloads, and how to pick.

This is the platform-selection lesson. Each major vendor offers a complete stack — ingestion, storage, transformation, serving, governance — but their strengths differ. There is no universal "best".

AWS Data Stack

| Layer | Service |
|---|---|
| Storage | S3 (lake), Redshift (warehouse), Aurora / DynamoDB (operational) |
| Ingestion | DMS, Kinesis Data Streams / Firehose, MSK, Glue crawlers |
| Transformation | Glue (Spark), EMR, Athena (Trino), Redshift |
| Streaming | Kinesis, MSK, Managed Service for Apache Flink |
| Orchestration | MWAA (Airflow), Step Functions, EventBridge |
| Catalog / governance | Glue Data Catalog, Lake Formation, IAM, DataZone |
| BI / ML | QuickSight, SageMaker |

Strengths: deepest service catalog, broadest ecosystem, S3 is the de-facto lake substrate.

Watch out: integration between services (IAM, networking, catalogs) is your job. Many small services to learn.

Cert: AWS DEA-C01 (Data Engineer Associate) is the focused track.

Azure Data Stack

| Layer | Service |
|---|---|
| Storage | ADLS Gen2 (lake), Synapse / Fabric Warehouse, Cosmos DB / SQL DB (operational) |
| Ingestion | ADF / Fabric Data Factory, Event Hubs, Stream Analytics |
| Transformation | Synapse Spark, Synapse SQL, Fabric notebooks, Databricks (often) |
| Streaming | Event Hubs (Kafka-compatible), Stream Analytics, Fabric eventstreams |
| Orchestration | ADF / Fabric pipelines, Logic Apps |
| Catalog / governance | Microsoft Purview, Fabric OneLake |
| BI / ML | Power BI (deeply integrated), Azure ML |

Strengths: Power BI integration is unmatched; Microsoft Fabric is a serious bet on a unified experience over OneLake (Delta-based).

Watch out: Synapse and Fabric overlap; product strategy is in transition. Check current Microsoft guidance.

Cert: DP-700 (Fabric Data Engineer Associate). The older DP-203 (Azure Data Engineer Associate) has been retired.

GCP Data Stack

| Layer | Service |
|---|---|
| Storage | GCS (lake), BigQuery (warehouse), Spanner / Firestore / Cloud SQL (operational) |
| Ingestion | Datastream (CDC), Pub/Sub, Dataflow, Storage Transfer |
| Transformation | BigQuery (SQL), Dataform / dbt, Dataflow (Beam), Dataproc (Spark) |
| Streaming | Pub/Sub, Dataflow streaming, BigQuery streaming inserts |
| Orchestration | Cloud Composer (Airflow), Workflows, Dataform schedules |
| Catalog / governance | Dataplex, Data Catalog, BigLake, policy tags, DLP |
| BI / ML | Looker, Vertex AI |

Strengths: BigQuery is the gold standard for serverless analytics; Dataflow is exceptional for unified streaming + batch (Apache Beam).

Watch out: smaller market share than AWS/Azure means a smaller hiring pool and fewer third-party connectors.

Cert: Google Professional Data Engineer (PDE).
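
The three hyperscaler tables line up layer by layer, which makes it easy to translate experience from one cloud to another. A minimal sketch of that lookup follows. The service names are copied from the tables above, abbreviated to one or two per cell; the `STACKS` dictionary, the layer keys, and the `equivalents` helper are hypothetical names used only for illustration.

```python
# Representative services per layer, taken from the AWS, Azure, and GCP
# tables in this lesson. The layer keys are illustrative, not official terms.
STACKS = {
    "aws": {
        "lake": "S3",
        "warehouse": "Redshift",
        "ingestion": "DMS / Kinesis Firehose",
        "streaming": "Kinesis / MSK",
        "orchestration": "MWAA (Airflow)",
        "governance": "Glue Data Catalog / Lake Formation",
        "bi": "QuickSight",
    },
    "azure": {
        "lake": "ADLS Gen2",
        "warehouse": "Synapse / Fabric Warehouse",
        "ingestion": "ADF / Fabric Data Factory",
        "streaming": "Event Hubs / Stream Analytics",
        "orchestration": "ADF / Fabric pipelines",
        "governance": "Microsoft Purview",
        "bi": "Power BI",
    },
    "gcp": {
        "lake": "GCS",
        "warehouse": "BigQuery",
        "ingestion": "Datastream / Pub/Sub",
        "streaming": "Pub/Sub / Dataflow",
        "orchestration": "Cloud Composer (Airflow)",
        "governance": "Dataplex",
        "bi": "Looker",
    },
}


def equivalents(layer: str) -> dict:
    """Return the service each cloud offers for the given layer."""
    try:
        return {cloud: stack[layer] for cloud, stack in STACKS.items()}
    except KeyError:
        raise ValueError(f"unknown layer: {layer!r}") from None


if __name__ == "__main__":
    for cloud, service in equivalents("orchestration").items():
        print(f"{cloud:5s} -> {service}")
```

Two of the three orchestration entries are managed Airflow, which is one reason Airflow skills carry over well between clouds.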

Databricks

| Layer | Capability |
|---|---|
| Storage | Delta Lake on S3 / ADLS / GCS; Unity Catalog |
| Compute | Spark (Photon), SQL Warehouses, Delta Live Tables, Workflows |
| Streaming | Structured Streaming + Auto Loader |
| ML | MLflow, Mosaic AI, model serving, vector search |
| Governance | Unity Catalog (tables, files, ML models, dashboards) |
| BI | Databricks SQL + dashboards; integrates with Power BI / Tableau |

Strengths: the lakehouse leader; unifies BI, data engineering, and ML on one substrate. A strong multi-cloud option for teams that want to avoid lock-in to a single cloud provider.

Watch out: still a separate platform you operate; cost can climb if clusters are mismanaged.

Cert: Databricks Certified Data Engineer Associate / Professional.

Snowflake

| Layer | Capability |
|---|---|
| Storage | Snowflake-managed micro-partitions (S3/ADLS/GCS under the hood); Iceberg tables |
| Compute | Independent virtual warehouses; Snowpark (Python / Java / Scala) |
| Streaming | Snowpipe / Snowpipe Streaming; Dynamic Tables |
| ML | Snowpark ML, Cortex AI |
| Governance | Horizon Catalog, dynamic data masking, row access policies |
| BI | Native dashboards (Streamlit), strong third-party support |

Strengths: simplest warehouse experience, multi-cloud, strong governance, mature ecosystem.

Watch out: primarily warehouse-shaped; support for ML and unstructured workloads is catching up, but Databricks still leads there.

Cert: SnowPro Core; SnowPro Advanced Data Engineer.

Decision Framework

"We're already on cloud X"

Default to that hyperscaler's stack. The integration savings dominate. Add Databricks or Snowflake only if a specific workload demands it.

"We're multi-cloud or undecided"

Snowflake (warehouse-led) or Databricks (lakehouse-led) is the natural choice. Both run on all three clouds with consistent UX.

"BI-heavy, SQL-only team"

BigQuery, Snowflake, or Synapse / Fabric. Avoid Spark unless you have to.

"ML-heavy / unstructured / huge scale"

Databricks, or BigQuery + Vertex AI if you are on GCP.

"Real-time first"

Kafka + Flink (managed) on top of any of the above. Confluent + Snowflake / BigQuery is a popular combo.

"Power BI is non-negotiable"

Microsoft Fabric / Synapse will be the path of least resistance.
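
The scenarios above can be read as a small rule table. The sketch below encodes them in Python so the logic is explicit. The `Situation` fields and the `recommend` function are hypothetical names, and the lesson does not rank the rules against each other, so the function returns the advice for every rule that applies rather than a single winner.

```python
from dataclasses import dataclass
from typing import List, Optional

HYPERSCALERS = {"aws": "AWS", "azure": "Azure", "gcp": "GCP"}


@dataclass
class Situation:
    """Hypothetical description of a team's constraints."""
    current_cloud: Optional[str] = None  # "aws", "azure", "gcp", or None
    sql_only_team: bool = False
    ml_heavy: bool = False
    real_time_first: bool = False
    power_bi_required: bool = False


def recommend(s: Situation) -> List[str]:
    """Collect the lesson's advice for every scenario that matches."""
    advice = []
    if s.current_cloud in HYPERSCALERS:
        # "We're already on cloud X"
        advice.append(f"Default to the {HYPERSCALERS[s.current_cloud]} stack")
    else:
        # "We're multi-cloud or undecided"
        advice.append("Snowflake (warehouse-led) or Databricks (lakehouse-led)")
    if s.sql_only_team:
        advice.append("BigQuery, Snowflake, or Synapse / Fabric")
    if s.ml_heavy:
        if s.current_cloud == "gcp":
            advice.append("Databricks, or BigQuery + Vertex AI")
        else:
            advice.append("Databricks")
    if s.real_time_first:
        advice.append("Managed Kafka + Flink on top of the chosen platform")
    if s.power_bi_required:
        advice.append("Microsoft Fabric / Synapse")
    return advice


if __name__ == "__main__":
    team = Situation(current_cloud="gcp", ml_heavy=True, real_time_first=True)
    for line in recommend(team):
        print("-", line)
```

When several rules fire and point in different directions, that conflict is the real decision to make; the code only surfaces it.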

Cost: The Hidden Variable

  • Per-TB / per-credit pricing differs less than people think; workload patterns matter more.
  • Always-on streaming is usually the most expensive mode, because compute is billed around the clock; batch with auto-suspend is usually the cheapest, because you pay only while jobs run.
  • Track cost per pipeline / per team from day one; tag everything.
  • Beware "managed serverless" sticker shock — convenience has a price.
  • Reserved / committed-use discounts (Snowflake, BigQuery, Databricks) reward predictable workloads.
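
To make the first three points concrete, here is a rough back-of-the-envelope sketch. The hourly rate, the usage hours, and the billing line items are made-up numbers, and `monthly_compute_cost` and `cost_by_tag` are hypothetical helpers, not any vendor's billing API.

```python
def monthly_compute_cost(rate_per_hour: float,
                         active_hours_per_day: float,
                         days: int = 30) -> float:
    """Cost of compute that is billed only for the hours it is running."""
    return rate_per_hour * active_hours_per_day * days


def cost_by_tag(line_items: list, tag: str) -> dict:
    """Sum cost per value of one tag; items missing the tag go to 'untagged'."""
    totals = {}
    for item in line_items:
        key = item["tags"].get(tag, "untagged")
        totals[key] = totals.get(key, 0.0) + item["cost"]
    return totals


# Same (assumed) price per hour, very different workload patterns.
RATE = 4.0  # hypothetical dollars per compute-hour
always_on = monthly_compute_cost(RATE, active_hours_per_day=24)  # 2880.0
nightly_batch = monthly_compute_cost(RATE, active_hours_per_day=2)  # 240.0

# Made-up billing export rows; the last one was never tagged.
ITEMS = [
    {"cost": 120.0, "tags": {"team": "growth", "pipeline": "orders"}},
    {"cost": 80.0, "tags": {"team": "growth", "pipeline": "clicks"}},
    {"cost": 50.0, "tags": {}},
]

if __name__ == "__main__":
    print(f"always-on: {always_on:.2f}, nightly batch: {nightly_batch:.2f}")
    print(cost_by_tag(ITEMS, "team"))  # {'growth': 200.0, 'untagged': 50.0}
```

With an identical unit price, the always-on workload costs twelve times as much as the two-hour nightly batch, and the untagged bucket shows spend that nobody can be held accountable for.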

The Convergence Trend

Open formats are eroding lock-in:

  • Iceberg is now first-class in Snowflake, BigQuery, AWS, and Databricks.
  • Delta Lake is open source and supported by many engines.
  • Unity Catalog and the Iceberg REST catalog spec aim for cross-engine governance.

The 2030-era picture: data lives in object storage in an open table format, governed by an open catalog, queried by whichever engine fits the workload. Today we are about half-way there.
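
As a toy model of that picture, the sketch below treats a table as nothing more than a name, an open format, and an object-storage location, then asks which engines can read it. The Iceberg entries follow the list above, and the Delta entries follow the Databricks and Fabric OneLake notes earlier in this lesson. `TableEntry`, `ENGINE_FORMATS`, `engines_for`, and the bucket paths are hypothetical, and real engine support is broader and changes quickly.

```python
from dataclasses import dataclass

# Which open table formats each engine reads, per this lesson only.
# Deliberately incomplete: treat it as an illustration, not a support matrix.
ENGINE_FORMATS = {
    "Snowflake": {"iceberg"},
    "BigQuery": {"iceberg"},
    "AWS": {"iceberg"},
    "Databricks": {"iceberg", "delta"},
    "Microsoft Fabric": {"delta"},
}


@dataclass(frozen=True)
class TableEntry:
    """A catalog entry: the data itself stays in object storage."""
    name: str
    table_format: str  # "iceberg" or "delta"
    location: str      # object-storage URI (example value, not a real bucket)


def engines_for(table: TableEntry) -> list:
    """Engines from ENGINE_FORMATS that can read the table's format."""
    return sorted(
        engine
        for engine, formats in ENGINE_FORMATS.items()
        if table.table_format in formats
    )


orders = TableEntry("sales.orders", "iceberg", "s3://example-bucket/sales/orders")
events = TableEntry("web.events", "delta", "s3://example-bucket/web/events")

if __name__ == "__main__":
    print(orders.name, "->", engines_for(orders))
    print(events.name, "->", engines_for(events))
```

The point of the model is that the table definition never names an engine: the choice of engine becomes a per-query decision rather than a storage decision.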

Cert Mapping Recap

| Goal | Cert |
|---|---|
| Microsoft / Azure / Fabric | DP-700 (Fabric DE), DP-203 (legacy) |
| AWS | DEA-C01 |
| GCP | Professional Data Engineer |
| Multi-cloud lakehouse | Databricks Certified DE Associate / Professional |
| Snowflake | SnowPro Core, SnowPro Advanced DE |

Pick one cloud / platform deeply, learn the others conceptually. The fundamentals from this course — pipelines, warehouses, lakes, Spark, streaming, governance — transfer across all of them.

Where to Go Next

  • Build something end-to-end on a free tier: ingest → land → transform → serve.
  • Pair this course with our DevOps and Cloud Security courses — modern data platforms live and die by both.
  • For depth, pick the cert most relevant to your environment and study it against real workloads, not just Q&A drills.
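
Before committing to any platform, the ingest → land → transform → serve shape can be rehearsed locally. The sketch below uses only the Python standard library: a string stands in for the source system, a temporary directory for the lake, and an in-memory SQLite database for the warehouse. The sample data and all function names are invented for illustration; on a real platform each step would map to a managed service.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Made-up source data; the third row has a missing amount on purpose.
RAW_CSV = "order_id,amount,country\n1,19.99,de\n2,5.00,us\n3,,us\n"


def ingest() -> str:
    """Ingest: stand-in for reading from an API, database, or queue."""
    return RAW_CSV


def land(raw: str, lake_dir: Path) -> Path:
    """Land: store the raw payload unchanged so it can be replayed later."""
    path = lake_dir / "orders_raw.csv"
    path.write_text(raw, encoding="utf-8")
    return path


def transform(raw_path: Path) -> list:
    """Transform: parse types, drop rows without an amount, upper-case country."""
    with raw_path.open(newline="", encoding="utf-8") as f:
        return [
            (int(row["order_id"]), float(row["amount"]), row["country"].upper())
            for row in csv.DictReader(f)
            if row["amount"]
        ]


def serve(rows: list, conn: sqlite3.Connection) -> None:
    """Serve: load the cleaned rows into a queryable table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline() -> list:
    """Run all four stages and return revenue per country."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_path = land(ingest(), Path(tmp))
        rows = transform(raw_path)
    conn = sqlite3.connect(":memory:")
    try:
        serve(rows, conn)
        return conn.execute(
            "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline())  # [('DE', 19.99), ('US', 5.0)]
```

Keeping the raw file separate from the cleaned table is the habit worth carrying over: it is what lets you re-run a fixed transform without going back to the source.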

Key Takeaways

  • No single platform is best for every workload — pick based on existing skills, ecosystem, and dominant workload type.
  • AWS, Azure, and GCP each offer a full data engineering stack; Databricks and Snowflake compete cross-cloud.
  • Snowflake leads on warehouse simplicity; Databricks leads on lakehouse + ML; the hyperscalers lead on integration.
  • Open table formats (Iceberg, Delta) are slowly making the warehouse-vs-lakehouse choice less binding.
  • Total cost depends more on workload patterns and discipline than on per-TB pricing.

Course Complete!

You've finished Data Engineering Fundamentals. Now put your knowledge to the test with real exam-style practice questions.