Data Engineer – Real-Time Analytics
Microsoft
Wroclaw, Poland
7 000 - 9 600 USD
Monthly
Employment
Full time
Experience
Senior
Contract
B2B
Job type
Hybrid
Job description
Microsoft's real-time analytics team in Warsaw processes event streams from across the Microsoft 365 product suite: telemetry, usage signals, and business metrics that feed executive dashboards and ML feature stores. As a Senior Data Engineer, you will design and operate streaming pipelines that ingest from Azure Event Hubs, transform with Spark Structured Streaming on Databricks, and land data in Delta Lake with sub-minute latency.
You will own the reliability of the streaming platform: implement checkpoint management, dead-letter queuing, schema evolution strategies, and cost-aware cluster autoscaling. You will define dbt models for the gold layer that data analysts and data scientists consume, and write the runbooks that help the on-call engineer diagnose lag and backpressure at 3 AM.
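To make the dead-letter queuing responsibility concrete, here is a minimal sketch in plain Python rather than PySpark. It assumes a hypothetical event schema with three required string fields (`event_id`, `event_time`, `tenant_id`); the field names and the `route_events` helper are illustrative and not part of the team's actual platform. The idea is that malformed messages are set aside with a reason instead of failing the whole batch.

```python
import json
from dataclasses import dataclass, field

# Hypothetical required fields; the real event schema is not given in this posting.
REQUIRED_FIELDS = {"event_id": str, "event_time": str, "tenant_id": str}


@dataclass
class RoutedBatch:
    valid: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)


def route_events(raw_messages):
    """Parse raw JSON strings; keep valid events and dead-letter the rest with a reason."""
    batch = RoutedBatch()
    for raw in raw_messages:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError as exc:
            batch.dead_letter.append({"raw": raw, "reason": f"invalid JSON: {exc.msg}"})
            continue
        if not isinstance(event, dict):
            batch.dead_letter.append({"raw": raw, "reason": "not a JSON object"})
            continue
        # Collect every required field that is absent or has the wrong type.
        problems = [
            name
            for name, expected in REQUIRED_FIELDS.items()
            if not isinstance(event.get(name), expected)
        ]
        if problems:
            reason = "missing or mistyped: " + ", ".join(problems)
            batch.dead_letter.append({"raw": raw, "reason": reason})
        else:
            batch.valid.append(event)
    return batch
```

In a streaming job the same split would typically happen per micro-batch, with the dead-letter records written to their own table so they can be inspected and replayed later.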
The role requires close collaboration with product engineering teams (who produce the raw events) and the ML platform team (who need clean feature tables). You will participate in capacity planning reviews and drive down cloud costs through query optimisation and partition pruning improvements.
Hands-on experience with Spark Structured Streaming and Delta Lake time travel is non-negotiable. Familiarity with the Medallion architecture (bronze/silver/gold) and event-time windowing patterns is strongly preferred.
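As a rough illustration of what event-time windowing means here, the sketch below counts events per tumbling window in plain Python and drops events that arrive after the watermark has passed their window. It is a simplification: the watermark is advanced per event rather than per micro-batch as a streaming engine would do, and the `tumbling_counts` helper, the one-minute window, and the 30-second allowed lateness are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def tumbling_counts(events, window=timedelta(minutes=1),
                    allowed_lateness=timedelta(seconds=30)):
    """Count events per (window_start, key) using event time, not arrival time.

    events: iterable of (event_time, key) tuples in arrival order.
    Returns (counts, dropped), where dropped holds events that were too late.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None
    for event_time, key in events:
        # The watermark trails the largest event time seen so far.
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - allowed_lateness
        # Align the event to the start of its tumbling window.
        window_start = EPOCH + ((event_time - EPOCH) // window) * window
        window_end = window_start + window
        if window_end <= watermark:
            # The window is already considered final, so the event is discarded.
            dropped.append((event_time, key))
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped
```

The trade-off is the usual one: a longer allowed lateness keeps more late events but delays when a window can be treated as final.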
Technical stack
- Azure Event Hubs
- Azure Databricks
- Delta Lake
- Apache Spark 3.5
- Python 3.12
- dbt (dbt-databricks adapter)
- Azure Data Lake Storage Gen2
- Azure Synapse Analytics
- Azure Monitor
- SQL
- PySpark
- pytest
- Terraform (AzureRM)
- GitHub Actions
Interview process
Step 1 - Phone screen (45 min, recruiter + engineering manager): your data engineering background, scale of systems you have operated, and why real-time analytics.
Step 2 - Data engineering case study (take-home, up to 4 h): given a fictional event schema and a set of analytical requirements, design a Medallion pipeline and implement the bronze-to-silver transformation in PySpark. Submit a Jupyter notebook with explanations.
Step 3 - Technical panel I - case study walkthrough (60 min): we review your submission together, probe your design decisions, and ask follow-up questions about failure handling and schema evolution.
Step 4 - Technical panel II - SQL + architecture (60 min): window functions, slowly changing dimensions, and a whiteboard exercise on designing a multi-tenant lakehouse with row-level security.
Step 5 - Offer: issued within 7 business days of completing all rounds.
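For candidates wondering what level the slowly changing dimensions topic in Step 4 refers to, the following plain-Python sketch shows one common variant, usually called Type 2, where a changed attribute closes the current row and opens a new one. The posting does not say which variant the panel covers, so treat this as an assumption; the row layout (`key`, `attrs`, `valid_from`, `valid_to`) and the `apply_scd2` helper are hypothetical.

```python
from datetime import date


def apply_scd2(dimension, updates, effective):
    """Return a new dimension list with Type 2 history applied.

    dimension: list of dicts with key, attrs, valid_from, valid_to (None = current row).
    updates:   dict mapping key -> latest attrs.
    effective: date on which the updates take effect.
    """
    result = []
    keys_with_current_row = set()
    for row in dimension:
        key = row["key"]
        if row["valid_to"] is None and key in updates:
            keys_with_current_row.add(key)
            if updates[key] != row["attrs"]:
                # Close the old version and open a new current version.
                result.append({**row, "valid_to": effective})
                result.append({"key": key, "attrs": updates[key],
                               "valid_from": effective, "valid_to": None})
                continue
        # Historical rows and unchanged current rows are carried over as copies.
        result.append(dict(row))
    # Keys without a current row get a fresh current version.
    for key, attrs in updates.items():
        if key not in keys_with_current_row:
            result.append({"key": key, "attrs": attrs,
                           "valid_from": effective, "valid_to": None})
    return result
```

On a lakehouse table the same logic is normally expressed as a single merge statement; the loop above only spells out the row-level behaviour.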
Job views
2 786
Posted
a day ago
Publisher
Helen Price