Microsoft

Data Engineer – Real-Time Analytics

Wroclaw, Poland

Salary: 7 000 - 9 600 USD (gross, monthly)
Employment: Full time
Experience: Senior
Contract: B2B
Job type: Hybrid

Python, Azure, Data, SQL

Job description

Microsoft's real-time analytics team in Warsaw processes event streams from across the Microsoft 365 product suite: telemetry, usage signals, and business metrics that feed executive dashboards and ML feature stores. As a Senior Data Engineer, you will design and operate streaming pipelines that ingest from Azure Event Hubs, transform the data with Spark Structured Streaming on Databricks, and land it in Delta Lake with sub-minute latency.
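The ingest, transform, land loop can be modelled as a micro-batch processor that reads from offset-addressed partitions, writes the batch, and only then commits the new offsets to a checkpoint, so a restart resumes where the last successful batch ended. This is a minimal plain-Python sketch under stated assumptions: the in-memory `partitions` dict stands in for Event Hubs partitions, the `sink` list stands in for a Delta table, and `run_micro_batch` is a hypothetical helper. In Spark Structured Streaming the engine does this bookkeeping itself when a query is started with a `checkpointLocation` option.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def load_checkpoint(path: Path) -> dict[str, int]:
    """Return the committed offset per partition, or an empty dict on first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def commit_checkpoint(path: Path, offsets: dict[str, int]) -> None:
    """Write offsets to a temp file, then rename it over the checkpoint."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(offsets))
    tmp.replace(path)  # os.replace semantics: atomic on POSIX within one filesystem


def run_micro_batch(partitions: dict[str, list[dict]], sink: list[dict],
                    checkpoint: Path, max_records: int = 100) -> int:
    """Process up to max_records per partition, write them, then commit offsets.

    Hypothetical helper. Returns the number of records written in this batch.
    """
    offsets = load_checkpoint(checkpoint)
    new_offsets = dict(offsets)
    batch = []
    for pid, log in partitions.items():
        start = offsets.get(pid, 0)
        records = log[start:start + max_records]
        batch.extend({"partition": pid, **r} for r in records)
        new_offsets[pid] = start + len(records)
    sink.extend(batch)                          # 1. write the batch
    commit_checkpoint(checkpoint, new_offsets)  # 2. only then commit the offsets
    return len(batch)
```

Because the write happens before the commit, a crash between the two replays the batch on restart (at-least-once delivery), so the sink must tolerate duplicates, for example through idempotent writes or downstream deduplication.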

You will own the reliability of the streaming platform: implement checkpoint management, dead-letter queuing, schema evolution strategies, and cost-aware cluster autoscaling. You will define dbt models for the gold layer that data analysts and data scientists consume, and write the runbooks that help the on-call engineer diagnose lag and backpressure at 3 AM.
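Dead-letter queuing and additive schema evolution can be sketched together. Each raw event is parsed and checked against the current schema. Malformed events go to a dead-letter list with a reason instead of failing the whole batch, and previously unseen fields widen the schema as nullable columns instead of being dropped. This is a plain-Python sketch under assumptions: the dict-of-types schema, the `REQUIRED` set, the `route_events` helper and the reason strings are all hypothetical. On Delta Lake, the additive part roughly corresponds to writing with the `mergeSchema` option, while type changes still need an explicit decision.

```python
import json

# Hypothetical silver-layer schema: column name -> expected Python type.
Schema = dict[str, type]

# Hypothetical columns every event must carry.
REQUIRED = {"event_id", "event_time"}


def route_events(raw_events: list[str], schema: Schema):
    """Split raw JSON strings into (good rows, dead letters, evolved schema)."""
    evolved = dict(schema)
    good, dead = [], []
    for raw in raw_events:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError as exc:
            dead.append({"raw": raw, "reason": f"unparseable: {exc.msg}"})
            continue
        if not isinstance(event, dict):
            dead.append({"raw": raw, "reason": "not a JSON object"})
            continue
        missing = REQUIRED - set(event)
        if missing:
            dead.append({"raw": raw, "reason": f"missing {sorted(missing)}"})
            continue
        conflict = [k for k, v in event.items()
                    if k in evolved and v is not None and not isinstance(v, evolved[k])]
        if conflict:
            # Type changes are not safe to evolve automatically: dead-letter them.
            dead.append({"raw": raw, "reason": f"type conflict on {conflict}"})
            continue
        for k, v in event.items():
            if k not in evolved and v is not None:
                evolved[k] = type(v)  # additive evolution: new nullable column
        good.append(event)
    # Back-fill columns added during this batch with None (nullable semantics).
    rows = [{col: e.get(col) for col in evolved} for e in good]
    return rows, dead, evolved
```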

The role requires close collaboration with product engineering teams (who produce the raw events) and the ML platform team (who need clean feature tables). You will participate in capacity planning reviews and drive down cloud costs through query optimisation and partition pruning improvements.
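Partition pruning saves cost because a filter on the partition column lets the engine skip whole directories without opening their files. The effect can be modelled with a Hive-style `date=YYYY-MM-DD` layout. The layout and the `list_partitions` and `prune` helpers are illustrative assumptions, not how Delta Lake implements pruning. Delta Lake also uses file-level statistics recorded in its transaction log.

```python
from datetime import date


def list_partitions(paths: list[str]) -> dict[date, list[str]]:
    """Group file paths by the date parsed from their 'date=YYYY-MM-DD' directory."""
    groups: dict[date, list[str]] = {}
    for p in paths:
        part = next(seg for seg in p.split("/") if seg.startswith("date="))
        groups.setdefault(date.fromisoformat(part[len("date="):]), []).append(p)
    return groups


def prune(partitions: dict[date, list[str]], start: date, end: date) -> list[str]:
    """Return only files whose partition falls in [start, end]; the rest are never read."""
    return [f for d, files in sorted(partitions.items())
            if start <= d <= end
            for f in files]
```

With a two-day filter over three daily partitions, only the matching files are listed for reading. The saving grows with the fraction of partitions a query can skip.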

Experience with Structured Streaming and Delta Lake time travel is non-negotiable. Familiarity with the Medallion architecture (bronze/silver/gold) and event-time windowing patterns is strongly preferred.
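Event-time windowing assigns each event to a window by the timestamp it carries, not by its arrival time. A watermark, defined as the maximum event time seen minus an allowed lateness, decides when a window is final and which late events are dropped. This is a plain-Python sketch of tumbling windows under assumptions: integer epoch seconds, and the 60-second windows and 30-second lateness are illustrative values. In Spark, comparable semantics come from `withWatermark` combined with `groupBy(window(...))`. This sketch only models them and does not reproduce Spark's exact emission timing.

```python
from collections import defaultdict


def tumbling_counts(events, window_s: int = 60, lateness_s: int = 30):
    """Count events per (window start, key), honouring a watermark.

    events: iterable of (event_time_seconds, key) in *arrival* order.
    Returns (finalised counts, still-open counts, dropped events).
    """
    open_windows: dict[tuple[int, str], int] = defaultdict(int)
    final: dict[tuple[int, str], int] = {}
    dropped = []
    max_seen = float("-inf")
    for ts, key in events:
        start = ts - ts % window_s          # start of the window this event belongs to
        watermark = max_seen - lateness_s   # watermark from events seen so far
        if start + window_s <= watermark:
            dropped.append((ts, key))       # its window is already final
            continue
        open_windows[(start, key)] += 1
        max_seen = max(max_seen, ts)
        watermark = max_seen - lateness_s
        # Finalise every open window whose end is at or behind the watermark.
        for wk in [w for w in open_windows if w[0] + window_s <= watermark]:
            final[wk] = open_windows.pop(wk)
    return final, dict(open_windows), dropped
```

For arrivals at t = 10, 50, 70, 45, 130, 20, the event at t=45 arrives after t=70 but is still counted, because the watermark (40) has not passed the end of its window (60). The event at t=20 arrives after the watermark has reached 100 and is dropped.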

Technical stack

  • Azure Event Hubs
  • Azure Databricks
  • Delta Lake
  • Apache Spark 3.5
  • Python 3.12
  • dbt (dbt-databricks adapter)
  • Azure Data Lake Storage Gen2
  • Azure Synapse Analytics
  • Azure Monitor
  • SQL
  • PySpark
  • pytest
  • Terraform (AzureRM)
  • GitHub Actions

Interview process

Step 1 - Phone screen (45 min, recruiter + engineering manager): your data engineering background, the scale of the systems you have operated, and why you are interested in real-time analytics.

Step 2 - Data engineering case study (take-home, up to 4 h): given a fictional event schema and a set of analytical requirements, design a Medallion pipeline and implement the bronze-to-silver transformation in PySpark. Submit a Jupyter notebook with explanations.
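A bronze-to-silver step commonly means deduplicating replayed events, enforcing types, and quarantining rows that cannot be cleaned. Below is a plain-Python outline of that logic for a hypothetical event shape (`event_id`, `user_id`, `event_time`, `value`). The exercise's actual schema is not known here. A PySpark version would express the same steps with DataFrame operations such as `cast` and `dropDuplicates`.

```python
from datetime import datetime, timezone


def bronze_to_silver(bronze: list[dict]) -> tuple[list[dict], list[dict]]:
    """Clean raw bronze rows into typed, deduplicated silver rows.

    Returns (silver rows sorted by event_time, rejected rows). Hypothetical shape:
    {"event_id": str, "user_id": str, "event_time": ISO-8601 str, "value": str}.
    """
    latest: dict[str, dict] = {}
    rejected = []
    for row in bronze:
        try:
            ts = datetime.fromisoformat(row["event_time"])
            if ts.tzinfo is None:
                ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
            clean = {
                "event_id": row["event_id"],
                "user_id": row["user_id"].strip().lower(),
                "event_time": ts.astimezone(timezone.utc),
                "value": float(row["value"]),
            }
        except (KeyError, ValueError, TypeError, AttributeError) as exc:
            # Quarantine instead of failing the batch; keep the error class for triage.
            rejected.append({**row, "_error": type(exc).__name__})
            continue
        # Deduplicate replays: one row per event_id, the latest event_time wins.
        prev = latest.get(clean["event_id"])
        if prev is None or clean["event_time"] > prev["event_time"]:
            latest[clean["event_id"]] = clean
    return sorted(latest.values(), key=lambda r: r["event_time"]), rejected
```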

Step 3 - Technical panel I - case study walkthrough (60 min): we review your submission together, probe your design decisions, and ask follow-up questions about failure handling and schema evolution.

Step 4 - Technical panel II - SQL + architecture (60 min): window functions, slowly changing dimensions, and a whiteboard exercise on designing a multi-tenant lakehouse with row-level security.
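A Type 2 slowly changing dimension keeps history by closing the current row and appending a new version whenever a tracked attribute changes. The plain-Python sketch below shows only the merge logic. The column names (`customer_id`, `tier`, `valid_from`, `valid_to`, `is_current`) and the `scd2_merge` helper are hypothetical. On Delta Lake, the same logic is commonly written as a `MERGE INTO` statement.

```python
from datetime import date


def scd2_merge(dim: list[dict], updates: dict[str, str], as_of: date) -> list[dict]:
    """Apply {customer_id: tier} updates to an SCD Type 2 table and return a new table.

    Unchanged keys are left alone; changed keys have their current row closed
    (valid_to = as_of, is_current = False) and a new current row appended;
    unseen keys are inserted as new current rows.
    """
    result = [dict(r) for r in dim]  # copy rows so the input table is not mutated
    current = {r["customer_id"]: r for r in result if r["is_current"]}
    for key, tier in updates.items():
        row = current.get(key)
        if row is not None and row["tier"] == tier:
            continue  # attribute unchanged: no new version needed
        if row is not None:
            row["valid_to"] = as_of  # close the old version
            row["is_current"] = False
        result.append({"customer_id": key, "tier": tier,
                       "valid_from": as_of, "valid_to": None, "is_current": True})
    return result
```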

Step 5 - Offer: issued within 7 business days of completing all rounds.

Job views: 2 786
Posted: a day ago
Publisher: Helen Price
