Case Study — Data Engineering & Analytics

RevPipeline360
Fabric Lakehouse Build

An end-to-end Microsoft Fabric analytics pipeline — Bronze/Silver/Gold medallion architecture, PySpark notebook transforms, Direct Lake semantic model, and a Power BI Revenue Intelligence dashboard.

Microsoft Fabric · OneLake · PySpark / SQL · Direct Lake · Power BI · DAX · Star Schema · Dataflow Gen2
My Role

Lead Architect & Full-Stack Builder

I led the end-to-end design and implementation of RevPipeline360 — from defining the medallion architecture and writing every PySpark transform, to configuring the pipeline, building the Direct Lake semantic model, designing the DAX measure library, and delivering the Power BI Revenue Intelligence report. Requirements were gathered directly from business stakeholders across Sales and Finance, and all technical decisions — including the relationship strategy, attribution pattern, and pipeline hardening approach — were made and executed independently.

Architecture Design · PySpark / SQL Transforms · Pipeline Orchestration · Semantic Model Build · DAX Measure Library · Power BI Report Design · Stakeholder Requirements
01 — Overview

System at a Glance

9

Gold Fact Tables — Full Star Schema

5 dimensions + 9 fact tables built via PySpark notebooks on a Bronze/Silver/Gold medallion lakehouse, feeding a Direct Lake semantic model with full DAX measure library and a live Power BI Revenue Intelligence report.

1,200
Total Leads
27%
Lead Conversion Rate
82%
Win Rate (Lead-Sourced)
$78K
Pipeline Amount
14
Dataflow Gen2 Queries
3
Medallion Layers
Bronze
Raw Landing
OneDrive CSV ingestion via Dataflow Gen2. 1:1 with source files. 10+ bronze tables including customers, campaigns, leads, opportunities, invoices, GL postings, budget, headcount.
Silver
Cleaned & Typed
NB_Silver_Transform notebook. TRIM, CAST, NULLIF, date normalization. Standardized column names and types. Validated with row count checks across all tables.
Gold
Curated Star Schema
NB_Gold_Transform notebook. Dims: date, customers, products, users, campaigns. Facts: opportunity, opportunity line, leads, campaigns, invoice header/line, GL postings, budget, headcount.
Semantic Model
Direct Lake
SM_RevPipeline360 with active/inactive relationships, USERELATIONSHIP and TREATAS for attribution. Full DAX measure library under MyMeasures table.
Power BI
Revenue Intelligence
RevDashboardV4 — Campaign to Revenue page. FY slicer, lead funnel, channel attribution, campaign effectiveness scatter, top campaigns bar chart.
02 — Architecture

Technical Architecture

Workspace & Assets

Asset | Type | Description
WS_RevPipeline360_Lab | Workspace | Primary Fabric workspace containing all build assets
LH_RevPipeline360 | Lakehouse | OneLake storage — Bronze / Silver / Gold dbo schema
NB_Silver_Transform | Notebook | PySpark/SQL: cleanse Bronze into Silver (trim, cast, normalize)
NB_Gold_Transform | Notebook | PySpark/SQL: build star schema Gold dims and facts
PL_RevPipeline360_Refresh | Pipeline | Orchestration: Run_NB_Silver → (On Success) → Run_NB_Gold
SM_RevPipeline360 | Semantic Model | Direct Lake — relationships, DAX measures, role security
RevDashboardV4 | Power BI Report | Revenue Intelligence: Campaign to Revenue page

Gold Schema — Dimensions

  • gold_dim_date — date_key, calendar_date, year, month, quarter, week, fiscal_year, fiscal_month
  • gold_dim_customers — master_customer_id (PK), customer_name, industry, region, active_flag
  • gold_dim_products — master_sku (PK), product_name, product_family, category, uom
  • gold_dim_users — salesrep_id (PK), rep_name, manager, territory, region
  • gold_dim_campaigns — campaign_id (PK), campaign_name, channel, start_date/end_date + date_keys

Gold Schema — Facts

  • gold_fact_opportunity — opportunity_id, customer_id, salesrep_id, stage, is_closed, is_won, amount, close_date_key, created_date_key
  • gold_fact_opportunity_line — line items joined to opportunity and products (product_sku, quantity, unit_price, discount_pct, line_amount)
  • gold_fact_leads — lead_id, lead_source, campaign_id, created_date_key, converted_to_opportunity_id
  • gold_fact_campaigns — campaign activity fact bridging campaigns dim to lead funnel
  • gold_fact_invoice_header — invoice_id, opportunity_id, customer_id, invoice_date_key, due_date_key, status
  • gold_fact_invoice_line — line-level invoice detail with product and customer enrichment
  • gold_fact_gl_postings — posting_date_key, customer_id, account_code, account_type, amount
  • gold_fact_budget_monthly — month_date_key, region, budget_revenue, budget_cogs, budget_gross_profit, budget_gm_pct
  • gold_fact_headcount_monthly — month_date_key, region, headcount, hires, attrition, attrition_rate
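As a hedged illustration of how a Gold fact picks up its surrogate date keys (not the actual NB_Gold_Transform code), the join pattern can be sketched against an in-memory SQLite database standing in for spark.sql(); table and column names follow the schema above, sample values are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative inputs: a date dimension and a cleaned Silver opportunity table.
conn.executescript("""
CREATE TABLE gold_dim_date (date_key INTEGER, calendar_date TEXT);
INSERT INTO gold_dim_date VALUES (20240115, '2024-01-15'), (20240220, '2024-02-20');

CREATE TABLE silver_opportunities (opportunity_id TEXT, amount REAL, close_date TEXT);
INSERT INTO silver_opportunities VALUES ('OP-1', 5000.0, '2024-01-15'),
                                        ('OP-2', 3200.0, '2024-02-20');
""")

# Gold build: join the date dimension so the fact carries a surrogate
# close_date_key instead of the raw date string.
conn.executescript("""
DROP TABLE IF EXISTS gold_fact_opportunity;
CREATE TABLE gold_fact_opportunity AS
SELECT o.opportunity_id,
       o.amount,
       d.date_key AS close_date_key   -- surrogate key replaces the raw date
FROM silver_opportunities o
JOIN gold_dim_date d ON d.calendar_date = o.close_date;
""")

rows = conn.execute(
    "SELECT opportunity_id, close_date_key FROM gold_fact_opportunity ORDER BY opportunity_id"
).fetchall()
print(rows)  # [('OP-1', 20240115), ('OP-2', 20240220)]
```

In Fabric the same statement runs through spark.sql() with CREATE OR REPLACE TABLE; the DROP-then-CREATE pair here is only the SQLite analogue.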

Relationship Design

  • Active paths: gold_dim_date → fact tables via primary date key per fact (e.g., close_date_key for opportunity, invoice_date_key for invoice line)
  • Inactive relationships: created_date_key, due_date_key, campaign start/end date keys — activated via USERELATIONSHIP in DAX where needed
  • Attribution pattern: campaign → lead → opportunity linked via TREATAS on converted_to_opportunity_id to avoid ambiguous active paths
  • Pipeline hardening: retries + timeouts on notebook activities to handle Spark/Livy session throttling (HTTP 430)

Key DAX Measures

Measure | Logic
Leads | COUNTROWS(gold_fact_leads)
Won Opportunities | CALCULATE([Opportunities], is_won = 1)
Win Rate | DIVIDE([Won Opportunities], [Opportunities], 0)
Lead Conversion Rate | DIVIDE([Converted Leads], [Leads])
Pipeline Amount (By Channel) | TREATAS attribution via converted_to_opportunity_id
Active Campaigns (FY) | FY overlap window using SELECTEDVALUE + date boundary filter
Pipeline Amount (Attributed) | USERELATIONSHIP on lead → opportunity inactive path
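The table above summarizes measure logic only. A hedged sketch of how two of these might read in full DAX — column and measure names are assumed from the Gold schema above, not copied from the actual MyMeasures library:

```dax
-- Sketch only: assumes a base [Opportunities] measure and the column
-- names shown in the Gold schema section.
Won Opportunities =
CALCULATE ( [Opportunities], gold_fact_opportunity[is_won] = 1 )

-- TREATAS projects converted-lead opportunity IDs onto the fact,
-- avoiding an ambiguous active path from campaign to opportunity.
Pipeline Amount (Attributed) =
CALCULATE (
    SUM ( gold_fact_opportunity[amount] ),
    TREATAS (
        VALUES ( gold_fact_leads[converted_to_opportunity_id] ),
        gold_fact_opportunity[opportunity_id]
    )
)
```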
03 — Documentation

Build Notes & Design Decisions

Medallion ELT Pattern

All transforms run inside Fabric via PySpark notebooks using spark.sql() with CREATE OR REPLACE TABLE statements. This ensures idempotent rebuilds — re-running any notebook fully refreshes the layer without manual cleanup. Silver focuses exclusively on data quality (types, nulls, trimming); Gold focuses exclusively on business logic (joins, date key enrichment, derived metrics).
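A minimal sketch of the Silver cleanse step described above (TRIM, CAST, NULLIF), run here against an in-memory SQLite database rather than Fabric; in the real notebook the statement goes through spark.sql() with CREATE OR REPLACE TABLE, and the DROP-then-CREATE pair below stands in for that idempotent rebuild. Table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative bronze table; the real bronze layer is loaded by Dataflow Gen2.
conn.executescript("""
CREATE TABLE bronze_customers (customer_id TEXT, customer_name TEXT, region TEXT);
INSERT INTO bronze_customers VALUES ('  C001 ', '  Contoso  ', ''),
                                    ('C002', 'Fabrikam', 'West');
""")

# Idempotent rebuild: re-running this block fully refreshes the table
# with no manual cleanup, mirroring CREATE OR REPLACE TABLE in Spark SQL.
conn.executescript("""
DROP TABLE IF EXISTS silver_customers;
CREATE TABLE silver_customers AS
SELECT TRIM(customer_id)        AS customer_id,
       TRIM(customer_name)      AS customer_name,
       NULLIF(TRIM(region), '') AS region   -- empty strings become NULL
FROM bronze_customers;
""")

rows = conn.execute("SELECT * FROM silver_customers ORDER BY customer_id").fetchall()
print(rows)  # [('C001', 'Contoso', None), ('C002', 'Fabrikam', 'West')]
```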

Campaign Date Key Enhancement

The gold_dim_campaigns table was enhanced with both start_date_key and end_date_key by joining to gold_dim_date twice during Gold build. This enables fiscal year overlap slicing for the Active Campaigns (FY) measure without requiring a bridge table.
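The double join to gold_dim_date can be sketched as follows — again a SQLite stand-in for spark.sql(), with the dimension aliased twice (ds/de) so both keys are stamped in one pass; sample values are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gold_dim_date (date_key INTEGER, calendar_date TEXT);
INSERT INTO gold_dim_date VALUES (20240101, '2024-01-01'), (20240331, '2024-03-31');

CREATE TABLE silver_campaigns (campaign_id TEXT, start_date TEXT, end_date TEXT);
INSERT INTO silver_campaigns VALUES ('CMP-1', '2024-01-01', '2024-03-31');
""")

# Join the date dimension twice, once per role, to derive both date keys.
conn.executescript("""
DROP TABLE IF EXISTS gold_dim_campaigns;
CREATE TABLE gold_dim_campaigns AS
SELECT c.campaign_id,
       ds.date_key AS start_date_key,
       de.date_key AS end_date_key
FROM silver_campaigns c
JOIN gold_dim_date ds ON ds.calendar_date = c.start_date
JOIN gold_dim_date de ON de.calendar_date = c.end_date;
""")

row = conn.execute("SELECT * FROM gold_dim_campaigns").fetchone()
print(row)  # ('CMP-1', 20240101, 20240331)
```

With both keys on the dimension, the Active Campaigns (FY) measure can test fiscal-year overlap directly, with no bridge table.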

Ambiguous Relationship Resolution

Several fact tables (opportunity, invoice header) have multiple date foreign keys pointing to gold_dim_date. Only one relationship per fact table is set active (the primary analytical date). Alternate date roles are activated in DAX measures using USERELATIONSHIP. For campaign-to-opportunity attribution where no direct foreign key exists, TREATAS is used to project the filtered opportunity set from lead conversion data.

Pipeline Governance

The PL_RevPipeline360_Refresh pipeline uses On Success chaining between Silver and Gold notebook activities. Retry policies and timeout settings were added after intermittent Spark/Livy session throttling (HTTP 430) was observed during early runs on shared capacity. Interactive notebook sessions should not run in parallel with pipeline execution.
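For illustration, retry and timeout settings on a notebook activity might look like the fragment below. This is an assumption, not the project's actual pipeline JSON: the field names follow the Azure Data Factory activity "policy" shape, and Fabric pipelines expose equivalent Retry/Timeout settings in the activity settings pane; the values shown are examples only.

```json
{
  "name": "Run_NB_Silver",
  "policy": {
    "timeout": "0.01:00:00",
    "retry": 2,
    "retryIntervalInSeconds": 120
  }
}
```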

04 — Full Document

Architecture PDF

RevPipeline360_Fabric_Build_Architecture.pdf
05 — Video Walkthrough

Build in Action

Fabric Workspace & Pipeline Walkthrough
16-57-08.mp4 — WS_RevPipeline360_Lab overview
Dashboard & Semantic Model Demo
18-57-16.mp4 — RevDashboardV4 live walkthrough
06 — Visual Artifacts

Build Screenshots

Revenue Intelligence Dashboard
Revpipeline_Dashboard.png — Revenue Intelligence: Campaign to Revenue Dashboard
Direct Lake Semantic Model
Revpipeline_SemanticModel.png — Direct Lake Semantic Model & Relationships
Manage Relationships ERD
Revpipeline_ERD.png — Manage Relationships View
Lakehouse Bronze Tables
Revpipeline_Lakehouse.png — LH_RevPipeline360 Bronze Tables
Gold Transform Notebook
Revpipeline_Gold_Transform.png — NB_Gold_Transform Notebook (PySpark)
Pipeline Orchestration
Revpipeline_LakehousePipeline.png — PL_RevPipeline360_Refresh Orchestration
Dataflow Gen2
Revpipeline_Dataflow.png — Dataflow Gen2 Query Editor (14 queries)
WS_RevPipeline360_Lab Workspace
RevPipeline_Workspace.png — WS_RevPipeline360_Lab Workspace
07 — Flow Diagram

Live Architecture Diagram

flowchart TD
    A[OneDrive CSV Files\nSalesforce + Finance + HR] --> B[Dataflow Gen2\n14 queries - bronze load]
    B --> C[LH_RevPipeline360\nBronze Tables dbo.bronze_*]
    C --> D[NB_Silver_Transform\nTRIM CAST NULLIF normalize]
    D --> E[Silver Tables dbo.silver_*\ncleaned typed standardized]
    E --> F[NB_Gold_Transform\nstar schema + date key joins]
    F --> G1[Gold Dims\ndim_date customers products users campaigns]
    F --> G2[Gold Facts\nopportunity leads invoices GL budget headcount]
    G1 --> H[SM_RevPipeline360\nDirect Lake Semantic Model]
    G2 --> H
    H --> I[DAX Measures\nWin Rate CPI Conversion TREATAS attribution]
    I --> J[RevDashboardV4\nRevenue Intelligence Dashboard]
    C --> PL[PL_RevPipeline360_Refresh\nSilver then Gold on success]
    PL --> D
08 — Implementation

Build Checklist