Case Study — Data Engineering & Analytics

RevPipeline360
Fabric Lakehouse Build

An end-to-end Microsoft Fabric analytics pipeline — Bronze/Silver/Gold medallion architecture, PySpark notebook transforms, Direct Lake semantic model, and a Power BI Revenue Intelligence dashboard.

Microsoft Fabric · OneLake · PySpark / SQL · Direct Lake · Power BI · DAX · Star Schema · Dataflow Gen2
My Role

Lead Architect & Full-Stack Builder

I led the end-to-end design and implementation of RevPipeline360 — from defining the medallion architecture and writing every PySpark transform, to configuring the pipeline, building the Direct Lake semantic model, designing the DAX measure library, and delivering the Power BI Revenue Intelligence report. Requirements were gathered directly from business stakeholders across Sales and Finance, and all technical decisions — including the relationship strategy, attribution pattern, and pipeline hardening approach — were made and executed independently.

Architecture Design · PySpark / SQL Transforms · Pipeline Orchestration · Semantic Model Build · DAX Measure Library · Power BI Report Design · Stakeholder Requirements
01 — Overview

System at a Glance

9

Gold Fact Tables — Full Star Schema

5 dimensions + 9 fact tables built via PySpark notebooks on a Bronze/Silver/Gold medallion lakehouse, feeding a Direct Lake semantic model with full DAX measure library and a live Power BI Revenue Intelligence report.

1,200
Total Leads
27%
Lead Conversion Rate
82%
Win Rate (Lead-Sourced)
$78K
Pipeline Amount
14
Dataflow Gen2 Queries
3
Medallion Layers
Bronze
Raw Landing
OneDrive CSV ingestion via Dataflow Gen2. 1:1 with source files. 10+ bronze tables including customers, campaigns, leads, opportunities, invoices, GL postings, budget, headcount.
Silver
Cleaned & Typed
NB_Silver_Transform notebook. TRIM, CAST, NULLIF, date normalization. Standardized column names and types. Validated with row count checks across all tables.
Gold
Curated Star Schema
NB_Gold_Transform notebook. Dims: date, customers, products, users, campaigns. Facts: opportunity, opportunity line, leads, campaigns, invoice header/line, GL postings, budget, headcount.
Semantic Model
Direct Lake
SM_RevPipeline360 with active/inactive relationships, USERELATIONSHIP and TREATAS for attribution. Full DAX measure library under MyMeasures table.
Power BI
Revenue Intelligence
RevDashboardV4 — Campaign to Revenue page. FY slicer, lead funnel, channel attribution, campaign effectiveness scatter, top campaigns bar chart.
02 — Architecture

Technical Architecture

Workspace & Assets

Asset | Type | Description
WS_RevPipeline360_Lab | Workspace | Primary Fabric workspace containing all build assets
LH_RevPipeline360 | Lakehouse | OneLake storage — Bronze / Silver / Gold dbo schema
NB_Silver_Transform | Notebook | PySpark/SQL: cleanse Bronze into Silver (trim, cast, normalize)
NB_Gold_Transform | Notebook | PySpark/SQL: build star schema Gold dims and facts
PL_RevPipeline360_Refresh | Pipeline | Orchestration: Run_NB_Silver → (On Success) → Run_NB_Gold
SM_RevPipeline360 | Semantic Model | Direct Lake — relationships, DAX measures, role security
RevDashboardV4 | Power BI Report | Revenue Intelligence: Campaign to Revenue page

Gold Schema — Dimensions

  • gold_dim_date — date_key, calendar_date, year, month, quarter, week, fiscal_year, fiscal_month
  • gold_dim_customers — master_customer_id (PK), customer_name, industry, region, active_flag
  • gold_dim_products — master_sku (PK), product_name, product_family, category, uom
  • gold_dim_users — salesrep_id (PK), rep_name, manager, territory, region
  • gold_dim_campaigns — campaign_id (PK), campaign_name, channel, start_date/end_date + date_keys

Gold Schema — Facts

  • gold_fact_opportunity — opportunity_id, customer_id, salesrep_id, stage, is_closed, is_won, amount, close_date_key, created_date_key
  • gold_fact_opportunity_line — line items joined to opportunity and products (product_sku, quantity, unit_price, discount_pct, line_amount)
  • gold_fact_leads — lead_id, lead_source, campaign_id, created_date_key, converted_to_opportunity_id
  • gold_fact_campaigns — campaign activity fact bridging campaigns dim to lead funnel
  • gold_fact_invoice_header — invoice_id, opportunity_id, customer_id, invoice_date_key, due_date_key, status
  • gold_fact_invoice_line — line-level invoice detail with product and customer enrichment
  • gold_fact_gl_postings — posting_date_key, customer_id, account_code, account_type, amount
  • gold_fact_budget_monthly — month_date_key, region, budget_revenue, budget_cogs, budget_gross_profit, budget_gm_pct
  • gold_fact_headcount_monthly — month_date_key, region, headcount, hires, attrition, attrition_rate
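As a hedged illustration of how a Gold fact picks up its surrogate date keys (not the actual NB_Gold_Transform code), the join pattern can be sketched against an in-memory SQLite database standing in for spark.sql(); table and column names follow the schema above, sample values are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative inputs: a date dimension and a cleaned Silver opportunity table.
conn.executescript("""
CREATE TABLE gold_dim_date (date_key INTEGER, calendar_date TEXT);
INSERT INTO gold_dim_date VALUES (20240115, '2024-01-15'), (20240220, '2024-02-20');

CREATE TABLE silver_opportunities (opportunity_id TEXT, amount REAL, close_date TEXT);
INSERT INTO silver_opportunities VALUES ('OP-1', 5000.0, '2024-01-15'),
                                        ('OP-2', 3200.0, '2024-02-20');
""")

# Gold build: join the date dimension so the fact carries a surrogate
# close_date_key instead of the raw date string.
conn.executescript("""
DROP TABLE IF EXISTS gold_fact_opportunity;
CREATE TABLE gold_fact_opportunity AS
SELECT o.opportunity_id,
       o.amount,
       d.date_key AS close_date_key   -- surrogate key replaces the raw date
FROM silver_opportunities o
JOIN gold_dim_date d ON d.calendar_date = o.close_date;
""")

rows = conn.execute(
    "SELECT opportunity_id, close_date_key FROM gold_fact_opportunity ORDER BY opportunity_id"
).fetchall()
print(rows)  # [('OP-1', 20240115), ('OP-2', 20240220)]
```

In Fabric the same statement runs through spark.sql() with CREATE OR REPLACE TABLE; the DROP-then-CREATE pair here is only the SQLite analogue.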

Relationship Design

  • Active paths: gold_dim_date → fact tables via primary date key per fact (e.g., close_date_key for opportunity, invoice_date_key for invoice line)
  • Inactive relationships: created_date_key, due_date_key, campaign start/end date keys — activated via USERELATIONSHIP in DAX where needed
  • Attribution pattern: campaign → lead → opportunity linked via TREATAS on converted_to_opportunity_id to avoid ambiguous active paths
  • Pipeline hardening: retries + timeouts on notebook activities to handle Spark/Livy session throttling (HTTP 430)

Key DAX Measures

Measure | Logic
Leads | COUNTROWS(gold_fact_leads)
Won Opportunities | CALCULATE([Opportunities], is_won = 1)
Win Rate | DIVIDE([Won Opportunities], [Opportunities], 0)
Lead Conversion Rate | DIVIDE([Converted Leads], [Leads])
Pipeline Amount (By Channel) | TREATAS attribution via converted_to_opportunity_id
Active Campaigns (FY) | FY overlap window using SELECTEDVALUE + date boundary filter
Pipeline Amount (Attributed) | USERELATIONSHIP on lead → opportunity inactive path
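The table above summarizes measure logic only. A hedged sketch of how two of these might read in full DAX — column and measure names are assumed from the Gold schema above, not copied from the actual MyMeasures library:

```dax
-- Sketch only: assumes a base [Opportunities] measure and the column
-- names shown in the Gold schema section.
Won Opportunities =
CALCULATE ( [Opportunities], gold_fact_opportunity[is_won] = 1 )

-- TREATAS projects converted-lead opportunity IDs onto the fact,
-- avoiding an ambiguous active path from campaign to opportunity.
Pipeline Amount (Attributed) =
CALCULATE (
    SUM ( gold_fact_opportunity[amount] ),
    TREATAS (
        VALUES ( gold_fact_leads[converted_to_opportunity_id] ),
        gold_fact_opportunity[opportunity_id]
    )
)
```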
03 — Documentation

Build Notes & Design Decisions

Medallion ELT Pattern

All transforms run inside Fabric via PySpark notebooks using spark.sql() with CREATE OR REPLACE TABLE statements. This ensures idempotent rebuilds — re-running any notebook fully refreshes the layer without manual cleanup. Silver focuses exclusively on data quality (types, nulls, trimming); Gold focuses exclusively on business logic (joins, date key enrichment, derived metrics).
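A minimal sketch of the Silver cleanse step described above (TRIM, CAST, NULLIF), run here against an in-memory SQLite database rather than Fabric; in the real notebook the statement goes through spark.sql() with CREATE OR REPLACE TABLE, and the DROP-then-CREATE pair below stands in for that idempotent rebuild. Table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative bronze table; the real bronze layer is loaded by Dataflow Gen2.
conn.executescript("""
CREATE TABLE bronze_customers (customer_id TEXT, customer_name TEXT, region TEXT);
INSERT INTO bronze_customers VALUES ('  C001 ', '  Contoso  ', ''),
                                    ('C002', 'Fabrikam', 'West');
""")

# Idempotent rebuild: re-running this block fully refreshes the table
# with no manual cleanup, mirroring CREATE OR REPLACE TABLE in Spark SQL.
conn.executescript("""
DROP TABLE IF EXISTS silver_customers;
CREATE TABLE silver_customers AS
SELECT TRIM(customer_id)        AS customer_id,
       TRIM(customer_name)      AS customer_name,
       NULLIF(TRIM(region), '') AS region   -- empty strings become NULL
FROM bronze_customers;
""")

rows = conn.execute("SELECT * FROM silver_customers ORDER BY customer_id").fetchall()
print(rows)  # [('C001', 'Contoso', None), ('C002', 'Fabrikam', 'West')]
```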

Campaign Date Key Enhancement

The gold_dim_campaigns table was enhanced with both start_date_key and end_date_key by joining to gold_dim_date twice during Gold build. This enables fiscal year overlap slicing for the Active Campaigns (FY) measure without requiring a bridge table.
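The double join to gold_dim_date can be sketched as follows — again a SQLite stand-in for spark.sql(), with the dimension aliased twice (ds/de) so both keys are stamped in one pass; sample values are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gold_dim_date (date_key INTEGER, calendar_date TEXT);
INSERT INTO gold_dim_date VALUES (20240101, '2024-01-01'), (20240331, '2024-03-31');

CREATE TABLE silver_campaigns (campaign_id TEXT, start_date TEXT, end_date TEXT);
INSERT INTO silver_campaigns VALUES ('CMP-1', '2024-01-01', '2024-03-31');
""")

# Join the date dimension twice, once per role, to derive both date keys.
conn.executescript("""
DROP TABLE IF EXISTS gold_dim_campaigns;
CREATE TABLE gold_dim_campaigns AS
SELECT c.campaign_id,
       ds.date_key AS start_date_key,
       de.date_key AS end_date_key
FROM silver_campaigns c
JOIN gold_dim_date ds ON ds.calendar_date = c.start_date
JOIN gold_dim_date de ON de.calendar_date = c.end_date;
""")

row = conn.execute("SELECT * FROM gold_dim_campaigns").fetchone()
print(row)  # ('CMP-1', 20240101, 20240331)
```

With both keys on the dimension, the Active Campaigns (FY) measure can test fiscal-year overlap directly, with no bridge table.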

Ambiguous Relationship Resolution

Several fact tables (opportunity, invoice header) have multiple date foreign keys pointing to gold_dim_date. Only one relationship per fact table is set active (the primary analytical date). Alternate date roles are activated in DAX measures using USERELATIONSHIP. For campaign-to-opportunity attribution where no direct foreign key exists, TREATAS is used to project the filtered opportunity set from lead conversion data.

Pipeline Governance

The PL_RevPipeline360_Refresh pipeline uses On Success chaining between Silver and Gold notebook activities. Retry policies and timeout settings were added after intermittent Spark/Livy session throttling (HTTP 430) was observed during early runs on shared capacity. Interactive notebook sessions should not run in parallel with pipeline execution.
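For illustration, retry and timeout settings on a notebook activity might look like the fragment below. This is an assumption, not the project's actual pipeline JSON: the field names follow the Azure Data Factory activity "policy" shape, and Fabric pipelines expose equivalent Retry/Timeout settings in the activity settings pane; the values shown are examples only.

```json
{
  "name": "Run_NB_Silver",
  "policy": {
    "timeout": "0.01:00:00",
    "retry": 2,
    "retryIntervalInSeconds": 120
  }
}
```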

04 — Full Document

Architecture PDF

RevPipeline360_Fabric_Build_Architecture.pdf
05 — Video Walkthrough

Build in Action

Fabric Workspace & Pipeline Walkthrough
16-57-08.mp4 — WS_RevPipeline360_Lab overview
Dashboard & Semantic Model Demo
18-57-16.mp4 — RevDashboardV4 live walkthrough
06 — Visual Artifacts

Build Screenshots

Revenue Intelligence Dashboard
Revpipeline_Dashboard.png — Revenue Intelligence: Campaign to Revenue Dashboard
Direct Lake Semantic Model
Revpipeline_SemanticModel.png — Direct Lake Semantic Model & Relationships
Manage Relationships ERD
Revpipeline_ERD.png — Manage Relationships View
Lakehouse Bronze Tables
Revpipeline_Lakehouse.png — LH_RevPipeline360 Bronze Tables
Gold Transform Notebook
Revpipeline_Gold_Transform.png — NB_Gold_Transform Notebook (PySpark)
Pipeline Orchestration
Revpipeline_LakehousePipeline.png — PL_RevPipeline360_Refresh Orchestration
Dataflow Gen2
Revpipeline_Dataflow.png — Dataflow Gen2 Query Editor (14 queries)
WS_RevPipeline360_Lab Workspace
RevPipeline_Workspace.png — WS_RevPipeline360_Lab Workspace
07 — Flow Diagram

Live Architecture Diagram

flowchart TD
    A[OneDrive CSV Files\nSalesforce + Finance + HR] --> B[Dataflow Gen2\n14 queries - bronze load]
    B --> C[LH_RevPipeline360\nBronze Tables dbo.bronze_*]
    C --> D[NB_Silver_Transform\nTRIM CAST NULLIF normalize]
    D --> E[Silver Tables dbo.silver_*\ncleaned typed standardized]
    E --> F[NB_Gold_Transform\nstar schema + date key joins]
    F --> G1[Gold Dims\ndim_date customers products users campaigns]
    F --> G2[Gold Facts\nopportunity leads invoices GL budget headcount]
    G1 --> H[SM_RevPipeline360\nDirect Lake Semantic Model]
    G2 --> H
    H --> I[DAX Measures\nWin Rate CPI Conversion TREATAS attribution]
    I --> J[RevDashboardV4\nRevenue Intelligence Dashboard]
    C --> PL[PL_RevPipeline360_Refresh\nSilver then Gold on success]
    PL --> D
08 — Implementation

Build Checklist