System at a Glance
Gold Fact & Dimension Tables — Full Star Schema
5 dimensions + 9 fact tables built via PySpark notebooks on a Bronze/Silver/Gold medallion lakehouse, feeding a Direct Lake semantic model with full DAX measure library and a live Power BI Revenue Intelligence report.
Technical Architecture
Workspace & Assets
| Asset | Type | Description |
|---|---|---|
WS_RevPipeline360_Lab | Workspace | Primary Fabric workspace containing all build assets |
LH_RevPipeline360 | Lakehouse | OneLake storage — Bronze / Silver / Gold dbo schema |
NB_Silver_Transform | Notebook | PySpark/SQL: cleanse Bronze into Silver (trim, cast, normalize) |
NB_Gold_Transform | Notebook | PySpark/SQL: build star schema Gold dims and facts |
PL_RevPipeline360_Refresh | Pipeline | Orchestration: Run_NB_Silver → (On Success) → Run_NB_Gold |
SM_RevPipeline360 | Semantic Model | Direct Lake — relationships, DAX measures, role security |
RevDashboardV4 | Power BI Report | Revenue Intelligence: Campaign to Revenue page |
Gold Schema — Dimensions
- gold_dim_date — date_key, calendar_date, year, month, quarter, week, fiscal_year, fiscal_month
- gold_dim_customers — master_customer_id (PK), customer_name, industry, region, active_flag
- gold_dim_products — master_sku (PK), product_name, product_family, category, uom
- gold_dim_users — salesrep_id (PK), rep_name, manager, territory, region
- gold_dim_campaigns — campaign_id (PK), campaign_name, channel, start_date/end_date + date_keys
Gold Schema — Facts
- gold_fact_opportunity — opportunity_id, customer_id, salesrep_id, stage, is_closed, is_won, amount, close_date_key, created_date_key
- gold_fact_opportunity_line — line items joined to opportunity and products (product_sku, quantity, unit_price, discount_pct, line_amount)
- gold_fact_leads — lead_id, lead_source, campaign_id, created_date_key, converted_to_opportunity_id
- gold_fact_campaigns — campaign activity fact bridging campaigns dim to lead funnel
- gold_fact_invoice_header — invoice_id, opportunity_id, customer_id, invoice_date_key, due_date_key, status
- gold_fact_invoice_line — line-level invoice detail with product and customer enrichment
- gold_fact_gl_postings — posting_date_key, customer_id, account_code, account_type, amount
- gold_fact_budget_monthly — month_date_key, region, budget_revenue, budget_cogs, budget_gross_profit, budget_gm_pct
- gold_fact_headcount_monthly — month_date_key, region, headcount, hires, attrition, attrition_rate
Relationship Design
- Active paths: gold_dim_date → fact tables via primary date key per fact (e.g., close_date_key for opportunity, invoice_date_key for invoice line)
- Inactive relationships: created_date_key, due_date_key, campaign start/end date keys — activated via
USERELATIONSHIPin DAX where needed - Attribution pattern: campaign → lead → opportunity linked via
TREATASonconverted_to_opportunity_idto avoid ambiguous active paths - Pipeline hardening: retries + timeouts on notebook activities to handle Spark/Livy session throttling (HTTP 430)
Key DAX Measures
| Measure | Logic |
|---|---|
Leads | COUNTROWS(gold_fact_leads) |
Won Opportunities | CALCULATE([Opportunities], is_won = 1) |
Win Rate | DIVIDE([Won Opportunities], [Opportunities], 0) |
Lead Conversion Rate | DIVIDE([Converted Leads], [Leads]) |
Pipeline Amount (By Channel) | TREATAS attribution via converted_to_opportunity_id |
Active Campaigns (FY) | FY overlap window using SELECTEDVALUE + date boundary filter |
Pipeline Amount (Attributed) | USERELATIONSHIP on lead → opportunity inactive path |
Build Notes & Design Decisions
Medallion ELT Pattern
All transforms run inside Fabric via PySpark notebooks using spark.sql() with CREATE OR REPLACE TABLE statements. This ensures idempotent rebuilds — re-running any notebook fully refreshes the layer without manual cleanup. Silver focuses exclusively on data quality (types, nulls, trimming); Gold focuses exclusively on business logic (joins, date key enrichment, derived metrics).
Campaign Date Key Enhancement
The gold_dim_campaigns table was enhanced with both start_date_key and end_date_key by joining to gold_dim_date twice during Gold build. This enables fiscal year overlap slicing for the Active Campaigns (FY) measure without requiring a bridge table.
Ambiguous Relationship Resolution
Several fact tables (opportunity, invoice header) have multiple date foreign keys pointing to gold_dim_date. Only one relationship per fact table is set active (the primary analytical date). Alternate date roles are activated in DAX measures using USERELATIONSHIP. For campaign-to-opportunity attribution where no direct foreign key exists, TREATAS is used to project the filtered opportunity set from lead conversion data.
Pipeline Governance
The PL_RevPipeline360_Refresh pipeline uses On Success chaining between Silver and Gold notebook activities. Retry policies and timeout settings were added after intermittent Spark/Livy session throttling (HTTP 430) was observed during early runs on shared capacity. Interactive notebook sessions should not run in parallel with pipeline execution.
Architecture PDF
Build in Action
Build Screenshots
Live Architecture Diagram
Build Checklist
- ◆Ingest CSV source files from OneDrive via Dataflow Gen2 into Bronze tables (14 queries)
- ◆Build NB_Silver_Transform — TRIM, CAST, NULLIF, date normalization across all tables
- ◆Build NB_Gold_Transform — star schema dims and facts with date key enrichment via joins
- ◆Enhance gold_dim_campaigns with start_date_key and end_date_key for FY slicing
- ◆Configure PL_RevPipeline360_Refresh with Silver → Gold on-success chaining + retry/timeout
- ◆Build SM_RevPipeline360 Direct Lake semantic model with active/inactive relationship strategy
- ◆Implement DAX measures: Leads, Opportunities, Win Rate, Lead Conversion, Pipeline Amount
- ◆Implement TREATAS and USERELATIONSHIP attribution measures for campaign analytics
- ◆Build RevDashboardV4: Revenue Intelligence Campaign to Revenue report with FY slicer