Hi @v-juangutier
To predefine an appropriate Microsoft Fabric SKU, you need to estimate how many Capacity Units (CUs) the client will consume. Microsoft Fabric doesn't provide a direct formula for converting raw data sizes (MB, GB, TB) into CUs because workloads vary too much. You can still estimate CU requirements by modeling workloads based on data volume, operation types, and expected usage patterns.
Start by identifying the client's primary Fabric workloads (e.g., Data Warehouse, Lakehouse, Power BI reports, Dataflows, Spark notebooks). Then estimate data volumes (e.g., 100 GB for a Lakehouse, 1 TB for a Data Warehouse) and determine the frequency and type of operations (e.g., daily ingestion, hourly queries, interactive reporting).
Ex: A client plans to ingest 500 GB daily into a Lakehouse, run 100 SQL queries per hour on a 2 TB Data Warehouse, and generate 50 Power BI reports viewed by 200 users daily.
Microsoft Fabric measures CU consumption in CU-seconds, where CUs represent compute resources (CPU, memory, I/O). Each workload consumes CUs differently depending on data size and operation complexity.
Workload-Specific CU Consumption Rates:
For instance:
- Data Warehouse: 1 Fabric Data Warehouse core = 2 CUs. A query processing 1 GB might consume ~10 CU-seconds, depending on complexity.
- Spark: 2 Spark vCores = 1 CU. Processing 1 GB in a notebook might consume ~50 CU-seconds for a typical ETL job.
- Power BI: Rendering a report with 1 GB of data might consume ~5–20 CU-seconds per user interaction, depending on visuals.
- OneLake: Reading 16 MB via OneLake consumes ~4 CU-seconds (e.g., 10,000 reads of 16 MB = 40,000 CU-seconds).
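The conversion ratios and the OneLake example above can be captured in a few helper functions so they are easy to reuse when modeling. This is a minimal sketch: the function names are hypothetical, the ratios are taken directly from the list above, and the default of 4 CU-seconds per 16 MB read simply mirrors the approximate figure quoted there rather than an official rate.

```python
# Hypothetical helpers encoding the conversion ratios listed above.


def warehouse_cores_to_cus(cores: float) -> float:
    """1 Fabric Data Warehouse core = 2 CUs (ratio as stated above)."""
    return cores * 2


def spark_vcores_to_cus(vcores: float) -> float:
    """2 Spark vCores = 1 CU."""
    return vcores / 2


def onelake_read_cu_seconds(reads: int, cu_seconds_per_read: float = 4.0) -> float:
    """CU-seconds for a number of 16 MB OneLake reads.

    The default rate is an assumption mirroring the ~4 CU-seconds figure above.
    """
    return reads * cu_seconds_per_read


if __name__ == "__main__":
    print(warehouse_cores_to_cus(4))        # 8
    print(spark_vcores_to_cus(16))          # 8.0
    print(onelake_read_cu_seconds(10_000))  # 40000.0
```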
Estimate CU Usage per Workload:
Example Calculation:
Lakehouse Ingestion (500 GB/day):
Assume ingestion via Dataflow Gen2, consuming ~0.1 CU-seconds per MB.
500 GB = 500,000 MB → 500,000 × 0.1 = 50,000 CU-seconds/day.
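The ingestion estimate is a single multiplication, but parameterizing the per-MB rate makes it easy to rerun once real Dataflow Gen2 measurements are available. A minimal sketch, assuming the same decimal units (1 GB = 1,000 MB) and the assumed 0.1 CU-seconds per MB used above; the function name is hypothetical.

```python
MB_PER_GB = 1_000  # decimal units, matching 500 GB = 500,000 MB above


def ingestion_cu_seconds_per_day(gb_per_day: float, cu_seconds_per_mb: float = 0.1) -> float:
    """Daily CU-seconds for ingestion at an assumed per-MB rate."""
    return gb_per_day * MB_PER_GB * cu_seconds_per_mb


if __name__ == "__main__":
    print(ingestion_cu_seconds_per_day(500))  # 50000.0
```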
Data Warehouse Queries (2 TB, 100 queries/hour):
Assume each query processes 10 GB (10,000 MB) and consumes 100 CU-seconds.
100 queries × 100 CU-seconds = 10,000 CU-seconds/hour → 240,000 CU-seconds/day (24 hours).
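The same arithmetic as a sketch, with the number of active hours exposed as a parameter. The example above assumes queries run around the clock; the hypothetical `active_hours` argument lets you model, say, business-hours-only querying instead.

```python
def query_cu_seconds_per_day(
    queries_per_hour: int,
    cu_seconds_per_query: float,
    active_hours: int = 24,  # the example above assumes 24 hours of querying
) -> float:
    """Daily CU-seconds for a steady query load (assumed per-query cost)."""
    return queries_per_hour * cu_seconds_per_query * active_hours


if __name__ == "__main__":
    print(query_cu_seconds_per_day(100, 100))                   # 240000
    print(query_cu_seconds_per_day(100, 100, active_hours=10))  # 100000
```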
Power BI Reports (50 reports, 200 users):
Assume each report consumes 10 CU-seconds per view, and each of the 200 users views all 50 reports once a day.
50 reports × 200 users × 10 CU-seconds = 100,000 CU-seconds/day.
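A sketch of the Power BI estimate. The hypothetical `views_per_user_per_report` parameter defaults to 1, which matches the assumption above that every user opens every report once a day; raise it if users revisit reports.

```python
def report_cu_seconds_per_day(
    reports: int,
    users: int,
    cu_seconds_per_view: float,
    views_per_user_per_report: int = 1,  # hypothetical; 1 matches the example above
) -> float:
    """Daily CU-seconds for report views at an assumed per-view cost."""
    return reports * users * views_per_user_per_report * cu_seconds_per_view


if __name__ == "__main__":
    print(report_cu_seconds_per_day(50, 200, 10))  # 100000
```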
Total Daily CU-seconds:
50,000 + 240,000 + 100,000 = 390,000 CU-seconds/day.
Convert to CU-hours: 390,000 ÷ 3,600 = ~108 CU-hours/day.
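Summing the per-workload figures and converting to CU-hours can be sketched as follows. The workload keys are illustrative labels, and the values are the daily estimates computed above.

```python
SECONDS_PER_HOUR = 3_600


def total_cu_hours(daily_cu_seconds: dict[str, float]) -> float:
    """Sum daily CU-seconds across workloads and convert to CU-hours."""
    return sum(daily_cu_seconds.values()) / SECONDS_PER_HOUR


# Daily CU-second estimates from the example above (labels are illustrative).
workloads = {
    "lakehouse_ingestion": 50_000,
    "warehouse_queries": 240_000,
    "power_bi_reports": 100_000,
}

if __name__ == "__main__":
    print(sum(workloads.values()))              # 390000
    print(round(total_cu_hours(workloads), 1))  # 108.3
```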
Map to SKU:
- Fabric SKUs range from F2 (2 CUs) to F2048 (2048 CUs). A SKU supplies its CUs continuously, so it provides (CUs × 24) CU-hours per day; for example, an F8 provides 192 CU-hours/day.
- Calculate required CUs: 108 CU-hours ÷ 24 hours = ~4.5 CUs.
- Select the smallest SKU above this figure: an F4 (4 CUs) falls short, so an F8 (8 CUs) would cover this workload, with room for bursting during peak usage.
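The SKU mapping step can be sketched as a lookup over the F-SKU sizes (F2, F4, ..., F2048, each a power of two). This picks the smallest SKU whose continuous CU count covers the average daily requirement. The `headroom` multiplier is a hypothetical addition for reserving spare capacity; since this is an average-based estimate, it does not model short peaks on its own.

```python
# F-SKU sizes in CUs: F2, F4, F8, ..., F2048 (powers of two).
F_SKUS = [2 ** n for n in range(1, 12)]


def pick_sku(cu_hours_per_day: float, headroom: float = 1.0) -> str:
    """Smallest F SKU whose continuous CUs cover the average requirement.

    headroom is a hypothetical multiplier, e.g. 1.25 for 25% spare capacity.
    """
    required_cus = cu_hours_per_day / 24 * headroom
    for cus in F_SKUS:
        if cus >= required_cus:
            return f"F{cus}"
    raise ValueError(f"{required_cus:.1f} CUs exceeds the largest F SKU (F2048)")


if __name__ == "__main__":
    daily_cu_hours = 390_000 / 3_600           # ~108.3 CU-hours/day from above
    print(pick_sku(daily_cu_hours))            # F8
    print(pick_sku(daily_cu_hours, headroom=2.0))  # F16
```

Doubling the headroom pushes the requirement to about 9 CUs, which is why the second call lands on F16.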
If this post is helpful, please mark it as the Accepted Solution.
Thank You!