
rsaprano

Navigating the (new) DP-600 exam: Part 3 - Implement and manage semantic models

Overview of Module 3: Implement and manage semantic models (20–25%)

 

Module 3 guides professionals through the art of building efficient semantic models, which serve as the basis for interactive visualisations in Power BI and can be queried from notebooks using the SemPy Python library.

 

Firstly, we need to consider the storage modes and the use cases for each: Import mode, DirectQuery, and Direct Lake. Import mode loads and caches data into Power BI’s memory, providing lightning-fast querying and interactions. This mode is perfect for scenarios where data doesn’t change frequently or where performance is critical, such as reports tracking quarterly sales trends or annual reports. The data is refreshed periodically, so if it only changes daily or weekly, Import mode is optimal. As a use case, consider a finance team creating a static income statement that refreshes data every night: the data is loaded once a day, allowing high-speed querying during business hours.

 

DirectQuery mode does not cache data. Instead, it queries the source directly every time a user interacts with the report. This mode is suitable for scenarios where data needs to be current, such as monitoring live KPIs or tracking sales performance in near real-time. As a use case, consider a retail company that uses DirectQuery to display up-to-the-minute sales data during a major product launch, allowing decision-makers to adjust marketing strategies based on current performance.

 

Direct Lake mode is the latest addition, blending the strengths of Import and DirectQuery modes. It reads Delta tables directly from OneLake (Fabric’s data lake, built on Azure Data Lake Storage), loading column data into memory on demand rather than importing the full dataset or round-tripping to the source on every query. This makes it ideal for scenarios involving huge datasets that need efficient querying without full ingestion. Consider an organization analyzing petabytes of IoT sensor data stored in OneLake for anomaly detection, with results visualized in near real-time without caching every data point.

Building effective models requires a solid understanding of DAX (Data Analysis Expressions) for expressing business logic. The exam syllabus refers specifically to iterators, window and information functions.

 

Iterators are DAX functions that perform row-by-row calculations over a table. Unlike standard aggregation functions (like SUM or AVERAGE), iterators—such as SUMX, AVERAGEX, and MAXX—allow more nuanced calculations by iterating through each row and applying a specific formula.

 

Scenario: A company wants to calculate the weighted average margin based on sales volume. Using SUMX, they can evaluate this by multiplying the margin for each product by its sales volume and dividing by the total volume:

Weighted_Avg_Margin = 
    // DIVIDE guards against a blank result when total volume is zero
    DIVIDE(SUMX(Sales, Sales[Margin] * Sales[Volume]), SUM(Sales[Volume]))

Window functions are crucial for analysing data in relation to other rows, not just within fixed time periods. Functions like INDEX, OFFSET, and WINDOW (alongside RANK and ROWNUMBER) help calculate metrics over moving or comparative windows, enabling advanced trend analysis.

Scenario: Suppose a finance analyst needs to create a rolling 3-month average of sales to smooth out fluctuations. One approach uses the time-intelligence function DATESINPERIOD to gather the trailing window of dates, with a window-function alternative shown afterwards:

Rolling_3_Month_Avg = 
    AVERAGEX(
        DATESINPERIOD(
            'Date'[Date], 
            MAX('Date'[Date]),   // anchor on the last date in the current filter context
            -3, 
            MONTH
        ), 
        CALCULATE(SUM(Sales[Revenue]))   // CALCULATE forces context transition per date
    )

Alternatively, the newer WINDOW function defines the span directly (OFFSET, by contrast, returns a single shifted row, making it better suited to prior-period comparisons than to spans):

Rolling_Avg = 
    AVERAGEX(
        WINDOW(
            -2, REL, 0, REL,               // from two dates back to the current date
            ALLSELECTED('Date'[Date]), 
            ORDERBY('Date'[Date])
        ), 
        CALCULATE(SUM(Sales[Revenue]))     // context transition per window row
    )

Functions like ISBLANK, HASONEVALUE, and USERELATIONSHIP can manage data validation and context adjustments. Using HASONEVALUE, for example, can ensure a measure only calculates when a single value in a column is in scope, preventing unintended aggregations.
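
As a minimal sketch (the Product[Category] column and [Total Sales] measure here are hypothetical), such a guard returns BLANK at multi-category totals instead of a misleading aggregate:

Category_Target = 
    IF(
        HASONEVALUE(Product[Category]),   // exactly one category in scope?
        [Total Sales] * 1.1,              // e.g. a 10% uplift target for that category
        BLANK()                           // suppress the result for totals
    )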

 

Information functions such as INFO.TABLES() return metadata about the model: INFO.TABLES() describes the tables (name, description, and whether each is hidden), INFO.COLUMNS() describes the columns, and INFO.MEASURES() describes the measures.
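
These functions run in DAX queries (for example, in DAX query view or DAX Studio) rather than in measures. A quick sketch listing the model’s tables:

EVALUATE
    SELECTCOLUMNS(
        INFO.TABLES(), 
        "Table", [Name],              // table name
        "Hidden", [IsHidden],         // whether the table is hidden
        "Description", [Description]
    )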

 

In addition, you will need to know about calculation groups and dynamic format strings for the exam. Calculation groups are a simple way to reduce the number of measures in a model by grouping common measure expressions. They are made up of calculation items: DAX expressions that act as placeholders for the model’s existing explicit measures through SELECTEDMEASURE().
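
As an illustration, a hypothetical Time Intelligence calculation group (assuming a marked date table named 'Date') might define these calculation items:

// Calculation item: Current
SELECTEDMEASURE()

// Calculation item: YTD
CALCULATE(SELECTEDMEASURE(), DATESYTD('Date'[Date]))

// Calculation item: Prior Year
CALCULATE(SELECTEDMEASURE(), SAMEPERIODLASTYEAR('Date'[Date]))

Applied in a visual, each item re-evaluates whichever explicit measure is selected, so one group replaces a whole family of per-measure YTD and prior-year variants.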

With dynamic format strings for measures, you can determine how measures appear in visuals by conditionally applying a format string with a separate DAX expression.
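
As a sketch, the format-string expression for a hypothetical currency measure could switch formats by magnitude (note the doubled quotes escaping the literal M):

IF(
    ABS(SELECTEDMEASURE()) >= 1000000, 
    "$#,0.0,,""M""",   // scaled to millions, e.g. 12,345,678 renders as $12.3M
    "$#,0"             // e.g. 4,567 renders as $4,567
)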

 

Another key aspect of this module is to analyse and optimise the performance of semantic models. You can use Performance Analyzer to see how long each visual takes to render on the page; if the DAX query is the element causing a visual to perform slowly, you can copy the query into DAX Studio to troubleshoot it.

 

To do this, turn on Query Plan and Server Timings in DAX Studio, run the query, and observe the Formula Engine (FE) and Storage Engine (SE) timings, plus the number of queries run by the SE.

 

SE operations involve retrieving raw data from the model’s storage. These are usually fast; however, if SE time is high, consider simplifying the data model or using aggregations. FE operations involve computations performed after retrieving data. High FE time may indicate complex calculations that need optimization, such as heavy use of iterators.
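
A common illustration of the difference (the [Total Sales] measure here is hypothetical): filtering an entire table makes the formula engine iterate materialised rows, while a simple column predicate is pushed down to the storage engine:

// FE-heavy: FILTER iterates the whole Sales table row by row
Red_Sales_Slow = 
    CALCULATE([Total Sales], FILTER(Sales, Sales[Colour] = "Red"))

// SE-friendly: the column predicate compiles to an efficient scan
Red_Sales_Fast = 
    CALCULATE([Total Sales], Sales[Colour] = "Red")

The two forms differ subtly when other filters already sit on Sales[Colour] (the shorthand replaces them), but the pattern shows where FE time typically accumulates.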

 

To identify bottlenecks, look for areas where query time is concentrated—whether it's in SE or FE operations. Use this information to pinpoint which DAX expressions or data model structures are causing delays.

 

Lastly, the exam syllabus mentions incremental refresh in this module. This is essential for optimising large datasets in Power BI, as it significantly reduces the time and resources required to update data. Instead of refreshing the entire dataset, incremental refresh updates only the most recent data, leading to faster refresh times and better performance, especially for datasets that grow over time. It is configured by defining RangeStart and RangeEnd datetime parameters that filter the source query; Power BI uses these to partition the table and refresh only the newest partitions.

 

Module 3 Worked Example - Creating a Calculation Group

 

In the attached PDF, there is a guided walkthrough of creating a calculation group in Power BI Desktop.


This example is available as an interactive "practice the clicks" exercise in the DP-600 Study Guide...