Document your Fabric datasets using Semantic Link and the ChatGPT API

Documenting datasets can be a headache for Power BI developers, especially when it comes to explaining DAX measures to data professionals who may not understand DAX. In this article, I will show you how to automate the documentation of dataset components within a Fabric workspace and how to leverage the OpenAI ChatGPT API to generate explanations for the measures in those datasets.

Fabric now makes it possible to interact with Power BI datasets through Semantic Link. This is particularly beneficial for data scientists who want to query datasets with PySpark in Fabric notebooks.

A quick win that can be achieved with this new feature is the automation of dataset documentation using Fabric notebooks and the sempy.fabric package. In the following sections, I will walk through the steps to achieve this in a Fabric notebook:

Import the necessary packages:

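A minimal sketch of the imports, assuming the semantic-link (sempy) package is installed in the notebook environment:

```python
import pandas as pd

import sempy.fabric as fabric  # Semantic Link entry point
```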

Create the Pandas dataframes where we will store the different components:

Dataframe for datasets:

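A minimal sketch; fabric.list_datasets returns one row per dataset in the workspace the notebook is attached to (exact column names, such as 'Dataset Name', may vary across semantic-link versions):

```python
# One row per dataset (semantic model) in the current workspace.
datasets_df = fabric.list_datasets()
```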

Dataframe for tables:


We iterate through the list of datasets, extract the tables of each one, and append them to the tables dataframe along with the corresponding dataset name:
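A sketch of that loop, assuming the 'Dataset Name' column from the previous step:

```python
tables_df = pd.DataFrame()

for dataset_name in datasets_df["Dataset Name"]:
    # Tables of the current dataset, tagged with the dataset they belong to.
    tables = fabric.list_tables(dataset=dataset_name)
    tables["Dataset Name"] = dataset_name
    tables_df = pd.concat([tables_df, tables], ignore_index=True)
```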

Even the descriptions added at the table level can be extracted.


Dataframe of relationships between tables:

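The same pattern works for relationships, this time with fabric.list_relationships:

```python
relationships_df = pd.DataFrame()

for dataset_name in datasets_df["Dataset Name"]:
    # Relationships defined in the current dataset.
    rels = fabric.list_relationships(dataset=dataset_name)
    rels["Dataset Name"] = dataset_name
    relationships_df = pd.concat([relationships_df, rels], ignore_index=True)
```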

Dataframe of measures:

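And once more for measures, with fabric.list_measures, which returns (among other columns) the measure name and its DAX expression:

```python
measures_df = pd.DataFrame()

for dataset_name in datasets_df["Dataset Name"]:
    # Measures (name, DAX expression, ...) of the current dataset.
    measures = fabric.list_measures(dataset=dataset_name)
    measures["Dataset Name"] = dataset_name
    measures_df = pd.concat([measures_df, measures], ignore_index=True)
```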

To go further, we can use the OpenAI API to have ChatGPT explain each measure and add a 'Measure Explanation' column that can be visualized when presenting the documentation in a Power BI report.

To do this, we define the following function:

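A minimal sketch of such a function, written against the pre-1.0 openai Python SDK (newer versions expose the equivalent call as client.chat.completions.create); the model choice and the prompt wording are my assumptions:

```python
import openai

openai.api_key = "<your-openai-api-key>"  # placeholder: keep real keys in a secret store

def explain_measure(measure_name: str, dax_expression: str) -> str:
    """Ask ChatGPT for a plain-language explanation of a DAX measure."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You explain DAX measures in simple terms for non-DAX users."},
            {"role": "user",
             "content": f"Explain what the DAX measure '{measure_name}' does:\n{dax_expression}"},
        ],
    )
    return response.choices[0].message.content.strip()
```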


Now, we can iterate over the dataframe of measures to add the 'Measure Explanation' column. Note that we iterate row by row instead of using the "apply" method so the requests are sent to OpenAI one at a time, which keeps us within the API's rate limits.

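A sketch of that loop, assuming list_measures exposed 'Measure Name' and 'Measure Expression' columns; the one-second pause is an arbitrary throttle:

```python
import time

explanations = []
for _, row in measures_df.iterrows():
    explanations.append(explain_measure(row["Measure Name"], row["Measure Expression"]))
    time.sleep(1)  # one request at a time, with a pause, to respect rate limits

measures_df["Measure Explanation"] = explanations
```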

In this part, we convert the Pandas dataframes into Spark dataframes and remove the spaces from the column names, since spaces are not supported in Fabric lakehouse tables.


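A minimal sketch, relying on the spark session that Fabric notebooks provide out of the box; replacing spaces with underscores is one naming convention among several:

```python
def to_spark(pdf: pd.DataFrame):
    # Lakehouse (Delta) column names cannot contain spaces, so rename first.
    renamed = pdf.rename(columns=lambda c: c.replace(" ", "_"))
    return spark.createDataFrame(pd.DataFrame(renamed))

datasets_sdf = to_spark(datasets_df)
tables_sdf = to_spark(tables_df)
relationships_sdf = to_spark(relationships_df)
measures_sdf = to_spark(measures_df)
```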

Finally, we create the Delta tables in the lakehouse.

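A sketch of the writes; the table names below are hypothetical, so pick whatever naming scheme suits your lakehouse:

```python
# Persist each dataframe as a Delta table in the attached lakehouse.
datasets_sdf.write.format("delta").mode("overwrite").saveAsTable("doc_datasets")
tables_sdf.write.format("delta").mode("overwrite").saveAsTable("doc_tables")
relationships_sdf.write.format("delta").mode("overwrite").saveAsTable("doc_relationships")
measures_sdf.write.format("delta").mode("overwrite").saveAsTable("doc_measures")
```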

And there you have it, abracadabra!

 


We can now schedule the notebook to run periodically so that the documentation automatically picks up any changes to the structure of the datasets.

 



Now that most of the work is done, we can create a Fabric dataset from these tables within the lakehouse and define the relationships between them.



As a result, we can create a Power BI report in Direct Lake mode to visualize the automated documentation. For example, by hovering over each measure, we can see the explanation generated by ChatGPT. This will greatly simplify the life of a data scientist who may not necessarily be proficient in DAX.



Thus, we can see that quick wins can be derived from the Semantic Link feature in Fabric. This is just one example among many of what can be achieved when we harmoniously leverage the different components Fabric provides, and that's where its power as an end-to-end analytics solution truly comes into play.

 
