Document your Fabric datasets using Semantic Link and the ChatGPT API

Documenting datasets can be a headache for Power BI developers, especially when it comes to explaining DAX measures to data professionals who may not understand DAX. In this article, I will show you how to automate the documentation of dataset components within a Fabric workspace and how to leverage the OpenAI ChatGPT API to generate explanations for the measures in those datasets.

Fabric now makes it possible to interact with Power BI datasets through Semantic Link. This is particularly beneficial for data scientists who want to query datasets with PySpark in Fabric notebooks.

A quick win that can be achieved with this new feature is the automation of dataset documentation using Fabric notebooks and the sempy.fabric package. In the following sections, I will walk through the steps to achieve this in a Fabric notebook:

Import the necessary packages:

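A minimal sketch of the imports, assuming the semantic-link (sempy) package is installed in the notebook environment:

```python
import pandas as pd

import sempy.fabric as fabric  # Semantic Link entry point
```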

Create the Pandas dataframes where we will store the different components:

Dataframe for datasets:

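A minimal sketch; fabric.list_datasets returns one row per dataset in the workspace the notebook is attached to (exact column names, such as 'Dataset Name', may vary across semantic-link versions):

```python
# One row per dataset (semantic model) in the current workspace.
datasets_df = fabric.list_datasets()
```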

Dataframe for tables:


We iterate through the list of datasets, extract the tables of each one, and append them to the tables dataframe along with the corresponding dataset name:
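A sketch of that loop, assuming the 'Dataset Name' column from the previous step:

```python
tables_df = pd.DataFrame()

for dataset_name in datasets_df["Dataset Name"]:
    # Tables of the current dataset, tagged with the dataset they belong to.
    tables = fabric.list_tables(dataset=dataset_name)
    tables["Dataset Name"] = dataset_name
    tables_df = pd.concat([tables_df, tables], ignore_index=True)
```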

Even the descriptions added at the table level can be extracted.


Dataframe of relationships between tables:

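The same pattern works for relationships, this time with fabric.list_relationships:

```python
relationships_df = pd.DataFrame()

for dataset_name in datasets_df["Dataset Name"]:
    # Relationships defined in the current dataset.
    rels = fabric.list_relationships(dataset=dataset_name)
    rels["Dataset Name"] = dataset_name
    relationships_df = pd.concat([relationships_df, rels], ignore_index=True)
```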

Dataframe of measures:

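And once more for measures, with fabric.list_measures, which returns (among other columns) the measure name and its DAX expression:

```python
measures_df = pd.DataFrame()

for dataset_name in datasets_df["Dataset Name"]:
    # Measures (name, DAX expression, ...) of the current dataset.
    measures = fabric.list_measures(dataset=dataset_name)
    measures["Dataset Name"] = dataset_name
    measures_df = pd.concat([measures_df, measures], ignore_index=True)
```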

To go further, we can use the OpenAI API to have ChatGPT explain each measure and add a 'Measure Explanation' column that can be visualized when presenting the documentation in a Power BI report.

To do this, we define the following function:

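A minimal sketch of such a function, written against the pre-1.0 openai Python SDK (newer versions expose the equivalent call as client.chat.completions.create); the model choice and the prompt wording are my assumptions:

```python
import openai

openai.api_key = "<your-openai-api-key>"  # placeholder: keep real keys in a secret store

def explain_measure(measure_name: str, dax_expression: str) -> str:
    """Ask ChatGPT for a plain-language explanation of a DAX measure."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You explain DAX measures in simple terms for non-DAX users."},
            {"role": "user",
             "content": f"Explain what the DAX measure '{measure_name}' does:\n{dax_expression}"},
        ],
    )
    return response.choices[0].message.content.strip()
```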


Now, we can iterate over the dataframe of measures to add the 'Measure Explanation' column. Note that we iterate row by row instead of using the "apply" method so the requests are sent to OpenAI one at a time, which keeps us within the API's rate limits.

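A sketch of that loop, assuming list_measures exposed 'Measure Name' and 'Measure Expression' columns; the one-second pause is an arbitrary throttle:

```python
import time

explanations = []
for _, row in measures_df.iterrows():
    explanations.append(explain_measure(row["Measure Name"], row["Measure Expression"]))
    time.sleep(1)  # one request at a time, with a pause, to respect rate limits

measures_df["Measure Explanation"] = explanations
```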

In this part, we convert the Pandas dataframes into Spark dataframes and remove the spaces from the column names, since spaces are not supported in Fabric lakehouse tables.


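A minimal sketch, relying on the spark session that Fabric notebooks provide out of the box; replacing spaces with underscores is one naming convention among several:

```python
def to_spark(pdf: pd.DataFrame):
    # Lakehouse (Delta) column names cannot contain spaces, so rename first.
    renamed = pdf.rename(columns=lambda c: c.replace(" ", "_"))
    return spark.createDataFrame(pd.DataFrame(renamed))

datasets_sdf = to_spark(datasets_df)
tables_sdf = to_spark(tables_df)
relationships_sdf = to_spark(relationships_df)
measures_sdf = to_spark(measures_df)
```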

Finally, we create the Delta tables in the lakehouse.

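A sketch of the writes; the table names below are hypothetical, so pick whatever naming scheme suits your lakehouse:

```python
# Persist each dataframe as a Delta table in the attached lakehouse.
datasets_sdf.write.format("delta").mode("overwrite").saveAsTable("doc_datasets")
tables_sdf.write.format("delta").mode("overwrite").saveAsTable("doc_tables")
relationships_sdf.write.format("delta").mode("overwrite").saveAsTable("doc_relationships")
measures_sdf.write.format("delta").mode("overwrite").saveAsTable("doc_measures")
```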

And there you have it, abracadabra!

 


We can now schedule the notebook to run periodically so that the documentation automatically picks up any changes to the structure of the datasets.

 



Now that most of the work is done, we can create a Fabric dataset from these tables within the lakehouse and define the relationships between them.



As a result, we can create a Power BI report in Direct Lake mode to visualize the automated documentation. For example, by hovering over each measure, we can see the explanation generated by ChatGPT. This will greatly simplify the life of a data scientist who may not necessarily be proficient in DAX.



Thus, we can see that quick wins can be derived from the Semantic Link feature in Fabric. This is just one example among many of what can be achieved when we harmoniously leverage the different components Fabric provides, and that's where its power as an end-to-end analytics solution truly comes into play.

 
