Document your Fabric datasets using semantic link and the ChatGPT API
Documenting datasets can be a headache for Power BI developers, especially when it comes to explaining DAX measures to data professionals who may not understand DAX. In this article, I will show you how to automate the documentation of dataset components.
Fabric now provides the ability to interact with Power BI datasets programmatically through semantic link.
A quick win that can be achieved with this new feature is the automation of dataset documentation using Fabric notebooks and the sempy.fabric package. In the following sections, I will explain the steps to achieve this in a Fabric notebook:
Import the necessary packages:
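A minimal set of imports for this walkthrough might look like the following; sempy.fabric ships with Fabric notebooks, so this cell runs only inside the Fabric runtime:

```python
# sempy.fabric is the semantic link package available in Fabric notebooks;
# pandas holds the intermediate results, and openai is used later
# to generate measure explanations.
import sempy.fabric as fabric
import pandas as pd
import openai
```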
Create Pandas dataframes where we will store the different components:
Dataframe for datasets:
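Semantic link exposes the workspace contents directly as pandas dataframes, so listing the datasets is a one-liner (assuming the notebook is attached to the right workspace; the "Dataset Name" column referenced later is the one semantic link returns):

```python
import sempy.fabric as fabric

# One row per dataset in the current workspace,
# including the "Dataset Name" column used in the loops below.
datasets_df = fabric.list_datasets()
```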
Dataframe for tables:
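A sketch of the empty accumulator; the exact column names are an assumption, mirroring what `list_tables` returns plus the dataset name we add ourselves:

```python
import pandas as pd

# Accumulator for the tables of every dataset; filled in the loop below.
tables_df = pd.DataFrame(columns=["Dataset Name", "Table Name", "Description"])
```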
We will iterate through the list of datasets, extract each dataset's list of tables, and append them to the tables dataframe along with the corresponding dataset name:
Even the descriptions added at the table level can be extracted.
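The loop might look like this; `list_tables` returns the table names and, when the modeler has set them, their descriptions (column names here are assumptions based on semantic link's output):

```python
import pandas as pd
import sempy.fabric as fabric

tables_df = pd.DataFrame()
for dataset_name in fabric.list_datasets()["Dataset Name"]:
    # Tables of this dataset, including any descriptions set at the table level.
    ds_tables = fabric.list_tables(dataset_name)
    ds_tables["Dataset Name"] = dataset_name
    tables_df = pd.concat([tables_df, ds_tables], ignore_index=True)
```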
Dataframe of relationships between tables:
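The same pattern works for relationships; `list_relationships` returns the from/to table and column pairs together with the cardinality of each relationship:

```python
import pandas as pd
import sempy.fabric as fabric

relationships_df = pd.DataFrame()
for dataset_name in fabric.list_datasets()["Dataset Name"]:
    # From/to table-column pairs plus cardinality for each relationship.
    rels = fabric.list_relationships(dataset_name)
    rels["Dataset Name"] = dataset_name
    relationships_df = pd.concat([relationships_df, rels], ignore_index=True)
```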
Dataframe of measures:
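And once more for the measures, which is where the DAX expressions we want to explain come from:

```python
import pandas as pd
import sempy.fabric as fabric

measures_df = pd.DataFrame()
for dataset_name in fabric.list_datasets()["Dataset Name"]:
    # Measure name and DAX expression for each measure in the dataset.
    measures = fabric.list_measures(dataset_name)
    measures["Dataset Name"] = dataset_name
    measures_df = pd.concat([measures_df, measures], ignore_index=True)
```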
To go further, we can use the OpenAI API to employ ChatGPT to explain each measure and add a 'Measure Explanation' column that can be visualized when presenting the documentation in a Power BI report.
To do this, we define the following function:
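A sketch of such a function, written against the pre-1.0 openai package; the model choice, prompt, and key handling are assumptions (in practice, load the key from a secret store rather than hard-coding it):

```python
import openai

openai.api_key = "<your-api-key>"  # assumption: load from a secret store in practice

def explain_measure(expression: str) -> str:
    """Ask ChatGPT for a plain-language explanation of a DAX measure."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You explain DAX measures to readers who do not know DAX."},
            {"role": "user",
             "content": f"Explain this DAX measure in one short paragraph: {expression}"},
        ],
    )
    return response.choices[0].message.content
```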
Now, we can iterate over the dataframe of measures to add the 'Measure Explanation' column. Please note that we iterate instead of using the "apply" method because requests must be sent to OpenAI one at a time.
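Assuming an `explain_measure` helper that wraps the OpenAI call, and a "Measure Expression" column in the measures dataframe (both assumptions for this sketch), the loop might look like:

```python
explanations = []
for _, row in measures_df.iterrows():
    # One request at a time, to respect OpenAI's rate limits.
    explanations.append(explain_measure(row["Measure Expression"]))
measures_df["Measure Explanation"] = explanations
```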
In this part, we will convert the Pandas dataframes into Spark dataframes and remove the spaces from the column names, since spaces are not supported in Fabric lakehouse table columns.
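The renaming can be done on the pandas side before handing the dataframe to Spark; a small helper, with the Spark conversion shown as it would appear in the notebook (where `spark` is the session Fabric provides):

```python
import pandas as pd

def strip_spaces(df: pd.DataFrame) -> pd.DataFrame:
    """Replace spaces in column names, which lakehouse tables do not accept."""
    return df.rename(columns={c: c.replace(" ", "_") for c in df.columns})

# In the notebook, `spark` is the session provided by the Fabric runtime:
# measures_sdf = spark.createDataFrame(strip_spaces(measures_df))
```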
Finally, we will create the DeltaTables in the lakehouse.
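Continuing from the Spark dataframes created in the previous step (the table names here are my own choice, not prescribed by Fabric):

```python
# Write each Spark dataframe to the attached lakehouse as a Delta table.
for name, sdf in [("documentation_datasets", datasets_sdf),
                  ("documentation_tables", tables_sdf),
                  ("documentation_relationships", relationships_sdf),
                  ("documentation_measures", measures_sdf)]:
    sdf.write.mode("overwrite").format("delta").saveAsTable(name)
```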
And there you have it, abracadabra!
We can now schedule the notebook to run periodically, so the documentation keeps up with changes to the structure of the different datasets.
Now that the majority of the work has been done, we can create a Fabric dataset from these tables within the lakehouse and define the relationships between the tables.
As a result, we can create a Power BI report using Direct Lake to visualize the automated documentation. For example, by hovering over each measure, we can see the explanation generated by ChatGPT. This will greatly simplify the life of a data scientist who may not necessarily be proficient in DAX.
Thus, we can see that quick wins can be derived from the semantic link provided by Fabric. This is just one example among many of what can be achieved when we harmoniously leverage the different components provided by Fabric, and that's where its power as an end-to-end analytics solution truly comes into play.