How Microsoft Fabric Helps You Build Smarter Insights with Semantic Models

Sahir_Maharaj · 02-03-2025

Not long ago, I was mentoring a group of junior data scientists who were full of energy and eager to tackle real-world challenges. As we worked on some practical projects, one thing became clear: many of them struggled to figure out how their datasets connected. This was a big hurdle for them, but it wasn’t surprising to me. I’ve seen even experienced professionals get tripped up trying to make sense of relationships between tables. It reminded me of a lesson I’ve learned over the years: having lots of data isn’t enough. What really matters is understanding how that data fits together.

It can feel like trying to solve a puzzle when you’re missing half the pieces - but here’s the thing: it doesn’t have to be that hard. Tools like Microsoft Fabric make this process much simpler. With features like SemPy, you can explore, map out, and double-check these connections in your data. It’s about more than saving time; it’s about gaining the confidence to know your data is solid and ready to drive meaningful results.

What you will learn: In this edition, we’re exploring data relationships and how to make sense of them using Microsoft Fabric and the SemPy library. By the time you’re done with this, you’ll have a clear approach to mapping out your data, visualizing those connections, and making sure everything checks out. And because no dataset is perfect, we’ll also dive into validation - making sure your data is as solid as you need it to be.

Read Time: 9 minutes

Source: Sahir Maharaj

At its core, exploring relationships in data is about connecting the dots. Think of your dataset as a web, with each table acting as a node and relationships forming the threads. For someone new to data analysis, understanding this web can feel overwhelming. But this foundational step is critical. Relationships between tables define how data flows, how queries retrieve results, and ultimately how insights are generated. Without this understanding, analysis often becomes guesswork.

In Microsoft Fabric, the list_relationships function in the SemPy library is your starting point. This tool doesn’t just give you a dry list of connections; it presents a clear, organized picture of how your tables and fields interact. For someone new to the field, this clarity is empowering. It’s like turning on the lights in a dimly lit room: suddenly, everything starts to make sense. You can see how your tables relate, identify gaps, and even uncover hidden opportunities within your data.

Source: Microsoft Learn

For example, let’s say you’re working with sales data. You might have tables for Sales, Products, and Customers, but understanding how these interact is key. By running a simple SemPy command, you can see that the Sales table links to Products through a Product ID, and to Customers through a Customer ID. This kind of clarity lets you avoid making mistakes in your queries and ensures your analysis reflects real-world scenarios. What might initially seem like a jumbled collection of tables becomes an organized structure with meaningful connections. This understanding gives you a solid foundation to tackle more complex tasks, like visualization and validation, which we’ll explore next.
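To make the Sales example concrete, here is a minimal sketch of what relationship metadata lets you do once you have it. It does not call SemPy; the records are written by hand, and the column names ("From Table", "From Column", "To Table", "To Column") and the join_keys helper are assumptions made for illustration.

```python
# Hypothetical relationship metadata, one record per relationship.
# The column names are an assumption, chosen to resemble relationship tables.
RELATIONSHIPS = [
    {"From Table": "Sales", "From Column": "ProductID",
     "To Table": "Products", "To Column": "ProductID"},
    {"From Table": "Sales", "From Column": "CustomerID",
     "To Table": "Customers", "To Column": "CustomerID"},
]


def join_keys(relationships, from_table, to_table):
    """Return (from_column, to_column) linking two tables, or None if unlinked."""
    for rel in relationships:
        if rel["From Table"] == from_table and rel["To Table"] == to_table:
            return rel["From Column"], rel["To Column"]
    return None


print(join_keys(RELATIONSHIPS, "Sales", "Products"))      # ('ProductID', 'ProductID')
print(join_keys(RELATIONSHIPS, "Sales", "Customers"))     # ('CustomerID', 'CustomerID')
print(join_keys(RELATIONSHIPS, "Products", "Customers"))  # None
```

Looking up the join columns this way, rather than guessing them, is what keeps a query from silently joining on the wrong field.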

When I’m working on a complex dataset, one of my favorite steps is visualizing the relationships. A table of relationships is useful, but a visual representation takes things to a whole new level. It’s like seeing the forest instead of just the trees. The plot_relationship_metadata function in SemPy makes this process easy. It helps me transform text-based data into clear, intuitive graphs that highlight patterns and even anomalies.

Let me share how I approach this. Recently, I was working on a dataset for a freelance client, and I wanted to explain how their Sales, Customers, and Products tables were interconnected. I used the following code to create a visualization:

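# Imports assumed for a Fabric notebook: pandas, plus SemPy's plotting helper
import pandas as pd
from sempy.relationships import plot_relationship_metadata

# Illustrative relationship metadata: one entry per relationship.
# Multiplicity labels are kept as written; confirm the notation your SemPy version accepts.
data = {
    "Multiplicity": ["ONE_TO_MANY", "MANY_TO_ONE", "ONE_TO_ONE", "MANY_TO_MANY"],
    "From Table": ["Customers", "Orders", "Products", "Suppliers"],
    "From Column": ["CustomerID", "OrderID", "ProductID", "SupplierID"],
    "To Table": ["Orders", "Customers", "Orders", "Products"],
    "To Column": ["CustomerID", "OrderID", "ProductID", "SupplierID"]
}

# Build the DataFrame that holds the relationship metadata
df = pd.DataFrame(data)

# Plot the metadata as a graph: tables become nodes, relationships become edges
plot_relationship_metadata(df)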

The function created a graph where each table appeared as a node, and the relationships were represented by edges. For instance, the graph showed a link between Sales and Customers through the Customer ID. This wasn’t just helpful for me. When I shared it in a team meeting, it instantly clarified the data flow for everyone in the room. And the best part - stakeholders who didn’t have a technical background could easily follow along, and it sparked great discussions about how the data was being used.

Source: Sahir Maharaj

But wait - the real value came when I noticed something missing. There was no direct connection between Products and Sales, which raised a red flag. It turned out that some critical data was missing from the pipeline. By iterating on the visualization, I was able to identify and address the issue before it affected our analysis. I’ve learnt in my career that visualization isn’t just about making data look good; it’s a tool for discovery. It helps uncover gaps, validate assumptions, and refine one’s understanding of the dataset. Each time I create a new graph, I learn something new about the data, which makes my analysis not just more accurate but also more impactful.
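Spotting a missing link by eye works for a handful of tables, but the same check can be written down. The sketch below is an illustration only, not part of SemPy: it compares the relationships you have against the table pairs you expect to be connected. The missing_links helper and the expected_pairs list are hypothetical names.

```python
def missing_links(relationships, expected_pairs):
    """Return expected table pairs that have no relationship in either direction."""
    linked = {
        frozenset((rel["From Table"], rel["To Table"]))
        for rel in relationships
    }
    return [pair for pair in expected_pairs if frozenset(pair) not in linked]


# Hypothetical metadata: only the Sales-to-Customers relationship exists.
relationships = [
    {"From Table": "Sales", "To Table": "Customers"},
]

# Pairs the analyst expects to be connected (an assumption for this sketch).
expected_pairs = [("Sales", "Customers"), ("Sales", "Products")]

print(missing_links(relationships, expected_pairs))  # [('Sales', 'Products')]
```

Using frozenset makes the comparison direction-agnostic, so a relationship stored as Customers to Sales still counts as a link between the two tables.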

Ok, so understanding and visualizing relationships is a great start, but validation ensures they hold up under scrutiny. The list_relationship_violations function identifies inconsistencies in your relationships, such as missing foreign keys or mismatched data types. Validation is your safeguard against erroneous insights. Let’s say you’re running an analysis, and you notice that some sales entries don’t have matching customer information. It’s frustrating because without clean, consistent data, your results can be misleading. I’ve encountered this scenario more times than I can count, and every time, I’m reminded of the importance of validating relationships in the data.

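import pandas as pd

# Self-contained mock so the example runs outside Fabric. In a Fabric notebook,
# read_table comes from sempy.fabric and list_relationship_violations from
# sempy.relationships; the versions below are simplified stand-ins, not
# SemPy's implementation.

def read_table(dataset, table_name):
    """Return hard-coded sample data in place of a real semantic model table."""
    if table_name == "Sales":
        return pd.DataFrame({
            "SaleID": [1, 2, 3, 4],
            "CustomerID": [101, 102, 103, 104],
            "Amount": [250, 400, 300, 500]
        })
    elif table_name == "Customers":
        return pd.DataFrame({
            "CustomerID": [101, 102, 105],
            "Name": ["Alice", "Bob", "Charlie"]
        })
    # Fail loudly instead of silently returning None for an unknown table
    raise ValueError(f"Unknown table: {table_name}")

class Fabric:
    @staticmethod
    def read_table(dataset, table_name):
        return read_table(dataset, table_name)

    @staticmethod
    def list_relationship_violations(tables):
        sales = tables["Sales"]
        customers = tables["Customers"]

        # Keep the Sales rows whose CustomerID does not exist in Customers
        invalid_sales = sales[~sales["CustomerID"].isin(customers["CustomerID"])]
        return invalid_sales

fabric = Fabric()

# Load tables
tables = {
    "Sales": fabric.read_table("my_dataset", "Sales"),
    "Customers": fabric.read_table("my_dataset", "Customers")
}

# Find violations: SaleIDs 3 and 4 reference CustomerIDs 103 and 104,
# which are absent from the Customers table
violations = fabric.list_relationship_violations(tables)
print("Violations:")
print(violations)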

This simple piece of code highlights potential problems, such as rows in the Sales table that don’t match any Customer ID in the Customers table. Once I identify these issues, I can dig deeper to figure out what went wrong. Maybe some data entries were missed during an upload, or perhaps there’s a mismatch in the data pipeline. Either way, addressing these problems ensures that the data I’m working with is accurate and trustworthy.
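When digging into what went wrong, it helps to look at both sides of the relationship. The following sketch reuses the CustomerID values from the example above and reports the share of distinct foreign-key values that match, the values with no match, and the keys that are never referenced. The key_coverage helper is hypothetical and uses plain Python sets rather than SemPy.

```python
def key_coverage(child_keys, parent_keys):
    """Compare foreign-key values against the primary keys they should match."""
    child, parent = set(child_keys), set(parent_keys)
    orphans = sorted(child - parent)   # referenced in the child, missing in the parent
    unused = sorted(parent - child)    # defined in the parent, never referenced
    # Coverage is measured over distinct key values, not over rows.
    coverage = len(child & parent) / len(child) if child else 1.0
    return {"coverage": coverage, "orphans": orphans, "unused": unused}


sales_customer_ids = [101, 102, 103, 104]  # Sales.CustomerID from the example
customer_ids = [101, 102, 105]             # Customers.CustomerID from the example

report = key_coverage(sales_customer_ids, customer_ids)
print(report)  # {'coverage': 0.5, 'orphans': [103, 104], 'unused': [105]}
```

Seeing an unused key like 105 next to orphans like 103 and 104 can hint at mis-keyed entries rather than records that were never loaded, which is a different fix.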

Source: Sahir Maharaj

Whenever you find relationship violations in your data, take a step back and ask yourself a few important questions. Are there gaps in the process that allowed these errors to slip through? Could better checks or automation help you catch these issues earlier? Instead of just patching the immediate problem, focus on uncovering the root cause. This way, you’re not just solving a one-time issue; you’re building a more reliable and resilient data system.
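One way to catch these issues earlier is to run the check automatically whenever data is loaded and stop the step if it fails. This is a rough sketch under stated assumptions: tables are plain lists of dictionaries, the relationship records use hypothetical column names, and check_relationships and require_clean are illustrative helpers, not SemPy functions.

```python
def check_relationships(tables, relationships):
    """Return one message per relationship whose foreign keys lack a match."""
    problems = []
    for rel in relationships:
        parent_keys = {row[rel["To Column"]] for row in tables[rel["To Table"]]}
        child_keys = {row[rel["From Column"]] for row in tables[rel["From Table"]]}
        orphans = sorted(child_keys - parent_keys)
        if orphans:
            problems.append(
                f'{rel["From Table"]}.{rel["From Column"]} has no match in '
                f'{rel["To Table"]}.{rel["To Column"]} for values {orphans}'
            )
    return problems


def require_clean(tables, relationships):
    """Raise ValueError so a pipeline step stops when any relationship is violated."""
    problems = check_relationships(tables, relationships)
    if problems:
        raise ValueError("; ".join(problems))


# Hypothetical sample data: sale 2 points at a customer that does not exist.
tables = {
    "Sales": [{"SaleID": 1, "CustomerID": 101}, {"SaleID": 2, "CustomerID": 103}],
    "Customers": [{"CustomerID": 101}, {"CustomerID": 102}],
}
relationships = [
    {"From Table": "Sales", "From Column": "CustomerID",
     "To Table": "Customers", "To Column": "CustomerID"},
]

try:
    require_clean(tables, relationships)
except ValueError as err:
    print(f"Load blocked: {err}")
```

Raising an error instead of printing a warning is a design choice: it forces the root cause to be looked at before the bad rows reach a report.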

Now, it’s time for you to take the reins and try these techniques yourself. Open up Microsoft Fabric, explore your datasets, and see how the relationships between your tables can unlock new insights. Don’t worry if it feels a little overwhelming at first; that’s completely normal when you’re learning something new. The important part is to explore, experiment, and ask questions along the way. Think about the power you’ll gain by mastering Microsoft Fabric. Imagine confidently walking into a meeting with stakeholders and showing them not only what your data says but how it all connects. You’ll not only solve problems faster but also gain trust and credibility as someone who truly understands the story behind the data.
