We mirrored one of our Azure SQL databases to Fabric using the Mirrored Azure SQL Database item. One of the tables has about 500M records. In a separate workspace from the mirrored database, we have a Lakehouse in which we created shortcuts to the mirrored tables.
We query the large table using PySpark with a date filter, to load all records for one or more dates into a DataFrame.
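Roughly what the query looks like; this is only a sketch, the table and column names (`sales_events`, `event_date`) are placeholders rather than our actual schema, and `spark` is the session pre-provisioned in a Fabric notebook:

```python
from pyspark.sql import functions as F

# Read the shortcut to the mirrored table and filter to a single date.
# Table and column names are placeholders for illustration.
df = (
    spark.read.table("my_lakehouse.sales_events")
    .filter(F.col("event_date") == "2024-06-01")
)
df.count()
```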
What we are noticing is that the filter triggers a table scan over all 500M records. Although this only takes a couple of minutes, we would like to avoid the scan. Partitioning the mirrored table should prevent it and should also reduce the capacity units (CUs) consumed by the task.
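For context on why partitioning helps: with an ordinary Delta table that is physically partitioned on the date column, Spark prunes partitions and only reads the folders matching the filter instead of scanning everything. A minimal sketch of that behavior, again with hypothetical names (`df_source` stands in for the data being written):

```python
# Write a copy of the data as a Delta table partitioned on the filter column.
(
    df_source.write.format("delta")
    .partitionBy("event_date")      # physical partitioning on the date column
    .mode("overwrite")
    .saveAsTable("my_lakehouse.sales_events_partitioned")
)

# A filter on the partition column now only reads the matching partition,
# avoiding the full 500M-row scan.
spark.read.table("my_lakehouse.sales_events_partitioned") \
    .filter("event_date = '2024-06-01'") \
    .count()
```

This is exactly the pruning behavior we would like to get on the mirrored table itself, without maintaining a separate partitioned copy.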
I would love to be able to configure partitions on a mirrored table. This should be possible both in the initial phase, when creating the mirrored database, and afterwards. When the partitioned table is initialized, the partitions are applied immediately. When a partition is added later, while a sync is already active, the table would probably have to resync into the new partition layout.