Dataflow Gen2 now supports Continuous Integration/Continuous Deployment (CI/CD) and Git integration. This feature allows you to create, edit, and manage dataflows in a Git repository that's connected to your Fabric workspace. Additionally, you can use the deployment pipelines feature to automate the deployment of dataflows from your workspace to other workspaces. This article explains how to use Dataflow Gen2 with CI/CD and Git integration in Fabric Data Factory.
Important
Git integration and deployment pipeline (CI/CD) for Dataflows Gen2 in Data Factory for Microsoft Fabric are currently in public preview. This information relates to a pre-release product that may be substantially modified before it's released. Microsoft makes no warranties, expressed or implied, with respect to the information provided here.
New features
With Dataflow Gen2 (CI/CD preview), you can now:
- Use Git integration support for Dataflow Gen2.
- Use the deployment pipelines feature to automate the deployment of dataflows from your workspace to other workspaces.
- Use the Fabric settings and scheduler to refresh and edit settings for Dataflow Gen2.
- Create your Dataflow Gen2 directly in a workspace folder.
Prerequisites
To get started, you must complete the following prerequisites:
- Have a Microsoft Fabric tenant account with an active subscription. Create an account for free.
- Make sure you have a Microsoft Fabric enabled workspace.
- To use Git integration, make sure it's enabled for your workspace. To learn more about enabling Git integration, go to Get started with Git integration.
Create a Dataflow Gen2 with CI/CD and Git support
To create a Dataflow Gen2 with CI/CD and Git support, follow these steps:
In the Fabric workspace, select Create new item and then select Dataflow Gen2.
Give your dataflow a name and enable the Git integration. Then select Create.
The dataflow is created and you're redirected to the dataflow authoring canvas. You can now start creating your dataflow.
When you're done, select Save and run.
After you save, the dataflow has a status of Uncommitted in the workspace view.
To commit the dataflow to the Git repository, select the source control icon in the top right corner of the workspace view.
Select all the changes you want to commit and then select Commit.
You now have a Dataflow Gen2 with CI/CD and Git support. We suggest you follow the best practices for working with CI/CD and Git integration in Fabric described in the Scenario 2 - Develop using another workspace tutorial.
Refresh a Dataflow Gen2 or schedule a refresh
You can refresh a Dataflow Gen2 with CI/CD and Git support in two ways: manually or on a schedule. The following sections describe both options.
Refresh now
In the Fabric workspace, select the more options ellipsis icon next to the dataflow you want to refresh.
Select Refresh now.
Schedule a refresh
If your dataflow needs to be refreshed on a regular interval, you can schedule the refresh using the Fabric scheduler.
In the Fabric workspace, select the more options ellipsis icon next to the dataflow you want to refresh.
Select Schedule.
On the schedule page, set the refresh frequency, the start time, and the end time, and then apply your changes.
To start the refresh now, select the Refresh button.
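To reason about how many refreshes a given combination of frequency, start time, and end time produces, the following minimal Python sketch lists the planned run times. It only illustrates the arithmetic. It is not the Fabric scheduler's implementation or API: the function name `planned_refresh_times` is hypothetical, and treating the end time as inclusive is an assumption.

```python
from datetime import datetime, timedelta
from typing import List


def planned_refresh_times(start: datetime, end: datetime, every: timedelta) -> List[datetime]:
    """List run times from start up to end, spaced by the given interval.

    Assumption: the end time is inclusive. The real scheduler may differ.
    """
    if every <= timedelta(0):
        raise ValueError("every must be a positive interval")
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += every
    return times


# Example: a refresh every 12 hours over a one-day window.
runs = planned_refresh_times(
    start=datetime(2025, 1, 1, 6, 0),
    end=datetime(2025, 1, 2, 6, 0),
    every=timedelta(hours=12),
)
for run in runs:
    print(run.isoformat())
```

With these example values, the window contains three run times: the start, the midpoint, and the end.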
Refresh history and settings
To view the refresh history of a dataflow, either select the Recent runs tab in the dropdown menu, or go to the monitor hub and select the dataflow whose refresh history you want to view.
Settings for Dataflow Gen2 with CI/CD
You access the settings of the new Dataflow Gen2 with CI/CD and Git support the same way as for any other Fabric item: select the more options ellipsis icon next to the dataflow, and then select Settings.
Saving replaces the publish operation
In Dataflow Gen2 with CI/CD and Git support, the save operation replaces the publish operation. When you save your dataflow, the changes are automatically "published" to the dataflow. This is a significant change from the previous version of Dataflow Gen2, where you had to explicitly publish your changes. The save operation directly overwrites the dataflow in the workspace. If you want to discard your changes, select Discard changes when you close the editor.
During the save operation, we also check whether the dataflow is in a valid state. If the dataflow isn't in a valid state, an error message is shown in the dropdown menu in the workspace view.
We determine the validity of the dataflow by running a "zero row" evaluation for all the queries in the dataflow. This means that we run every query in a manner that only requests the schema of the query result, without returning any rows. If a query evaluation fails or a query's schema can't be determined within 10 minutes, validation fails and the previously saved version of the dataflow is used for refreshes.
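The validation rule described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the service's actual implementation: the names `SchemaResult`, `validate_dataflow`, and `version_used_for_refresh` are hypothetical, and the 10-minute limit is modeled by comparing a reported elapsed time instead of cancelling a running evaluation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

VALIDATION_TIMEOUT_SECONDS = 10 * 60  # the 10-minute limit from the text


@dataclass
class SchemaResult:
    columns: List[str]       # schema only; a zero-row evaluation returns no rows
    elapsed_seconds: float   # how long the schema request took (simulated here)


# Hypothetical type: a callable that evaluates one query for its schema only.
SchemaEvaluator = Callable[[], SchemaResult]


def validate_dataflow(queries: Dict[str, SchemaEvaluator]) -> List[str]:
    """Return validation errors; an empty list means the dataflow is valid."""
    errors = []
    for name, evaluate_schema in queries.items():
        try:
            result = evaluate_schema()
        except Exception as exc:  # a failed query evaluation fails validation
            errors.append(f"{name}: evaluation failed ({exc})")
            continue
        if result.elapsed_seconds > VALIDATION_TIMEOUT_SECONDS:
            errors.append(f"{name}: schema not determined within 10 minutes")
    return errors


def version_used_for_refresh(saved: str, previous: str, errors: List[str]) -> str:
    """Refreshes use the newly saved version only when validation passed."""
    return previous if errors else saved


def _failing_query() -> SchemaResult:
    raise RuntimeError("data source not reachable")


# Example: one healthy query, one that is too slow, and one that fails.
example_queries: Dict[str, SchemaEvaluator] = {
    "Customers": lambda: SchemaResult(["Id", "Name"], elapsed_seconds=2.0),
    "Orders": lambda: SchemaResult(["Id"], elapsed_seconds=601.0),
    "Broken": _failing_query,
}
example_errors = validate_dataflow(example_queries)
print(example_errors)
print(version_used_for_refresh("new version", "previous version", example_errors))
```

In this example, two of the three queries fail validation, so the sketch falls back to the previously saved version, which mirrors the behavior described for refreshes.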
Limitations and known issues
Dataflow Gen2 with CI/CD and Git support offers a powerful set of features for enterprise-ready collaboration, but delivering it required us to rebuild the backend on the Fabric architecture. As a result, some features aren't yet available or have limitations. We're actively working on improving the experience and will update this article as new features are added.
- When you delete the last Dataflow Gen2 with CI/CD and Git support in a workspace, the staging artifacts become visible in the workspace. You can safely delete them.
- The workspace view doesn't show whether a refresh is in progress for the dataflow.
- When branching out to another workspace, a Dataflow Gen2 refresh might fail with the message that the staging lakehouse couldn't be found. When this happens, create a new Dataflow Gen2 with CI/CD and Git support in the workspace to trigger the creation of the staging lakehouse. After this, all other dataflows in the workspace should start to function again.
- When you sync changes from Git into the workspace, you need to open the new or updated dataflow and save the changes manually in the editor. This triggers a publish action in the background so that the changes are used when your dataflow refreshes.