Near-real-time on-premises data in Fabric without any coding

Yes, it is true.

It is now possible to get on-premises SQL data into Fabric in near-real-time without any coding.

THANK YOU, Fabric team πŸ™‚

For me this is a real game changer.

Until now, if you wanted to delta load data from an on-premises SQL database into Fabric, you would have to create a pipeline with logic to keep track of timestamps, watermarks, …, call it what you will, but basically the last time you last checked, and then you could get all records inserted, updated or deleted since then.
In theory simple, but still something that could cost several hours or more before it was tested and delivered.

But now in Fabric Microsoft have created the feature named Mirrored SQL Server (announcement), which enables you to mirror selected tables from an on-premises database into Fabric, without any coding or difficult configuration.
When selecting the database and tables to be mirrored Microsoft enables Change Data Capture (CDC) on the on-premises database and tables, set up a schedule, and that is all. Done.
Now changes in the on-premises tables are available in Fabric within 1-2 minutes.
That is very much near-real-time compared to running pipelines.

In this blog post I will show you have how easy it is to set it up using the AdventureWorks sample database.
Microsoft have this article which describes the steps, but it is not that detailed.

I have also done the setup using a Dynamics AX2012R3 database and that was exactly the same experience, easy and similar straight forward.

Prepare on-premises database

The AdventureWorks database I am using is the AdventureWorksDW2022 database which can be downloaded from here.

Create login to be used from Fabric

In this demonstration I am using a SQL login as my demo domain is not integrated into Azure, but preferable you should use a Microsoft Entra authenticated login.

USE master;
CREATE LOGIN [FabricUser]
    WITH PASSWORD=N'Password101',
    DEFAULT_DATABASE=[master],
    CHECK_EXPIRATION=OFF,
    CHECK_POLICY=OFF;

Associated the Fabric login to the AdventureWorksDW2022 database

The sysadmin role is only needed during the initial configuration where CDC is enabled on the database and the two SQL agent jobs to execute CDC are created.

USE master;
ALTER SERVER ROLE [sysadmin] ADD MEMBER [FabricUser];

After the initial setup meaning selecting the initial tables, the sysadmin role is no longer needed.

From then on, the db_owner role on the on-premises database is sufficient to add additional tables.

USE AdventureWorksDW2022;
CREATE USER [FabricUser] FOR LOGIN [FabricUser];
ALTER ROLE [db_owner] ADD MEMBER [FabricUser];

Create a workspace with Fabric capacity

In this demonstration I am using a Fabric F2 capacity to see if it really works with the lowest possible Fabric capacity.

Create connection for the Mirrored SQL Server feature

Before starting configuring the mirroring of the on-premises database we will setup the connection the be used.

In Microsoft’s article they describe how creating the connection can be done as part of creating the mirroring, but in case of trouble getting access it is more convenient to break the setup in 2 parts, first setup the connection, and then configure the mirroring.

In the Fabric portal click on the gear symbol in upper right corner and select Manage connections and gateways.

Click New to create a new connection.

Fill in the parameters and click Create.

After the connection has been created successfully click Close.

Create Mirrored SQL Server

In the Fabric workspace create the new item Mirrored SQL Server.

Chose a database connection and here we will choose the one that we just created.

In the Choose data dialog we will select the following tables to be mirrored and then click Create:

  • dbo.DimCustomer
  • dbo.DimDate
  • dbo.DimProduct
  • dbo.DimProductCategory
  • dbo.DimProductSubcategory
  • dbo.FactInternetSales

Type the name of the destination database in Fabric and click Create mirrored database.

Fabric will now start creating and configuring the necessary artifacts.

It will take some time to get the initial data loaded.

When Fabric has completed the setup, you can see that it has enabled CDC on the on-premises database and created the necessary SQL agent jobs to run CDC.

And by executing the following T-SQL statements on the on-premises SQL database, you can see which tables that has been enabled for CDC.

USE AdventureWorksDW2022;

SELECT *
FROM sys.tables
WHERE (is_tracked_by_cdc = 1)
ORDER BY modify_date DESC;

SELECT *
FROM cdc.change_tables
ORDER BY create_date DESC;

In the Fabric portal you can now see the status for the mirrored tables.

Using the Mirrored SQL Server

When selecting the Fabric workspace, you can see the created mirrored database and the underlying default semantic model and SQL endpoint.

To demonstrate the near-real time update in Fabric I created a new sematic model with the AdventureWork tables using Direct Lake storage mode and added a few measures to the model.

On top of the custom semantic model, I added a simple Power BI report configured to auto refresh every 5 minutes.

The Power BI report shows internet sales over time and for January 2011 the total is 323,743.21

On the on-premises SQL server I then updates one record in the dbo.FactInternetSales table with due date in January 2011 to have it SalesAmount to be 1,003,578.27 instead of 3,578.27

UPDATE fis
SET fis.SalesAmount = 1003578.27
FROM dbo.FactInternetSales AS fis
WHERE (fis.SalesOrderNumber = 'SO43697') AND (fis.SalesOrderLineNumber = '1');

Approximately 60-80 seconds later the data in the Fabric mirrored database was updated.

I checked it with SQL Server Profiler on the on-premises SQL server, and I could see that Fabric β€œchecked-in” every 45-70 seconds.

I expect that depending on the number of tables selected for mirroring that this will increase.

And without doing anything on my Power BI report, which refreshes every 5 minutes, the data for January 2011 changed automatically and without any interaction from my side to show 1,323,743.21 for January 2011.

Amazing – That is a data delay of under 10 minutes from on-premises data changes to be visible in a Power BI report that is connected to a Fabric dataset 😊

THANK YOU, Fabric team πŸ™‚

The key component in this setup is Change Data Capture (CDC).

CDC has been available in standard edition of SQL sever since version 2016 SP1.

One requirement to use CDC is that the tables to be mirrored has a primary key, and that the primary key is of a certain data type.

The limitations in Microsoft Fabric mirrored databases from SQL Server are listed in this article.

Finally, remember it is still a preview feature, but I must also say that I have not experienced any difficulties using it.

Have fun πŸ™‚