October 20th 2023

Azure Data Factory

Azure Data Factory (ADF) is a cloud-based data integration service provided by Microsoft as part of the Microsoft Azure ecosystem. It allows you to create, schedule, manage, and monitor workflows that move and transform data between different sources and destinations.


Scenario

When loading data into a data warehouse, the dimension tables are loaded before the fact table. When fact data arrives in a CSV or Excel file, the file often contains business values (for example, a month name) instead of the corresponding dimension IDs. Those IDs are typically system-generated and carry no business meaning, which is why source files rarely include them. In the fact table, however, we prefer to store the IDs from the associated dimension tables rather than the values supplied in the file.

This post demonstrates how to handle this scenario in an ADF data flow by looking up the foreign key ID for each incoming value while the data is ingested.

Structure of Dimension Tables

Month(‘month’, ‘month_id’)
Spend_type(‘spend_type’, ‘spend_type_id’)
Customers(‘FullName’, ‘FirstName’, ‘LastName’, ‘CustomerId’)

Structure of Excel/CSV file

Monthly_avg_csv(‘FirstName’, ‘LastName’, ‘month’, ‘spend_type’, ‘Avg Amount Spent’)

Structure of Fact Table

Avg_Monthly_Spends(‘avg_monthly_spend_id’, ‘CustomerID’, ‘spend_type_id’, ‘month_id’, ‘Avg Amount Spent’)
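
To make the structures above concrete, here is a minimal sketch that builds them in an in-memory SQLite database standing in for Azure SQL DB. The column types, the foreign key constraints, the sample rows, and the function name build_warehouse are all assumptions made for illustration; only the table and column names come from the structures listed above, with the fact table's foreign key columns named after the dimension keys they reference.

```python
import sqlite3

# In-memory SQLite stands in for Azure SQL DB.
# Column types and constraints are assumptions, not taken from the post.
SCHEMA = """
CREATE TABLE Month (
    month_id INTEGER PRIMARY KEY,
    month    TEXT NOT NULL UNIQUE
);
CREATE TABLE Spend_type (
    spend_type_id INTEGER PRIMARY KEY,
    spend_type    TEXT NOT NULL UNIQUE
);
CREATE TABLE Customers (
    CustomerId INTEGER PRIMARY KEY,
    FirstName  TEXT NOT NULL,
    LastName   TEXT NOT NULL,
    FullName   TEXT NOT NULL
);
CREATE TABLE Avg_Monthly_Spends (
    avg_monthly_spend_id INTEGER PRIMARY KEY,
    CustomerID           INTEGER REFERENCES Customers (CustomerId),
    spend_type_id        INTEGER REFERENCES Spend_type (spend_type_id),
    month_id             INTEGER REFERENCES Month (month_id),
    "Avg Amount Spent"   REAL
);
"""


def build_warehouse() -> sqlite3.Connection:
    """Create the dimension and fact tables and load hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical sample values; dimensions are loaded before the fact table.
    conn.executemany(
        "INSERT INTO Month (month_id, month) VALUES (?, ?)",
        [(1, "January"), (2, "February")],
    )
    conn.executemany(
        "INSERT INTO Spend_type (spend_type_id, spend_type) VALUES (?, ?)",
        [(10, "Groceries"), (20, "Travel")],
    )
    conn.executemany(
        "INSERT INTO Customers (CustomerId, FirstName, LastName, FullName) "
        "VALUES (?, ?, ?, ?)",
        [(100, "Asha", "Patel", "Asha Patel"), (101, "John", "Smith", "John Smith")],
    )
    conn.commit()
    return conn
```

The fact table is left empty on purpose: its rows are produced later by resolving the values in the file to the IDs stored in these dimension tables.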

Steps for Handling Value-to-ID Mapping for Fact Tables

Create the respective fact and dimension tables and insert sample values.
Create a sample CSV file containing values that are present in the dimension tables.
Upload the file to the data lake storage.
Create a linked service for Azure Blob Storage and a dataset that uses the CSV (delimited text) file format.
Create a linked service for Azure SQL Database and a dataset that points to the SQL database.
Create a pipeline in Azure Data Factory and add a Data Flow activity.
Click the + icon to create a new data flow.
In the data flow, add a source that reads the CSV file: select the CSV dataset and configure the source options as required.

Add a second data flow source that reads the dimension table from the SQL database, so that the corresponding IDs are available to the data flow.

Select the “+” icon and add a lookup transformation. For the lookup stream, choose the ‘month’ stream (the source that reads the Month dimension table). In the lookup condition, match on the business key, in this case the actual month value: open the dynamic expression editor, use the byName() function, and pass it the column name.

Repeat the same steps to get the ID values from the other dimension tables.
Add a sink transformation, select the SQL database dataset configured for the Avg_Monthly_Spends table, and leave the mapping set to auto mapping.
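
The logic that the lookup transformations perform in the steps above can be sketched outside ADF in plain Python. This is only an illustration of the value-to-ID mapping, not the data flow itself: the dictionaries, the sample CSV text, and the function name to_fact_rows are hypothetical, and the sketch assumes that avg_monthly_spend_id is generated by the database and therefore not supplied here. A lookup in a data flow behaves like a left outer join, so a value with no match in a dimension yields a null ID; dict.get mirrors that by returning None.

```python
import csv
import io

# Hypothetical dimension contents: business value -> system-generated ID.
MONTH_IDS = {"January": 1, "February": 2}
SPEND_TYPE_IDS = {"Groceries": 10, "Travel": 20}
CUSTOMER_IDS = {("Asha", "Patel"): 100, ("John", "Smith"): 101}

# Hypothetical file content following the Monthly_avg_csv structure.
CSV_TEXT = """FirstName,LastName,month,spend_type,Avg Amount Spent
Asha,Patel,January,Groceries,250.50
John,Smith,February,Travel,410.00
John,Smith,March,Travel,99.90
"""


def to_fact_rows(csv_text):
    """Replace the business values in each CSV row with dimension IDs."""
    fact_rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        fact_rows.append({
            # Each .get() plays the role of one lookup transformation;
            # an unmatched value becomes None, like a left outer join.
            "CustomerID": CUSTOMER_IDS.get((rec["FirstName"], rec["LastName"])),
            "spend_type_id": SPEND_TYPE_IDS.get(rec["spend_type"]),
            "month_id": MONTH_IDS.get(rec["month"]),
            "Avg Amount Spent": float(rec["Avg Amount Spent"]),
        })
    return fact_rows


if __name__ == "__main__":
    for row in to_fact_rows(CSV_TEXT):
        print(row)
```

In this sample, the third row refers to March, which is missing from the hypothetical Month dimension, so its month_id comes out as None. In a real pipeline such rows are worth checking before they reach the sink, since they would otherwise land in the fact table with a null foreign key.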

The post Optimizing Data Loading in Data Warehousing: Handling Value-to-ID Mapping for Fact Tables appeared first on Addend Analytics.
