Overview of Azure Data Factory Components
My First Blog on Azure
Azure Data Factory Components
Pipelines are what you execute, or run, in Azure Data Factory, similar to packages in SQL Server Integration Services (SSIS). This is where you define your workflow: what you want to do and in which order. For example, a pipeline can first copy data from an on-premises data center to Azure Data Lake Storage and then transform the data from Azure Data Lake Storage into Azure Synapse Analytics (formerly Azure SQL Data Warehouse).
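Behind the authoring interface, Azure Data Factory stores each pipeline as a JSON definition. The sketch below builds a simplified one as a Python dictionary, modelled on the copy-then-transform example. Every name in it (`CopyAndTransform`, `CopyToLake`, `OnPremSalesTable`, `LakeSalesFiles`, `TransformSales`) is hypothetical, and a real definition would need more properties than shown here.

```python
import json

# Hypothetical pipeline: copy on-premises data into the lake, then transform it.
# All names are placeholders; the overall shape mirrors ADF's pipeline JSON.
pipeline = {
    "name": "CopyAndTransform",
    "properties": {
        "activities": [
            {
                "name": "CopyToLake",
                "type": "Copy",
                "inputs": [{"referenceName": "OnPremSalesTable", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "LakeSalesFiles", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "SqlServerSource"},
                    "sink": {"type": "ParquetSink"},
                },
            },
            {
                "name": "TransformToSynapse",
                "type": "ExecuteDataFlow",
                # Run this step only after the copy step has succeeded.
                "dependsOn": [
                    {"activity": "CopyToLake", "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {
                    "dataFlow": {"referenceName": "TransformSales", "type": "DataFlowReference"}
                },
            },
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

The order of the workflow comes from the `dependsOn` entry, not from the position of the activities in the list.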
When you open a pipeline, you will see the pipeline authoring interface. On the left side, you will see a list of all the activities you can add to the pipeline. On the right side, you will see the design canvas with the properties panel underneath it.
Activities are the individual steps inside a pipeline, where each activity performs a single task. You can chain activities or run them in parallel. An activity can control the flow inside a pipeline, move or transform data, or perform an external task using a service outside of Azure Data Factory.
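To make "chained or in parallel" concrete, here is a toy model in Python. It assumes each activity lists the activities it depends on, similar to the `dependsOn` property in the pipeline JSON, and groups them into stages: activities in the same stage have no dependencies on each other, so they could run in parallel. The activity names are hypothetical, and this is only an illustration of ordering, not how Azure Data Factory actually schedules runs.

```python
from typing import Dict, List, Set

# Hypothetical activities: three independent copies, then one step that needs all three.
activities = [
    {"name": "CopyCustomers", "dependsOn": []},
    {"name": "CopyOrders", "dependsOn": []},
    {"name": "CopyProducts", "dependsOn": []},
    {
        "name": "BuildSalesModel",
        "dependsOn": [
            {"activity": "CopyCustomers", "dependencyConditions": ["Succeeded"]},
            {"activity": "CopyOrders", "dependencyConditions": ["Succeeded"]},
            {"activity": "CopyProducts", "dependencyConditions": ["Succeeded"]},
        ],
    },
]


def stages(acts: List[dict]) -> List[List[str]]:
    """Group activities into stages; each stage depends only on earlier stages.

    Toy model: it ignores dependency conditions (Succeeded, Failed,
    Completed, Skipped) and treats every dependency as "must finish first".
    """
    deps: Dict[str, Set[str]] = {
        a["name"]: {d["activity"] for d in a.get("dependsOn", [])} for a in acts
    }
    done: Set[str] = set()
    result: List[List[str]] = []
    while len(done) < len(deps):
        # Everything whose dependencies are all finished can start now.
        ready = sorted(n for n, ds in deps.items() if n not in done and ds <= done)
        if not ready:
            raise ValueError("unresolvable dependency (cycle or unknown activity)")
        result.append(ready)
        done.update(ready)
    return result


print(stages(activities))
# [['CopyCustomers', 'CopyOrders', 'CopyProducts'], ['BuildSalesModel']]
```

The three copy activities land in the first stage because nothing links them, while `BuildSalesModel` is chained after all of them.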
You add an activity to a pipeline by dragging it onto the design canvas. When you click an activity, it is highlighted and its properties appear in the properties panel. The available properties differ for each type of activity.
Data Flows are a special type of activity for creating visual data transformations without having to write any code. There are two types of data flows: mapping and wrangling.
If you are moving or transforming data, you need to specify the format and location of the input and output data. Datasets are how you do this: they are like named views that represent a database table, a single file, or a folder.
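As a rough sketch, here is what a dataset for a CSV file in Azure Blob Storage might look like, again written as a Python dictionary that serializes to ADF-style JSON. The dataset name, the linked service name `BlobStorageLS`, and the container, folder, and file names are all hypothetical. Note how the dataset describes format and location, while the connection itself comes from the referenced linked service.

```python
import json

# Hypothetical dataset: a comma-separated file in Azure Blob Storage.
dataset = {
    "name": "SalesCsv",
    "properties": {
        # Format of the data.
        "type": "DelimitedText",
        # Which connection to use (a linked service, referenced by name).
        "linkedServiceName": {
            "referenceName": "BlobStorageLS",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            # Where the data lives inside that storage account.
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "raw",
                "folderPath": "sales/2024",
                "fileName": "sales.csv",
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}

print(json.dumps(dataset, indent=2))
```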
Linked Services are like connection strings. They define the connection information for data sources and services, as well as how to authenticate to them.
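A minimal linked service for an Azure Blob Storage account could be sketched like this. The name `BlobStorageLS` is hypothetical, and the values in angle brackets are placeholders you would replace with your own account details. Azure Data Factory can also pull secrets such as account keys from Azure Key Vault instead of storing them inline, which this sketch does not show.

```python
import json

# Hypothetical linked service: connection information for a Blob Storage account.
# Angle-bracket values are placeholders, not real credentials.
linked_service = {
    "name": "BlobStorageLS",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=<account-name>;AccountKey=<account-key>"
            )
        },
    },
}

print(json.dumps(linked_service, indent=2))
```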
Integration runtimes specify the infrastructure to run activities on. You can create three types of integration runtimes: Azure, Self-Hosted, and Azure-SSIS. Azure integration runtimes use infrastructure and hardware managed by Microsoft. Self-Hosted integration runtimes use hardware and infrastructure that you manage, so you can execute activities on your own local servers and in your data centers. Azure-SSIS integration runtimes are clusters of Azure virtual machines running the SQL Server Integration Services (SSIS) engine, used for executing SSIS packages in Azure Data Factory.
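The sketch below shows, in simplified form, how a Self-Hosted and an Azure integration runtime might be defined, and how a linked service chooses a runtime through its `connectVia` property. In this JSON, Azure integration runtimes use the type `Managed` and self-hosted ones use `SelfHosted`. All names (`OnPremIR`, `AzureIR`, `OnPremSqlLS`) and the connection string values are hypothetical placeholders. Azure-SSIS runtimes need additional SSIS-specific settings and are left out here.

```python
import json

# Hypothetical Self-Hosted integration runtime (software installed on your own servers).
self_hosted_ir = {
    "name": "OnPremIR",
    "properties": {"type": "SelfHosted", "description": "Runs in our data center"},
}

# Hypothetical Azure integration runtime: compute managed by Microsoft,
# with the region resolved automatically.
azure_ir = {
    "name": "AzureIR",
    "properties": {
        "type": "Managed",
        "typeProperties": {"computeProperties": {"location": "AutoResolve"}},
    },
}

# A linked service selects its runtime with "connectVia": here an on-premises
# SQL Server is reached through the self-hosted runtime defined above.
on_prem_sql = {
    "name": "OnPremSqlLS",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": (
                "Data Source=<server>;Initial Catalog=<database>;"
                "User ID=<user>;Password=<password>"
            )
        },
        "connectVia": {"referenceName": "OnPremIR", "type": "IntegrationRuntimeReference"},
    },
}

for resource in (self_hosted_ir, azure_ir, on_prem_sql):
    print(json.dumps(resource, indent=2))
```

If a linked service has no `connectVia`, Azure Data Factory falls back to its default Azure integration runtime, which is why cloud-to-cloud copies often need no runtime configuration at all.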
Triggers determine when to execute a pipeline. You can execute a pipeline on a wall-clock schedule, on a periodic interval, or when an event happens.
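As an illustration, the first two options correspond to schedule triggers and tumbling window triggers. The sketch below defines one of each as Python dictionaries, both pointing at a hypothetical pipeline named `CopyAndTransform`. Trigger names, start times, and intervals are made-up values, and event-based triggers (for example, reacting to a new file in Blob Storage) use a different trigger type that is not shown.

```python
import json

# Hypothetical schedule trigger: wall-clock schedule, once a day starting 02:00 UTC.
schedule_trigger = {
    "name": "DailyAt2am",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2024-01-01T02:00:00Z",
                "timeZone": "UTC",
            }
        },
        # A schedule trigger can start several pipelines.
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyAndTransform",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

# Hypothetical tumbling window trigger: fixed-size, non-overlapping hourly windows.
tumbling_trigger = {
    "name": "Hourly",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 1,
            "startTime": "2024-01-01T00:00:00Z",
            "maxConcurrency": 1,
        },
        # A tumbling window trigger is tied to exactly one pipeline.
        "pipeline": {
            "pipelineReference": {
                "referenceName": "CopyAndTransform",
                "type": "PipelineReference",
            }
        },
    },
}

print(json.dumps(schedule_trigger, indent=2))
print(json.dumps(tumbling_trigger, indent=2))
```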
Finally, if you don’t want to create all your pipelines from scratch, you can use the predefined templates provided by Microsoft or create your own custom templates.