
Azure Data Factory Tutorial For Beginners

Mohammad Nurdin

Azure Data Factory is Azure’s cloud ETL service for scale-out serverless data integration and data transformation. It offers a code-free UI for intuitive authoring and single-pane-of-glass monitoring and management. You can also lift and shift existing SSIS packages to Azure and run them with full compatibility in ADF.

In this tutorial, we’ll create our very first ADF pipeline. It copies data from a Google Sheet through an Azure Function (using the Google Sheets and Drive APIs), stores the results in an Azure SQL database, and visualizes them in Power BI.

In the Azure portal, select SQL databases.


Click on Create SQL Database.


Select a Resource group and create a new server. Once done, click on OK. Then click on Configure database.


Select Basic and click on Apply.


Enter the Database name, select Locally-redundant backup storage and click on Review + create.


Click on Create.


Click on Go to resource.


Below are the overall features and details of the Azure SQL database we just created. Click on Set server firewall.


Select Yes under Allow Azure services and resources to access this server and click on Save.


Download SQL Server Management Studio (SSMS) from https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-ver15. Once installed, open SSMS and connect to the server.


Sign in to Microsoft Azure. Once done, click on OK.


Click on New Query and use this SQL query to create a new table. Once done, click on Execute.

CREATE TABLE Users (
    Id int NOT NULL PRIMARY KEY IDENTITY(1,1),
    Name varchar(255),
    Phone varchar(255),
    Email varchar(255),
    LastContact varchar(255),
    Status varchar(255)
);
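
Every column here, including LastContact, is declared as varchar so the inserts can take the sheet’s values exactly as they appear; you can convert LastContact to a proper date type later if needed.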


Open Google Sheet and create a simple datasheet like below.


Click on Share. Once you click it, a dialog like the one below appears; click on Copy link.


You will get a link in the following format; the long alphanumeric segment is your Google Sheet ID. (If you are reading data from someone else’s sheet, that sheet must be shared with your Google ID.)

https://docs.google.com/spreadsheets/d/1qkls44BLAmuCvFOO0P9IhYIXvjo3BGlsF4fq-g30f48/edit?usp=sharing
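
If you ever need to pull the ID out of the link programmatically, a simple string split does it. This is just a sketch based on the URL format shown above:

# The sheet ID is the path segment right after /d/.
url = 'https://docs.google.com/spreadsheets/d/1qkls44BLAmuCvFOO0P9IhYIXvjo3BGlsF4fq-g30f48/edit?usp=sharing'
sheet_id = url.split('/d/')[1].split('/')[0]
print(sheet_id)  # 1qkls44BLAmuCvFOO0P9IhYIXvjo3BGlsF4fq-g30f48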

Enable the Google Sheets API. Register first at https://cloud.google.com/. Once done, go to the console and make sure to create a new project under the resource manager before proceeding. Then go to https://console.cloud.google.com/apis/library/sheets.googleapis.com and click ENABLE.


Click on Credentials.


Click on Manage service accounts. Then click on CREATE SERVICE ACCOUNT. Enter a Service account ID and click on CREATE.


Grant this service account the Owner role and click on Continue.


Click on Done.


Create a key for azure-data-factory@deepagi-e2f97.iam.gserviceaccount.com, select the JSON format, download the JSON file, and store it in the same folder where our code will be saved. Copy the service account email and share the Google Sheet with it (paste the email into the sheet’s Share dialog).


Download and install Python 3.9.1 from https://www.python.org/downloads. Once completed, install JupyterLab via the command line.

pip3 install jupyterlab

Install the following packages using pip (oauth2client is required for the service account credentials imported below):

pip install google_spreadsheet
pip install google-auth-oauthlib
pip install gspread
pip install pandas
pip install oauth2client

Create a new workspace, change into the newly created directory, and run this command to launch Jupyter Notebook.

cd new_workspace
jupyter notebook

Click on New and select Python 3. Import the libraries.

import gspread
import pandas as pd
from oauth2client.service_account import ServiceAccountCredentials

Define the scope of the application.

scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_name('deepagi-e2f97-3de355cac46e.json', scope)
client = gspread.authorize(creds)

Create the Sheet Instance

sheet = client.open('Azure Data Factory Tutorial')
sheet_instance = sheet.get_worksheet(0)
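
Here get_worksheet(0) returns the first tab of the spreadsheet; pass a different index if your data lives on another tab.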

Get all records

records_data = sheet_instance.get_all_records()
print(records_data)


Convert the records (a list of dictionaries) to a DataFrame.

records_df = pd.DataFrame.from_dict(records_data)
records_df.head(10)


Connect to the Azure SQL database using Python and insert the DataFrame with this code. Note the rename: the sheet’s 'Date of last contact' header has to match the LastContact column in SQL (the Azure Function later in this tutorial does the same rename).

import pyodbc

# Rename the sheet header to match the SQL column name.
records_df.rename(columns={'Date of last contact': 'LastContact'}, inplace=True)
df = records_df

server = 'deepagi.database.windows.net'
database = 'deepagi'
username = 'username'
password = 'password'
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)
cursor = cnxn.cursor()
# Insert the DataFrame into SQL Server row by row:
for index, row in df.iterrows():
    cursor.execute("INSERT INTO Users (Name, Phone, Email, LastContact, Status) values(?,?,?,?,?)", row.Name, row.Phone, row.Email, row.LastContact, row.Status)
cnxn.commit()
cursor.close()
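
Row-by-row inserts are fine for a small sheet. If the sheet grows, pyodbc can send the parameters in batches instead; here is a minimal sketch using the same table and the same df:

# Batch insert: send all parameter rows in a single executemany call.
cursor = cnxn.cursor()
cursor.fast_executemany = True
rows = list(df[['Name', 'Phone', 'Email', 'LastContact', 'Status']].itertuples(index=False, name=None))
cursor.executemany("INSERT INTO Users (Name, Phone, Email, LastContact, Status) values(?,?,?,?,?)", rows)
cnxn.commit()
cursor.close()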

Run the notebook cells and check the result in SSMS.


Next, we will create an Azure Function. Make sure you upgrade your subscription before using Azure Functions. For this service, we need to create a new resource group. Select Resource groups.


Click on Add.


Enter deepagi-linux in Resource group and select (Asia) Southeast Asia.


Click on Create.


Select Function App from the Azure portal.


Click on Create Function App.


On the Basics page, use the function app settings listed below.

Subscription: <your_subscription_account>

Resource Group: <your_resource_group>

Function App Name: <your_function_app_name>

Runtime stack: Python

Version: 3.8

Region: Southeast Asia

Once done click on Review + create.


Click on Create.


Once done, click on Go to resource.


Click on App keys.


Copy the _master key for later use.
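
Once the function is deployed later in this tutorial, this key lets you call it over HTTPS. A quick sketch in Python (the app name and key are placeholders; x-functions-key is the header Azure Functions reads keys from):

import requests

# Replace the placeholders with your function app name and the _master key.
url = 'https://<your_function_app_name>.azurewebsites.net/api/extract'
resp = requests.get(url, headers={'x-functions-key': '<your_master_key>'})
print(resp.status_code, resp.text)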


Download and install Visual Studio Code from https://code.visualstudio.com, then install the Azure Functions extension.


After installation, select the Azure icon on the Activity bar. You should see an Azure Functions area in the Side Bar. From Azure: Functions, select the Create Function icon.


Click on Create new project.


Create a new folder, or pick an existing one, for your new project and click on Select.


Select Python.


Select python3.8.


Select HTTP trigger.


Enter extract.


Select Function.


Select Open in current window.


Copy the credential JSON file downloaded from Google Cloud Platform into the new project workspace.


Update __init__.py with this code.

import logging

import azure.functions as func

import gspread
import pandas as pd
from oauth2client.service_account import ServiceAccountCredentials
import pyodbc
import json

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    
    scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']
    creds = ServiceAccountCredentials.from_json_keyfile_name('deepagi-e2f97-3de355cac46e.json', scope)
    client = gspread.authorize(creds)
    
    sheet = client.open('Azure Data Factory Tutorial')
    sheet_instance = sheet.get_worksheet(0)
    
    records_data = sheet_instance.get_all_records()
    records_df = pd.DataFrame.from_dict(records_data)
    records_df.rename(columns = {'Date of last contact':'LastContact'}, inplace = True) 
    
    df = records_df

    server = 'deepagi.database.windows.net' 
    database = 'deepagi' 
    username = 'deepagi' 
    password = 'your_password' 
    cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
    cursor = cnxn.cursor()

    # Clear the Users table so every run fully refreshes the data.
    cursor.execute("TRUNCATE TABLE Users")

    for index, row in df.iterrows():
        cursor.execute("INSERT INTO Users (Name, Phone, Email, LastContact, Status) values(?,?,?,?,?)", row.Name, row.Phone, row.Email, row.LastContact, row.Status)
    cnxn.commit()
    cursor.close()

    return func.HttpResponse("This HTTP triggered function executed successfully.")
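
Hardcoding the SQL password is fine for a tutorial, but in a real deployment you would store it as a function app setting, which Azure Functions exposes to Python as an environment variable. A sketch, assuming a hypothetical setting named SQL_PASSWORD has been added to the app:

import os

# SQL_PASSWORD is a hypothetical app setting; create it under
# Configuration > Application settings in the function app.
password = os.environ['SQL_PASSWORD']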

Update requirements.txt file with these packages.

azure-functions
gspread
pandas
pyodbc
oauth2client

Click on Attach to Python Functions to start debugging.


Download and install Postman from https://www.postman.com/. Once debugging is up and running, copy the function URL printed in the terminal.

http://localhost:7071/api/extract


Open Postman and send a GET request to that URL.
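
If you would rather skip Postman, the same check works from Python, assuming the requests package is installed:

import requests

# Call the function running locally under the VS Code debugger.
resp = requests.get('http://localhost:7071/api/extract')
print(resp.status_code, resp.text)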


Once successful, sign in to Azure.


Select your function app in Azure. Mine is Deepagi.


Click on Deploy.


Click on Functions > Functions.


You can see that a new function appeared after deployment. Click on the extract function.


Click on Code + Test.


Click on Test/Run, then Run, to test the API.


Make sure we truncate the Users table, then run the extract function one more time.

TRUNCATE TABLE dbo.Users
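
To confirm the function repopulated the table, a quick row count from Python works too; this sketch reuses the same connection details as the notebook section:

import pyodbc

# Same server, database, and credentials as before.
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=deepagi.database.windows.net;DATABASE=deepagi;UID=username;PWD=password')
cursor = cnxn.cursor()
cursor.execute("SELECT COUNT(*) FROM Users")
print(cursor.fetchone()[0])  # should match the number of rows in the sheet
cursor.close()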

Click on Create a resource, search for Data Factory, then click on Create.


Provide a name for your data factory, select the resource group where you want to deploy it, choose the version, and set up the Git configuration.


Click on Review + create. Once validation passes, click on Create.


Click on Go to resource.


Click on Author & Monitor.


Click on Log into GitHub.


Click on Authorize AzureDataFactory.


Azure Data Factory has a completely different portal of its own, as shown in the following figure. Click on the pencil (Author) icon.

Select + > Pipeline.


Drag and drop an Azure Function activity onto the canvas. Name it extract.


Click on Settings then click on New.


Make sure you fill in all the details and click on Create once done.


Enter extract in Function name and select GET under Method. Once done, click on Debug for testing.
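
Behind the scenes, this activity calls the function’s HTTPS endpoint with the key stored in the linked service, so a successful debug run means the extract function ran and refreshed the Users table.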


Once successful, click on Add trigger > New/Edit.


Select New.


Enter all the details and click on OK.
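
A schedule trigger (for example, one that fires daily) is the simplest choice here if you want the Users table refreshed automatically.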


Click on Save. Then click on Publish.


Click on OK.


Download and install Power BI Desktop from https://powerbi.microsoft.com/en-us/downloads/. Select Get data > More.


Select Azure > Azure SQL Database.


Enter the Server and Database details (the server and database names created earlier).


Select Database authentication and enter your Username and Password.


Select table Users and click on Load.


Once you add a data source, it is shown in the right sidebar. Since we can’t publish Power BI reports using a personal email account, we skip that part.


I think that’s all for a basic Azure Data Factory pipeline. Let me know if you have any feedback about this tutorial.

 
