Great, but what is Azure Data Lake?
Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics.
According to Microsoft, customers can provision Azure Data Lakes to store an unlimited amount of structured, semi-structured or unstructured data from a variety of sources. The service does not impose limits on account sizes, file sizes, or the amount of data that can be stored in a data lake.
Pricing for Azure Data Lake is dependent upon numerous variables, including storage capacity, the number of analytics units (AUs) per minute. As of this writing, the Azure Data Lake Store service is priced at $0.039 per GB per month for pay as you go, with capacity-based discounts up to 33% for monthly commitments. The Azure Pricing Calculator can help customers determine exact data lake costs.
In this tutorial, you use the Azure portal to create a data factory. Then, you use the Copy Data tool to create a pipeline that copies data from CSV file data to a SQL database.
Prerequisites
- Azure storage account: Use Blob storage as the source data store. If you don’t have an Azure storage account, see the instructions in Create a storage account.
- Azure SQL Database: Use a SQL database as the sink data store. If you don’t have a SQL database, see the instructions in Create a SQL database.
- The Sales Jan 2009 file contains some sales transactions. You can download CSV data here.
Prepare your Blob storage and your SQL database tables by performing these steps.
- Create a container named adfv2 and upload the CSV file to the container. You can use various tools to perform these tasks, such as Azure Storage Explorer.
Use the following SQL script to create a database named dbo.adl_db in your SQL database:
CREATE DATABASE dbo.adl_db
Also, create a sink SQL table:
CREATE TABLE dl_sales (
Transaction_date DATETIME,
Product TEXT,
Price DECIMAL,
Payment_Type TEXT,
Name TEXT,
City TEXT,
State TEXT,
Country TEXT,
Account_Created DATETIME,
Last_Login DATETIME,
Latitude DECIMAL,
Longitude DECIMAL,
)
Let’s create a data factory.
Azure Data Factory, a cloud data integration service, to compose data storage, movement, and processing services into automated data pipelines.
This platform for these kinds of scenarios below. It is a cloud-based data integration service that allows you to create data-driven workflows in the cloud that orchestrate and automate data movement and data transformation.
Well Done!! Verify that the data is inserted into the dbo.dl_sales table in your SQL database.
Conclusion
Azure Data Lake is an important new part of Microsoft’s ambitious cloud offering. With Data Lake, Microsoft provides service to store and analyze data of any size at an affordable cost. In related posts, we will learn more about Data Lake Store, Data Lake Analytics, and HDInsight.
Original post at https://bit.ly/2NAA1BF