The U-SQL Catalog is the way U-SQL organizes data and code for re-use and sharing.
Catalog organization
ADLA Account
|
|--Catalog
|
|--Databases
|
|--Schemas
| |--Tables
| |--Views
| |--Table-valued functions
| |--Procedures
| |--Assemblies
| |--Credentials
| |--External Data Sources
|
|--Assemblies
- Every ADLA account has a single U-SQL catalog. The catalog cannot be deleted.
- Each U-SQL catalog contains one or more U-SQL databases.
- Every catalog has a
master
database that cannot be deleted. - Each U-SQL database can contain code and data.
- Data is stored in the form of U-SQL tables.
- U-SQL code is stored in a database in the form of views, table-valued functions, procedures.
- .NET code (.NET assemblies) is stored in the database in the form of U-SQL assemblies.
Creating a database
To create a database is simple.
CREATE DATABASE MyDB;
This command will fail if the database already exists. Often, it will be the case that you want to create a database only if it does not exist. In this case the following command is used.
CREATE DATABASE IF NOT EXISTS MyDB;
Deleting a Database
DROP DATABASE MyDB;
DROP DATABASE IF EXISTS MyDB;
Reusing U-SQL code with a table-valued function (TVF)
Many of the scripts in the tutorial have required reading from the searchlog and the code to read from the searchlog is shown below
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int,
Urls string,
ClickedUrls string
FROM "/SearchLog.tsv"
USING Extractors.Tsv();
Now instead of writing this code over and over in every script we will store the code as a TVF in a database. The name of this function is going to be MyDB.dbo.ExtractSearchLog
. The dbo
part of that name is the “schema”.
CREATE FUNCTION MyDB.dbo.ExtractSearchLog()
RETURNS @rows
AS BEGIN
@rows =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int,
Urls string,
ClickedUrls string
FROM "/SearchLog.tsv"
USING Extractors.Tsv();
RETURN;
END;
Notice that CREATE FUNCTION indicates it will return a RowSet call @rows. Then in the @rows RowSet is defined.
Now that the TVF is created, we can call it this way.
@searchlog = MyDB.dbo.ExtractSearchLog();
U-SQL Tables
U-SQL tables offer a way to store data in form that preserves it schema and organize data to high-performance queries which is very important as data sizes grow.
Creating a table from a RowSet
If you have a RowSet creating a table from it is very simple. The one thing you must remember is that a table must have an index defined. The table gets its schema from the schema of the rowset.
@customers =
SELECT * FROM
(VALUES
("Contoso", 123 ),
("Woodgrove", 456 )
) AS D( Customer, Id );
DROP TABLE IF EXISTS MyDB.dbo.Customers;
CREATE TABLE MyDB.dbo.Customers
(
INDEX idx
CLUSTERED(customer ASC)
DISTRIBUTED BY HASH(customer)
) AS SELECT * FROM @customers;
Creating an empty table and filling it latter
If you need don’t have the data available at the time of table creation. You can create an empty table as shown below. Notice that this time the schema has to be specified.
DROP TABLE IF EXISTS MyDB.dbo.Customers;
CREATE TABLE MyDB.dbo.Customers
(
Customer string,
Id int,
INDEX idx
CLUSTERED(Customer ASC)
DISTRIBUTED BY HASH(Customer)
);
Then separately, you can fill the table.
@customers =
SELECT * FROM
(VALUES
("Contoso", 123 ),
("Woodgrove", 456 )
) AS D( Customer, Id );
INSERT INTO MyDB.dbo.Customers
SELECT * FROM @customers;
Reading from a table
@rs =
SELECT *
FROM MyDB.dbo.Customers;
Assemblies
Step 1: create a .NET assembly
Create a .NET assembly with a filename of OrdersLib.dll
with this code. In this case we will simply crate a single static string in the assembly that we will reuse in a U-SQL script.
namespace OrdersLib
{
public static class Helpers
{
public static string CustPrefix = "CUST_";
}
}
Step 2 Upload the assembly into the default ADLS store of the ADLA account
For example place the assembly in this location
"/DLLs/OrdersLib.dll"
Step 3: Register the assembly in a U-SQL database
For example place the assembly in this location
CREATE ASSEMBLY MyDB.OrdersLibAsm
FROM @"/DLLs/OrdersLib.dll";
The identifier OrdersLibAsm
is the name the U-SQL catalog uses for the assembly. It is independent of the assembly filename "OrdersLib.dll"
.
The dll has been copied to the U-SQL database called MyDB. You can now safely delete the file from "/DLLs/OrdersLib.dll"
.
Step 4: Run a script that uses the assembly
The REFERENCE ASSEMBLY
statement makes the code from the assembly named OrdersLibAsm
to the script.
REFERENCE ASSEMBLY MyDB.OrdersLibAsm;
@customers =
SELECT * FROM
(VALUES
("Contoso", 123 ),
("Woodgrove", 456 )
) AS D( Customer, Id );
@customers =
SELECT
(OrdersLib.Helpers.CustPrefix + Customer) AS Customer,
Id
FROM @customers;
Using assemblies
A U-SQL Assembly let’s you share code.
Although a U-SQL assembly is often used to share .NET code – specifically a single .NET assembly. As you’ll see in this section. A U-SQL assembly can do much more than that.
System assemblies
Some assemblies are part of the .NET Base Class Library. They aren’t in the U-SQL catalog, but we still need a way to make them available to a U-SQL script.
REFERENCE SYSTEM ASSEMBLY [System.Xml];
Notice that no database name is provided.
User-provided assemblies
Referencing assemblies in the U-SQL Catalog
CREATE ASSEMBLY MyDB.OrdersLibAsm
FROM @"/DLLs/OrdersLib.dll";
Below is the standard syntax for referencing an assembly that is in the U-SQL catalog.
REFERENCE SYSTEM ASSEMBLY [DBName].[AssemblyName];
Simplifying code with the USING statement
Similar, to C#’s using statement, U-SQL lets you provide an alias for a namespace, this improve the readability of your code.
USING Exc = System.Exceptions;
Referencing assemblies in the U-SQL Catalog of other ADLA accounts
If the assembly you want is in another ADLA account, and you have access to that assembly you can reference it as shown below.
REFERENCE SYSTEM ASSEMBLY [Account][DBName].[AssemblyName];