Set operations are a way of merging rowsets together based on set theoretic operations such as union (UNION). intersection (INTERSECT), complement (EXCEPT).

Sample data

Let’s define two RowSets: @a and @b. Notice that both RowSets have duplicate rows.

@a  =
    SELECT * FROM
        (VALUES
            (1,    "Smith"),
            (1,    "Smith"),
            (2,    "Brown"),
            (3,    "Case")
        ) AS D( DepID, DepName );

@b  =
    SELECT * FROM
        (VALUES
            (1,    "Smith"),
            (1,    "Smith"),
            (1,    "Smith"),
            (2,    "Brown"),
            (4,    "Dey"),
            (4,    "Dey")
        ) AS D( DepID, DepName );

UNION

UNION combines two rowsets.

UNION ALL

As you can see UNION ALL clearly leaves in duplicate rows.

@union_all = 
    SELECT * FROM @a
    UNION ALL
    SELECT * FROM @b;

DepID	Name
1	Smith
1	Smith
2	Brown
3	Case
1	Smith
1	Smith
1	Smith
2	Brown
4	Dey
4	Dey

UNION DISTINCT

UNION DISTINCT discards duplicate rows.

@union_distinct =
    SELECT * FROM @a
    UNION DISTINCT
    SELECT * FROM @b;

DepID	Name
1	Smith
2	Brown
3	Case
4	Dey

Schema requirements

UNION by default require that the RowSets have the same schema.

Each column must have the same name and data type
The columns must appear in the same order in the rowset schema

Finding Common Rows with INTERSECT

Sometimes, we only care about the rows both rowsets have in common. We use the INTERSECT operator to accomplish this. INTERSECT ALL preserves duplicates while INTERSECT removes duplicates.

INTERSECT ALL

@intersect_all =
    SELECT * FROM @a
    INTERSECT ALL
    SELECT * FROM @b;

@intersect_all

DepID	name
1	Smith
1	Smith
2	Brown

INTERSECT DISTINCT

@intersect_distinct =
    SELECT * FROM @a
    INTERSECT DISTINCT
    SELECT * FROM @b;

@intersect_distinct

DepID	name
1	Smith
2	Brown

Finding Rows That Are NOT in the Other RowSet with EXCEPT

The EXCEPT operator returns all the rows in the left RowSet that are not in the right RowSet.

EXCEPT ALL

@a_except_b_all =
   SELECT * FROM @a
   EXCEPT ALL
   SELECT * FROM @b;

@a_except_b_all

DepID	name
3	Case

@b_except_a_all =
   SELECT * FROM @b
   EXCEPT ALL
   SELECT * FROM @a;

@b_except_a_all

DepID	name
1	Smith
4	Dey
4	Dey

EXCEPT DISTINCT

@a_except_b_distinct =
   SELECT * FROM @a
   EXCEPT DISTINCT
   SELECT * FROM @b;

@a_except_b_distinct

DepID	name
3	Case

@b_except_a_distinct =
   SELECT * FROM @b
   EXCEPT DISTINCT
   SELECT * FROM @a;

@b_except_a_distinct

DepID	name
4	Dey

OUTER UNION

By default UNION requires both RowSets to have matching schemas. OUTER UNION allows the schemas to be different. If one RowSet is missing a column that the other has, that row will be included in the result with a default value for the missing columns.

NOTE: OUTER UNION only supports ALL. It does not support DISTINCT.

The following script will union the two rowsets @left and @right with the partially overlapping schema on columns A and K while filling in null into the “missing cells” of column C and 0 as the default value for type int for column B.

@left =
    SELECT *
    FROM (VALUES ( 1, "x", (int?) 50 ),
                 ( 1, "y", (int?) 60 )
         ) AS L(K, A, C);

@right =
    SELECT *
    FROM (VALUES ( 5, "x", 1 ),
                 ( 6, "x", 2 ),
                 (10, "y", 3 )
         ) AS R(B, A, K);

@res =
    SELECT * FROM @left
    OUTER UNION BY NAME ON (*)
    SELECT * FROM @right;

The result is:

K	A	C	B
1	“x”	50	0
1	“x”		5
1	“y”	60	0
2	“x”		6
3	“y”		10

Joins

A JOIN is a way of combining rowsets together based on key.

Sample data

@departments =
    SELECT * FROM
    (VALUES
    ("31", "Sales"),
    ("33", "Engineering"),
    ("34", "Clerical"),
    ("35", "Marketing")
    ) AS D( DepID, DepName );

@employees =
SELECT * FROM
    (VALUES
    ("31", "Rafferty"),
    ("33", "Jones"),
    ("33", "Heisenberg"),
    ("34", "Robinson"),
    ("34", "Smith"),
    ((string)null, "Williams")
    ) AS D( DepID, EmpName );

CROSS JOIN

@cross_join =
    SELECT
        @departments.DepID AS DepID_Dep,
        @employees.DepID AS DepID_Emp,
        @employees.EmpName, 
        @departments.DepName
    FROM @employees CROSS JOIN @departments;

DepIDDep	DepIDEmp	EmpName	DepName
31	31	Rafferty	Sales
31	31	Rafferty	Sales

|31|33|Jones|Sales| |31|33|Heisenberg|Sales| |31|34|Robinson|Sales| |31|34|Smith|Sales| |31|ERR|Williams|Sales| |33|31|Rafferty|Engineering| |33|33|Jones|Engineering| |33|33|Heisenberg|Engineering| |33|34|Robinson|Engineering| |33|34|Smith|Engineering| |33|ERR|Williams|Engineering| |34|31|Rafferty|Clerical| |34|33|Jones|Clerical| |34|33|Heisenberg|Clerical| |34|34|Robinson|Clerical| |34|34|Smith|Clerical| |34|ERR|Williams|Clerical| |35|31|Rafferty|Marketing| |35|33|Jones|Marketing| |35|33|Heisenberg|Marketing| |35|34|Robinson|Marketing| |35|34|Smith|Marketing| |35|NULL|Williams|Marketing|

CROSS JOIN

@cross_join =
    SELECT
        @departments.DepID AS DepID_Dep,
        @employees.DepID AS DepID_Emp,
        @employees.EmpName,
        @departments.DepName
     FROM 
        @employees CROSS JOIN @departments;

DepID_Dep	DepID_Emp	EmpName	DepName
31	31	Rafferty	Sales
31	33	Jones	Sales
31	33	Heisenberg	Sales
31	34	Robinson	Sales
31	34	Smith	Sales
31	NULL	Williams	Sales
33	31	Rafferty	Engineering
33	33	Jones	Engineering
33	33	Heisenberg	Engineering
33	34	Robinson	Engineering
33	34	Smith	Engineering
33	NULL	Williams	Engineering
34	31	Rafferty	Clerical
34	33	Jones	Clerical
34	33	Heisenberg	Clerical
34	34	Robinson	Clerical
34	34	Smith	Clerical
34	NULL	Williams	Clerical
35	31	Rafferty	Marketing
35	33	Jones	Marketing
35	33	Heisenberg	Marketing
35	34	Robinson	Marketing
35	34	Smith	Marketing
35	NULL	Williams	Marketing

SEMIJOIN

SEMIJOIN is a way to filter a RowSet using another RowSet.

A simple example is: “Find all the employees in this RowSet A, where the employee appears also in RowSet B”.

There are two variants:

LEFT SEMIJOIN -> Give only those rows in the left rowset that have a matching row in the right rowset.
RIGHT SEMIJOIN -> Give only those rows in the right rowset that have a matching row in the left rowset.

NOTE: If you leave out LEFT or RIGHT, and instead simply write SEMIJOIN then what you get is LEFT SEMIJOIN. Do not leave out LEFT or RIGHT always explicitly it.

Find all employees that are in valid departments

@left_semijoin1 =
    SELECT
        @employees.DepID,    
        @employees.EmpName
    FROM @employees
    LEFT SEMIJOIN @departments
        ON @employees.DepID == @departments.DepID;

DepID	EmpName
31	Rafferty
33	Jones
33	Heisenberg
34	Robinson
34	Smith

Find all departments that has an employee listed in the employee RowSet.

@left_semijoin2 =
    SELECT 
        @departments.DepID,
        @departments.DepName
    FROM @departments
    LEFT SEMIJOIN @employees
    ON @departments.DepID == @employees.DepID;

DepID	DepName
31	Sales
33	Engineering
34	Clerical

ANTISEMIJOIN

An ANTISEMIJOIN is the opposite of SEMIJOIN

There are two variants:

LEFT ANTISEMIJOIN -> Give only those rows in the left RowSet that DO NOT have a matching row in the right rowset.
RIGHT ANTISEMIJOIN -> Give only those rows in the right RowSet that DO NOT have a matching row in the left RowSet.

Find all departments that don’t have an employee listed in the employees RowSet.

@left_antisemijoin =
    SELECT 
        @departments.DepID,
        @departments.DepName
    FROM @departments
    LEFT ANTISEMIJOIN @employees
        ON @departments.DepID == @employees.DepID;

DepID	DepName
35	Marketing

Complex Types

Nullability

Complex types are reference types. You MUST check if they are null first before using them.

Maximum size of a complex value

There is no specific limit inherent to Complex Types

However, rowsets must still fit within the limits defined by U-SQL (currently no row can be bigger than 4MB)

Persistence

U-SQL Tables

Complex Types can be stored in and read from U-SQL Tables without any additional effort.

Files

To read/write Complex Types with files, you must use your own Custom Outputter & Extractor.

ARRAYs

Overview

An ordered list of values (all of the same type)
Immutable
Single-Dimension only

Creating arrays initialized with data

You can create a SQlArray using the SqlArray constructor or use the SqlArray.Create static method

The SqlArray.Create static method can also be used to create arrays.

@output =
    SELECT * 
    FROM
    ( VALUES
        ( "West Virginia", new SqlArray<string>( new string [] { "Charleston", "Huntington", "Parkersburg", "Morgantown", "Wheeling"} ) ),
        ( "Wisconsin", SqlArray.Create( new string [] { "Milwaukee", "Madison", "Green Bay", "Kenosha", "Racine"} ) )

    ) AS T(State, Cities);

Creating Empty arrays

new SqlArray<string> ( )

new SqlArray<string> { }

Creating an ARRAY from an .NET Array

DECLARE @wv_cities = new string [] { "Charleston", "Huntington", "Parkersburg", "Morgantown", "Wheeling"};
DECLARE @wi_cities = new string [] { "Milwaukee", "Madison", "Green Bay", "Kenosha", "Racine" };

@cities =
    SELECT * 
    FROM
    ( VALUES
        ( "West Virginia", new SqlArray<string>( @wv_cities ) ),
        ( "Wisconsin", SqlArray.Create( @wi_cities ) )

    ) AS T(State, Cities);

Sample data

@cities =
    SELECT * 
    FROM
    ( VALUES
        ( "Vermont","Burlington,Essex,South,Burlington,Colchester,Rutland" ),
        ( "Virginia","Virginia Beach,Norfolk,Chesapeake,Richmond,Newport News" ),
        ( "Washington","Seattle,Spokane,Tacoma,Vancouver,Bellevue" ),
        ( "West Virginia", "Charleston,Huntington,Parkersburg,Morgantown,Wheeling"),
        ( "Wisconsin","Milwaukee,Madison,Green Bay,Kenosha,Racine"),
        ( "Wyoming","Cheyenne,Casper,Laramie,Gillette,Rock Springs" )
    ) AS T(State, Cities);

@cities =
    SELECT 
        State,
        SqlArray.Create( Cities.Split(',') ) AS Cities 
    FROM @cities;

State string	Cities SqlArray
Vermont	SqlArray{ “Burlington”, “Essex”, “South”, “Burlington”, “Colchester”, “Rutland” }
Virginia	SqlArray{ “Virginia Beach”, “Norfolk”, “Chesapeake”, “Richmond”, “Newport News” }
Washington	SqlArray{ “Seattle”, “Spokane”, “Tacoma”, “Vancouver”, “Bellevue” }
West Virginia	SqlArray{ “Charleston”, “Huntington”, “Parkersburg”, “Morgantown”, “Wheeling” }
Wisconsin	SqlArray{ “Milwaukee”, “Madison”, “Green Bay”, “Kenosha”, “Racine” }
Wyoming	SqlArray{ “Cheyenne”, “Casper”, “Laramie”, “Gillette”, “Rock Springs” }

Array Indexing

Use the array indexing operator [n] where n is a long. The first index is 0 – just like .NET


@output =
    SELECT
        State,
        Cities[0] AS FirstCity
    FROM @output;

State	FirstCity
Vermont	Burlington
Virginia	Virginia Beach
Washington	Seattle
West Virginia	Charleston
Wisconsin	Milwaukee
Wyoming	Cheyenne

Removing members

@output =
    SELECT
        State ,
        SqlArray.Create( Cities.Where( c=>c.StartsWith("C") ) ) AS Cities
    FROM @output;

State	Cities
Vermont	SqlArray{ “Colchester” }
Virginia	SqlArray{ “Chesapeake” }
Washington	SqlArray{ }
West Virginia	SqlArray{ “Charleston” }
Wisconsin	SqlArray{ }
Wyoming	SqlArray{ “Cheyenne”, “Casper” }

Counting members


@output =
    SELECT
        State ,
        SqlArray.Create( Cities.Where( c=>c.StartsWith("C") ) ) AS Cities
    FROM @output;

@output =
    SELECT
        State ,
        Cities ,
        Cities.Count AS NumCities
    FROM @output;

State	Cities	NumCities
Vermont	SqlArray{ “Colchester” }	1
Virginia	SqlArray{ “Chesapeake” }	1
Washington	SqlArray{ }	0
West Virginia	SqlArray{ “Charleston” }	1
Wisconsin	SqlArray{ }	0
Wyoming	SqlArray{ “Cheyenne”, “Casper” }	2

MAPs

Overview

Creating a SqlMap from constant values

The following snippet creates two rows and each has a SqlMap where the key (K) is a string and the value is also a string.

@projectmembers = 
    SELECT *
    FROM
    ( VALUES
        ( "Website", new SqlMap<string,string> { 
                {"Mallory", "PM"}, 
                {"Bob", "Dev"} ,
                {"Alice", "Dev"} ,
                {"Stan", "Dev"} ,
                {"Chris", "UX"} ,
             } 
        ),
        ( "DB", new SqlMap<string,string> { 
                {"Ted", "Test"}, 
                {"Joe", "Dev"} ,
                {"Chuck", "Dev"} 
             } 
        )
)
AS T(Project, Members);

Project	Members
Website	SqlMap{ Alice=Dev; Bob=Dev; Chris=UX; Mallory=PM; Stan=Dev }
DB	SqlMap{ Chuck=Dev; Joe=Dev; Ted=Test }

Creating Empty maps

new SqlMap<string,string> ( )

new SqlMap<string,string> { }

Maps from Maps: Removing members based on keys

@output =
    SELECT Project,
           new SqlMap<string,string>(Members.Where(kv => kv.Key != "Mallory")) AS Members
    FROM @projectmembers;

Project	Members
Website	SqlMap{ Alice=Dev; Bob=Dev; Chris=UX; Stan=Dev }
DB	SqlMap{ Chuck=Dev; Joe=Dev; Ted=Test }

Alternatively you can use the SqlMap.Create static method instead

@output =
    SELECT Project,
           SqlMap.Create(Members.Where(kv => kv.Key != "Mallory")) AS Members
    FROM @projectmembers;

Project	Members
Website	SqlMap{ Alice=Dev; Bob=Dev; Chris=UX; Stan=Dev }
DB	SqlMap{ Chuck=Dev; Joe=Dev; Ted=Test }

Combining rows into maps with MAP_AGG

@projectmembers = 
    SELECT *
    FROM
    ( VALUES
        ( "Website","Mallory", "PM" ), 
        ( "Website","Bob", "Dev" ),
        ( "Website","Alice", "Dev" ) ,
        ( "Website","Stan", "Dev" ) ,
        ( "Website","Chris", "UX" ) ,
        ( "DB", "Ted", "Test" ), 
        ( "DB", "Joe", "Dev" ) ,
        ( "DB", "Chuck", "Dev" ) 
    )
AS T(Project, Employee, Role);

@projectmembers =
    SELECT Project,
           MAP_AGG<string, string>(Employee, Role) AS Members
    FROM @projectmembers_raw
    GROUP BY Project;

Project	Members
DB	SqlMap{ Chuck=Dev; Joe=Dev; Ted=Test }
Website	SqlMap{ Alice=Dev; Bob=Dev; Chris=UX; Mallory=PM; Stan=Dev }

Removing members based on values

@output =
    SELECT Project,
           SqlMap.Create(Members.Where(kv => kv.Value != "Dev")) AS Members
    FROM @projectmembers;

Project	Members
Website	SqlMap{ Chris=UX; Mallory=PM }
DB	SqlMap{ Ted=Test }

counting members

@output =
    SELECT Project,
           Members,
           Members.Count AS Count
FROM @projectmembers;

Project	Members	Count
Website	SqlMap{ Alice=Dev; Bob=Dev; Chris=UX; Mallory=PM; Stan=Dev }	5
DB	SqlMap{ Chuck=Dev; Joe=Dev; Ted=Test }	3

Retrieving values

@output =
    SELECT 
        Project,
        Members["Mallory"] AS MalloryRole
   FROM @projectmembers;

Note that if the key is missing then the default value for the type is returned

Project	MalloryRole
Website	PM
DB	null

Checking if a key exists

@output =
    SELECT 
        Project,
        Members.ContainsKey("Mallory") AS ContainsMallory
FROM @projectmembers;

Project	ContainsMallory
Website	True
DB	False