Skip to content
Register for a course

Techfura

All about Programming and Data Engineering

  • Blog
  • Register

Including File Properties and Metadata in a U-SQL Script

September 18, 2020March 2, 2022 Kennedy

U-SQL adds support for computed file property columns on EXTRACT statement. Sometimes customers would like to get information about the files that they process, such as the full URI path or information about size, creation or modification dates. Likewise customers would like to use that information to filter the set of files that being processed.

U-SQL adds support for computed file property columns on EXTRACT expressions. It provides the following capabilities:

  1. Provide built-in file property functions that expose specific file properties.
  2. Provide a way to assign the properties to “virtual” columns in the EXTRACT‘s schema. This is done using a “calculated column” syntax.
  3. Allow constant-foldable comparisons on these “virtual” columns in subsequent query predicates to limit the files being processed by the EXTRACT expression.

U-SQL provides the following file property functions in the EXTRACT expression’s computed column definition:

File Property FunctionReturn typeDescription
FILE.URI()stringReturns the file’s absolute URI or the full local path if used in a local run.
FILE.MODIFIED()DateTime?Returns the file’s last modification time stamp in UTC-0 timezone.
FILE.CREATED()DateTime?Returns the file’s creation time stamp in UTC-0 timezone.
FILE.LENGTH()long?Returns the file’s size in bytes.
Table. File properties function

N.B: The EXTRACT expression’s schema definition needs to contain at least one non-virtual, non-computed column.

The following U-SQL script extracts data, and file properties, as well as virtual column filename.

 @SampleData = 
   EXTRACT 
     EID int
   , FirstName string
   , LastName string
   , DoB Date
   , JoiningDate DateTime  
   , filename string // virtual column
   , uri = FILE.URI()
   , modified_date = FILE.MODIFIED()
   , created_date = FILE.CREATED()
   , file_sz = FILE.LENGTH()
   FROM "/Data/{filename}.csv"
   USING Extractors.Csv();

 @RefinedData =
   SELECT *
   FROM @sampleData
   WHERE modified_date > "2020-09-18";

 OUTPUT @RefinedData 
 TO "/output/SampleData_" + DateTime.Now.ToString("yyyyMMdd") + ".csv"
 USING Outputters.Csv(outputHeader : true);

By adding file properties and other metadata within the data results, we can easily troubleshoot issues (handling duplicate rows , or files that’re not processed – specially invalid JSON format files).

ADLA, USQLADLA, U_SQL File Metadata, U-SQL, USQL

Post navigation

SQL Server: STUFF() Function
SQL Server – Indexes

Categories

  • AI/ML (1)
    • RAG (1)
  • Apache Kafka (1)
    • Apache Kafka 101 (1)
  • Azure (23)
    • ADF (4)
    • ADLA (14)
      • USQL (14)
    • Azure Services (1)
    • Azure SQL (2)
    • Data and Analytics (1)
  • Azure Databricks (1)
  • Data Engineering (2)
    • DE Roadmap (1)
  • DevOps (1)
    • Azure DevOps CI/CD Pipeline (1)
  • ETL/ELT (1)
    • SSIS (1)
  • Excel (2)
    • Functions (1)
    • Split csv Files (1)
  • Linux (2)
    • Basic Linux Commands (1)
    • Shell Script (1)
  • MSSQL (4)
    • CDC (1)
    • TDE (3)
  • Programming (42)
    • C# (5)
    • C++ (4)
    • Fundamentals (1)
    • HTML/CSS (3)
    • JavaScript (6)
    • Python (1)
    • Spark (4)
      • Intro to Apache Spark (1)
      • PySpark (1)
      • Spark DataFrame Cheat Sheet (1)
      • SparkSQL (1)
    • TSQL (16)
      • Built-in Functions (3)
      • Data Conversion (1)
      • Errors (2)
      • Indexes (1)
      • Merge Statement (1)
      • Mostly Asked Questions (4)
      • Stored Procedure and UDF (4)
  • Reporting (2)
    • SSRS (2)

Archives

  • May 2025 (1)
  • February 2025 (1)
  • September 2024 (2)
  • May 2024 (7)
  • January 2024 (1)
  • June 2023 (2)
  • March 2022 (1)
  • April 2021 (8)
  • March 2021 (3)
  • January 2021 (3)
  • December 2020 (2)
  • October 2020 (1)
  • September 2020 (10)
  • July 2020 (1)
  • June 2020 (6)
  • May 2020 (30)
  • April 2020 (1)
  • April 2019 (1)

Recent Posts

  • Cloud Platforms Comparison Cheat Sheet
  • SQL – CRUD and CT
  • Building a Real-Time Data Pipeline with Python, Docker, Airflow, Spark, Kafka, and Cassandra
  • Mastering Personalized Conversations with RAG
  • Top 10 Core Concepts in every Programming Languages

Tags

ADLA Azure Big Data Cheat Sheet CTE Data Data Engineering Dataframe Data Pipeline Dataset Functions Linux MSSQL Procedures PySpark Python RDD Scala SLQ Delete Duplicates Spark SQL SQL SERVER SSIS SSIS Catalog error SSISDB Stored procedures STUFF() Stuff() Function TSQL U-SQL U-SQL Table UDF USQL U_SQL File Metadata

Traffic Count

traffic counter

Categories

  • AI/ML (1)
    • RAG (1)
  • Apache Kafka (1)
    • Apache Kafka 101 (1)
  • Azure (23)
    • ADF (4)
    • ADLA (14)
      • USQL (14)
    • Azure Services (1)
    • Azure SQL (2)
    • Data and Analytics (1)
  • Azure Databricks (1)
  • Data Engineering (2)
    • DE Roadmap (1)
  • DevOps (1)
    • Azure DevOps CI/CD Pipeline (1)
  • ETL/ELT (1)
    • SSIS (1)
  • Excel (2)
    • Functions (1)
    • Split csv Files (1)
  • Linux (2)
    • Basic Linux Commands (1)
    • Shell Script (1)
  • MSSQL (4)
    • CDC (1)
    • TDE (3)
  • Programming (42)
    • C# (5)
    • C++ (4)
    • Fundamentals (1)
    • HTML/CSS (3)
    • JavaScript (6)
    • Python (1)
    • Spark (4)
      • Intro to Apache Spark (1)
      • PySpark (1)
      • Spark DataFrame Cheat Sheet (1)
      • SparkSQL (1)
    • TSQL (16)
      • Built-in Functions (3)
      • Data Conversion (1)
      • Errors (2)
      • Indexes (1)
      • Merge Statement (1)
      • Mostly Asked Questions (4)
      • Stored Procedure and UDF (4)
  • Reporting (2)
    • SSRS (2)
© 2024 Techfura. All Rights Reserved
Proudly powered by WordPress | Education Hub by WEN Themes