April 15, 2024

Analytic functions

Analytic functions - TSQL

Analytic Functions - SQL Server (T-SQL)

Analytic functions process a group of rows to compute aggregate values while preserving detailed insights. Unlike aggregate functions, which reduce groups to a single row, analytic functions can output multiple rows for each group. They are ideal for calculating metrics like moving averages, running totals, percentage distributions, or identifying top-N results within a group.

You may refer the link https://learn.microsoft.com/en-us/sql/t-sql/functions/analytic-functions-transact-sql?view=sql-server-ver16.

Different types of analytic functions in SQL Server:

  • CUME_DIST:
    • Stands for cumulative distribution
    • Calculates the percentage of rows with values less than or equal to the current row's value.
    • Formula: text{CUME_DIST} = {Rank}/{Total Rows}
    • Values range from 0 to 1, inclusive.
  • PERCENT_RANK:
    • Represents the percentage of rows with values less than the current row's value.
    • Formula: text{PERCENT_RANK} = (Rank - 1)/(Total Rows - 1)
    • Values range from 0 to 1, but the highest-ranked value is excluded.
  • Source Table CUME DIST & PERCENT RANK
  • PERCENTILE_CONT:
    • The PERCENTILE_CONT function calculates a percentile based on a continuous distribution of values. It is used to find the percentile value for a specified percentage within a group of rows. It may return a value that doesn't exist in the dataset, as it interpolates between values.
    • Sort the values in ascending order.
    • Formula: Position = (Percentile \times (Total Rows - 1)) + 1
    • If the position is an integer, return the value at that position and if the position is not an integer, interpolate between the two closest values.
  • PERCENTILE_DISC:
    • The PERCENTILE_DISC function calculates a percentile based on a discrete distribution of values. It returns the value at the specified percentile, which must exist in the dataset.
    • Sort the values in ascending order.
    • Formula: Position = (Percentile \times (Total Rows - 1)) + 1
    • Return the value at that position. For example, if the salaries are {30.00, 40.00, 50.00}, the median would be 40.00.
  • Source Table PERCENTILE CONT & DISC
  • FIRST_VALUE:
    • Returns the first value in an ordered partition.
    • Formula: FIRST_VALUE(column_name) OVER (PARTITION BY column_name ORDER BY column_name)
    • Useful for retrieving the first occurrence of a value within a group.
  • LAST_VALUE:
    • Returns the last value in an ordered partition.
    • Formula: LAST_VALUE(column_name) OVER (PARTITION BY column_name ORDER BY column_name ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    • Useful for retrieving the last occurrence of a value within a group.
  • Source Table First & Last Value
  • LEAD:
    • Returns the value from a subsequent row in the result set.
    • Formula: LEAD(column_name, offset, default_value) OVER (PARTITION BY column_name ORDER BY column_name)
    • Useful for comparing the current row with a subsequent row.
  • LAG:
    • Returns the value from a previous row in the result set.
    • Formula: LAG(column_name, offset, default_value) OVER (PARTITION BY column_name ORDER BY column_name)
    • Useful for comparing the current row with a previous row.
  • Source Table LEAD & LAG

No comments:

Post a Comment