How do you calculate data completeness for multiple tables based on null values within columns?

Jordan Davis

The query below calculates what we need, but only for one column. How can we do this for every column in the table without repeating the CASE expression for each one? This has to run against hundreds of tables, so duplicating the CASE expression by hand is not practical.

SELECT SUM(CAST(CASE WHEN [Column] IS NULL THEN 0 ELSE 1 END AS FLOAT)) / COUNT(*) FROM [Table]

So the output would be something like:

Column Name: Data completeness

Customer Name: 88%

Lukasz Szozda

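Solution by Jens Suessmeyer, from "Finding the percentage of NULL values for each column in a table". Note that it reports the percentage of NULL values per column, so completeness is 100 minus each figure: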

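SET NOCOUNT ON
DECLARE @Statement NVARCHAR(MAX) = ''       -- inner SELECT list: NULL count per column
DECLARE @Statement2 NVARCHAR(MAX) = ''      -- outer SELECT list: NULL percentage per column
DECLARE @FinalStatement NVARCHAR(MAX) = ''

-- Replace the placeholders with the schema and table to inspect.
DECLARE @TABLE_SCHEMA SYSNAME = '<SCHEMA_NAME>'
DECLARE @TABLE_NAME SYSNAME = '<TABLE_NAME>'

-- QUOTENAME brackets each identifier so that names containing spaces (e.g. Customer Name) still work.
-- The percentage uses integer arithmetic, so each result is truncated to a whole number.
SELECT
        @Statement = @Statement + 'SUM(CASE WHEN ' + QUOTENAME(COLUMN_NAME) + ' IS NULL THEN 1 ELSE 0 END) AS ' + QUOTENAME(COLUMN_NAME) + ',' + CHAR(13) ,
        @Statement2 = @Statement2 + QUOTENAME(COLUMN_NAME) + '*100 / OverallCount AS ' + QUOTENAME(COLUMN_NAME) + ',' + CHAR(13)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @TABLE_NAME
    AND TABLE_SCHEMA = @TABLE_SCHEMA

IF @@ROWCOUNT = 0
    RAISERROR('TABLE OR VIEW with schema "%s" and name "%s" does not exist or you do not have appropriate permissions.',16,1, @TABLE_SCHEMA, @TABLE_NAME)
ELSE
BEGIN
    -- LEN(...) - 2 strips the trailing ',' + CHAR(13) from each generated list.
    SELECT @FinalStatement =
            'SELECT ' + LEFT(@Statement2, LEN(@Statement2) -2) + ' FROM (SELECT ' + LEFT(@Statement, LEN(@Statement) -2) +
            ', COUNT(*) AS OverallCount FROM ' + QUOTENAME(@TABLE_SCHEMA) + '.' + QUOTENAME(@TABLE_NAME) + ') SubQuery'
    EXEC(@FinalStatement)
END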
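
The script above returns NULL percentages in a single wide row. If you want the layout from the question instead (one row per column, showing completeness), a variation along the following lines might work. This is only a sketch, not part of the original answer: it assumes SQL Server 2017 or later for STRING_AGG, and dbo.Customer, @SchemaName, @TableName, the c1, c2, ... aliases and the CompletenessPct column are all names made up for illustration.

```sql
-- Sketch only: assumes SQL Server 2017+ (STRING_AGG).
-- dbo.Customer is a hypothetical table; replace it with your own.
SET NOCOUNT ON;

DECLARE @SchemaName SYSNAME = N'dbo';       -- assumption: example schema
DECLARE @TableName  SYSNAME = N'Customer';  -- assumption: example table
DECLARE @Counts NVARCHAR(MAX), @Pairs NVARCHAR(MAX), @Sql NVARCHAR(MAX);

-- @Counts: one non-NULL counter per column, aliased c1, c2, ... by ordinal position.
-- @Pairs:  one (ordinal, column name, counter) tuple per column, used to turn the
--          single aggregate row into one row per column.
SELECT
    @Counts = STRING_AGG(
        CAST(N'SUM(CASE WHEN ' + QUOTENAME(COLUMN_NAME) + N' IS NULL THEN 0 ELSE 1 END) AS c'
             + CAST(ORDINAL_POSITION AS NVARCHAR(10)) AS NVARCHAR(MAX)), N', '),
    @Pairs = STRING_AGG(
        CAST(N'(' + CAST(ORDINAL_POSITION AS NVARCHAR(10))
             + N', N' + QUOTENAME(COLUMN_NAME, '''')
             + N', s.c' + CAST(ORDINAL_POSITION AS NVARCHAR(10)) + N')' AS NVARCHAR(MAX)), N', ')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @SchemaName
  AND TABLE_NAME = @TableName;

IF @Counts IS NULL
    RAISERROR('No columns found for schema "%s" and table "%s".', 16, 1, @SchemaName, @TableName);
ELSE
BEGIN
    -- The inner query scans the table once; CROSS APPLY (VALUES ...) unpivots the counters.
    -- NULLIF avoids a divide-by-zero on an empty table (the percentage is then NULL).
    SET @Sql =
        N'SELECT v.ColumnName, '
      + N'CAST(100.0 * v.NonNullCount / NULLIF(s.TotalRows, 0) AS DECIMAL(5, 2)) AS CompletenessPct '
      + N'FROM (SELECT COUNT(*) AS TotalRows, ' + @Counts
      + N' FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N') AS s '
      + N'CROSS APPLY (VALUES ' + @Pairs + N') AS v (Ordinal, ColumnName, NonNullCount) '
      + N'ORDER BY v.Ordinal;';

    EXEC sys.sp_executesql @Sql;
END
```

To cover hundreds of tables, one option would be to wrap this in a cursor or loop over INFORMATION_SCHEMA.TABLES, set @SchemaName and @TableName on each pass, and insert the results into a summary table together with the table name.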
