Write SQL to identify multiple subgroupings within a grouping

Chad

I have a program that summarizes non-normalized data in one table and moves it to another and we frequently get a duplicate key violation on the insert due to bad data. I want to create a report for the users to help them identify the cause of the error.

For example, consider the following contrived simple SQL which summarizes data in the table Companies and inserts it into CompanySum, which has a primary key of State/Zone. In order for the INSERT not to fail, there cannot be more than one distinct combinations of Company/Code for every unique primary key State/Zone combination. If there is, we want the insert to fail so that the data can be corrected.

INSERT INTO CompanySum
(
    [State] 
    ,[Zone] 
    ,[Company]
    ,[Code] 
    ,[Revenue] 
)
SELECT 
    --Keys of target
    [State] 
    ,[Zone] 

    --We are expecting to have one distinct combination of these fields per key grouping
    ,[Company]
    ,[Code] 

    --Aggregate
    ,SUM([Revenue])
    
FROM COMPANIES

GROUP BY
    [State] 
    ,[Zone] 
    ,[Company]
    ,[Code]

I would like to create a report to help the users easily identify and correct the data so that there is only one distinct Company/Code combination within a State/Zone. For each distinct State/Zone value, I would like to identify the distinct Company/Code combinations within the State/Zone. If there are more than one Company/Code combinations within a State/Zone, I would like all of the records in the State/Zone to be displayed in the output. For example, here is the sample input and desired output:

Data:

RecordNumber    State   Zone    Company         Code    Revenue
------------    -----   ----    -------         ----    --------
1               CT      B       State of CT     65453    10
2               CT      B       State of CT     65453     3
3               CT      B       Travelers       33443    20
4               CT      C       Cigna           45678    24
5               CT      C       Cigna           45678   234
6               MI      A       GM              48089   100
7               MI      A       GM              54555   200
8               MI      B       Chrysler        43434    44


Desired Output:

RecordNumber    State   Zone    Company         Code    Revenue
------------    -----   ----    -------         ----    --------
1               CT      B       State of CT     65453     10
2               CT      B       State of CT     65453      3
3               CT      B       Travelers       33443     20
6               MI      A       GM              48089    100
7               MI      A       GM              54555    200

Here is the DDL and DML needed to create this test scenario

CREATE TABLE [dbo].[Companies](
    [RecordNumber] [int] NULL,
    [State] [char](2) NOT NULL,
    [Zone] [varchar](30) NOT NULL,
    [Company] [varchar](30) NOT NULL,
    [Code] [varchar](30) NOT NULL,
    [Revenue] [numeric](9, 1) NULL
) ON [PRIMARY]


CREATE TABLE [dbo].[CompanySum](
    [State] [char](2) NOT NULL,
    [Zone] [varchar](30) NOT NULL,
    [Company] [varchar](30) NOT NULL,
    [Code] [varchar](30) NOT NULL,
    [Revenue] [numeric](9, 1) NULL,
 CONSTRAINT [PK_CompanySum] PRIMARY KEY CLUSTERED 
(
    [State] ASC,
    [Zone] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]


DELETE FROM [dbo].[Companies]
GO

INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (1, N'CT', N'B', N'State of CT', N'65453', CAST(10.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (2, N'CT', N'B', N'State of CT', N'65453', CAST(3.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (3, N'CT', N'B', N'Travelers', N'33443', CAST(20.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (4, N'CT', N'C', N'Cigna', N'45678', CAST(24.0 AS Numeric(9, 1)))
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (5, N'CT', N'C', N'Cigna', N'45678', CAST(234.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (6, N'MI', N'A', N'GM', N'48089', CAST(100.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (7, N'MI', N'A', N'GM', N'54555', CAST(200.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (8, N'MI', N'B', N'Chrysler', N'43434', CAST(44.0 AS Numeric(9, 1)))
GO

This is a hopefully better re-construction of a previous post of mine SQL to return unique combinations of non key columns within a set of key columns where I am trying to help clarify the question and provide a simple working example that readers can use.

Please see this SQL Fiddle:

http://sqlfiddle.com/#!18/d0141/1

Marcus Vinicius Pompeu

Is this a solution?

Fiddle: http://sqlfiddle.com/#!18/12e9a0/9

select c.*
from
    Companies c
        inner join (
            select State, Zone
            from Companies
            group by State, Zone
            having count(distinct Company + Code) > 1
        ) as dup_state_zone
        on(
                c.State = dup_state_zone.State
            and c.Zone  = dup_state_zone.Zone
        )

Edited - Fix the having clause, with a little cheat...

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

How do you write a sql to combine multiple variables with underscore within each variables?

SQL Server Weird Grouping Scenario by multiple columns and OR

Identify common rows based on a column value and adding conditions for the grouping in SQL

SQL: Identify first and last date within each consecutive group of days

SQL Grouping within Sub Queries

Randomly assigning equal sized subgroupings within each quantile

Grouping output with SQL across multiple fields

Aggregate SQL query across multiple tables by grouping

Grouping subsets within ordered data in SQL

SQL to join multiple tables while summing and grouping

SQL multiple columns grouping in one row

Widening data with ordinal subgroupings

Grouping by multiple columns in SQL

grouping multiple rows in to single row sql

Array Grouping and Identify in python

SQL group by Custom grouping sets within a group by column

Grouping continuous sessions over multiple rows in sql

(SQL) Identify positions of multiple occurrences of a string format within a field

Sql select by grouping multiple column

Grouping an SQL search without looping within in each loop - is there a better way?

Multiple grouping

SQL Grouping with Multiple Column Criteria

SQL Query for grouping within a segment with a condition

Perform multiple calculations based on multiple columns, grouping by a value within a separate column

Calculate total time within grouping in SQL

ORACLE SQL: Multiple Grouping Layers and Intermediate Results

Write conditional statement to identify dates within 1 year and greater than 30 days apart

How to write a function in R which will identify peaks within a dataframe

SQL - grouping Multiple select sums