Summation/counting over overlapping values or dates with group by over ids in SQL

Mr. Confused

I am working with a SAS table in which dates are stored as numbers in the columns "entered" and "left". I have to count the days each member remained in the system. In the example below, member 1 entered on 7071 and used a different product from 7075, but remained in the system continuously from 7071 to 7083; that is, the date ranges overlap. I want the total duration a member stayed in the system, which for id 1 is 12 days (7071 to 7083) + 2 days (7087 to 7089) + 4 days (7095 to 7099), so 18 days in total. (Some rows have duplicate entered and left values, but other columns, not shown here, differ, so these rows were not removed.) Since I'm working in SAS, a solution can be either a SAS data step or PROC SQL.

For member 2, the ranges do not overlap, so the day count is 2 days (8921 to 8923) + 5 days (8935 to 8940) = 7 days. I was able to solve this case because the days don't overlap, but for the overlapping case any suggestion, code, or advice is appreciated.

id  Entered  left
 1    7071   7077
 1    7071   7077
 1    7075   7079
 1    7077   7083
 1    7077   7083
 1    7078   7085
 1    7087   7089
 1    7095   7099
 2    8921   8923
 2    8935   8940

So the final table should be of the form

id  days_in_system
 1       18
 2       7
user667489

This is a surprisingly tricky problem: every row has to be compared with every other row for the same id to check for overlaps, and if there are multiple overlaps you have to be very careful not to double-count them.

Here's a hash-based solution. The idea is to build up a hash object containing every individual day a member has stayed as you go along, then count the number of items in it at the end:

data have;
input id  Entered  left;
cards;
 1    7071   7077
 1    7071   7077
 1    7075   7079
 1    7077   7083
 1    7077   7083
 1    7078   7085
 1    7087   7089
 1    7095   7099
 2    8921   8923
 2    8935   8940
 ;
 run;

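data want;

length day 8;
/* Create the hash object once; it is keyed on day, so each day is stored at most once */
if _n_ = 1 then do;
  declare hash h();
  rc = h.definekey('day');
  rc = h.definedone();
end;

/* Read all rows for one id per data step iteration (have must be sorted by id) */
do until(last.id);
  set have;
  by id;
  /* left is treated as exclusive, so 7071 to 7077 covers 6 days */
  do day = entered to left - 1;
    rc = h.add();  /* a nonzero rc means the day was already stored */
  end;
end;

/* The number of distinct days stored is the total for this id */
total_days = h.num_items;
rc = h.clear();  /* empty the hash before the next id */
keep id total_days;

run;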

This should be fairly light on memory, as it only has to hold the days for one id at a time.
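For comparison, here is a minimal sketch of a different technique that avoids storing individual days: sort the rows by id and entered, then sweep through them while merging overlapping ranges. It is not part of the hash solution above. The names have_sorted, want_merge, cur_start and cur_end are made up for illustration, and the sketch assumes, like the hash code, that left is exclusive.

```sas
/* Sort so that, within each id, ranges arrive in order of their start */
proc sort data=have out=have_sorted;
  by id entered left;
run;

data want_merge;
  set have_sorted;
  by id;
  retain cur_start cur_end total_days;

  if first.id then do;
    /* start the first merged stay for this id */
    cur_start = entered;
    cur_end = left;
    total_days = 0;
  end;
  else if entered > cur_end then do;
    /* gap found: close the current merged stay and start a new one */
    total_days = total_days + (cur_end - cur_start);
    cur_start = entered;
    cur_end = left;
  end;
  else do;
    /* overlap (or touching ranges): extend the current stay if needed */
    cur_end = max(cur_end, left);
  end;

  if last.id then do;
    /* close the final merged stay and write one row per id */
    total_days = total_days + (cur_end - cur_start);
    output;
  end;

  keep id total_days;
run;
```

On the sample data this gives 20 for id 1 and 7 for id 2, the same totals as the hash approach. Because it only tracks the current merged range, the work per row does not depend on how long each stay is.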

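The output for id 1 is 20, not 18. Row 6 (7078 to 7085) extends the first continuous stay to 7085, which makes it 14 days rather than 12. Here's a breakdown of the new days added row by row, which I generated by adding a bit of debugging logic. If this is wrong, please indicate where: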

_N_=1
7071 7072 7073 7074 7075 7076
_N_=2
No new days
_N_=3
7077 7078
_N_=4
7079 7080 7081 7082
_N_=5
No new days
_N_=6
7083 7084
_N_=7
7087 7088
_N_=8
7095 7096 7097 7098
_N_=1
8921 8922
_N_=2
8935 8936 8937 8938 8939
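The debugging logic itself is not shown in the answer. One possible way to produce a similar row-by-row listing in the log is sketched below; it is a guess at the approach, not the original code. It relies on the add method returning 0 only when the key was not already in the hash, and the variables row and n_new are hypothetical.

```sas
data _null_;
  length day 8;
  if _n_ = 1 then do;
    declare hash h();
    rc = h.definekey('day');
    rc = h.definedone();
  end;

  row = 0;  /* hypothetical counter: row number within the current id */
  do until(last.id);
    set have;
    by id;
    row = row + 1;
    n_new = 0;
    put id= row= @;  /* trailing @ keeps the log line open */
    do day = entered to left - 1;
      /* add() returns 0 only when the day was not stored yet */
      if h.add() = 0 then do;
        n_new = n_new + 1;
        put day @;
      end;
    end;
    if n_new = 0 then put 'No new days';
    else put;  /* release the log line */
  end;

  rc = h.clear();  /* start the next id with an empty hash */
run;
```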

If you want to add only days for rows matching a particular condition, you can pick those using a where= data set option on the set statement, e.g.

  set have(where = (var1 in ('value1', 'value2', ...)));
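Since the question allows either a data step or SQL, the same count-the-distinct-days idea can also be sketched with PROC SQL. This is only a sketch under the same assumption that left is exclusive; the intermediate table days and the output name want_sql are made up for illustration, and writing one row per day will take more space than the hash version.

```sas
/* Expand each range into one row per day spent in the system */
data days;
  set have;
  do day = entered to left - 1;
    output;
  end;
  keep id day;
run;

/* Count each day once per id, however many ranges cover it */
proc sql;
  create table want_sql as
  select id, count(distinct day) as total_days
  from days
  group by id;
quit;
```

On the sample data this returns 20 for id 1 and 7 for id 2, because count(distinct day) ignores days that appear in more than one range.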
