I am working with a SAS table in which dates are stored as numbers in the columns "entered" and "left". I need to count the number of days each member remained in the system. In the example below, the member with id 1 entered on 7071 and started using a different product on 7075, but remained in the system continuously from 7071 to 7083; that is, the date ranges overlap. I want the total duration each member stayed in the system. For id 1 that is 12 days (7071 to 7083) + 2 days (7087 to 7089) + 4 days (7095 to 7099), so the total is 18 days. (Some rows have duplicate entered and left values, but other columns, not shown here, differ, so these rows were not removed.) Since I'm working in SAS, the solution can be either a SAS data step or SAS SQL (PROC SQL).
For member 2, the values do not overlap, so the day count is 2 days (8921 to 8923) + 5 days (8935 to 8940) = 7 days. I was able to solve this case because the ranges don't overlap, but for the overlapping case any suggestion, code, or advice is appreciated.
id Entered left
1 7071 7077
1 7071 7077
1 7075 7079
1 7077 7083
1 7077 7083
1 7078 7085
1 7087 7089
1 7095 7099
2 8921 8923
2 8935 8940
So the final table should be of the form
id days_in_system
1 18
2 7
This is a surprisingly tricky problem: every row has to be compared with every other row for the same id to check for overlaps, and if there are multiple overlaps you have to be very careful not to double-count them.
Here's a hash-based solution - the idea is to build up a hash containing all of the individual days a member has stayed as you go along, then count the number of items in it at the end:
data have;
  input id Entered left;
  cards;
1 7071 7077
1 7071 7077
1 7075 7079
1 7077 7083
1 7077 7083
1 7078 7085
1 7087 7089
1 7095 7099
2 8921 8923
2 8935 8940
;
run;
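data want;
  length day 8;
  /* Create the hash once; it is keyed on individual days. */
  if _n_ = 1 then do;
    declare hash h();
    rc = h.definekey('day');
    rc = h.definedone();
  end;
  /* Read all rows for one id per data step iteration. */
  do until(last.id);
    set have;
    by id;
    /* Add each day of the stay; the "left" day itself is not counted. */
    do day = entered to left - 1;
      rc = h.add();  /* non-zero rc means the day is already in the hash */
    end;
  end;
  /* The number of distinct days is the total stay for this id. */
  total_days = h.num_items;
  rc = h.clear();  /* empty the hash before the next id */
  keep id total_days;
run;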
This should be fairly light on memory as it only has to load the days for 1 id at a time.
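If you would rather not use a hash, the same count can be reached by sorting the rows and merging overlapping stays as you go. This is a different technique from the hash approach above, offered only as a minimal sketch: it assumes the HAVE data set from above, and the names SORTED, WANT_MERGE and CUR_END are made up for illustration. It uses the same convention that the "left" day itself is not counted.

```sas
/* Sort so that, within each id, stays are processed in order of entry. */
proc sort data=have out=sorted;
  by id entered left;
run;

data want_merge;
  set sorted;
  by id;
  retain cur_end;                 /* last day boundary already counted for this id */
  if first.id then do;
    total_days = 0;
    cur_end = .;                  /* a missing value compares lower than any number */
  end;
  /* Count only the part of this stay that extends past what is already covered. */
  if left > cur_end then do;
    total_days + (left - max(entered, cur_end));  /* max() ignores missing values */
    cur_end = left;
  end;
  if last.id then output;
  keep id total_days;
run;
```

Because each row is only compared with the running end of the stays seen so far, this needs a sort but no per-day storage, which may matter if individual stays span many days.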
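The output for id 1 is 20, not 18. The row 7078 to 7085 extends the first continuous stay to 7085, so it covers 14 days (7071 to 7085) rather than 12, and 14 + 2 + 4 = 20. Here's a breakdown of the new days added row by row, which I generated by adding a bit of debugging logic. If this is wrong, please indicate where: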
_N_=1
7071 7072 7073 7074 7075 7076
_N_=2
No new days
_N_=3
7077 7078
_N_=4
7079 7080 7081 7082
_N_=5
No new days
_N_=6
7083 7084
_N_=7
7087 7088
_N_=8
7095 7096 7097 7098
_N_=1
8921 8922
_N_=2
8935 8936 8937 8938 8939
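The debugging logic itself isn't shown here. One way to get a similar row-by-row listing in the log is sketched below; it is not necessarily the code that produced the breakdown above, and the variables ROW and NEW_DAYS are illustrative names. It relies on h.add() returning zero only when the day was not already in the hash.

```sas
data _null_;
  length day 8;
  if _n_ = 1 then do;
    declare hash h();
    rc = h.definekey('day');
    rc = h.definedone();
  end;
  /* ROW counts the rows within the current id. */
  do row = 1 by 1 until(last.id);
    set have;
    by id;
    put id= row=;
    new_days = 0;
    do day = entered to left - 1;
      rc = h.add();
      if rc = 0 then do;          /* day was not yet in the hash */
        new_days + 1;
        put day @;                /* trailing @ keeps the days on one log line */
      end;
    end;
    if new_days = 0 then put 'No new days';
    else put;                     /* release the held log line */
  end;
  rc = h.clear();
run;
```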
If you want to add only days for rows matching a particular condition, you can pick those using a where= data set option on the set statement, e.g.
set have(where = (var1 in ('value1', 'value2', ...)));