A way to have a rolling summation
I have the below dataset. In the below example records for the year 1993. The Tgrowth
column is start - end
. Started
is the number of employees that joined on a specific month and ended
is the number of employees that left for the same month.
SELECT r.Tgrowth, CASE WHEN t.mon_num = 1 THEN 'JAN' WHEN t.mon_num = 2 THEN 'FEB' WHEN t.mon_num = 3 THEN 'MAR' WHEN t.mon_num = 4 THEN 'APR' WHEN t.mon_num = 5 THEN 'MAY' WHEN t.mon_num = 6 THEN 'JUN' WHEN t.mon_num = 7 THEN 'JUL' WHEN t.mon_num = 8 THEN 'AUG' WHEN t.mon_num = 9 THEN 'SEP' WHEN t.mon_num = 10 THEN 'OCT' WHEN t.mon_num = 11 THEN 'NOV' WHEN t.mon_num = 12 THEN 'DEC' END AS myMONTH FROM (SELECT 1 mon_num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12) t LEFT JOIN Reports r ON t.mon_num = r.theMONTH AND r.Tyear = 1993 GROUP BY r.Tgrowth , myMONTH ORDER BY t.mon_num ASC
The result set for the above is as follows,
Tgrowth Month 1 JAN 0 FEB 2 MAR 0 APR 0 MAY 0 JUN 0 JUL 0 AUG 0 SEP 0 OCT 0 NOV 0 DEC
Instead I would like the result to show a rolling sum i.e. add to the Tgrowth
field. Something like the below,
growth Emp_Count myMONTH 1 1 JAN 0 1 FEB 2 3 MAR 0 3 APR 0 3 MAY 0 3 JUN 0 3 JUL 0 3 AUG 0 3 SEP 0 3 OCT 0 3 NOV 0 3 DEC
There are 2 options:
- use join
- use variables
The method of using join is as following:
SELECT t1.Tgrowth, sum(t2.Tgrowth) as Emp_Count, CASE WHEN t1.Month = 1 THEN 'JAN' WHEN t1.Month = 2 THEN 'FEB' WHEN t1.Month = 3 THEN 'MAR' WHEN t1.Month = 4 THEN 'APR' WHEN t1.Month = 5 THEN 'MAY' WHEN t1.Month = 6 THEN 'JUN' WHEN t1.Month = 7 THEN 'JUL' WHEN t1.Month = 8 THEN 'AUG' WHEN t1.Month = 9 THEN 'SEP' WHEN t1.Month = 10 THEN 'OCT' WHEN t1.Month = 11 THEN 'NOV' WHEN t1.Month = 12 THEN 'DEC' END AS myMONTH FROM ( SELECT case when r.growth is not null then r.growth when r.growth is null then 0 END as Tgrowth, t.mon_num AS Month FROM (SELECT 1 mon_num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12) t LEFT JOIN Reports r ON t.mon_num = r.themonth AND r.theYear = 1993 GROUP BY r.growth , Month ORDER BY t.mon_num ASC ) as t1 join ( SELECT case when r.growth is not null then r.growth when r.growth is null then 0 END as Tgrowth, t.mon_num AS Month FROM (SELECT 1 mon_num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12) t LEFT JOIN Reports r ON t.mon_num = r.themonth AND r.theYear = 1993 GROUP BY r.growth , Month ORDER BY t.mon_num ASC ) as t2 on t1.Month >= t2.Month group by t1.Month;
Use variables solution is as following:
SET @num := 0; select Tgrowth, @num := @num + Tgrowth as Emp_Count, CASE WHEN t1.Month = 1 THEN 'JAN' WHEN t1.Month = 2 THEN 'FEB' WHEN t1.Month = 3 THEN 'MAR' WHEN t1.Month = 4 THEN 'APR' WHEN t1.Month = 5 THEN 'MAY' WHEN t1.Month = 6 THEN 'JUN' WHEN t1.Month = 7 THEN 'JUL' WHEN t1.Month = 8 THEN 'AUG' WHEN t1.Month = 9 THEN 'SEP' WHEN t1.Month = 10 THEN 'OCT' WHEN t1.Month = 11 THEN 'NOV' WHEN t1.Month = 12 THEN 'DEC' END AS myMONTH from ( SELECT case when r.growth is not null then r.growth when r.growth is null then 0 END as Tgrowth, t.mon_num AS Month FROM (SELECT 1 mon_num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12) t LEFT JOIN Reports r ON t.mon_num = r.themonth AND r.theYear = 1993 GROUP BY r.growth , Month ORDER BY t.mon_num ASC ) t1;
Since you are running MySQL 8.0, I would recommend a recursive query to generate the dates, and then window functions and aggregation.
If you want the whole 1993 year:
with dates as ( select '1993-01-01' dt union all select dt + interval 1 month from dates where dt < '1993-12-01' ) select date_format(d.dt, '%b') mymonth, coalesce(sum(started), 0) - coalesce(sum(ended), 0) growth, sum(coalesce(sum(started), 0) - coalesce(sum(ended), 0)) over(order by d.dt) emp_count from dates d left join reports r on r.theDate >= d.dt and r.theDate < d.dt + interval 1 month group by d.dt order by d.dt
This assumes that theDate
is stored as a date
datatype and not a string (else, you would need to convert it first, using str_to_date()
).
This also takes in account the possibility that the table may contain several rows for a given month. If that’s not the case, then there is no need for aggregation:
with dates as ( select '1993-01-01' dt union all select dt + interval 1 month from dates where dt < '1993-12-01' ) select date_format(d.dt, '%b') mymonth, coalesce(started, 0) - coalesce(ended, 0) growth, sum(coalesce(started, 0) - coalesce(ended, 0)) over(order by d.dt) emp_count from dates d left join reports r on r.theDate >= d.dt and r.theDate < d.dt + interval 1 month order by d.dt