Filter and Aggregate a column not in the group by clause MySQL
I have the following table. I’m using mysql-5.7. Here is the dbfiddle
select * from report; +----+---------------------+---------+------+------+------------+------+-------------------+ | id | date_hour | speed | name | nor | validation | p_i | speed_performance | +----+---------------------+---------+------+------+------------+------+-------------------+ | 1 | 2018-02-03 13:23:00 | -12.3 | SYN | 10 | 1 | 3 | 4 | | 2 | 2018-02-03 11:23:00 | -6.36 | SYN | 13 | 0 | 4 | 5 | | 3 | 2018-02-03 01:23:00 | -26.36 | SYN | 24 | 0 | 2 | 4 | | 4 | 2020-04-06 21:23:00 | -156.36 | SYN | 16 | 1 | 3 | 6 | | 5 | 2020-04-06 03:23:00 | -36.36 | YRT | 136 | 0 | 2 | 5 | | 6 | 2020-04-06 12:23:00 | -16.36 | SYN | 13 | 1 | 4 | 4 | +----+---------------------+---------+------+------+------------+------+-------------------+
I would like to group by
date_hour
such that after 12:00:00
, it counts as the next day. The p_i
column should be averaged based on this condition: take average
of p_i
where validation
is 1 and p_i
is non-zero. The values should be averaged based on the new version of date_hour
. Additionally, another column (nor
) is averaged without such a condition.
The output should look like this
+------+------------+----------+------+ | name | date_hr | avg(nor) | p_i | +------+------------+----------+------+ | SYN | 2018-02-03 | 18.5 | NULL | | SYN | 2018-02-04 | 10 | 3 | | SYN | 2020-04-07 | 14.5 | 3.5 | | YRT | 2020-04-06 | 136 | NULL | +------+------------+----------+------+
I tried the following query. I had to disable only_full_group_by
to get it to work.:
select name, case when time(date_hour) > '12:00:00' then date(date_hour) + interval 1 day else date(date_hour) end as date_hr, avg(nor), case when validation = 1 and p_i > 0 then avg(p_i) else null end as p_i from report group by name, date_hr; +------+------------+----------+------+ | name | date_hr | avg(nor) | p_i | +------+------------+----------+------+ | SYN | 2018-02-03 | 18.5 | 3 | | SYN | 2018-02-04 | 10 | NULL | | SYN | 2020-04-07 | 14.5 | NULL | | YRT | 2020-04-06 | 136 | 2 | +------+------------+----------+------+
You can do it with conditional aggregation:
select max(name) name, date(date_add(date_hour, interval 12 hour)) date, avg(nor) avg_nor, avg(case when validation = '1' and p_i <> 0 then p_i end) avg_p_i from report group by date
I used the aggregate function max()
to return the column name
, but you can also use min()
or any_value()
.
See the demo.
Results:
> name | date | avg_nor | avg_p_i > :--- | :--------- | ------: | ------: > SYN | 2018-02-03 | 18.5 | null > SYN | 2018-02-04 | 10 | 3 > YRT | 2020-04-06 | 136 | null > SYN | 2020-04-07 | 14.5 | 3.5
You’d have to disable strict_mode for this, which is frustrating, because this query is otherwisee valid…
CREATE TABLE report ( id INT SERIAL PRIMARY KEY , date_hour DATETIME DEFAULT NULL , speed FLOAT DEFAULT NULL , name VARCHAR(20) DEFAULT NULL , nor FLOAT DEFAULT NULL , validation TINYINT DEFAULT NULL , p_i VARCHAR(20) DEFAULT NULL , speed_performance VARCHAR(20) DEFAULT NULL ); INSERT INTO report VALUES (1,'2018-02-03 13:23:00', -12.30,'SYN', 10,1,3,4), (2,'2018-02-03 11:23:00', -6.36,'SYN', 13,0,4,5), (3,'2018-02-03 01:23:00', -26.36,'SYN', 24,0,2,4), (4,'2020-04-06 21:23:00',-156.36,'SYN', 16,1,3,6), (5,'2020-04-06 03:23:00', -36.36,'YRT',136,0,2,5), (6,'2020-04-06 12:23:00', -16.36,'SYN', 13,1,4,4); SELECT name , DATE(date_hour) dt , avg(nor) avg_nor FROM report GROUP BY name , DATE(date_hour+INTERVAL 12 HOUR); +------+---------------------+---------+ | name | dt | avg_nor | +------+---------------------+---------+ | SYN | 2018-02-03 11:23:00 | 18.5 | | SYN | 2018-02-03 13:23:00 | 10 | | SYN | 2020-04-06 21:23:00 | 14.5 | | YRT | 2020-04-06 03:23:00 | 136 | +------+---------------------+---------+