How would I calculate the average of one column based on the specific conditions of two other columns?
I apologize for the convoluted title I’m new to SQL and don’t really know how to ask it better than this.
Essentially I have a very large table with many columns. I need to find the average of any selected column’s values but only if the ID for that row has a specific date.
For example I’ve written this simple query to find the average of the the column by date…
SELECT AVG(Col6), date FROM Schema.Table WHERE date = 2019
This returns the average of Col6 from all dates that have 2019 in the date column.
There are two years in the date column (2019 and 2020). Some ID’s don’t have data for 2019 and some don’t have data for 2020.
I want to write a query that basically only gives me the average of Col6 if an ID has a 2020 date and a 2019 date but I’m not sure how I would do that.
Any help would be greatly appreciated!
SELECT AVG(Col6), date FROM Schema.Table WHERE (date = 2019 or date = 2020), you can also group them using group by, but I guess that date is or 2019 or 2020, because it can't be at the same time both values
In order to check the date is present in both years you can use an inner join, as in:
select a.date, avg(a.col6) from schema.table a join schema.table b on b.id = a.id where a.date = 2019 and b.date = 2020
You would use GROUP BY date
or other similar grouping, to segregate the rows returned by the groupings. optionally WITH ROLLUP
to include the total averages. Use the WHERE
to limit the averaged data.
Example: DB-Fiddle
SELECT AVG(`Col6`) AS avg_col6, COALESCE(`date`, 'Total') AS date FROM `table1` WHERE `date` BETWEEN '2019' AND '2020' GROUP BY `date` WITH ROLLUP
An alternative to BETWEEN
(which retrieves a range), would be to use IN()
to limit only specific dates.
WHERE `date` IN('2019', '2020')
Data
CREATE TABLE table1 ( `Col6` INTEGER, `date` INTEGER ); INSERT INTO table1 (`Col6`, `date`) VALUES ('1', '2020'), ('2', '2020'), ('1', '2019'), ('1', '2019'), ('1', '2018'), ('1', NULL);
Result
| avg_col6 | date | | -------- | ----- | | 1.0000 | 2019 | | 1.5000 | 2020 | | 1.2500 | Total |
Without specifying a GROUP BY
the date
column values become ambiguous and can return "any value" within the subset and should not be relied upon.
The server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want. [sic]
SELECT AVG(`Col6`) AS avg_col6, ANY_VALUE(`date`) AS date FROM `table1` WHERE `date` IN('2019', '2020')
Result
Note: 2020 may not actually be returned
| avg_col6 | date | | -------- | ----- | | 1.2500 | 2020 |