I have a fairly large SQL query that is being slowed by the GROUP BY and ORDER BY statements. What is the best way to optimize?
What is the best way to optimize this query as its taking up to 12 second to execute?
Apologies for the size of the query, it executes almost instantly without the group by and order by statements.
I’m fairly new to SQL optimization, I’ve only been coding for around a year. I didn’t write this query but I need to fix it.
SELECT `products`.*, `product_alternative`.`alternative_product_code`, `product_alternative`.`id` as `product_alternative_id`, `vat_rate`.`rate` AS `vat_rate`, `product_images`.`filename` AS `product_image_file`, `product_docs`.`filename` AS `product_doc_file`, `suppliers`.`supplier_code`, `suppliers`.`name` AS `supplier_name`, `commission`.`commission` AS `supplier_commission`, `categories`.`name` AS `category_name`, `sub_categories`.`name` AS `subcategory_name`, IF(`products`.`product_doc_id` = 0, NULL, CONCAT( "/product-docs/", `products`.`id` )) AS `product_doc_url`,IF(`products`.`product_image_id` = 0, NULL, CONCAT( "/product-images/", `products`.`id`, "/original/" )) AS `original_image_url`,IF(`products`.`product_image_id` = 0, NULL, CONCAT( "/product-images/", `products`.`id`, "/medium/" )) AS `medium_image_url`,IF(`products`.`product_image_id` = 0, NULL, CONCAT( "/product-images/", `products`.`id`, "/thumb/" )) AS `thumb_image_url`,CASE WHEN `group_favourite`.`id` IS NOT NULL THEN 1 ELSE 0 END AS `group_favourite` ,CASE WHEN `product_favourite`.`org_id` != 0 AND `product_favourite`.`org_id` IS NOT NULL THEN 1 ELSE 0 END AS `favourite` ,CASE WHEN `product_favourite`.`org_id` != 0 AND `product_favourite`.`org_id` IS NOT NULL THEN 1 ELSE 0 END AS `favourite`, `product_favourite`.`needs_authorisation`, IF(`customer_personal_favourite`.`id` IS NOT NULL,1,0) AS `is_personal_favourite` FROM `products` LEFT JOIN `vat_rate` ON `products`.`vat_rate_id` = `vat_rate`.`id` LEFT JOIN `product_images` ON `products`.`product_image_id` = `product_images`.`id` LEFT JOIN `product_docs` ON `products`.`product_doc_id` = `product_docs`.`id` LEFT JOIN `orgs` AS `suppliers` ON `products`.`supplier_id` = `suppliers`.`id` LEFT JOIN `commission` ON `suppliers`.`id` = `commission`.`org_id` AND `commission`.`status` = 1 AND DATE(`commission`.`start_date`) <= DATE('2020-07-15') AND ( DATE(`commission`.`end_date`) >= DATE('2020-07-15') OR `commission`.`end_date` = '0000-00-00' ) LEFT JOIN `categories` AS `categories` ON `products`.`category_id` = `categories`.`id` LEFT JOIN `categories` AS `sub_categories` ON `products`.`sub_category_id` = `sub_categories`.`id` LEFT JOIN `product_attribute_product` ON `product_attribute_product`.`product_id` = `products`.`id` LEFT JOIN `product_alternative` ON `product_alternative`.`product_original_id` = `products`.`original_id` JOIN `orgs` AS `customer` ON `customer`.`id` IN (320) LEFT JOIN `product_supplier_account` ON `product_supplier_account`.`product_id` = `products`.`id` JOIN `supplier_account` AS `default_supplier_account` ON `default_supplier_account`.`supplier_id` = `products`.`supplier_id` AND `default_supplier_account`.`is_default` = 1 AND `default_supplier_account`.`is_deleted` = 0 JOIN `customer_to_supplier` ON `customer_to_supplier`.`supplier_id` = `products`.`supplier_id` AND `customer_to_supplier`.`customer_id` IN (`customer`.`id`) AND `customer_to_supplier`.`allow_access` = 1 AND `customer_to_supplier`.`status` = 1 JOIN `customer_to_supplier_account` ON `customer_to_supplier_account`.`customer_to_supplier_id` = `customer_to_supplier`.`id` AND ( `customer_to_supplier_account`.`supplier_account_id` = `product_supplier_account`.`supplier_account_id` OR `customer_to_supplier_account`.`supplier_account_id` = `default_supplier_account`.`id` ) AND `customer_to_supplier_account`.`allow_access` = 1 AND `customer_to_supplier_account`.`is_deleted` = 0 LEFT JOIN `statement_products_categories` ON `statement_products_categories`.`product_id` = `products`.`original_id` LEFT JOIN `product_favourite` ON `product_favourite`.`product_id` = `products`.`original_id` AND `product_favourite`.`org_id` IN (`customer`.`id`) LEFT JOIN `group_favourite` AS `group_favourite` ON `products`.`original_id` = `group_favourite`.`product_id` AND `group_favourite`.`group_id` IN (37) LEFT JOIN `budget` AS `category_budget` ON `category_budget`.`org_id` IN (`customer`.`id`) AND `category_budget`.`budget_type_id` = 1 AND `category_budget`.`category_id` = `products`.`category_id` AND `category_budget`.`start_date` <= CURDATE() AND ( `category_budget`.`final_date` >= CURDATE() OR `category_budget`.`final_date` IS NULL ) LEFT JOIN `budget` AS `supplier_budget` ON `supplier_budget`.`org_id` IN (`customer`.`id`) AND `supplier_budget`.`budget_type_id` = 3 AND `supplier_budget`.`supplier_id` = `products`.`supplier_id` AND `supplier_budget`.`start_date` <= CURDATE() AND ( `supplier_budget`.`final_date` >= CURDATE() OR `supplier_budget`.`final_date` IS NULL ) LEFT JOIN `budget` AS `product_budget` ON `product_budget`.`org_id` IN (`customer`.`id`) AND `product_budget`.`budget_type_id` = 2 AND `product_budget`.`start_date` <= CURDATE() AND ( `product_budget`.`final_date` >= CURDATE() OR `product_budget`.`final_date` IS NULL ) LEFT JOIN `customer_personal_favourite` ON `customer_personal_favourite`.`org_id` = `customer`.`id` AND `customer_personal_favourite`.`product_original_id` = `products`.`original_id` WHERE 0 OR (1 AND `products`.`status` = 1 AND (( `products`.`parent_id` = 0 OR `products`.`parent_id` IS NULL ) OR ( `product_favourite`.`org_id` IS NOT NULL AND `product_favourite`.`org_id` != 0 ) ) AND `products`.`id` = `products`.`current_id` ) GROUP BY `products`.`id` ORDER BY `favourite` DESC, `active_date` ASC, `products`.`code` ASC, `products`.`name` ASC, `products`.`parent_id` ASC LIMIT 0, 20
I had once faced this issue and I decided at that time to write an article on this and I have compiled from different sources and optimized my query if someone wishes to see how I achieved it. Please have a look at this. https://junaidtechblog.wordpress.com/2019/09/04/optimize-sql-query-groupby-having/
AND DATE(`commission`.`start_date`) <= DATE('2020-07-15') AND ( DATE(`commission`.`end_date`) >= DATE('2020-07-15') OR `commission`.`end_date` = '0000-00-00'
has several problems:
OR
is hard to optimize. Find some way to avoid it — such as by removing such rows, or using an end date that is far in the future, not past.- Don’t hide a column in a function call.
- Let’s see
SHOW CREATE TABLE
. - You don’t need
DATE()
around the literal ‘2020-07-15’.
This, on the other hand, is OK: product_budget
.start_date
<= CURDATE()
ORDER BY `favourite` DESC, `active_date` ASC,
Unless you have MySQL 8.0, no index can handle an ORDER BY
that mixes DESC and ASC. (It may not be possible to use an index for other reasons.)
WHERE 0 OR (1 AND ...
Dynamically build the query, rather than using kludges like these.
LEFT JOIN ... ON ... foo = 1
ON
should state how the tables are related. foo = 1
feels more like something that should be in the WHERE
clause (for filtering). Moving that between ON
and WHERE
will probably change the set of rows produced. But I am questioning whether that should be JOIN
(instead of LEFT JOIN
) or the test should be in WHERE
, or whether it is actually valid the way you wrote it.
Please provide EXPLAIN SELECT ...
The lines with the largest "rows" are likely to be the tables to focus on.
GROUP BY `products`.`id`
is likely to lead to mangled output. Read about "only_full_group_by".
That leads to a potential speedup.
-
Do the minimal amount of work to get a list of
id
to satisfy theORDER BY
andLIMIT
. -
Then
JOIN
to all the other tables, including back toproducts
, to get the other columns. This is likely to speed things up and get rid of the above bug. -
Where practical change
LEFT JOIN
into a subquery:SELECT ... `vat_rate`.`rate` AS `vat_rate`, ... LEFT JOIN `vat_rate` ON `products`.`vat_rate_id` = `vat_rate`.`id` ...
–>
SELECT ... ( SELECT `rate` FROM vat_rate WHERE id = `products`.`vat_rate_id` ) AS `vat_rate`, ... ...
Get most of that done, then come back for more suggestions.