MySQL query with many joins
I have query with many joins and I’m searching for optimization for it. It’s about computers: For examples I have: Lenovo 8gbRAM 1TB core i5 ips etc. (all these after brand name are attributes) I have configuration where, I want to change attribute 8gbRAM to 16gbRAM and I have to search for other item with all these attributes and 16gbRAM
Two tables:
**st_item** - id - name ... **st_item_specification_attribute** - id - st_item_id - attribute_id - attribute_value_id ...
My problem is that my item has 15 attributes. When I tested with lower number of attributes I use this structure of query and it works, but now system has 85k items and over 1kk item attributes This is the query:
SELECT `st_item`.id FROM `st_item` LEFT JOIN `st_item_specification_attribute` `sisa_36590` ON st_item.id = sisa_36590.item_id AND sisa_36590.attribute_id = 365 LEFT JOIN `st_item_specification_attribute` `sisa_367910` ON st_item.id = sisa_367910.item_id AND sisa_367910.attribute_id = 367 LEFT JOIN `st_item_specification_attribute` `sisa_374641` ON st_item.id = sisa_374641.item_id AND sisa_374641.attribute_id = 374 LEFT JOIN `st_item_specification_attribute` `sisa_378366` ON st_item.id = sisa_378366.item_id AND sisa_378366.attribute_id = 378 LEFT JOIN `st_item_specification_attribute` `sisa_382500` ON st_item.id = sisa_382500.item_id AND sisa_382500.attribute_id = 382 LEFT JOIN `st_item_specification_attribute` `sisa_372134` ON st_item.id = sisa_372134.item_id AND sisa_372134.attribute_id = 372 LEFT JOIN `st_item_specification_attribute` `sisa_41268` ON st_item.id = sisa_41268.item_id AND sisa_41268.attribute_id = 412 LEFT JOIN `st_item_specification_attribute` `sisa_413368` ON st_item.id = sisa_413368.item_id AND sisa_413368.attribute_id = 413 LEFT JOIN `st_item_specification_attribute` `sisa_414929` ON st_item.id = sisa_414929.item_id AND sisa_414929.attribute_id = 414 LEFT JOIN `st_item_specification_attribute` `sisa_418496` ON st_item.id = sisa_418496.item_id AND sisa_418496.attribute_id = 418 LEFT JOIN `st_item_specification_attribute` `sisa_385748` ON st_item.id = sisa_385748.item_id AND sisa_385748.attribute_id = 385 LEFT JOIN `st_item_specification_attribute` `sisa_36625` ON st_item.id = sisa_36625.item_id AND sisa_36625.attribute_id = 366 LEFT JOIN `st_item_specification_attribute` `sisa_366355` ON st_item.id = sisa_366355.item_id AND sisa_366355.attribute_id = 366 LEFT JOIN `st_item_specification_attribute` `sisa_366816` ON st_item.id = sisa_366816.item_id AND sisa_366816.attribute_id = 366 LEFT JOIN `st_item_specification_attribute` `sisa_366370` ON st_item.id = sisa_366370.item_id AND sisa_366370.attribute_id = 366 WHERE (`parent_id`=1032) AND (sisa_36590.attribute_value_id = 2230) AND (sisa_367910.attribute_value_id = 2451) AND (sisa_374641.attribute_value_id = 3793) AND (sisa_378366.attribute_value_id = 2955) AND (sisa_382500.attribute_value_id = 3879) AND (sisa_372134.attribute_value_id = 2780) AND (sisa_41268.attribute_value_id = 3363) AND (sisa_413368.attribute_value_id = 3373) AND (sisa_414929.attribute_value_id = 3378) AND (sisa_418496.attribute_value_id = 3844) AND (sisa_385748.attribute_value_id = 3036) AND (sisa_36625.attribute_value_id = 2315) AND (sisa_366355.attribute_value_id = 2408) AND (sisa_366816.attribute_value_id = 2412) AND (sisa_366370.attribute_value_id = 2420)
Query must compare specific pair attribute_id => attribute_value_id
, that’s the reason my "ON clause" to be with item_id
and attribute_id
and specific alias
You can use aggregation:
select i.id from st_item i join st_item_specification_attribute sisa ON sisa.item_id = i.item_id where i.parent_id = 1032 and (sisa.attribute_id, attribute_value_id) in ( (365, 2230), (367, 2451), . . .) group by i.id having count(*) = 15;
You can move your WHERE
conditions into ON
conditions and change LEFT JOIN
to INNER JOIN
.
SELECT `st_item`.id FROM `st_item` JOIN `st_item_specification_attribute` `sisa_36590` ON st_item.id = sisa_36590.item_id AND sisa_36590.attribute_id = 365 AND sisa_36590.attribute_value_id = 2230 JOIN `st_item_specification_attribute` `sisa_367910` ON st_item.id = sisa_367910.item_id AND sisa_367910.attribute_id = 367 AND sisa_367910.attribute_value_id = 2451 ... WHERE `parent_id`=1032
2nd approach
SELECT `st_item`.id FROM `st_item` JOIN `st_item_specification_attribute` `sisa` ON st_item.id = sisa.item_id AND ( (sisa.attribute_id = 365 AND sisa.attribute_value_id = 2230) OR (sisa.attribute_id = 367 AND sisa.attribute_value_id = 2451) ... ) WHERE `parent_id`=1032 GROUP BY `st_item`.id HAVING COUNT(*) = 15
I cannot predict the performance, but I think you could make on subselect out of all that joins
(I presume atribute_id and Attribute_value_id pairs are unique per item_id)
SELECT `st_item`.id FROM `st_item` WHERE (`parent_id`=1032) AND 15 = (SELECT COUNT(*) FROM st_item_specification_attribute attr WHERE `st_item`.id = attr.item_id AND ( attribute_id = 365 AND attribute_value_id = 2230 OR ... )
I would use a UNION ALL approach for this. It’s easy to change, and fairly simple to read. Performance should be pretty good too:
--A CTE so you only have to change parent_id in one place --I believe not all mysql versions support this, though --You could of course just select the parent_id in de UNION ALL and -- use a single WHERE in the outer query. WITH st_item_id AS ( SELECT id , attribute_id , attribute_value_id FROM st_item WHERE parent_id = 1032 ) SELECT UA.id FROM ( SELECT st_item.id FROM st_item_id INNER JOIN st_item_specification_attribute sisa_36590 ON st_item_id.id = sisa_36590.item_id AND sisa_36590.attribute_id = 365 AND sisa_36590.attribute_value_id = 2230 UNION ALL SELECT st_item.id FROM st_item_id INNER JOIN st_item_specification_attribute sisa_367910 ON st_item_id.id = sisa_367910.item_id AND sisa_367910.attribute_id = 367 AND sisa_367910.attribute_value_id = 2451 ) UA