How to create a MySQL hierarchical recursive query
I have a MySQL table which is as follows:
id | name | parent_id 19 | category1 | 0 20 | category2 | 19 21 | category3 | 20 22 | category4 | 21 ......
Now, I want to have a single MySQL query to which I simply supply the id [for instance say ‘id = 19’] then I should get all its child ids [i.e. result should have ids ‘20,21,22’]…. Also, the hierarchy of the children is not known it can vary….
Also, I already have the solution using the for loop….. Let me know how to achieve the same using a single MySQL query if possible.
For MySQL 8+: use the recursive with
syntax.
For MySQL 5.x: use inline variables, path IDs, or self-joins.
MySQL 8+
with recursive cte (id, name, parent_id) as ( select id, name, parent_id from products where parent_id = 19 union all select p.id, p.name, p.parent_id from products p inner join cte on p.parent_id = cte.id ) select * from cte;
The value specified in parent_id = 19
should be set to the id
of the parent you want to select all the descendants of.
MySQL 5.x
For MySQL versions that do not support Common Table Expressions (up to version 5.7), you would achieve this with the following query:
select id, name, parent_id from (select * from products order by parent_id, id) products_sorted, (select @pv := '19') initialisation where find_in_set(parent_id, @pv) and length(@pv := concat(@pv, ',', id))
Here is a fiddle.
Here, the value specified in @pv := '19'
should be set to the id
of the parent you want to select all the descendants of.
This will work also if a parent has multiple children. However, it is required that each record fulfills the condition parent_id < id
, otherwise the results will not be complete.
Variable assignments inside a query
This query uses specific MySQL syntax: variables are assigned and modified during its execution. Some assumptions are made about the order of execution:
- The
from
clause is evaluated first. So that is where@pv
gets initialised. - The
where
clause is evaluated for each record in the order of retrieval from thefrom
aliases. So this is where a condition is put to only include records for which the parent was already identified as being in the descendant tree (all descendants of the primary parent are progressively added to@pv
). - The conditions in this
where
clause are evaluated in order, and the evaluation is interrupted once the total outcome is certain. Therefore the second condition must be in second place, as it adds theid
to the parent list, and this should only happen if theid
passes the first condition. Thelength
function is only called to make sure this condition is always true, even if thepv
string would for some reason yield a falsy value.
All in all, one may find these assumptions too risky to rely on. The documentation warns:
you might get the results you expect, but this is not guaranteed […] the order of evaluation for expressions involving user variables is undefined.
So even though it works consistently with the above query, the evaluation order may still change, for instance when you add conditions or use this query as a view or sub-query in a larger query. It is a “feature” that will be removed in a future MySQL release:
Previous releases of MySQL made it possible to assign a value to a user variable in statements other than
SET
. This functionality is supported in MySQL 8.0 for backward compatibility but is subject to removal in a future release of MySQL.
As stated above, from MySQL 8.0 onward you should use the recursive with
syntax.
Efficiency
For very large data sets this solution might get slow, as the find_in_set
operation is not the most ideal way to find a number in a list, certainly not in a list that reaches a size in the same order of magnitude as the number of records returned.
Alternative 1: with recursive
, connect by
More and more databases implement the SQL:1999 ISO standard WITH [RECURSIVE]
syntax for recursive queries (e.g. Postgres 8.4+, SQL Server 2005+, DB2, Oracle 11gR2+, SQLite 3.8.4+, Firebird 2.1+, H2, HyperSQL 2.1.0+, Teradata, MariaDB 10.2.2+). And as of version 8.0, also MySQL supports it. See the top of this answer for the syntax to use.
Some databases have an alternative, non-standard syntax for hierarchical look-ups, such as the CONNECT BY
clause available on Oracle, DB2, Informix, CUBRID and other databases.
MySQL version 5.7 does not offer such a feature. When your database engine provides this syntax or you can migrate to one that does, then that is certainly the best option to go for. If not, then also consider the following alternatives.
Alternative 2: Path-style Identifiers
Things become a lot easier if you would assign id
values that contain the hierarchical information: a path. For example, in your case this could look like this:
ID | NAME 19 | category1 19/1 | category2 19/1/1 | category3 19/1/1/1 | category4
Then your select
would look like this:
select id, name from products where id like '19/%'
Alternative 3: Repeated Self-joins
If you know an upper limit for how deep your hierarchy tree can become, you can use a standard sql
query like this:
select p6.parent_id as parent6_id, p5.parent_id as parent5_id, p4.parent_id as parent4_id, p3.parent_id as parent3_id, p2.parent_id as parent2_id, p1.parent_id as parent_id, p1.id as product_id, p1.name from products p1 left join products p2 on p2.id = p1.parent_id left join products p3 on p3.id = p2.parent_id left join products p4 on p4.id = p3.parent_id left join products p5 on p5.id = p4.parent_id left join products p6 on p6.id = p5.parent_id where 19 in (p1.parent_id, p2.parent_id, p3.parent_id, p4.parent_id, p5.parent_id, p6.parent_id) order by 1, 2, 3, 4, 5, 6, 7;
See this fiddle
The where
condition specifies which parent you want to retrieve the descendants of. You can extend this query with more levels as needed.
From the blog Managing Hierarchical Data in MySQL
Table structure
+-------------+----------------------+--------+ | category_id | name | parent | +-------------+----------------------+--------+ | 1 | ELECTRONICS | NULL | | 2 | TELEVISIONS | 1 | | 3 | TUBE | 2 | | 4 | LCD | 2 | | 5 | PLASMA | 2 | | 6 | PORTABLE ELECTRONICS | 1 | | 7 | MP3 PLAYERS | 6 | | 8 | FLASH | 7 | | 9 | CD PLAYERS | 6 | | 10 | 2 WAY RADIOS | 6 | +-------------+----------------------+--------+
Query:
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4 FROM category AS t1 LEFT JOIN category AS t2 ON t2.parent = t1.category_id LEFT JOIN category AS t3 ON t3.parent = t2.category_id LEFT JOIN category AS t4 ON t4.parent = t3.category_id WHERE t1.name = 'ELECTRONICS';
Output
+-------------+----------------------+--------------+-------+ | lev1 | lev2 | lev3 | lev4 | +-------------+----------------------+--------------+-------+ | ELECTRONICS | TELEVISIONS | TUBE | NULL | | ELECTRONICS | TELEVISIONS | LCD | NULL | | ELECTRONICS | TELEVISIONS | PLASMA | NULL | | ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH | | ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS | NULL | | ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL | +-------------+----------------------+--------------+-------+
Most users at one time or another have dealt with hierarchical data in a SQL database and no doubt learned that the management of hierarchical data is not what a relational database is intended for. The tables of a relational database are not hierarchical (like XML), but are simply a flat list. Hierarchical data has a parent-child relationship that is not naturally represented in a relational database table. Read more
Refer the blog for more details.
EDIT:
select @pv:=category_id as category_id, name, parent from category join (select @pv:=19)tmp where parent=@pv
Output:
category_id name parent 19 category1 0 20 category2 19 21 category3 20 22 category4 21
Try these:
Table definition:
DROP TABLE IF EXISTS category; CREATE TABLE category ( id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(20), parent_id INT, CONSTRAINT fk_category_parent FOREIGN KEY (parent_id) REFERENCES category (id) ) engine=innodb;
Experimental rows:
INSERT INTO category VALUES (19, 'category1', NULL), (20, 'category2', 19), (21, 'category3', 20), (22, 'category4', 21), (23, 'categoryA', 19), (24, 'categoryB', 23), (25, 'categoryC', 23), (26, 'categoryD', 24);
Recursive Stored procedure:
DROP PROCEDURE IF EXISTS getpath; DELIMITER $$ CREATE PROCEDURE getpath(IN cat_id INT, OUT path TEXT) BEGIN DECLARE catname VARCHAR(20); DECLARE temppath TEXT; DECLARE tempparent INT; SET max_sp_recursion_depth = 255; SELECT name, parent_id FROM category WHERE id=cat_id INTO catname, tempparent; IF tempparent IS NULL THEN SET path = catname; ELSE CALL getpath(tempparent, temppath); SET path = CONCAT(temppath, '/', catname); END IF; END$$ DELIMITER ;
Wrapper function for the stored procedure:
DROP FUNCTION IF EXISTS getpath; DELIMITER $$ CREATE FUNCTION getpath(cat_id INT) RETURNS TEXT DETERMINISTIC BEGIN DECLARE res TEXT; CALL getpath(cat_id, res); RETURN res; END$$ DELIMITER ;
Select example:
SELECT id, name, getpath(id) AS path FROM category;
Output:
+----+-----------+-----------------------------------------+ | id | name | path | +----+-----------+-----------------------------------------+ | 19 | category1 | category1 | | 20 | category2 | category1/category2 | | 21 | category3 | category1/category2/category3 | | 22 | category4 | category1/category2/category3/category4 | | 23 | categoryA | category1/categoryA | | 24 | categoryB | category1/categoryA/categoryB | | 25 | categoryC | category1/categoryA/categoryC | | 26 | categoryD | category1/categoryA/categoryB/categoryD | +----+-----------+-----------------------------------------+
Filtering rows with certain path:
SELECT id, name, getpath(id) AS path FROM category HAVING path LIKE 'category1/category2%';
Output:
+----+-----------+-----------------------------------------+ | id | name | path | +----+-----------+-----------------------------------------+ | 20 | category2 | category1/category2 | | 21 | category3 | category1/category2/category3 | | 22 | category4 | category1/category2/category3/category4 | +----+-----------+-----------------------------------------+
The best approach I’ve come up with is
- Use lineage to store\sort\trace trees. That’s more than enough, and works thousands times faster for reading than any other approach. It also allows to stay on that pattern even if DB will change(as ANY db will allow that pattern to be used)
- Use function that determines lineage for specific ID.
- Use it as you wish (in selects, or on CUD operations, or even by jobs).
Lineage approach descr. can be found wherever, for example Here or here. As of function – that is what enspired me.
In the end – got more-or-less simple, relatively fast, and SIMPLE solution.
Function’s body
-- -------------------------------------------------------------------------------- -- Routine DDL -- Note: comments before and after the routine body will not be stored by the server -- -------------------------------------------------------------------------------- DELIMITER $$ CREATE DEFINER=`root`@`localhost` FUNCTION `get_lineage`(the_id INT) RETURNS text CHARSET utf8 READS SQL DATA BEGIN DECLARE v_rec INT DEFAULT 0; DECLARE done INT DEFAULT FALSE; DECLARE v_res text DEFAULT ''; DECLARE v_papa int; DECLARE v_papa_papa int DEFAULT -1; DECLARE csr CURSOR FOR select _id,parent_id -- @n:=@n+1 as rownum,T1.* from (SELECT @r AS _id, (SELECT @r := table_parent_id FROM table WHERE table_id = _id) AS parent_id, @l := @l + 1 AS lvl FROM (SELECT @r := the_id, @l := 0,@n:=0) vars, table m WHERE @r <> 0 ) T1 where T1.parent_id is not null ORDER BY T1.lvl DESC; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE; open csr; read_loop: LOOP fetch csr into v_papa,v_papa_papa; SET v_rec = v_rec+1; IF done THEN LEAVE read_loop; END IF; -- add first IF v_rec = 1 THEN SET v_res = v_papa_papa; END IF; SET v_res = CONCAT(v_res,'-',v_papa); END LOOP; close csr; return v_res; END
And then you just
select get_lineage(the_id)
Hope it helps somebody 🙂
Did the same thing for another quetion here
Mysql select recursive get all child with multiple level
The query will be :
SELECT GROUP_CONCAT(lv SEPARATOR ',') FROM ( SELECT @pv:=( SELECT GROUP_CONCAT(id SEPARATOR ',') FROM table WHERE parent_id IN (@pv) ) AS lv FROM table JOIN (SELECT @pv:=1)tmp WHERE parent_id IN (@pv) ) a;
If you need quick read speed, the best option is to use a closure table. A closure table contains a row for each ancestor/descendant pair. So in your example, the closure table would look like
ancestor | descendant | depth 0 | 0 | 0 0 | 19 | 1 0 | 20 | 2 0 | 21 | 3 0 | 22 | 4 19 | 19 | 0 19 | 20 | 1 19 | 21 | 3 19 | 22 | 4 20 | 20 | 0 20 | 21 | 1 20 | 22 | 2 21 | 21 | 0 21 | 22 | 1 22 | 22 | 0
Once you have this table, hierarchical queries become very easy and fast. To get all the descendants of category 20:
SELECT cat.* FROM categories_closure AS cl INNER JOIN categories AS cat ON cat.id = cl.descendant WHERE cl.ancestor = 20 AND cl.depth > 0
Of course, there is a big downside whenever you use denormalized data like this. You need to maintain the closure table alongside your categories table. The best way is probably to use triggers, but it is somewhat complex to correctly track inserts/updates/deletes for closure tables. As with anything, you need to look at your requirements and decide what approach is best for you.
Edit: See the question What are the options for storing hierarchical data in a relational database? for more options. There are different optimal solutions for different situations.
Simple query to list child’s of first recursion:
select @pv:=id as id, name, parent_id from products join (select @pv:=19)tmp where parent_id=@pv
Result:
id name parent_id 20 category2 19 21 category3 20 22 category4 21 26 category24 22
… with left join:
select @pv:=p1.id as id , p2.name as parent_name , p1.name name , p1.parent_id from products p1 join (select @pv:=19)tmp left join products p2 on p2.id=p1.parent_id -- optional join to get parent name where p1.parent_id=@pv
The solution of @tincot to list all child’s:
select id, name, parent_id from (select * from products order by parent_id, id) products_sorted, (select @pv := '19') initialisation where find_in_set(parent_id, @pv) > 0 and @pv := concat(@pv, ',', id)
Test it online with Sql Fiddle and see all results.
You can do it like this in other databases quite easily with a recursive query (YMMV on performance).
The other way to do it is to store two extra bits of data, a left and right value. The left and right value are derived from a pre-order traversal of the tree structure you’re representing.
This is know as Modified Preorder Tree Traversal and lets you run a simple query to get all parent values at once. It also goes by the name “nested set”.
Just use BlueM/tree php class for make tree of a self-relation table in mysql.
Tree and Tree\Node are PHP classes for handling data that is structured hierarchically using parent ID references. A typical example is a table in a relational database where each record’s “parent” field references the primary key of another record. Of course, Tree cannot only use data originating from a database, but anything: you supply the data, and Tree uses it, regardless of where the data came from and how it was processed. read more
Here is an example of using BlueM/tree:
<?php require '/path/to/vendor/autoload.php'; $db = new PDO(...); // Set up your database connection $stm = $db->query('SELECT id, parent, title FROM tablename ORDER BY title'); $records = $stm->fetchAll(PDO::FETCH_ASSOC); $tree = new BlueM\Tree($records); ...
It’s a category table.
SELECT id, NAME, parent_category FROM (SELECT * FROM category ORDER BY parent_category, id) products_sorted, (SELECT @pv := '2') initialisation WHERE FIND_IN_SET(parent_category, @pv) > 0 AND @pv := CONCAT(@pv, ',', id)