MySQL: query with two many to many relations and duplicates, with double intermediate table
This question is about selecting data across many-to-many relations in MySQL. Is related to another two questions, but with some differences:
- MySQL: query with two many to many relations and duplicates
- MySQL: query with two many to many relations and duplicates, with full data from subqueries
Those questions used a simple mockup database with simple many to many relations:
article article_author author article_tag tag
Now I will introduce next level of complexity. We want each author to be able to tag each of their articles. Thus, we will connect tags
to the intermediate table article_author
instead of to author directly.
article article_author author article_author_tag tag
Here is in MySQL:
CREATE TABLE `article` ( `id` int NOT NULL AUTO_INCREMENT, `name` varchar(255) NOT NULL, PRIMARY KEY (`id`) ); CREATE TABLE `author` ( `id` INT NOT NULL, `name` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `name` (`name`) ); CREATE TABLE `tag` ( `id` int NOT NULL AUTO_INCREMENT, `name` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ); CREATE TABLE `article_author` ( `id` int NOT NULL AUTO_INCREMENT, `author_id` INT NOT NULL, `article_id` int NOT NULL, `createdAt` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (`id`), UNIQUE KEY `unique_index` (`author_id`,`article_id`), KEY `fk_article_author_author1_idx` (`author_id`), KEY `fk_article_author_article1_idx` (`article_id`), CONSTRAINT `fk_article_author_article1` FOREIGN KEY (`article_id`) REFERENCES `article` (`id`), CONSTRAINT `fk_article_author_author1` FOREIGN KEY (`author_id`) REFERENCES `author` (`id`) ); CREATE TABLE `article_author_tag` ( `article_author_id` int NOT NULL, `tag_id` int NOT NULL, PRIMARY KEY (`article_author_id`,`tag_id`), KEY `fk_article_author_tag_article_author1_idx` (`article_author_id`), KEY `fk_article_author_tag_tag1_idx` (`tag_id`), CONSTRAINT `fk_article_author_tag_article_author1` FOREIGN KEY (`article_author_id`) REFERENCES `article_author` (`id`), CONSTRAINT `fk_article_author_tag_tag1` FOREIGN KEY (`tag_id`) REFERENCES `tag` (`id`) ); INSERT INTO article (id, name) VALUES (1, 'first article'), (2, 'second article'); INSERT INTO `author` (id, name) VALUES (1, 'first author'), (2, 'second author'); INSERT INTO tag (id, name) VALUES (1, 'first tag'), (2, 'second tag'); INSERT INTO article_author (author_id, article_id) VALUES (1, 1), (2, 1); INSERT INTO article_author_tag (article_author_id, tag_id) VALUES (1, 1), (1, 2), (2, 1), (2, 2);
And now, I want just to select the tags that authors of an article used to tag it, as a JSON array; but I can’t get rid of duplicates:
SELECT JSON_ARRAYAGG(tag.id) FROM article_author JOIN article_author_tag ON article_author_tag.article_author_id = article_author.id JOIN tag ON article_author_tag.tag_id = tag.id WHERE article_author.article_id = 1;
Here it is in a db<>fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=253f30ecd2f87b06c3894ef02b2ee35d
Any idea how can I get rid of them?
Edit: I can do it with CONCAT and GROUP_CONCAT, and then casting to JSON. But it looks quite hacky:
SELECT CAST(CONCAT('[', GROUP_CONCAT(DISTINCT tag.id SEPARATOR ','), ']') AS JSON) AS tags FROM article_author JOIN article_author_tag ON article_author_tag.article_author_id = article_author.id JOIN tag ON article_author_tag.tag_id = tag.id WHERE article_author.article_id = 1;
Here it is in a db<>fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=20087a9036acb00637be8d2f58747ba5
Any other idea will be welcome!
There is no distinct
functionality for json yet (something like JSON_ARRAYAGG(distinct tag.id)
), but there is a common workaround for it:
SELECT JSON_EXTRACT(JSON_OBJECTAGG(tag.id,tag.id),"$.*") FROM article_author JOIN article_author_tag ON article_author_tag.article_author_id = article_author.id JOIN tag ON article_author_tag.tag_id = tag.id WHERE article_author.article_id = 1;
JSON_OBJECTAGG
works as an implict distinct, because json tags are distinct by definition, so adding {"1": 1}
twice results in just one of those remaining. Afterwards, you JSON_EXTRACT
just the values to get the format you intended (e.g. without the artificially added tags).
Another method would be to feed the json function with the already correct, distinct data:
SELECT JSON_ARRAYAGG(id) FROM ( SELECT distinct tag.id FROM article_author JOIN article_author_tag ON article_author_tag.article_author_id = article_author.id JOIN tag ON article_author_tag.tag_id = tag.id WHERE article_author.article_id = 1 ) subquery;
You first prepare the data the way you want it (e.g. the distinct tag-ids), then use JSON_ARRAYAGG
to format your output.