MySQL: query with two many to many relations and duplicates, with double intermediate table

This question is about selecting data across many-to-many relations in MySQL. Is related to another two questions, but with some differences:

Those questions used a simple mockup database with simple many to many relations:

article article_author author article_tag tag 

Now I will introduce next level of complexity. We want each author to be able to tag each of their articles. Thus, we will connect tags to the intermediate table article_author instead of to author directly.

article article_author author article_author_tag tag 

Here is in MySQL:

CREATE TABLE `article` (   `id` int NOT NULL AUTO_INCREMENT,   `name` varchar(255) NOT NULL,   PRIMARY KEY (`id`) );  CREATE TABLE `author` (   `id` INT NOT NULL,   `name` varchar(255) DEFAULT NULL,   PRIMARY KEY (`id`),   UNIQUE KEY `name` (`name`) );  CREATE TABLE `tag` (   `id` int NOT NULL AUTO_INCREMENT,   `name` varchar(255) DEFAULT NULL,   PRIMARY KEY (`id`) );  CREATE TABLE `article_author` (   `id` int NOT NULL AUTO_INCREMENT,   `author_id` INT NOT NULL,   `article_id` int NOT NULL,   `createdAt` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,   PRIMARY KEY (`id`),   UNIQUE KEY `unique_index` (`author_id`,`article_id`),   KEY `fk_article_author_author1_idx` (`author_id`),   KEY `fk_article_author_article1_idx` (`article_id`),   CONSTRAINT `fk_article_author_article1` FOREIGN KEY (`article_id`) REFERENCES `article` (`id`),   CONSTRAINT `fk_article_author_author1` FOREIGN KEY (`author_id`) REFERENCES `author` (`id`) );  CREATE TABLE `article_author_tag` (   `article_author_id` int NOT NULL,   `tag_id` int NOT NULL,   PRIMARY KEY (`article_author_id`,`tag_id`),   KEY `fk_article_author_tag_article_author1_idx` (`article_author_id`),   KEY `fk_article_author_tag_tag1_idx` (`tag_id`),   CONSTRAINT `fk_article_author_tag_article_author1` FOREIGN KEY (`article_author_id`) REFERENCES `article_author` (`id`),   CONSTRAINT `fk_article_author_tag_tag1` FOREIGN KEY (`tag_id`) REFERENCES `tag` (`id`) );    INSERT INTO article (id, name) VALUES (1, 'first article'), (2, 'second article'); INSERT INTO `author` (id, name) VALUES (1, 'first author'), (2, 'second author'); INSERT INTO tag (id, name) VALUES (1, 'first tag'), (2, 'second tag'); INSERT INTO article_author (author_id, article_id) VALUES (1, 1), (2, 1); INSERT INTO article_author_tag (article_author_id, tag_id) VALUES (1, 1), (1, 2), (2, 1), (2, 2); 

And now, I want just to select the tags that authors of an article used to tag it, as a JSON array; but I can’t get rid of duplicates:

SELECT   JSON_ARRAYAGG(tag.id) FROM article_author JOIN article_author_tag ON article_author_tag.article_author_id = article_author.id JOIN tag ON article_author_tag.tag_id = tag.id WHERE article_author.article_id = 1; 

Here it is in a db<>fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=253f30ecd2f87b06c3894ef02b2ee35d

Any idea how can I get rid of them?


Edit: I can do it with CONCAT and GROUP_CONCAT, and then casting to JSON. But it looks quite hacky:

SELECT    CAST(CONCAT('[', GROUP_CONCAT(DISTINCT tag.id SEPARATOR ','), ']') AS JSON) AS tags FROM article_author JOIN article_author_tag ON article_author_tag.article_author_id = article_author.id JOIN tag ON article_author_tag.tag_id = tag.id WHERE article_author.article_id = 1; 

Here it is in a db<>fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=20087a9036acb00637be8d2f58747ba5

Any other idea will be welcome!

Add Comment
1 Answer(s)

There is no distinct functionality for json yet (something like JSON_ARRAYAGG(distinct tag.id)), but there is a common workaround for it:

SELECT JSON_EXTRACT(JSON_OBJECTAGG(tag.id,tag.id),"$.*") FROM article_author JOIN article_author_tag ON article_author_tag.article_author_id = article_author.id JOIN tag ON article_author_tag.tag_id = tag.id WHERE article_author.article_id = 1; 

JSON_OBJECTAGG works as an implict distinct, because json tags are distinct by definition, so adding {"1": 1} twice results in just one of those remaining. Afterwards, you JSON_EXTRACT just the values to get the format you intended (e.g. without the artificially added tags).

Another method would be to feed the json function with the already correct, distinct data:

SELECT JSON_ARRAYAGG(id)  FROM (   SELECT distinct tag.id   FROM article_author   JOIN article_author_tag    ON article_author_tag.article_author_id = article_author.id   JOIN tag ON article_author_tag.tag_id = tag.id   WHERE article_author.article_id = 1 ) subquery;  

You first prepare the data the way you want it (e.g. the distinct tag-ids), then use JSON_ARRAYAGG to format your output.

Answered on July 16, 2020.
Add Comment

Your Answer

By posting your answer, you agree to the privacy policy and terms of service.