Monday, January 02, 2006

Hail to GROUP_CONCAT!

MySQL's GROUP_CONCAT command is a phantastic thing, it can simplify big efforts to a small piece of work that would not even be worth mentioning - if the way that the problem is being solved wouldn't be so nice ;-).

The reason why I'm writing this is because I have to work on such a task with PostgreSQL, and I already know that it will be quite a tough piece of work, which could be solved with MySQL within seconds. So to everybody, who doesn't still know the power of the GROUP_CONCAT aggrigate function, here's a little example, which I think is self-explaining:
mysql> CREATE TABLE group_concat_test (
-> id INT UNSIGNED NOT NULL auto_increment PRIMARY KEY,
-> _key INT UNSIGNED NOT NULL,
-> _value CHAR(1) NOT NULL);
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO group_concat_test
-> (_key, _value) VALUES
-> (1, 'b'),
-> (1, 'c'),
-> (2, 'a'),
-> (2, 'd'),
-> (2, 'c'),
-> (3, 'b'),
-> (3, 'e');
Query OK, 7 rows affected (0.01 sec)
Records: 7 Duplicates: 0 Warnings: 0

mysql> SELECT _key,
-> COUNT(*) as cnt,
-> GROUP_CONCAT(_value ORDER BY _value SEPARATOR ', ')
-> as value_list
-> FROM group_concat_test
-> GROUP BY _key
-> ORDER BY _key;
+------+-----+------------+
| _key | cnt | value_list |
+------+-----+------------+
| 1 | 2 | b, c |
| 2 | 3 | a, c, d |
| 3 | 2 | b, e |
+------+-----+------------+
3 rows in set (0.00 sec)

mysql>

The GROUP_CONCAT function is available since MySQL 4.1, so for everyone still running an older version, this would be one more (of many) reason to update.

10 comments:

gmax said...

I have had GROUP_CONCAT in my bag of tricks for quite a while, and I agree with your enthusiasm.
Be careful with GROUP_CONCAT, though, because it can bite you if you don't take into account its limits.

Try this:

select group_concat(column_name) from information_schema.columns where table_schema='mysql'\G

That will return a partial string and a warning.

show warnings\G *************************** 1. row ***************************
Level: Warning
Code: 1260
Message: 1 line(s) were cut by GROUP_CONCAT()


The reason of the warning is in GROUP_CONCAT maximum length.

select @@group_concat_max_len\G
*************************** 1. row ***************************
@@group_concat_max_len: 1024

If you set @@group_concat_max_len to a large enough value, it will accomodate your data. You have just to keep this in mind.

Giuseppe

David Fetter said...

As is often the case, PostgreSQL has a more flexible solution with
fewer gotchas, but it takes more work and/or specialized knowlege to
find it.

Let's say you've got your array_accum_test table. Now, do
CREATE AGGREGATE array_accum (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);

per http://www.postgresql.org/docs/current/static/xaggr.html

You only need to do this once.

Next,

SELECT
_key,
count(*) AS "cnt",
array_to_string( /* built-in */
array_accum(_value)
, ', '
)
FROM
array_accum_test
GROUP BY _key
ORDER BY _key;

If you insist on doing this only using built-ins (But why? PostgreSQL
is built to be extensible for a reason!), and you're willing to take a small speed penalty, you can do it like this:


SELECT
t._key,
count(t.*) AS "cnt",
array_to_string( /* built-in */
ARRAY(
SELECT _value
FROM array_accum_test
WHERE _key = t._key
ORDER BY _value /* pwnz0r3d!
Try that with group_concat! */
)
, ', '
)
FROM
array_accum_test AS "t"
GROUP BY t._key
ORDER BY t._key;

Markus Popp said...

Thank you guys for the additional infos :-).

David Fetter said...

Stop in at irc://irc.freenode.net/postgresql if you like. There are lots of friendly, helpful people there. :)

Paul Carey said...

A litte problem I had with the group_concat function was when selecting integers only I was getting a blob instead of a string to solve this I used a cast to convert the integers first:

SELECT GROUP_CONCAT(CAST(myInt as CHAR)) myInts FROM aTable;

Anonymous said...

However, I have found that this function actually omits values when aggregating them. For example, I would like to aggregate the sequence of status changes (like "created","in process", "on-hold" etc) for a purchase order. Now I have discovered that sometimes it simply omits values. There seems to be no pattern to which values are lost - it really looks random. Trawling through the bug reports from MySQL has not really helped. Shame really, since th function is just great.

Artem Russakovskii said...

Anonymous, are you sure you're using a LEFT JOIN rather than an INNER JOIN? This could be why some values are omitted.

Tudor Olariu said...

For David Fetter:

I beg to differ: Your postgres solution is much less versatile. All your Postgres code (including your aggregate function and the two selects) can be easily replaced by:

select group_concat(name order by surname) from persons;

GaneshXL said...

Thanks a lot for sharing this. I've discovered this post after googling for "crosstab"; glad I found it. This will indeed make lots of things easier in the future :)

Anonymous said...

I like the idea of GROUP_CONCAT(), but unfortunately its implementation is filled with bugs. See the MySQL Changelogs. Garbled text (!!!), mixed charsets, missing values, segfaults, etc. I've been bitten by bugs in GROUP_CONCAT(), and even as I write this the 5.0.32 version that comes in Debian stable still has serious bugs in it.