The reason why I'm writing this is because I have to work on such a task with PostgreSQL, and I already know that it will be quite a tough piece of work, which could be solved with MySQL within seconds. So to everybody, who doesn't still know the power of the GROUP_CONCAT aggrigate function, here's a little example, which I think is self-explaining:
mysql> CREATE TABLE group_concat_test (
-> id INT UNSIGNED NOT NULL auto_increment PRIMARY KEY,
-> _key INT UNSIGNED NOT NULL,
-> _value CHAR(1) NOT NULL);
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO group_concat_test
-> (_key, _value) VALUES
-> (1, 'b'),
-> (1, 'c'),
-> (2, 'a'),
-> (2, 'd'),
-> (2, 'c'),
-> (3, 'b'),
-> (3, 'e');
Query OK, 7 rows affected (0.01 sec)
Records: 7 Duplicates: 0 Warnings: 0
mysql> SELECT _key,
-> COUNT(*) as cnt,
-> GROUP_CONCAT(_value ORDER BY _value SEPARATOR ', ')
-> as value_list
-> FROM group_concat_test
-> GROUP BY _key
-> ORDER BY _key;
+------+-----+------------+
| _key | cnt | value_list |
+------+-----+------------+
| 1 | 2 | b, c |
| 2 | 3 | a, c, d |
| 3 | 2 | b, e |
+------+-----+------------+
3 rows in set (0.00 sec)
mysql>
The GROUP_CONCAT function is available since MySQL 4.1, so for everyone still running an older version, this would be one more (of many) reason to update.


10 comments:
I have had GROUP_CONCAT in my bag of tricks for quite a while, and I agree with your enthusiasm.
Be careful with GROUP_CONCAT, though, because it can bite you if you don't take into account its limits.
Try this:
select group_concat(column_name) from information_schema.columns where table_schema='mysql'\G
That will return a partial string and a warning.
show warnings\G *************************** 1. row ***************************
Level: Warning
Code: 1260
Message: 1 line(s) were cut by GROUP_CONCAT()
The reason of the warning is in GROUP_CONCAT maximum length.
select @@group_concat_max_len\G
*************************** 1. row ***************************
@@group_concat_max_len: 1024
If you set @@group_concat_max_len to a large enough value, it will accomodate your data. You have just to keep this in mind.
Giuseppe
As is often the case, PostgreSQL has a more flexible solution with
fewer gotchas, but it takes more work and/or specialized knowlege to
find it.
Let's say you've got your array_accum_test table. Now, do
CREATE AGGREGATE array_accum (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);
per http://www.postgresql.org/docs/current/static/xaggr.html
You only need to do this once.
Next,
SELECT
_key,
count(*) AS "cnt",
array_to_string( /* built-in */
array_accum(_value)
, ', '
)
FROM
array_accum_test
GROUP BY _key
ORDER BY _key;
If you insist on doing this only using built-ins (But why? PostgreSQL
is built to be extensible for a reason!), and you're willing to take a small speed penalty, you can do it like this:
SELECT
t._key,
count(t.*) AS "cnt",
array_to_string( /* built-in */
ARRAY(
SELECT _value
FROM array_accum_test
WHERE _key = t._key
ORDER BY _value /* pwnz0r3d!
Try that with group_concat! */
)
, ', '
)
FROM
array_accum_test AS "t"
GROUP BY t._key
ORDER BY t._key;
Thank you guys for the additional infos :-).
Stop in at irc://irc.freenode.net/postgresql if you like. There are lots of friendly, helpful people there. :)
A litte problem I had with the group_concat function was when selecting integers only I was getting a blob instead of a string to solve this I used a cast to convert the integers first:
SELECT GROUP_CONCAT(CAST(myInt as CHAR)) myInts FROM aTable;
However, I have found that this function actually omits values when aggregating them. For example, I would like to aggregate the sequence of status changes (like "created","in process", "on-hold" etc) for a purchase order. Now I have discovered that sometimes it simply omits values. There seems to be no pattern to which values are lost - it really looks random. Trawling through the bug reports from MySQL has not really helped. Shame really, since th function is just great.
Anonymous, are you sure you're using a LEFT JOIN rather than an INNER JOIN? This could be why some values are omitted.
For David Fetter:
I beg to differ: Your postgres solution is much less versatile. All your Postgres code (including your aggregate function and the two selects) can be easily replaced by:
select group_concat(name order by surname) from persons;
Thanks a lot for sharing this. I've discovered this post after googling for "crosstab"; glad I found it. This will indeed make lots of things easier in the future :)
I like the idea of GROUP_CONCAT(), but unfortunately its implementation is filled with bugs. See the MySQL Changelogs. Garbled text (!!!), mixed charsets, missing values, segfaults, etc. I've been bitten by bugs in GROUP_CONCAT(), and even as I write this the 5.0.32 version that comes in Debian stable still has serious bugs in it.
Post a Comment