counter customizable free hit

Thursday, February 09, 2006

Cartesian Joins are Useful (Honest)

For those that don't know a cartesian join is one that produces a cartesian product. A cartesian product is the result of joining two sets of data in such a way that all rows in one set are joined with all rows in the other. In MySQL this happens when we have two tables in an SQL statement but no join statement. For example...

mysql> select * from emp_dept;
+--------+---------+
| emp_id | dept_id |
+--------+---------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
+--------+---------+
4 rows in set (0.00 sec)

mysql> select * from dept;
+---------+------+
| dept_id | name |
+---------+------+
| 1 | IT |
| 2 | HR |
+---------+------+
2 rows in set (0.00 sec)

Normally these two tables would be joined using a join condition on dept_id so that we return only the rows where a match is found.

mysql> select emp_id, name
from emp_dept e join dept d
on e.dept_id = d.dept_id;
+--------+------+
| emp_id | name |
+--------+------+
| 1 | IT |
| 2 | IT |
| 3 | HR |
| 4 | HR |
+--------+------+
4 rows in set (0.26 sec)

However if we remove the join each row in emp_dept is joined with dept if a match exists or not.

mysql> select emp_id, name from emp_dept, dept;
+--------+------+
| emp_id | name |
+--------+------+
| 1 | IT |
| 1 | HR |
| 2 | IT |
| 2 | HR |
| 3 | IT |
| 3 | HR |
| 4 | IT |
| 4 | HR |
+--------+------+
8 rows in set (0.01 sec)

This is a cartesian product. In most cases this isn't a great idea, I had a problem last year where the assumption was that there was a 1 to 1 relationship in a table join, the introduction of some code broke this rule and it resulted in a select returning 3 million rows where once it only returned 150 or so. The users were not happy when their application took 5 minutes to load when previously it took seconds.

Result sets can get large quickly because the amount of data in the select is the number of rows in Table A * the number of rows in Table B, if you have more than two tables this multiplies at an alarming rate.

But there are times when we can use this to our advantage. I'm working on a site which needs some combinations stored for a game engine, there are 6561 different combinations and I didn't fancy having to type them in by hand. So what I did was create a dummy table with all values I needed in each position, I then used a cartesian join by having a copy of the table for as many columns as I needed.

So lets take a simplified example, if we want all the combinations of 1,2 and 3 using 2 columns we can do this like so.

mysql> select * from dummy_table;
+------+
| nums |
+------+
| 1 |
| 2 |
| 3 |
+------+
3 rows in set (0.00 sec)

mysql> select concat(d1.nums,d2.nums)
from dummy_table d1, dummy_table d2;
+-------------------------+
| concat(d1.nums,d2.nums) |
+-------------------------+
| 11 |
| 21 |
| 31 |
| 12 |
| 22 |
| 32 |
| 13 |
| 23 |
| 33 |
+-------------------------+
9 rows in set (0.00 sec)

0 Comments:

Post a Comment

<< Home