Finding Posts in Multiple Categories

January 8, 2010

When you're browsing around a blog, it is a big help to be able to browse posts by category -- it helps you find related posts you might be interested in. For example, if you're reading a post on something having to do with mysql, you can click on a link to see other similar posts.

That's pretty easy, and you can do it like this:

$sql = 'SELECT post_id
    FROM categories_posts
    WHERE category_id = '.$cat_id;

But what if you want to see posts in two categories, say mysql and php? You can't adjust the query to use an AND, because that just wouldn't work, and you can't use the IN keyword, because that equates to a logical OR. So what do you do? One possibility is doing two separate queries and comparing the results. That's not terrible efficient especially when you get more categories to compare. So, here's what I did:

$sql = "
    SELECT Post.id
    FROM (
        SELECT post_id, COUNT(category_id) as num
        FROM categories_posts
        WHERE category_id IN (".implode(",", $cat_ids).")
        GROUP BY post_id
        HAVING num=".count($cat_ids)."
    ) postids
    LEFT JOIN posts as Post ON post_id=Post.id
";

The inner query uses the IN keyword to find all posts in any of the categories, and then groups them by the post id and counting how many rows were found. Then the HAVING keyword filters the post group results to only those rows that have the same number of results as categories I asked for. So, If I query for posts in "mysql and php", it finds all rows with either, groups them by the post_id, and then counts up how many rows are grouped.

If a post was tagged with just PHP, the number would be one. Just MySQL would be the same. Posts without either MySQL or PHP wouldn't show because of the IN command, but any post with both PHP and MySQL would have a 2, and be found by the query. The outer query is designed to return the list of post ids similar to Cake's query results, so the data can be reused later to get comment and author data.