Wedge

Public area => The Pub => Off-topic => Topic started by: Dragooon on June 4th, 2012, 11:01 PM

Title: MySQL query optimization
Post by: Dragooon on June 4th, 2012, 11:01 PM
I'm fairly dense when it comes to MySQL optimization mostly because I never had to do it. But now it seems to be required as I got a DB that's bigger than anything I've managed (which hasn't been randomly populated with a bunch of rand() calls). So, basically I got a table called bills which has only one primary key and index as id_bill

I run this query
Code: [Select]
                        SELECT <columns>
                        FROM bhd_bills AS b
                                INNER JOIN bhd_accounts AS a ON (a.id_account = b.id_account)
                                INNER JOIN bhd_users AS u ON (u.id_user = b.id_raised)
                                LEFT JOIN bhd_accounts AS ua ON (ua.id_user = u.id_user)
                        WHERE cancelled = 0
                                 AND 1=1
                        ORDER BY b.id_bill DESC
                        LIMIT 0, 20
And it causes the entire table to be copied into tmp, causing timeouts as the table contains roughly ~40k rows and takes a lot of time to process, whereas dividing it into 2 queries
Code: [Select]
                        SELECT b.id_bill
                        FROM {prefix}bills AS b
                        WHERE cancelled = 0
                                AND 1=1
                        ORDER BY b.id_bill DESC
                        LIMIT 0, 20;

                        SELECT <columns>
                        FROM bhd_bills AS b
                                INNER JOIN bhd_accounts AS a ON (a.id_account = b.id_account)
                                INNER JOIN bhd_users AS u ON (u.id_user = b.id_raised)
                                LEFT JOIN bhd_accounts AS ua ON (ua.id_user = u.id_user)
                        WHERE id_bill IN (<result from above>)
It comes insanely fast. Can anyone explain why?
Title: Re: MySQL query optimization
Post by: Arantor on June 4th, 2012, 11:08 PM
Could you stick EXPLAIN in front of the first query and let me know what the output of EXPLAIN <query> says?
Title: Re: MySQL query optimization
Post by: Dragooon on June 4th, 2012, 11:10 PM
Not sure what to make of this

Code: [Select]
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 41438
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using where; Using join buffer
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: ua
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4235
        Extra:
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: bhd.b.id_account
         rows: 1
        Extra:
4 rows in set (0.00 sec)
Title: Re: MySQL query optimization
Post by: Nao on June 4th, 2012, 11:14 PM
EXPLAIN is one of those things that make even less sense than a complex regex to me :P
Title: Re: MySQL query optimization
Post by: Arantor on June 4th, 2012, 11:19 PM
OK, so let's step through it.

The first row indicates that the first clause found 41,438 rows. It matched those rows with the WHERE, no index was used for it, it did the filtering work in a temporary table, and that temporary table was dumped into a file to sort through it.

The second row indicates it's working on the u table, and that it was using a WHERE clause again, but this time the join buffer was involved - it's using the primary key of u table to filter/find rows. That's good, and it turned up 5 relevant rows quickly.

The third row indicates that it's working on the ua table, but that it just worked on the entire table through a table scan, and apparently returned the entire table as results.

The fourth row indicates it's working on the a table, of any possible indexes it could use, it was using the primary key's index, referring back to the other table, and returned a single row.


The primary reason for the performance drain is the filesort. When you break it down into two queries, you're doing ordering still, but this time you're ordering only integers - the result can be done totally in memory, doubly so because it's a fixed-width row to be working on, and the ordering therein is actually able to be done as you go along.


@Nao, it sort of is a bit weird but I might have to lend you my copy of High Performance MySQL ;)
Title: Re: MySQL query optimization
Post by: Dragooon on June 4th, 2012, 11:25 PM
I see that makes more sense now, but I still don't get one thing. I'm using the same clauses and sorting in both cases, but why is it doing filesort in first case but not in second?
Title: Re: MySQL query optimization
Post by: Arantor on June 4th, 2012, 11:32 PM
Because you're not doing the same thing in both cases.

In the original query, you're pulling down a lot of columns, of which at least one is going to be a text column. Sorting on that always pushes it to disk, for the *entire* data set.

In the two-query set, you're pulling down a single numeric column, sorting that, and instead of applying successive filters to the data and ordering it all as you go - on disk, rather than in RAM - you're getting to cut most of the data you don't need out first, so you end up doing a lot less work on the data you do actually want.
Title: Re: MySQL query optimization
Post by: Dragooon on June 4th, 2012, 11:35 PM
Quote from Arantor on June 4th, 2012, 11:32 PM
Because you're not doing the same thing in both cases.

In the original query, you're pulling down a lot of columns, of which at least one is going to be a text column. Sorting on that always pushes it to disk, for the *entire* data set.

In the two-query set, you're pulling down a single numeric column, sorting that, and instead of applying successive filters to the data and ordering it all as you go - on disk, rather than in RAM - you're getting to cut most of the data you don't need out first, so you end up doing a lot less work on the data you do actually want.
Ohh, so SELECT is what actually causes the problems here? I see, that makes quite more sense. Thanks :)
Title: Re: MySQL query optimization
Post by: Arantor on June 4th, 2012, 11:40 PM
It's not so much the select, it's what you're getting and how you're getting it and when you do what.
Title: Re: MySQL query optimization
Post by: Dragooon on June 5th, 2012, 07:57 AM
Quote from Arantor on June 4th, 2012, 11:40 PM
It's not so much the select, it's what you're getting and how you're getting it and when you do what.
Ah...I got it. Thanks :)