I'm already here in KY
and have been to bed at least once (and now I'm actually on the right time, really, I was going to bed/getting up in American time for a while!)
OK, let me tackle the limits stuff first, since that's the most pressing matter from my perspective. mediumint(8) allows for something like 8.3 million topics, even if it's signed (16,777,215 if unsigned), now I can't imagine a forum having that many topics + attachments.
But also consider how many media items you'd have to have to get to that stage, and more importantly how big they'd be. Even at 1KB each, you'd be in to the GBs of storage to cope with it.
On a wider scale, though, you also have to consider that if you have to actually consider many millions (more than a mediumint can hold), you might want to be considering the next field up from an int, which is a bigint, and that causes *many* more problems. Going over the 32 bit threshold is a potential problem on many levels, more than you really want to be diagnosing.
But realistically, a forum that is even close to approaching that is going to require specialist attention long before they get that big, and the single biggest SMF forum I know has 19.7m posts in 520k topics (warriorcatsrpg.com),
or, alternatively I looked at Nairaland which used to run SMF, which currently has 913k topics
As far as the context goes, everything else you've pointed out is fine as far as I'm concerned.
Lastly, re negative ids, there really aren't that many instances of it but we should certainly look through it - ideally every time we find a case where it happens, we should add a comment to the SQL to indicate that. Once we know which columns have to remain signed, we can make everything else unsigned ;)
I'm honestly thinking we don't need to change the topic id column at this time, just because I don't know any forum that is going to get anywhere near it - and again, if they are even remotely close to it, they're going to run into other problems long before that which require us taking a look and helping optimise it anyway, and we can deal with scaling it closer to time.
mediumint unsigned is fine for topic and media item ids as far as I'm concerned.
The number of bytes is not a huge issue; MySQL quite happily uses 1, 2, 3 and 4 byte fields with impunity. Since MySQL does even have bitwise fields that actually work at the bit level, and it packs rows itself, it's not a really huge issue.
Of a more practical issue is the fact that all the indexes which use that field (of which there are more than one AFAIR) will all remain the same size rather than growing.
I guess that if someone's forum becomes huge, they'll have all the time they need to simply change the field sizes in the database because, thankfully, that can be done even through phpMyAdmin in a few steps..
Oh, so a 3-byte field will be packed, like, compressed..?
I don't see what that means?
The other kind is where it is not free, because it requires rewriting all the data in every row to align it to the new size. Changing the size or type of columns (mediumint to int, or the other way for example, or changing the type from numeric to textual etc) is very expensive on large tables as a result - and naturally it's usually fully table locking in the process. (Certain InnoDB operations can be done without it being a full table lock but generally assume it is)
Also note that adding a column to an existing table has exactly the same problem. For most tables it's not preposterously expensive in most cases, because in most cases people with 100k+ posts don't generally go modifying their DB too much unless absolutely necessary because once they get to that size they're already aware of the limitations.
No, it's not compressed, but it is byte-packed, meaning that fields that don't need full bytes don't generally consume full bytes, but they pack all the fields in to use the least number of whole bytes.
id_topics appear in the indexes in multiple places. Take wedge_messages, of the litany of indexes that table has, four of them have id_topic as part of those indexes, and as a result of that, expanding the id_topic field to 4 bytes means we don't lose an extra byte per row, we lose that multiplied by 5 (1 for the row, 4 for indexes), so we gain 5 bytes per message, just in that one case, and in the case of indexes, we have a vested interest in keeping them lean.
And benchmarking is something I'm not sure about doing for now... (Well, maybe we can use John's latest plugin for that :P)
Only one vote in the poll? :sob:
How would that work...? That plugin doesn't benchmark anything...
I haven't decided whether to vote or A or B. For B, how would viewing albums work?
Pete, that reminds me of this post I saw this week...
No, but you get a lot of data to work with to be able to understand how running a larger forum is likely to handle the extra load.
That's a very good question. What it means to me is that MessageIndex.php needs an overhaul, possibly loadBoard() needs to identify the type of board involved and when we would otherwise divert to MessageIndex, we figure out which file needs to be loaded for that type of board.
Anyway, yes, the point is correct: if you do a query where you're sorting by any column and there's a TEXT or larger field in there, it will generate a disk-based temporary table (this is what a filesort is). Getting around it using a subselect is an interesting way of doing it, but essentially yes it's correct, it fools the query parser as to how the table is generated and it means it isn't implicitly a filesort. At that point you just have to hope MySQL can get enough memory to do it in memory otherwise you end up with a filesort anyway.