Is it because MySQL's UTF-8 before 5.5 is "fake UTF-8"? It only supports up to 3 bytes per character, so there is no way to directly store characters outside the Basic Multilingual Plane in MySQL. :geek:
So should we move up to 5.5?
My test strings just have regular accents, which are encoded as 2 bytes in UTF-8. 4-byte characters in UTF-8 are, AFAIK, only those outside the Basic Multilingual Plane: mostly the rarer CJK ideographs (Traditional & Simplified Chinese together probably account for about 80,000 characters, most of which still fit in 3 bytes), plus things like emoji.
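For anyone who wants to check the byte counts themselves, here's a quick sketch. It's plain Python for illustration only (nothing from the Wedge codebase), and the helper names `utf8_len` and `fits_legacy_utf8` are made up for this post.

```python
def utf8_len(ch):
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_legacy_utf8(text):
    """True if every character needs at most 3 bytes in UTF-8,
    i.e. the text would fit a 3-byte-max UTF-8 column."""
    return all(utf8_len(ch) <= 3 for ch in text)


if __name__ == "__main__":
    samples = [
        "e",           # plain ASCII letter
        "\u00e9",      # e with acute accent
        "\u4e2d",      # common CJK ideograph
        "\U00020000",  # CJK Extension B ideograph (outside the BMP)
        "\U0001F600",  # emoji (outside the BMP)
    ]
    for ch in samples:
        print("U+%04X" % ord(ch), utf8_len(ch), "byte(s)")
```

So accented test strings should sit comfortably within the 3-byte limit, and only characters beyond the BMP would trip it.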
My local install has MySQL 5.5, and (still) the same problem.
This would bring the advantage of "real UTF-8 support" (so you shouldn't have to worry about serialize errors), and TEXT columns are also stored inline. :)
MyISAM is still the de-facto solution for fast reads. I could be convinced to go InnoDB for a few tables, and memory for the session table (if not already done), but not much else...
And from my personal point of view:
MySQL is almost at the point where InnoDB can do fulltext search, which will leave MyISAM almost completely outdated, and Wedge is the next-generation forum software. So why...?
Okay, so I've done another test by turning my TEXT field into a BLOB and then into a VARCHAR(65535)... And, it didn't change anything. I'm still getting the same contents. Perhaps the problem is happening at store time, rather than retrieve time, I don't know.
I'm still at a loss about what I should do. Stick with all the serialize() calls (because there are TONS of them in the Wedge codebase, and they all work, just look at the number of serialized items stored in $settings!), or convert everything to JSON strings (shorter, a bit slower to load)..?
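One reason the choice could matter, if a charset mix-up somewhere between PHP and the database turns out to be involved (an assumption, nothing here confirms it): PHP's serialize() writes strings as `s:<byte length>:"<bytes>";`, so anything that changes the byte count in storage makes the declared length wrong. Below is a rough Python imitation of just that string format. The functions `serialize_str`, `unserialize_str` and `double_encode` are hypothetical stand-ins, not PHP's real implementation or anything from Wedge.

```python
import json


def serialize_str(text):
    """Imitate PHP's serialized string layout: s:<byte length>:"<bytes>";"""
    data = text.encode("utf-8")
    return b's:' + str(len(data)).encode("ascii") + b':"' + data + b'";'


def unserialize_str(blob):
    """Parse the layout above; fail if the declared length is wrong."""
    if not blob.startswith(b"s:"):
        raise ValueError("not a serialized string")
    colon = blob.index(b":", 2)
    length = int(blob[2:colon])
    start = colon + 2            # skip the colon and the opening quote
    end = start + length
    if (blob[colon + 1:start] != b'"'
            or blob[end:end + 2] != b'";'
            or end + 2 != len(blob)):
        raise ValueError("declared length does not match payload")
    return blob[start:end].decode("utf-8")


def double_encode(blob):
    """Simulate a misconfigured store: UTF-8 bytes are read as Latin-1
    and written back out as UTF-8, so non-ASCII bytes grow."""
    return blob.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    original = "caf\u00e9"

    blob = serialize_str(original)
    print(unserialize_str(blob) == original)   # clean round trip

    try:
        unserialize_str(double_encode(blob))
    except ValueError as err:
        print("serialized form broke:", err)

    # json.dumps escapes non-ASCII by default, so the stored text is
    # pure ASCII and the same mangling leaves it untouched.
    as_json = json.dumps(original).encode("utf-8")
    print(json.loads(double_encode(as_json).decode("utf-8")) == original)
```

The JSON version only survives here because the encoder escapes non-ASCII characters as `\uXXXX`; PHP's json_encode() does the same by default, as far as I know. If the bytes are never altered on the way in or out, both formats round-trip fine.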
Posted: February 25th, 2014, 12:09 AM
Did another run of my original code, the one that caused the bug...
Guess what. It worked just fine this time. >_<
So... I have to suppose it's not related to the serializing process at all. Uh... Then why did it fail for Wedge.org..?
Oh, maybe it's working on my local install, but not on the Wedge.org MySQL database... I don't know. Maybe it's not set up as UTF, or something.