This topic was marked solved by its starter, on December 14th, 2012, 01:23 AM
SMF bug 4956 (slash in cache key causes cache to fail)

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #30, on April 17th, 2012, 07:03 PM »
Quote from Arantor on April 16th, 2012, 11:33 PM
There is, $settings['settings_updated']. But the settings are also cached...
My, you always have an answer to everything... :lol:
So, in order to retrieve the cache hash, I would need to first get the cache hash and then retrieve the cache based on that cache hash... THEN I can know what the cache hash is :^^;:
Quote
I'm not saying md5 vs base64, I'm saying something vs nothing where the result is not trivially guessable by a hacker.
Hmm, I was thinking... md5() returns a 128-bit number, so there are at least a billion billion billion billion possible values... Thinking about it, since we'll barely ever have more than 2000 files in a folder, it's very, very unlikely we'd get a collision -- ever... No?
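The intuition can be checked with the standard birthday approximation. This is a minimal Python sketch that assumes md5 behaves like an ideal random 128-bit function for ordinary, non-malicious keys. The 2,000-file figure comes from the post above, and `collision_probability` is a hypothetical helper name:

```python
import math

def collision_probability(n: int, bits: int = 128) -> float:
    """Approximate chance that at least two of n random `bits`-bit hashes collide.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    math.expm1 keeps very small probabilities from rounding to zero.
    """
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)

# 2,000 cache files hashed with an ideal 128-bit function
print(collision_probability(2000))
```

For 2,000 files this comes out at roughly 6 × 10^-33, so accidental collisions really are negligible at that scale. The assumption being tested is "random-looking keys". Deliberately crafted inputs are a different matter.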
Quote
But the base64 can still generate characters that are invalid.
I forgot about the slash being a valid base64 char. Oh, my... They should have chosen something else! It's not like there are so few valid characters for a URI...
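Standard base64 (RFC 4648) uses `+` and `/` as its last two symbols, and that is exactly the problem here. RFC 4648 §5 also defines a "URL and filename safe" alphabet that swaps them for `-` and `_`. The Python illustration below shows the difference. PHP's base64_encode() uses the standard alphabet and has no built-in URL-safe variant, so in PHP you'd typically strtr() the output. The input bytes are picked to hit both problem symbols:

```python
import base64

# 0xfb 0xff 0xfe splits into the 6-bit values 62, 63, 63, 62:
# the last two symbols of the base64 alphabet.
raw = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(raw).decode("ascii")        # '+' and '/'
urlsafe = base64.urlsafe_b64encode(raw).decode("ascii")  # '-' and '_'

print(standard)  # +//+
print(urlsafe)   # -__-
```

Both encodings carry the same information. Only the alphabet changes, so decoding with the matching function gives back the original bytes.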
Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #31, on April 17th, 2012, 07:11 PM »
Quote from nend on April 17th, 2012, 06:12 AM
How about converting the string to something that can't possibly be a collision risk and is filename-safe? I just thought of this while writing this post and ran a test in the middle of the post: bin2hex. From the test I am getting really fast speeds from this function; the only problem I foresee, though, is filename length limits.
Hmm, I like that... bin2hex seems to be fast indeed.
I was also looking into sha1(), crypt('sha256') and pack() (there are plenty of options to convert these), or even using base_convert() to automatically use alphanumeric characters only...
But bin2hex could be just as simple :P
Although not as secure as an md5() on boardurl.filemtime...
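The bin2hex idea is easy to sketch. Below is a hypothetical Python helper, where `bytes.hex()` plays the role of PHP's bin2hex(). Hex encoding is reversible, so two different keys can never map to the same name, and the output only ever contains `0-9a-f`. The cost is that the key doubles in length, which is the filename-limit concern. The `data_` prefix, the 255-byte limit (typical of e.g. ext4) and the example board URL and mtime are all assumptions for illustration:

```python
import hashlib

MAX_FILENAME_BYTES = 255  # assumed limit; check the actual filesystem

def cache_filename(key: str, cache_hash: str) -> str:
    """Hypothetical helper: build a filesystem-safe cache filename.

    The key is hex-encoded (like PHP's bin2hex), so the mapping is
    reversible and distinct keys always give distinct names.
    """
    encoded = key.encode("utf-8").hex()
    name = f"data_{cache_hash}_{encoded}.php"
    if len(name.encode("ascii")) > MAX_FILENAME_BYTES:
        raise ValueError(f"cache key too long for a filename: {key!r}")
    return name

# Made-up prefix in the spirit of md5(boardurl . filemtime(...)).
cache_hash = hashlib.md5(b"http://example.com/forum" + b"1334680000").hexdigest()
example = cache_filename("media/album/12", cache_hash)
print(example)
```

With a 32-character md5 prefix, this scheme leaves room for keys of roughly 100 characters before hitting a 255-byte limit.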

Oh, Pete -- how about we calculate that md5 hash only ONCE per page...? It seems stupid not to do it. We could simply store it in $context['cache_hash'] or something... If empty, just fill it in...
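The "compute once, then reuse" idea is plain memoization. Here is a Python sketch, where a module-level variable stands in for `$context['cache_hash']`. The function name, the board URL and the use of a temporary file as the settings file are hypothetical:

```python
import hashlib
import os
import tempfile

_cache_hash = None  # stand-in for $context['cache_hash']; filled on first use

def get_cache_hash(board_url: str, settings_path: str) -> str:
    """Return md5(board_url + mtime of the settings file), computed at most once."""
    global _cache_hash
    if _cache_hash is None:  # "If empty, just fill it in..."
        mtime = int(os.path.getmtime(settings_path))
        _cache_hash = hashlib.md5(f"{board_url}{mtime}".encode("utf-8")).hexdigest()
    return _cache_hash

# Demo with a throwaway file playing the part of the settings file
with tempfile.NamedTemporaryFile(suffix=".php", delete=False) as f:
    demo_settings = f.name
first = get_cache_hash("http://example.com/forum", demo_settings)
second = get_cache_hash("http://example.com/other", demo_settings)  # memoized: args ignored
os.remove(demo_settings)
```

The second call returns the stored value without touching the file or hashing again. That is the whole saving, and it also means the value won't notice a mid-request change.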

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #32, on April 17th, 2012, 08:35 PM »
Quote
My, you always have an answer to everything...
Well, I've had to work with it in the past, which is why I knew about it. Plus there are plenty of places where the code actually checks settings_updated to avoid hitting the cache (usually on cache level 2+, where the cache key is the only thing identifying the entry, so there's no easy way of invalidating it otherwise).
Quote
So, in order to retrieve the cache hash, I would need to first get the cache hash and then retrieve the cache based on that cache hash... THEN I can know what the cache hash is :^^;:
It is sort of complicated, yes :P
Quote
Hmm, I was thinking... md5() is a 128-bit number, so it's something like at least a billion billion billion billion... Thinking about it, since we can barely have more than 2000 files in a folder, it's very, very unlikely to get a collision -- ever... No?
It's not quite like that, no. The problem with cache keys is that they're all in one folder: everything goes into $cachedir, and everything would notionally get the same md5() process applied. So you're not just considering collisions across the namespace of a single album's files, but collisions across everything the md5() is applied to, which is potentially every media item (since we only need to apply the md5 if there isn't a / in the supplied cache key).

That said, md5 might be 128 bits, but it isn't 128 bits wide where collisions are concerned. It was shown a few years ago that, due to weaknesses in the way it's computed, finding a collision takes nowhere near 2^128 operations (even an ideal 128-bit hash only needs about 2^64, by the birthday bound), but something more like 2^40.
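The generic birthday bound can be worked out directly. This is the standard approximation for the number of values an ideal 128-bit hash needs before a collision becomes a coin flip. It covers accidental collisions only and says nothing about the crafted collisions behind the 2^40 figure:

```latex
n_{50\%} \approx \sqrt{2\ln 2 \cdot 2^{128}} = \sqrt{2\ln 2}\cdot 2^{64} \approx 1.18 \times 1.84\times 10^{19} \approx 2.2\times 10^{19}
```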
Quote
I forgot about the slash being a valid base64 char. Oh, my... They should have chosen something else! It's not like there are so few valid characters for a URI...
Actually, there aren't that many characters that are truly valid in a URI; the vast majority of them have... extra meanings, and are normally just %-encoded instead.
Quote
Hmm, I like that... bin2hex seems to be fast indeed.
I was also looking into sha1(), crypt('sha256') and pack() (there are plenty of options to convert these), or even using base_convert() to automatically use alphanumeric characters only...
But bin2hex could be just as simple :P
Although not as secure as an md5() on boardurl.filemtime...
Certainly it's fast, and for short keys it's faster than even base64. I'd avoid crypt(), though; there were issues with it and related functions in PHP 5.3.7. And pack() is really just a super-sized version of what bin2hex is doing anyway, heh.

You're right, it's not as secure as md5ing the filemtime, and that's reasonably important for the sake of safety.
Quote
Oh, Pete -- how about we calculate that md5 hash only ONCE per page...? It seems stupid not to do it. We could simply store it in $context['cache_hash'] or something... If empty, just fill it in...
Works for me. :)

Nao

Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #33, on April 18th, 2012, 05:44 PM »
Quote from Arantor on April 17th, 2012, 08:35 PM
It's not quite like that, no. The problem with cache keys is that they're all in the one folder - everything goes into $cachedir, and everything would notionally get the same md5() process applied, so you're not just considering collisions across the namespace of a single album's files, but collisions across everything that the md5() is applied to, which is potentially every media item (since we only need to apply the md5 if there isn't a / in the supplied cache key)
We could also 'simply' create subfolders in /cache/ named after the cache hash... and store files in them (without the leading hash, of course).
That way, it wouldn't break/kill the cache (or at least not too soon) if files can't be removed.
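A sketch of the two layouts being compared, in Python for illustration. The `data_` prefix, the helper names and the hex-encoded key are assumptions, not Wedge's actual code:

```python
import shutil
import tempfile
from pathlib import Path

def flat_path(cache_dir: Path, cache_hash: str, key_hex: str) -> Path:
    # Flat layout: the cache hash is baked into every filename in one folder.
    return cache_dir / f"data_{cache_hash}_{key_hex}.php"

def nested_path(cache_dir: Path, cache_hash: str, key_hex: str) -> Path:
    # Proposed layout: one subfolder per cache hash; the hash leaves the filename.
    return cache_dir / cache_hash / f"data_{key_hex}.php"

def drop_generation(cache_dir: Path, cache_hash: str) -> None:
    # Nested layout only: a whole stale generation goes away in one call.
    shutil.rmtree(cache_dir / cache_hash, ignore_errors=True)

# Demo in a throwaway directory
cache_dir = Path(tempfile.mkdtemp())
entry = nested_path(cache_dir, "0123abcd", "73657474696e6773")
entry.parent.mkdir(parents=True, exist_ok=True)
entry.write_text("<?php /* cached data */")
drop_generation(cache_dir, "0123abcd")
print(entry.exists())  # False
shutil.rmtree(cache_dir)
```

In both layouts, a new cache hash simply points lookups at new names. The nested one just makes cleaning up the old generation a single directory removal.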
Quote
That said, md5 might be 128 bits, but it's not 128-bit wide in collision cases. It was proved a few years ago that for collision purposes, the keyspace isn't 2^128 but more like 2^40 due to weaknesses in the way it's generated.
I see.
Quote
Actually, there aren't that many characters truly valid in a URI, the vast majority of them have... extra meanings, and are normally just %-encoded instead.
I guess we could also use urlencode() or something... :lol:
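For a feel of what that would produce, Python's `urllib.parse.quote_plus` behaves much like PHP's urlencode() (spaces become `+`, `/` becomes `%2F`). The two functions differ on a few characters such as `~`, so treat this as an analogy rather than an exact match. The key is made up. Percent-encoding is reversible, so distinct keys stay distinct. The output contains no slash, but it does contain `%`, which both Linux and Windows accept in filenames:

```python
from urllib.parse import quote_plus, unquote_plus

key = "media/album 3"       # made-up cache key containing a slash
encoded = quote_plus(key)    # '/' -> '%2F', ' ' -> '+'
print(encoded)               # media%2Falbum+3

# Reversible, so two different keys never share an encoding
print(unquote_plus(encoded) == key)  # True
```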
Quote
Quote
Oh, Pete -- how about we calculate that md5 hash only ONCE per page...? It seems stupid not to do it. We could simply store it in $context['cache_hash'] or something... If empty, just fill it in...
Works for me. :)
That's a pointless micro-optimization, in the end. PHP apparently caches at least part of that value already (filemtime() results go through its stat cache).
I saved the hash into $settings['cache_hash'] (I know it's not a good idea to use that global, but I don't want to pull $context into the cache functions... Bad karma?), and did some benchmarking (1000 calls to retrieve 'settings', which is the biggest cache file...).

SVN version with md5, filemtime and base64: 0.45s on average (which is probably already much faster than in SMF...)
My version with cache hash and bin2hex: 0.44s on average

I'm not kidding you...
I also tried removing the @ in front of the include call, to no effect.
Removing the include entirely gave me results between 0.01s and 0.03s, very unstable, so it's hard to determine which is faster. So I guess I'll still commit this, but not in an enthusiastic way :P
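To see why the key scheme barely moves the total, one can time the key derivation on its own. The Python sketch below does only that and leaves out the include() of the cache file, which the numbers above suggest is where nearly all the time goes. The board URL, mtime and filename pattern are made-up stand-ins, not Wedge's real values:

```python
import base64
import hashlib
import timeit

KEY = "settings"
MTIME = 1334680000                      # made-up stand-in for filemtime(...)
BOARD_URL = "http://example.com/forum"  # made-up stand-in for $boardurl
PRECOMPUTED = hashlib.md5(f"{BOARD_URL}{MTIME}".encode()).hexdigest()

def old_style_name() -> str:
    # md5 recomputed on every call, key base64-encoded
    prefix = hashlib.md5(f"{BOARD_URL}{MTIME}".encode()).hexdigest()
    return f"data_{prefix}_{base64.b64encode(KEY.encode()).decode()}.php"

def new_style_name() -> str:
    # hash computed once up front, key hex-encoded
    return f"data_{PRECOMPUTED}_{KEY.encode().hex()}.php"

for fn in (old_style_name, new_style_name):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds:.6f}s for 1000 calls")
```

Both variants usually finish 1000 calls in a few milliseconds at most, far below the 0.44s to 0.45s measured with the include in place. So whichever key scheme wins, the difference is hard to see in the overall timing.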

Arantor

Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #34, on April 18th, 2012, 07:23 PM »
Quote
We could also 'simply' create subfolders in /cache/ named after the cache hash... And store files in them (without the starting hash, of course.)
At least it wouldn't break/kill the cache (at least too soon) if files can't be removed.
There are problems with doing that, namely that for every folder, you'd have to make sure that index.php/.htaccess were also added to each folder. Honestly, it would be better not to do that and keep the structure flat.
Quote
I guess we could also use urlencode() or something...
Is % a valid filename character? ;)
Quote
That's a pointless micro-optimization, in the end. It's obvious that PHP caches that value.
I saved the hash into $settings['cache_hash'] (I know it's not a good idea to use that global, but I don't want to pull $context into itself within the cache functions... Bad karma?), and did some benchmarking. (1000 calls to retrieve 'settings', which is the bigger cache file...)
Interesting. I guess it's hard to really optimise something like that properly.

Nao

Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #35, on April 18th, 2012, 07:40 PM »
Quote from Arantor on April 18th, 2012, 07:23 PM
There are problems with doing that, namely that for every folder, you'd have to make sure that index.php/.htaccess were also added to each folder.
Sure. Of course people would first have to be able to find the folder... ;)
And if .htaccess works, it's only needed in the parent folder anyway.
Quote
Is % a valid filename character? ;)
If it isn't, I'll eat you! :ph34r:
Quote
Interesting. I guess it's hard to really optimise something like that properly.
include() can be (relatively) slow. Heck, simply loading SMF/Wedge's basic files (load, subs...) easily takes a few tenths of a second!

Arantor

Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #36, on April 18th, 2012, 07:41 PM »
Quote
And if htaccess works -- it's only needed in the parent folder anyway.
That's not strictly true. There are Apache configurations that don't allow .htaccess directives to cascade. And not all servers run Apache: nginx, for example, doesn't read .htaccess files at all (I'm looking at moving to nginx myself).
Quote
include() can be (relatively) slow. Heck, simply loading SMF/Wedge's basic files (load, subs...) easily takes a few tenths of a second!
Yeah, I know. Part of that is the pure I/O and part of that is the parsing stage.

Nao

Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #37, on April 18th, 2012, 07:44 PM »
Quote from Arantor on April 18th, 2012, 07:41 PM
That's not strictly true. There are configurations of Apache that do not allow cascading. And not all servers run Apache (I'm looking at moving to nginx, for example)
Why the hell do programmers like to fuck with us every once in a while? :(
Quote
Yeah, I know. Part of that is the pure I/O and part of that is the parsing stage.
That's where a PHP opcode cache probably helps the most...

Arantor

Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #38, on April 18th, 2012, 07:47 PM »
Quote
Why the hell do programmers like to fuck with us every once in a while?
Apache is... interesting.
Quote
That's where php bytecode cache (or whatever it's called) probably helps the most...
Oh hell yes. Mind you, if you're using one, you almost certainly get a proper memory cache with it anyway.