Carrying out 500000 iterations
String: `something short`
MD5: 0.62707901000977 seconds
SHA1: 0.62861299514771 seconds
B64: 0.44805788993835 seconds
B64/strtr: 1.0246500968933 seconds
B64/str_replace: 1.4543380737305 seconds
String: `something much much much much much longer that we can use to benchmark these things.`
MD5: 0.67529892921448 seconds
SHA1: 0.75934195518494 seconds
B64: 0.47315502166748 seconds
B64/strtr: 1.0639209747314 seconds
B64/str_replace: 1.5031769275665 seconds
String: `something impossibly and unrealistically long, after longer than the typical cache key will be but just in case it is worth having an idea of relative performance`
MD5: 0.74138998985291 seconds
SHA1: 0.8517210483551 seconds
B64: 0.50031614303589 seconds
B64/strtr: 1.154757976532 seconds
B64/str_replace: 1.6336131095886 seconds
<?php

$strings = array(
    'something short',
    'something much much much much much longer that we can use to benchmark these things.',
    'something impossibly and unrealistically long, after longer than the typical cache key will be but just in case it is worth having an idea of relative performance',
);
$iters = 500000;

echo 'Carrying out ', $iters, ' iterations<br>';

foreach ($strings as $str)
{
    echo '<hr><br>String: `', $str, '`<br>';

    $time_start = microtime(true);
    for ($i = 0; $i < $iters; $i++)
        $dummy = md5($str);
    $time_end = microtime(true);
    echo ' MD5: ', $time_end - $time_start, ' seconds<br>';

    $time_start = microtime(true);
    for ($i = 0; $i < $iters; $i++)
        $dummy = sha1($str);
    $time_end = microtime(true);
    echo ' SHA1: ', $time_end - $time_start, ' seconds<br>';

    $time_start = microtime(true);
    for ($i = 0; $i < $iters; $i++)
        $dummy = base64_encode($str);
    $time_end = microtime(true);
    echo ' B64: ', $time_end - $time_start, ' seconds<br>';

    $time_start = microtime(true);
    for ($i = 0; $i < $iters; $i++)
        $dummy = strtr(base64_encode($str), '+/=', '-__');
    $time_end = microtime(true);
    echo ' B64/strtr: ', $time_end - $time_start, ' seconds<br>';

    // Build the search/replace arrays once, outside the timed region,
    // so the str_replace timing isn't skewed by array construction.
    $s = array('+', '/', '=');
    $r = array('-', '_', '_');
    $time_start = microtime(true);
    for ($i = 0; $i < $iters; $i++)
        $dummy = str_replace($s, $r, base64_encode($str));
    $time_end = microtime(true);
    echo ' B64/str_replace: ', $time_end - $time_start, ' seconds<br>';
}

Carrying out 500000 iterations
String: `something short`
MD5: 0.34987902641296 seconds
SHA1: 0.48932695388794 seconds
B64: 0.16412401199341 seconds
B64/strtr: 0.57090711593628 seconds
B64/str_replace: 0.92343997955322 seconds
strtr: 0.41409516334534 seconds
str_replace: 0.85998892784119 seconds
String: `something much much much much much longer that we can use to benchmark these things.`
MD5: 0.48175501823425 seconds
SHA1: 0.70720291137695 seconds
B64: 0.23448801040649 seconds
B64/strtr: 0.6997401714325 seconds
B64/str_replace: 1.0725769996643 seconds
strtr: 0.48645901679993 seconds
str_replace: 0.88212299346924 seconds
String: `something impossibly and unrealistically long, after longer than the typical cache key will be but just in case it is worth having an idea of relative performance`
MD5: 0.52688503265381 seconds
SHA1: 0.93148612976074 seconds
B64: 0.32354092597961 seconds
B64/strtr: 0.91212201118469 seconds
B64/str_replace: 1.3229742050171 seconds
strtr: 0.60456418991089 seconds
str_replace: 1.0461859703064 seconds
Bickering about a bug I reported to SMF, lol.
So far, all I know is that it definitely affects file caching; I'm not sure whether other caching methods are affected too.
However, one could argue it's up to the developer to send safe keys to the cache, but I'm sure some won't follow that.
My test results (I added a few extra entries): it looks like strtr by itself would be a lot better.
$key = md5($boardurl . filemtime($sourcedir . '/Collapse.php')) . '-Wedge-' . strtr($key, ':', '-');
$key = base64_encode($key);

I'm fairly sure SMG had no cache when I made it... or did it?
Also, I'm not exactly sure why it absolutely needs to retrieve filemtime here, is it for security reasons? (i.e. not allowing a user to find the correct URL for the cache...)
And voilà, problem solved... And a supercharged cache.
Instead of the filemtime, I'd really (really) rather we have a $settings variable that we increment when we ask for a cache flush...
I don't really see security being any better if using md5() over base64. What can happen?
In any case, I don't see the need for storing $boardurl in the file name. And the base64 encoding should replace the strtr; at least it's proven to be as fast, or faster.
Carrying out 500000 iterations
String: `something short`
MD5: 0.32972002029419 seconds
SHA1: 0.47672295570374 seconds
B64: 0.17632007598877 seconds
B64/strtr: 0.57628583908081 seconds
B64/str_replace: 0.93950510025024 seconds
bin2hex: 0.16528415679932 seconds
String: `something much much much much much longer that we can use to benchmark these things.`
MD5: 0.42671203613281 seconds
SHA1: 0.69267106056213 seconds
B64: 0.22752213478088 seconds
B64/strtr: 0.70388197898865 seconds
B64/str_replace: 1.0751059055328 seconds
bin2hex: 0.25826096534729 seconds
String: `something impossibly and unrealistically long, after longer than the typical cache key will be but just in case it is worth having an idea of relative performance`
MD5: 0.51122808456421 seconds
SHA1: 0.90426898002625 seconds
B64: 0.30805087089539 seconds
B64/strtr: 0.89947199821472 seconds
B64/str_replace: 1.3907599449158 seconds
bin2hex: 0.33286499977112 seconds
There is, $settings['settings_updated']. But the settings are also cached...
I'm not saying md5 vs base64, I'm saying something vs nothing where the result is not trivially guessable by a hacker.
But the base64 can still generate characters that are invalid.
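To make that concrete: base64 output can contain '+', '/' and '=', and '/' in particular is deadly in a file name. A quick illustration (the input bytes are just chosen to force those characters out):

```php
<?php
// base64 output can contain '+', '/' and '=' -- '/' is the dangerous one
// in a filename. A single strtr call maps all three to safe characters.
$raw  = base64_encode("\xfb\xff\xfe"); // encodes to '+//+'
$safe = strtr($raw, '+/=', '-__');
echo $raw, ' -> ', $safe, "\n";        // +//+ -> -__-
```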
How about converting the string to something that can't possibly be a collision risk and is filename-safe? I thought of this while writing this post and ran a test in the middle of it: bin2hex. From my test I am getting really fast speeds out of this function; the only problem I foresee is filename length limits.
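For illustration, bin2hex emits only [0-9a-f], so any key becomes filename-safe; the trade-off is exactly the length worry above, since the output is twice as long as the input (the key here is made up):

```php
<?php
// bin2hex maps every byte to two hex characters: always filename-safe,
// but the result is exactly 2x the length of the input key.
$key  = 'user:42/session=data'; // hypothetical cache key with unsafe chars
$safe = bin2hex($key);
echo $safe, "\n";
echo strlen($safe), ' vs ', strlen($key), "\n"; // doubled
```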
My, you always have an answer to everything...
So, in order to retrieve the cache hash, I would need to first get the cache hash and then retrieve the cache based on that cache hash... THEN I can know what the cache hash is :^^;:
Hmm, I was thinking... md5() is a 128-bit number, so it's something like at least a billion billion billion billion... Thinking about it, since we can barely have more than 2000 files in a folder, it's very, very unlikely to get a collision -- ever... No?
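A rough birthday-bound check of that intuition (a back-of-the-envelope sketch only, ignoring known md5 collision attacks):

```php
<?php
// Birthday bound: with n random values drawn from a space of size 2^128,
// the probability of any collision is roughly n^2 / 2^129.
$n = 2000; // the "barely more than 2000 files per folder" figure above
$p = ($n * $n) / pow(2, 129);
printf("%.1e\n", $p); // about 6e-33 -- negligible
```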
I forgot about the slash being a valid base64 char. Oh, my... They should have chosen something else! It's not like there are so few valid characters for a URI...
Hmm, I like that... bin2hex seems to be fast indeed.
I was also looking into sha1(), crypt('sha256') and pack() (there are plenty of options to convert these), or even using base_convert() to automatically use alphanumeric characters only...
But bin2hex could be just as simple :P
Although not as secure as a md5() on boardurl.filemtime...
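For what it's worth, base_convert goes through floats internally, so feeding it a whole 32-digit md5 would lose precision; a chunked sketch (my own workaround, not anything from Wedge) keeps it exact:

```php
<?php
// base_convert is float-based, so convert the md5 in 8-hex-digit (32-bit)
// chunks that a float can represent exactly; the result is [0-9a-z] only.
$hex = md5('some cache key');
$alnum = '';
foreach (str_split($hex, 8) as $chunk)
    $alnum .= base_convert($chunk, 16, 36);
echo $alnum, "\n";
```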
Oh, Pete -- how about we calculate that md5 hash only ONCE per page...? It seems stupid not to do it. We could simply store it in $context['cache_hash'] or something... If empty, just fill it in...
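That once-per-page idea is just lazy memoisation; a minimal sketch (using a function-local static here so it's self-contained, where the post suggests $context['cache_hash'] instead):

```php
<?php
// Compute the hash on the first call only; later calls in the same request
// reuse the stored value. The real code would use $context['cache_hash'].
function cache_hash($seed)
{
    static $hash = null;
    if ($hash === null)
        $hash = md5($seed);
    return $hash;
}

echo cache_hash('boardurl.filemtime'), "\n";
echo cache_hash('ignored, already computed'), "\n"; // same hash again
```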
It's not quite like that, no. The problem with cache keys is that they're all in the one folder - everything goes into $cachedir, and everything would notionally get the same md5() process applied, so you're not just considering collisions across the namespace of a single album's files, but collisions across everything that the md5() is applied to, which is potentially every media item (since we only need to apply the md5 if there isn't a / in the supplied cache key).
That said, md5 might be 128 bits, but it's not 128-bit wide in collision cases. It was proved a few years ago that for collision purposes, the keyspace isn't 2^128 but more like 2^40 due to weaknesses in the way it's generated.
Actually, there aren't that many characters truly valid in a URI, the vast majority of them have... extra meanings, and are normally just %-encoded instead.
Works for me. :)

Quote: Oh, Pete -- how about we calculate that md5 hash only ONCE per page...? It seems stupid not to do it. We could simply store it in $context['cache_hash'] or something... If empty, just fill it in...
We could also 'simply' create subfolders in /cache/ named after the cache hash... And store files in them (without the starting hash, of course.)
At least it wouldn't break/kill the cache (at least not too soon) if files can't be removed.
I guess we could also use urlencode() or something...
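A sketch of that subfolder layout (every path and file name below is made up for illustration, not Wedge's real structure):

```php
<?php
// One subfolder per cache hash; per-key files live inside it without the
// hash prefix. Uses the system temp dir so the sketch is self-contained.
$cachedir = sys_get_temp_dir() . '/cache_demo';
$hash = md5('http://example.com' . 1234567890); // boardurl.filemtime stand-in
$dir = $cachedir . '/' . $hash;
if (!is_dir($dir))
    mkdir($dir, 0755, true);
file_put_contents($dir . '/settings.php', "<?php // cached data\n");
echo is_file($dir . '/settings.php') ? "cached\n" : "failed\n";
```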
That's a pointless micro-optimization, in the end. It's obvious that PHP caches that value.
I saved the hash into $settings['cache_hash'] (I know it's not a good idea to use that global, but I don't want to pull $context into itself within the cache functions... Bad karma?), and did some benchmarking. (1000 calls to retrieve 'settings', which is the bigger cache file...)
There are problems with doing that, namely that for every folder, you'd have to make sure that index.php/.htaccess were also added to each folder.
Is % a valid filename character? ;)
Interesting. I guess it's hard to really optimise something like that properly.
And if htaccess works -- it's only needed in the parent folder anyway.
include() can be (relatively) slow. Heck, simply loading SMF/Wedge's basic files (load, subs...) easily takes a few tenths of a second!
That's not strictly true. There are configurations of Apache that do not allow cascading. And not all servers run Apache (I'm looking at moving to nginx, for example)
Yeah, I know. Part of that is the pure I/O and part of that is the parsing stage.
Why the hell do programmers like to fuck with us every once in a while?
That's where php bytecode cache (or whatever it's called) probably helps the most...