This topic was marked solved by its starter, on December 14th, 2012, 01:23 AM
SMF bug 4956 (slash in cache key causes cache to fail)

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #1, on March 1st, 2012, 12:38 AM »
Two things occur to me about this.

Firstly we can of course patch cache_get_data and cache_put_data to do something with / (and \ potentially), but that won't prevent any issues if other illegal characters get inserted... I never checked what happened to something like " in a filename (which won't be legal on all filesystems) though I have a funny feeling it would be translated to its entity form (like I've seen with the shy entity)

Secondly, would it not be better to do some kind of hash on the key so that we don't inject actual names anyway?
When we unite against a common enemy that attacks our ethos, it nurtures group solidarity. Trolls are sensational, yes, but we keep everyone honest. | Game Memorial

Nao

  • Dadman with a boy
  • Posts: 16,082

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #3, on March 1st, 2012, 10:21 AM »
Question, are we talking about hashing the key all the time, or just the cases where it contains potentially awkward characters? If the latter, would it not be better to detect that actually in cache_*_data, and hash it there, rather than finding all the places where it's called with funny characters and hash before sending?

Nao

  • Dadman with a boy
  • Posts: 16,082

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278

Nao

  • Dadman with a boy
  • Posts: 16,082

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #8, on April 9th, 2012, 10:08 PM »
Apart from the fact I don't know if base 64 is faster than md5 or not (I haven't benched it), I do know that base64 as PHP implements it can include +, = and / which may or may not be legal characters, and which doesn't leave us any closer to a workable solution. That would mean we'd end up with doing either a strtr or str_replace on top of the base64_encode.

What I'll do is benchmark MD5 vs SHA1 vs base64_encode for performance.
Posted: April 9th, 2012, 09:51 PM
Quote
Carrying out 500000 iterations

String: `something short`
    MD5: 0.62707901000977 seconds
    SHA1: 0.62861299514771 seconds
    B64: 0.44805788993835 seconds
    B64/sstrtr: 1.0246500968933 seconds
    B64/str_replace: 1.4543380737305 seconds

String: `something much much much much much longer that we can use to benchmark these things.`
    MD5: 0.67529892921448 seconds
    SHA1: 0.75934195518494 seconds
    B64: 0.47315502166748 seconds
    B64/sstrtr: 1.0639209747314 seconds
    B64/str_replace: 1.5031769275665 seconds

String: `something impossibly and unrealistically long, after longer than the typical cache key will be but just in case it is worth having an idea of relative performance`
    MD5: 0.74138998985291 seconds
    SHA1: 0.8517210483551 seconds
    B64: 0.50031614303589 seconds
    B64/sstrtr: 1.154757976532 seconds
    B64/str_replace: 1.6336131095886 seconds
Code: [Select]
<?php

$strings 
= array(
'something short',
'something much much much much much longer that we can use to benchmark these things.',
'something impossibly and unrealistically long, after longer than the typical cache key will be but just in case it is worth having an idea of relative performance',
);

$iters 500000;

echo 
'Carrying out '$iters' iterations<br>';

foreach (
$strings as $str)
{
echo '<hr><br>String: `'$str'`<br>';

$time_start microtime(true);
for ($i 0$i $iters$i++)
$dummy md5($str);
$time_end microtime(true);
echo '    MD5: '$time_end $time_start' seconds<br>';

$time_start microtime(true);
for ($i 0$i $iters$i++)
$dummy sha1($str);
$time_end microtime(true);
echo '    SHA1: '$time_end $time_start' seconds<br>';

$time_start microtime(true);
for ($i 0$i $iters$i++)
$dummy base64_encode($str);
$time_end microtime(true);
echo '    B64: '$time_end $time_start' seconds<br>';

$time_start microtime(true);
for ($i 0$i $iters$i++)
$dummy strtr(base64_encode($str), '+/=''-__');
$time_end microtime(true);
echo '    B64/sstrtr: '$time_end $time_start' seconds<br>';

$time_start microtime(true);
$s = array('+''/''=');
$r = array('-''_''_');
for ($i 0$i $iters$i++)
$dummy str_replace($s$rbase64_encode($str));
$time_end microtime(true);
echo '    B64/str_replace: '$time_end $time_start' seconds<br>';
}

It's crude but illustrates the relative performance. B64 is fastest - if we can rely on it never generating invalid characters in the keys, which we can't.

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #9, on April 10th, 2012, 10:33 AM »
I don't think + and = characters are invalid in any way, really...?! Whatever the file system... It's just something that's included in any standard.
If anything, to me it's less scary than the (infinitesimal) chance of collision a md5 hash may generate...

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #10, on April 10th, 2012, 12:16 PM »
= isn't really an issue but + certainly used to be in Windows (though I think it's allowed in NTFS)

If you want to be sure about avoiding collisions and still fixing the problem and still being fast, instead fix where the cache key contains /... The only move is not to play.

Nao

  • Dadman with a boy
  • Posts: 16,082

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: SMF bug 4956 (slash in cache key causes cache to fail)
« Reply #12, on April 10th, 2012, 01:09 PM »
Windows 95 doesn't properly accept them because they are also valid syntax for its copy command to merge files, i.e. copy file1+file2.txt merged-file.txt. (Been bitten by this in the past)

It's kind of irrelevant though... + and = aren't the killers, / is, which brings us back to where we were before. I can rebenchmark it just replacing that one character out and see what difference it makes?

On a side note it does sort of challenge the long-held view that str_replace is faster than strtr...

Nao

  • Dadman with a boy
  • Posts: 16,082

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278