Wedge
Public area => Bug reports => The Pub => Archived fixes => Topic started by: Nao on April 30th, 2012, 12:53 PM
-
If you go to this page:
http://wedge.org/do/media/?sa=item;in=29
There will be a cache request. The key is generated as such:
aeva-embed-link-[url=http://www.youtube.com/watch?v=OgCjpA03mOI#ws]B+ Episode 5 - باسم يوسف شو (مع تامر من غمرة) الحلقة ٥[/url]
Funny eh..?
Well, it generates an error saying the filename is too long -- and indeed, with bin2hex being used on the key, it does make it extra long...!
I don't know what would be the best fix here... Direct or indirect?
PS: sorry I'm being so silent these days... I try to keep up with reading but I definitely can't post or work on Wedge (too much) -- hectic RL. It's likely that it'll be similar in the next few days, too... I hate that.
-
I fully understand about hectic RL (I'm in a similar situation but for quite different reasons)
This sounds like the sort of thing where MD5 might work rather well? Perhaps MD5 + the YouTube ID?
-
Well... You know I'm still unsure about MD5 for these.
I don't know...
-
The point of using MD5 is to give you something reasonably unique, without being overly long and adding the video ID gives you sufficient uniqueness that it should work fine.
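For illustration, a key built that way might look something like this (the variable names and the 'aeva-embed-' prefix are just assumptions based on the key shown above, not the actual Wedge code):

```php
<?php
// Hypothetical sketch: hash the full embed URL so the key stays short,
// then append the YouTube video ID for extra uniqueness/readability.
$embed_url = 'http://www.youtube.com/watch?v=OgCjpA03mOI';
$video_id  = 'OgCjpA03mOI';

// 32 hex chars from md5 plus an 11-char video ID: the key has a fixed,
// short length no matter how long the original title or URL was.
$key = 'aeva-embed-' . md5($embed_url) . '-' . $video_id;

echo $key;
```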
-
Guess so.
-
That was a simple mistake; always use a hash for generating unique keys, unless simple ID numbers are practical to use.
-
True, unless there's any kind of risk of hash collision; MD5 is not nearly as well distributed as its keyspace implies it should be.
-
MD5 is terrible (from a cryptographic standpoint) and SHA1 isn't a lot better, but in non-security situations it's perfectly good enough; the collision risk is acceptable on both ('though MD5 is significantly cheaper in computing terms).
-
Well, MD5's keyspace is theoretically 2^128, but vulnerabilities in the mathematics give it a collision window of 2^40 or so, while SHA1's keyspace is 2^160 with a collision window of around 2^51. While that may not mean much to most people, what it really means is that both are vulnerable for really sensitive stuff, and that colliding SHA1 hashes is still an order of magnitude harder.
For what's required here, even an MD5 on its own would probably be OK; for password hashing, though, nothing less than SHA1 should be used (that's a separate discussion in itself)
Mind you, while we're on the subject we might as well tackle it. SMF and Wedge (and most other forums) use a hash based on username and password, combined together then hashed for comparison purposes. SMF and Wedge also score slightly higher than most other forums by sending a user's password to the server hashed if possible. Anyway, the hash used in both cases is SHA1, and changing it has large consequences.
The biggest one, really, is about conversions, where users from all other environments (including Wedge itself during migration) would have to re-enter their password. If you're coming from a system with weaker protection, no harm, no foul, you get upgraded anyway. But if you're coming from SMF or similar, you will still have that extra step which may be off-putting to users.
In all other respects about performance, the effort of using something like SHA256 (SHA-2) in place of SHA1 is no big deal, it doesn't even require a schema change in the database because the column has been declared as varchar(64) for ages.
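As a sketch of that swap (the concatenation scheme here is a simplification of the username-plus-password hashing described above; Wedge's actual code may differ):

```php
<?php
// Illustrative only: replacing SHA1 with SHA-256 for the combined
// username + password hash. Names and scheme are assumptions.
$username = 'example_user';
$password = 'correct horse battery staple';

$old_hash = sha1(strtolower($username) . $password);           // 40 hex chars
$new_hash = hash('sha256', strtolower($username) . $password); // 64 hex chars

// 64 characters still fits the existing varchar(64) column,
// so no schema change is needed.
echo strlen($new_hash); // 64
```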
The one thing I do want to mention is what phpBB and WP do (when using portable hashes, anyway): you take the username and password and md5 it repeatedly, making it harder to find a brute-force match. I don't like that method; I'm really not convinced it's a security benefit. But I can believe it might be if you're only working off the username and trying to match the hash through the same process (though I can also believe there are rainbow tables for that too)
-
This cache thing isn't as simple as it looks; long keys exceed the file name character limit.
I was also thinking a while back about a master table that would be loaded once, and saved once, on every page load that uses the cache. Each cache file would be saved under, say, a numeric or alphanumeric file name.
When a key is requested, the cache table figures out which file to load. This works differently because the key has nothing to do with the file name.
When a key needs to be saved, it is added to the cache table array and a numeric file reference is generated. If a collision occurs, the system just generates a new file name.
'some key here' => 'file1.php',
'some other key here' => 'file2.php',
In a perfect world this would work fine, but in practice there's a chance the file may not be available when requested, so some system has to be in place to handle that.
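The mapping described above could be sketched like this (a hypothetical illustration; the function name and filename scheme are made up):

```php
<?php
// Hypothetical sketch of the master-table idea: a key => filename map,
// loaded once per request, with short filenames generated on demand.
$cache_map = array(
    'some key here'       => 'file1.php',
    'some other key here' => 'file2.php',
);

function map_key_to_file(&$map, $key)
{
    // A known key just returns its existing file.
    if (isset($map[$key]))
        return $map[$key];

    // Generate a new numeric filename; retry on the (unlikely) collision.
    do
        $file = 'file' . mt_rand(1, 999999) . '.php';
    while (in_array($file, $map, true));

    $map[$key] = $file;
    return $file;
}
```

The key itself never touches the file system, so its length no longer matters; only the map does.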
-
This cache thing isn't as simple as it looks; long keys exceed the file name character limit.
Yup (as the title says, heh)
The problem with using a cache table is that it's not efficient (especially since, in an ideal world, even something like the old $modSettings would be properly cached, which it isn't right now)
Also note that at the proper end of caching, where you're using memcached or similar, the limits on key names are longer (if there are any at all), so this is really a matter for the file cache alone to contend with.
-
My opinion on cache keys is that it's the responsibility of the programmer to ensure they're not too long.
Here we had a key that could definitely be way too long, and it benefited from being md5'd. Thankfully, any key that's too long will be logged in the error log, which makes it easier to fix. Worst case scenario, anyway, is that the cache isn't used for that particular key ;)
Here's my code... Any weaknesses? I'm just curious. Or maybe I should use \[url[]=]([^][]+) for the pattern...? Don't remember if these formats are used as well. I guess so...
preg_match('~\[url=([^]]+)~', $item_data['embed_url'], $match);
$key = md5($match[1]) . '-' . md5($item_data['embed_url']);
(Originally, $key = $item_data['embed_url'], if you will.)
-
The one thing I do want to mention is what phpBB and WP do (when using portable hashes, anyway): you take the username and password and md5 it repeatedly, making it harder to find a brute-force match. I don't like that method; I'm really not convinced it's a security benefit. But I can believe it might be if you're only working off the username and trying to match the hash through the same process (though I can also believe there are rainbow tables for that too)
Yes, I also don't see a lot of mileage in doing it multiple times. It's OK for obscurity, but otherwise it's doubtful security. Far better to use a proven technique (a "bigger" method - SHA-2+, etc) and suck up any slight loss of performance.
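For reference, the repeated-hashing ("stretching") idea being discussed looks roughly like this (a simplified sketch, not phpBB's or WP's exact algorithm; the salt and round count are made up):

```php
<?php
// Simplified key stretching: each extra round multiplies the cost of a
// brute-force attempt by the attacker, at a small cost to the server.
$password = 'hunter2';
$salt = 'per-user-salt';

$hash = md5($salt . $password);
for ($i = 0; $i < 1000; $i++)
    $hash = md5($hash . $password);

echo $hash; // still 32 hex chars, but ~1000x slower to brute force
```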
Just to throw one more hat in the ring, can I just say "salting" and leave it at that...?
-
Salting is the process of adding a secret string to any password before encrypting them, making it impossible to brute force a password by dictionary or whatever, right..? (I'm trying to remember :P)
-
Pretty much, yes. Salting is adding a string (of some sort) to the password prior to the hashing process to make things harder on folks attempting to reverse engineer the hashes. As long as the salt is complex enough and long enough, a rainbow attack becomes impractical (computationally speaking) - http://en.wikipedia.org/wiki/Salt_(cryptography).
Theoretically, you can store the salt value and the hash without compromising security.
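A minimal sketch of that store-the-salt-with-the-hash approach (illustrative only; the format and function choices are assumptions, not Wedge's actual scheme):

```php
<?php
// Per-user salting: each user gets a random salt, so one precomputed
// rainbow table no longer covers every account.
$password = 'hunter2';

// A random 16-byte salt, hex-encoded for easy storage.
$salt = bin2hex(openssl_random_pseudo_bytes(16));
$hash = hash('sha256', $salt . $password);

// Store salt and hash together; to verify a login attempt, re-hash the
// supplied password with the stored salt and compare.
$stored = $salt . ':' . $hash;
```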
-
We salt the hash by using the username as part of the hash itself, meaning that the rainbow table has to be regenerated for every username. The multi-iteration hashing will also slow down generation of the rainbow tables. Salting is a given regardless of how the hash is done ;)
Nao, what you've got there seems workable; all the embed URLs are embedded as [url=blah]blah[/url] IIRC, so you should be fine with that.
-
I found an option that works. You guys are probably going to call me crazy for doing this, because it's one of the things you planned to strip out of SMF. I first tried making a cache with another MySQL database, but that was slow, so I went with something else. This option allows keys to be longer than the file cache allows and is generally safe, so no key conversion is required. I ended up using SQLite as the cache, and it seems comparable to file caching, with slightly better write performance.
This is the first time I used a SQLite DB, so I probably have it all set up wrong, hopefully not.
I do have a problem with the setup, though. I'm sure I set it up right according to the manual, but I can't get SQLite's auto-vacuum going. :whistle:
So the file grows and grows. When data is removed, the rows are deleted but the space is only marked as free, so you're left with padding. The DB reuses any empty space, but who wants a file that can grow big with no way to shrink it? Auto-vacuum is supposed to return this space to the file system, but I can't get it to work.
function sicache_setup()
{
	global $sicacheDB;

	if ($sicacheDB = new SQLiteDatabase('./cache/cache'))
	{
		$query = @$sicacheDB->query('SELECT * FROM cache LIMIT 1');
		if ($query === false)
		{
			// auto_vacuum has to be set before the first table is created.
			@$sicacheDB->queryExec('PRAGMA auto_vacuum = 2');
			$sicacheDB->queryExec('CREATE TABLE cache (key text primary key, value text, ttl int);');
		}
	}
}

function sicache_get($key)
{
	global $sicacheData, $sicacheDB;

	if ($sicacheDB = new SQLiteDatabase('./cache/cache'))
	{
		$query = @$sicacheDB->query('SELECT * FROM cache WHERE key = \'' . sqlite_escape_string($key) . '\' LIMIT 1');
		if ($query !== false && ($data = $query->fetch()) !== false)
		{
			if (!isset($sicacheData['sicache_purge']) && $data['ttl'] < time())
			{
				// Might as well purge all expired entries -- only once per request, and only if needed.
				$sicacheDB->queryExec('DELETE FROM cache WHERE ttl < ' . time());
				$sicacheData['sicache_purge'] = true;
			}
			// Don't serve a value past its expiry time.
			if ($data['ttl'] >= time())
				return $data['value'];
		}
	}
	return null;
}

function sicache_put($key, $value, $ttl)
{
	global $sicacheDB;

	if ($sicacheDB = new SQLiteDatabase('./cache/cache'))
	{
		if ($value === null)
			@$sicacheDB->queryExec('DELETE FROM cache WHERE key = \'' . sqlite_escape_string($key) . '\'');
		else
			// OR REPLACE, so storing an existing key updates it instead of failing on the primary key.
			$sicacheDB->queryExec('INSERT OR REPLACE INTO cache VALUES (\'' . sqlite_escape_string($key) . '\', \'' . sqlite_escape_string($value) . '\', ' . (int) (time() + $ttl) . ')');
	}
}
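On the auto-vacuum problem specifically, two SQLite 3 details may explain it (a hedged sketch; the path below is made up, and for the test it opens an in-memory database):

```php
<?php
// 1) PRAGMA auto_vacuum only takes effect on an empty database, i.e. it
//    must be set before the first table is created (or be followed by a
//    full VACUUM to rebuild the file);
// 2) auto_vacuum = 2 is INCREMENTAL mode, which only returns free pages
//    to the file system when you explicitly ask for it.
$db = new SQLite3(':memory:'); // a real cache would open e.g. a file under ./cache/

$db->exec('PRAGMA auto_vacuum = 2'); // while the database is still empty
$db->exec('CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT, ttl INT)');

// Later, after purging expired rows, hand the freed pages back explicitly:
$db->exec('DELETE FROM cache WHERE ttl < ' . time());
$db->exec('PRAGMA incremental_vacuum');
```

Setting the pragma after tables already exist, and never calling incremental_vacuum, would leave the file growing exactly as described.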
-
I guess I can post an update on my alternative solution: an SQLite3 cache.
After a while of tinkering and optimization, I'm sort of bummed. I was hoping to build a system that would surpass the file cache system; that's not the case, but it works as an alternative. While not surpassing the file cache, it's almost equal to it -- sometimes slower, sometimes faster, so about equal.
I got this idea from a Word Press cache that used either the popular memory cache systems, SQLite or file cache.
I don't know what else I can do; I'm totally bummed about the outcome. It was a simple script though, so I didn't sink too much time into it, but still... :(
-
The biggest delay in the file cache, really, is the physical I/O; everything else is going to be a wash, depending on whatever else is going on at the time. SQLite typically has more CPU overhead, but that's invariably dwarfed by the I/O cost.
It's certainly been an interesting journey and I'm sorry to hear that you weren't able to get somewhere really interesting, but this strikes me as something that isn't quite in the scope of SQLite, though I'm not sure what scope SQLite has on the server anyway. (For embedded databases or uses like the history in Chrome, I can understand it, but not on the server as a poor man's MySQL.)
-
I'm keeping it though, because it solves the problem with the cache keys -- just without any performance gain. It's disappointing that that's all I can get from it.
Here's the thing: the first cache hits start out slower than the file cache system. As the cache hits keep coming, SQLite gets faster and faster, until by the end it has surpassed the file cache system in speed.
It seems the initial connection is the hardest. My theory is that on first connection SQLite keeps loading the database -- a sort of read-ahead that pulls the database into memory while PHP is doing its work. If a page hasn't made it into memory yet, a file read is needed, which would explain the slow queries at the beginning. It's only a theory though; I can't find any documentation on how SQLite loads.
-
The initial connection being the hardest is not surprising, because it's doing a bit more than just loading; it's also rejigging the file, to a point. SQLite is pretty complicated.
I also don't think it's async behaviour; that would run counter to how PHP works (i.e. it's blocking rather than non-blocking)
-
Looking at the cache dir this would appear to be solved now.