Messages - Arantor
3751
Plugins / Re: Light URL Plugin Maybe?
« on April 16th, 2012, 11:48 AM »
Yay another URL shortener.

If you're planning to offer something whereby sites can generate their own shortened URLs that don't rely on l-url.com (so that site.com makes site.com/go/hash automatically), you're going to have *SO* much fun with all the mutant configurations out there that won't support your routing scheme properly, especially given how hard it is to actually set up site.com/go/hash instead of site.com/go.php?hash...
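For illustration, here is why the pretty form is the harder one to support. On most server setups, site.com/go/hash only reaches PHP if a rewrite rule routes it there, while go.php?hash works anywhere. The PHP sketch below accepts both forms. The function name `extract_short_hash` and the alphanumeric hash format are assumptions, not part of any existing plugin.

```php
<?php
// Hypothetical helper: pull a short-URL hash out of either
// /go/<hash> (needs a rewrite rule) or /go.php?<hash> (needs no server config).
function extract_short_hash(string $requestUri): ?string
{
    // Cast so a missing component (null) becomes an empty string.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    $query = (string) parse_url($requestUri, PHP_URL_QUERY);

    // Pretty form: /go/abc123, which only arrives here if the server rewrites it.
    if (preg_match('~/go/([A-Za-z0-9]+)/?$~', $path, $m)) {
        return $m[1];
    }

    // Fallback form: /go.php?abc123, which works on any configuration.
    if (preg_match('~/go\.php$~', $path) && preg_match('~^[A-Za-z0-9]+$~', $query)) {
        return $query;
    }

    return null;
}

echo extract_short_hash('/go/abc123'), "\n";     // abc123
echo extract_short_hash('/go.php?abc123'), "\n"; // abc123
```

The parsing is the easy part. The hard part is the rewrite rule that makes /go/hash reach the script at all, and that depends on each host's configuration.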
3752
Quote
Bickering about a bug I reported to SMF, lol.
Oh, there's no argument that there's a bug. The question is how best to fix it, given that caching should be as fast as possible without compromising functionality.
Quote
So far all I know about it is that it affects file caching for sure; as for any other caching methods, I am not sure.
As far as I know, memcache(d) and APC don't care: neither uses file-based storage in any form, so it doesn't matter for them.
Quote
However one could argue it is up to the developer to send safe keys to the cache but I am sure some will not follow.
In the case of SMF+Aeva, that's a valid proposition, because it's essentially a non-compliant mod rather than a core feature. In Wedge's case, though, it's a core feature that we have to contend with, so it would be better to fix it in the cache layer rather than in each cache caller.
Quote
My test results (I added a few on there): it looks like strtr by itself would be a lot better.
Except that we can't use it, because of the inherent risk of collision. If you strtr the / in path/file.ext to _ (or indeed any other character, but we'll use _ for the sake of argument), path/file.ext and path_file.ext both resolve to the same cache key, and since the root name used is the album name, users have some control over this. I'm not saying it's a direct vulnerability, but anything a user can (relatively) trivially manipulate in this fashion is a bad idea.

On the other hand, MD5 is even faster than strtr, and while it does carry a collision risk, I'd argue that a collision is far less likely than with strtr, especially as the user can't steer it as easily.
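The collision above is easy to reproduce. Here is a minimal PHP sketch. The two helper names are hypothetical stand-ins for whatever key sanitising the cache layer actually does.

```php
<?php
// Naive sanitiser: flatten path separators so the key is usable as a filename.
function key_via_strtr(string $key): string
{
    return strtr($key, '/', '_');
}

// Hashed key: always 32 hex characters, safe for any file-based cache.
function key_via_md5(string $key): string
{
    return md5($key);
}

$a = 'path/file.ext';
$b = 'path_file.ext';

// Two different inputs collapse to the same strtr key...
var_dump(key_via_strtr($a) === key_via_strtr($b)); // bool(true)
// ...but get distinct MD5 keys.
var_dump(key_via_md5($a) === key_via_md5($b));     // bool(false)
```

MD5 collisions are possible in principle, but a user can't produce one just by picking an album name. The strtr collision needs nothing more than an underscore.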
3753
Archived fixes / Re: Buggy Feed links
« on April 16th, 2012, 12:06 AM »
Quote
I would be tempted to say that we should remove that compatibility code...
Then it WILL break. PHP's own default separator is &, as per http://php.net/manual/en/ini.core.php#ini.arg-separator.input, and the choice to use ; avoids all kinds of nightmares. See QueryString.php, line 81 onwards:

Code:
// Are we going to need to parse the ; out?
if (strpos(@ini_get('arg_separator.input'), ';') === false && !empty($_SERVER['QUERY_STRING']))
{
	// Get rid of the old one! You don't know where it's been!
	$_GET = array();

	// Was this redirected? If so, get the REDIRECT_QUERY_STRING.
	$_SERVER['QUERY_STRING'] = urldecode(substr($_SERVER['QUERY_STRING'], 0, 5) === 'url=/' ? $_SERVER['REDIRECT_QUERY_STRING'] : $_SERVER['QUERY_STRING']);

	// Replace ';' with '&' and '&something&' with '&something=&'. (This is done for compatibility...)
	parse_str(preg_replace('/&(\w+)(?=&|$)/', '&$1=', strtr($_SERVER['QUERY_STRING'], array(';?' => '&', ';' => '&', '%00' => '', "\0" => ''))), $_GET);

	// Magic quotes still applies with parse_str - so clean it up.
	if (function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc() != 0 && empty($settings['integrate_magic_quotes']))
		$_GET = $removeMagicQuoteFunction($_GET);
}
elseif (strpos(@ini_get('arg_separator.input'), ';') !== false)
{
	if (function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc() != 0 && empty($settings['integrate_magic_quotes']))
		$_GET = $removeMagicQuoteFunction($_GET);

	// Search engines will send action=profile%3Bu=1, which confuses PHP.
	foreach ($_GET as $k => $v)
	{
		if (is_string($v) && strpos($k, ';') !== false)
		{
			$temp = explode(';', $v);
			$_GET[$k] = $temp[0];

			for ($i = 1, $n = count($temp); $i < $n; $i++)
			{
				@list ($key, $val) = @explode('=', $temp[$i], 2);
				if (!isset($_GET[$key]))
					$_GET[$key] = $val;
			}
		}

		// This helps a lot with integration!
		if (strpos($k, '?') === 0)
		{
			$_GET[substr($k, 1)] = $v;
			unset($_GET[$k]);
		}
	}
}

So we have to leave that code in, unless you plan on changing every URL to stop using ; (with all the other problems that brings). SMF and Wedge using ; is definitely an oddity, though it does solve a lot of problems.
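Without the magic-quotes handling and the REDIRECT_QUERY_STRING check, the compatibility branch comes down to three steps. This is a simplified sketch for illustration. `parse_semicolon_query` is a hypothetical name, not a function in QueryString.php.

```php
<?php
// Simplified sketch of the compatibility branch: treat ';' as an argument
// separator even when arg_separator.input only contains '&'.
function parse_semicolon_query(string $query): array
{
    // Decode first, so a search engine's action=profile%3Bu=1 gets split too.
    $query = urldecode($query);

    // ';?' and ';' both become '&'; NUL bytes are dropped.
    $query = strtr($query, [';?' => '&', ';' => '&', '%00' => '', "\0" => '']);

    // Give bare flags such as '&all' an explicit '=' before parsing.
    $query = preg_replace('/&(\w+)(?=&|$)/', '&$1=', $query);

    $result = [];
    parse_str($query, $result);
    return $result;
}
```

With this, `topic=12.0;all` yields `topic` and an empty `all` flag, and `action=profile%3Bu=1` yields `action=profile` plus `u=1`, whatever arg_separator.input is set to.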
Quote
And we should have dealt with that long ago. Seriously, I'm surprised we had the problem at all, because if you look at noisen.com, the phpsessid is never injected *at all* into feeds...
I can think of multiple reasons why it might differ between the two. Either way, the bottom line is that it wasn't dealt with, and while I have patched around it, I'm not happy with the patch (which is why I haven't committed it yet).
Quote
Added a context variable in Feed.php to say we don't want to insert the session ID. It's tested in Subs-Template.php. The reason I took the lazy route is that (1) just testing for Feedfetcher-Google (or whatever it's called) isn't going to do any good for other feed reader bots, and (2) there is VERY little reason to have a session ID in a feed URL.
Oh, you've added it, then, much as I had. And yes, exactly: there is no reason to have it in the URL, and testing for bots isn't enough because of the sheer variety of feed consumers.
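Here is a rough sketch of the decision being discussed. It assumes the flag is read from `$context['no_phpsessid']`, the name used elsewhere in this thread. The function name and the two booleans, which stand in for the real cookie check and `possibly_robot`, are assumptions. This is not the actual Subs-Template.php code.

```php
<?php
// Hypothetical sketch, not Wedge's actual Subs-Template.php logic.
// The booleans stand in for the real cookie detection and possibly_robot.
function should_inject_sid(array $context, bool $hasSessionCookie, bool $possiblyRobot): bool
{
    // Feed.php sets this flag, so feed links never carry a session ID,
    // whatever the user agent looks like.
    if (!empty($context['no_phpsessid'])) {
        return false;
    }

    // Recognised bots are skipped, but many feed readers won't trip this.
    if ($possiblyRobot) {
        return false;
    }

    // Otherwise the ID is only needed when no session cookie came back.
    return !$hasSessionCookie;
}
```

The flag is checked first so a feed never depends on bot detection, which is the point made above about the variety of feed consumers.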

Note that SMF and Wedge inject an SID containing &, and I have no idea why ; wasn't used there.
Quote
If you have cookies disabled, then your session won't be active forever. Your feed reader will soon end up trying to access an incorrect session ID anyway.
Sure, that's the case. Hence feeds should never contain a session ID. That is also why I was so adamant about fixing the canonical URL link, which had the SID added to it as well.

The whole question of using the SID in URLs is an interesting one, and one I've been unwilling to move on. I'm inclined to think that having 'probably accurate' stats on the number of guests is slightly more important than the SEO benefit of dropping it (though having a canonical URL should fix most of those issues).

If possibly_robot were more thorough, we could be happier about leaving it in. On the other hand, note that disabling cookies will break other functionality anyway. It's a tough one to call :/
3754
The Pub / Re: The Cookie Law (in the UK at least)
« on April 15th, 2012, 11:11 PM »
It might not, but there is always the possibility that it *does*.
3755
The Pub / Re: The Cookie Law (in the UK at least)
« on April 15th, 2012, 10:09 PM »
Quote
It would seem that site owners may be responsible and have to obtain specific opt-ins before allowing their software to invite third-party cookies. But, as I said, ICO isn't giving any clear guidance on this (that satisfies lawyers).
Have you been to the ICO's site? Their opt-in is a very big list naming every cookie they use (and there are quite a few), but the opt-in covers all cookies rather than working per cookie. Opting in to the important cookies therefore also opts you in, by proxy, to all the others, which is a very dubious state of affairs.
3756
Archived fixes / Re: Buggy Feed links
« on April 15th, 2012, 05:58 PM »
Quote
However, Wedge does it this way... First of all, it determines whether the user is a robot. If it is, it will skip adding PHPSESSID to links.
Except that not all feed readers are detected - and IIRC, Google Reader does not trip that test.
Quote
So the reason why it broke in GR is that the URLs returned were not prettified.
So not being prettified will make every topic show up as a new topic (because of a unique URL) every time it looks? Even here, where URLs were being prettified, PHPSESSID was being injected - I specifically checked this when it was reported.
Quote
Of course, there's still the problem of browsers that disable cookies entirely... i.e. if you're logged in and have cookies disabled like me right now, you absolutely need a phpsessid link in the URL.
There is a (valid) argument for rejecting those cases and disallowing it entirely, for security reasons.
Quote
I'm not sure why, but SMF and Wedge both add "&" at the end of the SID URL... Instead of simply using ";". I'm not sure SMF/Wedge would work *at all* if the installed PHP didn't support ";" as a separator.
Because & is the argument separator defined in PHP's configuration. Check the code in QueryString.php: if ; is in arg_separator.input, it does one thing; otherwise, it parses out the parameters manually. ; is just not the default. And, just for fun, we occasionally get cases of malformed entities being prepared too.
Quote
(Jean Dujardin voice) Not so fast, Mr. Bond!
Quote
As pointed out above, SID isn't injected if $user_info['possibly_robot'] is true, which is always the case for Google Bot. (And probably Google Reader's bot as well.)
Seriously, it IS injecting PHPSESSID. It does it on the RSS validator. I have no idea what user agent Google Reader uses, but I'm willing to bet it doesn't trip possibly_robot. It's actually irrelevant in this case. Whether or not it trips possibly_robot, it should NEVER be issuing PHPSESSID in feeds, ever, because some other feed readers will choke on non-unique URLs. I know Thunderbird used to have issues with it, for example (because it didn't bother with cookies at one time).
3757
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 10:50 PM »
Quote
But technically would it be possible to forget phpsessid in the first page view?
PHPSESSID is used when there isn't an existing Wedge cookie (as at the start of every browsing session from a search engine), and on that first page view it will always shove PHPSESSID into links everywhere. Remove your cookie, refresh, and boom, you'll see them.
Quote
Maybe we could enforce loading a second page like login before logging in is accepted.
We actually do, as it happens. You cannot go into login2 if you don't have a valid session. But that's not really the problem.

What we have to weigh up is accuracy of reporting vs. PHPSESSID in URLs. Specifically, it's about tracking how many 'probably unique' guests there are from the requests being made, since cookies are a friggin' bolt-on to the specification in the first place (HTTP is specifically designed to be stateless).
Quote
As for feeds, are we sure phpsessid is used in these? If yes we should certainly ensure it isn't included...
Yes, I am sure. It's not possible to avoid them, because the feed still goes through ob_sessrewrite to handle pretty URLs, so the usual logic applies: if no cookie was found, PHPSESSID is injected. And given how Google Reader etc. fetch feeds, they won't have a cookie, so they get PHPSESSIDs, which is what we're seeing here (and I separately verified that PHPSESSID was added).
3758
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 09:17 PM »
Re Google... for *web searches*, yes it does, if you have Google Webmaster Tools and tell it not to do so. It doesn't do so automatically. And for other services, like the feed reader that started this thread, no, it doesn't. That's what causes it to repeatedly read in topics: the URL changes.

The problem with that solution is that it's still not that reliable, especially for those who would actually trip it; we'd be better off just accepting it when it's wrong.
3759
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 08:56 PM »
Well, it's a bit more complicated than that, because cookies are used to handle sessions even for guests. Where it gets problematic is tracking the number of unique guests: without proper session support that just won't work properly, and PHPSESSID is only ever sent when there isn't a cookie, which is exactly where search engines rely on it.

It isn't just about not having cookie support; it's also about when there simply hasn't been a cookie yet, e.g. the very first visit. Search engines typically arrive as 'new sessions', so you could very easily go from '2 or 3' Google visits at a time to dozens, because it can't properly handle the session.

That's why I haven't changed it: I have a nasty feeling it would break the 'number of guests online at present' count.
3760
Bug reports / Re: Pretty URL remarks
« on April 14th, 2012, 08:51 PM »
As long as it works, including for actions added to the list through hooks (e.g. by plugins), changing to /do/something would be neat.
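Here is a minimal sketch of what /do/something routing might look like. It assumes the action list is a plain array that plugin hooks can add entries to. `resolve_action` and the array format are hypothetical, not Wedge's actual router.

```php
<?php
// Hypothetical sketch: map /do/<action> onto a known action, where the list
// of actions can be extended by plugins through a hook.
function resolve_action(string $path, array $actions): ?string
{
    // Only paths of the form /do/<name> (optional trailing slash) qualify.
    if (!preg_match('~^/do/([a-z0-9_-]+)/?$~i', $path, $m)) {
        return null;
    }

    $action = strtolower($m[1]);

    // Unknown actions fall through so the caller can show an error page.
    return isset($actions[$action]) ? $action : null;
}

// Core actions, keyed by name (handler names are placeholders).
$actions = ['login' => 'Login', 'profile' => 'Profile'];

// A plugin hook would append its own entries to the same list.
$actions += ['gallery' => 'Gallery'];
```

Because the lookup runs against whatever list exists at request time, plugin actions get the same /do/ treatment as core ones. That is the condition stated above.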
3761
Bug reports / Re: Pretty URL remarks
« on April 14th, 2012, 05:31 PM »
I like what you've done with it, it's pretty approachable and friendly in the scheme of things :)
3762
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 05:16 PM »
Oh, there are several things wrong with PHPSESSID, but I'm beginning to think the magic injection into every link on the page is actually more of a hindrance than a help. Certainly it screws up a lot of search engine stuff (including SEO), and feeds are no exception.

I have thought about dropping it entirely and relying only on cookies to handle sessions. That would confuse the tracking of max users for clients that can't properly handle cookies (like some search engines and paranoid guests), though, so I'm not sure yet.
3763
Bug reports / Re: Pretty URL remarks
« on April 14th, 2012, 05:06 PM »
And it's currently broken in redirectexit() when trying to redirect back to a topic, since it redirects to ?topic=x, which causes a redirect loop... Return-to-topic is currently broken for me: it saves, but then fails to load the page it returns to (the topic).
3764
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 05:04 PM »
Yes there is something to fix.

Every single time Google reads it, the feed will have PHPSESSID links in it, so each time it refetches the feed it sees different items (because every fetch gets a different session ID). I've already patched it so there's a new parameter, $context['no_phpsessid']. If it's empty, PHPSESSID=whatever is injected into the URLs where appropriate; if it's non-empty, injection is forcibly disabled. There is absolutely no reason to send PHPSESSIDs in feed items, and several reasons (like this one) to expressly not do so.
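The PHP model below shows why a changing session ID makes a reader treat old topics as new. It models a reader that de-duplicates items by link. Everything in it (the helper names, the example.com URLs, the topic IDs) is made up for illustration, and real feed readers may key items on GUIDs or other fields instead.

```php
<?php
// Hypothetical model of a feed reader that de-duplicates items by link.
function new_items(array $seenLinks, array $fetchedLinks): array
{
    return array_values(array_diff($fetchedLinks, $seenLinks));
}

// Builds item links for one fetch; $sessionId is null when no ID is injected.
function feed_links(array $topicIds, ?string $sessionId): array
{
    $links = [];
    foreach ($topicIds as $id) {
        $url = 'https://example.com/index.php?topic=' . $id . '.0';
        if ($sessionId !== null) {
            $url .= ';PHPSESSID=' . $sessionId;
        }
        $links[] = $url;
    }
    return $links;
}

$topics = [101, 102];

// Each fetch starts a new session, so every link changes...
echo count(new_items(feed_links($topics, 'aaa111'), feed_links($topics, 'bbb222'))), "\n"; // 2

// ...whereas links without the session ID stay stable.
echo count(new_items(feed_links($topics, null), feed_links($topics, null))), "\n"; // 0
```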

There is still a bug with the XML not passing validation, though that won't cause this behaviour.
3765
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 02:00 PM »
OK, so I actually fixed the bigger issue quite easily, but now I need to go away and research the specification.