Messages - Nao
5476
Archived fixes / Re: Buggy Feed links
« on April 15th, 2012, 05:32 PM »
Currently testing Wedge with cookies disabled... (Fun.)
Quote from Arantor on April 14th, 2012, 10:50 PM
PHPSESSID is invoked when there isn't an existing Wedge cookie (like the start of every browsing session from a search engine) and that first page view, it will always shove PHPSESSID links everywhere. Remove your cookie, refresh, boom you'll see them.
Yeah, I know that from countless forums visited where I got the phpsessid on first page and it was removed later. ;)

However, Wedge does it this way... First of all, it determines whether the user is a robot. If so, it skips adding PHPSESSID to links, which is why they never show up in our RSS feeds in Google Reader. So the reason why it broke in GR is that the URLs returned were not prettified.

0x, your screenshots would be more helpful if they showed what URL these entries linked to ;) (So, basically... All of them have ?topic=... in the URL, right?)
Quote
Quote
Maybe we could enforce loading a second page like login before logging in is accepted.
We actually do, as it happens. You cannot go into login2 if you don't have a valid session. But that's not really the problem.
I was thinking, maybe we could simply put the phpsessid on the login links... Instead of all links. Maybe that would help... i.e. if you're not going to log in, there's no reason you should have a persistent session (of course, that raises the problem of more open sessions, but it can be solved another way I suppose... We could even track people by IP...), while if you want to log in, you'll want your session followed just to give Wedge time to process the user name & password... By that time, the browser correctly reports a cookie anyway, so we won't need to add the phpsessid links by then.

Of course, there's still the problem of browsers that disable cookies entirely... i.e. if you're logged in and have cookies disabled like me right now, you absolutely need a phpsessid link in the URL.

Now, about phpsessid...
- Looking for it directly (in the code) is bad manners. Pretty URLs did that. I'm going to strip that code out -- in fact, it should be looking for SID, which is "PHPSESSID=abcdef" altogether. This also means doing a str_replace instead of a preg_replace on all links, because we know the session value. (Same for $session_var... No need for a preg_replace. I simply used $context['session_query'] instead. I knew that one would prove useful one day ;))
- I'm not sure why, but SMF and Wedge both add "&" at the end of the SID URL... Instead of simply using ";". I'm not sure SMF/Wedge would work *at all* if the installed PHP didn't support ";" as a separator.
- Also, SMF/Wedge use SID != '' in one place (ob_sessrewrite), and defined('SID') in others. I don't see how SID could possibly be, err, set to an empty string if it was defined... Or maybe it's always defined, in which case I should probably test against that when I get to re-enable cookies... Do you remember anything about that anyway?
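To illustrate the first point above, here is a minimal sketch in Python (not Wedge's PHP) of why a known session string makes the regex unnecessary. The function names, the regex pattern, and the idea that the goal is to strip the session pair from links are all assumptions for illustration; the actual Pretty URLs code is not shown in this thread.

```python
import re


def strip_session_regex(html: str) -> str:
    # Generic approach: guess at what a session ID looks like and match it.
    return re.sub(r"PHPSESSID=[0-9a-zA-Z,-]+(?:;|&amp;|&)?", "", html)


def strip_session_literal(html: str, sid: str) -> str:
    # `sid` is the full "name=value" pair for the current session, so a
    # plain substring replacement is enough. "&amp;" must be tried before
    # "&", otherwise a stray "amp;" would be left behind.
    for sep in (";", "&amp;", "&"):
        html = html.replace(sid + sep, "")
    return html.replace(sid, "")
```

Both functions give the same result on links carrying the current session, but the literal version never has to guess the shape of the ID and avoids the regex engine entirely.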
Quote
Quote
As for feeds, are we sure phpsessid is used in these? If yes we should certainly ensure it isn't included...
Yes, I am sure.
(Jean Dujardin voice) Not so fast, Mr. Bond!
Quote
It's not possible to avoid them, because it still goes through ob_sessrewrite to handle pretty URLs, so the usual logic applies - namely that if no cookie was found, PHPSESSID is injected. And given the approach made by Google Reader etc., they wouldn't have a cookie, so they get PHPSESSIDs, which is what we're seeing here (and I separately validated that PHPSESSID was added)
As pointed out above, SID isn't injected if $user_info['possibly_robot'] is true, which is always the case for Google Bot. (And probably Google Reader's bot as well.)
Posted: April 15th, 2012, 05:30 PM

Yup, looks like SID is always defined, and just empty when cookies are set...
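
The decision described in this post can be sketched as follows, in Python rather than PHP. It assumes only what is stated above: SID is always defined but empty when a cookie is present, and nothing is injected for a possible robot. The function names and the choice to append the pair at the end of the URL are hypothetical and do not reproduce ob_sessrewrite.

```python
def should_inject_sid(sid: str, possibly_robot: bool) -> bool:
    # An empty SID means the browser already sent the session cookie;
    # robots (feed readers, crawlers) never get session links.
    return sid != "" and not possibly_robot


def add_sid(url: str, sid: str, possibly_robot: bool) -> str:
    if not should_inject_sid(sid, possibly_robot):
        return url
    # Hypothetical placement: append the pair, using ";" as the separator
    # when the URL already has a query string.
    separator = ";" if "?" in url else "?"
    return url + separator + sid
```

With this shape, testing `sid != ""` covers both cases at once, so a separate "is SID defined" check adds nothing if the constant is always defined.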
5477
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 10:41 PM »
But technically would it be possible to forget phpsessid in the first page view?
Maybe we could enforce loading a second page like login before logging in is accepted.

As for feeds, are we sure phpsessid is used in these? If yes we should certainly ensure it isn't included...
5478
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 09:07 PM »
Wouldn't it work if we removed phpsessid for first visits (ip based), and restore it on the second visit if the cookie still isn't there..? (visit = pageview)
5479
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 09:06 PM »
And Google removes the phpsessid var from the URL iirc? Isn't it a default PHP name?
If it doesn't, we might want to look into whether it can remove anything.

Also my main issue with phpsessid is not with purls but with people posting links that contain it.
5480
Bug reports / Re: Mark post unread
« on April 14th, 2012, 07:25 PM »
Yeah, except in the MessageIndex pages and a few other places. Don't remind me of this... It hurts :P
5481
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 07:24 PM »
Yeah, I'm pro-removing sessid too. And sort of determining that if the client won't accept cookies, and it's not the first page they load, then give them a message saying to fuck off :P

Then again I'm not sure we won't be getting genuine feedback about horror stories that arose from that design choice ;)
5482
Bug reports / Re: Pretty URL remarks
« on April 14th, 2012, 07:22 PM »
So, shall I change the url to /do/something, even if it wastes a few extra milliseconds per page..?
5483
Bug reports / Re: Pretty URL remarks
« on April 14th, 2012, 05:30 PM »
Yep, yep, I fixed it a couple of minutes ago... (Feeds were ALSO broken for a few dozen minutes, sorry about that.)

It's because I modified PURLs to use references everywhere in filters (rather than just in the $urls to $url loop, also in the filter calling, to avoid passing and returning a 100+-entry array on every call). And I forgot to update redirectexit() to use the new system.

Performance is not noticeably better by doing that, but I made some tests by calling it 10k times and that code block is about twice as fast as before. So if you know you're going to modify an array before you pass it, *and* you have lots of entries, references are helpful for performance.

Anyway... So, I also rewrote the topic handling code to use a more efficient regex, in light of my recent discovery on look-arounds.
With all of my optimizations combined, pretty URLs are now about 250% faster than before, and can be as fast as 10 milliseconds for a full topic page with 15 posts! (Before that, it was 0.03s-0.04s on average...)
I don't know if I can optimize it even more, but I'll have a look.

Optimizing board and profiles isn't too exciting BTW -- it wouldn't save a lot.
Also, going through the URL array multiple times definitely doesn't hurt performance. Doing the main preg_replace twice isn't as great, though, but because I've optimized the regex for it, it shouldn't be as much of a problem as it used to be...
I'm pretty hopeful that Wedge's implementation of Pretty URLs is now going to be as fast as it can.

One final note... I rewrote the action code to allow for this kind of URL:

http://wedge.org/pub/bugs/7333/pretty-url-remarks/msg277572/do/something/?in=27

Instead of:

http://wedge.org/pub/bugs/7333/pretty-url-remarks/msg277572/?action=something;in=27

It doesn't seem to create any further issues, and it looks cool. My main concern is that in terms of performance, it's about 3 times slower than the current SVN code (the action loop I mean!), because in SVN the regex is only run if no match was found earlier, whereas with this change it'll be run every time. (Albeit an optimized regex with, again, the lookaround optimization.)
It's still very fast -- an average of one millisecond to run the function on all URLs -- but I'd like to get some opinions on the URL scheme for this.
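
For anyone weighing in on the scheme, the mapping between the two URLs above can be sketched like this. It is a Python illustration, not the Wedge filter: the regex, the function name, and the handling of the trailing slash are assumptions, and fragments (#...) are deliberately not handled.

```python
import re

# Matches "?action=NAME" optionally followed by ";rest" or "&rest".
ACTION_RE = re.compile(r"\?action=([^;&#]+)(?:[;&](.*))?$")


def prettify_action(url: str) -> str:
    match = ACTION_RE.search(url)
    if match is None:
        return url  # no action parameter: leave the URL untouched
    base = url[:match.start()]
    if not base.endswith("/"):
        base += "/"
    action, rest = match.group(1), match.group(2)
    pretty = f"{base}do/{action}/"
    # Remaining parameters stay in the query string.
    return f"{pretty}?{rest}" if rest else pretty
```

Because the pattern has to be tried on every URL, this is the step that now runs unconditionally, which is where the extra cost comes from.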
Posted: April 14th, 2012, 05:28 PM

:edit: Updated post intro.
5484
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 05:12 PM »
Ah, yes, I remember our talk about phpsessid... :)
5485
Bug reports / Re: Pretty URL remarks
« on April 14th, 2012, 03:59 PM »
For the callback. Which is already very fast ;)
Basically, with all my optimizations, purls is now about twice as fast overall as before. :) It's already a nice gain. But it's bound to bring some new bugs. Which I'll fix, of course, when they come up!
5486
Bug reports / Re: Pretty URL remarks
« on April 14th, 2012, 02:35 PM »
BTW, there was really a lot of crap in the PURL code. I simplified it a lot. For instance, it would systematically do a preg_replace for several strings when only one or two would need a regex... I replaced some stuff with rtrim(), some others with str_replace... And managed to speed up the code by 400% in these areas. :)
Overall, it's never been as fast as it is now.
5487
Archived fixes / Re: Buggy Feed links
« on April 14th, 2012, 02:34 PM »
There is nothing to fix.
As soon as Google retrieves the feed, it will try to find differences with the earlier feed. If anything changes, such as the URL, it will mark the item as new/unread.
Yesterday I made many tweaks to the pretty URL system, and as a result I broke feed links a few times. I think I fixed that no more than a couple of hours after it was broken, but it's enough for Google to mark something as new.

Oddly, though, I *thought* that with the system I'm using in Wedge (unique tags), it wouldn't bother with URL changes... That is one thing that really bothers me. But other than that -- I checked the current feed and it's all 'normal' to me...
Posted: April 14th, 2012, 02:31 PM

Note: a minor bug...
I was logged off when I posted my reply. It asked me to log in. I did my thing, and it said "Page not found" or something. Went back to the original page, clicked Submit again -- told me "Already submitted". Went back, copied my text, refreshed the page: it was NOT already submitted... So I just pasted the text and voilà.

So, two possible bugs:
- "already submitted" being said even when the post was actually refused...
- "page not found". I think it also does this in "View query" pages. Only admins can see that, though... (That bug may be due to my pretty URL tweaks though.)
5488
Bug reports / Re: Mark post unread
« on April 14th, 2012, 12:24 PM »
Spent a long time thinking the same.
Came to understand this week that I'm not crazy ;)
Too busy to double check with the code but that's certainly the behavior.
It was okay back when SMF used to mark all pages as read when page 1 was read. No longer is :)
5489
Bug reports / Re: Mark post unread
« on April 14th, 2012, 11:28 AM »
Just so we're clear -- if a topic was seen when it has new posts in it, hitting 'mark unread' will immediately mark THESE posts as unread. While if I for instance just visit the penultimate page of a read topic and hit Unread, it'll mark the entire topic as unread, instead of simply the last two pages. Which would make more sense... To me, 'mark unread' should basically say, 'come back here when I want to see this topic again'... Like a bookmark. At least that's how I use it...
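
The bookmark-style behavior asked for here could look roughly like the following Python sketch. Everything in it is hypothetical: the function name, the zero-based page index, the default of 15 posts per page, and the idea of returning "the last message that stays read" are assumptions for illustration, not how SMF or Wedge store read markers.

```python
def last_read_after_mark_unread(message_ids: list[int], page: int,
                                per_page: int = 15) -> int:
    """Return the ID of the last message that stays read, or 0 if none.

    `message_ids` holds the topic's message IDs in display order and
    `page` is the zero-based page on which 'mark unread' was clicked.
    """
    first_index = min(page * per_page, len(message_ids))
    if first_index <= 0:
        return 0  # clicked on page one: the whole topic becomes unread
    # Everything before the current page stays read.
    return message_ids[first_index - 1]
```

On a three-page topic, clicking on the penultimate page would keep page one read and mark only the last two pages as unread.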
5490
Bug reports / Mark post unread
« on April 14th, 2012, 10:05 AM »
Bug: when clicking mark unread in a topic, it should mark the page as unread. Not the whole topic.