Pretty URL remarks

Nao

  • Dadman with a boy
  • Posts: 16,079
Re: Pretty URL remarks
« Reply #15, on April 12th, 2012, 08:40 PM »
I'm really trying not to look into SlammedDime's mod. Perhaps out of pride, because I've had issues with the guy before. Also because I don't feel like reinventing something that works (even if it's a bit complicated...). I didn't write PrettyURLs either; I just know it well enough to fix its bugs as fast as needed.

Anyway, err... Apart from that, I don't really know where to start here. I just wanted to mention that with my 'simpler' regex, I can at least easily change ANY URL in the page, and that works for JavaScript too... Just look at action menus: usually they wouldn't use transformed URLs ;) (The only thing to remember is to allow for %a-z or something similar in any string where you're expecting digits, since in JS you'll usually get myvariable=%var% and you want to replace these without breaking them... It's different for topic IDs, though: you can't really replace those :P)
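A minimal Python sketch of that idea. It assumes a hypothetical rule that turns `index.php?msg=N` into a `msgN/` path segment, loosely modelled on the msg277572 URL quoted later in this thread and not taken from Wedge's filter code. The value group accepts either digits or a `%name%` placeholder, so a URL template inside a JavaScript string gets rewritten too. A digits-only pattern would leave that template untouched.

```python
import re

# Hypothetical rule: "index.php?msg=VALUE" -> "msgVALUE/".
# VALUE is either plain digits or a JS template placeholder such as %id%.
MSG_URL = re.compile(r'index\.php\?msg=(\d+|%[a-z]+%)')
# Digits-only variant, for comparison: it skips the JS template entirely.
MSG_URL_DIGITS_ONLY = re.compile(r'index\.php\?msg=(\d+)')


def prettify(html: str, pattern: re.Pattern = MSG_URL) -> str:
    """Rewrite every matching URL in the page, <script> blocks included."""
    return pattern.sub(lambda m: 'msg' + m.group(1) + '/', html)


page = (
    '<a href="http://example.com/index.php?msg=42">Post</a>'
    '<script>var u = "http://example.com/index.php?msg=%id%";</script>'
)
print(prettify(page))                       # both URLs rewritten
print(prettify(page, MSG_URL_DIGITS_ONLY))  # the JS template is left as-is
```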

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Pretty URL remarks
« Reply #16, on April 12th, 2012, 09:17 PM »
I can fully understand that, but on the flip side, it does work exactly how you suggest.

The only downside to checking for any URL-like structure is that you risk changing things that shouldn't be changed, e.g. the URL that controls the page index injection for the "..." expansion (a known bug in SMF implementations).
When we unite against a common enemy that attacks our ethos, it nurtures group solidarity. Trolls are sensational, yes, but we keep everyone honest. | Game Memorial

Nao

Re: Pretty URL remarks
« Reply #17, on April 12th, 2012, 09:44 PM »
Nope, even that works perfectly... :)
Posted: April 12th, 2012, 09:40 PM

Always has, BTW... I think! (IIRC it's always been transformed even when scripts were disabled.)

Arantor

Re: Pretty URL remarks
« Reply #18, on April 12th, 2012, 10:40 PM »
That's the thing: it normally gets broken because it contains a partial URL that can't necessarily be prettified.

Nao

Re: Pretty URL remarks
« Reply #19, on April 12th, 2012, 11:17 PM »
How so?

BTW, I was mistaken... I had inverted my benchmark results. The SVN version is actually a bit faster. I need to dig into this.

Arantor

Re: Pretty URL remarks
« Reply #20, on April 13th, 2012, 12:06 AM »
I can't remember whether the code was changed or not; it's been so long since I looked. But in SMF's code at least, the page index included a URL with something like %1$d in it as a placeholder for the page or start number, which caused problems in SimpleSEF. I also know that there were problems with Google at one point: it scraped content for URLs, saw %1$d-type content and tried to visit it, only to be rebuffed.

Nao

Re: Pretty URL remarks
« Reply #21, on April 13th, 2012, 09:12 AM »
- Google: well, it depends on whether they also visit URLs that aren't in actual links... In our case it's an <a data-href>, so it shouldn't be visited. But we know Google is too curious for its own good... Anyway, it will load the page, which points to the correct canonical URL. I can live with that...
- SimpleSEF: if it only matches against \d+, then yes, I guess it can't find it. Wedge matches against more, precisely so that %1$d is taken into account.
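To show how a digits-only pattern can split such a URL, here is a Python sketch. It assumes an SMF-style page-index template ending in `topic=ID.%1$d` and a hypothetical `topic/ID/pSTART/` pretty scheme, which is not Wedge's real one. The naive pattern still matches the `topic=ID` part but stops at the placeholder, so `.%1$d` is left dangling. The extended pattern also accepts a sprintf-style `%N$d` placeholder, so filling in the page offset afterwards still gives a clean URL.

```python
import re

# Naive: the optional ".start" part must be digits.
NAIVE = re.compile(r'index\.php\?topic=(\d+)(?:\.(\d+))?')
# Extended: the start offset may also be a placeholder such as %1$d.
EXTENDED = re.compile(r'index\.php\?topic=(\d+)(?:\.(\d+|%\d+\$d))?')


def to_pretty(m: re.Match) -> str:
    """Build the hypothetical 'topic/ID/pSTART/' form from a match."""
    topic, start = m.group(1), m.group(2)
    return 'topic/' + topic + '/' + ('p' + start + '/' if start else '')


template = 'http://example.com/index.php?topic=7333.%1$d'
naive = NAIVE.sub(to_pretty, template)        # leaves ".%1$d" dangling
extended = EXTENDED.sub(to_pretty, template)  # placeholder kept inside the pretty URL

# The page index later substitutes the real offset into the template.
print(naive.replace('%1$d', '15'))     # broken URL
print(extended.replace('%1$d', '15'))  # valid pretty URL
```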

Arantor

Re: Pretty URL remarks
« Reply #22, on April 13th, 2012, 12:12 PM »
Certainly at one time it was matching against anything that looked like a URL, even outside <a> tags, which is why some URLs ended up split.

I don't know what SimpleSEF did in the end; I just know that %1$d, or whatever else was in the page index, was problematic. But if the PURLs implementation converts it correctly and everything still works as it should, great.

Nao

Re: Pretty URL remarks
« Reply #24, on April 14th, 2012, 02:35 PM »
BTW, there was really a lot of crap in the PURL code. I simplified it a lot. For instance, it would systematically run a preg_replace on several strings when only one or two actually needed a regex... I replaced some of these with rtrim() and others with str_replace()... and managed to speed up the code in these areas by 400%. :)
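The same kind of simplification, sketched in Python, where `str.rstrip()` and `str.replace()` play the roles of PHP's rtrim() and str_replace(). The sample strings are made up, and the timings printed depend on the machine. The point is only that the plain string calls give the same results as the regex versions without compiling or running a pattern.

```python
import re
import timeit

samples = ['http://example.com/board/', 'http://example.com/board///',
           'a=1&amp;b=2&amp;c=3', 'no-change']


def trim_slashes_regex(s: str) -> str:
    # Regex version: strip trailing slashes (\Z = absolute end of string).
    return re.sub(r'/+\Z', '', s)


def trim_slashes_plain(s: str) -> str:
    # Plain version, the analogue of PHP's rtrim($s, '/').
    return s.rstrip('/')


def unescape_amp_regex(s: str) -> str:
    return re.sub(r'&amp;', '&', s)


def unescape_amp_plain(s: str) -> str:
    # Analogue of str_replace('&amp;', '&', $s): a fixed string needs no pattern.
    return s.replace('&amp;', '&')


for name, fn in [('trim (regex)', trim_slashes_regex), ('trim (rstrip)', trim_slashes_plain),
                 ('amp (regex)', unescape_amp_regex), ('amp (replace)', unescape_amp_plain)]:
    seconds = timeit.timeit(lambda: [fn(s) for s in samples], number=10000)
    print(f'{name}: {seconds:.3f}s')
```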
Overall, it's never been as fast as it is now.

MultiformeIngegno

  • Posts: 1,337
Re: Pretty URL remarks
« Reply #25, on April 14th, 2012, 03:16 PM »
The wizard is in action!!! 400% faster.. :o

Nao

Re: Pretty URL remarks
« Reply #26, on April 14th, 2012, 03:59 PM »
For the callback, which was already very fast ;)
Basically, with all my optimizations, PURLs is now about twice as fast overall as before. :) It's already a nice gain. But it's bound to introduce some new bugs, which I'll fix, of course, as they come up!

Arantor

Re: Pretty URL remarks
« Reply #27, on April 14th, 2012, 05:06 PM »
And it's currently broken in redirectexit() when trying to redirect back to a topic, since it redirects to ?topic=x, which causes a redirect loop... Return-to-topic is currently broken for me: it saves, but then fails to load the page it returns to (the topic).

Nao

Re: Pretty URL remarks
« Reply #28, on April 14th, 2012, 05:30 PM »
Yep, yep, I fixed it a couple of minutes ago... (Feeds were ALSO broken for a few dozen minutes, sorry about that.)

It's because I modified PURLs to use references everywhere in filters (not just in the $urls-to-$url loop, but also when calling the filters, to avoid passing and returning a 100+-entry array on every call). And I forgot to update redirectexit() to use the new system.

Overall performance isn't noticeably better, but I ran some tests calling it 10k times, and that code block is about twice as fast as before. So if you know you're going to modify an array you pass, *and* it has lots of entries, references help performance.
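Python has no direct equivalent of PHP's `&` references or copy-on-write arrays, because a list argument always refers to the caller's object. This sketch is therefore only a rough analogue of the two calling conventions, using hypothetical filter names and URL entries. The last lines show how a caller still written for the old "return the array" convention loses its data when the filter switches to in-place edits. That is similar in spirit to what happened to redirectexit().

```python
from typing import Dict, List

Url = Dict[str, str]


def filter_topics_copy(urls: List[Url]) -> List[Url]:
    """Old convention (hypothetical): build a modified copy and return it."""
    out = []
    for u in urls:
        u = dict(u)  # copy each entry before changing it
        u['replacement'] = u['url'].replace('index.php?topic=', 'topic/')
        out.append(u)
    return out


def filter_topics_in_place(urls: List[Url]) -> None:
    """New convention: edit the caller's list directly, like passing &$urls in PHP."""
    for u in urls:
        u['replacement'] = u['url'].replace('index.php?topic=', 'topic/')


urls = [{'url': 'http://example.com/index.php?topic=7333'}]
filter_topics_in_place(urls)    # correct caller: no reassignment
print(urls[0]['replacement'])   # http://example.com/topic/7333

stale = [{'url': 'http://example.com/index.php?topic=7333'}]
stale = filter_topics_in_place(stale)  # caller still written for the old convention
print(stale)                           # None: the URL list is lost
```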

Anyway... So, I also rewrote the topic handling code to use a more efficient regex, in light of my recent discovery about look-arounds.
With all of my optimizations combined, pretty URLs are now about 250% faster than before, and can take as little as 10 milliseconds for a full topic page with 15 posts! (Before that, it was 0.03s-0.04s on average...)
I don't know if I can optimize it even more, but I'll have a look.

Optimizing boards and profiles isn't too exciting, BTW; it wouldn't save much.
Also, going through the URL array multiple times definitely doesn't hurt performance. Running the main preg_replace twice isn't as great, but since I've optimized its regex, it shouldn't be as much of a problem as it used to be...
I'm pretty hopeful that Wedge's implementation of Pretty URLs is now as fast as it can be.

One final note... I rewrote the action code to allow for this kind of URL:

http://wedge.org/pub/bugs/7333/pretty-url-remarks/msg277572/do/something/?in=27

Instead of:

http://wedge.org/pub/bugs/7333/pretty-url-remarks/msg277572/?action=something;in=27

It doesn't seem to create any further issues, and it looks cool. My main concern is performance: the action loop is about 3 times slower than before, because in the SVN version the regex is only run if no match was found earlier, whereas with this change it runs every time. (Albeit an optimized regex with, again, the look-around optimization.)
It's still very fast (an average of one millisecond to run the function on all URLs), but I'd like to get some opinions on this URL scheme.
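A minimal Python sketch of the rewrite described above. It assumes `;`-separated query parameters, as in the example URLs. `prettify_action()` is a hypothetical helper that uses plain string handling rather than the regex the post mentions. It only rewrites URLs whose path already ends in `/`, and leaves everything else unchanged.

```python
def prettify_action(url: str) -> str:
    """Move ?action=NAME into a do/NAME/ path segment (hypothetical sketch)."""
    base, sep, query = url.partition('?')
    # Only handle URLs with a query string and a directory-style path.
    if not sep or not base.endswith('/'):
        return url
    action = None
    rest = []
    for param in query.split(';'):
        if not param:
            continue  # ignore empty pieces such as a trailing ';'
        if action is None and param.startswith('action='):
            action = param[len('action='):]
        else:
            rest.append(param)
    if not action:
        return url
    pretty = base + 'do/' + action + '/'
    if rest:
        pretty += '?' + ';'.join(rest)  # remaining parameters stay in the query
    return pretty


print(prettify_action(
    'http://wedge.org/pub/bugs/7333/pretty-url-remarks/msg277572/?action=something;in=27'))
# http://wedge.org/pub/bugs/7333/pretty-url-remarks/msg277572/do/something/?in=27
```

Only the first `action=` parameter is moved into the path. Any other parameters stay in the query string in their original order.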
Posted: April 14th, 2012, 05:28 PM

:edit: Updated post intro.

Arantor

Re: Pretty URL remarks
« Reply #29, on April 14th, 2012, 05:31 PM »
I like what you've done with it, it's pretty approachable and friendly in the scheme of things :)