Wedge

Public area => The Pub => Bug reports => Topic started by: Nao on April 10th, 2012, 03:28 PM

Title: Pretty URL remarks
Post by: Nao on April 10th, 2012, 03:28 PM
So... Support for pretty URLs is pretty solid, but I still had an issue with Wedge trying to re-parse URLs that were already parsed, and sometimes breaking them as a result.

This turned out to be due to the way I implemented my 'index.php' removal code (which, BTW, is an optional feature). Because $scripturl already has the filename stripped, the pretty URL regex ends up matching any URL that starts with $boardurl, and transforms it.
I fixed it by instead doing the transform *after* pretty URLs are handled. (Currently it's done within the PURL code block but I'm not sure this should be associated with them -- one might want to just get rid of index.php without systematically transforming URLs...)

This introduces a new problem. Because it's now done through a basic str_replace, I have no way to 'control' how index.php is removed from the page -- it could be within a non-linked URL, for instance, or anywhere else.

My belief is that if you want to remove index.php from your URLs, you'll want to remove it from EVERY place on the page, even where it isn't meant to be transformed. That makes the technique I just devised a no-brainer.
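Here is a minimal sketch of that post-processing step. It assumes $boardurl has no trailing slash and $scripturl is $boardurl . '/index.php', and the function name is hypothetical. Because it is a plain str_replace over the whole buffer, it also rewrites URLs outside links, which is exactly the trade-off described above.

```php
<?php
// Hypothetical helper: run on the final page buffer, *after* pretty URLs
// have been applied, so the PURL regex never sees the shortened URLs.
function strip_index_php(string $buffer, string $boardurl): string
{
    $scripturl = $boardurl . '/index.php';
    // Blanket replacement: links, plain text and inline JS are all affected.
    return str_replace($scripturl, $boardurl . '/', $buffer);
}
```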

I'd like some opinions on this.
Title: Re: Pretty URL remarks
Post by: Arantor on April 10th, 2012, 03:37 PM
How about, then, we remove all instances of $scripturl from anywhere in the code and only ever use <URL> internally, so when *that's* replaced, we fix it there once and only once?
Title: Re: Pretty URL remarks
Post by: Nao on April 10th, 2012, 03:41 PM
Actually thought of doing it that way first, but then I figured, "argh, it's gonna get dirty"... ;)

But certainly it would be... Interesting to actually *remove* $scripturl entirely from the global list!
There are still 2000+ occurrences of it in the code, and we'd have to make sure ob_sessrewrite is always called... (e.g. Xml and SSI... But it's pretty much the same issue as with PURLs to begin with!)
Posted: April 10th, 2012, 03:40 PM

Also, if someone decides to add a link to the forum and adds index.php manually (or just gets it through an old Google link etc), then these might not get replaced in the final code...
Title: Re: Pretty URL remarks
Post by: Arantor on April 10th, 2012, 04:12 PM
Hmm, I see what you mean. But it's still got canonical URLs, it's still got everything else it needs to have, so all that it means is that it's an extra variation of URL that's inbound - it's not a killer in any real sense.

I have no issue with manually calling ob_sessrewrite.
Title: Re: Pretty URL remarks
Post by: Nao on April 10th, 2012, 11:16 PM
Does anyone else have something to add? Pretty please? I'm not sure what's best here.
Title: Re: Pretty URL remarks
Post by: Nao on April 10th, 2012, 11:19 PM
An alternative would be to do it the way the SVN version currently does, except that we add [?#] directly after $preg_replace. That's the first thing I tried, but the regex will still tentatively match every URL, only giving up on one when no question mark follows.
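The following is a minimal sketch of that variant. It assumes the pattern is built from $boardurl, and that a forum URL is the board URL, optionally followed by index.php, then immediately by '?' or '#'. The function name is hypothetical. Resource URLs under the same root, such as theme images, are skipped, although PCRE still has to start a match at each of them before rejecting it.

```php
<?php
// Hypothetical finder: only URLs where the board URL (plus optional
// "index.php") is directly followed by "?" or "#" count as forum URLs.
function find_forum_urls(string $buffer, string $boardurl): array
{
    $pattern = '~' . preg_quote($boardurl, '~') . '/(?:index\.php)?[?#][^"\'\s<>]*~';
    preg_match_all($pattern, $buffer, $m);
    return $m[0]; // full matches only
}
```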
Title: Re: Pretty URL remarks
Post by: Arantor on April 11th, 2012, 12:18 AM
Give it a try and see how it works out :)
Title: Re: Pretty URL remarks
Post by: MultiformeIngegno on April 11th, 2012, 12:57 PM
Quote from Nao on April 10th, 2012, 11:16 PM
Does anyone else have something to add? Pretty please? I'm not sure what's best here.
If you explain the matter in a newbie way, I'll gladly give you my opinion.. :P
Title: Re: Pretty URL remarks
Post by: Nao on April 11th, 2012, 12:59 PM
Well, it's very simple...

...Once I figure out how to explain it simply :P
Title: Re: Pretty URL remarks
Post by: Nao on April 11th, 2012, 02:59 PM
So... After doing some tests... I'm pretty much getting the same performance across all methods. Timings vary a lot; the server seems to be unstable in that respect. I was a bit surprised at first, but not that much... When trying to match against a resource URL (image etc), Wedge will either stop at 'index.php' (if it's enabled), or at [?;&#] if index.php is disabled. In both cases the performance should really be the same...

I'm not sure why but it seems that the 'original' solution (the svn one) removes the index.php automatically in the 'wedge.org/index.php' links, even though the regex doesn't seem to match them... Not that it's important.
Title: Re: Pretty URL remarks
Post by: Nao on April 12th, 2012, 12:41 AM
The more I simplify the code, the slower it gets... Sweet. And it broke this site without me knowing. Sorry.
Title: Re: Pretty URL remarks
Post by: Arantor on April 12th, 2012, 12:49 AM
The sad reality is that the complex code that is very hard to read and fathom out is the fastest.
Title: Re: Pretty URL remarks
Post by: Nao on April 12th, 2012, 09:23 AM
I'm sure it's just a 'simple' mistake somewhere...
Title: Re: Pretty URL remarks
Post by: Nao on April 12th, 2012, 08:27 PM
So... I think it's working again?

But right now, it's fairly... Slower, I should say. I'm not exactly sure why -- the regex should be faster (if anything it's simpler...), and I also optimized the underlying PHP, but still, it's up to twice as slow as before to do the replacements.

I'm kinda hoping I could make it a bit faster by doing the overall page replacement only once:

- retrieve all content in the page like this: '\btopic=(\d+)|\bboard=(\d+)' (perhaps also \bcategory=(\d+)??)
- thus we have a list of topic and board IDs ($matches[1] has topics, $matches[2] has boards), including stuff that isn't ACTUALLY a topic ID to transform, but we get it as quickly as we can. Now we simply do our search & replace regex...
- during the replacement, if we ever find 'topic=' or 'board=' in the match, and we don't have a list of topic/board names, fill the topic/board name cache(s).
- possibly call a new hook at this point...
- test for action=profile, replace with profile. (Has custom URL scheme)
- test for action=, replace with action. (Has custom URL scheme)
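Here is a minimal sketch of that first pass, assuming the IDs are then used to fill the topic/board name caches in one go. The function name is hypothetical. With an alternation, PHP fills the group that did not take part in a given match with an empty string, so those entries are filtered out.

```php
<?php
// Hypothetical first pass: one regex over the whole page collects every
// topic and board ID, noise included; names can then be fetched in bulk.
function collect_ids(string $buffer): array
{
    preg_match_all('~\btopic=(\d+)|\bboard=(\d+)~', $buffer, $m);
    // Drop the '' entries left by the alternative that did not match.
    $topics = array_values(array_unique(array_filter($m[1], 'strlen')));
    $boards = array_values(array_unique(array_filter($m[2], 'strlen')));
    return [
        'topics' => array_map('intval', $topics),
        'boards' => array_map('intval', $boards),
    ];
}
```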

Should cater for everything...?! Except for the URL cache. (Which I've always contemplated dropping anyway... If performance is THAT important to you, maybe you shouldn't consider doing pretty URLs in the first place...?! Because that cache table ALWAYS gets huge after a while...)

What do you think?

Currently, this is pretty much how it is done...:
- run the search regex
- loop through all links and find uncached URLs
- call topic handling, loop through all links to find topics and transform them
- call action handling, loop through all links to find actions and transform them
- etc...
- run the replace regex with a callback that will look for the rebuilt URL in the temporary memory cache, and will do the replacement on all links.
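The last step can be sketched minimally as a preg_replace_callback() whose callback keeps a per-request memory cache, so a URL that appears many times on the page is only rebuilt once. The href-only pattern is an assumption, and so is the $build callable, which stands in for the real topic/board/action filters. The counter only shows how many rebuilds happened.

```php
<?php
// Hypothetical replace pass with an in-memory cache: each distinct URL is
// handed to $build once; repeats are served from $cache.
function rewrite_links(string $buffer, callable $build, int &$builds): string
{
    $cache = [];
    return preg_replace_callback('~href="([^"]+)"~', function (array $m) use (&$cache, $build, &$builds) {
        if (!isset($cache[$m[1]])) {
            $cache[$m[1]] = $build($m[1]);
            $builds++;
        }
        return 'href="' . $cache[$m[1]] . '"';
    }, $buffer);
}
```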

I have a feeling that it *could* be made faster, see...!
Do you share my feeling or not?
Posted: April 12th, 2012, 08:16 PM

Funny... I just did a benchmark of my two regexes. And indeed the 'faster' one (in theory) is 4 times faster than the older one.
So the slowing down thingy must be somewhere else...
Title: Re: Pretty URL remarks
Post by: Arantor on April 12th, 2012, 08:31 PM
It's certainly possible to do it as a once-run thing, SlammedDime's mod (SimpleSEF) does that if I remember rightly.

I see where you're going with making it faster though, and I think you're right, that it can be made faster.
Title: Re: Pretty URL remarks
Post by: Nao on April 12th, 2012, 08:40 PM
I'm trying not to look into SlammedDime's mod really. Perhaps out of pride -- because I had issues with the guy before. Also because I don't feel like reinventing something that works (albeit is a bit complicated...). I didn't write PrettyURLs either. I just know it well enough to fix its bugs as fast as needed.

Anyway, err... Apart from that, I don't really know where to start here. Just wanted to mention that with my 'simpler' regex, at least I get to easily change ANY URL in the page, and that works for JavaScript too... Just look at action menus; usually they wouldn't use transformed URLs ;) (The only thing to remember is to allow for %a-z or something in any string where you're expecting digits, since you'll usually get myvariable=%var% in JS and you want to replace these without breaking them... It's different for topic IDs though. Can't really replace these :P)
Title: Re: Pretty URL remarks
Post by: Arantor on April 12th, 2012, 09:17 PM
I can fully understand that, but on the flip side, it does work exactly how you suggest.

The only downside to checking for any URL-like structure is that you can risk changing things that shouldn't be changed, e.g. the URL that controls the page index injection for ... expansion (which is a known bug in SMF implementations)
Title: Re: Pretty URL remarks
Post by: Nao on April 12th, 2012, 09:44 PM
Nope, even that works perfectly... :)
Posted: April 12th, 2012, 09:40 PM

Always has, BTW... I think! (IIRC it's always been transformed even when scripts were disabled.)
Title: Re: Pretty URL remarks
Post by: Arantor on April 12th, 2012, 10:40 PM
That's the thing, it normally gets broken because of the partial URL it contains, which can't necessarily be prettified.
Title: Re: Pretty URL remarks
Post by: Nao on April 12th, 2012, 11:17 PM
How so?

Btw I was mistaken.. I inverted my benchmark results. The svn version is actually a bit faster. Need to dig into this.
Title: Re: Pretty URL remarks
Post by: Arantor on April 13th, 2012, 12:06 AM
I can't remember if the code was changed or not, been so long since I looked. But in SMF's code at least, the page index included a URL with something like %1$d in it as a placeholder for the page or start number, which caused problems in SimpleSEF. I also know that there were problems with Google at one point, scraping content for URLs and seeing %1$d type content and trying to visit it, only to be rebuffed.
Title: Re: Pretty URL remarks
Post by: Nao on April 13th, 2012, 09:12 AM
- Google: well, it depends on whether they visit URLs that aren't in actual links... In our case it's <a data-href>, so it shouldn't visit. But we know Google is too curious for its own good... Anyway, it will load the page but point to the correct canonical URL. I can live with that...
- SimpleSEF: if it matches against \d+ only, then I guess it can't find it yes. Wedge matches against more, precisely so that %1$d is taken into account.
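To show what 'matching against more' can look like, here is a minimal sketch. It accepts either digits or a template placeholder wherever a start offset is expected. Examples are %1$d in the page index, or %start% in a JavaScript-style template. The character class and function name are assumptions, not Wedge's actual pattern.

```php
<?php
// Hypothetical extractor: returns the start value, whether it is a real
// number or a placeholder such as %1$d or %start%, or null if neither.
function start_param(string $url): ?string
{
    return preg_match('~[?;&]start=(\d+|%[\w$]+%?)~', $url, $m) ? $m[1] : null;
}
```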
Title: Re: Pretty URL remarks
Post by: Arantor on April 13th, 2012, 12:12 PM
Certainly at one time it was matching against anything that looked like a URL, even not in a tags, which is why there were things where the URL was split up.

I don't know what SimpleSEF did in the end, I just know that %1$d or whatever else was in the page index was problematic. But if the PURLs implementation correctly converts it to suit and everything still works as it should, great.
Title: Re: Pretty URL remarks
Post by: Nao on April 13th, 2012, 12:47 PM
As you say!
Title: Re: Pretty URL remarks
Post by: Nao on April 14th, 2012, 02:35 PM
BTW, there was really a lot of crap in the PURL code. I simplified it a lot. For instance, it would systematically do a preg_replace for several strings when only one or two would need a regex... I replaced some stuff with rtrim(), some others with str_replace... And managed to speed up the code by 400% in these areas. :)
Overall, it's never been as fast as it is now.
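The thread doesn't show which patterns were swapped out, so the following only illustrates the general idea. When the 'pattern' is a fixed character or string, rtrim() and str_replace() give the same result as preg_replace() without going through PCRE. The sample strings are made up.

```php
<?php
// Illustrative only: fixed-string cases where a regex is not needed.
$url = 'http://example.com/board/topic/';

// Trailing-slash removal: regex vs rtrim().
$viaRegex = preg_replace('~/+$~', '', $url);
$viaTrim  = rtrim($url, '/');

// Fixed substring replacement: regex vs str_replace().
$query      = 'sa=edit;in=27';
$viaRegex2  = preg_replace('~;~', '/', $query);
$viaReplace = str_replace(';', '/', $query);
```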
Title: Re: Pretty URL remarks
Post by: MultiformeIngegno on April 14th, 2012, 03:16 PM
The wizard is in action!!! 400% faster.. :o
Title: Re: Pretty URL remarks
Post by: Nao on April 14th, 2012, 03:59 PM
For the callback. Which is already very fast ;)
Basically, with all my optimizations, PURLs is now about twice as fast overall as before. :) It's already a nice gain. But it's bound to bring some new bugs. Which I'll fix, of course, when they come up!
Title: Re: Pretty URL remarks
Post by: Arantor on April 14th, 2012, 05:06 PM
And it's currently broken in redirectexit() when trying to redirect back to a topic: it redirects to ?topic=x, which causes a redirect loop... Return-to-topic is currently broken for me; it saves, but then fails to load the page afterwards (where it has returned to the topic).
Title: Re: Pretty URL remarks
Post by: Nao on April 14th, 2012, 05:30 PM
Yep, yep, I fixed it a couple of minutes ago... (Feeds were ALSO broken for a few dozen minutes, sorry about that.)

It's because I modified PURLs to use references everywhere in filters (rather than just in the $urls to $url loop, also in the filter calling, to avoid passing and returning a 100+-entry array on every call). And I forgot to update redirectexit() to use the new system.

Performance is not noticeably better by doing that, but I made some tests by calling it 10k times, and that code block is about twice as fast as before. So if you know a function is going to modify an array you pass to it, *and* the array has lots of entries, references are helpful for performance.
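Here is a minimal sketch of a filter that takes the URL array by reference and edits it in place. The filter name and the array shape are hypothetical. PHP arrays are copy-on-write, so passing one by value is cheap until the callee modifies it. At that point the whole array is duplicated, and that is the copy the reference avoids.

```php
<?php
// Hypothetical filter: edits the shared URL array in place instead of
// receiving a copy and returning a modified one.
function filter_topics(array &$urls): void
{
    foreach ($urls as &$url) {
        if (isset($url['topic']))
            $url['replacement'] = 'topic/' . $url['topic'] . '/';
    }
    unset($url); // break the reference left behind by foreach
}
```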

Anyway... So, I also rewrote the topic handling code to use a more efficient regex, in light of my recent discovery on look-arounds.
With all of my optimizations combined, pretty URLs are now about 250% faster than before, and can be as fast as 10 milliseconds for a full topic page with 15 posts! (Before that, it was 0.03s-0.04s on average...)
I don't know if I can optimize it even more, but I'll have a look.

Optimizing board and profiles isn't too exciting BTW -- it wouldn't save a lot.
Also, going through the URL array multiple times definitely doesn't hurt performance. Doing the main preg_replace twice isn't as great, though, but because I've optimized its regex, it shouldn't be as much of a problem as it used to be...
I'm pretty hopeful that Wedge's implementation of Pretty URLs is now going to be as fast as it can.

One final note... I rewrote the action code to allow for this kind of URL:

http://wedge.org/pub/bugs/7333/pretty-url-remarks/msg277572/do/something/?in=27

Instead of:

http://wedge.org/pub/bugs/7333/pretty-url-remarks/msg277572/?action=something;in=27

It doesn't seem to create any further issues, and it looks cool. My main concern is performance: the action loop is about 3 times slower than in SVN, because there the regex is only run if no match was found earlier, whereas now it's run every time. (Albeit an optimized regex with, again, the lookaround optimization.)
It's still very fast -- an average of one millisecond to run the function on all URLs -- but I'd like to get some opinions on the URL scheme for this.
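Here is a minimal sketch of the /do/ rewrite. It assumes the action is the first parameter after '?' and the remaining parameters are ';'-separated, as in the example URLs above. The function name is hypothetical, and this is not the actual Wedge code.

```php
<?php
// Hypothetical rewrite: "...?action=foo;a=1" becomes "...do/foo/?a=1";
// URLs without an action are returned unchanged.
function prettify_action(string $url): string
{
    return preg_replace_callback('~\?action=([\w-]+)(?:;(.*))?$~', function (array $m) {
        $rest = isset($m[2]) && $m[2] !== '' ? '?' . $m[2] : '';
        return 'do/' . $m[1] . '/' . $rest;
    }, $url);
}
```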
Posted: April 14th, 2012, 05:28 PM

:edit: Updated post intro.
Title: Re: Pretty URL remarks
Post by: Arantor on April 14th, 2012, 05:31 PM
I like what you've done with it, it's pretty approachable and friendly in the scheme of things :)
Title: Re: Pretty URL remarks
Post by: Nao on April 14th, 2012, 07:22 PM
So, shall I change the url to /do/something, even if it wastes a few extra milliseconds per page..?
Title: Re: Pretty URL remarks
Post by: Arantor on April 14th, 2012, 08:51 PM
As long as it works, and works even for actions added to the list through hooks (e.g. plugins), changing to /do/something would be neat.
Title: Re: Pretty URL remarks
Post by: Nao on April 16th, 2012, 06:47 PM
Done here as you can see.

Adding new URL schemes through hooks is no longer possible with my current code, though... But hopefully it'll be (nearly) doable once I commit. I'd like to clean up my code now, and commit something for a change... ;)
Title: Re: Pretty URL remarks
Post by: nend on April 17th, 2012, 04:31 PM
So no caching?

On Pretty URLs by vb, I have done a lot of modifications to the code. I have reported some of what I have done to his code back to him, hoping it would get implemented. Some did and some didn't. However, I figured it was easier to cache URLs with SMF caching instead of using the DB. At least this way I can let old data just die; the database option didn't allow for this. It will also serve as a big performance boost if the page ever becomes the center of attention.

I have also removed index.php from the URLs with the mod, by adding one little extra rule to the source. On top of that, I have also added a lot of extra rules for other mods.

Separately, though, I have made all old URLs illegal and display a 404 page. I have found through access logs that a lot of bots look for, say, action=, topic= or board= to identify SMF.
Title: Re: Pretty URL remarks
Post by: Arantor on April 17th, 2012, 05:13 PM
The other benefit of using SMF's caching is that if you're using something like memcached or APC, it transparently caches *there* rather than in the DB without any code change on your behalf.

That reminds me I need to do other caching changes, actually.
Title: Re: Pretty URL remarks
Post by: Nao on April 17th, 2012, 06:52 PM
Quote from nend on April 17th, 2012, 04:31 PM
So no caching?
Wedge already does topic/board name caching (I need to add category name caching as well...), which represents the single biggest performance savings you could get. There's also a mini-cache inside the replacement callback (i.e. if a URL was already converted earlier in the page, we won't even bother to re-calculate what little data needs to be re-calculated).

What I'm not convinced by is the global cache that stores every single URL... I already added some warnings in the admin area to discourage its use, and I'm considering dropping it entirely.
Quote
On Pretty URLs by vb, I have done a lot of modifications to the code.
Just wanted to say... Pretty URLs is NOT vb's. Definitely not! Dannii wrote it (and I wrote some small parts of it, or more precisely, I suggested improvements to Dannii, because at the time I had no experience with collaborative programming, and I never used the SVN access he gave me on Google Code). He gave it to vb after he decided to retire from it, and forgot to offer it to me first (for which he apologized, so that's okay, it wasn't against me.) vb did absolutely nothing for PURLs, save for a few minor bug fixes. In fact, the mod is available under a BSD license, so he couldn't do the usual things he does with mods he inherits (i.e. resell them!)

The PURLs implementation in Wedge is based on the version I rewrote for Noisen many years ago. All in all, it has very little to do with the original mod, except that I kept its name for simplicity, and it still uses the same basic techniques (going through all links to store them, extracting topic IDs, querying their names, and going through all links again to replace them.)
Quote
I have reported some of what I have done to his code back to him, hoping it would get implemented. Some did and some didn't. However, I figured it was easier to cache URLs with SMF caching instead of using the DB.
But how did you do that...? One file per URL? That would incur a lot of disc writing/reading... One file per page?
Quote
I have also removed index.php from the URLs with the mod, by adding one little extra rule to the source.
Can even be done with a str_replace... :)
$buffer = str_replace('"' . $scripturl . '"', '"' . $boardurl . '"', $buffer);
Of course it's not exactly the code I had... And I'm doing it differently these days anyway (stripping index.php directly when initializing $scripturl, which might create issues with plugins if they're not careful, so I'm still unsure I'll be doing that forever...)
Quote
Separately, though, I have made all old URLs illegal and display a 404 page. I have found through access logs that a lot of bots look for, say, action=, topic= or board= to identify SMF.
Hmm... But if you used to have a purl-free forum, you're basically telling Google to go away...
Title: Re: Pretty URL remarks
Post by: nend on April 17th, 2012, 08:27 PM
With SMF caching, I cache one file per page; one file per URL would be too much of a waste. I figured there are only going to be a few pages hit often and a lot of pages not so often, so I believe it will even itself out using this method.
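Here is a minimal sketch of the one-entry-per-page approach. An in-memory array stands in for the cache backend. In SMF that would be the cache_get_data()/cache_put_data() layer, which also handles expiry. The function and key names are hypothetical.

```php
<?php
// Hypothetical per-page cache: one entry per page holding the whole
// "original URL => pretty URL" map, instead of one entry per URL.
// $store stands in for the real cache backend; expiry is not modelled.
function get_page_urls(string $pageKey, array $urls, callable $rewrite, array &$store, int &$misses): array
{
    $key = 'purl-' . md5($pageKey);
    if (isset($store[$key]))
        return $store[$key]; // cache hit: nothing is rebuilt

    $misses++;
    $map = [];
    foreach ($urls as $url)
        $map[$url] = $rewrite($url);
    $store[$key] = $map;
    return $map;
}
```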

On to the 404 thing, the forum has always been a Pretty URL forum, so Google or any other search engine does not know the old urls.

However, I had bad performance when I started, and had to work on a lot of aspects to get it up to par. The DB cache system it had was the worst feature: the board and topic cache in the DB was good, but the other URL cache was just junk.

Sorry, I don't know the history of the mod that well. I hardly ever stop by SMF or keep up with the mods they release there. I also hate trying other people's mods, which are usually full of bad coding anyway, but I gave a few a try to get a site up and ready fast. Usually after a while I end up making the system incompatible with future updates. I don't care though; file comparison tools help a lot here. It may be more work, but it's worth it to see if there are any valuable code updates. Getting off topic here though... :whistle:
Title: Re: Pretty URL remarks
Post by: Nao on April 26th, 2012, 05:25 PM
Quote from nend on April 17th, 2012, 08:27 PM
With SMF caching, I cache one file per page; one file per URL would be too much of a waste. I figured there are only going to be a few pages hit often and a lot of pages not so often, so I believe it will even itself out using this method.
I don't know. I'm not sure this is for the best... We're talking about a feature that takes on average a hundredth of a second to execute (it's about half the time of the entire ob_sessrewrite execution time IIRC), perhaps two at the most, and caching this would logically save even less time... Then again, most of the cache calls in Wedge/SMF are done to save a few milliseconds only... But they usually don't have to target a wide range of URLs or anything.
Quote
On to the 404 thing, the forum has always been a Pretty URL forum, so Google or any other search engine does not know the old urls.
Then I guess you're good, indeed...
Quote
However, I had bad performance when I started, and had to work on a lot of aspects to get it up to par. The DB cache system it had was the worst feature: the board and topic cache in the DB was good, but the other URL cache was just junk.
Yeah, that's the main issue really... I'm still very much tempted to remove it -- but I'm wary of removing it to replace it with something I like even less.
Quote
Sorry, I don't know the history of the mod that well.
Neither do I... Basically, I got in touch with Dannii back in late '07, helped out a bit, he offered a dev spot at google code, I never used it because of my lack of knowledge of svn, then my own version started diverging from his, and around '10 (?) he made it BSD and gave vblamer a dev spot and the 'keys' to the SMF mod page. By that time, my own code was already too different anyway...
Quote
I hardly ever stop by SMF or keep up with the mods they release there.
When you're running a forum, you don't always think of updating it... I personally rarely do, either.
I think we should write some kind of 'automatic updater' for plugins and Wedge, disabled by default or something, with the ability to install only the 'proven' updates of Wedge (or maybe also cutting edge versions?), something like that... We could simply retrieve the gzip file, extract it somewhere, and copy the tree structure...
Title: Re: Pretty URL remarks
Post by: MultiformeIngegno on April 26th, 2012, 05:35 PM
Quote
I think we should write some kind of 'automatic updater' for plugins and Wedge, disabled by default or something, with the ability to install only the 'proven' updates of Wedge (or maybe also cutting edge versions?), something like that... We could simply retrieve the gzip file, extract it somewhere, and copy the tree structure
That would be cool!! :o
Title: Re: Pretty URL remarks
Post by: Arantor on April 26th, 2012, 06:53 PM
Quote
When you're running a forum, you don't always think of updating it... I personally rarely do, either.
I think we should write some kind of 'automatic updater' for plugins and Wedge, disabled by default or something, with the ability to install only the 'proven' updates of Wedge (or maybe also cutting edge versions?), something like that... We could simply retrieve the gzip file, extract it somewhere, and copy the tree structure...
I'm going to love writing a routine to download the zip, upload it from a temporary file, unpack it file by file, upload each one to the server via FTP and hope to hell that we don't have permissions fuck-ups along the way.

Notifying users is good. Making it one-click is a VERY BAD IDEA. It's right up there with shitty plugins as why people get hacked.

As cool as it might be, the nightmare of doing it is enough to put me off doing it. Other systems do not have auto updating tangles, I don't entirely see why we should, especially since it's more likely to fuck you about than not.
Title: Re: Pretty URL remarks
Post by: godboko71 on April 27th, 2012, 05:38 AM
WordPress had (or has) a one-click update: a nightmare on shared hosting, a resource hog, and SLOW. It is a shame most shared hosts don't let users wget and unpack shit themselves; think of all the bandwidth that would be saved worldwide.
Title: Re: Pretty URL remarks
Post by: MultiformeIngegno on April 27th, 2012, 12:09 PM
It works like a charm instead.. The security concerns are valid, but it works really really well.
Title: Re: Pretty URL remarks
Post by: Arantor on April 27th, 2012, 12:22 PM
It doesn't 'work like a charm' for competent sysadmins who don't use FTP - it only works for me if I unsecure WP installations.

If you think for one moment I'm going to trade security for functionality, you're very much mistaken.
Title: Re: Pretty URL remarks
Post by: Nao on April 27th, 2012, 04:27 PM
Well... Then you could disable it ;)
Thing is, 90% of all SMF installs are 'insecure' per se if you'd like... And perhaps they don't mind, even knowing so. If this at least allows them to do automatic upgrades...

:edit: Testing edits...
Title: Re: Pretty URL remarks
Post by: Arantor on April 27th, 2012, 04:40 PM
I'm sorry, I'm not prepared to accept being insecure by default just because people are too lazy to do some work themselves occasionally.

Here's the thing: other platforms don't have one-click upgrades, and they're absolutely happy with this for exactly the same reason I am.
Title: Re: Pretty URL remarks
Post by: Nao on April 27th, 2012, 04:48 PM
Quote from Arantor on April 27th, 2012, 04:40 PM
I'm sorry, I'm not prepared to accept being insecure by default just because people are too lazy to do some work themselves occasionally.
Why would your site be insecure just because there's an option to update the files...?

I mean, if we have such a system in, we'd simply warn any admin that an update is ready, then they go to the admin area, click to update, we pass along the session IDs etc, and the script goes on to import the file, extract it in a temp folder (all of this is very safe...), and then either we ask the user to use FTP to move the folder to the root (i.e. automatically replace all files), or we can offer to do it for them... (Which you wouldn't do, I guess.)
Quote
Here's the thing: other platforms don't have one-click upgrades, and they're absolutely happy with this for exactly the same reason I am.
And Wedge isn't exactly a platform that's like all others... :P
I'd like for it to require as little attention as possible from the admin point of view. SMF upgrades are so painful to me that I simply don't do the upgrades and leave my installs open to security holes (well, I do try and fix them manually though...)

I'd really like for Wedge to be able to say that "our installed base has zero security holes"...
Title: Re: Pretty URL remarks
Post by: Arantor on April 27th, 2012, 04:55 PM
Quote
Why would your site be insecure just because there's an option to update the files...?
Because just about every system ever designed to do that has to make the files writable to the webserver. Which means making everything 777. Or you fuck around with creating a manual S/FTP connection, unpacking the update file by file and uploading it via S/FTP that way. While the latter is feasible for plugins, it simply isn't feasible for large-scale updates. And that still assumes the system is actually running FTP or SFTP, systems like IIS won't be. (And believe me, getting it running properly on there is a nightmare)

So if you don't fuck about with S/FTP, you tell people how to chmod or chown (and just look at how many support topics there are on SMF for this) their files, and once it works, they're not going to put it back to secure afterwards.
Quote
I'd really like for Wedge to be able to say that "our installed base has zero security holes"...
Isn't going to happen. You'll always have people who won't upgrade because they've made custom changes they won't upgrade for, or because they think that their plugins won't work.

But, it's interesting to note that the systems that have one-click updates are more routinely victims of server abuse through files being infiltrated (because of poor permissions setup) than those that don't.

Even putting aside these factors, you're left with an update system that makes the files insecure by default, or an update system that is secure by default but requires users to do some work to keep it up to date. I'd certainly prefer the latter, especially as updates should just be a case of uploading a new set of files, nothing more.
Title: Re: Pretty URL remarks
Post by: Nao on April 27th, 2012, 05:20 PM
<sigh>

Then no automatic updates for me...
Title: Re: Pretty URL remarks
Post by: Arantor on April 27th, 2012, 05:28 PM
Did you ever do automatic updates for SMF?

How many people are screwed because automatic updates don't work for them? (The number of 1.1.11 updates that failed was pretty vicious, btw)


Note that if someone can show me a robust method for doing it that doesn't require making things insecure by default, I'm more than willing to entertain the idea. But the sheer mechanics of doing it considering the usual mess that is hosts + permissions means I'd rather avoid it entirely, just because that's actually better for the user.
Title: Re: Pretty URL remarks
Post by: Nao on April 27th, 2012, 05:31 PM
Nope, never did...
But I did do automatic updates for AeMe sitelists, and guess what, never had a single problem with 'em... :P
Title: Re: Pretty URL remarks
Post by: Arantor on April 27th, 2012, 05:31 PM
Do you know why you never had a problem with them?
Title: Re: Pretty URL remarks
Post by: Nao on April 27th, 2012, 05:35 PM
Apart from the fact that I rock..?
Title: Re: Pretty URL remarks
Post by: Arantor on April 27th, 2012, 05:41 PM
Sadly, it wasn't because you rock; it's because SMF was creating the file as the webserver user itself, meaning that it was implicitly writable by the webserver, and indeed by any other user on the server, which is really no better than the file being owned by the proper account holder but being 777.

The whole point of faffing with FTP/SFTP the way I mention is so that the webserver user explicitly does NOT own the files, but the FTP user account does (i.e. the user's own account). That way you leave them at the standard 644 and they're not writable, and thus not at risk from tainting.
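Here is a minimal sketch of checking the mode bits discussed here (644 versus 777). The function names are hypothetical. It only catches the 'make everything 777' case. A file owned by the webserver user at 644 is still writable by that user. Detecting that would mean comparing fileowner() with the user PHP runs as.

```php
<?php
// True when the group or "other" write bit is set (e.g. 0666, 0777),
// i.e. accounts other than the owner can modify the file.
function is_loosely_writable(int $mode): bool
{
    return ($mode & 0022) !== 0;
}

// Hypothetical helper: check a real file's permission bits.
function check_file(string $path): bool
{
    clearstatcache(true, $path); // fileperms() results are cached
    return is_loosely_writable(fileperms($path) & 0777);
}
```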
Title: Re: Pretty URL remarks
Post by: Nao on April 27th, 2012, 05:44 PM
Well... I'm not 100% sure it's true.
Aeva-Sites.php is indeed generated through SMF, but Subs-Aeva-Sites.php or whatever was included in the package so it wouldn..... Oh, wait, the package is usually installed by SMF. Which would explain...

Well, how 'bout we make Wedge into a package, too? :lol:
I mean -- we ship with a gzip file and an index.php file, and we run the index file to decompress the gzip and install the files with a webserver user ID...

(Of course it's also opening the door to issues when people start FTPing updated files. Hmm.)

Well, I guess FTP'ing the whole stuff is the only solution then. What's the issue with security? Why not use your SFTP component if available?
Title: Re: Pretty URL remarks
Post by: Arantor on April 27th, 2012, 05:53 PM
Quote
Well, I guess FTP'ing the whole stuff is the only solution then. What's the issue with security? Why not use your SFTP component if available?
Because it's still the same basic problem, regardless of whether you use FTP or SFTP.

Scenario 1: the www-data user owns the files. They're writable from Wedge. They're also writable by definition from ANYTHING ELSE that's web-based on the server. That means any rogue process can infect any Wedge file or plugin file with code. This, really, is the risk vector that we're talking about.

Scenario 2: the proper user account owns the files. Then they make everything 777 (and experience says they won't put them back to 644), so Wedge can write the files. Same problem: they're still writable by any other code that's executed via PHP from web requests, which will all be run as www-data.

Scenario 3: we do it as I'm thinking, with FTP. However we obtain the package (upload it to the server or otherwise), it'll end up in a temporary folder somewhere. Exactly where is irrelevant, because however we get it, the same thing has to occur: it goes to a temporary place. Here's where it gets complicated: we then have to unzip that file. [1] Once it's unpacked, we have to upload it file by file over FTP. For any sizeable package this is not going to be a quick process, especially for a major upgrade, so it's likely to fall foul of the 30-second timeout, amongst other things.


These are the alternatives as I see them, and I don't like any of them that much :/
 1. We end up dumping it all into a system-wide temporary folder, where it's also at risk, and there's no guarantee the files will actually remain available for any length of time. It's really a side concern, though.
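One common way around the timeout in scenario 3 is to split the file-by-file upload across several requests. The Python below is only an illustrative sketch under that assumption (`upload_batch` and `upload_one` are hypothetical names, not Wedge or SMF functions): it pushes files until a time budget is spent and hands the remainder back so the next request can resume. The budget is checked before each file, so one large file can still overrun it; the budget needs to sit well below the server's hard limit. It does nothing about where the unpacked files sit in the meantime, which is the footnoted concern.

```python
import time
from collections import deque
from typing import Callable, Deque


def upload_batch(
    pending: Deque[str],
    upload_one: Callable[[str], None],
    budget_seconds: float = 20.0,
    clock: Callable[[], float] = time.monotonic,
) -> Deque[str]:
    """Upload files from the front of `pending` until the budget is used up.

    Returns whatever is still pending so a follow-up request can resume.
    """
    start = clock()
    while pending and clock() - start < budget_seconds:
        path = pending[0]
        upload_one(path)   # e.g. one FTP/SFTP store of this single file
        pending.popleft()  # only drop it once the upload succeeded
    return pending


if __name__ == "__main__":
    # Simulated run: every upload "takes" 7 seconds of fake time.
    now = [0.0]
    done = []

    def fake_upload(path: str) -> None:
        done.append(path)
        now[0] += 7.0

    queue = deque(["a.php", "b.php", "c.php", "d.php", "e.php"])
    request = 0
    while queue:
        request += 1
        upload_batch(queue, fake_upload, budget_seconds=20.0, clock=lambda: now[0])
        print(f"request {request}: uploaded so far {done}")
```

Because a file is only removed from the queue after its upload returns, a failed upload raises and stays at the front of the queue for the next attempt.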
Title: Re: Pretty URL remarks
Post by: nend on April 27th, 2012, 07:19 PM
Quote from Nao on April 26th, 2012, 05:25 PM
Quote from nend on April 17th, 2012, 08:27 PM
For SMF caching, I cache one file per page; one file per URL would be too wasteful. I figured only a few pages are going to be hit often and a lot of pages not so often, so I believe it will even itself out using this method.
I don't know. I'm not sure this is for the best... We're talking about a feature that takes on average a hundredth of a second to execute (it's about half the time of the entire ob_sessrewrite execution time IIRC), perhaps two at the most, and caching this would logically save even less time... Then again, most of the cache calls in Wedge/SMF are done to save a few milliseconds only... But they usually don't have to target a wide range of URLs or anything.
So you're saying caching the data takes more time than just processing all the URLs in a page?

I wasn't too sure which would be faster when I changed the modification to do file caching instead of DB caching. I know file caching has to be faster than the DB, and besides being faster, you don't have to worry about the DB filling up with junk that never gets purged. Both can be turned off easily anyway.

However, I know you and Arantor are always bickering over performance and slight gains, which isn't a bad thing, so I'm pretty sure your information is correct. Thanks for the information, though; I think I'm going to disable caching completely in that area.
Title: Re: Pretty URL remarks
Post by: Arantor on April 27th, 2012, 07:22 PM
It's not a simple x is faster than y argument, either. If you have stuff in the DB that's being called frequently, assuming you don't have a stupid DB layout and bad queries, there's a reasonable chance that you might be able to get some of that stuff out of the query cache, and if the table isn't too huge, you might even get it entirely into memory anyway, and RAM is faster than HD for things like that ;)
Title: Re: Pretty URL remarks
Post by: nend on April 28th, 2012, 05:21 PM
Quote from Arantor on April 27th, 2012, 07:22 PM
It's not a simple x is faster than y argument, either. If you have stuff in the DB that's being called frequently, assuming you don't have a stupid DB layout and bad queries, there's a reasonable chance that you might be able to get some of that stuff out of the query cache, and if the table isn't too huge, you might even get it entirely into memory anyway, and RAM is faster than HD for things like that ;)
The method in VB's script is not DB-friendly. For most people, the URL DB cache outgrows everything else in the DB. Before I noticed it, the URL DB cache was taking up at least 75% of my DB. That's when I suggested to VB to use SMF's cache and also give the option to disable it, because IMHO that's better than the DB option. Plus, who wants 75% of their DB used just for Pretty URLs?
Title: Re: Pretty URL remarks
Post by: Nao on May 1st, 2012, 12:55 PM
Quote from nend on April 28th, 2012, 05:21 PM
The method in the VB's script is not DB friendly.
Just wanna say... Again, it's not vb's script. It's Dannii's. vb did pretty much nothing on it. All he was trying to do is accumulate mods in his ownership and use them to advertise his paid-for mods. Bad karma.
Quote
The URL DB cache for most people out grows everything else in the DB. Before I noticed it, the URL DB cache was taking up at least 3/4 or 75% of my DB. That is when I suggested to VB to use SMF's cache and also give the option to disable, because that is better than the DB option IMHO. Plus who wants 75% of their DB just used for Pretty URL's.
No one.
Any volunteers to benchmark Wedge in DB/non-DB Pretty URLs modes?
Shouldn't be too complicated to do... But I have no time myself, I'm afraid. I'm busy IRL (mostly today; tomorrow should be better for me), and even then I'm struggling to finish my newest commit, which is going to be huge because I finished many of my 'works in progress' last night. (Maybe it's a good thing after all that my Internet was down for so long, because at least I was able to focus :lol:) Anyway, it's going to take some time to document it and everything...

Either way, I think there's a 90% chance that I'll be removing the DB feature and replacing it with nothing ;)
Title: Re: Pretty URL remarks
Post by: Nao on May 1st, 2012, 01:17 PM
Re: package auto-update -- I really don't think it's unfeasible. But it'll require some thinking, indeed.
Title: Re: Pretty URL remarks
Post by: Arantor on May 1st, 2012, 03:02 PM
Well, of the three options as I see it, the last one is the only sane way to do it, other than putting a ton of crap in the DB and pushing it out to cache files periodically (which is sluggish, horribly inefficient, and not really any safer), but it's still a lot of work and I'm not sure how reliable it'll be.
Title: Re: Pretty URL remarks
Post by: Nao on May 2nd, 2012, 07:58 AM
Are you referring to our SFTP story ('scenario 3'), or to the PURLs talk ('ton of crap in the DB')...?
Title: Re: Pretty URL remarks
Post by: Arantor on May 2nd, 2012, 01:41 PM
Actually, the former (file access). The alternative to allowing file uploads is to push it all into the DB and call for it as necessary, pushing it out to the cache folder to make it faster. But if that includes executable code, you're still making executable code writable by the webserver owner which makes it vulnerable to abuse from outside. (Cache files are less of a risk because they're purged periodically)
Title: Re: Pretty URL remarks
Post by: nend on May 6th, 2012, 09:18 PM
I need to test this properly. It looks like now that I've disabled the cache, it runs a little slower. DB caching works great, but it puts too much junk in the DB with no way to properly purge expired URLs. File caching, as far as I can tell, sits somewhere in the middle.

I still need to test these properly, with benchmarks and such, so everything I'm writing is speculation based on feel, not hard stats. It's going to be hard to get a reliable mean on a grid hosting account, though: I can't force it to stay on one node, and too many resources change at once.
Title: Re: Pretty URL remarks
Post by: nend on May 6th, 2012, 09:29 PM
OK, I've done some testing. When a cache key already exists, there's not much difference in speed between caching and no caching at all. It's when the script has to insert a new cache entry that it slows down.

No Caching at all
.141
.237
.146
.154
.139
.239
.116
.15
.128
.215

Caching
1.186
.352
.134
.153
.346
.244
.312
.133
1.179
.246

As you can see, there are spikes; they occur when a new cache entry has to be created. There's basically no gain from caching, and no caching is better. :cool:
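To put numbers on the two runs, here is a small Python sketch that summarizes the timings exactly as posted (units as reported, presumably seconds). The 1.0 cut-off used to separate the cache-miss spikes from the rest is an assumption chosen by eye from the data, not something nend specified.

```python
from statistics import mean, median

# Timings copied from the two runs above.
no_cache = [.141, .237, .146, .154, .139, .239, .116, .15, .128, .215]
cache = [1.186, .352, .134, .153, .346, .244, .312, .133, 1.179, .246]


def summarize(samples, spike_threshold=1.0):
    """Mean and median, plus the mean once spikes above the threshold are dropped."""
    steady = [s for s in samples if s <= spike_threshold]
    return {
        "mean": mean(samples),
        "median": median(samples),
        "mean_without_spikes": mean(steady),
    }


for label, samples in (("no caching", no_cache), ("caching", cache)):
    stats = summarize(samples)
    print(label, {k: round(v, 4) for k, v in stats.items()})
```

On these ten runs, caching comes out slower even after the two ~1.18 spikes are discarded (a mean of about 0.24 vs about 0.17, medians about 0.28 vs about 0.15), which is consistent with nend's conclusion. With ten samples on grid hosting, it's indicative rather than conclusive.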
Title: Re: Pretty URL remarks
Post by: Nao on May 6th, 2012, 10:58 PM
So, no file cache then?
Title: Re: Pretty URL remarks
Post by: nend on May 6th, 2012, 11:40 PM
Quote from Nao on May 6th, 2012, 10:58 PM
So, no file cache then?
I'd guess that with a real memory caching system it would be faster. As for file caching, it's basically useless, and DB caching is wasteful.

For DB storage, I only have 1GB per database, so DB caching isn't an option for me, and file caching appears to bring no benefit.

I'd guess the only benefit you can get from file caching is when the information comes from the DB. For other stuff, like pure computation, PHP seems able to compute these things faster than it can read them from a file. Well, the read isn't the problem; it seems to be the write.
Title: Re: Pretty URL remarks
Post by: Arantor on May 6th, 2012, 11:54 PM
That would imply then that you want to extend the TTL on caching so that you do fewer writes.

You could leave the caching subsystem in, but only cache things if it's level 2 or higher (so that you actually have to have a proper memory cache in order to use it)
Title: Re: Pretty URL remarks
Post by: nend on May 6th, 2012, 11:56 PM
You know, under different circumstances you might want different settings here. Just because my disk I/O is slow doesn't mean that's the case for every server setup. Who knows which I/O scheduler my server uses? If writes are slower than reads, reads may be getting the higher priority; that sounds like the deadline scheduler to me, but I can't be sure.

Then you have your accelerators: if one of these is installed, caching might not be a bad idea.

Maybe these are things that can be determined during setup: a few tests to determine the best setting for the server the site is on.
Title: Re: Pretty URL remarks
Post by: Arantor on May 7th, 2012, 12:10 AM
Disk I/O is always slower than using memory caches - always. That's why you have memory caches ;) In any case, writes are physically more intensive to perform than reads, in whatever environment you care to name. It's really about the comparative performance which is almost impossible to test meaningfully.

That's really the thing, if cache level >= 2, cache pretty URL stuff, if not, don't.
Title: Re: Pretty URL remarks
Post by: nend on May 7th, 2012, 12:38 AM
Quote from Arantor on May 7th, 2012, 12:10 AM
Disk I/O is always slower than using memory caches - always. That's why you have memory caches ;) In any case, writes are physically more intensive to perform than reads, in whatever environment you care to name. It's really about the comparative performance which is almost impossible to test meaningfully.

That's really the thing, if cache level >= 2, cache pretty URL stuff, if not, don't.
I know memory is faster; my comparison was processing vs. disk I/O. Processing seems faster than I/O here, but is that the case on all systems? Is processing always going to be faster than disk I/O on every system?

Writes don't always have to be costly, though the chances of running into an environment like that are rare, and on a server even rarer. On my local devices, I always set up the environment so that it favors reads over writes. Reading is what the user cares about most.

I agree, though: the simple solution is usually the best. If there isn't a memory cache available, just process it.
Title: Re: Pretty URL remarks
Post by: Nao on May 7th, 2012, 01:23 AM
So... Do I remove db caching? :P
Title: Re: Pretty URL remarks
Post by: Arantor on May 7th, 2012, 01:42 AM
Yes, remove DB caching and if any caching is going to occur, do it via cache_put_data if (isset($settings['cache_level']) && $settings['cache_level'] >= 2)
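Arantor's rule, combined with the longer-TTL idea from earlier in the thread, can be sketched as follows. This is Python for illustration only: in Wedge it would go through the SMF-style `cache_put_data`/`cache_get_data` calls behind the `$settings['cache_level'] >= 2` check, whereas `UrlCache`, `pretty_url`, the `purl:` key prefix and the one-hour TTL below are all hypothetical. Entries are keyed per URL; the per-page alternative Nao raises is left open.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class UrlCache:
    """Hypothetical stand-in for a memory cache (APC/memcached-style) with TTLs."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[float, str]] = {}
        self._clock = clock
        self.writes = 0

    def get(self, key: str) -> Optional[str]:
        hit = self._data.get(key)
        if hit is None or hit[0] < self._clock():
            return None  # missing or expired
        return hit[1]

    def put(self, key: str, value: str, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)
        self.writes += 1


def pretty_url(url: str, rewrite: Callable[[str], str], settings: dict,
               cache: UrlCache, ttl: float = 3600.0) -> str:
    # Mirrors isset($settings['cache_level']) && $settings['cache_level'] >= 2:
    # only touch the cache when a real memory cache is assumed to be present.
    use_cache = settings.get("cache_level", 0) >= 2
    key = "purl:" + url
    if use_cache:
        cached = cache.get(key)
        if cached is not None:
            return cached
    result = rewrite(url)  # otherwise just recompute; it's cheap enough
    if use_cache:
        cache.put(key, result, ttl)  # a long TTL means fewer writes
    return result
```

At cache level 0 or 1 the rewrite simply runs every time and nothing is written, which matches the benchmark finding that file-cache writes cost more than recomputing.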
Title: Re: Pretty URL remarks
Post by: Nao on May 7th, 2012, 02:32 AM
Why only on these? Because we are to assume that's only for non-file-based caches?

Also, I wouldn't know how best to store these URLs in a file cache... Per-page basis is meh to me.
Title: Re: Pretty URL remarks
Post by: Arantor on May 7th, 2012, 02:37 AM
Because for file based caches, the available evidence would seem to suggest that the overhead of performing the writing would erode the benefit of caching.

I'm not sure about handling caching either though.