Show Posts


Messages - [Unknown]
31
Off-topic / Re: htmlspecialchars while inserting into DB
« on June 21st, 2011, 05:00 PM »
Quote from Arantor on June 21st, 2011, 11:59 AM
I think it's important that some element of that is continued to be done, but how far do you go with it? At what point is it acceptable to trade off performance for security?
This wasn't done for security reasons (although it's not necessarily a bad thing security-wise).  It was done so that upgrading wouldn't require updating past posts, and because user and admin settings can affect the bbc output and we didn't want to deal with "recalculating."  Lastly, you need to be able to edit your posts, and we knew there would be bugs if we tried to "reverse the translation"; but we were also worried about the extra storage needed to keep two copies of every single post.

Ultimately, for that, I think several things:

1. String parsing in PHP is slow.
Actually, it's slow in a lot of languages (but especially PHP), because they use a NUL at the end of the string, and parsing is slicing-heavy.  I think for larger forums, the only good solution is to write (parts of?) the parse_bbc algorithm in a tighter language, such as D.  This could be done over basic IPC, kind of like how memcached works.

2. Caching posts needs a bit of a better solution
A lot of people want to just "cache everything": throw physical boxes at the problem, use all of them as RAM sticks, and just cache, cache, cache, cache.  This is expensive, and it becomes hard to manage, expunge, etc.
What I've done is an "indexed cache": I keep an index marking which entries are popular, and over time, cache those popular items.  When they become unpopular, they are garbage collected.  We use this at work, and it requires very little memory, but significantly benefits viral traffic (which I'm pretty sure is the same sort of traffic forums get).

3. The parse_bbc() routine is far from perfect
The way TOX-G parses is much better.  The BBC parsing routine was written in a similar way to how one would write it in a language like C.  I naively hoped PHP would turn this into optimal code (and it was the easiest way to write anyway), but it doesn't.

4. Someone needs to profile parse_bbc()
I think Groundup was trying to do this, but I don't know how much he did with it.  It needs to be broken up into smaller functions (ideally five or more) so that the performance problem can be identified.  I have some ideas about where performance may be bad, but guessing isn't the best road to improvement.
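The "indexed cache" idea from item 2 can be illustrated with a small sketch. This is not the implementation used at work, just a minimal Python version under stated assumptions: popularity is counted as hits per time window, an item is cached once it reaches a hypothetical `threshold` within the current window, and a periodic `end_window()` call garbage collects entries that have gone cold.

```python
from collections import Counter


class IndexedCache:
    """Hypothetical popularity-indexed cache (illustration only).

    A key is promoted into the cache once it gets `threshold` hits within the
    current window; at the end of each window, cached keys that fell below the
    threshold are garbage collected.
    """

    def __init__(self, loader, threshold=3):
        self.loader = loader        # computes a value, e.g. parses a post's bbc
        self.threshold = threshold
        self.hits = Counter()       # cheap popularity index: key -> hits this window
        self.store = {}             # the expensive values, kept only for popular keys

    def get(self, key):
        self.hits[key] += 1
        if key in self.store:
            return self.store[key]
        value = self.loader(key)
        # Only spend memory on keys that are popular right now.
        if self.hits[key] >= self.threshold:
            self.store[key] = value
        return value

    def end_window(self):
        """Call periodically (e.g. from a scheduled task) to expire cold entries."""
        for key in list(self.store):
            if self.hits[key] < self.threshold:
                del self.store[key]
        self.hits.clear()
```

The popularity index costs one small counter per key requested in the current window, while the expensive parsed output is only held for keys that are currently popular, which is where the memory savings come from.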
Quote from Arantor on June 21st, 2011, 11:59 AM
On the flip side, there are things like topic subjects, which aren't resanitised on display, what's in the DB is assumed to be safe.
Right.  I think htmlspecialchars is relatively cheap, so I'd either do it on display, or else communicate the data in a text format (such as JSON, although that has its own escaping) and set it as text via the DOM.
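For the "do it on display" option, here is a minimal sketch in Python, used purely for illustration. `render_subject` is a hypothetical helper, and `html.escape` stands in for PHP's `htmlspecialchars`; its output for a single quote differs slightly (`&#x27;` rather than `&#039;`).

```python
import html


def render_subject(subject: str) -> str:
    """Escape a topic subject at display time, regardless of what the DB holds.

    html.escape(..., quote=True) converts & < > " and ' to entities, roughly
    like PHP's htmlspecialchars($s, ENT_QUOTES).
    """
    return "<h2>" + html.escape(subject, quote=True) + "</h2>"


print(render_subject('Tom & Jerry\'s <script>alert("hi")</script>'))
# <h2>Tom &amp; Jerry&#x27;s &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;</h2>
```

Because the escaping happens at output, a subject that somehow reached the database unescaped still can't inject markup.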
Quote from Arantor on June 21st, 2011, 11:59 AM
It reattributes old posts to a new user, e.g. new account after deletion, but it doesn't do topic ownership at the same time.
Hmm.  I feel like we had something in other/tools/ that did this and also reattributed topics; I think Compuart wrote it.  Weird.
Quote from Arantor on June 21st, 2011, 11:59 AM
You know something's wrong when they're accepting mods on the mod site that are bug-fix mods. I only wish I were kidding.
Ah, this reminds me of YaBB SE 1.5.x.  That's when I "joined the scene."  I created a package server (after reverse-engineering how they worked) and centralized bug fixes in an unofficial package that I updated every day or two.
Quote from Arantor on June 21st, 2011, 11:59 AM
You can say that about a lot of things, though. It's almost the same argument against something like Microsoft Word: just because you might only use 2% of its features, that sounds like an argument for stripping out the rest - until you realise that everyone else uses a different 2% of its features.
Well, Word is bloat.  I don't know about you, but I don't want to write software like that.  I'm the guy who always wanted to remove the calendar from the standard distribution of SMF.  It means a larger attack surface, more places to check for performance impact, more to download, etc.

-[Unknown]
32
Off-topic / Re: Post-XSS scenarios and database driven sessions
« on June 21st, 2011, 04:36 PM »
Quote from Dragooon on June 21st, 2011, 03:21 PM
I've really been looking into it even though I was 16 hours overdue for sleep. So the best way against XSS is to prevent it, I've realised that regardless of the measures the session will be stolen. So it is mostly in hands of htmlspecialchars, is there other way to do XSS even if the output is htmlspecialchar'ed?

As far as CSRF go, I guess that's what those session tokens in SMF are for?

Also as far as iFrame go, a person can't really steal a cookie from iframe correct?
Well, the HttpOnly cookie flag is pretty useful against XSS.  Not a silver bullet, but a good option.
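To show what the HttpOnly flag looks like on the wire, here is a sketch using Python's standard `http.cookies` module. The cookie name `sid` and the helper name are hypothetical; in PHP the equivalent is the httponly argument to setcookie() or the session.cookie_httponly setting.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str, secure: bool = True) -> str:
    """Build a Set-Cookie header for a (hypothetical) "sid" session cookie.

    With HttpOnly set, document.cookie no longer exposes the session id, so an
    XSS payload can't simply copy it to another site. It can still act inside
    the page, which is why this is not a silver bullet.
    """
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["path"] = "/"
    cookie["sid"]["httponly"] = True
    if secure:
        cookie["sid"]["secure"] = True  # only sent over HTTPS
    return cookie.output()


print(session_cookie_header("abc123"))
```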

For CSRF, actually, using XMLHttpRequest and setting custom headers can really help.  But yes, using SSL and some sort of token in the URL works well.
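A minimal sketch of the token approach, plus the custom-header check for XMLHttpRequest-only endpoints. It's in Python for illustration; the function names are hypothetical, and the `X-Requested-With` header is a common convention, not something the post specifies.

```python
import hmac
import secrets
from typing import Mapping, Optional


def new_csrf_token() -> str:
    """Random per-session token; keep a copy server-side in the session."""
    return secrets.token_urlsafe(32)


def is_request_trusted(
    session_token: str,
    submitted_token: Optional[str],
    headers: Optional[Mapping[str, str]] = None,
    xhr_only: bool = False,
) -> bool:
    """Accept a state-changing request only if it carries the session's token.

    For XMLHttpRequest-only endpoints, also require a custom header: a
    cross-site <form> or <img> can't add one, and a cross-origin request that
    tries to triggers a CORS preflight. Header names are assumed to be
    normalized to this exact casing by the caller.
    """
    if xhr_only and (headers or {}).get("X-Requested-With") != "XMLHttpRequest":
        return False
    if submitted_token is None:
        return False
    # Constant-time comparison, so timing doesn't leak how much of the token matched.
    return hmac.compare_digest(session_token.encode(), submitted_token.encode())
```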

For iframes, there's the X-Frame-Options header.  You can't steal a cookie through an iframe, but clickjacking is a significant security concern.

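<IfModule mod_headers.c>
   # Stop browsers from MIME-sniffing responses into a different content type
   Header set X-Content-Type-Options nosniff
   # Only allow pages to be framed by the same origin (clickjacking defense)
   Header set X-Frame-Options SAMEORIGIN
</IfModule>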

-[Unknown]
33
Off-topic / Re: htmlspecialchars while inserting into DB
« on June 21st, 2011, 10:21 AM »
Quote from Arantor on June 21st, 2011, 09:35 AM
I wonder if preparsecode's changing everything into br tags is a holdover from YaBB where everything was in flat files. (I don't know how it was stored, but it seems feasible to me)
Now that you explain which ones, I'm pretty sure they all were holdovers.  Definitely the br thing was.  And I know the nobbc/html thing was just because that was the easiest way.
Quote from Arantor on June 21st, 2011, 09:35 AM
That's one of the things we can do something about. We've expressly set ourselves on the path of having an importer rather than just 'upgrading' the existing tables - aside from the fact it lets people run them both side by side to experiment, it also means we can do manipulations along the way like fix any of these things we decide to resolve.
While I agree, it also means large forums (where's Douglas at anyway?) are a big problem.
Quote from Arantor on June 21st, 2011, 09:35 AM
Depends what you mean by sanitise. htmlspecialchars both ends strikes me as a bad idea, for example.
Sure.  But I explicitly don't agree with the "garbage in, garbage out" philosophy when it comes to security.  An example: just because you only put valid filenames into the attachments table doesn't mean you shouldn't validate that assumption on the way out too.  This also improves upgrade scenarios anyway (if done right.)
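As an example of checking that assumption on the way out, here is a minimal Python sketch that re-validates an attachment filename read back from the database before turning it into a path. The allowed-character rule and the `attachment_path` helper are hypothetical; a real forum would use whatever naming scheme its attachment code actually writes.

```python
import re
from pathlib import Path

# Hypothetical rule: stored attachment names only use these characters.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def attachment_path(base_dir: str, stored_name: str) -> Path:
    """Re-validate a filename read back from the attachments table."""
    if not SAFE_NAME.fullmatch(stored_name) or stored_name in (".", ".."):
        raise ValueError(f"refusing suspicious attachment name: {stored_name!r}")
    base = Path(base_dir).resolve()
    path = (base / stored_name).resolve()
    # Belt and braces: the result must still sit directly inside base_dir
    # (this also catches a symlink pointing elsewhere).
    if path.parent != base:
        raise ValueError(f"attachment escapes {base}: {stored_name!r}")
    return path
```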
Quote from Arantor on June 21st, 2011, 09:35 AM
Well, the methodology I was referring to was that all mods using the database should use $smcFunc's functions, and should be using the proper parameterisation, or the proper insert method made available that deals with escaping etc for you.
I was referring to where you talked about magic quotes.  Maybe it was the wrong way to go at the time, but the codebase expected both magic quotes and register_globals to be on, so I decided the most secure route was to move forward with magic quotes on and kill register_globals.  With either of those the other way around, any mistakes I made could have become security holes rather than just bugs.  I prefer bugs.
Quote from Arantor on June 21st, 2011, 09:35 AM
were still deferred through some "well, it might cause regressions" mantra.
Well, at work, this is a pretty common mantra.  I'm even holding back changes on a project right now for this reason.

At the same time, I agree.  Especially with the release of a new major version number, you gotta get it right or you'll pay for those mistakes for a while.  I know Chrome and Firefox are going the way of loosey-goosey, but I'm not convinced that will work for either of them long term, especially Firefox.

A while back, I posted the same basic thing on simplemachines.org: if you need more RCs, release more RCs.  There's no shame in releasing RCs.  I still consider TOX-G alpha, after all (and probably will until I write better docs and some more useful samples, and add a couple of nagging features that I want).  It's being used in production by a select few, and I'm ready to help those deployments upgrade if I have to make breaking changes.

Sad to hear that SMF didn't do the same.
Quote from Arantor on June 21st, 2011, 09:35 AM
We use jQuery in Wedge. Not just because it means we get to minimise the code to be sent to users (since the admin can pick a CDN copy of jQuery), but the time we spent writing JS is shorter, and I suspect less time is also spent in debugging. Plus a lot of users do add stuff that makes use of jQuery, so having it in the core means plugins won't try each adding it and falling over each other.
Well, jQuery is fine.  Most people who take the test, even if they add jQuery, leave that line in and don't touch it; they clearly don't understand it.  That's the bad sign.

[rant]

I don't hate jQuery, but I don't really like it either.  It's not a bad choice, and it's very sane for most web apps, though.  My problem with it is that it has a lot of features, most of which I don't need, and lacks most of the ones I do.  It has animations, but doesn't animate colors; it deals with nodelists spectacularly, but makes me learn a second DOM syntax to do so; it has whiz-bang solutions like $.each, but no convenience functions to build this-bound delegates; it makes it super easy to build HTML, but in a way that encourages XSS-vulnerable code; and it supports xmlns selectors but doesn't make them compatible with IE.  That's just off the top of my head.

[/rant]

But yeah, it's popular and people are learning it, so it makes sense.

-[Unknown]
34
Off-topic / Re: Post-XSS scenarios and database driven sessions
« on June 21st, 2011, 05:14 AM »
Quote from Dragooon on June 21st, 2011, 01:53 AM
- IP checking (Can't find a way around it)
- Small cookie time length
IP checking is okay, if you're sure none of your users are on AOL (whose proxies could change a user's IP address from one request to the next).

Small cookie timeouts are a good idea, as long as you use a session keepalive (or are fine getting logged out constantly.)

The best way to do "remember me" is as another form of session (just one that isn't garbage collected), with each computer getting its own token.  That lets you offer a button to log out your other computers, and so on.  SMF tries to do this and does okay, but it doesn't give each computer a separate token, which is the better way.
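A minimal in-memory sketch of per-computer "remember me" tokens, written in Python for illustration. `RememberMeStore` and its methods are hypothetical; a real implementation would keep this table in the database, and hashing the stored tokens is an extra design choice, not something the post prescribes.

```python
import hashlib
import secrets
import time


class RememberMeStore:
    """Hypothetical per-device "remember me" tokens, kept like long-lived sessions.

    Each login from a new computer gets its own token, so one device can be
    logged out without touching the others. Only a SHA-256 hash of each token
    is stored, so a leaked table doesn't hand out working cookies.
    """

    def __init__(self):
        self._tokens = {}  # sha256(token) -> (user_id, device label, created at)

    @staticmethod
    def _digest(token):
        return hashlib.sha256(token.encode()).hexdigest()

    def issue(self, user_id, device):
        token = secrets.token_urlsafe(32)
        self._tokens[self._digest(token)] = (user_id, device, time.time())
        return token  # this value goes into the browser's cookie

    def user_for(self, token):
        entry = self._tokens.get(self._digest(token))
        return entry[0] if entry else None

    def devices(self, user_id):
        return sorted(d for (u, d, _) in self._tokens.values() if u == user_id)

    def log_out_other_devices(self, user_id, keep_token):
        keep = self._digest(keep_token)
        for digest, (u, _, _) in list(self._tokens.items()):
            if u == user_id and digest != keep:
                del self._tokens[digest]
```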

-[Unknown]
35
Off-topic / Re: htmlspecialchars while inserting into DB
« on June 21st, 2011, 05:03 AM »
Quote from Dragooon on June 20th, 2011, 09:59 PM
Does this make sense? I've been wondering, wouldn't mysql_real_escape be enough while appending to DB?
I think it doesn't make sense.  I say this even though I think SMF does it, for posts at least; there are many mistakes I either made or didn't fix when I wrote SMF.

From a security standpoint, the assumption should be that everything else has been compromised.  From an optimization standpoint, the assumption should be everything else is perfect.  Quality lives somewhere between the two.

Generally, I cast things to int even when they come from the database, because I don't know whether my query had a SQL injection, or something went wrong during insertion, or another piece of software was compromised and gained access to my database.  The less an attacker can "gain" even after successfully exploiting a small hole, the better.

Plus, htmlspecialchars'ing before you insert in the database increases space requirements, and makes integration with other systems harder.
Quote from CJ Jackson on June 20th, 2011, 10:35 PM
I never used mysql_real_escape, I always used prepared statement because I find them a lot easier.
If your connection uses ISO-8859-1, ASCII, or UTF-8, then as long as you escape all the right characters, you are safe.  If your database connection (not your output) ever uses a multibyte charset other than UTF-8, such as Shift_JIS or Big5, then it matters a lot and you MUST use mysql_real_escape_string.
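For comparison, here is a minimal sketch of the prepared-statement route, using Python's built-in `sqlite3` as a stand-in for MySQL (the table and helper names are made up). Because the value is bound separately from the SQL text, the application never escapes anything, and the stored value comes back exactly as it went in, with no slashes to strip.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, subject TEXT NOT NULL)")
    return conn


def add_topic(conn: sqlite3.Connection, subject: str) -> int:
    # The ? placeholder is bound by the driver; quotes and backslashes in
    # `subject` are stored literally and can't alter the statement.
    cur = conn.execute("INSERT INTO topics (subject) VALUES (?)", (subject,))
    return cur.lastrowid


def get_subject(conn: sqlite3.Connection, topic_id: int) -> Optional[str]:
    row = conn.execute("SELECT subject FROM topics WHERE id = ?", (topic_id,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = make_db()
    tricky = "It's a \\ test'; DROP TABLE topics; --"
    topic_id = add_topic(conn, tricky)
    assert get_subject(conn, topic_id) == tricky  # no slashes added, nothing to strip
```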
Quote from CJ Jackson on June 20th, 2011, 10:35 PM
I always used htmlspecialchars for data that goes in a html attribute, I rarely use them for anything else.
Not just attributes, but also text.
Quote from CJ Jackson on June 20th, 2011, 10:46 PM
Also the data that comes out of the database tends to be dirty, it like to keep the back slashes, so clean it with stripslashes() and always before htmlspecialchars().
You shouldn't escape twice (or let magic_quotes_gpc get in your way) when inserting.  If you have to strip slashes on the way out, you're storing the data with slashes in the database.
Quote from Arantor on June 21st, 2011, 12:05 AM
SMF (and Wedge)'s specific brand of content encoding going into the DB, where bbcode is concerned at least, is slightly odd, and one day I'll figure out exactly why it was done the way it was (remove newlines, expressly inject br tags into the stored content, after htmlspecialchars has been run)
This may have been done in the name of optimization; some parts of it predated me and some parts were my fault.  I saw bbc as just as scary a beast as anyone did, so I was happy just to make sure it worked.  Ultimately, I think the original text should be stored, but that becomes an upgrade problem (and who wants the upgrade on large forums to rewrite all the text?)
Quote from Arantor on June 21st, 2011, 12:05 AM
@CJ Jackson: I'd rather not be trying to sanitise on output, I'd rather sanitise it when capturing it so that if something screwball tries to dump the contents of the DB, it's still going to be safe because there isn't anything dirty in the DB.
No reason not to sanitize both ends, IMHO.
Quote from Arantor on June 21st, 2011, 01:32 AM
At least with SMF, mods are vetted and generally have had oddities weeded out
I like the methodology I tried to push in SMF: if it's broken, fix it.  Same with XMLHttpRequest, etc.  I'll never understand why most devs don't go that way.  My company has a standard test with three lines that make XMLHttpRequest work in all browsers (yes, this test is getting a bit old...).  We watch whether developers (a) remove the line and use something else, e.g. jQuery, (b) use XMLHttpRequest directly, or (c) leave the line in AND use their own XMLHttpRequest/ActiveXObject code (which is much longer and harder to read).

Of the people who take it, I think fewer than 10% pick (a) or (b); 90% take route (c).  It baffles me.

That said, when people do detection or don't check ini settings properly, "just fixing it" can make integration harder.

-[Unknown]
36
Features / Re: Optimize release images
« on June 20th, 2011, 09:39 AM »
Quote from Arantor on June 20th, 2011, 09:19 AM
The avatars folder, yes, they weren't optimised much when I put the xkcd pack together originally, but the rest of the images should be optimised.
cd Themes/default/images

svn ls -R | xargs wc -c
 170768 total

find . -name "*.png" -print0 | xargs -0 optipng -o7
find . -name "*.png" -print0 | xargs -0 -n1 pngout

svn ls -R | xargs wc -c
 165021 total

cd avatars/xkcd

svn ls -R | xargs wc -c
 304040 total

find . -name "*.png" -print0 | xargs -0 optipng -o7
find . -name "*.png" -print0 | xargs -0 -n1 pngout

svn ls -R | xargs wc -c
 265055 total

cd media/icons

svn ls -R | xargs wc -c
  59003 total

find . -name "*.png" -print0 | xargs -0 optipng -o7
find . -name "*.png" -print0 | xargs -0 -n1 pngout

svn ls -R | xargs wc -c
  53070 total

That is, of course, counting all images, not just PNGs.  Not an immense saving, but a saving.
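To put the totals above in perspective, here is the arithmetic in Python. It only reuses the byte counts printed by `wc -c` in the transcript, which include GIFs and JPEGs as well as PNGs; the directory labels are the ones the transcript `cd`s into.

```python
# Byte totals reported by `svn ls -R | xargs wc -c` above: (before, after).
RESULTS = {
    "Themes/default/images": (170768, 165021),
    "avatars/xkcd": (304040, 265055),
    "media/icons": (59003, 53070),
}


def savings(before, after):
    """Return (bytes saved, percent saved)."""
    saved = before - after
    return saved, 100.0 * saved / before


for name, (before, after) in RESULTS.items():
    saved, pct = savings(before, after)
    print(f"{name}: saved {saved} bytes ({pct:.1f}%)")

total_before = sum(b for b, _ in RESULTS.values())
total_after = sum(a for _, a in RESULTS.values())
saved, pct = savings(total_before, total_after)
print(f"total: saved {saved} bytes ({pct:.1f}%)")
# Themes/default/images: saved 5747 bytes (3.4%)
# avatars/xkcd: saved 38985 bytes (12.8%)
# media/icons: saved 5933 bytes (10.1%)
# total: saved 50665 bytes (9.5%)
```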

-[Unknown]
37
Features / Re: Optimize release images
« on June 20th, 2011, 09:11 AM »
Quote from Nao/Gilles on June 20th, 2011, 08:47 AM
I believe I've already run everything through pngquant. I do it systematically for new files and keep a 32 bit copy in the other/images folder as well. ^_^
Just to clarify, I ran a quick optipng before posting this, and was able to shrink almost every PNG.  And as mentioned, all of the PNGs in avatars seem to be compressible (most of them by converting them to grayscale, which gives pretty good savings).  I'm not suggesting converting the GIFs or JPEGs to PNG, as they would likely get larger.
Quote from Nao/Gilles on June 20th, 2011, 08:47 AM
I dont automate it because png8 sometimes has awful results in ie6 btw.
How do you mean?  Aside from not supporting alpha transparency (in 8-bit or 24-bit PNGs), I'm not aware of any awful results in IE6.  I must have missed something; what problems are there?

-[Unknown]
38
Features / Optimize release images
« on June 20th, 2011, 03:35 AM »
Running optipng on most images can really improve things.  At a cursory glance, it appears that many PNGs, including the avatars, can be optimized for some amount of savings.

I suggest adding it to a release script, so it's never forgotten, something like this:

find Themes avatars media -name "*.png" -print0 | xargs -0 --no-run-if-empty other/tools/optipng -o4

Or even better, run it more often (e.g. when checking in png changes) and then it only has to process the changed files, which is much quicker.
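A minimal sketch of the "only process changed files" idea, in Python. Instead of hooking into the version control check-in, it uses a hypothetical stamp file and file modification times to find PNGs changed since the last run, then hands them to optipng. The function names and the stamp file are assumptions; the `-o` level mirrors the command above, and optipng must be on the PATH when `dry_run` is off.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Optional


def changed_pngs(roots: Iterable[Path], stamp: Path) -> List[Path]:
    """PNG files under `roots` modified after the stamp file (all PNGs if no stamp yet)."""
    since = stamp.stat().st_mtime if stamp.exists() else 0.0
    found = [
        path
        for root in roots
        for path in Path(root).rglob("*.png")
        if path.stat().st_mtime > since
    ]
    return sorted(found)


def optimize_changed(roots: Iterable[Path], stamp: Path, level: int = 4,
                     dry_run: bool = True) -> Optional[List[str]]:
    """Run optipng on changed PNGs only; returns the command, or None if nothing changed."""
    files = changed_pngs(roots, stamp)
    if not files:
        return None
    cmd = ["optipng", f"-o{level}", *map(str, files)]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises if optipng fails
        stamp.touch()                    # the next run only sees newer files
    return cmd


if __name__ == "__main__":
    print(optimize_changed([Path("Themes"), Path("avatars"), Path("media")],
                           Path(".optipng-stamp")))
```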

For example, the avatars directory could currently save 40 KB if it were losslessly compressed (about 12%).  This affects both distribution bandwidth and, obviously, admins' bandwidth.  Losslessly optimizing JPEGs is a good idea too.

-[Unknown]
39
Features: Theming / Re: WeCSS: the Wedge CSS parser
« on June 20th, 2011, 02:14 AM »
I think you may be dreaming on the documentation front, but who knows, luck happens.

Well, I'm currently more of a fan of LESS anyway (in part because it has a client-side JS implementation in addition to JS and PHP server-side implementations).  I know Compass is uber popular, but it doesn't interest me.  I can see making the curlies optional, like with if in PHP, but in general I just like curlies.

I definitely want to give it a try.  Are you developing it just as part of Wedge, or as a discrete module?  For any parsing system, like TOX-G, LESS, Sassy, or WeCSS, I definitely think test-driven development makes sense.  Do you have tests, or are you just using the Wedge CSS as the test suite for now?

-[Unknown]
40
Features: Theming / Re: CSS and JavaScript minification
« on June 20th, 2011, 01:47 AM »
Quote from Nao/Gilles on June 20th, 2011, 12:37 AM
Yes, it's based off Packer 3.1. Dean Edwards has been working on Packer 4.0, though, and he told me he'd be putting the PHP version online soon.
Sure, I'd like access, if only for this.  I considered spending the time to fix the PHP base2 and stuff, and even fixed it some, but I figured I'd check with him to see if it was a known issue he was fixing.  Never got a reply, so I left it alone.
Quote from Nao/Gilles on June 20th, 2011, 12:37 AM
Regarding semicolons, I provided a fix in the source file, but I commented it out because I think it's best to educate devs into using semicolons as much as possible.
I agree; keep it without the fix, but report an error message with file and line info.  I'd also prefer this for trailing commas, since I want to be able to develop against the unpacked JS, even in IE.
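Reporting with file and line info doesn't take much. Here is a rough Python sketch (a hypothetical helper, not part of the packer) that flags trailing commas before `}` or `]`, which older versions of Internet Explorer don't handle the way other browsers do, and reports them in `file:line:` form. It's only a regex heuristic, so commas inside strings, comments, or regex literals can produce false positives.

```python
import re

# A comma followed (possibly across whitespace and newlines) by } or ]
TRAILING_COMMA = re.compile(r",\s*([}\]])")


def find_trailing_commas(source: str, filename: str = "<js>") -> list:
    """Return one "file:line: message" string per suspected trailing comma."""
    errors = []
    for match in TRAILING_COMMA.finditer(source):
        # Line of the comma itself: count newlines before it.
        line = source.count("\n", 0, match.start()) + 1
        errors.append(f"{filename}:{line}: trailing comma before '{match.group(1)}'")
    return errors
```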

I definitely wouldn't want it to just pack my CSS or JS on every page view, not only because of the cost of the mtime IO checks, but also because it makes debugging far harder.  I don't know if you do JS debugging, but I definitely do, and debugging packed JS is like debugging an optimized exe with no debug info.

That's why I'd suggest a dev mode switch.  It also means you can have dev mode apply only to administrators (with a conspicuous message, like when you leave upgrade.php uploaded), so that you can "stage" JS and tox changes, test them, and then "push them live" for everyone with the click of a button.
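A minimal sketch of such a switch, in Python for illustration: with dev mode on, administrators get the unpacked source files (plus a conspicuous notice), while everyone else keeps getting the last packed build until it's rebuilt. The `User` type, the function names and the URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class User:
    name: str
    is_admin: bool = False


def script_urls(user: User, dev_mode: bool, sources: Sequence[str],
                packed_url: str) -> List[str]:
    """Pick which JS files a page should reference.

    In dev mode, administrators get the unpacked sources (easy to debug, and
    where staged changes live); everyone else keeps the last packed build
    until an admin "pushes it live" by rebuilding it.
    """
    if dev_mode and user.is_admin:
        return list(sources)
    return [packed_url]


def admin_notice(user: User, dev_mode: bool) -> Optional[str]:
    """A conspicuous banner so nobody forgets dev mode is on."""
    if dev_mode and user.is_admin:
        return "Development mode is ON: you are seeing unpacked, unpublished scripts."
    return None
```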

Also, are CSS/JS files served with the overhead of PHP?  I can show benchmarks demonstrating that even on nginx/php-fpm, this overhead is not negligible.  I suggest providing some way to avoid it where possible, as Apache and nginx both have deflate mechanisms, and whether they're available can be detected.
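One way to keep compression without routing CSS/JS through PHP is to write the packed file, plus a pre-gzipped twin, to disk at build time and let the web server serve them as static files. Here is a minimal Python sketch; the helper name is hypothetical, and it assumes the server is configured to prefer the `.gz` file for clients that accept gzip (nginx's gzip_static module is one example).

```python
import gzip
from pathlib import Path


def write_precompressed(path, content: str) -> Path:
    """Write a packed asset and a .gz twin next to it; returns the .gz path.

    The web server can then serve either file directly, so no PHP runs for
    CSS/JS requests and nothing is compressed per request.
    """
    path = Path(path)
    data = content.encode("utf-8")
    path.write_bytes(data)
    gz_path = path.with_name(path.name + ".gz")
    gz_path.write_bytes(gzip.compress(data, compresslevel=9))
    return gz_path
```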

Oh, and I automate everything I can.  Take a look at my TOX-G makefile: I don't leave it to chance whether I remember the copyright year, or to run the tests, etc.  And I have packing, optipng, etc. all automated as well.  If I can remove the human component, I do.

-[Unknown]
41
FAQs / [FAQ] Re: Minimum requirements
« on June 19th, 2011, 08:33 PM »
Would that still make it a requirement?  If an optional feature doesn't work, it seems like a recommended thing, not a minimum requirement.

When I buy a game and don't see fancy shadows, it's not because my video card doesn't meet the minimum requirements... it's because it doesn't meet the recommended ones.

-[Unknown]
42
Features: Theming / Re: Template blocks
« on June 19th, 2011, 04:33 PM »
Quote from Nao/Gilles on June 19th, 2011, 03:40 PM
Or defined on the settings.XML override file yes.
It is simplistic but it's a good compromise.
Of course ideally I'd be using tox but I'm not sure themers wouldnt be lost. There are so many changes in Wedge already.
Posted: June 19th, 2011, 03:38 PM

Oh. And I believe there are several private topics regarding the birth of this feature and comparisons with tox.
I see it in the post now; sorry for missing it.  I was just asking.

-[Unknown]
43
Off-topic / Re: A PHP fork?
« on June 19th, 2011, 11:35 AM »
I find it very interesting.  Honestly, I wish I had the time to sink my teeth into something like this.  These are some good improvements, and if things like this are getting rejected, I wonder how much good a fork could do in the world...
Quote from Arantor on June 16th, 2011, 01:26 PM
Forcing users to use GET and POST, rather than an ambiguous source is a nice step, though honestly I'd love to see a proper taint detection method such as in Perl, where you explicitly can't do anything to input without some kind of sanity check first.
Yes, although annoying, I agree.
Quote from Arantor on June 17th, 2011, 05:12 PM
I don't know which off the top of my head, but if it works how I think it works, it'll be GET - because what it can do is inject a <script> tag into the DOM for the browser to fetch the contents dynamically - and it'll be JSON when it comes in, presumably.
This is called JSONP, and it has problems with more than 4K of data (anything you send has to fit in the URL of a GET request).  Some of the "easy" things jQuery does don't necessarily encourage best practice (I know this from having to do code review at work).
Quote from Eros on June 18th, 2011, 02:37 AM
....I wouldn't call the POS IDEs for PHP proper IDE's either. Then again, the only thing I think Microsoft ever did right was Visual Studio so....:/

Indeed.  I use Phalanger myself for PHP and it works great.  I bet it could be hacked relatively easily into supporting a JSON-like array syntax.

-[Unknown]
44
FAQs / [FAQ] Re: Minimum requirements
« on June 19th, 2011, 11:04 AM »
I wouldn't waste my time with the Chrome 1 requirement.  You'd be surprised at how hard it is to lock Chrome to a single version.  I don't think it's worth worrying about anything older than 6 or 8 at this point, and probably not even that far back.

Why might Flash be required?
Quote from Nao/Gilles on March 28th, 2011, 09:43 AM
We included jQuery but mainly for user interest. i.e. we don't actually need it ourselves, but we figured that since it's the de-facto library for developers, many would be happy to see it included by default, avoiding the need to include the library separately (potentially breaking other mods using it in the first place.) Everything is taken care of for them. However, early versions of $ had a reasonable filesize. Now, v1.4.4 is about 24kb after gzipping, which is scandalously big. It pretty much makes it hard for 56k modem users to run a site that uses jQuery. And the newest 1.5.x branch is even worse (I suspect v1.5.3 will finally reach the 30kb limit.)
I am constantly astounded by how many nifty features (like deferreds) I never have any use for, and how many much simpler core things I've used in my own JS libraries it completely lacks (like currying).
Quote from Nao/Gilles on March 28th, 2011, 09:43 AM
I guess I still haven't found my peace with jQuery. It has some nice features, but nothing that we couldn't implement ourselves. Hopefully, v2.0 will be modular... But I suspect that even if they added some kind of modularity to it, it would still be bigger than the entirety of 1.4.x.
Ha.  Well, at least they definitely know what they're doing.  It's becoming a standard, though, to the point that I wouldn't be surprised if it goes the way C did: Intel optimizes CPUs for C, so if you don't use C, your code is slower.  This is why all languages are based on C's rules nowadays.  I expect Jaeger, V8, etc. to get optimized for jQuery, so it's probably going to become a standard.  If it keeps getting more popular, I suspect browsers will at some point bundle and detect it for further perf wins (since we're currently in a perf war).
Quote from Nao/Gilles on March 28th, 2011, 07:17 PM
Re: filesize, it's also less of an issue when files are loaded at the end. That's where perceived loading times come into play. The only thing that's slower in that situation is the execution of JS functions. But that only means your code should behave as if the browser doesn't support JavaScript for a couple of seconds, and then JS takes over. It's as easy as that. (Of course it's not really exciting to write fallbacks for non-JS but you can always simply skip that and just expect people NOT to click everywhere in the first two seconds of loading your website for the first time... Which is, let's just say it, totally unrealistic. The act of clicking so quickly, I mean.)
You can get some good wins out of deferring some of the JS.  I'm going more the JS app route myself, so a mere 20K doesn't sound like a lot (although it all adds up).  I'm currently at about 270KB (before deflate or minification; about 41KB after), but I late-load about half of that.  This is for a relatively complicated piece of internal software, though.

-[Unknown]
45
Features: Theming / Re: WeCSS: the Wedge CSS parser
« on June 19th, 2011, 10:42 AM »
I belong to the school of semantics, by far, although I think using inheritance is only sensible.

I know I'm often the first person to write my own version of something, and I love learning from it and seeing if I can do better.  Here as well, I think writing a PHP version makes a lot of sense.  But considering Sassy recently moved to a CSS-style syntax, and I personally hate Python-style syntax, perhaps it makes sense to gravitate to the rules of another syntax?

That could make the documentation needs lighter.

-[Unknown]