Does this make sense? I've been wondering: wouldn't mysql_real_escape_string be enough when writing to the DB?
I never used mysql_real_escape_string; I always used prepared statements because I find them a lot easier.
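For anyone following along, here is a minimal sketch of the prepared statement approach. It assumes PDO with an in-memory SQLite database, and the `members` table is made up for the example; nothing here is SMF or Wedge code.

```php
<?php
// Assumption: PDO with the SQLite driver, using a throwaway in-memory database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)');

// Hostile-looking input is bound as data; it is never spliced into the SQL string.
$name = "Robert'); DROP TABLE members; --";
$insert = $pdo->prepare('INSERT INTO members (name) VALUES (:name)');
$insert->execute([':name' => $name]);

// Reading it back uses the same binding mechanism.
$select = $pdo->prepare('SELECT name FROM members WHERE name = :name');
$select->execute([':name' => $name]);
$stored = $select->fetchColumn();
```

The value comes back exactly as it went in, with no manual escaping step to forget.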
I always used htmlspecialchars for data that goes in an HTML attribute; I rarely use it for anything else.
Also, the data that comes out of the database tends to be dirty: it likes to keep the backslashes, so clean it with stripslashes(), and always do that before htmlspecialchars().
SMF (and Wedge)'s specific brand of content encoding going into the DB, where bbcode is concerned at least, is slightly odd, and one day I'll figure out exactly why it was done the way it was (remove newlines, expressly inject br tags into the stored content, after htmlspecialchars has been run)
@CJ Jackson: I'd rather not be trying to sanitise on output, I'd rather sanitise it when capturing it so that if something screwball tries to dump the contents of the DB, it's still going to be safe because there isn't anything dirty in the DB.
At least with SMF, mods are vetted and generally have had oddities weeded out
I say this even though I think SMF does do it for posts at least, but there are many mistakes I either made or didn't fix when I wrote SMF.
Ultimately, I think the original text should be stored, but it becomes an upgrade problem (and who wants the upgrade on large forums to change all the text?)
No reason not to sanitize both ends, IMHO.
I like the methodology I tried to push in SMF: if it's broken, fix it
Of people who take it, I think <10% pick a or b. 90% pick route c. It baffles me.
I wonder if preparsecode's changing everything into br tags is a holdover from YaBB where everything was in flat files. (I don't know how it was stored, but it seems feasible to me)
That's one of the things we can do something about. We've expressly set ourselves on the path of having an importer rather than just 'upgrading' the existing tables - aside from the fact it lets people run them both side by side to experiment, it also means we can do manipulations along the way like fix any of these things we decide to resolve.
Depends what you mean by sanitise. htmlspecialchars both ends strikes me as a bad idea, for example.
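To illustrate why running htmlspecialchars at both ends is a problem, here is a small sketch with a made-up sample string. The `double_encode` parameter shown at the end is a real htmlspecialchars argument, but whether it is an acceptable workaround is a judgment call, since it also leaves alone any entity the user typed literally.

```php
<?php
$raw = 'Tom & Jerry <3';

// Escaped once, e.g. when saving.
$once = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Escaped again on output: the entities themselves get escaped.
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

// With double_encode = false, existing entities are left untouched.
$guarded = htmlspecialchars($once, ENT_QUOTES, 'UTF-8', false);
```

A browser would render `$twice` as the literal text `Tom &amp; Jerry &lt;3`, which is the visible symptom of double encoding.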
Well, the methodology I was referring to was that all mods using the database should use $smcFunc's functions, and should be using the proper parameterisation, or the proper insert method made available that deals with escaping etc for you.
were still deferred through some "well, it might cause regressions" mantra.
We use jQuery in Wedge. Not just because it means we get to minimise the code to be sent to users (since the admin can pick a CDN copy of jQuery), but also because the time we spend writing JS is shorter, and I suspect less time is spent debugging too. Plus a lot of users do add stuff that makes use of jQuery, so having it in the core means plugins won't each try adding it and fall over each other.
Definitely the br thing was. And I know the nobbc/html thing was just because that was the easiest way.
While I agree, it also means large forums (where's Douglas at anyway?) are a big problem.
Sure. But I explicitly don't agree with the "garbage in, garbage out" philosophy when it comes to security. An example: just because you only put valid filenames into the attachments table doesn't mean you shouldn't validate that assumption on the way out too. This also improves upgrade scenarios anyway (if done right.)
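As a rough sketch of what validating on the way out could mean for the attachments example: the helper name, the whitelist pattern and the directory below are all hypothetical, not how SMF actually stores or checks attachments.

```php
<?php
// Hypothetical helper: re-check a filename read from the DB before touching disk.
function attachment_path(string $baseDir, string $storedName): ?string
{
    // Reject empty names and anything that carries directory components.
    if ($storedName === '' || $storedName !== basename($storedName)) {
        return null;
    }
    // Conservative whitelist; must start with a letter or digit, so "." and ".." fail.
    if (!preg_match('/^[A-Za-z0-9][A-Za-z0-9._-]{0,254}$/D', $storedName)) {
        return null;
    }
    return rtrim($baseDir, '/') . '/' . $storedName;
}

$good = attachment_path('/var/attachments', '12_abc.dat');
$bad  = attachment_path('/var/attachments', '../../etc/passwd');
```

The point is only that the check is cheap and does not rely on the insert path having been correct.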
Well, at work, this is a pretty common mantra. I'm even holding back changes on a project right now for this reason.
Back whenever ago, I posted the same basic thing on simplemachines.org; if you need more RCs, release more RCs. There's no shame in releasing RCs
Well, jQuery is fine. Most people who take the test, even if they add jQuery, leave that line in and don't touch it; they clearly don't understand it. That's what's a bad sign.
My problem with it is that it's got a lot of features, most of which I don't need, and doesn't provide most of the features I do.
I think it's important that some element of that continues to be done, but how far do you go with it? At what point is it acceptable to trade off performance for security?
On the flip side, there are things like topic subjects, which aren't resanitised on display; what's in the DB is assumed to be safe.
It reattributes old posts to a new user, e.g. new account after deletion, but it doesn't do topic ownership at the same time.
You know something's wrong when they're accepting mods on the mod site that are bug-fix mods. I only wish I were kidding.
You can say that about a lot of things, though. It's almost the same argument against something like Microsoft Word: just because you might only use 2% of its features, that sounds like an argument for stripping out the rest - until you realise that everyone else uses a different 2% of its features.
This wasn't done for security reasons (although it's not necessarily a bad thing security wise.)
And lastly because you need to be able to edit your posts, and we knew there would be bugs if we tried to "reverse the translation" - but were worried about the additional storage requirements of storing two copies of every single post.
1. String parsing in PHP is slow.
2. Caching posts needs a bit of a better solution
What I've done is an "indexed cache", which means I mark entries in the cache that are popular, and over time, cache those popular items.
3. The parse_bbc() routine is far from perfect
4. Someone needs to profile parse_bbc()
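For point 2, one possible reading of the "indexed cache" idea is sketched below. The class name, the hit threshold and the in-memory arrays are assumptions for illustration; the actual implementation being described is not shown in this thread.

```php
<?php
// Hypothetical sketch: only cache an item once it has proven popular.
final class PopularityCache
{
    private $hits = [];   // request count per key
    private $store = [];  // cached values for popular keys
    private $threshold;

    public function __construct(int $threshold = 3)
    {
        $this->threshold = $threshold;
    }

    public function get(string $key, callable $compute): string
    {
        if (array_key_exists($key, $this->store)) {
            return $this->store[$key];
        }
        $this->hits[$key] = ($this->hits[$key] ?? 0) + 1;
        $value = $compute();
        // Promote to the cache only after enough requests.
        if ($this->hits[$key] >= $this->threshold) {
            $this->store[$key] = $value;
        }
        return $value;
    }

    public function isCached(string $key): bool
    {
        return array_key_exists($key, $this->store);
    }
}

$cache = new PopularityCache(2);
$calls = 0;
$render = function () use (&$calls) {
    $calls++;
    return '<b>hi</b>';
};
$cache->get('msg:1', $render); // computed, first hit
$cache->get('msg:1', $render); // computed, second hit, now stored
$cache->get('msg:1', $render); // served from the cache
```

Rarely viewed posts never take up cache space in this scheme, at the cost of tracking a counter per key.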
Right. I think htmlspecialchars is relatively cheap, so I'd either do it on display, or else communicate the data in a text format (such as json, although that has its own escaping) and set it as text via the DOM.
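A small sketch of the two options mentioned here, using a made-up subject string. The JSON_HEX_* flags are real json_encode options; the client side (assigning the decoded value to a text node rather than to innerHTML) is assumed and not shown.

```php
<?php
$subject = '<script>alert("x")</script> & more';

// Option 1: escape at display time, right where the value enters the HTML.
$html = '<h1>' . htmlspecialchars($subject, ENT_QUOTES, 'UTF-8') . '</h1>';

// Option 2: send the raw text as JSON. The HEX flags keep <, >, &, ' and "
// out of the payload so it can also be embedded in a page safely.
$json = json_encode(
    ['subject' => $subject],
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

// Decoding gives back the original, unescaped text.
$decoded = json_decode($json, true);
```

Either way the stored value stays as plain text, and the escaping matches the context it ends up in.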
Hmm. I feel like we had something in other/tools/ that did this and did do topic reattribution, which Compuart I think wrote. Weird.
centralized bug fixes in an unofficial package that I updated every day or two.
I'm the guy who always wanted to remove the calendar from the standard distribution of SMF.
Oh? Well, I never went as far back as YaBB or YaBBSE, so I'm looking at it all from SMF 1.1 and later's perspective, and what I've seen said. My understanding was that SMF 1.0 did bbc parsing through regexp, on display, but that a vicious ReDoS could take it out because it wasn't protected against that, hence it's done through string parsing, on a regurgitation basis, and that security was one of the major factors.
Funny you should mention that; that's exactly what the WYSIWYG editor tries to do, badly. So much so that it doesn't bother converting anything other than simpler HTML.
No argument. Interesting comment there: there were plans to rewrite the bbc parser in C under the banner of smflib (not sure if that was after your time or not), but it never really went anywhere. It's still in SMF's SVN, untouched in pretty much forever.
I think whatever happens, we can't really afford to leave PHP; it's not like we can just conjure up a parser in C, and even if we could and did, I'd honestly not want the hassle of support for something like that.
if a post takes over a certain length of time to parse
This sounds like a good idea to me. I'm not sure offhand how we'd do it, but it seems more reasonable than just bulk throwing things at cache.
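One way the "only if a post takes over a certain length of time to parse" idea might be sketched. The function name, the array used as a cache and the threshold value are all hypothetical, and the parser is passed in as a callable rather than calling parse_bbc() directly.

```php
<?php
// Hypothetical sketch: cache a parsed post only if parsing it was slow.
function parse_with_timed_cache(
    string $key,
    string $text,
    callable $parser,
    array &$cache,
    float $minSeconds = 0.05
): string {
    if (array_key_exists($key, $cache)) {
        return $cache[$key];
    }
    $start = microtime(true);
    $parsed = $parser($text);
    $elapsed = microtime(true) - $start;

    // Cheap posts are simply re-parsed next time; expensive ones are kept.
    if ($elapsed >= $minSeconds) {
        $cache[$key] = $parsed;
    }
    return $parsed;
}

$cache = [];
$fast = function (string $t): string { return strtoupper($t); };
$slow = function (string $t): string { usleep(20000); return strtoupper($t); };

parse_with_timed_cache('msg:1', 'quick', $fast, $cache, 10.0);
parse_with_timed_cache('msg:2', 'heavy', $slow, $cache, 0.01);
```

The threshold would need tuning against real posts, which ties back to the point about profiling.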
I really need to get properly familiar with TOX-G's innards. Is there anything specific you're thinking about how it operates that you'd do differently?
*nods* It does need to be done properly. I never seem to find the time to do this though :/
So, then, if we were to htmlspecialchars it on output, presumably we wouldn't do it on saving the data in the first place? Or would we unsanitise it before resanitising it?
I'd really rather not give them any more ways to make it easier to make them insecure.
Right now the calendar is seemingly used by a relative minority. A small but vocal group use it for events, and a larger group turn it on because of birthdays.
I look at WordPress and I see they have it. Different horses for different courses but they have a calendar in the core, enabled by default. Because it suits them to do so.