Nao

  • Dadman with a boy
  • Posts: 16,082
Fixing mismatched BBCode
« on December 4th, 2011, 11:59 AM »
The code in Class-Editor that straightens out missing code tags is quite odd to look at... Some variables are initialized and never actually used, etc. I looked into it and decided to rewrite it. As a result, I did pretty much the same thing -- in only three lines of code instead of 10.

Now, the thing is that the I'm using a $num_tags variable instead, which is incremented or decremented based on whether it meets an opening or closing tag. Although technically it works (if you have two opened code tags, it will addd two closers), in reality it doesn't work as expected. Two examples:

- if you first use a closer, then an opener, $num_tags will be set to zero and nothing will be fixed. The code needs to be smarter than that.
- if you use two openers and a closer, Wedge will add an extra closer... Except that code tags within other code tags are treated as code (not tags), so they don't need a closer. It's always been a mess in SMF because if you add two closer tags here, the second one is shown as plain text, instead of the first one. Here, I'd tend to say this is a SMF bug that needs to be fixed in Wedge... What do you think?

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Fixing mismatched BBCode
« Reply #1, on December 4th, 2011, 12:57 PM »
What needs to happen is that it needs to understand nesting hierarchy properly, not just for code tags (which will solve other issues of stupidity)

As an exercise, modify the post in the DB or hand it directly to parse_bbc to see what that does. That actually keeps track of hierarchy to a point.

Code tags inside code tags won't (and can't, by definition) ever nest properly. The innermost closer will always be hit first, ending the tag, so what comes after it will always be wrong as it will be the remainder of the code tag outside the original code tag. And it cannot and should not be assumed that a code tag nested inside a code tag will have an inner closer either.
When we unite against a common enemy that attacks our ethos, it nurtures group solidarity. Trolls are sensational, yes, but we keep everyone honest. | Game Memorial

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Fixing mismatched BBCode
« Reply #2, on December 4th, 2011, 11:13 PM »
Yeah, it's all a bit complicated... If in SMF you type /code then code, it will simply add the closing code at the end, and leave the first /code as is.

Here's my original rewrite:

Code: [Select]
if (preg_match_all('~\[(/)?code[^]]*\]~i', $message, $matches))
foreach ($matches[1] as $tag)
$num_tags += (empty($tag) ? 1 : -1);

// If we have open tags, close them.
while ($num_tags > 0 && $num_tags--)
$message .= '[/ code]';

// Open any ones that need to be open, only if we've never had a tag.
while ($num_tags < 0 && $num_tags++)
$message = '[code]' . $message;

I think it's quite elegant, but it has the problems I mentioned above.
Now... Why exactly is it so important to have matching tags? AFAIK, because the parser doesn't do recursive regex magic, it doesn't really care whether there are mismatched tags. In fact, it's even worthless to add a code tag before the post is there's an extra /code overall: because the /code is probably a mistake on the user's part, and I'm sure they'd rather see an untransformed post, rather than everything turned to a large code block for no reason whatsoever (in their opinion)...

So, technically, there is not even a strong reason to have the last 3 lines in the code above. It won't change a thing...
Heck, could probably even get rid of it all... :-/

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Fixing mismatched BBCode
« Reply #3, on December 4th, 2011, 11:17 PM »
Quote
Why exactly is it so important to have matching tags?
Generally, it's not a huge deal because the parser will close any tags at the end (so that the tag endings will be output correctly) but for code tags, processing is quite a bit different in the bowels of the bbc parser, because the content is actually split by code tag, and the loop can safely deal with extra /code closers but it'll get very upset if there's more openers than closers.

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Fixing mismatched BBCode
« Reply #4, on December 4th, 2011, 11:44 PM »
So... How about we forget about closers, and only deal with fixing openers?

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Fixing mismatched BBCode
« Reply #5, on December 4th, 2011, 11:53 PM »
The parser logic is smart enough to cope with closers (even respecting block level vs non block level tags to a point) - except for code closers. What do you have in mind as far as fixing openers goes?

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Fixing mismatched BBCode
« Reply #6, on December 5th, 2011, 12:38 AM »
So, do you mean that the code I posted above would be 'perfect' for Wedge?

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Fixing mismatched BBCode
« Reply #7, on December 5th, 2011, 12:54 AM »
Where does that code run, exactly?

(Haven't looked at the commit yet)

Nao

  • Dadman with a boy
  • Posts: 16,082

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Fixing mismatched BBCode
« Reply #9, on December 5th, 2011, 09:45 AM »
Hmm, that's just before the splitter code runs, interesting.

I don't think you can cause any kind of vulnerability that easily with it but you can cause all kinds of mess with it if not careful, because the splitter explicitly relies on having matched tags when it does the split.

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Fixing mismatched BBCode
« Reply #10, on December 5th, 2011, 10:27 AM »
I'm actually considering rewriting the splitter to use the strpos (with offset) technique I used a few months ago... It's probably going to be faster, and more solid.

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Fixing mismatched BBCode
« Reply #11, on December 5th, 2011, 10:31 AM »
*nods* Sounds like a plan to me :)

Though at some point the preparser needs to be gutted to actually examine the post with respect to the tag's definitions, rather than relying on its own defined rules (so that table->tr->td isn't defined through some esoteric regexp but because that's how the bbc rules define it - which means it becomes possible to add attributes, or even, dare I suggest a th tag)

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Fixing mismatched BBCode
« Reply #12, on December 5th, 2011, 11:48 AM »
But that would mean a fuller rewrite of preparsecode, which I'm not ready/willing to do... ;)

So, I'm looking at the code tag stuff and all I see is that it's mainly used to deal with special tags only when they're outside code tags. It'd be more solid to actually search for [ code] from the current offset, make a string out of it, deal with it, then search for the next [ /code] (if not found, add [ /code] at the end of the post), set it as the new offset, and loop.

PS: long PMs eh...! Will answer later today.

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Fixing mismatched BBCode
« Reply #13, on December 5th, 2011, 12:23 PM »
Quote
But that would mean a fuller rewrite of preparsecode, which I'm not ready/willing to do...
I know, and same here. But that is the eventual goal.
Quote
So, I'm looking at the code tag stuff and all I see is that it's mainly used to deal with special tags only when they're outside code tags.
Well, yes and no. The main things it's used for are to make *sure* that url and img tags are legal. It does also serve to carry out certain other exception handling that won't be applicable inside code tags.

Basically, it's the bit that needs overhauling anyway, so provided that the bits between tags remain accessible for manipulating (bearing in mind that they could be rewritten in the process) I'm cool with overhauling it with strpos.

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Fixing mismatched BBCode
« Reply #14, on December 5th, 2011, 07:21 PM »
My new rewrite is a bit messy and longer. What do you think about this: I'm parsing the message to remove any misplaced code tags. Eg two opening tags? The second one is removed. Same for closing tags. Etc. Not in my pc so I won't elaborate. This of course makes it impossible to type in a code tag inside another but who cares.