Remove nested quotes is broken

live627

  • Should five per cent appear too small / Be thankful I don't take it all / 'Cause I'm the taxman, yeah I'm the taxman
  • Posts: 1,670
Remove nested quotes is broken
« on September 4th, 2011, 08:55 AM »
The feature to remove nested quoting is broken a la http://www.simplemachines.org/community/index.php?topic=450841.0

I then tested it myself. Ironically, it removed my second first-level quote!

So I spent the evening trying to fix it. I didn't try the proposed tweak, because I felt that it wouldn't do it right. Instead, I sniffed around and ended up with this here'code[1]:

Code: [Select]
// Remove any nested quotes, if necessary.
if (!empty($modSettings['removeNestedQuotes']))
$pattern = "/\[quote(.*?)\](((?R)|.)*?)\[\/quote\]/is";
preg_match_all($pattern, $form_message, $matches, PREG_SET_ORDER);
foreach ($matches as $match)
{
$block = preg_replace(array('~\[quote(.*?)\](((?R)|.)*?)\[\/quote\]~is', '~^\n~', '~\[/quote\]~'), '', $match[2], -1, $count);
$form_message = str_replace($match[2], $block, $form_message);
}
 1. A bit messy, could use a bit of cleanup. Maybe it could be a user option instead of an admin setting.
A confident man keeps quiet.whereas a frightened man keeps talking, hiding his fear.

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Remove nested quotes is broken
« Reply #1, on September 4th, 2011, 12:15 PM »
I didn't even know it didn't work in the first place... :^^;:

Interestingly, since you have SVN read access, go to /other/wip/Post-alt-quotes.php and have a look at my code... The code that will turn quotes into ">" prefixed lines uses that:

Code: [Select]
while (preg_match_all('~\[quote(.*?)]((?:[^[]|\[(?!/?quote))+)\[/quote]~', $form_message, $matches))

Code means (for the non-regex-experts): match a string starting with [ quote] or [ quote something], then anything that isn't "[/quote]" (that's the complicated code in the middle!), and then[/quote].
Which is pretty much what (?R) (aka recursive regex) will do (except that (?R) doesn't require removing the quote tags later to avoid matching them again, obviously...)
I try to avoid using (?R) these days because of the difficulties I had fixing the server crashes that Karl experienced on AEVAC... (Heck, I don't even know if recursive regex won't start failing if we end up removing the size limit on posts like Pete suggested this week...)

Anyway. I don't really know what's best...?
Try adding (?>...) in your recursive regex, BTW. You can find examples in Aeva-Embed.php...

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Remove nested quotes is broken
« Reply #2, on September 4th, 2011, 01:07 PM »
Quote
Heck, I don't even know if recursive regex won't start failing if we end up removing the size limit on posts like Pete suggested this week...
There is a distinct increase in chance it will fail, in one of two ways.

PHP has its own time limit, of course, but there is also a secondary performance limiter in regexp which is the backtrack limit. For those not so hot on regexp, every time the engine steps through an expression that it's started to match (first character matches, second character matches, third character doesn't, for example), it has to go back to where it started to re-evaluate the next character, which is a backtrack.

There is a limit on the number of backtracks allowed in a single overall call to the preg_ functions, and it's usually pretty high (I seem to recall the default being 100k or so), but when you get recursive calls, particularly if the input is designed so it would quite badly end up backtracking as many times as possible (this is known as the pathological case, or the grounds for ReDoS), it starts to hit that limit.
When we unite against a common enemy that attacks our ethos, it nurtures group solidarity. Trolls are sensational, yes, but we keep everyone honest. | Game Memorial

live627

  • Should five per cent appear too small / Be thankful I don't take it all / 'Cause I'm the taxman, yeah I'm the taxman
  • Posts: 1,670

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Remove nested quotes is broken
« Reply #4, on May 31st, 2012, 11:15 PM »
Enabled feature on my local install... Added two quotes with a comment in between... Hit quote on that post... Both quotes were removed correctly, and the comment was left in.
Doesn't seem broken to me..?

live627

  • Should five per cent appear too small / Be thankful I don't take it all / 'Cause I'm the taxman, yeah I'm the taxman
  • Posts: 1,670

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Remove nested quotes is broken
« Reply #6, on May 31st, 2012, 11:23 PM »
It's not possible in the first place, because the first level is already removed...

Oh, you mean, as if the feature was disabled, a quote pyramid was made, and then the feature was enable...?

live627

  • Should five per cent appear too small / Be thankful I don't take it all / 'Cause I'm the taxman, yeah I'm the taxman
  • Posts: 1,670

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Remove nested quotes is broken
« Reply #8, on May 31st, 2012, 11:28 PM »
Even weirder... I made a pyramid, and it doesn't fail at all. Everything works.
Posted: May 31st, 2012, 11:28 PM

Hmm, no, it removed too much actually. A bit.
Will fix asap.

live627

  • Should five per cent appear too small / Be thankful I don't take it all / 'Cause I'm the taxman, yeah I'm the taxman
  • Posts: 1,670
Re: Remove nested quotes is broken
« Reply #9, on May 31st, 2012, 11:34 PM »
Quote
it removed too much actually
Yep. I know you're good at tweaking regexes so I'm sure you'll be able to find a better way to fix it than me.

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Remove nested quotes is broken
« Reply #10, on May 31st, 2012, 11:49 PM »
This one seems to work fine...

Code: [Select]
if (!empty($settings['removeNestedQuotes']))
while (preg_match_all('~\[quote(.*?)]((?>[^[]|\[(?!/?quote))+)\[/quote]\n?~i', $form_message, $matches))
foreach ($matches[0] as $match)
$form_message = str_replace($match, '', $form_message);

It's not uber-optimized (if only because it does the replacement in X passes, with X = quote depth level), but at least it should be relatively safe with backtracking. I don't think it could lead to regex abuse. As I said above, I try to stay away from recursive regexes as much as possible these days, but it'd be great if you could do some quick benchmarking on heavy pyramids... (I don't think the performance difference will be noticeable if quote depth <= 1)
Posted: May 31st, 2012, 11:44 PM

:edit:

Code: [Select]
while (preg_match_all('~\[quote(.*?)](?>[^[]|\[(?!/?quote))*\[/quote]\n?~i', $form_message, $matches))

Should be a tiiiiny bit faster... (and take into account possible empty quotes.)
Posted: May 31st, 2012, 11:46 PM

And that.

Code: [Select]
while (preg_match_all('~\[quote(.*?)](?>[^[]|\[(?!/?quote))*\[/quote]\n?~i', $form_message, $matches))
$form_message = str_replace($matches[0], '', $form_message);

Funnier.

MultiformeIngegno

  • Posts: 1,337

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Remove nested quotes is broken
« Reply #12, on June 3rd, 2012, 07:38 PM »
There's nothing more powerful when it comes to analyzing strings.
Although there are faster solutions if the regex is simple enough...