6916
Features / Re: Fixing mismatched BBCode
« on December 9th, 2011, 12:03 PM »
Then don't change the output for bbcode and just save as raw data and be done with it.
There are many cases where it can be 'fixed' by using CSS and only applying the changes to them, but when you start using JS in a bbc tag...
Not sure I agree with that. I suspect more people would use the WYSIWYG editor if it weren't so buggy. Hell, I could see myself using it if it had proper keybindings (which IIRC some of the editors do support), because it's even faster to hit Ctrl-B to go bold than it is to use the bbcode.
Well, if we could get the wysiwyg editor to actually show quotes as HTML, it would certainly solve a lot of issues. (And that's where my automatic quote splitter would come in very handy, because that'd really be the only way to split quotes at all...)
The biggest stumbling block to WYSIWYG will always be complex bbcode that has no direct equivalent in HTML, e.g. the code or quote tags, not the simple stuff.
It's a brave thing you've taken on there - overhauling the preparser/parser for tag mismatches was always on my todo list but I haven't been brave enough to tackle it just yet.
I'm inclined to go with this, provided that the user is made aware that there was a change and that they might want to review the post (since an extra b closer tag has been added and there is now going to be unparsed b closer left behind) and hopefully they'll spot that the s is unopened.
6917
Features / Re: Fixing mismatched BBCode
« on December 9th, 2011, 10:46 AM »
[quote][i] [b] Hello [/i] [/b] [/s][/quote]
In a situation like this, my original code would first fix the mismatched tags in the middle, then it would look for an opener to 's' and eventually add one at the very beginning... Which would break everything because there's a quote in between. So I added a 'last_safe' variable which pointed out what was the last *safe* place to insert something (i.e. anything BEFORE it is considered valid and thus shouldn't be messed with again.)
Problem is, in a situation like that, the variable would be, at best[1] set to... The s closer's position. So we'd end up with an opener, immediately followed by a closer.
So... I'd like to know which you think is best. Shall we:
1- silently remove closer tags if no openers were found?
[quote][i] [b] Hello [/b] [/i][/quote]
2- add an opener right before them?
[quote][i] [b] Hello [/b] [/i] [b] [/b] [s] [/s][/quote]
3- leave them be, whatever, except maybe for code tags?
[quote][i] [b] Hello [/b] [/i] [/b] [/s][/quote]
Posted: December 9th, 2011, 10:44 AM
:edit: Added a footnote. BTW, my favorite is (1), personally...
[1] I'm saying "at best" because there's still an open quote at this point, so I'd have to implement code that would check LATER tags to make sure the quote is actually closed itself and is thus safe...
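For what it's worth, option (1) could look roughly like the sketch below. It is in Python rather than Wedge's PHP, and `TAG_RE` and `drop_orphan_closers` are made-up names for illustration, not anything from the Wedge source. It only counts openers per tag name, so it assumes the nesting in the middle has already been fixed by an earlier pass.

```python
import re

# Hypothetical tag pattern: [name], [/name] or [name attr=...]
TAG_RE = re.compile(r"\[(/?)([a-z]+)(?:[ =][^\]]*)?\]", re.IGNORECASE)

def drop_orphan_closers(text):
    """Silently remove closing tags that have no opener before them."""
    open_counts = {}  # tag name -> number of openers still waiting for a closer
    out = []
    pos = 0
    for m in TAG_RE.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        is_closer, name = m.group(1) == "/", m.group(2).lower()
        if not is_closer:
            open_counts[name] = open_counts.get(name, 0) + 1
            out.append(m.group(0))
        elif open_counts.get(name, 0) > 0:
            open_counts[name] -= 1
            out.append(m.group(0))
        # otherwise: orphan closer, dropped without a word
    out.append(text[pos:])
    return "".join(out)
```

Fed the output of option (3), this removes the stray b and s closers and leaves the whitespace that sat between them behind.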
6918
Features / Re: Fixing mismatched BBCode
« on December 9th, 2011, 09:59 AM »
- wysiwyg is still hell... Saving as raw data is worse. Especially when you start changing the output for a bbcode...
- the main issue with bbc vs html, I think, is that most forum users are used to bbcode. So, basically, even if you enabled basic html tags by default, we would still have to support bbcode for those who don't know about html etc... (at best, we could turn it into pseudo-html at parse time;)
- wiki markup isn't very popular, I reckon. If popularity wasn't an issue, I'd have switched to SPIP code long ago :P
- overall... yeah, bbcode is about consistency, mainly. Although I suppose nothing prevents us from adding alternative pseudo-code for people to choose from. But it always makes things more complicated.
I've given up on the recursive code. Anything that ruins my sleep is not something I should keep really. It's a little alarm clock in my head. So I'm back to my 'original' code and instead of going through the list of recorded tags, I'm going back through the stack... It's probably not going to give fantastic results - for instance, I was storing the tag type until now. If you were trying to find the best place for an opener for an orphan '/i', Wedge would go through previous tags, spot a closing quote and stop immediately because it can't enclose quotes inside italics. Things like that... That was pretty cool, but it doesn't work for that friggin' closer nb I mentioned before, and one day hunting for a bug is enough. I'll just make it pretty simple. I HOPE. No guts, no glory. Whatever.
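The "stop at a closing quote" idea could be sketched like this, in Python rather than PHP. `BLOCK_TAGS`, `TAG_RE` and `opener_offset` are hypothetical names, and the set of block-level tags is an assumption, not Wedge's actual list. The function returns the earliest offset where an opener for an orphan inline closer could go without reaching back over a block-level tag.

```python
import re

TAG_RE = re.compile(r"\[(/?)([a-z]+)(?:[ =][^\]]*)?\]", re.IGNORECASE)

# Assumed list of block-level tags; the real list would come from the parser.
BLOCK_TAGS = {"quote", "code", "list"}

def opener_offset(text, closer_start):
    """Offset where an opener for the orphan closer at closer_start could go."""
    offset = 0  # no block-level tag in the way: the very beginning is allowed
    for m in TAG_RE.finditer(text, 0, closer_start):
        if m.group(2).lower() in BLOCK_TAGS:
            # An inline tag cannot enclose a block tag, so never go past it.
            offset = m.end()
    return offset
```

With `"[quote]a[/quote] b [/i]"` the opener would land right after the closing quote, so the quote itself is never wrapped in italics.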
6920
Features / Re: Fixing mismatched BBCode
« on December 9th, 2011, 12:25 AM »
- only handles mismatched tags, but it's still a lot. I never really planned to rewrite the entire parser... then again, nothing prevents us from writing similar code for other uses.
- nothing prevents us from doing that either... (?)
- well, i dunno then.
- i certainly like the idea of a very fast parser, but we'd have to determine if it features code that can fix bbc without turning it to html. I doubt it has.
- PEAR isn't important here -- just requires a few rewrites to give up on the dependency. then again -- NBBC is > 120KB (60KB after some sort of minification), so that pear library isn't that horrible to begin with.
6921
Features / Re: Fixing mismatched BBCode
« on December 9th, 2011, 12:12 AM »
Oh... Not a good sign... I'm already lost in my recursive code... :-/
PHP has a PECL library for bbcode parsing, but it only returns html, not fixed bbc.
There's also a PEAR library written in PHP, it seems to be nicely written albeit a bit large (30KB for main source + more for specific tags like url...), and doesn't provide a fixer by default -- but it does seem to fix tags on its side.
I've also found "NBBC" on sourceforge, it has an extensive test suite and documentation. Unfortunately it doesn't provide a 'check' mode either, only converts directly to html but at least this one is BSD.
It doesn't generate opening tags automatically when finding an orphan closed tag, though... So apart from its alleged speed, it's not that interesting. Plus, it's huge.
Quote from Arantor on December 8th, 2011, 11:31 PM
I think the whole point of the html5 tokenizer is to provide for a common ground to handle errors...?
Quote
Yes, that's pretty much what fixNesting does...
Quote
NBBC does that. (Wedge, too, obviously. But not fixNesting.)
Quote
The reporting is already being done... If it were JUST about that, I wouldn't have worked on the alternative 'fixer' today.
Unfortunately I can't really feel satisfied with just the report code because it's not going to be used in quick edit etc.
(Well... Unless we add an errorbox for quick edit, of course. Which would not be such a bad idea...)
Quote
The end of the line? :P
Quote
We could always draw our inspiration from the best elements in wiki code...
Anyway. Time for bed.
I would have posted my 'bad work in progress' from today, but the source code is pretty fucked up (commented out code, echos and print_rs everywhere...), and I don't want to ruin my reputation :P
They still deal with mismatched tags and even malformed tags differently.
Quote
With most of the browsers now using a unified parsing model (the HTML5 parser), it's no longer an issue.
What the parser ultimately does is step through the post and figures out the tags in play, and when it hits a closer (especially of a block level) or certain combinations of block level opener, it reviews all the tags that are open and closes some or maybe all of them.
At each point, not only is the list of open tags maintained, plus block level evaluation, but the potential tags that can be contained inside each other, plus dependencies, are all reviewed too.
Now, if we were to move the full logic from the parser to the preparser, we'd be able to trap improperly nested tags too, and report on them properly.
Here's the problem: what ends the list item? What ends the list?
This is one place where I'm actually slightly envious of wiki markup because it actually does it sanely. It has no assumptions about hierarchy, one line is one list item, and the first blank line after the fact is the end of the list. If only it were that simple with the one character shortcuts.
6922
Features / Re: Fixing mismatched BBCode
« on December 8th, 2011, 11:19 PM »
It's a complicated task at the very best of times :( And it shows in browsers too, when they have malformed tags to deal with, some ignore them totally, some make assumptions with interestingly unpredictable results. I remember having a conversation with a fellow geek back in 2000 which illustrates this perfectly: he was building a site with a big complex data-heavy table in it, and it worked perfectly in IE but broke horribly in Netscape. As I discovered... he wasn't putting any of the closing tags because 'it doesn't need them'. Well, it obviously does!
It's also a rabbit hole of a problem, in that no matter how clever you get, you can pretty much always find another example that will break it. The issue is where the line gets drawn.
This is also why I think we will need to adopt the logic used in the parser to unwind and reprocess the tag nesting, simply because it's a lot more than just having balanced tags. It's a pain to get right but the result will be worth it in the end.
What do you reckon about the special list of tags like x or *? Do we really need them? I actually think they cause more trouble than they're worth - and they're a perfect poster child of why this whole issue is a problem, since no-one seems to know how to actually safely end such a list.
SPIP actually supports turning opening dashes automatically into bullet points. At least it did back in 2003... Ah, good old times.
6923
Features / Re: Fixing mismatched BBCode
« on December 8th, 2011, 10:22 PM »
Yeah... Didn't think about this all...
Well, it's all pretty fucked up really. A waste of my time... -_-
Basically, it works 100%, until you meet some of the more complicated stuff I have in my test case. And I don't see how to fix it without, tadaaam... Another full rewrite!
If I do this... I'll probably have to, hmm... Do it recursively... Oh, I don't like that... :-/
Y'know, like, "if a tag is opened and is not self-closed, re-run the function on the string AFTER that tag, asking for it to return after it meets the closer tag..." And if it meets a closer that's not the one we're expecting, we'll just add our closer there (manually), and return from the function.
What annoys me the most is that every time, I get this very simple idea that ends up being flawed in one aspect or another... I just don't want to spend another day on that.
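Just to pin the recursive idea down, here is a rough Python sketch of it (not the PHP I gave up on, and every name in it is made up for the example). Each opener re-runs the function on what follows; the call returns once it meets its own closer, or adds that closer by hand when it meets a closer belonging to a tag opened further up. One assumption added here: closers that match nothing currently open are simply left alone.

```python
import re

TAG_SPLIT = re.compile(r"(\[/?[a-z]+(?:[ =][^\]]*)?\])", re.IGNORECASE)
TAG_PARTS = re.compile(r"\[(/?)([a-z]+)", re.IGNORECASE)

def tokenize(text):
    # One capturing group, so re.split keeps the tags in the result.
    return [t for t in TAG_SPLIT.split(text) if t]

def parse_tag(token):
    """Return (is_closer, name) for a tag token, or None for plain text."""
    if not TAG_SPLIT.fullmatch(token):
        return None
    m = TAG_PARTS.match(token)
    return m.group(1) == "/", m.group(2).lower()

def fix(tokens, i=0, open_names=()):
    """Return (fixed tokens, next index); open_names lists the tags open so far."""
    out = []
    while i < len(tokens):
        tag = parse_tag(tokens[i])
        if tag is None:
            out.append(tokens[i])
            i += 1
            continue
        is_closer, name = tag
        if not is_closer:
            out.append(tokens[i])
            inner, i = fix(tokens, i + 1, open_names + (name,))
            out.extend(inner)
        elif open_names and name == open_names[-1]:
            out.append(tokens[i])  # the closer we were waiting for
            return out, i + 1
        elif name in open_names:
            # Closer for an outer tag: add ours manually, let the caller see it.
            out.append(f"[/{open_names[-1]}]")
            return out, i
        else:
            out.append(tokens[i])  # orphan closer, left as-is in this sketch
            i += 1
    if open_names:
        out.append(f"[/{open_names[-1]}]")  # ran out of text: close at the end
    return out, i

def fix_text(text):
    return "".join(fix(tokenize(text))[0])
```

It copes with the simple cases, but it does nothing clever about orphan closers, which is exactly where the real thing gets painful.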
PS: any special tags that aren't in the list of double tags, like x or * or whatever, are not important here because Wedge will simply ignore them anyway.
PPS: my test case is as such. It chokes on the first [/nb] (which doesn't have a matching opener at this point because we already rewrote the first opener to add a closer to it.)
[/quote][quote author=Nao link=msg=1][b]
Lorem ipsum?
[/b][/quote]What is that?
I don't speak rubbish![nb]I'm wondering if Rory, though? He had time to learn...[ /code][ code][ nb][ /code][ /nb][quote]post1[/quote][b] [s] comment[/quote]post2[/quote]
6924
Features / Re: Fixing mismatched BBCode
« on December 8th, 2011, 06:53 PM »
Does fixNesting deal with block tag nesting mismatches or just nesting mismatches?
It supposedly fixes any mismatched tags and that's all... It adds missing openers when it finds orphan closers, and it adds at the end closers to match orphan openers. And believe me, it's already hard enough to manage as it is... I've been on it all day, and it's still pissing me off right now. Granted, it's quite a complex string I'm working with (basically -- if it works with it, it'll work with everything), but right now my code is headache-inducing, and it starts failing after a few fixes... (hint: try not to insert data into an array you're *looping through at the moment*... I should probably reset the loop every time instead of trying to account for all changes...)
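One way around the "inserting while looping" trap, as a hedged Python sketch (the helper name `apply_insertions` is invented for the example): record every insertion as an offset into the untouched text, then apply them from the end backwards so the earlier offsets never move.

```python
def apply_insertions(text, insertions):
    """Apply (offset, string) insertions computed against the ORIGINAL text."""
    # Highest offset first, so the offsets still to be applied stay valid.
    for offset, piece in sorted(insertions, key=lambda ins: ins[0], reverse=True):
        text = text[:offset] + piece + text[offset:]
    return text
```

For example, with `text = "[quote]post1[/quote]comment[/quote]post2[/quote]"`, passing the offsets of `comment` and `post2` with `"[quote]"` as the piece gives three well-formed quotes in a row.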
Also, once nesting and mismatches are fixed, we will also need to look at dependencies and must-contain/must-not-contain rules too, which are also specifiable in the bbc parser...
preparsecode does a lot of things, actually. I think I listed what it did in a previous post,
Part of the reason parse_bbc has it and not preparsecode is that posts added to the DB through other sources won't have come through preparsecode originally, and that's not just for modders (for example, this will include the importer unless we push every post through some kind of fixer during the import).
I always wanted to move that to the preparser anyway to remove this dependency on strange, naive regexps that didn't allow for customising the table tag or adding th tags, without rewriting all that stuff as well.
6925
Features / Re: Fixing mismatched BBCode
« on December 8th, 2011, 10:00 AM »
They shouldn't be. The bbc parser should actually close both the s and b tags, honouring proper hierarchy, when it gets to the end of quote.
Like so, in fact.
Quote
post1
it's best to first close any opened tags, and THEN go through the search for a place to add a new quote opener.
(And if you check the source, you'll see it's unmodified.) It works because there's the end of a block level tag with unresolved non block tags inside it. The exact behaviour is incredibly complicated, and is no doubt one of the reasons why the bbc parser is so big and scary - but it's also resilient.
This is why I asked if the idea behind this was partly to reduce its complexity or not, because it actually does a lot of silent fixing that most people don't even realise.
::parse_bbc does some fixing on its own, IIRC, but if it does, it's in the wrong place. It should be done at save time, obviously. (AND, we should remove any fixer code from ::parse_bbc to force modders to go through ::preparsecode. Believe me, I didn't even know this function existed before Shitiz used it in SMG, and I didn't have the *reflex* to use it systematically until, err.... Now?)
The problem as you've discovered with writing such a solution is that unless you can get inside the user's brain and figure out what they meant, rather than what they typed, you have no hope of getting it consistently right.
Things like, "Okay, this is a closing tag BUT it's at the beginning of a line *and* is immediately followed by content, so MAYBE it's an opener, let's try to turn it into an opener and see if it suddenly validates"... These things are doable, but they take time to implement.
(Well, that particular solution would still be a very good one to the first test case I posted.)
6926
Features / Re: New revs
« on December 8th, 2011, 09:47 AM »
rev 1188
(6 files, 5kb)
* Well, if it isn't magical... Rewrote the mismatch code again, and moved it to the wedit object. This time, it's shorter, it's easier to grasp and it actually yells at you if you do [tag1][tag2][/tag1][/tag2]. Give me another day and I'll have a full HTML5 parser for ya... Will that be all m'am? Maybe a pound of apples? (Post.php, Post2.php, Class-Editor.php, Errors.language.php, sections.css)
* Some regex simplifications. You've got to face it, a regex IS complex, whatever way you look at it. Making it longer won't make it any simpler. (Class-Editor.php)
6927
Features / Re: Fixing mismatched BBCode
« on December 8th, 2011, 09:29 AM »
Thanks, I didn't notice that one bit ;)
Okay, I'm in the process of moving the code to the wedit object (where it should have been from the beginning), and trying to fix the code automatically... So, considering the fact that the most important content (topic posts) is okay because we clearly specify the errors, fixing posts automatically shouldn't be too much of a hassle but obviously the code to actually fix them is going to be more complex...
So, let's say I have this:
[quote]post1[/quote]comment[/quote]post2[/quote]
I think it's safe to say that the poster added a / by mistake, and it should be removed, but I hardly see how Wedge is going to be able to spot it automatically without going into a great deal of large-scale testing. So I suppose we could do it this way...
[quote]post1[/quote][quote]comment[/quote][quote]post2[/quote]
i.e., the extra /quote knows there is no previous matching quote tag, so it simply looks for the *last* tag it found (or MAYBE the last closer tag of its kind, i.e. quote?), and it adds an opener tag right after it. Then we mark the tag as fixed (i.e. we remove 'quote' from the stack of opened tags). So, we are in a situation where 'comment' is suddenly stuck into a quote tag. Continue as usual. Then we spot the other closer quote, and we do the same, i.e. add an opener tag after the last closer, so in this case the 'post2' bit is fixed.
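In Python terms (a sketch only, with invented names, not the PHP in the wedit object), the "add an opener right after the last tag we saw" rule could be written as follows. It keeps the output as a list of pieces so the opener can be slotted in at a remembered index.

```python
import re

TAG_RE = re.compile(r"\[(/?)([a-z]+)(?:[ =][^\]]*)?\]", re.IGNORECASE)

def open_orphan_closers(text):
    """Give every orphan closer an opener placed right after the previous tag."""
    out, stack, pos = [], [], 0
    last_tag_end = 0  # index in `out` just past the last tag emitted
    for m in TAG_RE.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        is_closer, name = m.group(1) == "/", m.group(2).lower()
        if not is_closer:
            stack.append(name)
        elif name in stack:
            # Matched: forget this tag (and anything opened inside it).
            while stack.pop() != name:
                pass
        else:
            # Orphan closer: slot the missing opener in after the last tag.
            out.insert(last_tag_end, f"[{name}]")
        out.append(m.group(0))
        last_tag_end = len(out)
    out.append(text[pos:])
    return "".join(out)
```

On the example above it turns the broken string into the three-quotes-in-a-row version. It does nothing about tags left open in between, which is the next problem.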
The obvious problem with this solution is that our comment is now in a quote. But because we'll have three separate quotes in a row, it'll be *relatively* obvious (not captain-obvious obvious, but still), that the middle one is a reply to the previous quote. What do you think..?
I was thinking of other solutions, like checking whether a tag is something like '[/quote author=Nao]', which in this case would mean "it's an opening quote where the / was added by mistake" but I don't really think it's a realistic case.
Now for another test case...
[quote][b] [s] post1[/quote]
Using the pseudo-code from above, this would be really messy -- an opening quote would be added just before the closer, and then the s and b tags would remain opened until the end of the post, where they would then be closed forcibly.
So in this case, I think it's best that when we look through the latest opened tag in the stack, and it's not our closer, we simply add the related closer automatically and then keep going through the stack in reverse order, closing tags as required, until we find ours (or not). This is actually pretty much what my code is doing right now, as opposed to the pseudo-code in the first example.
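The unwinding described here might be sketched in Python like so (`close_nested` and `TAG_RE` are illustration names, and this is a simplified stand-in for the real code): on a closer, pop the stack in reverse order and emit a closer for everything opened after the matching tag.

```python
import re

TAG_RE = re.compile(r"\[(/?)([a-z]+)(?:[ =][^\]]*)?\]", re.IGNORECASE)

def close_nested(text):
    """Close still-open inner tags before the closer of an outer tag."""
    out, stack, pos = [], [], 0
    for m in TAG_RE.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        is_closer, name = m.group(1) == "/", m.group(2).lower()
        if not is_closer:
            stack.append(name)
        elif name in stack:
            # Walk the stack backwards, closing as we go, until we find ours.
            while stack[-1] != name:
                out.append(f"[/{stack.pop()}]")
            stack.pop()
        # Closers with no opener at all are kept untouched in this sketch.
        out.append(m.group(0))
    out.append(text[pos:])
    while stack:  # whatever is still open gets closed at the very end
        out.append(f"[/{stack.pop()}]")
    return "".join(out)
```

So the s and b tags get closed inside the quote, in the right order, instead of at the end of the post.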
Now, if we mix our two examples...
[quote]post1[/quote][b] [s] comment[/quote]post2[/quote]
The closer quote will trigger a search for the last closed tag, which in this case is another closer quote, BUT between them it will find two unclosed tags... Which gets confusing, so it's best to first close any opened tags, and THEN go through the search for a place to add a new quote opener.
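Putting the two rules together, a possible Python sketch (all names invented; one reading of the idea, not the actual Wedge code): the stack remembers where each opener sits in the output, so that on an orphan closer only the tags opened after the last closer get closed, and the new opener goes right after that last closer.

```python
import re

TAG_RE = re.compile(r"\[(/?)([a-z]+)(?:[ =][^\]]*)?\]", re.IGNORECASE)

def fix_mixed(text):
    """Close open tags first, then place openers for orphan closers."""
    out, pos = [], 0
    stack = []           # (name, index of the opener in `out`)
    last_closer_end = 0  # index in `out` just past the last closer emitted
    for m in TAG_RE.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        is_closer, name = m.group(1) == "/", m.group(2).lower()
        if not is_closer:
            stack.append((name, len(out)))
            out.append(m.group(0))
            continue
        if any(n == name for n, _ in stack):
            # Matched closer: unwind inner tags in reverse order.
            while stack[-1][0] != name:
                out.append(f"[/{stack.pop()[0]}]")
            stack.pop()
        else:
            # Orphan closer: first close what was opened since the last closer...
            while stack and stack[-1][1] >= last_closer_end:
                out.append(f"[/{stack.pop()[0]}]")
            # ...and THEN add the missing opener right after that last closer.
            out.insert(last_closer_end, f"[{name}]")
        out.append(m.group(0))
        last_closer_end = len(out)
    out.append(text[pos:])
    while stack:
        out.append(f"[/{stack.pop()[0]}]")
    return "".join(out)
```

On the mixed example this gives `[quote]post1[/quote][quote][b] [s] comment[/s][/b][/quote][quote]post2[/quote]`. The parallel bookkeeping of positions is where the extra lines come from.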
It's all very 'amusing' because I have to maintain a parallel stack of tag positions and the code has currently jumped from 15 lines to 50, which caused me to write this post in the hope that it'll allow me to sort things out... :lol:
Anyway, opinions welcome...
Hey, perhaps someone has heard of some BSD/MIT code available online that precisely does just that -- a flawless fix of all BBC or HTML tags left opened or closed... :P
6928
Off-topic / Re: Making a note for the future
« on December 8th, 2011, 12:19 AM »
Well, I didn't see a branch for 'For your fans'... :P
6929
Features / Re: Fixing mismatched BBCode
« on December 8th, 2011, 12:14 AM »
Okay, I'll look into doing that tomorrow...
Still, complex errors won't be magically fixed, I'm afraid... Unless, unless I do a stricter check.
So while typing this post, it came to me that I could use a stack of tags and stack/unstack data and... Well, have a look at this code:
[/quote][quote author=Nao link=msg=1 date=1309111289]Lorem ipsum?[/quote]What is that?
I don't speak rubbish!
[nb]I'm wondering if Rory does, though? He had time to learn...[ /code]
[ code][ nb][ /code][ /nb]
And... Here's the error message I'm getting following my latest rewrite. Which is actually SHORTER than the last one ;) It's not perfect but I'm working on it eheh.
Pretty cool uh?
PS: and yes, it works with tag nesting too, since it's a stack... i.e. if I have properly nested 'b' tags inside the quote, everything's fine.
PPS: the main issue with fixing tags in the middle of a message is that I would then have to find the exact position of the tag... I guess it's doable, though, but I'll have to go through a series of strpos etc or something to fill in the list first, so it'll definitely make the code bigger.
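The stack/unstack check itself is short. A Python sketch of the idea (names and message wording are made up, and the real check lives in PHP): push on an opener, pop when the closer matches the top of the stack, and report everything else. `m.start()` gives the position of each offending tag for free, which is the part that needs strpos gymnastics otherwise.

```python
import re

TAG_RE = re.compile(r"\[(/?)([a-z]+)(?:[ =][^\]]*)?\]", re.IGNORECASE)

def find_mismatches(text):
    """Return a list of human-readable problems; an empty list means all good."""
    errors, stack = [], []
    for m in TAG_RE.finditer(text):
        is_closer, name = m.group(1) == "/", m.group(2).lower()
        if not is_closer:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()  # properly nested closer
        else:
            errors.append(f"unexpected [/{name}] at offset {m.start()}")
    errors.extend(f"unclosed [{name}]" for name in stack)
    return errors
```

Properly nested 'b' tags inside a quote produce no errors, while `[b][i]x[/b][/i]` gets reported.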
6930
Features / Re: Fixing mismatched BBCode
« on December 7th, 2011, 10:38 PM »
Maybe we could have the test in an external function. If called from post2, return an error. Otherwise try to fix It by adding as many missing tags as required. Yay?