Wedge

Public area => The Pub => Topic started by: Arantor on August 20th, 2012, 05:25 AM

Title: Print Page
Post by: Arantor on August 20th, 2012, 05:25 AM
I've thought about Print Page recently and have some mixed feelings about it.

Specifically, three things occur to me.

1) It could be a plugin rather than a core feature. The savings for the bulk of forums would be quite significant since robots do often follow nofollow links, they just don't carry link juice through. So removing it entirely would actually save quite a bit of bandwidth in the long run.

2) If it's not made a plugin, it could have permissions attached, e.g. default to be not visible to guests.

3) If it does remain core and doesn't have permissions attached, it at least needs to indicate noindex in the page itself, since right now they still do get indexed.

Also, making it a plugin would mean one less stupid thing that is going to appear in 'how do I configure my htaccess for bestest SEO for my forums' threads.

Thoughts?
Title: Re: Print Page
Post by: Pandos on August 20th, 2012, 10:24 AM
For me it's OK to stay in core. But with permissions attached.

Title: Re: Print Page
Post by: Nao on August 20th, 2012, 11:04 AM
Quote from Arantor on August 20th, 2012, 05:25 AM
1) It could be a plugin rather than a core feature. The savings for the bulk of forums would be quite significant since robots do often follow nofollow links, they just don't carry link juice through. So removing it entirely would actually save quite a bit of bandwidth in the long run.
I'd say if robots are a problem, we could simply attach possibly_robot to that.
OTOH, one of the good points of printpage is that it gets you more chances to have multiple keywords stuffed in it. But I think that Google & co know better when it comes to a topic page - especially with the 'next' and 'previous' (prev?) meta keywords in it to indicate that it's a multi-page topic.
Which I just realized is NOT in Wedge..?! I thought I'd implemented that long ago... :-/
Quote from Arantor on August 20th, 2012, 05:25 AM
2) If it's not made a plugin, it could have permissions attached, e.g. default to be not visible to guests.
But guests are also entitled to viewing printpage if they're not bots...
Title: Re: Print Page
Post by: Arantor on August 20th, 2012, 02:45 PM
Quote
But I think that Google & co know better when it comes to a topic page - especially with the 'next' and 'previous' (prev?) meta keywords in it to indicate that it's a multi-page topic.
SMF has it, however it uses it to point to previous/next topic, not pages within a multi-page topic (and I can't remember if that's the correct use or not actually)
Quote
But guests are also entitled to viewing printpage if they're not bots...
Yes, currently. Whether that remains the case remains to be seen.
Quote
OTOH, one of the good points of printpage is that it gets you more chances to have multiple keywords stuffed in it.
Except that it isn't. It's been marked as nofollow for some time, even in SMF, which means search engines are not supposed to consider it for ranking purposes - but they will still frequently index it anyway.

Also note that there are cases where you can screw things up because of the way print page works. For example, try to get the print page of the Aeva topic on SMF, it's likely going to fail hard when it runs out of memory.

The main display staggers it - it pulls one post, processes it, displays it. Print page gets *everything* in one go and then hands it all to the template. For larger topics, doubly so on low-memory configurations, it's going to go splat.
Title: Re: Print Page
Post by: Norodo on August 20th, 2012, 02:57 PM
How about this:
http://www.alistapart.com/articles/goingtoprint/
Title: Re: Print Page
Post by: Arantor on August 20th, 2012, 03:01 PM
There's quite a bit more involved than simply using a different stylesheet.

Print page versions have the entire thread on them, not paginated. There are also several changes to the way bbcode is handled: certain tags get disabled, and certain other tags have their behaviour modified to suit printing. For example, the color bbc is silently disabled, and links are changed so that the URL is present.

None of this stuff can be done in CSS.
Title: Re: Print Page
Post by: Norodo on August 20th, 2012, 03:02 PM
Harsh. Good luck then.

Although quite a few of those can be fixed with CSS. Color bbc for example could pretty easily be disabled with a !important somewhere in there, and URLs can be appended with pure CSS. The pagination is probably the biggest issue.
Title: Re: Print Page
Post by: Arantor on August 20th, 2012, 03:07 PM
This is partly why I want to make it a plugin because I'd actually remove all the extra crap from the parser which would allow some sanity.

Let me explain my beef with this. parse_bbc accepts 4 parameters: the message to parse, whether smileys are enabled, a cache id, and a list of specific tags to parse. (This last one is primarily used for signatures, where you only allow certain bbc in signatures.)

Seems logical, right? Yeah... I thought that, until I realised that the 'are smileys allowed' parameter isn't really a boolean. It has three valid values - true, false, and 'print'. As you can imagine that's not the way to go about things.
Title: Re: Print Page
Post by: Norodo on August 20th, 2012, 03:46 PM
Sounds like an idea.

I'm currently bedridden, but I'm probably going to look into the CSS print page way later if you do make it so it's a plugin. Can't hurt to have every way tested, eh? Seems to me to be a plugin even I could do. (Adding a css file shouldn't be too hard, or?)
Title: Re: Print Page
Post by: Arantor on August 20th, 2012, 04:16 PM
It's a bit technical given how the printpage stuff works - it's actually embedded directly into the page, not even as a CSS file! (If/when I make it a plugin, I'll rewrite it to use a CSS file anyway)

I still think you're missing the point with chasing the printable CSS version, though. While you could certainly fix a number of the aspects of layout with a printable stylesheet (most notably, hiding the sidebar, de-floating anything else, hiding most of the user side panel, the options and menus, quick reply), you still won't have the ability to have the entire thread in one place, something not permitted in the main code (it used to be an option but IIRC I removed it), plus links still won't be printed, just their link text.

I think at best you'll get something that looks respectable, but will still be a poor imitation of what is really needed.



The big question: do we need it in the core? My gut feeling is actually no.
Title: Re: Print Page
Post by: spoogs on August 20th, 2012, 05:51 PM
I say no as well, plugin with perms should be just fine. In SMF I use your print page permission mod to remove guests' permission to view it, but for no special reason. I would bet that not too many are concerned whether it is core or not and probably wouldn't miss it if it were removed.
Title: Re: Print Page
Post by: godboko71 on August 20th, 2012, 10:49 PM
I have never once printed a thread in 15ish years using forums. How lame am I?
Title: Re: Print Page
Post by: Arantor on August 20th, 2012, 10:55 PM
I've never used print-page to print anything, I have used it a few times to save archive copies of a thread for surreptitiously sending to people.
Title: Re: Print Page
Post by: MultiformeIngegno on August 20th, 2012, 11:29 PM
I vote for making it a plugin and having a CSS print page in core (even if it doesn't handle all the things that the non-CSS print page does).
Title: Re: Print Page
Post by: Arantor on August 20th, 2012, 11:33 PM
I wouldn't bother with the CSS print page in core to be honest. It's just one more thing to bug test on.
Title: Re: Print Page
Post by: Nao on August 22nd, 2012, 01:06 PM
Quote from Arantor on August 20th, 2012, 02:45 PM
Quote
But I think that Google & co know better when it comes to a topic page - especially with the 'next' and 'previous' (prev?) meta keywords in it to indicate that it's a multi-page topic.
SMF has it, however it uses it to point to previous/next topic, not pages within a multi-page topic (and I can't remember if that's the correct use or not actually)
Yeah, it has it and it's pointless.
So I double-checked my code, and indeed I implemented it into Wedge -- but with a caveat: robot_no_index has to be set to off. So there are many reasons why the meta links wouldn't show up: either there's only one page in the topic (duh!), or it was accessed through a msgXXX or #new link, things like that...
And what matters is obviously bots only, here. I don't think that accessibility gurus would even kill anyone for not providing these meta links, because seriously... Who ever uses those?! Your browser has to support them, you have to show a special toolbar that takes room in the UI, etc...
Quote
Except that it isn't. It's been marked as nofollow for some time, even in SMF, which means search engines are not supposed to consider it for ranking purposes - but they will still frequently index it anyway.
They index it, but it doesn't bring PR to the overall site, that's what you mean...? I guess the point here is simply to have people be able to reach your site through multiple keywords... (although reaching it through the printpage... Urgh!)
Quote
Also note that there are cases where you can screw things up because of the way print page works. For example, try to get the print page of the Aeva topic on SMF, it's likely going to fail hard when it runs out of memory.
Yeah, I've never really noticed that... Can you post a direct link please? I don't remember where that topic is ;)
Quote
The main display staggers it - it pulls one post, processes it, displays it. Print page gets *everything* in one go and then hands it all to the template. For larger topics, doubly so on low-memory configurations, it's going to go splat.
But that's the thing about printpage, the main point is not that it's printable, the main point is that it's savable... (saveable?)
I used to do that precisely on sm.org topics back in the day.
Title: Re: Print Page
Post by: Arantor on August 22nd, 2012, 02:44 PM
Quote
So there are many reasons why the meta links wouldn't show up: either there's only one page in the topic (duh!), or it was accessed through a msgXXX or #new link, things like that...
Then the check that normally sets robot_no_index should be modified. The usual rule is that if there is random stuff in $_GET (i.e. other than topic) and/or if $_REQUEST['start'] is non-numeric (which is set up with msgXXXX for linking to specific posts, new for new items and is numeric if you're using conventional pagination)
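As a rough illustration of that rule (a sketch only: should_no_index and its two arguments are made-up names for this post, not Wedge's actual code, and the real check may look at more than this):

```php
<?php
// Hypothetical helper, not the real Wedge check.
// $get   : the query-string parameters (what $_GET would hold)
// $start : the value that ends up in $_REQUEST['start']
function should_no_index(array $get, $start): bool
{
    // Anything in the query string other than 'topic' counts as "random stuff".
    $extra = array_diff_key($get, ['topic' => true]);
    if (!empty($extra)) {
        return true;
    }

    // 'start' is numeric for conventional pagination;
    // values such as 'msg1234' or 'new' are not, so those pages get flagged.
    return !is_numeric($start);
}
```

With this sketch, a plain paginated topic URL would stay indexable, while a msgXXXX or new link, or any URL carrying extra parameters, would be flagged.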
Quote
And what matters is obviously bots only, here. I don't think that accessibility gurus would even kill anyone for not providing these meta links, because seriously... Who ever uses those?! Your browser has to support them, you have to show a special toolbar that takes room in the UI, etc...
I seem to recall it's pretty much only Opera that does with gestures on the browser side. It would be good on the bot side to paginate neatly though.
Quote
They index it, but it doesn't bring PR to the overall site, that's what you mean...? I guess the point here is simply to have people be able to reach your site through multiple keywords... (although reaching it through the printpage... Urgh!)
No. What it means is that it is indexed, but no PR is brought from the main site to the printable version.

In fact, it's worse than that, because the printpage version contains the entire thread and specifies the *first page* of the thread as a canonical version. Which pretty much screws up any real benefit it had in the first place.
Quote
Yeah, I've never really noticed that... Can you post a direct link please? I don't remember where that topic is
http://www.simplemachines.org/community/index.php?topic=200401.0

Print page actually works, but I dread to think what the memory limit is set to.
Quote
But that's the thing about printpage, the main point is not that it's printable, the main point is that it's savable... (saveable?)
I used to do that precisely on sm.org topics back in the day.
No... that's just a happy side-effect. The printable version was never intended to be used for that, it was intended to gather everything on one page for printable purposes, and also fix some other issues that float etc. would generate.

I'm still not entirely convinced this needs to stay in the core - other than archiving threads to send to people, I've never used the damn thing.
Title: Re: Print Page
Post by: Nao on August 22nd, 2012, 03:06 PM
Quote from Arantor on August 22nd, 2012, 02:44 PM
Then the check that normally sets robot_no_index should be modified.
Why?
Quote
The usual rule is that if there is random stuff in $_GET (i.e. other than topic) and/or if $_REQUEST['start'] is non-numeric (which is set up with msgXXXX for linking to specific posts, new for new items and is numeric if you're using conventional pagination)
Yes, but you didn't finish your sentence... ;)
robot_no_index makes sense in Wedge (overall), and in this situation it does, too. It's about saving users from having to download two extra lines of HTML they don't care about, and will only be useful to search bots...

I forgot to specify that I changed prev/next meta to link to prev/next pages back when I was reading the Google blog, and they published an article about how they were changing their logic to use prev/next links for topic pages so that they can be seen as a single page in the engine. Well, so far I ain't seen that happen... (Not that I'm testing a lot, though.)
Just like when they announced they'd use microformats or microdata or whatever, the schema.org thing, to show clean breadcrumbs in their result pages... Well, the result is that many bare SMF sites have their breadcrumb at google, and Wedge hasn't -- even though SMF doesn't use schema.org breadcrumbs and Wedge does. Thank you very much for the time loss, Google...
Quote
I seem to recall it's pretty much only Opera that does with gestures on the browser side.
Possibly. It has a toolbar for these buttons, too, but it's disabled by default, thankfully. I think that Firefox can handle these too, and perhaps Safari as well... (Maybe with plugins or somethin'?)
Quote
No. What it means is that it is indexed, but no PR is brought from the main site to the printable version.
Oh, yeah, right... So it's hidden in page 15 right? But if you use rare keywords, it'll still show it on page 1, which is better than no results at all (because of printpage not being available.)
Then again, if (and only if) Google's handling of prev/next works as expected on Wedge, then I suppose we can expect it not to need a printpage for that...
Quote
Print page actually works, but I dread to think what the memory limit is set to.
Here's wondering... What if server-side gzipping is disabled on printpage? It would sure increase the bandwidth needs (1MB gzipped, 6MB unzipped in Aeva's case), but would probably make it easier to send the page in chunks..? Heck, maybe it's already done that way... Because the topic just didn't show up in one go on my browser, it loaded progressively...
I'd say, if mod_gzip and PHP are smart enough to catch the output buffer and gzip parts of it (I believe gzip is suited for chunk transmissions?), then it's not worth worrying too much about memory...
Then again, I'm not exactly a server/Apache/PHP internals specialist, and I probably said something silly.

Oh, and of course, another good way to limit the filesize (and thus bandwidth requirements) is to just strip any whitespace around posts, and/or start optimizing the actual HTML like crazy... For instance, here we have a class for author and for body, with two different tags. It may be smarter to just use a class on top of both of them (in the DOM), and just use class-free tags after it...
Quote
I'm still not entirely convinced this needs to stay in the core - other than archiving threads to send to people, I've never used the damn thing.
Same here...
Possibly, what we could do is, instead of directly showing the printpage version, we could hmm... Show a choice to the user: either print the current page, or print the entire topic, or show an archive of the topic for safe-keeping. Then we could handle all of them differently...
Title: Re: Print Page
Post by: godboko71 on August 22nd, 2012, 03:51 PM
Seems like maybe print could really be two plugins: a "Save Topic" plugin for those who want to save a topic to send to someone, and a print plugin for those who want to offer the option to print. Though really, if the saved version is printable, then you only need a save topic plugin. Maybe even have different file output types depending on what extensions the server has. Basic HTML, PDF for those that can, etc. etc.
Title: Re: Print Page
Post by: Arantor on August 22nd, 2012, 04:16 PM
Quote
Why?
If the prev/forward navigation relies on robot_no_index, something's wrong because it shouldn't really be.
Quote
Yes, but you didn't finish your sentence...
Yes, I did, I just didn't qualify the subject of the sentence, seeing as how it was implied by the previous one. The rules around robot_no_index being added are those (I had to modify it myself lately), and robot_no_index perhaps needs to be reconsidered.
Quote
Oh, yeah, right... So it's hidden in page 15 right? But if you use rare keywords, it'll still show it on page 1, which is better than no results at all (because of printpage not being available.)
Then again, if (and only if) Google's handling of prev/next works as expected on Wedge, then I suppose we can expect it not to need a printpage for that...
No... Google will still index all 15 pages normally. The actual problem is that printpage specifically fucks around with page canonicalisation by having content that isn't at the canonical URL. Even though it's nofollow'd, Google still follows it!
Quote
Heck, maybe it's already done that way... Because the topic just didn't show up in one go on my browser, it loaded progressively...
That's the point, it is NOT handled progressively. It is queried, pushed entirely into $context and then output. When I first went to the URL, it was actually blank.
Quote
I'd say, if mod_gzip and PHP are smart enough to catch the output buffer and gzip parts of it (I believe gzip is suited for chunk transmissions?), then it's not worth worrying too much about memory...
Except that you have to worry about memory when you do it like that.

Display does that somewhat bizarre process of having a callback per message specifically so that you can have truly massive messages or vast threads without any problems with memory_limit. Printpage does not do that, it just queries, pushes everything into $context before going to the template. On low memory configurations it's quite possible to overflow that on long threads.
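To make the callback-per-message idea concrete, here's a minimal sketch. Everything in it is hypothetical (make_message_callback, render_topic, and strtoupper standing in for the parser); it is not the actual Display.php code. The ArrayIterator only stands in for a database result that hands back one row per fetch, so the sketch shows the control flow rather than real memory savings.

```php
<?php
// Sketch only: hypothetical names, not the actual Display.php code.

// Returns a callback that prepares exactly one message per call,
// or false once there are no more rows.
function make_message_callback(Iterator $rows, callable $parse): callable
{
    $rows->rewind();
    return function () use ($rows, $parse) {
        if (!$rows->valid()) {
            return false;
        }
        $row = $rows->current();
        $rows->next();
        // Only this one parsed message is built per call.
        return ['id' => $row['id'], 'body' => $parse($row['body'])];
    };
}

// Template side: pull a message, print it, let it go, repeat.
function render_topic(callable $get_message): void
{
    while ($message = $get_message()) {
        echo $message['id'], ': ', $message['body'], "\n";
    }
}

// Stand-in for a query result; a real one would fetch rows lazily.
$rows = new ArrayIterator([
    ['id' => 1, 'body' => 'first post'],
    ['id' => 2, 'body' => 'second post'],
]);
render_topic(make_message_callback($rows, 'strtoupper'));
```

The print page approach, by contrast, amounts to parsing every row into one big array first and only then handing that array to the template, which is where the memory goes on long threads.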
Quote
Then again, I'm not exactly a server/Apache/PHP internals specialist, and I probably said something silly.
It's not silly at all. Let me explain how gzip works in relation to Apache/PHP, depending on what is responsible.

If PHP is set up to do it (and Wedge has that configuration option), it's done in PHP, and the total page output must fit in memory. This is why the DB backup feature is often more reliable if you don't gzip it, because you have more memory capacity to cope with the data, because you don't have to hold it all at once to gzip it.

If Apache is set up to do it, and not PHP, PHP just has to output its content back to Apache, and PHP just has to make sure that it doesn't run out of memory in whatever it's doing.
Quote
Same here...
Possibly, what we could do is, instead of directly showing the printpage version, we could hmm... Show a choice to the user: either print the current page, or print the entire topic, or show an archive of the topic for safe-keeping. Then we could handle all of them differently...
I'm not disputing the validity of such things. My point is that I don't believe it should be in the core by default. If admins want the ability to archive parts of the forum, that should be up to them.

The fact we get SEO benefits, plus streamlining parse_bbc a little, these are just nice side benefits.
Quote
Seems like maybe print could really be two plugins: a "Save Topic" plugin for those who want to save a topic to send to someone, and a print plugin for those who want to offer the option to print. Though really, if the saved version is printable, then you only need a save topic plugin. Maybe even have different file output types depending on what extensions the server has. Basic HTML, PDF for those that can, etc. etc.
Interesting approach. I had actually thought about doing so. I'm just not convinced that people actually use print-page for printing, and as a result I don't think it's needed in the core by default.
Title: Re: Print Page
Post by: spoogs on August 22nd, 2012, 04:24 PM
I used it once, does that count :P

Title: Re: Print Page
Post by: Nao on August 22nd, 2012, 04:30 PM
Quote from Arantor on August 22nd, 2012, 04:16 PM
If the prev/forward navigation relies on robot_no_index, something's wrong because it shouldn't really be.
It's a choice of mine... Really it's all about saving bandwidth. Guests will mostly get to see the prevnext version anyway (we could even make it even simpler by providing them with canonical page links in 'recent posts' and SSI functions, rather than a msgXXX link).
I could also restrict these meta links to guests...
Quote
No... Google will still index all 15 pages normally. The actual problem is that printpage specifically fucks around with page canonicalisation by having content that isn't at the canonical URL. Even though it's nofollow'd, Google still follows it!
I've never seen a wedge.org print page being indexed, though. Heck, I don't remember a single SMF print page being indexed, at all... It's always the wireless content crap that gets the treatment. (And that's no longer an issue in Wedge, eh eh.)
Quote
That's the point, it is NOT handled progressively. It is queried, pushed entirely into $context and then output. When I first went to the URL, it was actually blank.
Hmm...
Well, so it should be done like in Display.php right..? Callback and everything...
Quote
Display does that somewhat bizarre process of having a callback per message specifically so that you can have truly massive messages or vast threads without any problems with memory_limit. Printpage does not do that, it just queries, pushes everything into $context before going to the template. On low memory configurations it's quite possible to overflow that on long threads.
And would that be fixed with a callback?
Quote
If Apache is set up to do it, and not PHP, PHP just has to output its content back to Apache, and PHP just has to make sure that it doesn't run out of memory in whatever it's doing.
Well... That's interesting.
I'm not sure I remember -- does PHP still gzip the page if enabled in Wedge, even if Apache can handle it? If yes, then maybe we should first add a test to see if Apache handles gzipping of HTML pages, and then disable PHP gzipping internally..?
Quote
I'm not disputing the validity of such things. My point is that I don't believe it should be in the core by default. If admins want the ability to archive parts of the forum, that should be up to them.
It can still be core but made to be enabled or disabled...
Quote
The fact we get SEO benefits, plus streamlining parse_bbc a little, these are just nice side benefits.
I don't think it would have that much of an influence over parse_bbc... ;) Plus, I think Aeva Media has some tricks in it, too. (That, and my Subs-BBC.php file has so much custom data in it, I'd rather not see any changes until I'm done with my own ahah...)
Quote
Interesting approach.
Close enough to mine :P
Quote
I had actually thought about doing so. I'm just not convinced that people actually use print-page for printing, and as a result I don't think it's needed in the core by default.
Agreed, for the default aspect. Not sure about not-core though.

PS: and once again, I didn't get any warnings for spoogs' post above mine, which was sent after I started my reply... My 'last' variable was set to 281415, so I should have gotten a warning, no..?
Title: Re: Print Page
Post by: Arantor on September 4th, 2012, 07:12 PM
Bah, how did I miss this one?
Quote
I've never seen a wedge.org print page being indexed, though. Heck, I don't remember a single SMF print page being indexed, at all... It's always the wireless content crap that gets the treatment. (And that's no longer an issue in Wedge, eh eh.)
2.0 made it nofollow by default, 1.1 did not. And it does still get indexed, because even though it's marked nofollow, Google et al *still follows* them for links, even if they don't make it into the index itself.
Quote
Hmm...
Well, so it should be done like in Display.php right..? Callback and everything...
Quote
And would that be fixed with a callback?
Yes, it should be done like in Display.php and yes, making it a callback would fix the issues - though it would also require more than a few minutes work in rewriting it.
Quote
Well... That's interesting.
I'm not sure I remember -- does PHP still gzip the page if enabled in Wedge, even if Apache can handle it? If yes, then maybe we should first add a test to see if Apache handles gzipping of HTML pages, and then disable PHP gzipping internally..?
That's why there's the question on install ;)
Quote
It can still be core but made to be enabled or disabled...
Having given it a couple of weeks' thought, I still think it shouldn't be core.
Quote
PS: and once again, I didn't get any warnings for spoogs' post above mine, which was sent after I started my reply... My 'last' variable was set to 281415, so I should have gotten a warning, no..?
And yet, I've seen such warnings...?
Title: Re: Print Page
Post by: Nao on September 18th, 2012, 04:11 PM
Quote from Arantor on September 4th, 2012, 07:12 PM
Bah, how did I miss this one?
Story of my life!
Quote
Yes, it should be done like in Display.php and yes, making it a callback would fix the issues - though it would also require more than a few minutes work in rewriting it.
Hmm... Why more than a few minutes? I'm sure a good old case of copy & paste would help a lot... ;)
Quote
That's why there's the question on install ;)
That question is for PHP only, innit...? I don't remember.
Quote
Having given it a couple of weeks' thought, I still think it shouldn't be core.
I suggest that we realistically postpone these discussions to v2.0, if ever... :P
Quote
Quote
PS: and once again, I didn't get any warnings for spoogs' post above mine, which was sent after I started my reply... My 'last' variable was set to 281415, so I should have gotten a warning, no..?
And yet, I've seen such warnings...?
Magical.
Sometimes it works for me -- sometimes it doesn't. It sounds like it's some super-unstable function when it really isn't... :-/
Title: Re: Print Page
Post by: Arantor on September 18th, 2012, 04:39 PM
Quote
Hmm... Why more than a few minutes? I'm sure a good old case of copy & paste would help a lot...
Copy and paste and shed-loads of editing. Display's callback is way more complicated than it needs to be, and print-page's set up is just not designed that way.
Quote
That question is for PHP only, innit...? I don't remember.
That's the beauty of it, either you do it at the PHP level or you do it at the Apache level, and if you do it at PHP level, it will break it at Apache level, hence the test.
Quote
I suggest that we realistically postpone these discussions to v2.0, if ever...
I can do this one in an afternoon ;)
Title: Re: Print Page
Post by: Arantor on September 29th, 2012, 08:35 PM
Been thinking about this again. One of the features I want to offer in relation to the warning system would benefit if print page were removed, though it's entirely possible to do without touching print page at all. I just think it'd be cleaner to remove print page. I'm also thinking about cleaning up parse_bbc to have two parameters - the message and an array of options to be passed inwards.
Title: Re: Print Page
Post by: Arantor on May 12th, 2013, 04:28 PM
So, the new warning system is in place, my original concern was over the disemvowel feature.

I still want to do the refactoring, and I still think it would be an improvement to pull print page into a plugin. (The refactoring gets much simpler if I don't have to worry about print page, but I can still do it to include that for now.)

Thoughts?
Title: Re: Print Page
Post by: Arantor on May 12th, 2013, 05:06 PM
Just to add, I've already started on the refactoring, the signature for parse_bbc is now two parameters, the message and an array of options.

The array has the current following keys:
Quote
- smileys (bool) Whether smileys should be parsed or not, regardless of any other bbcode content.
- cache (string) If potentially cacheable, this should be the cache's id. If not defined, no caching will occur. This should be a quasi-unique key for the item being parsed, so that if it took over 0.05 seconds, it can be cached. (The final key used for the cache takes the supplied key and includes details such as the user's locale and time offsets, an MD5 digest of the message and other details that potentially affect the way parsing occurs)
- print (bool) Whether in the printable mode or not, which disables various tags as well as hiding smileys.
- parse_tags (array) A list of tags to be parsed on this run, undefined or empty array to do all those currently enabled. (This overrides any user settings for what is and is not allowed. Additionally, runs with this set are never cached, regardless of cache id being set)
- owner (int) If defined, the user id of the author of this content. Used for identifying whether parsing should include user sanctions like disemvowelling.
- type (string, required) Indicates what type of content this is. Known values: post, signature
I should add that type is growing as I delineate each of the places parse_bbc is called; the reason for doing so is that it opens up more options. Most of the time it won't make any difference, but it does mean we can do things like explicitly know that a piece of bbc is a post or a signature without having to sniff the cache id. It also separates 'print page' from the smileys parameter.
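For anyone who wants to picture how such an options array might be handled, here's a minimal sketch. It isn't the committed code: normalize_bbc_options is a made-up helper, and the defaults (smileys on in particular) are my assumption for illustration. Only the documented behaviours above are modelled: type is required, print mode hides smileys, and runs with parse_tags set are never cached.

```php
<?php
// Hypothetical sketch of option handling, not the real Wedge parser.
function normalize_bbc_options(array $options): array
{
    // 'type' is the one required key.
    if (empty($options['type'])) {
        throw new InvalidArgumentException('parse_bbc options need a type');
    }

    // Assumed defaults for everything optional.
    $defaults = [
        'smileys'    => true,   // assumption: smileys parsed unless disabled
        'cache'      => null,   // no cache id means no caching
        'print'      => false,
        'parse_tags' => [],     // empty means all currently enabled tags
        'owner'      => null,
    ];
    $options = array_merge($defaults, $options);

    // Printable mode hides smileys, whatever the caller asked for.
    if ($options['print']) {
        $options['smileys'] = false;
    }

    // Runs restricted to specific tags are never cached.
    if (!empty($options['parse_tags'])) {
        $options['cache'] = null;
    }

    return $options;
}
```

A signature call might then pass something like array('type' => 'signature', 'parse_tags' => array('b', 'i')), and print mode becomes its own flag instead of a third value for the smileys parameter.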
Title: Re: Print Page
Post by: Arantor on May 13th, 2013, 12:16 AM
Just to add, here's the complete list of types of bbc that could be applied and this information is now available to the bbc parser in case a hook or similar wants to modify it.
Quote
- agreement
- custom-field
- cut (used with westr::cut)
- empty-test (for when checking a post is really empty)
- infraction-notice
- media-album-description
- media-comment
- media-custom-field
- media-custom-field-description
- media-description
- media-embed
- media-playlist-description-preview
- media-playlist-description
- media-welcome
- mod-comment (comments to reported posts)
- mod-note (notes in the moderation center)
- news
- plugin-readme
- pm
- pm-draft
- pm-notify
- poll-option
- poll-question
- post
- post-convert (only for WYSIWYG BBC/HTML conversion)
- post-draft
- post-feed
- post-preview (for thread shortened versions of posts)
- post-notify
- preview (for previewing posts in editing)
- preview-pm (for previewing PMs before sending)
- q-and-a
- report-media
- report-post
- signature
- thought
/me doesn't know why he used - instead of _, but it just seemed more readable somehow (and I suppose it's a subtle indication that it isn't code directly but an identifier)
Title: Re: Print Page
Post by: Nao on May 13th, 2013, 10:11 AM
You're doing a good job. :)

- I'm all for parse_bbc($message, $array). I'm pretty sure we discussed this possibility already, sometime around last year, and that we were both interested in doing it. I'm appalled at how many interesting new changes we discussed, and then failed to implement later.

- I'm okay with Print being a plugin; don't remember if I was before, but it just doesn't feel that important, after all... What could be improved, really, is giving the admin the ability to only print the current page's worth of posts, and/or only print the entire topic if it has fewer than X pages. With maybe a (single) page index inside the print page itself, to allow easy printing of multiple pages.

Anything else you needed to know..?
Posted: May 13th, 2013, 10:02 AM

Oh, I see you've already committed... ;)

Looking at it, I have a few things to say...
- Adding so many types, it was overwhelming at first, but suddenly it hit me that you can disable some tag types on some parse types, or hook a plugin into any type of parsing and just that one... The possibilities seem limitless.
- I have to look more into it, but you're using 'parse_type' everywhere in the source code, when the parse_bbc function itself seems to accept only a 'type' parameter, which is shorter and, frankly, easier to manipulate. Is this one of my patented LastMinuteChanges™ you forgot to propagate everywhere...?
- Similarly, parse_tags could be (should be..?) shortened to just 'tags'.

Will add anything later as needed.
Title: Re: Print Page
Post by: Nao on May 13th, 2013, 11:10 AM
Yes, parse_type is definitely buggy; it should be 'type', as per your parse_bbc code.

Tell you what... I'm seeing that most of the parse_bbc calls (about 90+ out of ~125) include a parse_type, which I think is good.
I'm thinking that, given how we'd both obviously like it to become second nature when calling parse_bbc, we should have it outside of the array...

parse_bbc($message, $type, array(...)) or parse_bbc($message, null, array(...)) if no type is defined, which would be illogical...

There are about 30 calls with an owner ID in them, so it's about a quarter of all calls, which isn't enough to justify adding a specific parameter for user ID. I would, however, suggest looking into renaming it to 'user' or just 'u' (as in profiles), probably 'user' is better for readability; 'owner' is fine, too, but it's one extra character, and apart from thoughts and profiles, it's not a terminology that's used a lot, at least less than 'member' or 'user'.

Thoughts..? :^^;:

(I'm volunteering to change parse_type to the parameter style if you don't mind, and if you accept the change, of course.)
Title: Re: Print Page
Post by: Arantor on May 13th, 2013, 03:25 PM
Yeah, let me explain what happened. Initially, it was $type, then I remembered that $type was a variable used in the bowels of the system so I made the variable $parse_type but forgot to update the function reading the parameter.

If you want to move it, that's cool, I just didn't think of that. The alternative, of course, is to make post the default rather than 'unknown' and strip the parameter from any 'post' call since posts are the predominant use.

As far as owner ID goes, there's a certain logic to that. Partly because most places never had it in the first place, but secondly it's down to context. Right now owner ID is only relevant for scramble/disemvowel. But that essentially relies on $user_profile being populated with more than 'minimal' type, and without that it doesn't do anything. And of course, we need to consider whether it's appropriate - for example, I'm not sure that it would be appropriate to disemvowel or scramble posts in the Atom feed or the signature for that matter. I don't know.

I'm also cool with using 'user' rather than 'owner'.
Title: Re: Print Page
Post by: Arantor on May 13th, 2013, 03:59 PM
Huh, forgot about the previous post as it was on the previous page.
Quote
- I'm okay with Print being a plugin; don't remember if I was before, but it just doesn't feel that important, after all... What could be improved, really, is giving the admin the ability to only print the current page's worth of posts, and/or only print the entire topic if it has fewer than X pages. With maybe a (single) page index inside the print page itself, to allow easy printing of multiple pages.
It wasn't a plugin before, but it strikes me how little it's really used so it might as well be a plugin. But being a plugin I can do all kinds of crazy with it. I've been thinking, for example, about making export-topic-to-PDF be an option though pagination gets tricky with that.
Quote
but suddenly it hit me that you can disable some tag types on some parse types, or hook a plugin into any type of parsing and just that one... The possibilities seem limitless.
That's the reason for being verbose in specifying all the different types.
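
To make that a bit more concrete, here's a rough sketch of what per-type behaviour could look like. Everything in it is made up for illustration: the function names, the disabled-tag map and the hook registry are assumptions, not what's actually in the Wedge source.

```php
<?php
// Hypothetical map of parse type => tags to disable for that type.
// The entries are invented examples, not Wedge's real configuration.
$disabled_by_type = array(
	'signature' => array('img', 'quote'),
	'thought' => array('code'),
);

// Return the tags that should be switched off for a given parse type.
function disabled_tags_for($type, array $map)
{
	return isset($map[$type]) ? $map[$type] : array();
}

// Run only the callbacks registered for this one parse type.
// $hooks is a hypothetical registry: type => list of callables.
function run_type_hooks($type, $message, array $hooks)
{
	if (!empty($hooks[$type]))
		foreach ($hooks[$type] as $callback)
			$message = call_user_func($callback, $message);

	return $message;
}
```

So a plugin could register a callback against 'signature' alone and never be called for posts, PMs or anything else.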
Title: Re: Print Page
Post by: Wanchope on May 13th, 2013, 05:13 PM
This is not to speak for others, but the Print Page code is the first thing I removed from display.php in SMF. It does nothing other than create another link for the thread, which goes against Google's SEO guidelines - creating multiple links for a single post.
Title: Re: Print Page
Post by: Nao on May 13th, 2013, 06:16 PM
Should I have both of these work..?

parse_bbc($message, $type, $array)
parse_bbc($message, $array)

With an is_array() test, it's easy enough. But it might be seen as disruptive. I'm so used to the jQuery style, I don't mind. But do you..?
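
For reference, the permutation could be sketched roughly like this. It's a stand-in, not the real function: the name parse_bbc_sketch, the 'post' default and the return value are assumptions made just to show how the arguments get shuffled.

```php
<?php
// Hypothetical sketch, not the actual Wedge parse_bbc: accept both
// parse_bbc($message, $type, $options) and parse_bbc($message, $options).
function parse_bbc_sketch($message, $type = 'post', $options = array())
{
	if (is_array($type))
	{
		// The caller skipped the type: the second argument is really
		// the options array, so swap it over and use the default type.
		$options = $type;
		$type = 'post';
	}
	elseif ($type === null)
		$type = 'post';

	// A real parser would transform $message here; the sketch only
	// reports how the arguments were resolved.
	return array('message' => $message, 'type' => $type, 'options' => $options);
}
```

Both parse_bbc_sketch($message, 'signature', array('user' => 5)) and parse_bbc_sketch($message, array('user' => 5)) then resolve to a type plus an options array, at the cost of a signature that isn't obvious from the function header alone.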
Posted: May 13th, 2013, 06:15 PM
Quote from Wanchope on May 13th, 2013, 05:13 PM
This is not to speak for others, but the Print Page code is the first thing I removed from display.php in SMF. It does nothing other than create another link for the thread, which goes against Google's SEO guidelines - creating multiple links for a single post.
Wedge (and possibly SMF) has two safeguards against Google...
- rel="nofollow" in the print links, meaning Google shouldn't care about that link,
- and a canonical URL set in the print page, that tells Google it should index the non-print page instead.

What more do you want.. ;)
Title: Re: Print Page
Post by: Arantor on May 13th, 2013, 06:26 PM
I don't really mind though my gut sort of says not to since PHP doesn't really behave that way (it's sort of an inheritance from Java's ideals concerning isomorphism). Just remember to update parse_bbc_inline with similar semantics.

As far as Google safeguards go, I'd argue it shouldn't even be indexed in the first place because it's not strictly duplicate content.
Title: Re: Print Page
Post by: Nao on May 13th, 2013, 06:47 PM
Well, my gut says that most parse_bbc calls from plugins will relate to a post, so it's best to make their life easier. I'm not documenting the tweak in the function header, only in the place where I'm doing the permutation, so if they don't know about it, it's not a problem. ;)

Done with the conversions, 40 files total. All 'parse_type' are now a parameter, all 'owner' are 'user', and all 'parse_tags' (only a couple..) are 'tags'.
Will commit later tonight; I'll be offline for the evening, unfortunately, and already late... :-/
Plus, I don't want to rush, I'll check them manually again.
Title: Re: Print Page
Post by: Wanchope on May 13th, 2013, 06:49 PM
Google indexes it anyway; sometimes you will see it as 'index on error' according to Google. Forum Print is of no use (at least to me). Any user that needs to print something should copy it to a text editor before printing.
Title: Re: Print Page
Post by: Nao on May 13th, 2013, 06:49 PM
Quote from Arantor on May 13th, 2013, 03:25 PM
If you want to move it, that's cool, I just didn't think of that. The alternative, of course, is to make post the default rather than 'unknown' and strip the parameter from any 'post' call since posts are the predominant use.
BTW, the default has always been 'post'... Even in your version ;)
Title: Re: Print Page
Post by: Arantor on May 13th, 2013, 06:56 PM
That's another change I made and then forgot about, because when I first wrote it, it definitely was 'unknown'.

Wanchope is right: even if it is marked nofollow, Google will still view it, it just doesn't consider it for link juice following purposes, which isn't the same as noindexing.
Title: Re: Print Page
Post by: Nao on May 14th, 2013, 03:09 PM
Quote from Arantor on May 13th, 2013, 06:56 PM
That's another change I made and then forgot about, because when I first wrote it, it definitely was 'unknown'.
There are places where it would have failed if you'd done it that way-- including, ahem, in the most important parse_bbc call of them all, namely the one in prepareDisplayContext... ;) It didn't have a parse_type at all, so it would have been 'unknown'.

Listen, I still haven't committed my update, so if you want me to specify 'post' as the type, feel free to ask. It's much, much easier to remove these 'post' entries later than to re-add them, since we can just do a regex search on parse_bbc.*'post', basically...
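
As a quick illustration of that search, here it is run over a few made-up lines. The sample calls are invented for the example; only the pattern itself comes from the paragraph above, with regex delimiters added.

```php
<?php
// Hypothetical source lines, invented purely to exercise the pattern.
$lines = array(
	"\$body = parse_bbc(\$message, 'post', array('user' => \$id));",
	"\$sig = parse_bbc(\$signature, 'signature');",
	"\$short = parse_bbc(\$message, 'post-preview');",
	"\$text = parse_bbc(\$message);",
);

// The search pattern as quoted, wrapped in / delimiters for PCRE.
$pattern = "/parse_bbc.*'post'/";

// preg_grep keeps the original array keys of the lines that match.
$matches = preg_grep($pattern, $lines);

foreach ($matches as $number => $line)
	echo $number, ': ', $line, "\n";
```

Only the first line matches, because the closing quote in the pattern rules out 'post-preview'. The greedy .* would still match a 'post' string appearing anywhere later on the same line, so the results are worth checking by hand.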
Quote
Wanchope is right: even if it is marked nofollow, Google will still view it, it just doesn't consider it for link juice following purposes, which isn't the same as noindexing.
Yes, but even then I forgot to mention, the print page ALSO has noindex... :lol: