Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Print Page
« Reply #15, on August 22nd, 2012, 01:06 PM »
Quote from Arantor on August 20th, 2012, 02:45 PM
Quote
But I think that Google & co know better when it comes to a topic page - especially with the 'next' and 'previous' (prev?) meta keywords in it to indicate that it's a multi-page topic.
SMF has it; however, it uses it to point to the previous/next topic, not to pages within a multi-page topic (and I can't remember if that's the correct use or not, actually)
Yeah, it has it and it's pointless.
So I double-checked my code, and indeed I implemented it in Wedge -- but with a caveat: robot_no_index has to be set to off. So there are many reasons why the meta links wouldn't show up: either there's only one page in the topic (duh!), or it was accessed through a msgXXX or #new link, things like that...
And what matters is obviously bots only, here. I don't think that accessibility gurus would even kill anyone for not providing these meta links, because seriously... Who ever uses those?! Your browser has to support them, you have to show a special toolbar that takes room in the UI, etc...
Quote
Except that it isn't. It's been marked as nofollow for some time, even in SMF, which means search engines are not supposed to consider it for ranking purposes - but they will still frequently index it anyway.
They index it, but it doesn't bring PR to the overall site, that's what you mean...? I guess the point here is simply to have people be able to reach your site through multiple keywords... (although reaching it through the printpage... Urgh!)
Quote
Also note that there are cases where you can screw things up because of the way print page works. For example, try to get the print page of the Aeva topic on SMF; it's likely going to fail hard when it runs out of memory.
Yeah, I've never really noticed that... Can you post a direct link please? I don't remember where that topic is ;)
Quote
The main display staggers it - it pulls one post, processes it, displays it. Print page gets *everything* in one go and then hands it all to the template. For larger topics, doubly so on low-memory configurations, it's going to go splat.
But that's the thing about printpage, the main point is not that it's printable, the main point is that it's savable... (saveable?)
I used to do that precisely on sm.org topics back in the day.

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Print Page
« Reply #16, on August 22nd, 2012, 02:44 PM »
Quote
So there are many reasons why the meta links wouldn't show up: either there's only one page in the topic (duh!), or it was accessed through a msgXXX or #new link, things like that...
Then the check that normally sets robot_no_index should be modified. The usual rule is that if there is random stuff in $_GET (i.e. other than topic) and/or if $_REQUEST['start'] is non-numeric (which is set up with msgXXXX for linking to specific posts, new for new items and is numeric if you're using conventional pagination)
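A minimal sketch of the robot_no_index rule described above, written in Python for illustration only (Wedge itself is PHP). It assumes the topic and start values have already been split out of the URL into a dict of query parameters, and `should_noindex` is a hypothetical name:

```python
def should_noindex(query: dict) -> bool:
    """Return True if a topic request should be flagged robot_no_index.

    Hypothetical sketch: any query key other than 'topic' and 'start', or a
    non-numeric 'start' value (e.g. 'msg123' or 'new'), marks the request as
    a non-canonical view of the topic.
    """
    extra_keys = set(query) - {"topic", "start"}
    if extra_keys:
        return True
    start = str(query.get("start", "0"))
    # Conventional pagination uses a plain number (0, 15, 30...).
    return not start.isdigit()
```

So `{"topic": "5", "start": "15"}` stays indexable, while `start` values like `msg42` or `new`, or any extra parameter, get the flag.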
Quote
And what matters is obviously bots only, here. I don't think that accessibility gurus would even kill anyone for not providing these meta links, because seriously... Who ever uses those?! Your browser has to support them, you have to show a special toolbar that takes room in the UI, etc...
I seem to recall it's pretty much only Opera that does with gestures on the browser side. It would be good on the bot side to paginate neatly though.
Quote
They index it, but it doesn't bring PR to the overall site, that's what you mean...? I guess the point here is simply to have people be able to reach your site through multiple keywords... (although reaching it through the printpage... Urgh!)
No. What it means is that it is indexed, but no PR is brought from the main site to the printable version.

In fact, it's worse than that, because the printpage version contains the entire thread and specifies the *first page* of the thread as a canonical version. Which pretty much screws up any real benefit it had in the first place.
Quote
Yeah, I've never really noticed that... Can you post a direct link please? I don't remember where that topic is
http://www.simplemachines.org/community/index.php?topic=200401.0

Print page actually works, but I dread to think what the memory limit is set to.
Quote
But that's the thing about printpage, the main point is not that it's printable, the main point is that it's savable... (saveable?)
I used to do that precisely on sm.org topics back in the day.
No... that's just a happy side-effect. The printable version was never intended to be used for that, it was intended to gather everything on one page for printable purposes, and also fix some other issues that float etc. would generate.

I'm still not entirely convinced this needs to stay in the core - other than archiving threads to send to people, I've never used the damn thing.
When we unite against a common enemy that attacks our ethos, it nurtures group solidarity. Trolls are sensational, yes, but we keep everyone honest. | Game Memorial

Nao

Re: Print Page
« Reply #17, on August 22nd, 2012, 03:06 PM »
Quote from Arantor on August 22nd, 2012, 02:44 PM
Then the check that normally sets robot_no_index should be modified.
Why?
Quote
The usual rule is that if there is random stuff in $_GET (i.e. other than topic) and/or if $_REQUEST['start'] is non-numeric (which is set up with msgXXXX for linking to specific posts, new for new items and is numeric if you're using conventional pagination)
Yes, but you didn't finish your sentence... ;)
robot_no_index makes sense in Wedge (overall), and in this situation it does, too. It's about saving users from having to download two extra lines of HTML they don't care about, and will only be useful to search bots...

I forgot to specify that I changed prev/next meta to link to prev/next pages back when I was reading the Google blog, and they published an article about how they were changing their logic to use prev/next links for topic pages so that they can be seen as a single page in the engine. Well, so far I ain't seen that happen... (Not that I'm testing a lot, though.)
Just like when they announced they'd use microformats or microdata or whatever, the schema.org thing, to show clean breadcrumbs in their result pages... Well, the result is that many bare SMF sites get their breadcrumbs shown on Google, and Wedge doesn't -- even though SMF doesn't use schema.org breadcrumbs and Wedge does. Thank you very much for the time loss, Google...
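For reference, a rough sketch of emitting prev/next links that point at neighbouring pages of the same topic rather than at neighbouring topics. It assumes SMF-style `index.php?topic=ID.START` URLs and a fixed number of posts per page; `pagination_links` is a hypothetical helper, not Wedge's code:

```python
from typing import List

def pagination_links(base_url: str, topic_id: int, start: int,
                     total_posts: int, per_page: int) -> List[str]:
    """Build <link rel="prev"/"next"> tags for one page of a topic."""
    links = []
    if start > 0:
        # Previous page; clamp at 0 in case 'start' isn't page-aligned.
        prev_start = max(start - per_page, 0)
        links.append(f'<link rel="prev" href="{base_url}?topic={topic_id}.{prev_start}">')
    if start + per_page < total_posts:
        links.append(f'<link rel="next" href="{base_url}?topic={topic_id}.{start + per_page}">')
    return links
```

A single-page topic yields no links at all, which matches the "only one page" case.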
Quote
I seem to recall it's pretty much only Opera that does with gestures on the browser side.
Possibly. It has a toolbar for these buttons, too, but it's disabled by default, thankfully. I think that Firefox can handle these too, and perhaps Safari as well... (Maybe with plugins or somethin'?)
Quote
No. What it means is that it is indexed, but no PR is brought from the main site to the printable version.
Oh, yeah, right... So it's hidden in page 15 right? But if you use rare keywords, it'll still show it on page 1, which is better than no results at all (because of printpage not being available.)
Then again, if (and only if) Google's handling of prev/next works as expected on Wedge, then I suppose we can expect it not to need a printpage for that...
Quote
Print page actually works, but I dread to think what the memory limit is set to.
Here's wondering... What if server-side gzipping is disabled on printpage? It would sure increase the bandwidth needs (1MB gzipped, 6MB unzipped in Aeva's case), but would probably make it easier to send the page in chunks..? Heck, maybe it's already done that way... Because the topic just didn't show up in one go on my browser, it loaded progressively...
I'd say, if mod_gzip and PHP are smart enough to catch the output buffer and gzip parts of it (I believe gzip is suited for chunk transmissions?), then it's not worth worrying too much about memory...
Then again, I'm not exactly a server/Apache/PHP internals specialist, and I probably said something silly.

Oh, and of course, another good way to limit the filesize (and thus bandwidth requirements) is to just strip any whitespace around posts, and/or start optimizing the actual HTML like crazy... For instance, here we have a class for author and for body, with two different tags. It may be smarter to just use a class on top of both of them (in the DOM), and just use class-free tags after it...
Quote
I'm still not entirely convinced this needs to stay in the core - other than archiving threads to send to people, I've never used the damn thing.
Same here...
Possibly, what we could do is, instead of directly showing the printpage version, we could hmm... Show a choice to the user: either print the current page, or print the entire topic, or show an archive of the topic for safe-keeping. Then we could handle all of them differently...

godboko71

  • Fence accomplished!
  • Hello
  • Posts: 361
Re: Print Page
« Reply #18, on August 22nd, 2012, 03:51 PM »
Seems like maybe print could really be two plugins: a "Save Topic" plugin for those who want to save a topic to send to someone, and a Print plugin for those who want to offer the option to print. Though if the saved version is printable, we really only need a Save Topic plugin. Maybe even offer different file output types depending on what extensions the server has: basic HTML, PDF for those that can, etc.
Thank you,
Boko

Arantor

Re: Print Page
« Reply #19, on August 22nd, 2012, 04:16 PM »
Quote
Why?
If the prev/forward navigation relies on robot_no_index, something's wrong because it shouldn't really be.
Quote
Yes, but you didn't finish your sentence...
Yes, I did, I just didn't qualify the subject of the sentence, seeing as how it was implied by the previous one. The rules around robot_no_index being added are those. (I had to modify it myself lately), and robot_no_index perhaps needs to be reconsidered.
Quote
Oh, yeah, right... So it's hidden in page 15 right? But if you use rare keywords, it'll still show it on page 1, which is better than no results at all (because of printpage not being available.)
Then again, if (and only if) Google's handling of prev/next works as expected on Wedge, then I suppose we can expect it not to need a printpage for that...
No... Google will still index all 15 pages normally. The actual problem is that printpage specifically fucks around with page canonicalisation by having content that isn't at the canonical URL. Even though it's nofollow'd, Google still follows it!
Quote
Heck, maybe it's already done that way... Because the topic just didn't show up in one go on my browser, it loaded progressively...
That's the point, it is NOT handled progressively. It is queried, pushed entirely into $context and then output. When I first went to the URL, it was actually blank.
Quote
I'd say, if mod_gzip and PHP are smart enough to catch the output buffer and gzip parts of it (I believe gzip is suited for chunk transmissions?), then it's not worth worrying too much about memory...
Except that you have to worry about memory when you do it like that.

Display does that somewhat bizarre process of having a callback per message specifically so that you can have truly massive messages or vast threads without any problems with memory_limit. Printpage does not do that, it just queries, pushes everything into $context before going to the template. On low memory configurations it's quite possible to overflow that on long threads.
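To make the difference concrete, here's a small Python sketch (sqlite3 standing in for the forum database and `render` standing in for the template; both are assumptions). `load_all` mimics print page by building every rendered message before any output, while `stream` mimics Display's per-message callback by pulling and rendering one row at a time:

```python
import sqlite3
from typing import Iterator, List

def render(body: str) -> str:
    # Stand-in for the template's per-post markup.
    return f'<div class="post">{body}</div>'

def load_all(conn: sqlite3.Connection) -> List[str]:
    # Print-page style: fetchall() materialises every row, and every rendered
    # post is kept in one list before anything is output.
    rows = conn.execute("SELECT body FROM messages ORDER BY id").fetchall()
    return [render(body) for (body,) in rows]

def stream(conn: sqlite3.Connection) -> Iterator[str]:
    # Display style: iterate the cursor and yield each rendered post as soon
    # as it's ready, so rendered posts don't pile up in memory.
    for (body,) in conn.execute("SELECT body FROM messages ORDER BY id"):
        yield render(body)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO messages (body) VALUES (?)", [("first",), ("second",)])
    for html in stream(conn):
        print(html)
```

Both produce the same markup; only the peak memory use differs as the topic grows.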
Quote
Then again, I'm not exactly a server/Apache/PHP internals specialist, and I probably said something silly.
It's not silly at all. Let me explain how gzip works in relation to Apache/PHP, depending on what is responsible.

If PHP is set up to do it (and Wedge has that configuration option), it's done in PHP, and the total page output must fit in memory. This is why the DB backup feature is often more reliable if you don't gzip it: you don't have to hold all the data at once to gzip it, so you have more memory to cope with it.

If Apache is set up to do it, and not PHP, PHP just has to output its content back to Apache, and PHP just has to make sure that it doesn't run out of memory in whatever it's doing.
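On the earlier question of whether gzip suits chunked transmission: the format itself can be produced incrementally. The Python sketch below contrasts compressing a fully buffered page with feeding chunks to a streaming compressor; it only illustrates the gzip format and says nothing about how PHP's output handler or Apache's modules are actually implemented:

```python
import gzip
import zlib
from typing import Iterable, Iterator

def gzip_buffered(chunks: Iterable[bytes]) -> bytes:
    # Like PHP-side gzip: the whole page is joined in memory, then compressed.
    return gzip.compress(b"".join(chunks))

def gzip_streaming(chunks: Iterable[bytes]) -> Iterator[bytes]:
    # wbits=31 (16 + 15) asks zlib for gzip framing (header and trailer).
    comp = zlib.compressobj(wbits=31)
    for chunk in chunks:
        # Z_SYNC_FLUSH pushes out everything compressed so far, so this piece
        # can be sent right away without waiting for the rest of the page.
        yield comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
    # The default flush mode (Z_FINISH) writes the last block and the trailer.
    yield comp.flush()
```

The sync flush after each chunk trades a little compression ratio for being able to send each piece immediately.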
Quote
Same here...
Possibly, what we could do is, instead of directly showing the printpage version, we could hmm... Show a choice to the user: either print the current page, or print the entire topic, or show an archive of the topic for safe-keeping. Then we could handle all of them differently...
I'm not disputing the validity of such things. My point is that I don't believe it should be in the core by default. If admins want the ability to archive parts of the forum, that should be up to them.

The fact we get SEO benefits, plus streamlining parse_bbc a little, these are just nice side benefits.
Quote
Seems like maybe print could really be two plugins: a "Save Topic" plugin for those who want to save a topic to send to someone, and a Print plugin for those who want to offer the option to print. Though if the saved version is printable, we really only need a Save Topic plugin. Maybe even offer different file output types depending on what extensions the server has: basic HTML, PDF for those that can, etc.
Interesting approach. I had actually thought about doing so. I'm just not convinced that people actually use print-page for printing, and that as a result it isn't needed in the core by default.

spoogs

  • Posts: 417
Re: Print Page
« Reply #20, on August 22nd, 2012, 04:24 PM »
I used it once, does that count :P

Stick a fork in it SMF

Nao

Re: Print Page
« Reply #21, on August 22nd, 2012, 04:30 PM »
Quote from Arantor on August 22nd, 2012, 04:16 PM
If the prev/forward navigation relies on robot_no_index, something's wrong because it shouldn't really be.
It's a choice of mine... Really it's all about saving bandwidth. Guests will mostly get to see the prevnext version anyway (we could even make it even simpler by providing them with canonical page links in 'recent posts' and SSI functions, rather than a msgXXX link).
I could also restrict these meta links to guests...
Quote
No... Google will still index all 15 pages normally. The actual problem is that printpage specifically fucks around with page canonicalisation by having content that isn't at the canonical URL. Even though it's nofollow'd, Google still follows it!
I've never seen a wedge.org print page being indexed, though. Heck, I don't remember a single SMF print page being indexed, at all... It's always the wireless content crap that gets the treatment. (And that's no longer an issue in Wedge, eh eh.)
Quote
That's the point, it is NOT handled progressively. It is queried, pushed entirely into $context and then output. When I first went to the URL, it was actually blank.
Hmm...
Well, so it should be done like in Display.php right..? Callback and everything...
Quote
Display does that somewhat bizarre process of having a callback per message specifically so that you can have truly massive messages or vast threads without any problems with memory_limit. Printpage does not do that, it just queries, pushes everything into $context before going to the template. On low memory configurations it's quite possible to overflow that on long threads.
And would that be fixed with a callback?
Quote
If Apache is set up to do it, and not PHP, PHP just has to output its content back to Apache, and PHP just has to make sure that it doesn't run out of memory in whatever it's doing.
Well... That's interesting.
I'm not sure I remember -- does PHP still gzip the page if enabled in Wedge, even if Apache can handle it? If yes, then maybe we should first add a test to see if Apache handles gzipping of HTML pages, and then disable PHP gzipping internally..?
Quote
I'm not disputing the validity of such things. My point is that I don't believe it should be in the core by default. If admins want the ability to archive parts of the forum, that should be up to them.
It can still be core but made to be enabled or disabled...
Quote
The fact we get SEO benefits, plus streamlining parse_bbc a little, these are just nice side benefits.
I don't think it would have that much of an influence over parse_bbc... ;) Plus, I think Aeva Media has some tricks in it, too. (That, and my Subs-BBC.php file has so much custom data in it, I'd rather not see any changes until I'm done with my own ahah...)
Quote
Interesting approach.
Close enough to mine :P
Quote
I had actually thought about doing so. I'm just not convinced that people actually use print-page for printing, and that as a result it isn't needed in the core by default.
Agreed, for the default aspect. Not sure about not-core though.

PS: and once again, I didn't get any warnings for spoogs' post above mine, which was sent after I started my reply... My 'last' variable was set to 281415, so I should have gotten a warning, no..?

Arantor

Re: Print Page
« Reply #22, on September 4th, 2012, 07:12 PM »
Bah, how did I miss this one?
Quote
I've never seen a wedge.org print page being indexed, though. Heck, I don't remember a single SMF print page being indexed, at all... It's always the wireless content crap that gets the treatment. (And that's no longer an issue in Wedge, eh eh.)
2.0 made it nofollow by default, 1.1 did not. And it does still get indexed, because even though it's marked nofollow, Google et al *still follows* them for links, even if they don't make it into the index itself.
Quote
Hmm...
Well, so it should be done like in Display.php right..? Callback and everything...
Quote
And would that be fixed with a callback?
Yes, it should be done like in Display.php and yes, making it a callback would fix the issues - though it would also require more than a few minutes work in rewriting it.
Quote
Well... That's interesting.
I'm not sure I remember -- does PHP still gzip the page if enabled in Wedge, even if Apache can handle it? If yes, then maybe we should first add a test to see if Apache handles gzipping of HTML pages, and then disable PHP gzipping internally..?
That's why there's the question on install ;)
Quote
It can still be core but made to be enabled or disabled...
Having given it a couple of weeks' thought, I still think it shouldn't be core.
Quote
PS: and once again, I didn't get any warnings for spoogs' post above mine, which was sent after I started my reply... My 'last' variable was set to 281415, so I should have gotten a warning, no..?
And yet, I've seen such warnings...?

Nao

Re: Print Page
« Reply #23, on September 18th, 2012, 04:11 PM »
Quote from Arantor on September 4th, 2012, 07:12 PM
Bah, how did I miss this one?
Story of my life!
Quote
Yes, it should be done like in Display.php and yes, making it a callback would fix the issues - though it would also require more than a few minutes work in rewriting it.
Hmm... Why more than a few minutes? I'm sure a good old case of copy & paste would help a lot... ;)
Quote
That's why there's the question on install ;)
That question is for PHP only, innit...? I don't remember.
Quote
Having given it a couple of weeks' thought, I still think it shouldn't be core.
I suggest that we realistically postpone these discussions to v2.0, if ever... :P
Quote
Quote
PS: and once again, I didn't get any warnings for spoogs' post above mine, which was sent after I started my reply... My 'last' variable was set to 281415, so I should have gotten a warning, no..?
And yet, I've seen such warnings...?
Magical.
Sometimes it works for me -- sometimes it doesn't. It sounds like it's some super-unstable function when it really isn't... :-/

Arantor

Re: Print Page
« Reply #24, on September 18th, 2012, 04:39 PM »
Quote
Hmm... Why more than a few minutes? I'm sure a good old case of copy & paste would help a lot...
Copy and paste and shed-loads of editing. Display's callback is way more complicated than it needs to be, and print page's setup is just not designed that way.
Quote
That question is for PHP only, innit...? I don't remember.
That's the beauty of it, either you do it at the PHP level or you do it at the Apache level, and if you do it at PHP level, it will break it at Apache level, hence the test.
Quote
I suggest that we realistically postpone these discussions to v2.0, if ever...
I can do this one in an afternoon ;)
Re: Print Page
« Reply #25, on September 29th, 2012, 08:35 PM »
Been thinking about this again. One of the features I want to offer in relation to the warning system would benefit if print page were removed, though it's entirely possible to do without touching print page at all. I just think it'd be cleaner to remove print page. I'm also thinking about cleaning up parse_bbc to have two parameters: the message and an array of options to be passed inwards.
Re: Print Page
« Reply #26, on May 12th, 2013, 04:28 PM »
So, the new warning system is in place; my original concern was over the disemvowel feature.

I still want to do the refactoring, and I still think it would be an improvement to pull print page into a plugin. (The refactoring gets much simpler if I don't have to worry about print page, but I can still do it to include that for now.)

Thoughts?
Re: Print Page
« Reply #27, on May 12th, 2013, 05:06 PM »
Just to add, I've already started on the refactoring, the signature for parse_bbc is now two parameters, the message and an array of options.

The array has the current following keys:
Quote
 * - smileys (bool) Whether smileys should be parsed or not, regardless of any other bbcode content.
 * - cache (string) If potentially cacheable, this should be the cache's id. If not defined, no caching will occur. This should be a quasi-unique key for the item being parsed, so that if it took over 0.05 seconds, it can be cached. (The final key used for the cache takes the supplied key and includes details such as the user's locale and time offsets, an MD5 digest of the message and other details that potentially affect the way parsing occurs)
 * - print (bool) Whether in the printable mode or not, which disables various tags as well as hiding smileys.
 * - parse_tags (array) A list of tags to be parsed on this run, undefined or empty array to do all those currently enabled. (This overrides any user settings for what is and is not allowed. Additionally, runs with this set are never cached, regardless of cache id being set)
 * - owner (int) If defined, the user id of the author of this content. Used for identifying whether parsing should include user sanctions like disemvowelling.
 * - type (string, required) Indicates what type of content this is. Known values: post, signature
I should add that the list of types is growing as I delineate each of the places parse_bbc is called. The reason for doing so is that it opens up new options: most of the time it won't make any difference, but it does mean we can do things like explicitly know that a piece of bbc is a post or a signature without having to sniff the cache id. It also separates 'print page' from the smileys parameter.
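As an illustration of how the cache rules in the option list above could fit together, here's a sketch in Python. The function name, the key layout and the decision to leave out the unspecified "other details" are all assumptions; only the inputs (supplied cache id, locale, time offset, MD5 of the message), the required `type`, and the "never cache when parse_tags is set" rule come from the doc comment:

```python
import hashlib
from typing import Optional

def bbc_cache_key(message: str, options: dict, locale: str,
                  time_offset: int) -> Optional[str]:
    """Return a cache key for a parse_bbc-style call, or None if uncacheable.

    Hypothetical sketch: combines the caller's cache id with the user's
    locale, time offset and an MD5 digest of the message. Other factors
    that might affect parsing are left out here.
    """
    if "type" not in options:
        raise ValueError("'type' is required, e.g. 'post' or 'signature'")
    cache_id = options.get("cache")
    # No cache id means no caching; a restricted tag list is never cached.
    if not cache_id or options.get("parse_tags"):
        return None
    digest = hashlib.md5(message.encode("utf-8")).hexdigest()
    return "-".join([cache_id, locale, str(time_offset), digest])
```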
Re: Print Page
« Reply #28, on May 13th, 2013, 12:16 AM »
Just to add, here's the complete list of types of bbc that could be applied and this information is now available to the bbc parser in case a hook or similar wants to modify it.
Quote
 *  -- agreement
 *  -- custom-field
 *  -- cut (used with westr::cut)
 *  -- empty-test (for when checking a post is really empty)
 *  -- infraction-notice
 *  -- media-album-description
 *  -- media-comment
 *  -- media-custom-field
 *  -- media-custom-field-description
 *  -- media-description
 *  -- media-embed
 *  -- media-playlist-description-preview
 *  -- media-playlist-description
 *  -- media-welcome
 *  -- mod-comment (comments to reported posts)
 *  -- mod-note (notes in the moderation center)
 *  -- news
 *  -- plugin-readme
 *  -- pm
 *  -- pm-draft
 *  -- pm-notify
 *  -- poll-option
 *  -- poll-question
 *  -- post
 *  -- post-convert (only for WYSIWYG BBC/HTML conversion)
 *  -- post-draft
 *  -- post-feed
 *  -- post-preview (for thread shortened versions of posts)
 *  -- post-notify
 *  -- preview (for previewing posts in editing)
 *  -- preview-pm (for previewing PMs before sending)
 *  -- q-and-a
 *  -- report-media
 *  -- report-post
 *  -- signature
 *  -- thought
* Arantor doesn't know why he used - instead of _ but it just seemed more readable somehow (and I suppose a subtle indication that it isn't code directly but an identifier)

Nao

Re: Print Page
« Reply #29, on May 13th, 2013, 10:11 AM »
You're doing a good job. :)

- I'm all for parse_bbc($message, $array). I'm pretty sure we discussed this possibility already, sometime around last year, and that we were both interested in doing it. I'm appalled at how many interesting new changes we discussed, and then failed to implement later.

- I'm okay with Print being a plugin; I don't remember if I was before, but it just doesn't feel that important, after all... What could be improved, really, is giving the admin the ability to only print the current page's worth of posts, and/or only print the entire topic if it has fewer than X pages. With maybe a (single) page index inside the print page itself, to allow easy printing of multiple pages.

Anything else you needed to know..?
Posted: May 13th, 2013, 10:02 AM

Oh, I see you've already committed... ;)

Looking at it, I have a few things to say...
- Adding so many types, it was overwhelming at first, but suddenly it hit me that you can disable some tag types on some parse types, or hook a plugin into any type of parsing and just that one... The possibilities seem limitless.
- I have to look more into it, but you're using 'parse_type' everywhere in the source code, when the parse_bbc function itself seems to accept only a 'type' parameter, which is shorter and, frankly, easier to manipulate. Is this one of my patented LastMinuteChanges™ you forgot to propagate everywhere...?
- Similarly, parse_tags could be (should be..?) shortened to just 'tags'.
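A rough sketch of the per-type hooking idea, in Python. The registry, `add_hook`, `run_hooks` and the sanctioned-user set are all hypothetical; the disemvowel hook only shows how an `owner`-based sanction could be limited to one parse type, and since it works on raw text, a real version would need to skip over BBC tags:

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A hook receives the message and the parse_bbc-style options array.
Hook = Callable[[str, dict], str]

_hooks: Dict[str, List[Hook]] = defaultdict(list)

def add_hook(parse_type: str, hook: Hook) -> None:
    """Register a hook that only runs for one parse type (e.g. 'post')."""
    _hooks[parse_type].append(hook)

def run_hooks(message: str, options: dict) -> str:
    """Apply every hook registered for options['type'], in order."""
    for hook in _hooks.get(options["type"], []):
        message = hook(message, options)
    return message

SANCTIONED_USERS = {5}  # hypothetical set of disemvowelled user ids

def disemvowel(message: str, options: dict) -> str:
    # Only touch content whose author is under the sanction.
    if options.get("owner") not in SANCTIONED_USERS:
        return message
    return "".join(c for c in message if c.lower() not in "aeiou")

add_hook("post", disemvowel)
```

With this registration, a sanctioned user's posts come out disemvowelled while their signature, parsed with type `signature`, is left alone.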

Will add anything later as needed.