1) It could be a plugin rather than a core feature. The savings for the bulk of forums would be quite significant, since robots often do follow nofollow links; they just don't pass link juice through them. So removing it entirely would actually save quite a bit of bandwidth in the long run.
2) If it's not made a plugin, it could have permissions attached, e.g. defaulting to not visible for guests.
But I think that Google & co know better when it comes to a topic page, especially with the 'next' and 'previous' (prev?) meta links in it to indicate that it's a multi-page topic.
But guests are also entitled to view printpage if they're not bots...
OTOH, one of the good points of printpage is that it gets you more chances to have multiple keywords stuffed in it.
SMF has it; however, it uses it to point to the previous/next topic, not to pages within a multi-page topic (and I can't actually remember whether that's the correct use or not).
Except that it isn't. It's been marked as nofollow for some time, even in SMF, which means search engines are not supposed to consider it for ranking purposes - but they will still frequently index it anyway.
Also note that there are cases where you can screw things up because of the way print page works. For example, try to get the print page of the Aeva topic on SMF; it's likely to fail hard when it runs out of memory.
The main display staggers it - it pulls one post, processes it, displays it. Print page gets *everything* in one go and then hands it all to the template. For larger topics, doubly so on low-memory configurations, it's going to go splat.
So there are many reasons why the meta links wouldn't show up: either there's only one page in the topic (duh!), or it was accessed through a msgXXX or #new link, things like that...
And what matters is obviously bots only, here. I don't think that accessibility gurus would even kill anyone for not providing these meta links, because seriously... Who ever uses those?! Your browser has to support them, you have to show a special toolbar that takes room in the UI, etc...
They index it, but it doesn't bring PR to the overall site, that's what you mean...? I guess the point here is simply to have people be able to reach your site through multiple keywords... (although reaching it through the printpage... Urgh!)
Yeah, I've never really noticed that... Can you post a direct link, please? I don't remember where that topic is.
But that's the thing about printpage: the main point is not that it's printable, it's that it's savable... (saveable?)
I used to do that precisely on sm.org topics back in the day.
Then the check that normally sets robot_no_index should be modified.
The usual rule is that if there is random stuff in $_GET (i.e. anything other than topic) and/or if $_REQUEST['start'] is non-numeric (it's set to msgXXXX when linking to specific posts, to new for new items, and is numeric if you're using conventional pagination), then robot_no_index gets set.
I seem to recall it's pretty much only Opera that supports them, with gestures on the browser side.
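A minimal sketch of that rule, written in Python purely for illustration (Wedge itself is PHP). The function name `should_noindex` and its arguments are hypothetical, not Wedge's actual code; `get_params` stands in for `$_GET` and `start` for `$_REQUEST['start']`.

```python
def should_noindex(get_params: dict, start: str) -> bool:
    """Hypothetical sketch: decide whether a topic view gets robot_no_index."""
    # Anything in the query string besides 'topic' counts as random stuff.
    if set(get_params) - {"topic"}:
        return True
    # Conventional pagination gives a numeric offset; 'msg123' and 'new' don't.
    return not start.isdigit()
```

For instance, `topic=123.15` stays indexable, while `topic=123.msg456`, `topic=123.new` or anything carrying an extra parameter such as `action=printpage` would be marked noindex.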
No. What it means is that it is indexed, but no PR is brought from the main site to the printable version.
Print page actually works, but I dread to think what the memory limit is set to.
I'm still not entirely convinced this needs to stay in the core - other than archiving threads to send to people, I've never used the damn thing.
Why?
Yes, but you didn't finish your sentence...
Oh, yeah, right... So it's hidden on page 15, right? But if you use rare keywords, it'll still show up on page 1, which is better than no results at all (because printpage isn't available).
Then again, if (and only if) Google's handling of prev/next works as expected on Wedge, then I suppose we can expect it not to need a printpage for that...
Heck, maybe it's already done that way... Because the topic just didn't show up in one go on my browser, it loaded progressively...
I'd say, if mod_gzip and PHP are smart enough to catch the output buffer and gzip parts of it (I believe gzip is suited for chunk transmissions?), then it's not worth worrying too much about memory...
Then again, I'm not exactly a server/Apache/PHP internals specialist, and I probably said something silly.
Same here...
Possibly, what we could do is, instead of directly showing the printpage version, we could hmm... Show a choice to the user: either print the current page, or print the entire topic, or show an archive of the topic for safe-keeping. Then we could handle all of them differently...
Seems like maybe print could really be two plugins: a "Save Topic" plugin for those who want to save a topic to send to someone, and a print plugin for those who want to offer the option to print. Though really, if the saved version is printable, you only need a Save Topic plugin. Maybe even have different file output types depending on what extensions the server has: basic HTML, PDF for those that can, etc.
If the prev/forward navigation relies on robot_no_index, something's wrong because it shouldn't really be.
No... Google will still index all 15 pages normally. The actual problem is that printpage specifically fucks around with page canonicalisation by having content that isn't at the canonical URL. Even though it's nofollow'd, Google still follows it!
That's the point, it is NOT handled progressively. It is queried, pushed entirely into $context and then output. When I first went to the URL, it was actually blank.
Display does that somewhat bizarre process of having a callback per message specifically so that you can have truly massive messages or vast threads without any problems with memory_limit. Printpage does not do that, it just queries, pushes everything into $context before going to the template. On low memory configurations it's quite possible to overflow that on long threads.
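To illustrate the difference, here is a rough Python sketch (not Wedge code; all names are hypothetical) contrasting the two approaches: an eager renderer that collects every processed message before output, like printpage filling `$context`, and a streaming one that behaves like Display's per-message callback. `fake_result_set` is an assumed stand-in for a database cursor that logs each row as it's fetched.

```python
from typing import Callable, Iterable, Iterator, List

def render_eager(rows: Iterable[str], render: Callable[[str], str]) -> List[str]:
    # Printpage style: every processed message is kept in memory
    # (like pushing it all into $context) before anything is output.
    return [render(row) for row in rows]

def render_streaming(rows: Iterable[str], render: Callable[[str], str]) -> Iterator[str]:
    # Display style: fetch one message, process it, hand it to the output,
    # then move on; only the current message has to be held.
    for row in rows:
        yield render(row)

def fake_result_set(n: int, log: List[int]) -> Iterator[str]:
    # Stand-in for a database cursor; records each row as it is fetched.
    for i in range(n):
        log.append(i)
        yield f"post {i}"

# Usage: only one row has been fetched when the first post is emitted.
fetched: List[int] = []
stream = render_streaming(fake_result_set(100_000, fetched), str.upper)
first = next(stream)  # "POST 0", with fetched == [0]
```

With the streaming version, peak memory stays at roughly one message however long the thread is; the eager version has to fetch and keep every row before the template ever sees them.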
If Apache is set up to do it rather than PHP, then PHP just has to output its content back to Apache and make sure that it doesn't run out of memory in whatever it's doing.
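On the question of whether gzip copes with chunked output: the deflate format itself can be fed incrementally. The Python sketch below uses the standard `zlib` module (not Apache's or PHP's code) to compress a page chunk by chunk, so the compressor keeps only bounded internal state rather than the whole page. Whether a given Apache/PHP setup actually flushes output this way depends on its configuration, which is an assumption here; `gzip_stream` is a hypothetical name.

```python
import gzip
import zlib
from typing import Iterable, Iterator

def gzip_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    # wbits=31 (16 + 15) makes zlib write a gzip header and trailer.
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=31)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # small inputs may be buffered internally for now
            yield out
    # Emit whatever is still buffered, plus the gzip trailer.
    yield compressor.flush()

# Usage: compress a "page" produced piece by piece, then decompress it again.
page = [b"<p>post %d</p>" % i for i in range(1000)]
body = b"".join(gzip_stream(page))
restored = gzip.decompress(body)
```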
I'm not disputing the validity of such things. My point is that I don't believe it should be in the core by default. If admins want the ability to archive parts of the forum, that should be up to them.
The SEO benefits, plus streamlining parse_bbc a little, are just nice side benefits.
Interesting approach.
I had actually thought about doing so. I'm just not convinced that people actually use print-page for printing, so I don't think it needs to be in the core by default.
I've never seen a wedge.org print page being indexed, though. Heck, I don't remember a single SMF print page being indexed, at all... It's always the wireless content crap that gets the treatment. (And that's no longer an issue in Wedge, eh eh.)
Hmm...
Well, so it should be done like in Display.php right..? Callback and everything...
And would that be fixed with a callback?
Well... That's interesting.
I'm not sure I remember -- does PHP still gzip the page if enabled in Wedge, even if Apache can handle it? If yes, then maybe we should first add a test to see if Apache handles gzipping of HTML pages, and then disable PHP gzipping internally..?
It can still be core but made to be enabled or disabled...
PS: and once again, I didn't get any warnings for spoogs' post above mine, which was sent after I started my reply... My 'last' variable was set to 281415, so I should have gotten a warning, no..?
Bah, how did I miss this one?
Yes, it should be done like in Display.php and yes, making it a callback would fix the issues - though it would also require more than a few minutes work in rewriting it.
That's why there's the question on install ;)
Having given it a couple of weeks' thought, I still think it shouldn't be core.
And yet, I've seen such warnings...?
Hmm... Why more than a few minutes? I'm sure a good old case of copy & paste would help a lot...
That question is for PHP only, innit...? I don't remember.
I suggest that we realistically postpone these discussions to v2.0, if ever...
* - smileys (bool) Whether smileys should be parsed or not, regardless of any other bbcode content.
* - cache (string) If potentially cacheable, this should be the cache's id. If not defined, no caching will occur. This should be a quasi-unique key for the item being parsed, so that if it took over 0.05 seconds, it can be cached. (The final key used for the cache takes the supplied key and includes details such as the user's locale and time offsets, an MD5 digest of the message and other details that potentially affect the way parsing occurs)
* - print (bool) Whether in the printable mode or not, which disables various tags as well as hiding smileys.
* - parse_tags (array) A list of tags to be parsed on this run, undefined or empty array to do all those currently enabled. (This overrides any user settings for what is and is not allowed. Additionally, runs with this set are never cached, regardless of cache id being set)
* - owner (int) If defined, the user id of the author of this content. Used for identifying whether parsing should include user sanctions like disemvowelling.
* - type (string, required) Indicates what type of content this is. Known values:
* -- agreement
* -- custom-field
* -- cut (used with westr::cut)
* -- empty-test (for when checking a post is really empty)
* -- infraction-notice
* -- media-album-description
* -- media-comment
* -- media-custom-field
* -- media-custom-field-description
* -- media-description
* -- media-embed
* -- media-playlist-description-preview
* -- media-playlist-description
* -- media-welcome
* -- mod-comment (comments to reported posts)
* -- mod-note (notes in the moderation center)
* -- news
* -- plugin-readme
* -- pm
* -- pm-draft
* -- pm-notify
* -- poll-option
* -- poll-question
* -- post
* -- post-convert (only for WYSIWYG BBC/HTML conversion)
* -- post-draft
* -- post-feed
* -- post-preview (for thread shortened versions of posts)
* -- post-notify
* -- preview (for previewing posts in editing)
* -- preview-pm (for previewing PMs before sending)
* -- q-and-a
* -- report-media
* -- report-post
* -- signature
* -- thought
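A rough sketch of how the `cache` and `parse_tags` options described above might fit together, in Python for illustration only. The function names, the `locale`/`time_offset` arguments and the exact fields folded into the key are assumptions; the docblock only says the final key includes the supplied id, the user's locale and time offsets, an MD5 digest of the message and other parse-affecting details, that runs with `parse_tags` set are never cached, and that 0.05 seconds is the threshold.

```python
import hashlib
from typing import Optional

CACHE_THRESHOLD = 0.05  # seconds; parses slower than this are worth caching

def bbc_cache_key(options: dict, message: str, locale: str, time_offset: int) -> Optional[str]:
    cache_id = options.get("cache")
    # No cache id supplied: no caching will occur.
    if not cache_id:
        return None
    # Runs restricted with parse_tags are never cached, whatever the id.
    if options.get("parse_tags"):
        return None
    digest = hashlib.md5(message.encode("utf-8")).hexdigest()
    # Assumed set of parse-affecting details folded into the final key.
    parts = [
        str(cache_id),
        options["type"],  # 'type' is documented as required
        locale,
        str(time_offset),
        digest,
        "print" if options.get("print") else "screen",
        "smileys" if options.get("smileys", True) else "nosmileys",  # assumed default
    ]
    return "bbc:" + ":".join(parts)

def worth_caching(elapsed_seconds: float) -> bool:
    # Only store results whose parse took longer than the threshold.
    return elapsed_seconds > CACHE_THRESHOLD
```

Two users with different locales get different keys for the same post, so neither sees the other's localized output.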
- I'm okay with Print being a plugin; I don't remember if I was before, but it just doesn't feel that important, after all... What could really be improved is giving the admin the ability to print only the current page's worth of posts, and/or to print the entire topic only if it has fewer than X pages. Maybe with a (single) page index inside the print page itself, to allow easy printing of multiple pages.
But suddenly it hit me that you can disable some tag types on some parse types, or hook a plugin into any one type of parsing and just that one... The possibilities seem limitless.
This is not to speak for others, but Print Page code is the first thing I removed from display.php in SMF. It doesn't do anything other than create another link for the thread, which goes against Google's SEO guidelines: creating multiple links for a single post.
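A possible shape for that rule, as a hypothetical Python sketch: `print_scope`, `max_pages` (the admin's "X pages") and the fallback to the current page are assumptions, not existing Wedge settings.

```python
import math
from typing import Tuple

def print_scope(total_posts: int, per_page: int, current_start: int, max_pages: int) -> Tuple[int, int]:
    """Return (first post offset, number of posts) to include in the print view."""
    pages = max(1, math.ceil(total_posts / per_page))
    if pages <= max_pages:
        # Short topic: the whole thing is small enough to print in one go.
        return 0, total_posts
    # Long topic: only the page the user is on, aligned to a page boundary.
    start = (current_start // per_page) * per_page
    return start, min(per_page, total_posts - start)
```

A page index inside the print view could then link to each offset in steps of `per_page`.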
If you want to move it, that's cool, I just didn't think of that. The alternative, of course, is to make post the default rather than 'unknown' and strip the parameter from any 'post' call since posts are the predominant use.
That's another change I must have made and forgotten about, then, because when I first wrote it, it was definitely 'unknown'.
Wanchope is right: even if it is marked nofollow, Google will still view it, it just doesn't consider it for link juice following purposes, which isn't the same as noindexing.