Nao

  • Dadman with a boy
  • Posts: 16,079
Header headache
« on January 25th, 2013, 11:18 AM »
Not really a Wedge bug report per se.

See, a small portion of Wedge user agents aren't getting gzipped output.
I decided to investigate what was causing that.

First of all, I added this simple line of code:

Code:
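// Temporary debugging: dump the whole request environment whenever the client didn't negotiate gzip.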
if (!$can_gzip) log_error(print_r($_SERVER, true));

This lets me inspect the request headers in the error log. Sorry Pete, that's why it's crowded with these :P

* It appears that:

- Many are due to robots, such as 'UptimeRobot', not declaring gzip capability, which isn't really a problem because bots don't need CSS files. So, first question: should I add an exception in the JS/CSS loading for we::$browser['probably_robot']? If so, should we add more bots to the spider log? Or, more realistically, either add the ones found in the current error log, or add a generic stripos(we::$ua, 'bot') test and enforce gzipping *within* CSS and JS only for these? (See the rough sketch after this list.)
Also, bots usually don't provide a 'fake' browser, so they end up with no browser name internally, which means they all share the same browser-less, uncompressed file. Which isn't a big deal, I guess...

- I'd read posts about Accept-Encoding being mangled by proxies and antiviruses (e.g. http://calendar.perfplanet.com/2010/pushing-beyond-gzipping/, which also suggests some workarounds), but that doesn't seem to be the case here; I'm not finding anything special. Perhaps this practice is no longer a reality, or perhaps they just strip the header entirely... There are workarounds for that, but they rely on JavaScript testing, and by that point the first uncompressed file has already been generated, so we'd have to: generate the CSS file, test whether gzip is available, do nothing if it isn't, and if it is, delete the generated CSS file and use (or generate) the gzipped version... Seems like a lot of work for not much gain.

- I was hoping to use this to help reduce the number of rogue files, considering that adding the OS version could multiply the number of generated files considerably. However, in just one hour online here, not many files were created, so it probably isn't a big deal.
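Something like this is what I have in mind for the generic bot check (just a rough sketch, reusing $can_gzip and we::$ua from above, and assuming it would only run in the CSS/JS generation path, so regular page output is left alone):

Code:
// Rough sketch, not actual Wedge code: if the client didn't declare gzip support
// but its user agent looks like a bot, force the gzipped CSS/JS variant anyway.
if (!$can_gzip && stripos(we::$ua, 'bot') !== false)
	// The CSS/JS packer would then always write and serve the .gz version for these.
	$can_gzip = true;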

I'm curious to know if there's anything of interest here.

* Also, there's something I'd like us to deal with... and it's possibly more important.

CSS files are being generated even when an Atom feed is being requested. I don't think that's intended...! I looked into the code, and it seems that most of the bypasses are done through isset($_REQUEST['xml']), which is a bit limited. First of all, there's always the handy Ajax flag that we should test against when loading things from jQuery. Then there are the feeds: they generate XML output, so why don't they go through the exceptions...?
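
To make it concrete, the kind of broader bypass I'm thinking of would look roughly like this (the Ajax detection via X-Requested-With and the 'feed' action name are just guesses for illustration, not what the code actually uses):

Code:
// Rough sketch, not actual Wedge code: skip CSS generation not only for ?xml
// requests, but also for Ajax calls and feed requests.
$is_ajax = isset($_SERVER['HTTP_X_REQUESTED_WITH'])
	&& strtolower($_SERVER['HTTP_X_REQUESTED_WITH']) === 'xmlhttprequest';
$is_feed = isset($_REQUEST['action']) && $_REQUEST['action'] === 'feed';

if (isset($_REQUEST['xml']) || $is_ajax || $is_feed)
	return; // Don't generate or serve theme CSS for any of these.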

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Header headache
« Reply #1, on January 25th, 2013, 03:05 PM »
Quote
CSS files are being generated even when an Atom feed is being requested
That's... interesting. And unexpected.
Quote
add a generic stripos(we::$ua, 'bot') and enforce gzipping *within* CSS and JS only, for these?
Well, we could do that - we do still need to give them some CSS, because some of them do cache whole sites (especially the likes of the Internet Archive, which doesn't have 'bot' in its user agent).

Hmm, I don't know. I'm still a bit spaced out from the visit to the dentist...
When we unite against a common enemy that attacks our ethos, it nurtures group solidarity. Trolls are sensational, yes, but we keep everyone honest. | Game Memorial