Plugin servers / getting plugins to a system

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Plugin servers / getting plugins to a system
« Reply #31, on October 19th, 2011, 11:43 AM »
The whole point of doing it is to avoid ridiculous and complex function calls that have multiple parameters that only apply in certain cases. If I'm keeping it to two lines, I may as well not bother with making it a class.

Consider which is easier to follow... This is a hypothetical example using arbitrary GET headers.

Code:
loadSource('Subs-Package');
$code = fetch_web_data($url, '', false, 0, array('Range' => 'bytes=0-1536'));

Code:
loadSource('Class-WebGet');
$wget = new weweb($url);
$wget->setHeader('Range', 'bytes=0-1536');
$code = $wget->get();

The extra values in fetch_web_data are, in order: POST data, whether to use keep-alive, and the redirection level. I'd assume the same defaults for weweb, though. This is a more extreme example; a typical call would normally only have the object creation and the get(). (If you put everything into the constructor, there's really no benefit to it whatsoever, because you're just making it a bastardised function call.)
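A minimal sketch of what the object side could look like, assuming nothing beyond what's described above: the constructor only takes the URL, optional settings go through chainable setters, and the defaults mirror fetch_web_data's (no POST data, no keep-alive, redirection level 0). The method names setPostData() and getOptions() are hypothetical, and the actual transfer is left out.

```php
<?php
// Hypothetical builder-style wrapper; only collects settings, does not fetch.
class weweb
{
	private $url;
	private $postData = '';        // default: no POST data
	private $keepAlive = false;    // default: no keep-alive
	private $redirectionLevel = 0; // default: redirection level 0
	private $headers = array();

	public function __construct($url)
	{
		$this->url = $url;
	}

	// Setters return $this so calls can be chained.
	public function setHeader($name, $value)
	{
		$this->headers[$name] = $value;
		return $this;
	}

	public function setPostData($data)
	{
		$this->postData = $data;
		return $this;
	}

	// Everything a transport (cURL or fsockopen) would need to do the request.
	public function getOptions()
	{
		return array(
			'url' => $this->url,
			'post_data' => $this->postData,
			'keep_alive' => $this->keepAlive,
			'redirection_level' => $this->redirectionLevel,
			'headers' => $this->headers,
		);
	}
}

$wget = new weweb('http://example.com/image.jpg');
$wget->setHeader('Range', 'bytes=0-1536');
$options = $wget->getOptions();
```

Only the options that differ from the defaults ever appear at the call site, which is the readability argument being made against positional parameters.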
Posted: October 19th, 2011, 11:35 AM

Hmm, after a quick skim of the docs, I wouldn't be adding range support like that anyway, because cURL has proper support for ranges and expects a proper curl_setopt call for it. Still, I'd just convert that to a setRange() call, so that it works whether cURL or fsockopen is doing the fetching.
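A rough sketch of why a single setRange() can serve both paths: cURL takes the range through curl_setopt() with CURLOPT_RANGE as a plain "X-Y" string, while the fsockopen path has to write a Range header that includes the "bytes=" unit. The helper names below are hypothetical; HTTP byte ranges are inclusive, so 0-1535 means the first 1536 bytes.

```php
<?php
// Hypothetical helpers: one inclusive byte range rendered for each transport.

// fsockopen path: value for a raw "Range:" request header.
function range_header_value($first, $last)
{
	return 'bytes=' . $first . '-' . $last;
}

// cURL path: value for curl_setopt($ch, CURLOPT_RANGE, ...), which omits the unit.
function curl_range_value($first, $last)
{
	return $first . '-' . $last;
}

// Number of bytes an inclusive range covers.
function range_length($first, $last)
{
	return $last - $first + 1;
}

$header = range_header_value(0, 1535);
$curl = curl_range_value(0, 1535);
```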

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Plugin servers / getting plugins to a system
« Reply #32, on October 19th, 2011, 11:45 AM »
I see what you mean... Well, considering the function isn't used an awful lot, both versions are fine by me, so use whatever you want ;)
(As for the object name, 'weget' would make a bit more sense. Class-WebGet for the filename is fine.)

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Plugin servers / getting plugins to a system
« Reply #34, on October 19th, 2011, 12:32 PM »
In other news, I can only find one instance where fetch_web_data actually uses the keep-alive facility, and it's one that I think has dubious use in the future: it's used by the existing package manager when downloading a package, to grab the file and validate it. But I can't actually see where it's *using* keep-alive. It's not like it's going to attempt to reuse that connection between requests.

Hmm, makes me wonder whether it's needed or not - I guess it probably should be supported, in case a plugin wants it.
Re: Plugin servers / getting plugins to a system
« Reply #35, on October 19th, 2011, 12:55 PM »
After getting more and more involved in it, I think it's really not worth the effort. We're not doing this in the browser, where it would really make a difference. In our case, even if several requests are sent to package servers, they'll still be several seconds apart, so keep-alive is bordering on the pedantic in terms of savings. The trade-off is that we save a TCP connection at the cost of tying up the destination host's resources for longer, and for the number of requests we'll be making, it would be better not to do so at all.


(If a plugin author ever did want to, assuming they cared enough to realise there's a very very slight performance gain to be made for multiple requests to the same server, they'd be using cURL flat out to do it, and using HTTP/1.1 with some of the other gizmos there. But I'd wonder what the hell they were doing to necessitate it anyway...)
Re: Plugin servers / getting plugins to a system
« Reply #36, on October 19th, 2011, 03:12 PM »
Hmm, further to the above, I do actually need to use HTTP/1.1 for Range support. That's fine, because it's only an issue in the fsockopen case; cURL will just deal with it transparently. Interestingly, I think that's actually a bug in fetch_web_data: it sends an HTTP/1.0 request with Host as a supplied header, and Host didn't exist in HTTP/1.0.

Still, it doesn't change my modus operandi; I'm still rewriting to exclude keep-alive because we're not keeping connections alive between requests in almost every case.
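For the fsockopen path, a minimal sketch of the request block this implies: an HTTP/1.1 request line, the Host header that HTTP/1.1 requires, any extra headers such as Range, and "Connection: close", since the connection won't be reused anyway. The function name is hypothetical and the socket handling is left out.

```php
<?php
// Hypothetical sketch: build the raw request an fsockopen() path could write.
function build_get_request($host, $path, array $extraHeaders = array())
{
	$lines = array(
		'GET ' . $path . ' HTTP/1.1',
		'Host: ' . $host, // mandatory in HTTP/1.1
	);
	foreach ($extraHeaders as $name => $value)
		$lines[] = $name . ': ' . $value;

	// No keep-alive: ask the server to close once the response is sent.
	$lines[] = 'Connection: close';

	// Header lines end in CRLF, and an empty line ends the header block.
	return implode("\r\n", $lines) . "\r\n\r\n";
}

$request = build_get_request('example.com', '/image.jpg', array('Range' => 'bytes=0-16383'));
// Sending it would be fwrite($fp, $request) on a socket from fsockopen('example.com', 80).
```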

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Plugin servers / getting plugins to a system
« Reply #37, on October 19th, 2011, 03:42 PM »
I'll trust you on this :P

And yeah, keep-alive is pointless if we're only downloading one file... AFAIK, the only point is to allow for more simultaneous downloads off a single server by reusing earlier connections.

Oh, and can you look into reusing weget for AeMe as well...?

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Plugin servers / getting plugins to a system
« Reply #38, on October 19th, 2011, 03:49 PM »
Yeah, it's pretty headache inducing, but it's been interesting to read up on the finer points of HTTP.
Quote
AFAIK, the only point is to allow for more simultaneous downloads off a single server by reusing earlier connections
Yup. Specifically, it saves you setting up a new TCP connection, which is worth saving if you're doing a lot of work, but for individual file requests it's not worth it. Even if you do two or three in sequence (e.g. browse the list of plugins, then browse a category within the list), it's still not going to benefit you much, and in almost every case it would be better to let the host have the TCP connection back instead.
Quote
Oh, and can you look into reusing weget for AeMe as well...?
Sure thing. Once it's tested, anyway ;)
Re: Plugin servers / getting plugins to a system
« Reply #39, on October 20th, 2011, 02:19 AM »
After a day of swearing and wrangling I've got something I'm reasonably happy with, but I'm not quite as happy as I could be.

GIF and PNG work well enough, since they only need the first dozen bytes or so, but JPEG is a pain because the header is not necessarily where you expect it to be. Plans to just grab the first KB went out the window as soon as I started investigating JPEGs: I bumped it to 8KB pretty quickly, and I'm currently working on 16KB.
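A minimal sketch of why GIF and PNG are easy: both keep their dimensions at fixed offsets near the start. A GIF has a 6-byte signature followed by 16-bit little-endian width and height; a PNG has an 8-byte signature followed by the IHDR chunk, with 32-bit big-endian width and height at offsets 16 and 20. The function name is hypothetical, and this is not the committed code.

```php
<?php
// Hypothetical sketch: read GIF/PNG dimensions from a short prefix.
// Returns array(width, height), or null if not GIF/PNG or too short.
function small_header_dimensions($data)
{
	$sig = substr($data, 0, 6);

	// GIF: "GIF87a"/"GIF89a", then width and height as 16-bit little-endian.
	if (strlen($data) >= 10 && ($sig === 'GIF87a' || $sig === 'GIF89a'))
	{
		$d = unpack('vwidth/vheight', substr($data, 6, 4));
		return array($d['width'], $d['height']);
	}

	// PNG: signature, IHDR length (4) and type (4), then width and height
	// as 32-bit big-endian at offsets 16 and 20.
	if (strlen($data) >= 24 && substr($data, 0, 8) === "\x89PNG\r\n\x1a\n" && substr($data, 12, 4) === 'IHDR')
	{
		$d = unpack('Nwidth/Nheight', substr($data, 16, 8));
		return array($d['width'], $d['height']);
	}

	return null;
}
```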

Still, downloading 16KB to determine the image dimensions isn't bad. Of the 15364 files with a .jpg or .jpeg extension tested on my PC, 15039 had their dimensions detected as the same as what getimagesize returns. Of the 325 remaining, a number are damaged files anyway (recoveries from old HDs and so on), and the rest, pretty much universally, have the relevant marker block after the 16KB boundary. Larger files (even 512x512) sometimes have that marker 20KB into the file.[1]
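The reason JPEG is harder: the file is a sequence of marker segments, and the Start Of Frame (SOFn) segment that holds the dimensions can come after any number of APPn, comment or table segments (embedded thumbnails and metadata live there). A minimal sketch of walking those segments through a partial buffer follows; the function name is hypothetical, and returning null when the buffer runs out is the case where a bigger range might help.

```php
<?php
// Hypothetical sketch: find JPEG dimensions by walking marker segments.
// Returns array(width, height), or null if no SOFn appears within the buffer.
function jpeg_dimensions($data)
{
	$len = strlen($data);
	if ($len < 4 || substr($data, 0, 2) !== "\xFF\xD8")
		return null; // no Start Of Image marker

	$pos = 2;
	while ($pos + 4 <= $len)
	{
		if ($data[$pos] !== "\xFF")
			return null; // out of sync: expected a marker here
		$marker = ord($data[$pos + 1]);

		// Extra 0xFF bytes are fill before the real marker byte.
		if ($marker === 0xFF)
		{
			$pos++;
			continue;
		}
		// Standalone markers (TEM, RSTn) have no length field.
		if ($marker === 0x01 || ($marker >= 0xD0 && $marker <= 0xD7))
		{
			$pos += 2;
			continue;
		}
		// Start Of Scan: compressed data follows and no SOF was seen.
		if ($marker === 0xDA)
			return null;

		$segLen = unpack('n', substr($data, $pos + 2, 2));
		$segLen = $segLen[1];

		// SOF0..SOF15, except DHT (C4), JPG (C8) and DAC (CC).
		if ($marker >= 0xC0 && $marker <= 0xCF && $marker !== 0xC4 && $marker !== 0xC8 && $marker !== 0xCC)
		{
			// After the length: precision (1 byte), height (2), width (2).
			if ($pos + 9 > $len)
				return null;
			$d = unpack('nheight/nwidth', substr($data, $pos + 5, 4));
			return array($d['width'], $d['height']);
		}

		// Skip the segment: 2 marker bytes plus its length (which counts itself).
		$pos += 2 + $segLen;
	}
	return null; // buffer ended first
}
```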

So at this point I'm playing the game of diminishing returns, I can either up the boundary and accept the inevitable loss of performance or I can cut my losses.

Hmm, let's run some stats and get a proper handle on the state of play.
* 16KB: matches 15039 out of 15364 (97.9%)
* 32KB: matches 15246 out of 15364 (99.2%)
* 64KB: matches 15284 out of 15364 (99.5%)

Of the 80 that couldn't be matched at 64K, only 3 weren't corrupted, and those are special ones using CMYK colour separation, which some browsers don't render properly anyway.

So we're really looking at 15287 files, not 15364, which puts the percentages at:
* 16KB: 98.4%
* 32KB: 99.7%
* 64KB: 99.98%
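The percentages can be recomputed from the counts above. The only derived number is the 77 corrupted files (80 unmatched at 64KB, of which 3 were not corrupted), which gives the 15287 denominator.

```php
<?php
// Match rates from the counts quoted above.
$total = 15364;
$corrupted = 77; // 80 unmatched at 64KB, minus the 3 valid CMYK files
$matches = array('16KB' => 15039, '32KB' => 15246, '64KB' => 15284);

$rates = array();
foreach ($matches as $buffer => $matched)
	$rates[$buffer] = array(
		100 * $matched / $total,                // share of all files
		100 * $matched / ($total - $corrupted), // share of non-corrupted files
	);

foreach ($rates as $buffer => $r)
	printf("%s: %.2f%% of all files, %.2f%% excluding corrupted files\n", $buffer, $r[0], $r[1]);
```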

I think I'm going to call it a day here and commit things with the 16K version; we can always bump it up to 32K (or provide instructions) if need be.
 1. Just consider this for a moment. We're talking JPG. It's not layered, not transparent, doesn't have multiple 'images' inside, and yet we have 20KB of stuff before we get to set out the *size* of the image. Funnily enough I can tell you exactly what tools cause that, and one company in particular... :whistle: I'll leave you to guess which company.

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Plugin servers / getting plugins to a system
« Reply #40, on October 20th, 2011, 08:02 AM »
I read through your code and you deleted the original SMF code (getimagesize and stuff); maybe you should restore it and use it as a fallback for when $size isn't set at the end of the function...?
Okay, that means two hits to the remote server, but blah. And an overhead of 16KB compared to the older SMF function, but that's only for <1% of all JPG files, so that's a very, VERY fair trade...

(Or we could retry with a 128KB buffer. But then it won't take other files into account, e.g. slightly broken PNG and GIF. Not that getimagesize can magically get their sizes either.)

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Plugin servers / getting plugins to a system
« Reply #41, on October 20th, 2011, 09:16 AM »
Quote
maybe you should restore it and use it as a fallback for when $size isn't set at the end of the function...?
I'd rather not, to be honest. The likelihood is that if it's failed getting it from 16K (or 32K), it's probably a big file, which means the original code is going to have to retrieve the entire file which on any host with low settings is going to cause a white screen.

I'd rather go to a 32K buffer than have to hit the server a second time (though the old code had a habit of making two requests anyway, so even if we did a 16K hit generally and a further 64K hit if it's a JPEG and we didn't find the size in the first 16K, that would probably not be a killer).
Quote
(Or we could re-try with a 128KB buffer. But then it won't take other files into account -- e.g. slightly broken PNG and GIF. Not that imagesize can magically get their sizes either.)
If it's broken, it's broken, no matter how big the buffer is. There's only any point doing a retry if you can theoretically get some mileage out of it; I wouldn't have it retry on images that show up as PNG or GIF, or things that don't flag up as JPG.
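One of the options floated above, sketched as a policy function under stated assumptions: every image gets a 16KB ranged request, and only a JPEG (recognised by its FF D8 Start Of Image bytes) whose dimensions weren't found gets a single 64KB retry. PNG, GIF and anything else that failed is treated as broken. The function name and the exact sizes are hypothetical.

```php
<?php
// Hypothetical retry policy. Returns the next range size to request in bytes,
// or 0 to stop.
function next_range_bytes($data, $sizeFound, $currentBytes)
{
	if ($sizeFound)
		return 0; // got the dimensions

	// Only JPEGs can have the size marker legitimately beyond the buffer;
	// a PNG or GIF that failed is broken, not truncated.
	if (substr($data, 0, 2) !== "\xFF\xD8")
		return 0;

	// One retry at 64KB, never a full download.
	return $currentBytes < 65536 ? 65536 : 0;
}
```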
Posted: October 20th, 2011, 08:50 AM

That said, the current code can be refined. Right now it assumes the range of up to 16K will be honoured, but support for ranges is not required (though strongly recommended) in HTTP/1.1 servers, and if the server ends up throwing a large JPG at the user, the code is going to step through the whole file looking at the segment boundaries. Consequently, if the response is bigger than the specified range, we can legitimately assume it's the whole file rather than a ranged subset, and apply getimagesize to it.
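A small sketch of that check. It adds one assumption beyond the post: when the status code is available, it is the cleaner signal, because a server honouring Range answers "206 Partial Content" while one ignoring it answers "200 OK" with the full body. The post's size comparison is kept as the fallback. The function name is hypothetical.

```php
<?php
// Hypothetical sketch: did a ranged request actually return the whole file?
// $statusCode may be null when the status line isn't available.
function response_is_whole_file($statusCode, $bodyLength, $requestedBytes)
{
	if ($statusCode === 206)
		return false; // range honoured: partial content
	if ($statusCode === 200)
		return true;  // range ignored: full body sent

	// Status unknown: more data than we asked for means it's the whole file.
	return $bodyLength > $requestedBytes;
}
```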

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Plugin servers / getting plugins to a system
« Reply #44, on October 20th, 2011, 10:57 PM »
Hmm, I just checked: it needs a filename... So that would require saving the data to the local hard drive and then calling getimagesize on it. Seems a bit unrealistic to me... :-/
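For reference, a sketch of both routes. getimagesizefromstring() takes the image data directly, but it only arrived in PHP 5.4 (released after this thread). The fallback does exactly what's described above: it writes the buffer to a temporary file for getimagesize() and then removes it. The wrapper name is hypothetical.

```php
<?php
// Hypothetical wrapper: image size from data held in memory.
// Returns getimagesize()'s array, or false on failure.
function image_size_from_buffer($data)
{
	// PHP 5.4+: no file needed.
	if (function_exists('getimagesizefromstring'))
		return getimagesizefromstring($data);

	// Older PHP: round-trip through a temporary file.
	$tmp = tempnam(sys_get_temp_dir(), 'img');
	if ($tmp === false)
		return false;
	file_put_contents($tmp, $data);
	$size = getimagesize($tmp);
	unlink($tmp);
	return $size;
}
```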