Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
So, busy out here in RL
« on May 2nd, 2012, 12:50 AM »
I thought I'd share some of my frustrations with you all.

I won't get into the detail of it but the last few days I've been snowed under - as I mentioned briefly - in building an app using Node.js, Express, Socket.io and MongoDB. I'm not going to explain what the app is, hopefully you'll see soon enough, but it's certainly been an interesting experience for me so far, and I feel like sharing what I've found.

First up, for those of you who aren't familiar with Node.js, let me explain. We here are far more used to deploying apps where we have a web server like Apache, which receives requests, passes some of them on to PHP to process the contents of the page, and holds the request until the page is ready and sends it back to the user.

It's convenient, practical and we're all used to it, but the app I've been working on lately requires long-life connections, what we would normally consider to be Comet style. I could have done it with AJAX polling, but Comet is far less vicious on the server, especially given the real-time nature of the app in question. In Apache/PHP, though, this just is not practical.

Enter, then, Node.js. At first glance it sounds fantastic - you're working in JavaScript on the server side (so it's easy to learn, in theory) but there's no webserver. In other words, you write some JS that also functions *AS* the webserver.

From the Node.js docs, this would be pretty much the simplest possible webserver in Node:
Code:
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');

Yup, you see that right. You're handling raw HTTP requests and headers. For someone like me who's been behind the cosy framework of PHP for years, this is... something of a change of pace. It's not a bad one at all, it's just a lot to take on board.

Especially with the heavy-going view it has towards I/O. In PHP, the script runs from top to bottom, and it waits for I/O to complete - so execution of a given page waits while a DB query completes, for example. This doesn't happen in Node, or at least it shouldn't.

The idea, really, is that you push the request off to another function and handle its response in a callback, so that mainline execution can get back to whatever it was doing. This, for me, is a *major* change of pace because it's just not a mentality I'm used to working in.
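To make that concrete, here is a minimal sketch of the callback style. The fakeQuery function is made up for illustration (a timer standing in for a real database driver, not any real library's API), and the events array only exists to show the order things happen in.

```javascript
// Hypothetical stand-in for a database driver: it "answers" on a later
// turn of the event loop, the way a non-blocking query would.
function fakeQuery(sql, callback) {
  setTimeout(function () {
    callback(null, [{ id: 1, name: 'example' }]);
  }, 10);
}

var events = [];

function handleRequest(done) {
  events.push('query sent');
  fakeQuery('SELECT * FROM users', function (err, rows) {
    // Runs later, once the "database" has answered.
    if (err) { return done(err); }
    events.push('got ' + rows.length + ' row(s)');
    done(null, rows);
  });
  // Reached immediately: nothing above waited for the query.
  events.push('mainline continues');
}

handleRequest(function (err) {
  if (err) { throw err; }
  // prints: query sent -> mainline continues -> got 1 row(s)
  console.log(events.join(' -> '));
});
```

In PHP the equivalent lines would run strictly top to bottom, with the script parked on the query until the rows came back.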

But, so far so good, actually. I was able to fashion something workable with that, but it wasn't really what I wanted, I wanted something that did some of the work for me. Yes, that's right, I started looking at frameworks and ended up with Express.

Express makes a lot of the work easier, but it has two problems. The first is that its documentation is absolutely crap. No, really, it's ABSOLUTELY SHIT. http://expressjs.com/guide.html is the entire official guide.

There are several sub-problems. Firstly, it makes no mention of Express 3.0 at all and only barely mentions 2.x in passing for migration. It would be OK reading the source, if Express were all the source, except it isn't. It's a framework on top of another framework, called Connect, and it's very hard to figure out WTF is going on when Express's guide doesn't seem to point to anything about Connect, or to any other documentation.

The second problem is probably a bigger one in the scheme of things. A lot of people did figure out how to use Express and muddle through. And some people have written blogs on it, others have asked and answered things on StackOverflow. The result was that I've had dozens and dozens of tabs open in my browser for the last couple of days as I start to piece things together.

I mean, I have a functioning webserver, it receives requests, if they're dynamic pages, they're routed properly. If they're static resources, again routed properly. 404 and 500 have their own proper pages with correct headers and everything. But it took me over a day to figure all this out to make it work how I wanted because the documentation is so poor. And I mentioned that I had dozens and dozens of tabs open... I still have about 18 tabs open for the next stage of the project, and that's the problem in itself: there's so many bits and pieces on it but they're all haphazard and many of them are legacy items.
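For a rough idea of what that routing amounts to, here is a sketch using only Node's core http module. It is not Express's API and not the actual app: the pages table, the route function and the page bodies are all invented for illustration, and static files are left out entirely.

```javascript
var http = require('http');

// Hypothetical dynamic pages: path -> function returning the page body.
var pages = {
  '/': function () { return 'Home page\n'; },
  '/about': function () { return 'About page\n'; },
  '/broken': function () { throw new Error('simulated failure'); }
};

// Decide what to send for a request path. Kept separate from the
// server so the routing can be exercised without opening a socket.
function route(path) {
  var headers = { 'Content-Type': 'text/plain' };
  if (!Object.prototype.hasOwnProperty.call(pages, path)) {
    return { status: 404, headers: headers, body: 'Not found\n' };
  }
  try {
    return { status: 200, headers: headers, body: pages[path]() };
  } catch (err) {
    // A page handler blew up: answer with a 500 rather than crashing.
    return { status: 500, headers: headers, body: 'Something went wrong\n' };
  }
}

var server = http.createServer(function (req, res) {
  var path = req.url.split('?')[0];   // ignore any query string
  var result = route(path);
  res.writeHead(result.status, result.headers);
  res.end(result.body);
});

// To try it: server.listen(1337, '127.0.0.1');
```

A framework like Express does this sort of dispatch (and a great deal more) for you, which is the appeal.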

For example, I found one tutorial from late 2010 which refers to using Express methods for handling static content, and it refers to staticGzip and staticProvider; the former indicating that content should be gzipped and the latter to indicate that a given path is where static content should be served from.

But the manual doesn't mention these, it only talks about a method simply called static, which has a slightly different calling argument structure to staticProvider but does mostly the same job. After 20 minutes of digging around I found that staticProvider was deprecated in favour of simply static[1] and that staticGzip is no longer directly available because most people deploying such apps would serve static content like that from a CDN and not from your own server. A reasonable if slightly misguided idea, I think. As it happens in this case I'm looking at deploying with sufficiently long-life expiries that it doesn't really matter so much.

But anyway, I've spent so long trying to piece everything together that it's just left me feeling so frustrated.

I haven't actually told you the worst part yet, actually. Node has a nifty-in-principle tool called npm for installing packages. There's a central repository that lists packages and most of the time you can do npm install <package> and boom, it'll download. Unless it's Express, in which case you have to have make installed to be able to install it. This took me a while to figure out due to no-one describing it anywhere and npm giving me less than helpful messages.

It gets a bit better, actually. You can declare a package.json file which indicates a package's dependencies. This can be the Node versions supported, or it can be the version(s) of packages you need. My project needs Express, Socket.io and MongoDB's connector, seems straightforward enough. Until you hit the joys of figuring out which versions you need, of course.

Oh, and there's more fun. Each of those has other dependencies which also need to be met. And it's not clear whether some of those come installed or as optional extras. Express, for example, states a 'devDependency' of Jade, though it doesn't actually install Jade unless you ask it to. Again, not documented anywhere. Then more version juggling.
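As a sketch of what such a manifest can look like, the snippet below builds one as a plain object and prints it as JSON. The app name and every version range are placeholders chosen for illustration, not the versions actually used or recommended; only the field names (engines, dependencies) are standard package.json fields.

```javascript
// Everything here is illustrative: the app name and the version ranges
// are placeholders, not recommendations.
var manifest = {
  name: 'my-realtime-app',
  version: '0.0.1',
  engines: { node: '>=0.6.0' },      // Node versions supported
  dependencies: {
    'express': '2.5.x',
    'socket.io': '0.9.x',
    'mongodb': '1.0.x',
    // Listed explicitly, because a package's devDependencies are not
    // installed when you merely depend on that package.
    'jade': '0.x'
  }
};

// Save this output as package.json, then run `npm install` next to it.
console.log(JSON.stringify(manifest, null, 2));
```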

Jade is actually pretty neat, though. Had we not had the template skeleton I might have suggested a move to it, because it allows for replacing blocks of a document, for prepending/appending content to blocks and so on. Plus the fact that Jade's syntax is incredibly tight and descriptive and not the usual verbosity of HTML itself. http://jade-lang.com/ if you're curious.


Anyway, that's enough rambling and venting from me. For my next challenge, figure out how sessions work (and figure out how to make them compliant with the EU laws, ahahahahah) and then bolt on Websockets type connections to it and maybe see if we can't do something truly wonderful from there on in!

The bottom line is that Node and its tools make for a very interesting and flexible platform for deploying unconventional applications, but the lack of good documentation for major components (Node itself is well documented, its good bolt-ons not so much) really makes it a struggle to implement anything.
 1. Including a snide mention from Express's author that JSLint is 'lame' because it can't differentiate between 'static' being used as a method name and 'static' being used to indicate something being static.

Nao

  • Dadman with a boy
  • Posts: 16,079
Re: So, busy out here in RL
« Reply #1, on May 2nd, 2012, 08:42 AM »
I've read this entirely, and all I got was this lousy T-Shirt.™

Node.js sounds fun to play with, still... Maybe that's because I'm so much into JS these days. (I don't know, I just like playing with it... Even though I'm doing things the other way around, like getting rid of prototypes and so on... Even though -- I think that a prototype as such should be a thing of plugins, not stock code -- stock code should have this.function instead of object.prototype.function... Anyway.)

I still don't understand the point of Node.js, though. (Or perhaps, even more fittingly, the point of being forced to use it to compile LESS scripts when WeCSS proves that it's perfectly easy to do the same in PHP...)

Arantor

Re: So, busy out here in RL
« Reply #2, on May 2nd, 2012, 01:48 PM »
I'm not really using it for LESS particularly.

OK, the reason I'm using it instead of PHP is because what I'm trying to do cannot be done in Apache/PHP or nginx/PHP without capping the number of users to a fraction of what I can do with Node.js.

Here's the thing: the nature of what I'm building relies on holding connections open; there's no point polling the server every few seconds unless there's something to receive. So I have two choices: I can either poll the server every few seconds (which is a load generator in itself) or I can have long-life connections (think Websockets or Comet long-polling).

The problem with long-life connections done with PHP is that each connection ties up a PHP interpreter instance. That means I can scale to maybe 100 connections (which doesn't even mean 100 users) before I run out of memory on a 1GB RAM VPS. I did experiment with using nginx with the HTTP Push module, but the authentication method doesn't really work there because you end up holding connections without any real ability to authenticate. In the nginx/PHP scenario you're using PHP-FPM, and there's no way I could find to receive the connection, authenticate, then disconnect from PHP-FPM until there's something to actually send along that authenticated channel.

With Node.js, however, I can leave many, many connections open to idle cheaply and only do work on them when there is work to do, and that includes authentication when the connection comes in.
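A minimal long-polling sketch of that idea, using only Node's core http module. The park and broadcast names are invented for this example and this is not the app's actual code; the point is only that a parked connection is an entry in an array, not a running interpreter.

```javascript
var http = require('http');

var waiting = [];   // responses parked until there is something to send

// Hold a connection open. An idle entry costs a little memory, no work.
function park(res) {
  waiting.push(res);
  res.on('close', function () {
    // The connection closed; drop it if it is still parked.
    var i = waiting.indexOf(res);
    if (i !== -1) { waiting.splice(i, 1); }
  });
}

// Something happened: answer everyone who is waiting, then start over.
function broadcast(message) {
  var parked = waiting;
  waiting = [];
  var body = JSON.stringify(message);
  parked.forEach(function (res) {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(body);
  });
  return parked.length;   // how many clients were answered
}

var server = http.createServer(function (req, res) {
  // Authentication would go here, before the connection is parked.
  park(res);
});

// To try it: server.listen(1337, '127.0.0.1'); then call broadcast({...})
```

Each client re-requests after it gets an answer, so between events the server is doing nothing at all for it.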

Nao

Re: So, busy out here in RL
« Reply #3, on May 5th, 2012, 12:22 PM »
Next time you have a drink, you'll be considering porting Wedge to Node.js... :P

When you're talking about an open connection -- it reminds me of the difficulties I met with KMJ that contributed a lot to my loss of interest in win32 programming and brought me to PHP and stuff...
To think that, if we'd had websockets and things like that back in '06...

Arantor

Re: So, busy out here in RL
« Reply #4, on May 5th, 2012, 01:47 PM »
I already considered how one might run PHP via Node.js... ;)

Dragooon

  • I can code! Really!
  • polygon.com has to be one of the best sites I've seen recently.
  • Posts: 1,841
Re: So, busy out here in RL
« Reply #5, on May 5th, 2012, 07:28 PM »
But if you're polling the client side itself, how does that help? Are you querying the server-side database through Node.js? If so, how? Because I am not getting how you're ultimately taking the server out of the equation.

Arantor

Re: So, busy out here in RL
« Reply #6, on May 5th, 2012, 07:31 PM »
In the conventional server setup you have the client polling Apache, which has to contact PHP. It cannot scale to that many connections when you leave them open (because a PHP instance has to remain open and idle per connection in Apache).

In this case, Node.js is the webserver itself, and it's more than capable of leaving thousands of connections open and idle with little significant overhead and push things back to the client when necessary (so that instead of making regular polls, we use websockets to open full duplex connections - the result is MUCH less work all round)

Dragooon

Re: So, busy out here in RL
« Reply #7, on May 5th, 2012, 07:37 PM »
Ah... crap. Node.js is a full webserver which is written using JavaScript... I thought it was a JS framework acting as a webserver (which made all logic fail in my head). Are there any significant disadvantages to Node.js when compared to Apache apart from the spectacular documentation?

Arantor

Re: So, busy out here in RL
« Reply #8, on May 5th, 2012, 07:54 PM »
The fact it's the webserver, really. Installing Apache is piss-easy; building anything even modestly complex in Node.js tends to involve lots of things on top. For example, I'm using Express to handle routing and certain types of errors, but even that requires some work to do anything intricate that isn't 'out of the box'.

The other thing is that you have to be careful about how you do things. Node itself runs your code in a single thread, and the idea is that for any I/O operations you're supposed to hand them off, letting Node perform the I/O in the background (which includes DB queries, for example) and handing you the result in a callback. Instead of having to wait around for queries to complete, you don't block the process, which leaves it free to handle other requests.
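A small sketch of why blocking matters on a single thread. The busy loop is a deliberately bad stand-in for a synchronous DB call or other blocking work; nothing here is specific to any library.

```javascript
var timerFired = false;
var start = Date.now();

// Ask for a callback "as soon as possible".
setTimeout(function () {
  timerFired = true;
  console.log('timer ran after ' + (Date.now() - start) + 'ms');
}, 0);

// Simulate blocking work: spin for roughly 50ms without yielding.
while (Date.now() - start < 50) { /* spin */ }

// Still false: the callback cannot run until this code returns.
console.log('timer fired during the loop? ' + timerFired);
```

While that loop spins, every other request to the same process is stuck behind it too, which is exactly what handing I/O off is meant to avoid.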

It's an interesting process if you're not used to writing in that manner, because while it is much like the JS you'd write in the client of 'kicking something off and getting a callback when it's done', you really have to do that a lot more and as you can imagine anything with any complexity is going to require multiple nested callbacks, even with some of the abstracted stuff going on to make it simpler.
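For a feel of how quickly the nesting builds up, here is a sketch with three dependent steps. checkSession, loadUser and loadMessages are hypothetical helpers invented for this example (each just calls back on a later tick); a real session store or database driver would have its own API.

```javascript
// Hypothetical async steps; each one calls back on a later tick, as a
// real session store or database driver would.
function checkSession(token, cb) {
  setImmediate(function () {
    if (token !== 'valid-token') { return cb(new Error('not logged in')); }
    cb(null, { userId: 42 });
  });
}
function loadUser(userId, cb) {
  setImmediate(function () { cb(null, { id: userId, name: 'example' }); });
}
function loadMessages(userId, cb) {
  setImmediate(function () { cb(null, ['hello', 'world']); });
}

// Three dependent steps means three levels of nesting, each with its
// own error check before moving on.
function loadDashboard(token, done) {
  checkSession(token, function (err, session) {
    if (err) { return done(err); }
    loadUser(session.userId, function (err, user) {
      if (err) { return done(err); }
      loadMessages(user.id, function (err, messages) {
        if (err) { return done(err); }
        done(null, { user: user.name, messages: messages });
      });
    });
  });
}

loadDashboard('valid-token', function (err, dashboard) {
  if (err) { throw err; }
  // prints: example has 2 messages
  console.log(dashboard.user + ' has ' + dashboard.messages.length + ' messages');
});
```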

The one thing I'd note is that most folks using Node aren't using conventional DBs either, but using things like CouchDB and MongoDB (Mongo is more what I'm looking for, and there is a lot of advantage to having JSON for adding objects and nesting objects inside them)

Also imagine that you have to do a lot more of the session handling sorts of things yourself (though Express does some of that for you)

Node.js is a tool for specific workloads, rather than a general deployment.