Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Improving search
« on August 9th, 2013, 07:40 AM »
So I've been thinking about searching and the way searching works and I've concluded a number of things.

1. I want Sphinx and ElasticSearch in the core
Both Sphinx and ElasticSearch are pretty mature. Both have live update features now, so there's no reason we can't support them both with a sort of push mechanism (rather than Sphinx's pull mentality, which is what the old API was geared for). The API needs rewriting to support either anyway, so I might as well do it all together.
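
To make the push idea a bit more concrete, here's a rough sketch. It's in Python purely for illustration (Wedge itself is PHP), and every name in it (SearchBackend, InMemoryBackend, on_post_saved) is made up for the example rather than taken from Wedge, Sphinx or ES. The in-memory class just stands in for a real ES or Sphinx client.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SearchBackend(ABC):
    """Hypothetical push-style search API: the forum notifies the backend of changes."""

    @abstractmethod
    def index(self, doc_id: int, text: str) -> None:
        """Called when a post is created or edited."""

    @abstractmethod
    def remove(self, doc_id: int) -> None:
        """Called when a post is deleted."""

    @abstractmethod
    def search(self, query: str) -> List[int]:
        """Return the ids of matching documents."""


class InMemoryBackend(SearchBackend):
    """Stand-in for a real ES or Sphinx client; it only keeps text in a dict."""

    def __init__(self) -> None:
        self.docs: Dict[int, str] = {}

    def index(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text.lower()

    def remove(self, doc_id: int) -> None:
        self.docs.pop(doc_id, None)

    def search(self, query: str) -> List[int]:
        needle = query.lower()
        return sorted(i for i, text in self.docs.items() if needle in text)


def on_post_saved(backend: SearchBackend, post_id: int, body: str) -> None:
    # Push: the write path updates the index straight away,
    # instead of a search daemon pulling from the database later.
    backend.index(post_id, body)
```

The point of the shape is that the forum only ever talks to the three methods, so swapping ES for Sphinx (or anything else) would mean writing another subclass and nothing more.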

2. I want to natively support other types of data than posts.
The current setup doesn't support anything other than posts and I want to natively offer support for other stuff - calendar, helpdesk etc. The backend can support these extra things with some work, and pushing these also allows nice support in both ES and Sphinx.
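
As a rough idea of what "other types of data" could look like to the indexer, here's a sketch in Python (illustration only; the class, field and function names are all hypothetical, not anything in Wedge). The assumption is that every indexable item carries a content type alongside its id, so a post and a calendar event with the same numeric id don't collide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchDocument:
    content_type: str  # e.g. "post", "calendar", "helpdesk"
    content_id: int    # id within that content type
    title: str
    body: str

    @property
    def key(self) -> str:
        # Ids can repeat across content types, so the index key includes the type.
        return f"{self.content_type}:{self.content_id}"


def from_post(post_id: int, subject: str, body: str) -> SearchDocument:
    return SearchDocument("post", post_id, subject, body)


def from_calendar_event(event_id: int, title: str, description: str) -> SearchDocument:
    return SearchDocument("calendar", event_id, title, description)
```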

3. It adds some nice feature parity with other systems without adding a ton of headaches for support.
XenForo offers ES with a $60 plugin, though I'm not entirely sure why. IPB has Sphinx in the core. Neither appears to have a huge support overhead because of them. And for the most part once they're done, they're done from our point of view.

4. The most controversial aspect of this is what I want to propose last: ditching unindexed searching.
Right now, the default searching method in Wedge is as it is in SMF: no index. It's slow, and doesn't scale beyond a few tens of thousands of posts at peak. In fact, where we are right now on wedge.org is probably about the limit of what we can do with an unindexed DB before performance starts to go nuts. (40k is really the upper limit.)

Now, partly this is because we've never configured it to be anything else, and most people just wouldn't know to do so because they wouldn't know any better. That's fine, because we know that people don't generally touch the settings unless they're directed to. But using the search index would deliver better search performance from about 1k posts and up (and is largely a push in performance terms for fresh installs, compared with where things are right now).

I see no reason why 'no index' ever needs to be a valid search type. I'd suggest dropping that entirely and using the 'no index' option to mean 'no searching', then leaving the other index types to be actual index types. That would simplify the search code as well (and properly allow for it all to be segregated back to the APIs, some of which has already occurred, but there's plenty more still to do).

This would leave us with three working search types (standard, formerly known as custom; ES; Sphinx), of which 'custom' would be set as default on installation and would be populating the index with posts as they are created (rather than having to deal with a huge index creation at once).
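
For anyone wondering what "populating posts as they are created" buys us, here's a toy comparison in Python (illustration only; this is not how the custom index is actually stored in Wedge or SMF, and all the names are invented). The index is a plain word-to-post-ids map that gets updated per post, while the unindexed version has to walk every post on every search. Matching on whole words is a simplification for the sketch.

```python
import re
from collections import defaultdict
from typing import Dict, Set

WORD = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> Set[str]:
    return set(WORD.findall(text.lower()))


class IncrementalIndex:
    """Toy inverted index, updated one post at a time as posts are created."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # word -> post ids

    def add_post(self, post_id: int, body: str) -> None:
        # Cost is proportional to the size of this one post, not the whole forum.
        for word in tokenize(body):
            self.postings[word].add(post_id)

    def search(self, query: str) -> Set[int]:
        words = tokenize(query)
        if not words:
            return set()
        # A post matches only if it contains every word in the query.
        matches = [self.postings.get(word, set()) for word in words]
        return set.intersection(*matches)


def unindexed_search(posts: Dict[int, str], query: str) -> Set[int]:
    # Every search re-reads every post, so cost grows with the forum.
    words = tokenize(query)
    if not words:
        return set()
    return {pid for pid, body in posts.items() if words <= tokenize(body)}
```

Both functions return the same ids; the difference is that the indexed one only looks at the words in the query, which is why the gap widens as the post count grows.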

ES and Sphinx are both VPS-level options, but there's no reason we can't have people pushing content to these indexes while still using the custom index, plus of course there are always options for rebuilding indexes.


Does any of this make sense? Any questions?
When we unite against a common enemy that attacks our ethos, it nurtures group solidarity. Trolls are sensational, yes, but we keep everyone honest. | Game Memorial

Powerbob

  • Posts: 151
Re: Improving search
« Reply #1, on August 9th, 2013, 12:56 PM »
I actually understood this :whistle: and I think it's a great idea :cool:

Dragooon

  • I can code! Really!
  • polygon.com has to be one of the best sites I've seen recently.
  • Posts: 1,841
The way it's meant to be

Norodo

  • Oh you Baidu, so randumb. (60 sites being indexed at once? Jeez)
  • Posts: 469
Re: Improving search
« Reply #3, on August 9th, 2013, 04:58 PM »
I can't think of any reason not to.