Wedge

Topic started by: Nao on April 20th, 2012, 10:37 AM

Title: Things to do before the Mayan apocalypse: membergroups
Post by: Nao on April 20th, 2012, 10:37 AM
(Again, see the 'general' topic for more details on these Mayan topics.)

- Custom membergroups: this one is harder because it has so many ramifications. I'm going to try and address most of them. So, basically:

1/ Add an id_owner (id_member) field to the membergroups table.

2/ Add an interface for users to create membergroups. Very simply: take the admin interface for that, and remove the second half of the membergroup creation page. There are many options that shouldn't be available for non-admin members, and 99% of them are in the second half, very conveniently.

3/ Add more membergroup types. Things like 'hidden' and such are good, but the first membergroup a user creates should be considered their 'friend list' (what I call 'contact list'.)

4/ Add more membergroup fine-tuning. This is a thing I have in Noisen and am really fond of -- I can determine whether people can see all or part of my friends list. Some of my friends I chose to keep in a 'hidden' state so that they would benefit from being in my friends list, without being visible as one of my friends. Perhaps this could be 'simplified' (for code) into two groups: a regular 'friends' group and a hidden 'friends' group... I'd rather have an extra field for that, though. And it's just an example.

5/ Now for the hard parts...

(a) Write some UI for group selection in privacy settings. Not too complicated when it comes to setting privacy from the posting page, but I also want to be able to set privacy on the fly through a select box, like I'm doing on Noisen.com (although it's not a select box there, it's actually a rewrite of the icon selector). The difficulty is not in writing the thing, but in making it so that it's *easy* to understand and use.

(b) And the single most frightening thing for me... Dealing with the aftermath of being able to create membergroups for anything. Basically, from my experience on Noisen, I know that people aren't going to spend their days creating membergroups, but if they end up doing it, it could easily break Wedge if not handled correctly. Do you remember when Facebook had a limit on how many friends you could have? I think it was 5000. It was a lot though, and I doubt any forum would be penalized by having such a high limit on the number of friends. I'm still going to assume that 10k users are going to add Mike42 to at least one of their membergroups, and Mike42 is going to add 10k users to his own 'friends' group (reciprocal friend additions -- that'd be asynchronous in Wedge, i.e. not mandatory, unlike SMF.)

So, we have a members table where the additional_groups field for Mike42 contains 10k comma-separated group IDs... which probably makes it impossible to even store additional_groups in a TEXT field. (I guess MEDIUMTEXT would do...)
Even then -- it's going to kill performance, whatever we do about it. So we need 3NF for membergroups too. We discussed it at length, yet nothing has come of it. I'm getting wary of it, Pete is getting wary of it... I even recently read an old discussion on sm.org where Pete (yes, you Pete :P) actually discussed how 3NF could kill performance if table joins were used everywhere.
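
As a rough check on the field-size worry above, here is a back-of-the-envelope sketch in Python. It assumes MySQL's documented limits of 65,535 bytes for TEXT and 16,777,215 bytes for MEDIUMTEXT, and that IDs are stored as plain decimal digits. `csv_bytes` is a hypothetical helper for illustration, not anything from SMF or Wedge.

```python
TEXT_MAX = 65_535             # MySQL TEXT limit, in bytes
MEDIUMTEXT_MAX = 16_777_215   # MySQL MEDIUMTEXT limit, in bytes

def csv_bytes(ids):
    """Byte length of the IDs joined with commas (ASCII digits only)."""
    return len(",".join(str(i) for i in ids))

# 10k group IDs with 5 digits each: 10000 * 5 digits + 9999 commas.
five_digit = csv_bytes(range(10_000, 20_000))
# 10k group IDs with 6 digits each: 10000 * 6 digits + 9999 commas.
six_digit = csv_bytes(range(100_000, 110_000))

print(five_digit, five_digit <= TEXT_MAX)
print(six_digit, six_digit <= TEXT_MAX, six_digit <= MEDIUMTEXT_MAX)
```

Under those assumptions, 10k five-digit IDs would still squeeze into TEXT, but six-digit IDs would not, and if every member can create groups, six-digit group IDs do not seem far-fetched. Storage is only half the story anyway: the list still has to be parsed or searched on every use.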

Now we have 3 ways of dealing with this... Either we just set a (relatively) low limit on the number of people you can put into a custom membergroup, and/or number of groups you can be put into, in which case I guess staying with the current SMF way of doing things is all right, or we add 3NF and still keep the additional_groups field and up the limit (but not too much), or we do it entirely 3NF and get rid of additional_groups, get rid of any limits and scream at the performance issues later...
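
To make the options above a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL. The `group_members` table and the sample data are assumptions for illustration only; `id_owner` and `additional_groups` are the fields discussed above, but the exact column types are guesses.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE membergroups (
    id_group   INTEGER PRIMARY KEY,
    group_name TEXT NOT NULL,
    id_owner   INTEGER NOT NULL DEFAULT 0  -- 0 = admin-created (assumption)
);
-- SMF-style storage: an inline comma-separated list on the member row.
CREATE TABLE members (
    id_member         INTEGER PRIMARY KEY,
    additional_groups TEXT NOT NULL DEFAULT ''
);
-- 3NF storage: one row per (group, member) pair.
CREATE TABLE group_members (
    id_group  INTEGER NOT NULL,
    id_member INTEGER NOT NULL,
    PRIMARY KEY (id_group, id_member)
);
""")

# Member 1 owns two custom groups and puts member 42 into both.
db.executemany("INSERT INTO membergroups VALUES (?, ?, ?)",
               [(10, "Friends", 1), (11, "Hidden friends", 1)])
db.execute("INSERT INTO members VALUES (42, '10,11')")
db.executemany("INSERT INTO group_members VALUES (?, ?)", [(10, 42), (11, 42)])

# Inline list: fetch the string, then split it in application code.
csv = db.execute(
    "SELECT additional_groups FROM members WHERE id_member = 42").fetchone()[0]
groups_inline = sorted(int(g) for g in csv.split(",") if g)

# 3NF: the database returns one row per group.
groups_rows = [r[0] for r in db.execute(
    "SELECT id_group FROM group_members WHERE id_member = 42 ORDER BY id_group")]

print(groups_inline, groups_rows)
```

The middle option (3NF plus additional_groups) would mean writing to both places and keeping them in sync, which is the price of keeping the cheap inline read.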
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Arantor on April 20th, 2012, 01:57 PM
I'm not going to get into the meat of things here, because there's a lot of things that this impacts on, but I will take care of the issue relating to 3NF.

There are times and places that normalised data can have performance concerns. (And really, 3NF is tame. Wait until you start dealing with 5NF. :niark:) Like everything else, that is.

Now, a lot of my concern about normalising everything is that you end up generating far more rows in a resultset than you need or you start getting more queries, and for information you're loading at certain times, it's going to get hideous.

I actually have an example in SMF for this: board level moderators. They're stored in their own table, and during the board index gathering query, they're actually pulled via join into the main query. Now, on a forum without any board level moderators, it's not a problem. But if you have one board with 3 moderators, that board is returned three times, once for each moderator, which has to be cleaned up in PHP, and that accounts for some of the overhead attached to the board index stuff (which is, indirectly, one of the reasons the sub-board-of-a-sub-board problem occurs).
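
The duplication described above is easy to reproduce. This is only a sketch using Python's sqlite3 with simplified stand-in tables (`boards`, `moderators`) and made-up data, not SMF's real board index query; the Python loop plays the role of the PHP clean-up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE boards (id_board INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE moderators (id_board INTEGER, id_member INTEGER);
INSERT INTO boards VALUES (1, 'General'), (2, 'Features');
INSERT INTO moderators VALUES (2, 7), (2, 8), (2, 9);
""")

# Board 1 comes back once (moderator is NULL), board 2 three times.
rows = db.execute("""
    SELECT b.id_board, b.name, m.id_member
    FROM boards AS b
    LEFT JOIN moderators AS m ON m.id_board = b.id_board
    ORDER BY b.id_board, m.id_member
""").fetchall()

# Fold the repeated board rows back into one entry per board.
boards = {}
for id_board, name, id_member in rows:
    board = boards.setdefault(id_board, {"name": name, "moderators": []})
    if id_member is not None:
        board["moderators"].append(id_member)

print(len(rows), boards)
```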

In that particular case, I'd almost argue that board moderators would be better handled as a comma-separated value stored inline in the board table, precisely because there aren't many of them in most cases.

Getting back to the concern here, separating out friends to another table will give you the proper asynchronous structure you want. But it means that loading the list of friends up front is not cheap, because it means either overloading the first query where mem.* is loaded (which for any number of friends is non trivial), or we add a new query at that point, or we push it into a cache.

That said, we can consider a few alternatives... where exactly do we need to know and care about whether a given user is a friend of ours? Well, for the users online area in the footer, certainly, but with an asynch table we can actually do a join from *that* query and figure it out that way. For the PM area we can do a query as needed there, and so on.
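
A join of that kind from the users online query might look roughly like this. It is a sketch with Python's sqlite3; the `contacts` table, its columns, the cut-down `log_online` table and the `online_with_friend_flag` helper are all assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE log_online (id_member INTEGER PRIMARY KEY);
CREATE TABLE contacts (
    id_owner  INTEGER NOT NULL,   -- the member who owns the list
    id_member INTEGER NOT NULL,   -- the member they added
    PRIMARY KEY (id_owner, id_member)
);
INSERT INTO log_online VALUES (2), (3), (4);
-- Member 1 added 3. Member 4 added 1, which is one-way only.
INSERT INTO contacts VALUES (1, 3), (4, 1);
""")

def online_with_friend_flag(viewer_id):
    """List online members, flagging those on the viewer's own list."""
    return db.execute("""
        SELECT o.id_member, c.id_member IS NOT NULL AS is_friend
        FROM log_online AS o
        LEFT JOIN contacts AS c
               ON c.id_owner = ? AND c.id_member = o.id_member
        ORDER BY o.id_member
    """, (viewer_id,)).fetchall()

online = online_with_friend_flag(1)
print(online)
```

Because the primary key covers (id_owner, id_member), the join matches at most one row per online member, so it does not multiply rows the way the moderators join does.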

Like everything else, it's a change, but it can bring other changes to balance it out.
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: nend on April 20th, 2012, 05:16 PM
Quote from Nao on April 20th, 2012, 10:37 AM
- Custom membergroups: this one is harder because it has so many ramifications. I'm going to try and address most of them. So,
Nao is going to group people into groups, since it is a hard task he is only going to address some people to help make the groups.
Quote from Nao on April 20th, 2012, 10:37 AM
1/ Add an id_owner (id_member) field to the membergroups table.
Noa is giving away tables to each of his groups.
Quote from Nao on April 20th, 2012, 10:37 AM
2/ Add an interface for users to create membergroups.
Noa is allowing people to create their own groups.
Quote from Nao on April 20th, 2012, 10:37 AM
3/ Add more membergroup types. Things like 'hidden' and such are good
Noa creates a hidden society
Quote from Nao on April 20th, 2012, 10:37 AM
4/ Add more membergroup fine-tuning. This is a thing I have in Noisen and am really fond of -- I can determine whether people can see all or part of my friends list. Some of my friends I chose to keep in a 'hidden' state so that they would benefit from being in my friends list, without being visible as one of my friends. Perhaps this could be 'simplified' (for code) into two groups: a regular 'friends' group and a hidden 'friends' group... I'd rather have an extra field for that, though. And it's just an example.

(a) Write some UI for group selection in privacy settings. Not too complicated when it comes to setting privacy from the posting page, but I also want to be able to set privacy on the fly through a select box, like I'm doing on Noisen.com (although it's not a select box there, it's actually a rewrite of the icon selector). The difficulty is not in writing the thing, but in making it so that it's *easy* to understand and use.
Noa creates laws and regulations.
Quote from Nao on April 20th, 2012, 10:37 AM
(b) And the single most frightening thing for me... Dealing with the aftermath of being able to create membergroups for anything. Basically, from my experience on Noisen, I know that people aren't going to spend their days creating membergroups, but if they end up doing it, it could easily break Wedge if not handled correctly. Do you remember when Facebook had a limit on how many friends you could have? I think it was 5000. It was a lot though, and I doubt any forum would be penalized by having such a high limit on the number of friends. I'm still going to assume that 10k users are going to add Mike42 to at least one of their membergroups, and Mike42 is going to add 10k users to his own 'friends' group (reciprocal friend additions -- that'd be asynchronous in Wedge, i.e. not mandatory, unlike SMF.)

So, we have a members table where the additional_groups field for Mike42 contains 10k comma-separated group IDs... which probably makes it impossible to even store additional_groups in a TEXT field. (I guess MEDIUMTEXT would do...)
Even then -- it's going to kill performance, whatever we do about it. So we need 3NF for membergroups too. We discussed it at length, yet nothing has come of it. I'm getting wary of it, Pete is getting wary of it... I even recently read an old discussion on sm.org where Pete (yes, you Pete :P) actually discussed how 3NF could kill performance if table joins were used everywhere.

Now we have 3 ways of dealing with this... Either we just set a (relatively) low limit on the number of people you can put into a custom membergroup, and/or number of groups you can be put into, in which case I guess staying with the current SMF way of doing things is all right, or we add 3NF and still keep the additional_groups field and up the limit (but not too much), or we do it entirely 3NF and get rid of additional_groups, get rid of any limits and scream at the performance issues later...
Noa talks about dealing with the aftermath of the apocalypse.
Quote from Arantor on April 20th, 2012, 01:57 PM
I'm not going to get into the meat of things here, because there's a lot of things that this impacts on, but I will take care of the issue relating to 3NF.
Aranator is in charge of getting meat for the society and says "NO, he isn't going to get the meat no more".
Quote from Arantor on April 20th, 2012, 01:57 PM
There are times and places that normalised data can have performance concerns. (And really, 3NF is tame. Wait until you start dealing with 5NF. :niark:) Like everything else, that is.

Now, a lot of my concern about normalising everything is that you end up generating far more rows in a resultset than you need or you start getting more queries, and for information you're loading at certain times, it's going to get hideous.

I actually have an example in SMF for this: board level moderators. They're stored in their own table, and during the board index gathering query, they're actually pulled via join into the main query. Now, on a forum without any board level moderators, it's not a problem. But if you have one board with 3 moderators, that board is returned three times, once for each moderator, which has to be cleaned up in PHP, and that accounts for some of the overhead attached to the board index stuff (which is, indirectly, one of the reasons the sub-board-of-a-sub-board problem occurs).

In that particular case, I'd almost argue that board moderators would be better handled as a comma-separated value stored inline in the board table, precisely because there aren't many of them in most cases.

Getting back to the concern here, separating out friends to another table will give you the proper asynchronous structure you want. But it means that loading the list of friends up front is not cheap, because it means either overloading the first query where mem.* is loaded (which for any number of friends is non trivial), or we add a new query at that point, or we push it into a cache.

That said, we can consider a few alternatives... where exactly do we need to know and care about whether a given user is a friend of ours? Well, for the users online area in the footer, certainly, but with an asynch table we can actually do a join from *that* query and figure it out that way. For the PM area we can do a query as needed there, and so on.

Like everything else, it's a change, but it can bring other changes to balance it out.
Aranator speaks out to Noa the overlord that he has concerns about his master plan.
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: nolsilang on April 20th, 2012, 05:33 PM
So it's Noa and Aranator now? :p

Thanks for your summary nend :)
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Arantor on April 20th, 2012, 05:52 PM
:lol: Heh, something like that.

It isn't so much that I have concerns about the master plan, more that I have concerns about the way people want to go about things, and about comments being taken out of context. 3NF is not inherently bad, and neither is combining values into text and storing them inline; like everything else, these things have to be used properly.
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Aaron on April 21st, 2012, 02:08 PM
Will you guys be switching to a row-based group membership solution, too? While it would cost a few extra tables, it'd be great to finally get rid of the old FIND_IN_SET implementations...
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Arantor on April 21st, 2012, 02:40 PM
We've certainly talked about it, and it would certainly solve some issues that a few users have had (when members are in many, many membergroups).

My initial concern is that raised above: how can we efficiently get the list of groups a user is in when starting up? When it's inline, it can be trivially read during loadUserSettings() when we query for mem.*, which brings us to the two alternatives: do an inner join and get many extra rows, or do a separate query.[1]

Both have their advantages and disadvantages of course, and in theory we could bind that into permissions while we're at it, but that has its own performance concerns, and remember this is something we're doing per page, every page.

I did something similar in SimpleDesk, where I separated roles and groups (so knowing the groups a user was in, there was a query to fetch the roles that applied, then another query to fetch all the permissions for each department for all those roles, I forget why it was two queries though) so it's certainly possible, but it's something important to weigh up.

It would certainly be faster doing reverse lookups against FIND_IN_SET, though. (But looking up groups > members is far less common than members > groups)
 1. Doesn't matter whether it's a subselect or a separate select, it's still an extra query, though it should be pretty fast.
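
The two alternatives, plus the reverse lookup, could be sketched like this with Python's sqlite3. The `group_members` table and the sample rows are assumptions for illustration; this is not what loadUserSettings() actually runs.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE members (id_member INTEGER PRIMARY KEY, real_name TEXT);
CREATE TABLE group_members (
    id_group  INTEGER NOT NULL,
    id_member INTEGER NOT NULL,
    PRIMARY KEY (id_member, id_group)
);
INSERT INTO members VALUES (42, 'Mike42');
INSERT INTO group_members VALUES (5, 42), (9, 42), (12, 42);
""")

# Alternative 1: one query, but the member columns repeat once per group.
joined = db.execute("""
    SELECT mem.id_member, mem.real_name, gm.id_group
    FROM members AS mem
    INNER JOIN group_members AS gm ON gm.id_member = mem.id_member
    WHERE mem.id_member = ?
    ORDER BY gm.id_group
""", (42,)).fetchall()

# Alternative 2: the member row first, then a small second query.
member = db.execute(
    "SELECT * FROM members WHERE id_member = ?", (42,)).fetchone()
groups = [r[0] for r in db.execute(
    "SELECT id_group FROM group_members WHERE id_member = ? ORDER BY id_group",
    (42,))]

# Reverse lookup (groups > members) is a plain equality match on rows.
members_in_9 = [r[0] for r in db.execute(
    "SELECT id_member FROM group_members WHERE id_group = ?", (9,))]

print(len(joined), member, groups, members_in_9)
```

One thing to watch with the first alternative: an inner join returns nothing at all for a member who is in no extra groups, so it would have to be a left join in practice.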
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Nao on April 21st, 2012, 03:32 PM
Sorry I've been too busy irl since yesterday to deal with this topic. Will do later.
Just wanted to mention that in order to get a single entry even with a left join, you can use group_concat(), which is MySQL 4.1+.
Dunno about performance but it's certainly a simple way to reproduce the existing feature.
Just wanted to post this to save you time thinking about a solution :)
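
For illustration, a group_concat() query along those lines might look like the sketch below. It runs on SQLite through Python's sqlite3 (SQLite happens to have a GROUP_CONCAT aggregate too); the `group_members` table, the sample data and the `parse_groups` helper are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE members (id_member INTEGER PRIMARY KEY, real_name TEXT);
CREATE TABLE group_members (id_group INTEGER, id_member INTEGER);
INSERT INTO members VALUES (42, 'Mike42'), (43, 'Loner');
INSERT INTO group_members VALUES (5, 42), (9, 42), (12, 42);
""")

# One row per member, with the groups folded back into a single string.
rows = db.execute("""
    SELECT mem.id_member, mem.real_name,
           GROUP_CONCAT(gm.id_group) AS additional_groups
    FROM members AS mem
    LEFT JOIN group_members AS gm ON gm.id_member = mem.id_member
    GROUP BY mem.id_member
    ORDER BY mem.id_member
""").fetchall()

def parse_groups(csv):
    """Turn '5,9,12' into [5, 9, 12]; NULL (no groups) becomes []."""
    return sorted(int(g) for g in csv.split(",")) if csv else []

result = {id_member: parse_groups(csv) for id_member, _name, csv in rows}
print(result)
```

On MySQL, the result of GROUP_CONCAT is cut off at group_concat_max_len, which defaults to 1024 bytes, so a member in thousands of groups would need that raised or the list would be silently truncated.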
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Arantor on April 21st, 2012, 05:13 PM
I think we'd need to do performance testing on that to be sure ;)

I'm still in a place mentally where I forget such constructs because I still find myself thinking about the cross-system methodologies I've been exposed to over the years...
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Nao on September 18th, 2012, 07:58 PM
Just as a note... This is where the last membergroup discussion took place.
There's also an older contact list discussion that was never finished, here:
http://wedge.org/pub/feats/7038/privacy-options/msg274932/#msg274932

Pete, can you look into the last few posts of these and sum them up for me?[1] You're a better reader than I am, and I tend to get tired too quick when I delve into long posts ;)
 1. Basically: what we determined to be best, contact lists or membergroups, and whether or not this would have an impact on performance etc... So, really, is it realistic to have contact lists or not.
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Nao on September 23rd, 2012, 01:00 PM
So, err... No opinions I guess..?
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Arantor on September 23rd, 2012, 04:03 PM
Oh I have plenty of opinions but there are a LOT of issues that stem out of this, mostly fixable of course, but nonetheless a lot of issues.
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Nao on September 23rd, 2012, 11:37 PM
But do you see more issues coming out of using membergroups for contact lists, or a proper contact list table..?

I think that contact lists are the easiest to do, and thus probably the easiest to maintain afterwards... And we can always add privacy settings for contact lists afterwards.
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Arantor on September 23rd, 2012, 11:47 PM
I see every single course of action causing a number of issues. Every approach, even leaving it as we have currently, has issues. It isn't as simple as 'which has fewer issues' because some of those have smaller or larger issues.

Contact lists being separate from membergroups would certainly be cleaner and more manageable.
Title: Re: Things to do before the Mayan apocalypse: membergroups
Post by: Nao on September 24th, 2012, 12:22 AM
Okay, looks like a 'go ahead' sign for me :P
Posted: September 24th, 2012, 12:20 AM

(Plus, we get to reset id_group back to a smallint(5)!)