Poll

Would you like to have topic privacy options in Wedge?

Yes -- everyone, just logged in users, and just the author.
4 (14.8%)
Yes -- everyone, just logged in users, just the author, and author's buddies (the regular SMF feature), even if it hurts performance a bit.
2 (7.4%)
Yes -- everyone, just logged in users, just the author, and author's contact lists (like buddies, but you can create multiple lists and put people in one or more of them), even if it hurts performance a bit.
16 (59.3%)
Yes -- everyone, or just me (i.e. just the ability to write drafts...)
0 (0%)
Yes, but I don't really care, I would never enable the feature on my forum.
1 (3.7%)
No, I don't care, and my users wouldn't either.
4 (14.8%)
Total Members Voted: 23

Nao

  • Dadman with a boy
  • Posts: 16,082
Privacy options
« on November 28th, 2011, 09:09 PM » Last edited on December 1st, 2011, 11:24 AM by Nao
Splitting this topic into its own from the original Selectbox topic...
Please read the bold characters below and tell us your opinion!

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Privacy options
« Reply #1, on November 28th, 2011, 10:23 PM »
Chosen is slick, really slick. It even works great on iPad.

I'm not sure what the options on Noisen are. Normal topic privacy setup would imply:

* topic starter only can see it
* topic starter and their contacts can see it
* topic starter and moderators
* anyone who can access the board

This should cover all the main cases. I don't think list of groups is needed, and I think that if it's down to the above, a simple number could deal with it.

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Privacy options
« Reply #2, on November 29th, 2011, 11:04 AM »
Quote from Arantor on November 28th, 2011, 10:23 PM
Chosen is slick, really slick. It even works great on iPad.
Yeah, it's a beautiful component...
Unfortunately, I tested it on my iPod and it has issues:
- there are graphic glitches inside the input boxes. Minor, but visible.
- because there's always a combo box, Safari will enlarge the page and focus on the input box. And because of that, you get the keyboard and you suddenly can't see the list of items at all... Good luck browsing it!
Quote
I'm not sure what the options on Noisen are.
Default (i.e. refer to board's access list), Logged in users, My Friends, Custom Groups and Just Me.

Come to think of it... I doubt anyone even noticed the feature itself.
Go to any non-blog board index.
Find a topic started by you. Just below the title, there's a faint icon of a key visible at the left. Click it. There you go...
Now, click 'Groupes Spécifiques' (I'm sorry that it was never localized to English...), which means 'Custom Groups', and a new popup shows up to its right, where you can fine-tune your selection.
I think it's a pretty neat way of selecting privacy... And because I'm not going to keep 'groups' in the list anyway, I was thinking I could instead show a list of checkboxes for friends -- either a complete list of friends, or simply 'friend types', something I will probably add in the future (close friends, family, co-workers, etc.) I'm just not sure whether it'll be something that doesn't hurt performance, really.
Quote
Normal topic privacy setup would imply:

* topic starter only can see it
'Just Me'
Quote
* topic starter and their contacts can see it
'My Friends' / 'My Contacts'
Quote
* topic starter and moderators
That would be 'Just Me', too, I guess... By definition, a global moderator (or admin) can read anything on any board.
Quote
* anyone who can access the board
'Default'
Quote
This should cover all the main cases. I don't think list of groups is needed, and I think that if it's down to the above, a simple number could deal with it.
Yup -- except:
- if we add a new option in the future and it should be placed before 'Just Me' in the list (because 'Just Me' is really the deepest level you could have), the select box would have <option>1</option><option>3</option><option>2</option>, to simplify. I'm not a big fan... Although, who cares :P
- friend granularity would be hard, or even impossible, with a simple number. Unless we give friend groups a unique ID for everyone (like, I'm new to this site, and the first friend group I'll create will have id #2356 because there are already 2000+ other contact lists), and we start these IDs above 3 or 4 (used for default, logged in members, and all friends.)

Maybe we could also have 'by default' a general contact list for everyone, created when they first make a friend, and then they can add their contacts to other contact lists (with their own contact list ID), mutually exclusive or not... hmm... It would probably make a lot of sense, and would certainly help with sql queries.

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Privacy options
« Reply #3, on November 29th, 2011, 03:34 PM »
Quote
Unfortunately, I tested it on my iPod and it has issues:
You mean the little grey line that appears after the item itself in the container? I thought that was intentional, to indicate there's more...
Quote
- because there's always a combo box, Safari will enlarge the page and focus on the input box. And because of that, you get the keyboard and you suddenly can't see the list of items at all... Good luck browsing it!
Doesn't happen on an iPad. The keyboard appears, sure, but the items are still accessible (and you can hide the keyboard should you choose to do so)
Quote
either a complete list of friends, or simply 'friend types', something I will probably add in the future (close friends, family, co-workers, etc.) I'm just not sure whether it'll be something that doesn't hurt performance, really.
Anything that's more involved than 'all or nothing' of a group is more work and it will have performance issues. The real question is whether that's needed or wanted.

I've seen plenty of requests for the 'just me' and 'just me + moderators' setup. I'm really not comfortable with having just me always show moderators, though. I'm thinking of the journal/blog setup where you have private posts and public posts and some of those are going to be really 'me only' items. I'd be OK with admins (only) having access but not moderators. Just me implies a certain level of privacy, after all.
Quote
- friend granularity would be hard, or even impossible, with a simple number. Unless we give friend groups a unique ID for everyone (like, I'm new to this site, and the first friend group I'll create will have id #2356 because there are already 2000+ other contact lists), and we start these IDs above 3 or 4 (used for default, logged in members, and all friends.)
Correct. Which is why I'm not keen on offering it, for the simple reason that it's a massive pain to cope with, because it makes processing it much more complex - and this is something that has fairly major performance concerns to mess with.
Quote
It would probably make a lot of sense, and would certainly help with sql queries.
I don't really like the idea of there being an 'everyone' contact list, no matter how notional it is or where it's used, because it always leads to trouble. I've been down the road of being in an environment with 'everyone' lists and people end up making information open to more people than they thought with 'everyone'.

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Privacy options
« Reply #4, on November 30th, 2011, 12:49 AM »
- No, it's something else.
- The keyboard sometimes gets hidden automatically. It's still not very practical. Yet, it looks good, needless to say... But I'm sure we can do better. Heck, actually I'm thinking a lot about going for the 'Wedge menu' style, like John suggested... Plus, I just know myself, I'll probably just rewrite the whole thing from scratch... Maybe using ideas and samples from other libraries.
- Okay for moderators, you've convinced me they should be kept out. Still, I think it's best to leave that option aside -- just have 'Just Me'.
- I'd like to have some user opinions on contact lists. Who do you think does it best? Noisen.com (if you ever used it)? Facebook? Google+? SMF?
- Additionally, how would contact granularity hurt performance more than a general 'My Contacts' choice...? Considering we'll be hitting an extra table in every case?

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Privacy options
« Reply #5, on November 30th, 2011, 02:31 AM »
Quote
- Okay for moderators, you've convinced me they should be kept out. Still, I think it's best to leave that option aside -- just have 'Just Me'.
No, I had a specific reason for 'just me + moderators'. There's the poor man's helpdesk, there's discussing reasons for bans, there's discussing 'application forms' on clan-type sites. The list goes on, and it's much easier for us to implement it there in the core than it would ever be to bolt it on later.
Quote
- I'd like to have some user opinions on contact lists. Who do you think does it best? Noisen.com (if you ever used it)? Facebook? Google+? SMF?
Definitely not SMF.
Quote
- Additionally, how would contact granularity hurt performance more than a general 'My Contacts' choice...? Considering we'll be hitting an extra table in every case?
Because it wouldn't just be an extra table hit. If you want to store anything other than a simple number, you have to either implode it and store it inline, or you have to store it in another table, which means that's *two* extra tables vs what we have now, not one. And believe me, the notion of putting an imploded field in the topics table is a no-no, seeing how it would make the entire topics table an order of magnitude slower because right now there are no variable-size fields in it, which is a very, very good thing.

If it's kept as a simple number, it's possible to solve a touch more efficiently, because what you can then do is figure out who the users are who have the current user as a friend, and turn it into (where topic starter = me OR (privacy = friends AND topic starter IN (list of people who friended me))).

The one thing to realise about topic privacy, and this is quite important: it is going to suck compared to board privacy. It's unavoidable, because there's no way to do it in a way that adds extra conditions that can be evaluated without ORs (except in the just me or everyone cases) - ORs are bad for performance because they're an extra branch and often virtually a sub-query in their own right.
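
A minimal sketch of the 'simple number' approach described above, written in Python with sqlite3 rather than Wedge's PHP/MySQL so it can be run as-is. The table names (topics, buddies), column names and the three privacy codes are assumptions made for illustration only; the thread has not settled on any of them. The 'everyone' branch is added here so the example returns public topics too.

```python
import sqlite3

# Hypothetical privacy codes; the thread has not settled on actual values.
EVERYONE, JUST_ME, FRIENDS = 0, 1, 2

def visible_topics(db, me):
    """Return the ids of the topics that member `me` may see."""
    # The one extra query: who has the current user as a friend?
    friended_me = [r[0] for r in db.execute(
        "SELECT id_owner FROM buddies WHERE id_buddy = ?", (me,))]
    # An empty list becomes IN (NULL), which matches nothing.
    marks = ",".join("?" * len(friended_me)) or "NULL"
    sql = ("SELECT id_topic FROM topics"
           " WHERE privacy = ? OR id_starter = ?"
           f" OR (privacy = ? AND id_starter IN ({marks}))"
           " ORDER BY id_topic")
    params = [EVERYONE, me, FRIENDS, *friended_me]
    return [r[0] for r in db.execute(sql, params)]

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE topics (id_topic INTEGER PRIMARY KEY, id_starter INT, privacy INT);
    CREATE TABLE buddies (id_owner INT, id_buddy INT);
    INSERT INTO topics VALUES (1, 1, 0), (2, 1, 1), (3, 1, 2), (4, 2, 2), (5, 2, 1);
    INSERT INTO buddies VALUES (1, 3), (2, 1);
""")
print(visible_topics(db, 3))
```

Member 3 is on member 1's buddy list only, so they get the public topic plus member 1's friends-only topic, and nothing from member 2.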

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Privacy options
« Reply #6, on November 30th, 2011, 11:13 AM »
Quote from Arantor on November 30th, 2011, 02:31 AM
No, I had a specific reason for 'just me + moderators'. There's the poor man's helpdesk, there's discussing reasons for bans, there's discussing 'application forms' on clan-type sites. The list goes on, and it's much easier for us to implement it there in the core than it would ever be to bolt it on later.
Hmm.. Okay, okay... But I don't see this being useful in Thoughts, for instance...
Quote
Quote
- I'd like to have some user opinions on contact lists. Who do you think does it best? Noisen.com (if you ever used it)? Facebook? Google+? SMF?
Definitely not SMF.
(Why is it that no one else is interested in this conversation...? :()
It was kind of rhetorical. Noisen has asynchronous contacts, plus the ability to hide selective contacts from viewers. That one will be in Wedge too, once I get to implementing contact lists...
Quote
Because it wouldn't just be an extra table hit. If you want to store anything other than a simple number, you have to either implode it and store it inline, or you have to store it in another table, which means that's *two* extra tables vs what we have now, not one.
Oh... I see what you're talking about -- selecting several contact lists for viewers. I was actually thinking of storing just one contact list, because I don't think there'd be a reason to select more than one. For instance -- if you have a family-only post, you select your family. If you have a work-only post, you select your co-workers. If you have a friends-only post, you select your friends. Among which can be some of your family and co-worker list members, of course. The point is having the ability to put some people into multiple lists. Then, when a list is modified, the 'buddy_list' field in {db_prefix}members is updated to reflect the entire list of contacts.
Although I'm not sure we'll be using that field much in the future... But IIRC there are reasons to leave it in.

If you start setting privacy settings on everything in your profile for instance, it'll be a disaster if you have to set multiple contact lists in each. I think it's much smarter to encourage people to put their 'safe' friends into a special list, and give that list all permissions, and deny the rest to anyone else -- guests, members and contacts that aren't in the safe list.
Quote
And believe me, the notion of putting an imploded field in the topics table is a no-no, seeing how it would make the entire topics table an order of magnitude slower because right now there are no variable-size fields in it, which is a very, very good thing.
The entire privacy thing, when based on contacts, would use the secondary table with contacts.

I'm thinking of a structure like this:

wedge_members
  ...
  contacts (1,2,3,4,5)

wedge_contact_lists AUTO_INCREMENT 10
  id_member (id_owner?)
  id_list
  name[1]
  description (?)

wedge_contacts
  id_member
  id_list
  (possibly store the list's id_owner as well, not mandatory)

So when we check for a topic's privacy validity, we just retrieve its privacy setting, if <10 (for instance) it's a special setting like 'guests' or 'members' or 'just me' or 'just me + mods' or anything else we can think of, if >=10 it's a contact list, so we INNER JOIN wedge_contacts AS c ON t.privacy = c.id_list AND c.id_member = {int:myself}, or something like that...

Well, that's the basic idea.
Quote
If it's kept as a simple number, it's possible to solve a touch more efficiently, because what you can then do is figure out who the users are who have the current user as a friend, and turn it into (where topic starter = me OR (privacy = friends AND topic starter IN (list of people who friended me))).
As I see it, it's (WHERE i'm_admin OR topic starter = me OR (privacy >= 10 AND me IN (SELECT id_member FROM wedge_contacts WHERE privacy = id_list)))
Does that make sense...?
Quote
The one thing to realise about topic privacy, and this is quite important: it is going to suck compared to board privacy. It's unavoidable, because there's no way to do it in a way that adds extra conditions that can be evaluated without ORs (except in the just me or everyone cases) - ORs are bad for performance because they're an extra branch and often virtually a sub-query in their own right.
Well, it's always been a complicated query at noisen --- check out the diff file I sent you last year, and search for 'query_see_topic' or something... At one point you'll see it defined. It's quite startling. I don't even know HOW exactly it's not KILLING performance, this one... :lol:

I'm not making advances when it comes to the select box, BTW... I'm still unsure where to start from!
 1. If a generic list like Friends, Family etc, store {friends}, {family} and use $txt[trim($name, '{}')] or whatever at display time.
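
A small sketch of the footnote's idea, in Python rather than PHP: generic list names are stored as '{key}' placeholders and translated at display time, while custom names are shown verbatim. The TXT dictionary is a hypothetical stand-in for the $txt language array, and the fallback for unknown keys is an assumption.

```python
# Hypothetical language strings, standing in for the $txt array.
TXT = {"friends": "Friends", "family": "Family"}

def list_display_name(stored_name):
    """Translate generic '{key}' names; return custom names untouched."""
    if stored_name.startswith("{") and stored_name.endswith("}"):
        key = stored_name[1:-1]
        # Unknown placeholders fall back to the raw stored value.
        return TXT.get(key, stored_name)
    return stored_name

print(list_display_name("{friends}"))
print(list_display_name("Poker night"))
```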

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Privacy options
« Reply #7, on November 30th, 2011, 12:26 PM »
Quote
Hmm.. Okay, okay... But I don't see this being useful in Thoughts, for instance...
Not so important for Thoughts, but it is important for topic privacy.
Quote
It was kind of rhetorical. Noisen has asynchronous contacts, plus the ability to hide selective contacts from viewers. That one will be in Wedge too, once I get to implementing contact lists...
It was, yes, which is why I didn't really get into it. Facebook is now sort of asynchronous, Google is asynchronous by design and creates multiple lists for you to be asynchronous with. But I think that might be a bit too complex for what's needed in Wedge.
Quote
Oh... I see what you're talking about -- selecting several contact lists for viewers.
Not even that. If it's a simple number, you can build it virtually into the main query so you only have to have an extra query to find out who has the current user as a friend. Doing anything extra requires another query on top of *that*.
Quote
For instance -- if you have a family-only post, you select your family. If you have a work-only post, you select your co-workers. If you have a friends-only post, you select your friends. Among which can be some of your family and co-worker list members, of course. The point is having the ability to put some people into multiple lists. Then, when a list is modified, the 'buddy_list' field in {db_prefix}members is updated to reflect the entire list of contacts.
If buddy_list is a single list of comma-separated users, it's queryable without having to do an extra query. It'll be slow, but it'll be doable. If it's *anything* else, we'll have to query it independently, decode it (I'm assuming unserialize), then build a query based on that. It's going to suck in performance terms.
Quote
Although I'm not sure we'll be using that field much in the future... But IIRC there are reasons to leave it in.
There are, but they're limited.
Quote
If you start setting privacy settings on everything in your profile for instance, it'll be a disaster if you have to set multiple contact lists in each. I think it's much smarter to encourage people to put their 'safe' friends into a special list, and give that list all permissions, and deny the rest to anyone else -- guests, members and contacts that aren't in the safe list.
It's still multiple lists to manage, and I just don't think that's entirely necessary. On a social network like Facebook or Google, where you're inherently sharing information that may be suitable for some but not all people, it's important to have that granularity. On a forum, it just isn't necessary. (Interestingly, LiveJournal makes this possible on a very granular level, you can create custom filters which work basically as discussed here, but they're created in such a way that it's not going to be that complicated... since every post is automatically put into a filter of sorts)
Quote
So when we check for a topic's privacy validity, we just retrieve its privacy setting, if <10 (for instance) it's a special setting like 'guests' or 'members' or 'just me' or 'just me + mods' or anything else we can think of, if >=10 it's a contact list, so we INNER JOIN wedge_contacts AS c ON t.privacy = c.id_list AND c.id_member = {int:myself}, or something like that...
Don't inner join. I get where you're going but inner join is a bad place to be. All of the queries (or at least, all the *important* queries that rely on topic visibility; there are many but the important ones like topic display for example) rely on having only one result returned, and inner join will generate multiple rows in the result.
Quote
As I see it, it's (WHERE i'm_admin OR topic starter = me OR (privacy >= 10 AND me IN (SELECT id_member FROM wedge_contacts WHERE privacy = id_list)))
Does that make sense...?
Yes, and that's the way to do it. It's still going to hurt but probably hurt a bit less. Note that if it's an admin, we can safely not bother with this and define query_see_topic as 1=1 to avoid the whole fandango.
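
To make that concrete, here is a hedged sketch in Python with sqlite3 (not Wedge's actual code) of a query_see_topic builder: admins get the constant 1=1, everyone else gets the subselect against the contacts table. The privacy value 1 for 'everyone', the threshold of 10, and the topics/contacts column names are assumptions for illustration; the 'members' setting is left out to keep it short.

```python
import sqlite3

FIRST_LIST_ID = 10  # assumed threshold: ids below it are special settings

def query_see_topic(is_admin):
    """Build a WHERE fragment; :me is bound to the current member id."""
    if is_admin:
        return "1=1"  # admins see everything, so skip the checks entirely
    return ("(t.privacy = 1 OR t.id_starter = :me"  # 1 = everyone (assumed)
            f" OR (t.privacy >= {FIRST_LIST_ID} AND :me IN"
            " (SELECT c.id_member FROM contacts AS c"
            " WHERE c.id_list = t.privacy)))")

def topics_for(db, me, is_admin=False):
    sql = ("SELECT t.id_topic FROM topics AS t WHERE "
           + query_see_topic(is_admin) + " ORDER BY t.id_topic")
    return [r[0] for r in db.execute(sql, {"me": me})]

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE topics (id_topic INTEGER PRIMARY KEY, id_starter INT, privacy INT);
    CREATE TABLE contacts (id_member INT, id_list INT, PRIMARY KEY (id_member, id_list));
    INSERT INTO topics VALUES (1, 1, 1), (2, 1, 3), (3, 1, 10);
    INSERT INTO contacts VALUES (2, 10);
""")
print(topics_for(db, 2))
```

Each topic appears at most once whichever branch matches, because the contact check is a condition on the row rather than a join.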
Quote
I don't even know HOW exactly it's not KILLING performance, this one...
It's hurting but you probably wouldn't notice it until you got to really huge boards with many many many topics.
Quote
I'm not making advances when it comes to the select box, BTW... I'm still unsure where to start from!
You know you're going to end up designing your own in the end...

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Privacy options
« Reply #8, on November 30th, 2011, 04:15 PM »
Quote from Arantor on November 30th, 2011, 12:26 PM
Not so important for Thoughts, but it is important for topic privacy.
So we need to establish a list of privacy IDs and their corresponding meaning...
I'm starting with the basics:
1- everyone
2- members
3- just me (author & admins)

This is what will be in the code for now.
I'm turning oThought's privacy array into an object so we can easily manipulate item position within the select box.
Dunno if 'everyone' should be set to zero, though...
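
As a quick illustration of that numbering, a Python sketch (not the Wedge code itself) with the three basic values as an enum and everything from 10 upwards treated as a contact list id. The threshold of 10 comes from the earlier 'for instance' and is an assumption, as are the names.

```python
from enum import IntEnum

class Privacy(IntEnum):
    EVERYONE = 1
    MEMBERS = 2
    JUST_ME = 3  # author & admins

FIRST_LIST_ID = 10  # assumed: contact list ids start here

def describe_privacy(value):
    """Map a stored privacy number to a human-readable meaning."""
    if value >= FIRST_LIST_ID:
        return f"contact list #{value}"
    try:
        return Privacy(value).name.lower()
    except ValueError:
        return "unknown"  # reserved ids between the basics and the lists

print(describe_privacy(3), "/", describe_privacy(2356))
```

The gap between 3 and 10 leaves room for later special settings without touching any contact list id.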
Quote
It was, yes, which is why I didn't really get into it. Facebook is now sort of asynchronous, Google is asynchronous by design and creates multiple lists for you to be asynchronous with. But I think that might be a bit too complex for what's needed in Wedge.
I don't think it is. I think that there are many cases where it could be useful. Having this from the beginning will be helpful.

Here's what I've come up with for now... Do tell me if it seems reasonable.

Code:
#
# Table structure for table `contact_lists`
#

# id_list must be auto_increment for the AUTO_INCREMENT=10 table option to apply.
CREATE TABLE {$db_prefix}contact_lists (
  id_list mediumint(8) NOT NULL auto_increment,
  id_owner mediumint(8) NOT NULL DEFAULT 0,
  PRIMARY KEY (id_list),
  KEY member (id_owner)
) AUTO_INCREMENT=10 ENGINE=MyISAM;

#
# Table structure for table `contacts`
#

CREATE TABLE {$db_prefix}contacts (
  id_member mediumint(8) NOT NULL DEFAULT 0,
  id_list mediumint(8) NOT NULL DEFAULT 0,
  is_synchronous tinyint(1) unsigned NOT NULL DEFAULT 0,
  position tinyint(4) NOT NULL DEFAULT 0,
  updated int(11) NOT NULL DEFAULT 0,
  hidden tinyint(1) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (id_member, id_list)
) ENGINE=MyISAM;

is_synchronous could be a boolean set to true if we find out that the target member also has you in their list(s). It's currently used in your contact lists, where they're separated by sync status. Also, 'position' is the position inside the list, which you can manually modify. 'updated' is the last updated date for the contact list -- although we could have both a created and updated field... Or none at all. 'hidden', as I mentioned before, prevents anyone (but the list owner) from seeing the name show up in the contact list.
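
A rough sketch of how is_synchronous could be recomputed, in Python with sqlite3 instead of PHP/MySQL. It assumes the two tables above (trimmed to the relevant columns) and interprets 'synchronous' as: the contact has the list owner in at least one of their own lists. The function name is hypothetical.

```python
import sqlite3

def refresh_sync_flags(db):
    """Set is_synchronous = 1 where the contact also lists the owner."""
    owner_of = dict(db.execute("SELECT id_list, id_owner FROM contact_lists"))
    rows = db.execute("SELECT id_member, id_list FROM contacts").fetchall()
    # (owner, contact) pairs: owner has contact in at least one list.
    has = {(owner_of[id_list], member) for member, id_list in rows}
    db.executemany(
        "UPDATE contacts SET is_synchronous = ?"
        " WHERE id_member = ? AND id_list = ?",
        [(int((member, owner_of[id_list]) in has), member, id_list)
         for member, id_list in rows])

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE contact_lists (id_list INTEGER PRIMARY KEY, id_owner INT);
    CREATE TABLE contacts (id_member INT, id_list INT,
                           is_synchronous INT DEFAULT 0,
                           PRIMARY KEY (id_member, id_list));
    INSERT INTO contact_lists VALUES (10, 1), (11, 2);
    INSERT INTO contacts (id_member, id_list) VALUES (2, 10), (3, 10), (1, 11);
""")
refresh_sync_flags(db)
flags = db.execute("SELECT id_member, id_list, is_synchronous FROM contacts"
                   " ORDER BY id_list, id_member").fetchall()
print(flags)
```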

Now we'll need to update the Import tool to actually convert buddy lists to contact lists (and automatically create a default list for every user that has at least one buddy.) I don't know if it's best to do it from the importer tool, or from within Wedge if the table is empty etc... I'd say the importer.
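
For illustration, a minimal sketch of that conversion step, in Python with sqlite3 rather than the importer's PHP. It assumes buddy_list is a comma-separated string of member ids, creates one default list per member with at least one buddy, and starts list ids at 10; the function name and the trimmed table layouts are hypothetical.

```python
import sqlite3

FIRST_LIST_ID = 10  # assumed first contact list id

def convert_buddy_lists(db):
    """Create one default contact list per member who has buddies."""
    next_id = db.execute(
        "SELECT COALESCE(MAX(id_list), ?) + 1 FROM contact_lists",
        (FIRST_LIST_ID - 1,)).fetchone()[0]
    members = db.execute(
        "SELECT id_member, buddy_list FROM members"
        " WHERE buddy_list != '' ORDER BY id_member").fetchall()
    for id_owner, buddy_list in members:
        db.execute("INSERT INTO contact_lists (id_list, id_owner) VALUES (?, ?)",
                   (next_id, id_owner))
        buddies = [int(b) for b in buddy_list.split(",") if b.strip()]
        # OR IGNORE skips duplicate ids inside a single buddy list.
        db.executemany(
            "INSERT OR IGNORE INTO contacts (id_member, id_list, position)"
            " VALUES (?, ?, ?)",
            [(b, next_id, pos) for pos, b in enumerate(buddies)])
        next_id += 1

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE members (id_member INTEGER PRIMARY KEY, buddy_list TEXT);
    CREATE TABLE contact_lists (id_list INTEGER PRIMARY KEY, id_owner INT);
    CREATE TABLE contacts (id_member INT, id_list INT, position INT,
                           PRIMARY KEY (id_member, id_list));
    INSERT INTO members VALUES (1, '2,3'), (2, ''), (3, '1');
""")
convert_buddy_lists(db)
lists = db.execute("SELECT id_list, id_owner FROM contact_lists ORDER BY id_list").fetchall()
contacts = db.execute("SELECT id_member, id_list, position FROM contacts"
                      " ORDER BY id_list, position").fetchall()
print(lists, contacts)
```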

Thorsten, are you reading this? :P
Quote
If buddy_list is a single list of comma-separated users, it's queryable without having to do an extra query. It'll be slow, but it'll be doable.
Slower than its equivalent subselect with a secondary table?
The problem with subselects, is that they often (always??) require a table scan to complete, even if done on the proper index...
This is something that doesn't happen with INNER JOINs.
Quote
If it's *anything* else, we'll have to query it independently, decode it (I'm assuming unserialize), then build a query based on that. It's going to suck in performance terms.
The only thing we can/may/should/will/shall/would/whatever store in the data field of the member table is the list of contact lists you have. $contact_lists = unserialize($member['data']['contacts']) or something. Would need to be done on every page load (to get the list of friend groups for thought privacy), and other uses (such as users viewing a profile etc) can be done through a quick sql query.
Quote
It's still multiple lists to manage, and I just don't think that's entirely necessary. On a social network like Facebook or Google, where you're inherently sharing information that may be suitable for some but not all people, it's important to have that granularity. On a forum, it just isn't necessary.
So... You're suggesting no lists at all? Just plain contacts...?
I don't know about that. I think contact lists would have more pros than cons. And no one is forced to create multiple lists... It's just good to have them.

Heck... Either lists (id >= 10) or 'all contacts' is easily doable in a subselect. We either select id_members who are associated to our stored id_list (>= 10), or id_members who are associated with the id_member owner of the list. In which case it'd be best to store the list owner's id in the contacts table as well, to save time. Or just do a find_in_set on their buddy_list of course... But buddy lists are limited in size, unlike the contacts table.
Quote
Don't inner join. I get where you're going but inner join is a bad place to be. All of the queries (or at least, all the *important* queries that rely on topic visibility; there are many but the important ones like topic display for example) rely on having only one result returned, and inner join will generate multiple rows in the result.
I know that, but it'll only happen if the user is in several contact lists of the list owner. And we can limit results to LIMIT 1, etc...
Also, as I said above, from my experience, subselects don't use indexes. If you could help here, because you're the mysql specialist out of us both, it'd be nice to be able to use subselects, if only because it'd make life a hell of a lot easier when using {query_see_topic} in the code I'll eventually import from the Noisen diff...
Quote
It's hurting but you probably wouldn't notice it until you got to really huge boards with many many many topics.
My idea was to have such boards rely on Wedge...
Then again, it may never happen at all.
The only performance bottleneck I've been told about so far is the random list of items in the media homepage. That's because it retrieves all entries, randomizes them, and returns the first few. I still have an entry in my to-do list to add a pseudo-randomizer variable in each item...
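
One possible reading of that 'pseudo-randomizer variable', sketched in Python with sqlite3: each item stores a random key when it's created, the key is indexed, and a random pivot selects the next few items by key instead of shuffling the whole table. The media table, the rand_key column and the function name are all assumptions; this is not how the media homepage currently works.

```python
import random
import sqlite3

def random_items(db, count, rng=random):
    """Pick `count` items starting at a random pivot, wrapping around."""
    pivot = rng.random()
    rows = db.execute(
        "SELECT id_media FROM media WHERE rand_key >= ?"
        " ORDER BY rand_key LIMIT ?", (pivot, count)).fetchall()
    if len(rows) < count:
        # Not enough keys above the pivot: wrap around to the lowest keys.
        rows += db.execute(
            "SELECT id_media FROM media WHERE rand_key < ?"
            " ORDER BY rand_key LIMIT ?", (pivot, count - len(rows))).fetchall()
    return [r[0] for r in rows]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE media (id_media INTEGER PRIMARY KEY, rand_key REAL)")
db.execute("CREATE INDEX media_rand ON media (rand_key)")
db.executemany("INSERT INTO media VALUES (?, ?)",
               [(i, random.random()) for i in range(1, 51)])
picked = random_items(db, 5)
print(picked)
```

The trade-off is that neighbouring keys always come out together, so the selection is only pseudo-random, but each lookup can use the index.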
Quote
Quote
I'm not making advances when it comes to the select box, BTW... I'm still unsure where to start from!
You know you're going to end up designing your own in the end...
I'm hoping not.

I should be getting ready to analyze the code for each plugin, and determine whether I can 'merge' at least two of them to get the 'best of both worlds' (or more.)
The last selectbox I suggested has issues on my iPod, as I already said. The keyboard problem is gone for now (now it automatically disappears a second after clicking...), but for instance, if you have a multi-select box, every time you click something in Chosen, the list closes and you have to reopen it... A 'regular' selectbox object will actually stay open to let me select other options. Regular wins when it comes to accessibility...

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Privacy options
« Reply #9, on November 30th, 2011, 04:32 PM »
Quote
So we need to establish a list of privacy IDs and their corresponding meaning...
I'm starting with the basics:
1- everyone
2- members
3- just me (author & admins)
Personally I'd rather have 0 = everyone, 1 = me, 2 = members (since 0 -> nothing to limit, 1 -> I'm the only 1) but I'm not particularly fussed.
Quote
Now we'll need to update the Import tool to actually convert buddy lists to contact lists (and automatically create a default list for every user that has at least one buddy.) I don't know if it's best to do it from the importer tool, or from within Wedge if the table is empty etc... I'd say the importer.
The table structure seems straightforward enough to me. As for when to do it, it should be done from the importer.
Quote
Slower than its equivalent subselect with a secondary table?
Oh hell yes. FIND_IN_SET is bad for a reason. Like the fact you cannot under any circumstances make a usable index on it.
Quote
The problem with subselects, is that they often (always??) require a table scan to complete, even if done on the proper index...
This is something that doesn't happen with INNER JOINs.
You're right that it will cause a table scan, but only if the subquery is used IN () directly (this is not something I've done very often) but it's interesting to note that 3 years ago it was flagged as being solved in MySQL 6 as per http://bugs.mysql.com/bug.php?id=18826

But you need to be careful. INNER JOIN may be faster but you then have to process the results of a result-table that now has multiple rows, potentially many many rows you didn't want in the first place.

It's one of the reasons the board index query is fucked up, because it inner-joins the moderators table to the list of boards, so if you have 100 boards, each with 2 moderators, you get 200 rows back of which most of it is duplicated.
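
The row multiplication is easy to reproduce. A small sketch in Python with sqlite3, using made-up boards and moderators tables with 100 boards and 2 moderators each: the inner join returns 200 rows, while folding the moderators into one column keeps it at one row per board. The aggregated form is just one way to avoid the duplication, not a claim about how the board index query should be rewritten.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE boards (id_board INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE moderators (id_board INT, id_member INT)")
db.executemany("INSERT INTO boards VALUES (?, ?)",
               [(b, f"Board {b}") for b in range(1, 101)])
db.executemany("INSERT INTO moderators VALUES (?, ?)",
               [(b, m) for b in range(1, 101) for m in (1, 2)])

# Inner join: every board row is repeated once per moderator.
joined = db.execute(
    "SELECT b.id_board, b.name, m.id_member FROM boards AS b"
    " INNER JOIN moderators AS m ON m.id_board = b.id_board").fetchall()

# One row per board, moderators folded into a single column.
grouped = db.execute(
    "SELECT b.id_board, b.name,"
    " (SELECT group_concat(m.id_member) FROM moderators AS m"
    "  WHERE m.id_board = b.id_board)"
    " FROM boards AS b").fetchall()

print(len(joined), len(grouped))
```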
Quote
The only thing we can/may/should/will/shall/would/whatever store in the data field of the member table is the list of contact lists you have. $contact_lists = unserialize($member['data']['contacts']) or something. Would need to be done on every page load (to get the list of friend groups for thought privacy), and other uses (such as users viewing a profile etc) can be done through a quick sql query.
If the list's owner is stored in the table of contacts, why does it even need to be in the members table at all? Index the owner and you're golden.
Quote
So... You're suggesting no lists at all? Just plain contacts...?
I don't know about that. I think contact lists would have more pros than cons. And no one is forced to create multiple lists... It's just good to have them.
Personally I just don't see the point. I can't think that many people are going to create topics that are visible to only a subset of a subset of friends. Then again, I know it happens on LiveJournal which does make it a viable target for us (blogging context), I guess.
Quote
Heck... Either lists (id >= 10) or 'all contacts' is easily doable in a subselect. We either select id_members who are associated to our stored id_list (>= 10), or id_members who are associated with the id_member owner of the list. In which case it'd be best to store the list owner's id in the contacts table as well, to save time. Or just do a find_in_set on their buddy_list of course... But buddy lists are limited in size, unlike the contacts table.
FIND_IN_SET is the devil.
Quote
Also, as I said above, from my experience, subselects don't use indexes. If you could help here, because you're the mysql specialist out of us both, it'd be nice to be able to use subselects, if only because it'd make life a hell of a lot easier when using {query_see_topic} in the code I'll eventually import from the Noisen diff...
See above. They only don't if they're plugged into an IN () clause, not if they're other types of subselect.
Quote
My idea was to have such boards rely on Wedge...
Then again, it may never happen at all.
The only performance bottleneck I've been told about so far is the random list of items in the media homepage. That's because it retrieves all entries, randomizes them, and returns the first few. I still have an entry in my to-do list to add a pseudo-randomizer variable in each item...
Not what I meant. I didn't mean *forum*, I specifically meant *board* as in a board within a site.

Right now, access is controlled at board level. Once you enter a board, you don't have any incremental performance concerns about access rights. A board with 1 topic has the same overhead as a board with 1m topics in it as far as assessing access to that board goes.

If you have topic-specific granularity, you have to do more work to assess it, specifically there's an extra overhead on the board index, message index, display, attachments... anywhere that has to assess topic access, which is more complex than board access (because you have to implicitly do both, though you can do it so that board access is evaluated first and if that's not going to let them in, you can skip the topic access checks)

Consequently it must slow things down compared to a base SMF install, but you probably wouldn't notice it on Noisen until you got to boards that had many, many topics. For the one-off cases like attachments or the topics themselves, the incremental cost is not significantly higher; it's the cases where you're assessing a lot at once (like the board index or message index) that pay for it.
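
The 'board first, topic second' ordering can be sketched as plain Python, away from SQL. Everything here is assumed for illustration: the dictionary shapes, the privacy values (1 everyone, 2 members, 3 just me, 10 and up for contact lists) and the function name.

```python
EVERYONE, MEMBERS, JUST_ME, FIRST_LIST_ID = 1, 2, 3, 10

def can_see_topic(member, board, topic, list_members):
    """Check board access first; only then evaluate topic privacy."""
    if not member["groups"] & board["allowed_groups"]:
        return False  # board says no: the topic checks are skipped
    if member["is_admin"] or topic["id_starter"] == member["id"]:
        return True
    privacy = topic["privacy"]
    if privacy == EVERYONE:
        return True
    if privacy == MEMBERS:
        return member["id"] > 0  # guests have id 0 in this sketch
    if privacy >= FIRST_LIST_ID:
        return member["id"] in list_members.get(privacy, set())
    return False  # JUST_ME and anything unrecognised

board = {"allowed_groups": {1, 2}}
alice = {"id": 5, "groups": {2}, "is_admin": False}
guest = {"id": 0, "groups": {1}, "is_admin": False}
outsider = {"id": 6, "groups": {9}, "is_admin": False}
lists = {10: {5}}

print(can_see_topic(alice, board, {"id_starter": 1, "privacy": 10}, lists))
```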
Quote
I'm hoping not.
Maybe not, but none of the ones you've seen thus far are ideal, so you're going to end up Frankensteining two or more together, and then putting your own spin on it anyway...

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Privacy options
« Reply #10, on December 1st, 2011, 12:06 AM »
Quote from Arantor on November 30th, 2011, 04:32 PM
Quote
So we need to establish a list of privacy IDs and their corresponding meaning...
I'm starting with the basics:
1- everyone
2- members
3- just me (author & admins)
Personally I'd rather have 0 = everyone, 1 = me, 2 = members (since 0 -> nothing to limit, 1 -> I'm the only 1) but I'm not particularly fussed.
I set it to 1 because we might have a 'bigger' one -- meaning 'everyone and every means possible to have this spread'. i.e., if you have a Twitter or Facebook account set up to retransmit thoughts or posts, do it.

Other possible privacy values..?
"18 years old or older"
"13 years old or older"
Etc...
I'd say it's possibly better than having a 'mature' flag on posts.
Quote
The table structure seems straightforward enough to me.
I was thinking maybe add a 'privacy' field for contact_lists as well... Although granularity for this would be hell -- what if I want my contact list to be able to browse the list they're in? Or if I don't want to? Shall I be able to select multiple viewing groups...?
Quote
As for when to do it, it should be done from the importer.
Because it can just as easily be done from Wedge... Like, we test empty($modSettings['contacts_created']), and if empty, we do the creation. And we set the var.
Quote
Quote
Slower than its equivalent subselect with a secondary table?
Oh hell yes. FIND_IN_SET is bad for a reason. Like the fact you cannot under any circumstances make a usable index on it.
But as discussed later, IN (subselect) doesn't, either...
Quote
You're right that it will cause a table scan, but only if the subquery is used IN () directly (this is not something I've done very often) but it's interesting to note that 3 years ago it was flagged as being solved in MySQL 6 as per http://bugs.mysql.com/bug.php?id=18826
Uh. v6 is a long time from now...
Is this just in an IN () situation?
I mean, could it work with something like SELECT t.id_member WHERE (SELECT TRUE FROM wedge_other AS o WHERE t.id_member = o.id_other).....? (Just off the top of my head.)
Quote
But you need to be careful. INNER JOIN may be faster but you then have to process the results of a result-table that now has multiple rows, potentially many many rows you didn't want in the first place.
I'm not sure... If there's an IN (), we'll also be getting multiple entries just the same...?

Anyway -- query_see_topic does an inner join, and it's fucking ugly because it makes inserting the query_see_topic variable more complicated than, say, query_see_board. Having it use an IN() would make it much more elegant.
Quote
It's one of the reasons the board index query is fucked up, because it inner-joins the moderators table to the list of boards, so if you have 100 boards, each with 2 moderators, you get 200 rows back of which most of it is duplicated.
And that needs to change...
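The row multiplication described in the quote is easy to reproduce. The sketch below uses Python's sqlite3 module with made-up `boards` and `moderators` tables (the schema is an assumption, not the real SMF one) and compares the joined result with fetching the two tables separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boards (id_board INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE moderators (id_board INTEGER, id_member INTEGER,
                             PRIMARY KEY (id_board, id_member));
""")
# 100 boards, each with 2 moderators, as in the example above.
for b in range(1, 101):
    conn.execute("INSERT INTO boards VALUES (?, ?)", (b, f"Board {b}"))
    conn.executemany("INSERT INTO moderators VALUES (?, ?)", [(b, 1), (b, 2)])

# Joining repeats every board column once per moderator.
joined = conn.execute("""
    SELECT b.id_board, b.name, mods.id_member
    FROM boards AS b
    INNER JOIN moderators AS mods ON mods.id_board = b.id_board
""").fetchall()

# Alternative: one query per table, stitched together in application code.
boards_only = conn.execute("SELECT id_board, name FROM boards").fetchall()
mods_by_board = {}
for id_board, id_member in conn.execute("SELECT id_board, id_member FROM moderators"):
    mods_by_board.setdefault(id_board, []).append(id_member)

print(len(joined), len(boards_only))
```

Two small queries cost an extra round trip, but each board row is transferred once instead of once per moderator.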
Quote
If the list's owner is stored in the table of contacts, why does it even need to be in the members table at all? Index the owner and you're golden.
We can always modify all "FIND_IN_SET({int:me}, m.buddy_list)" calls to use "{int:me} IN (SELECT id_member FROM wedge_contacts WHERE (possibly id_member = {int:me} AND) id_owner = {int:target_user})", but it requires adding id_owner to the contact_lists table...
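As a rough illustration of why a separate contacts table is indexable while a comma-separated buddy_list is not, here is a sqlite3 sketch. SQLite has no FIND_IN_SET, so the list lookup is emulated with LIKE; the table layout, the `idx_contacts_member` index, and the sample data are all assumptions, and SQLite's planner is not MySQL's, so treat the plans as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (id_member INTEGER PRIMARY KEY, buddy_list TEXT);
    CREATE TABLE contacts (id_owner INTEGER, id_member INTEGER,
                           PRIMARY KEY (id_owner, id_member));
    CREATE INDEX idx_contacts_member ON contacts (id_member, id_owner);
    INSERT INTO members VALUES (1, '2,5,12'), (2, '1'), (5, '');
    INSERT INTO contacts VALUES (1, 2), (1, 5), (1, 12), (2, 1);
""")

ME = 5

# "Who has me in their list?" against the comma-separated column.
# The expression wraps the column, so no index can help.
csv_sql = """
    SELECT id_member FROM members
    WHERE ',' || buddy_list || ',' LIKE '%,' || ? || ',%'
"""

# Same question against the contacts table, which can use an index.
table_sql = "SELECT id_owner FROM contacts WHERE id_member = ?"


def run(sql):
    rows = sorted(r[0] for r in conn.execute(sql, (ME,)))
    plan = " / ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql, (ME,)))
    return rows, plan


csv_rows, csv_plan = run(csv_sql)
table_rows, table_plan = run(table_sql)
print(csv_rows, csv_plan)
print(table_rows, table_plan)
```

Both queries return the same owners; the difference is that the first has to read every member row, while the second can seek straight to the matching contacts.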
Quote
Personally I just don't see the point. I can't think that many people are going to create topics that are visible to only a subset of a subset of friends.
The ability to create a blog for your professional friends... And another for your drinking buddies. (Same goes for topics, although less important.)

If anyone is reading this -- please tell us whether you think that it would be nice to be able to create contact lists (i.e. friends, family, work...) and whether you'd use the feature to fine-tune your topic/board privacy settings, or you just wouldn't bother yourself?
Quote
Then again, I know it happens on LiveJournal which does make it a viable target for us (blogging context), I guess.
LJ is definitely not 'the' popular blog platform these days, but they still have got 'something'. They're also the ones who have rotating avatars, which I like... (Although not THAT much, eh.)
Quote
Consequently it must slow things down compared to a base SMF install,
(We could also offer to disable topic privacy settings...)

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Privacy options
« Reply #11, on December 1st, 2011, 12:26 AM »
Quote
Other possible privacy values..?
"18 years old or older"
"13 years old or older"
Etc...
I'd say it's possibly better than having a 'mature' flag on posts.
This seems like it's potentially *very* complicated and something I'm not entirely sure I want to get into, to be honest.
Quote
I was thinking maybe add a 'privacy' field for contact_lists as well... Although granularity for this would be hell -- what if I want my contact list to be able to browse the list they're in? Or if I don't want to? Shall I be able to select multiple viewing groups...?
This sounds to me like overthinking for SCIENCE!
Quote
Uh. v6 is a long time from now...
Is this just in an IN () situation?
I mean, could it work with something like SELECT t.id_member WHERE (SELECT TRUE FROM wedge_other AS o WHERE t.id_member = o.id_other).....? (Just off the top of my head.)
But the bug report was 3 *years* ago. I have no idea whether that's since been backported to 5.5 or 5.6 or 5.WTF any more. I think we need to try it sometime.

I believe it can be made to work like that but I'm not sure, I don't think in subselects very often.
Quote
I'm not sure... If there's an IN (), we'll also be getting multiple entries just the same...?

Anyway -- query_see_topic does an inner join, and it's fucking ugly because it makes inserting the query_see_topic variable more complicated than, say, query_see_board. Having it use an IN() would make it much more elegant.
It sort of depends, really. There are times it will, times it won't. It also depends on whether DISTINCT is present or not, which would solve the multi-row return case regardless of joins or in clauses.
Quote
The ability to create a blog for your professional friends... And another for your drinking buddies. (Same goes for topics, although less important.)
Hmmm. Part of me thinks that's a wonderful idea, part of me thinks it's unnecessarily complicated. I'm not entirely convinced that people would use it that way. However I know it works for LiveJournal to do something like that, so there is that little niggling bit of my gut that says we should.
Quote
LJ is definitely not 'the' popular blog platform these days, but they still have got 'something'. They're also the ones who have rotating avatars, which I like... (Although not THAT much, eh.)
They're the only popular hosted blog platform I know of that still functions with a community aspect to it. Sure, WP.org is popular for blogging as is Blogger, but they distinctly lack the community factor - I know several people with LJ accounts who regularly refer to each other and talk amongst each other. (To a degree this is how I came to understand how its privacy filters worked)
Quote
(We could also offer to disable topic privacy settings...)
In the end, even as long as there is one line of code for it, it is going to be slower than SMF. We can mitigate it (and given other things will be altered, it may even out in the end) but it must make a difference.


If anyone is reading this -- please tell us whether you think that it would be nice to be able to create contact lists (i.e. friends, family, work...) and whether you'd use the feature to fine-tune your topic/board privacy settings, or you just wouldn't bother yourself?

(New topic, perhaps?)

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Privacy options
« Reply #12, on December 1st, 2011, 10:10 AM »
Okay, I tested all of these:

SELECT * FROM _messages AS m WHERE id_msg IN (SELECT n.id_msg FROM _messages AS n WHERE n.id_msg < 100000);

SELECT * FROM _messages AS m WHERE (SELECT 1=1 FROM _messages AS n WHERE m.id_msg = n.id_msg AND n.id_msg < 100000);

SELECT * FROM _messages AS m INNER JOIN _messages AS n ON n.id_msg = m.id_msg AND n.id_msg < 100000;

They all returned 685 entries on Noisen, so that's the exact same request being done. NOTE, though, that because both tables are the same, MySQL may be applying specific optimizations that wouldn't take place if using two different tables. You may want to rewrite the queries with two different tables and tell me your results.

The first query does a full table scan and returns in X milliseconds. The second query does the exact same full table scan, and returns in X milliseconds (sometimes a tad more but they're all very variable). Finally, the last query does an index search, and returns in X*2 milliseconds. It's also the most 'stable' of all queries, but it's stable in that it's always slower than even the slowest return time for the first two queries...
As a reminder, the messages table on noisen has 30k+ entries. I could do a test on Cynarhum.com (220k+ entries), but I suspect I'll get similar results overall...
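For anyone wanting to repeat the comparison without a live forum database, here is a small harness using Python's sqlite3 module. It is only a sketch: the `messages` table and its data are made up, and SQLite plans subqueries differently from MySQL 5.x, so it shows that the three forms return the same rows and how to print their plans, not that the timings above will reproduce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id_msg INTEGER PRIMARY KEY, body TEXT)")
# Sparse IDs, to mimic a table with gaps from deleted posts.
conn.executemany("INSERT INTO messages VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(1, 2001, 3)])

LIMIT_ID = 1000
queries = {
    "in_subselect": """
        SELECT m.id_msg FROM messages AS m
        WHERE m.id_msg IN (SELECT n.id_msg FROM messages AS n WHERE n.id_msg < ?)""",
    "correlated_subselect": """
        SELECT m.id_msg FROM messages AS m
        WHERE (SELECT 1=1 FROM messages AS n
               WHERE m.id_msg = n.id_msg AND n.id_msg < ?)""",
    "inner_join": """
        SELECT m.id_msg FROM messages AS m
        INNER JOIN messages AS n ON n.id_msg = m.id_msg AND n.id_msg < ?""",
}

results = {}
for name, sql in queries.items():
    results[name] = sorted(r[0] for r in conn.execute(sql, (LIMIT_ID,)))
    # The fourth column of EXPLAIN QUERY PLAN holds the readable plan step.
    plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (LIMIT_ID,))]
    print(name, len(results[name]), plan)
```

On MySQL the equivalent step is prefixing each query with EXPLAIN and comparing the type, key, and rows columns.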

Talk about clusterfuck!
Quote from Arantor on December 1st, 2011, 12:26 AM
This seems like it's potentially *very* complicated and something I'm not entirely sure I want to get into, to be honest.
Well, LJ has it, Blogger has it, etc... (At least the 'mature' flag.) What I don't like is that they always ask you to confirm your age, even if you're logged in. Meh. There's a news blog on LJ about Kaamelott, which I read from time to time, it has no 'mature' content at all (it's just an Arthurian legend show after all...), but I still have to confirm my age every time.
Quote
This sounds to me like overthinking for SCIENCE!
But if we start thinking this way, then we might as well drop the concept of topic privacy entirely...?
Quote
But the bug report was 3 *years* ago. I have no idea whether that's since been backported to 5.5 or 5.6 or 5.WTF any more. I think we need to try it sometime.
Nope, not backported. From what I gather in the link you posted, subqueries are implemented in 5.x in a way that they can't use an index, and it works in 6.x not because they fixed that bug, but because they rewrote their subquery code.
Quote
I believe it can be made to work like that but I'm not sure, I don't think in subselects very often.
I didn't, until we dropped support for MySQL 4.0... And, turns out, I always hated doing inner joins...
Quote
It sort of depends, really. There are times it will, times it won't. It also depends on whether DISTINCT is present or not, which would solve the multi-row return case regardless of joins or in clauses.
My girlfriend suggested I use DISTINCT for these no later than last night. Funny ;) (Even funnier that our work fields sometimes intersect...)
I'm currently helping her set up a SOAP client in PHP for a WSDL app at Oracle. Another clusterfuck, BTW... We have no idea what functions we're supposed to call (and AFAIK there's no way to request a list of available methods once the custom object is created), how we're supposed to identify, etc... Thank you Oracle for zero documentation. Plus it doesn't help that neither she nor I have any prior experience with SOAP... :-/
I don't know why I'm mentioning that... Maybe there's a SOAP specialist around here :P
Quote
Quote
The ability to create a blog for your professional friends... And another for your drinking buddies. (Same goes for topics, although less important.)
Hmmm. Part of me thinks that's a wonderful idea,
Well, that's already what I'm doing at Noisen and I know that some of my most faithful users have been using it... I can quote at least one blog that is reserved to the Friends list of a member.
Quote
They're the only popular hosted blog platform I know of that still functions with a community aspect to it.
Ah yes, you just reminded me why I'm looking up to them... LJ is the only blogging platform that actually encourages communication between blog authors, rather than having parallel blogs with their own comment authors and such. It's like LJ is a huge forum with boards set up as blogs, just like on Noisen. Back when I created Noisen (2007-2008), LJ was the only similar example, and I actually used them as an example of *why* it would eventually work. Noisen was pretty much supposed to be the 'French LJ'... Too bad it never really got momentum. I'm not very good at advertising my work. I prefer development work.
Quote
In the end, even as long as there is one line of code for it, it is going to be slower than SMF.
Well, if topic privacy is disabled, we're just going to empty out the query_see_topic variable, so it won't be any slower...?
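One way to picture a query_see_topic fragment that costs nothing when privacy is off is sketched below. The function name, the `privacy` column, and the 1/2/3 IDs are assumptions for illustration, and the fragment collapses to "1=1" rather than an empty string so it stays valid after an AND or WHERE; the real Wedge variable may be built quite differently.

```python
import sqlite3

# Hypothetical privacy IDs, following the 1/2/3 numbering from the thread.
EVERYONE, MEMBERS, AUTHOR_ONLY = 1, 2, 3


def build_query_see_topic(id_member, is_admin=False, privacy_enabled=True):
    """Return a SQL condition on the topics table aliased as t."""
    if not privacy_enabled or is_admin:
        return "1=1"  # nothing to filter
    if id_member == 0:
        return f"t.privacy = {EVERYONE}"  # guests only see public topics
    return (f"(t.privacy IN ({EVERYONE}, {MEMBERS})"
            f" OR t.id_member_started = {int(id_member)})")


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (id_topic INTEGER PRIMARY KEY,
                         id_member_started INTEGER, privacy INTEGER);
    INSERT INTO topics VALUES (1, 9, 1), (2, 9, 2), (3, 9, 3), (4, 7, 3);
""")


def visible_topics(**kwargs):
    where = build_query_see_topic(**kwargs)
    sql = f"SELECT t.id_topic FROM topics AS t WHERE {where} ORDER BY t.id_topic"
    return [row[0] for row in conn.execute(sql)]


print(visible_topics(id_member=7))
print(visible_topics(id_member=7, privacy_enabled=False))
```

With the feature disabled, the condition is a constant the database can discard, so the query is effectively the same one SMF would have run.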
Quote
We can mitigate it (and given other things will be altered, it may even out in the end) but it must make a difference.
On bigger boards, at least.
Because that's the thing here... When we work with normal-sized boards, there's no such thing as a slow query. Start adding bot posters or scraping or simply have a hugely successful site, and you get your first performance issues.
Quote
If anyone is reading this -- please tell us whether you think that it would be nice to be able to create contact lists (i.e. friends, family, work...) and whether you'd use the feature to fine-tune your topic/board privacy settings, or you just wouldn't bother yourself?

(New topic, perhaps?)
Done...

Arantor

  • As powerful as possible, as complex as necessary.
  • Posts: 14,278
Re: Privacy options
« Reply #13, on December 1st, 2011, 10:33 AM »
Quote
The first query does a full table scan and returns in X milliseconds. The second query does the exact same full table scan, and returns in X milliseconds (sometimes a tad more but they're all very variable). Finally, the last query does an index search, and returns in X*2 milliseconds. It's also the most 'stable' of all queries, but it's stable in that it's always slower than even the slowest return time for the first two queries...
I'd love to see the results of EXPLAINs on those queries.
Quote
But if we start thinking this way, then we might as well drop the concept of topic privacy entirely...?
It's a tough call. How much is too much?
Quote
Nope, not backported. From what I gather in the link you posted, subqueries are implemented in 5.x in a way that they can't use an index, and it works in 6.x not because they fixed that bug, but because they rewrote their subquery code.
I thought 5.5 was actually a bastardisation of the 6.x branch anyway.
Quote
I didn't, until we dropped support for MySQL 4.0... And, turns out, I always hated doing inner joins...
Eh, I still don't.
Quote
My girlfriend suggested I use DISTINCT for these no later than last night.
Heh, can't say I'm surprised.
Quote
I'm currently helping her set up a SOAP client in PHP for a WSDL app at Oracle. Another clusterfuck, BTW... We have no idea what functions we're supposed to call (and AFAIK there's no way to request a list of available methods once the custom object is created), how we're supposed to identify, etc... Thank you Oracle for zero documentation. Plus it doesn't help that neither she nor I have any prior experience with SOAP...
I don't know why I'm mentioning that... Maybe there's a SOAP specialist around here
SOAP is... evil, and I'm not just saying that as a smelly hippie code hacker :P But if it's involving WSDL... WSDL is a language that indicates what services exist at a given URL, and what inputs are expected and what outputs will be given. It's sort of like XML-RPC but more convoluted IMO. (Yes, I've done SOAP work. It's not that exciting, but it should be manageable. It really depends whether you're doing it all by hand in PHP or using something like the Zend_Soap components.)
Quote
Ah yes, you just reminded me why I'm looking up to them... LJ is the only blogging platform that actually encourages communication between blog authors, rather than having parallel blogs with their own comment authors and such. It's like LJ is a huge forum with boards set up as blogs, just like on Noisen. Back when I created Noisen (2007-2008), LJ was the only similar example, and I actually used them as an example of *why* it would eventually work. Noisen was pretty much supposed to be the 'French LJ'... Too bad it never really got momentum. I'm not very good at advertising my work. I prefer development work.
*nods* And given that LJ is really not working out so well at the moment (it's not been the same since it was sold off a bit back), maybe we should be pushing that harder.
Quote
Because that's the thing here... When we work with normal-sized boards, there's no such thing as a slow query. Start adding bot posters or scraping or simply have a hugely successful site, and you get your first performance issues.
Yup, but we can still try and optimise as best possible for those cases.

Nao

  • Dadman with a boy
  • Posts: 16,082
Re: Privacy options
« Reply #14, on December 1st, 2011, 01:25 PM »
Quote from Arantor on December 1st, 2011, 10:33 AM
I'd love to see the results of EXPLAINs on those queries.
You have the queries, you can do it yourself... :P
I should have copied the explains but they were nothing special -- just had the number of processed rows over 30k in the first two queries, and at 500+ in the last.
Quote
Quote
But if we start thinking this way, then we might as well drop the concept of topic privacy entirely...?
It's a tough call. How much is too much?
I don't know... I just know that, to me, Noisen's main point was to give a larger amount of freedom to blog authors.
Basically, if you tell bloggers that they have to create a new blog for their private friends, they may say "okay", or they might just turn away and leave. I don't know, I just implemented stuff that people asked of me back in the day... And it seemed quite logical to do it.
The only issue with topic privacy, to me, is not really about performance, but mostly about not ever forgetting to add query_see_topic in every single topic query I'll have. That was a PITA, but I'm ready to do it again... (Especially since I now have the diff for that... And I've already decided that I'll mark every {db_prefix}topics as {db_prefix}topics_done or something so that once everything from the diff is implemented, I can simply look for {db_prefix} to find any remaining offenders...)
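A variant of that "find the remaining offenders" idea can be scripted. The sketch below is an assumption-laden illustration: instead of renaming tables, it flags any query text that mentions {db_prefix}topics without also containing {query_see_topic}. The sample queries and labels are invented, and a real scan would have to extract query strings from the PHP source first.

```python
import re

# Matches {db_prefix}topics but not {db_prefix}topics_done or similar suffixes,
# because "_" counts as a word character and so no \b boundary follows "topics".
TOPICS_TABLE = re.compile(r"\{db_prefix\}topics\b")


def find_offenders(queries):
    """Return labels of queries that touch the topics table unfiltered."""
    return [label for label, sql in queries.items()
            if TOPICS_TABLE.search(sql) and "{query_see_topic}" not in sql]


# Hypothetical query texts, keyed by where they would live.
sample = {
    "MessageIndex": "SELECT t.id_topic FROM {db_prefix}topics AS t "
                    "WHERE t.id_board = {int:board} AND {query_see_topic}",
    "Recent": "SELECT t.id_topic FROM {db_prefix}topics AS t "
              "WHERE t.id_last_msg > {int:min_msg}",
    "Who": "SELECT id_member FROM {db_prefix}members",
    "Converted": "SELECT t.id_topic FROM {db_prefix}topics_done AS t",
}

print(find_offenders(sample))
```

This catches forgotten filters but not queries where the filter is present and wrong, so it complements the renaming trick rather than replacing a review.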
Quote
I thought 5.5 was actually a bastardisation of the 6.x branch anyway.
I don't know anything about 5.5. And it's not like it's widely used, either...
Quote
SOAP is... evil, and I'm not just saying that as a smelly hippie code hacker :P But if it's involving WSDL... WSDL is a language that indicates what services exist at a given URL, and what inputs are expected and what outputs will be given. It's sort of like XML-RPC but more convoluted IMO. (Yes, I've done SOAP work. It's not that exciting, but it should be manageable. It really depends whether you're doing it all by hand in PHP or using something like the Zend_Soap components.)
Hmm... We tried with SoapClient(), so the basic PHP stuff, and also with nusoap_client(), a downloadable library written in PHP. Both failed to work, though. Then we tried with a sample wsdl file on another site, and it worked, so it's probably down to the authentication process or something, because the Soap client object definitely didn't have the method we were trying to call. The name of the method was retrieved from within the wsdl file, it ended with "Response" in the reply, so that's what Milady figured it was. (The method to call.)
I don't know, if you're up for it and if you think you can help, would you be willing to help her by e-mail...? We already spent two evenings on the little fucker... (The first one was devoted to installing a virtual host on her PC, which didn't work until I realized the error code was 403 and I'd actually forgotten to add +Indexes to the httpd conf file... Oops. But she was like, "wow", when I fixed it. I liked that. :P)
Quote
*nods* And given that LJ is really not working out so well at the moment (it's not been the same since it was sold off a bit back), maybe we should be pushing that harder.
Possibly, yes.
Quote
Yup, but we can still try and optimise as best possible for those cases.
It's certainly something I'd like. But my knowledge of SQL optimizations is nothing compared to yours.
BTW, did you ever check out the SQL queries for the thought system? I'm always afraid of forgetting something in them... Most of the stuff is in Ajax.php where I nearly committed something that would have allowed anyone to modify anyone else's thoughts just by editing their HTML source... Oops. That was scary.