This was an idea I had a couple of days ago. It immediately comes off as an overloaded mishmash of stereotypical Web 2.0 technologies. But, I searched around and couldn't find any previous implementations. Therefore, I am publishing a brief overview and then a proof of concept plan. Let me know what you guys think.
I am presuming you're familiar with social networking services à la Facebook and MySpace. One way of conceptualizing these is as trustworthy sources of individual profiles. That is, every person on the website can be represented as three pieces of information:
- A unique identifier.
- A self-reported personality.
- A sequence of reciprocal relationships.
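As a rough sketch, those three pieces of information could be modelled like this. The names `Profile`, `user_id`, `self_description`, `friends` and `befriend` are my own placeholders and do not correspond to any real Facebook or MySpace API.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    user_id: str                 # the unique identifier
    self_description: str        # the self-reported personality, as free text
    friends: set[str] = field(default_factory=set)  # ids of reciprocal relations


def befriend(a: Profile, b: Profile) -> None:
    """Record a reciprocal relationship: each profile lists the other."""
    a.friends.add(b.user_id)
    b.friends.add(a.user_id)
```

The relationship is stored on both sides, which is what makes it reciprocal rather than a one-way "follow".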
I also presume you're familiar with folksonomies à la "tagging" on Flickr and del.icio.us. These websites provide the feature because their content is hard for a computer to categorize. That is, there is a problem of effective tokenization. However, there is a well-known example of what can result when content is easier to parse.
I presume you're familiar with e-mail spam and Bayesian spam filtering. The technique became popular after Paul Graham published his essay "A Plan for Spam". It has since spread to most popular e-mail clients as a method to assist people in categorizing and thus avoiding spam. This technique works because text is trivially tokenized. In fact, raw text itself is the canonical example of a token!
Social networking profiles are trivial to categorize in exactly the same way as e-mail. But, categorizing in the boolean sense of spam is not useful. (Fake MySpace profiles excluded.) Instead, profiles can be assigned "tags" and automatically categorized using Bayesian inference. These tags are free-form, and therefore their semantic accuracy will vary depending on their statistical relevance to a profile. Here are some potential tags suggested by my grasp of basic human psychology:
- nice, mean
- fun, boring
- hard working, lazy
- slut, rapist
It is an open question whether a computer will be able to accurately predict these tags on uncategorized profiles.
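To make the tagging idea concrete, here is a minimal sketch of a per-tag naive Bayes classifier over profile text. It is an illustration under my own assumptions, not a tested design: `tokenize`, `TagModel`, `train` and `probability` are hypothetical names, the tokenizer is a crude word splitter, and the smoothing is plain add-one. Each tag is treated as its own yes/no question, much like spam versus not spam.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the profile text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TagModel:
    """For each tag, token counts from profiles where it does / does not apply."""

    def __init__(self):
        self.counts = defaultdict(lambda: {True: Counter(), False: Counter()})
        self.docs = defaultdict(lambda: {True: 0, False: 0})

    def train(self, text, tag, applies):
        """Record one profile's text as a positive or negative example of `tag`."""
        self.counts[tag][applies].update(tokenize(text))
        self.docs[tag][applies] += 1

    def probability(self, text, tag):
        """Estimate P(tag applies | text) using add-one smoothing."""
        counts, docs = self.counts[tag], self.docs[tag]
        vocab = set(counts[True]) | set(counts[False])
        # Start from the smoothed prior odds of the tag applying at all.
        log_odds = math.log((docs[True] + 1) / (docs[False] + 1))
        for token in tokenize(text):
            for label, sign in ((True, 1), (False, -1)):
                total = sum(counts[label].values())
                p = (counts[label][token] + 1) / (total + len(vocab) + 1)
                log_odds += sign * math.log(p)
        # Clamp so math.exp cannot overflow on very long texts.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))
```

Usage would look something like `model.train("loves parties and dancing", "fun", True)` followed by `model.probability("parties every weekend", "fun")`. A profile nobody has trained on yet comes out at 0.5, i.e. no opinion either way.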
The data your computer uses to categorize can be made anonymous. If shared, it can be used by others to categorize profiles according to your opinions; but, it is very improbable that someone could determine how you tagged other profiles. Therefore, you can share your categorizations freely without the risk of people learning how you tagged them. Those same people could trust your opinions and integrate your categorizations into their own.
This can occur automatically between reciprocal relations. Therefore, as your friends and you categorize people, the statistically relevant categorizations will rapidly spread. Furthermore, upon viewing a new profile, the computer could accurately and quickly categorize it.
It is an open question as to what trust metrics, as applied to categorization data acceptance, will encourage the spread of quality folksonomies.
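One very simple possibility, purely as an assumption on my part, is to scale a friend's token counts by a trust weight before folding them into your own. The function name `merge_counts` and the `trust` parameter are hypothetical. Note that only aggregate token counts for a tag change hands, not the list of profiles that were tagged.

```python
from collections import Counter


def merge_counts(mine, theirs, trust=0.5):
    """Combine my token counts for a tag with a friend's, scaled by `trust`.

    `trust` is a made-up knob: 0 ignores the friend, 1 treats their
    opinions as equal to mine.
    """
    merged = Counter(mine)  # copy, so my own data is left untouched
    for token, n in theirs.items():
        merged[token] += trust * n
    return merged
```

For example, merging `Counter({"parties": 2})` with a friend's `Counter({"parties": 4, "chess": 2})` at `trust=0.5` gives a weighted count of 4.0 for "parties" and 1.0 for "chess". Whether a fixed weight like this actually encourages quality folksonomies is exactly the open question.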
A proof of concept
I have tried to remain non-explicit about the ramifications of this idea. But, I think the potential is too thought-provoking to leave the idea to my "Interesting Projects" file. Therefore, I will propose to my AI professor that I be allowed to replace my Prolog assignment with a preliminary implementation of the idea. A Greasemonkey script could be produced for Firefox/Mozilla-based web browsers. It would add a simple tagging interface to Facebook.
In order to produce meaningful research, I'd need help from my friends. You would need to install the script and start tagging your relations. Does anyone want to help end the short era of social networking? (Please respond on this post, and not in IM or e-mail. If you're reading this through Facebook, click "View original post.")
Bayesian categorization isn't new. But, has it been implemented in a social mode?