Google Privacy Issues
I was reading this Forbes article and it made me curious about customer habit data mining: the process of trying to identify what sort of customer a person is from their buying, reading, and viewing habits, specifically to determine which products, services, or information would be best to offer them. Amazon does this with its Bayesian-based book suggestions, which estimate the item you are most likely to purchase next from what you have already purchased and viewed, combined with what everyone else using Amazon has done. TiVo does the same thing. Google even does it for its AdWords, by maximizing the advertising market value of a keyword against the page it is on.
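To make the "probability of next item" idea concrete, here is a minimal sketch of a naive Bayes style ranking over a toy set of purchase histories. It is only an illustration under my own assumptions, not Amazon's actual method: the function name `suggest_next`, the smoothing parameter `alpha`, and the shape of the input data are all hypothetical.

```python
from collections import Counter
from math import log


def suggest_next(histories, basket, alpha=1.0):
    """Rank items not in `basket`, most likely next purchase first.

    histories: list of item lists, one per customer (hypothetical toy data).
    basket:    items the current customer has already bought or viewed.
    alpha:     Laplace smoothing so unseen pairs do not score log(0).
    """
    n = len(histories)
    # How many customers have each item at all.
    item_count = Counter(item for h in histories for item in set(h))
    # How many customers have both items of an ordered pair.
    pair_count = Counter()
    for h in histories:
        unique = set(h)
        for a in unique:
            for b in unique:
                if a != b:
                    pair_count[(a, b)] += 1

    scores = {}
    for cand in item_count:
        if cand in basket:
            continue
        # log P(cand): how popular the candidate is overall.
        score = log((item_count[cand] + alpha) / (n + 2 * alpha))
        for owned in basket:
            # log P(owned | cand): how often buyers of cand also have owned.
            score += log(
                (pair_count[(cand, owned)] + alpha)
                / (item_count[cand] + 2 * alpha)
            )
        scores[cand] = score
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `suggest_next(histories, ["neuromancer"])` returns every other item seen in `histories`, ordered by score. The "naive" part is the assumption that each item in your basket is independent evidence, which real buying habits certainly violate, but it is enough to show the shape of the calculation.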
The concept of Google doing this type of targeted marketing across all of its services seems to terrify people, however. Cries of 1984, Big Brother, and the ensuing multitude of corporate dystopian futures ring throughout the air. As well they should. That data is our informational identity in the modern world. A ghost psyche, if you will. It echoes the dangers inherent in people knowing your true name in cultures that believed in certain types of magic, in voodoo dolls, and in photographs that steal a person’s soul. A modern cause for an ancient superstition. Once acquired by someone, that data could be bought, sold, traded, and used to impersonate, advertise to, or convict you, along with a whole host of other worries and fears, both rational and superstitious.
Yet at the same time, many of the potential services that become available when something else is able to predict your next action are useful. People around the world depend on each other for many of these predictions. From common interests in books and topics for study to potential friends and even significant others, people use these predictions from others all of the time. Really it’s a question of trust, of intended use, and of how much privacy you personally want.
I’m really rather curious how much information can be data mined if you only have temporary sessions of personal information. Imagine if your time online were split into blocks of habits, maybe one hour each, and each block was then stripped of identifying marks like IP addresses, logins, passwords, other sites’ cookies, and the like. I wonder how much would still be possible to extract, and how quickly you could classify a person in order to make meaningful suggestions while still obeying their wish for privacy. Imagine if every person online using Google had a temporary cookie each hour assisting their journey across the web. Imagine if people could set how long Google could remember them, to scale how identifiable they were against how much assistance they wanted. It would certainly be an interesting research topic for clustering and classification. Could you identify what type of person a user was in 20 minutes?
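Here is a rough sketch of what that temporary cookie might look like, just to pin the idea down. Everything in it is an assumption of mine: the class name `EphemeralSessions`, the list of fields treated as identifying, and the event format are all hypothetical. The internal dictionary only simulates each browser holding its own cookie; in a real design the server would not need to keep the user key at all.

```python
import secrets

# Fields stripped from every event (a hypothetical list, for illustration).
IDENTIFYING_FIELDS = {"ip", "login", "password", "cookies"}


class EphemeralSessions:
    """Hand out a random session token that rotates every `lifetime_seconds`."""

    def __init__(self, lifetime_seconds=3600):
        # The user-chosen "how long may you remember me" setting.
        self.lifetime = lifetime_seconds
        # Simulates each browser's cookie jar: user key -> (window, token).
        self._tokens = {}

    def anonymize(self, user_key, timestamp, event):
        """Return a copy of `event` without identifying fields, tagged
        with the temporary session token for this time window."""
        window = int(timestamp // self.lifetime)
        current = self._tokens.get(user_key)
        if current is None or current[0] != window:
            # New time window: issue a fresh token with no link to the old one.
            current = (window, secrets.token_hex(8))
            self._tokens[user_key] = current
        cleaned = {k: v for k, v in event.items() if k not in IDENTIFYING_FIELDS}
        cleaned["session"] = current[1]
        return cleaned
```

Events from the same person within one window share a token, so they can still be clustered together, while events an hour apart cannot be joined by token alone. Whether the habits themselves are distinctive enough to re-link the sessions is exactly the open question.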
Anyhow, it’s late, and my own personal prediction about myself is that I will soon fall asleep at the keyboard, assuming it isn’t apparent that I have already done so from the more disorganized sections of this essay.
