Miasma in the House of Bite Me

Sunday, July 21, 2002

This goes into my file as the story that most cracks me up for the day. ROFLOL. Add numbers to bullshit numbers, and you expect NOT to get MORE BULLSHIT? What a hoot. Why not try something logical, like... not cutting off your nose to spite your face in the FIRST PLACE?!

I'm actually serious. This is sort of like Fox News making EVERY breaking story BREAKING NEWS. You get an immediate ratings spike, but eventually, the audience will realize that half of the shit you are calling "Breaking News" is another shark bite.

If you poison your own well, eventually you will have to drink poison. What is so hard to understand about this simple concept? Amazing how hard it is to get it through to American business, whether the issue is environmental, capitalist (killing off one's own customers could be bad for future business), suckering viewers into false adrenalin for "breaking news," or, as we see here, intruding on privacy with such impunity that people start to lie and distrust everyone--and then WONDERING why people lie?

I've been lying on these things religiously for more than 15 years. Given the methods of statistical sampling, where each data item stands for vast numbers of data items, I hope I've managed to fuck up SHITLOADS of data. But I can look ahead to the future with optimism, because there will undoubtedly be lots more data to still fuck up.

New York Times (free registration required): With False Numbers, Data Crunchers Try to Mine the Truth.
People give false answers to protect their privacy. Then, because the data is so unreliable, companies can't use it to help them run their businesses.

Two I.B.M. researchers have devised software that seeks to get around this information age impasse. Rakesh Agrawal and Ramakrishnan Srikant, computer scientists at the I.B.M. Almaden Research Center in San Jose, Calif., have devised a data-mining program that would cloak individual truthful answers that people might enter once their trust was won but still recover important characteristics of the overall group.

[ ... ]

"Right now, the rate of falsification on Web surveys is extremely high," Dr. Cavoukian said. Conservative estimates are 42 percent, but anecdotally the rates are far higher, she added. "People are lying," she said, "and vendors don't know what is false and accurate, so the information is useless."

Dr. Agrawal said that his way of reconstructing data was based on hiding the true numbers, although not through the sort of lying practiced by ordinary people confronting a questionnaire.

"When people lie randomly -- and that is what they do now when they answer questions -- we get very poor results," he said. But by "adding random values to true values," he said, "we can reconstruct a distribution that is very close to the actual one."
[Privacy Digest]
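
For the curious, here is roughly what "adding random values to true values" can buy you. This is a minimal Python sketch, not the I.B.M. program: it swaps in a much cruder trick (recovering only the group's mean and spread) for whatever reconstruction the researchers actually use. The ages, the noise level, and the function names are all made up for illustration.

```python
import random
import statistics


def perturb(true_values, noise_sd, rng):
    """Hide each true answer by adding zero-mean Gaussian noise to it."""
    return [v + rng.gauss(0.0, noise_sd) for v in true_values]


def estimate_group_stats(perturbed, noise_sd):
    """Estimate the group's mean and standard deviation from noisy answers.

    Zero-mean noise leaves the average alone, and independent noise adds
    its variance to the data's variance, so we subtract it back out.
    """
    mean = statistics.fmean(perturbed)
    variance = max(statistics.pvariance(perturbed) - noise_sd ** 2, 0.0)
    return mean, variance ** 0.5


rng = random.Random(0)                                # fixed seed, repeatable
ages = [rng.uniform(18, 80) for _ in range(50_000)]   # hypothetical true answers
noise_sd = 25.0                                       # assumed noise level
noisy = perturb(ages, noise_sd, rng)                  # what the collector sees

est_mean, est_sd = estimate_group_stats(noisy, noise_sd)
print(f"true mean {statistics.fmean(ages):.1f}, estimated {est_mean:.1f}")
print(f"true sd   {statistics.pstdev(ages):.1f}, estimated {est_sd:.1f}")
```

Any single noisy answer can be off by decades, but the group numbers come out close, because the collector knows exactly what kind of noise went in. Nobody knows what kind of noise I put in.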

Bwah-ha-ha-ha-ha! So much data, so little time.

Miasma
4:34:04 PM