Date: Wed, 17 Aug 1994 18:43:56 -0700
From: Phil Agre <pagre@ucsd.edu>
To: risks@csl.sri.com
Subject: data mining

The 8/16/94 Wall Street Journal includes an article about the use of "data
mining" by companies with large amounts of customer transaction data.  The
full reference is:

  Laurie Hays, Using computers to divine who might buy a gas grill, Wall
  Street Journal, 16 August 1994, pages B1, B4.

The prototype of this kind of processing is American Express, which is
reported to have half a terabyte of information about its customers' 
charges on its 35 million cards.  The idea is to find patterns that predict
what product categories individual purchasers might be interested in, and 
then send them advertisements accordingly.  This kind of processing is ideal
for massively parallel computers, and many makers of such machines are doing
well.  (Thinking Machines Corporation, unfortunately, is not one of them.  
The same issue of WSJ reports on its imminent bankruptcy filing.  This is
ironic, given that American Express was one of its first important customers
outside the military-government-research world.)  From the numbers in the
article, it sounds like the number of companies using these techniques is in
the hundreds and growing rapidly.

What I find most interesting here is the set of cultural assumptions that
goes into the data searches.  For example, people who recently bought outdoor
patio furniture are pitched gas grills.  A bank analyzed customers' financial
situations and pitched home equity loans to those who were likely to pay them
back.  In each case, a certain amount of social reasoning has gone into the
algorithms, based on assumptions about normal or average behavior and the
connections in people's lives between different categories of commodities.
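
To make the point concrete, here is a minimal sketch in Python of the kind of
rule the article describes.  Everything in it is a hypothetical illustration:
the rule table, the function name, and the sample data are assumptions, not
anything the article or any company discloses.  What it shows is that the
"social reasoning" lives in a hand-written table of assumed connections
between product categories.

```python
from collections import defaultdict

# Hypothetical hand-written rule table: category bought -> category to pitch.
# The cultural assumption ("patio owners grill outdoors") is encoded here.
PITCH_RULES = {
    "patio furniture": "gas grill",
}

def pitches_for(transactions):
    """Map each customer to the set of product categories to advertise.

    transactions: iterable of (customer_id, category) pairs.
    A customer is never pitched a category they have already bought.
    """
    bought = defaultdict(set)
    for customer, category in transactions:
        bought[customer].add(category)

    pitches = {}
    for customer, categories in bought.items():
        targets = {PITCH_RULES[c] for c in categories if c in PITCH_RULES}
        targets -= categories  # skip what the customer already owns
        if targets:
            pitches[customer] = targets
    return pitches

# Made-up sample data, for illustration only.
sample = [
    ("alice", "patio furniture"),
    ("bob", "patio furniture"),
    ("bob", "gas grill"),
    ("carol", "groceries"),
]
print(pitches_for(sample))
```

In this sketch only "alice" is pitched a gas grill: "bob" already has one, and
"carol" matches no rule.  A real system would presumably derive such rules
from the data rather than from a fixed table, but the assumptions about
normal behavior would still be built in somewhere.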

All of the examples in the story involve companies mining information on their
own customers, though perhaps setting up partnerships with other companies to
fashion offers to them.  No mention is made of mined information being sold.
Maybe that's because people would be too offended to hear of such information
being sold, or maybe it's because it's advisable to keep the practice quiet.
In any case it's hard to believe there's no market for it.

The data-mining techniques are used for other purposes as well, including
fraud detection and inferring bad batches of parts from patterns of warranty
service.  (That one is my favorite.)

Some employees are reported to be resisting the new technology, though the
one case the article reports has to do not specifically with data mining but
with high-tech logistics in mass retailing.  It seems reasonable to guess
that the resistance comes from deskilling: if restocking decisions are taken
away from local store managers (and this is definitely the trend) then,
other things being equal, the skill levels and thus the salaries of those
store managers tend to go down.

The massive aggregation of data in things like marketing databases and some
proposed intelligent vehicle-highway systems leads to new questions about
privacy.  If any given fact about me is innocuous, or at least innocuous
enough that I'm willing to let a company store it in exchange for credit or
automatic toll collection, etc., it doesn't follow that ten thousand facts about
me are equally innocuous.  How to even conceptualize, much less regulate, this
phenomenon of emergent personal information?

In many cases, the problem is actually harder.  If you've just bought patio
furniture then you might want a gas grill: this surmise only requires a few
facts about you.  But heretofore those few facts had a sort of anonymity; they
were lost in the enormous crowd of other, equally arbitrary facts.  With
massively parallel processing, however, it becomes possible for numerous
individual facts to take on individual consequences.  It's one thing if
your charge slip
for patio furniture (or plastic surgery, or condoms) gets processed and filed
in a huge pile with a million others, but it's quite another thing if someone
pulls out your charge slip and says "hey, this person might want to hear from
a dating service!" or whatever.  Can we even say that someone's privacy has
been invaded here?  We need new words for these things, rather than stretching
the word "privacy" to cover all information-related offenses to human dignity
under the sun.

Phil Agre, UCSD
