Monday, April 7, 2008

Kansas, Memphis, data: value and liability

I ended up watching the Kansas-Memphis game. It was the first college basketball game I've watched this year. Nevertheless, I made many predictions, most of which were wrong.

The one prediction I was most confident about was when Kansas made the 3-point shot in the last seconds, tying the game. At that point, it was absolutely clear to me that they would win the game. I had no data, no context, no history, but it didn't matter. All I had to do was look at the faces of the teams, and it was so clear who would perform better in the next 5 minutes. I didn't need historical performance data of any kind.

Companies have so much customer data these days. This data seems valuable, and worth storing, even though its value isn't immediately apparent. But I wonder if we're forgetting that data in isolation might not be that valuable at the margin any more, and further, that firms need to understand how to associate a data trail with a conversation and a person before making a business decision. And if they don't, while they might anticipate value from mining the data in the future, perhaps they shouldn't be keeping the data, because it could end up being a liability to them. As Professor Vasant Dhar and I have discussed and written about in the past, firms may well need to rethink their "data valuation" models and strategies.

Just one dimension of a much larger discussion about how firms should manage their customer data.

5 comments:

Raj said...

Speaking of data as a liability, I was following the case of TJ MAXX, who lost 45.7 million credit card records last year through a massive hack attack. According to this article, merchants aren't supposed to store cardholder data after the transaction has been cleared. It seems TJX data goes back to 2003.

http://www.informationweek.com/news/showArticle.jhtml?articleID=197003041

Another interesting article from ft.com today. It reports an EU privacy group's findings that online search companies' safeguards are not sufficient to protect users' personal data.

http://www.ft.com/cms/s/0/31663062-04fc-11dd-a2f0-000077b07658.html

Anonymous said...

The question really isn't about whether companies should or shouldn't have data - they HAVE to have data to run their business better - it's a question of having the RIGHT data.

The example of predicting the Kansas-Memphis game is intriguing, but while historical data was not available at the time of the prediction, there was 'real time' data on hand - namely, (1) Kansas had just made a near-impossible 3-point shot, and (2) the game was now tied. This 'real time' data helped you make a more accurate prediction of the game than any amount of historical data could have otherwise.

This is the crux of the problem that companies need to address. Collecting exponentially growing amounts of data on every customer characteristic in the hope of uncovering 'clarity' from these endless bytes isn't the solution. Instead, companies need a sound "data strategy": an understanding of what data is NEEDED to run the business, what data is CRITICAL to making sound strategic decisions, and what data can DELIVER value for the business.

The disconnect usually occurs because the teams collecting the data are not those requesting it. A consistent "data strategy" across the organization can help to resolve the disconnect, and reduce the risk inherent in collecting customer and transaction data.

Panos Ipeirotis said...

Interesting question. One point of view is to delete anything that is not absolutely necessary to avoid exposure to unnecessary risk. The counterpoint is that storage is cheap, so keep everything.

Having my own biases, I cannot agree that deleting any data can be beneficial. It makes more sense to have a policy of data archival, a process that moves the data offline, potentially encrypted, and keeps the data around, "just in case" the data can be used to build better models.

Anonymous said...

The problem with selecting the right data is that what you consider unnecessary today can turn out to be necessary tomorrow. In other words, the value of data changes over time.
Moreover, in order for data mining operations to be as effective as possible, you need to have the broadest set of data. The broader the data set, the more likely a pattern can show up.

Unknown said...

The issue of consumer information privacy and protection vs. use of data for economic gain raises questions of the ethical use and sharing of customer data.

There are no easy answers. The best approach is probably for companies to consider the overall societal benefit as a greater goal. Economic gains in the long-term storage of data should be weighed against the risk exposure. If there are techniques for modeling the intrinsic value of data as an asset, I sure would like to know.