This morning I received an email explaining that a vendor’s Big Data solution would let me outperform my business rivals while it balanced my checkbook and washed my car on Sunday.
As a journalist, I get tons of these unsolicited emails every day. I’m not naming names; after reading the correspondence a few more times, I decided the sender’s life is difficult enough.
Marketing hype aside, the ignorance running through the industry about Big Data and its capabilities really sticks in my craw.
Let me make this perfectly clear: Big Data is not a panacea for all your data ills.
If someone tells you it is, run! Run hard, run fast, and don’t stop until you have put as many secured doors between you and the idiot as possible.
Stuffing all of your organization’s terabytes and petabytes of data willy-nilly into a Big Data environment will not solve your problems.
Remember the first rule of programming: Garbage In, Garbage Out (GIGO).
If you don’t know what you want to achieve at the beginning of the process, there’s no technology that will give you what you want at the end. Know what you are looking for at the outset and which data sets will help you find your answer.
Big Data is not a magic box. It is a way for applications to access heterogeneous data formats without needing to go through the time or expense of converting existing data into a common format.
In other words, it protects the sunk cost of storing your existing data while making that data easier for other applications to exploit.
This promise has a lot of people throwing all their data, as well as a few kitchen sinks, into Big Data projects.
Don’t be one of them. It will just be an expensive and painful lesson in how not to meet your goals.
Again, lay out what you want to achieve with your project before deciding which data and applications you’ll need to integrate into it.
One approach to selecting which data sets to integrate first is to view data as a commodity. The most frequently accessed data sets are liquid commodities, while infrequently accessed data sets are illiquid commodities.
Address the liquid markets first since that is where you can find the immediate returns.
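To make the liquid-versus-illiquid split concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the DataSet class, the monthly_accesses metric, the 1,000-access cutoff and the sample catalog are all made up, and your own measure of "most accessed" may look quite different.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    monthly_accesses: int  # hypothetical usage metric, not a standard one


def split_by_liquidity(data_sets, threshold):
    """Return (liquid, illiquid) lists, each sorted most-accessed first."""
    ranked = sorted(data_sets, key=lambda d: d.monthly_accesses, reverse=True)
    liquid = [d for d in ranked if d.monthly_accesses >= threshold]
    illiquid = [d for d in ranked if d.monthly_accesses < threshold]
    return liquid, illiquid


# Illustrative catalog; the names and numbers are invented.
catalog = [
    DataSet("web_logs", 12000),
    DataSet("archived_invoices", 40),
    DataSet("crm_contacts", 3500),
]

liquid, illiquid = split_by_liquidity(catalog, threshold=1000)
print([d.name for d in liquid])    # ['web_logs', 'crm_contacts']
print([d.name for d in illiquid])  # ['archived_invoices']
```

The liquid list, already ordered by access count, is the natural place to start integrating.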
Handling the illiquid data is a bit trickier. There are benefits to accessing illiquid data in Big Data environments, just as there are to trading illiquid instruments on the market. However, apply the same risk-reward analysis before you make the investment. Will the return from incorporating the illiquid data sets into the Big Data environment justify their integration expense? Could better returns be found elsewhere?
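Those two questions can be written down as a back-of-the-envelope check. The sketch below is only an illustration under stated assumptions: the integration_verdict function, its parameters and the figures are hypothetical, and estimating the expected return is the hard part that no snippet can do for you.

```python
def integration_verdict(expected_return, integration_cost,
                        best_alternative_return=0.0):
    """Crude risk-reward check for one illiquid data set.

    All inputs are your own estimates in the same currency unit.
    best_alternative_return is the net return you expect from the best
    other use of the same budget (hypothetical parameter).
    """
    net_return = expected_return - integration_cost
    if net_return <= 0:
        return "skip"      # the return does not justify the expense
    if best_alternative_return > net_return:
        return "defer"     # better returns can be found elsewhere
    return "integrate"


# Invented figures, purely for illustration.
print(integration_verdict(expected_return=50_000, integration_cost=80_000))
print(integration_verdict(120_000, 80_000, best_alternative_return=60_000))
print(integration_verdict(120_000, 80_000, best_alternative_return=10_000))
```

With these made-up numbers the three calls come back as skip, defer and integrate, in that order.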
Each organization needs to make its own call.
As with any project, know what you want to achieve and select the proper tools for the job, and you’ll go far.