BiG DaTa & Vectorization
Posted by datumengineering on May 14, 2012
It has been a while since Big Data entered the market and created a buzz in the analytics world. Nowadays all analytics leaders are chanting about Big Data applications. Ever since I started working with Hadoop technologies and machine learning, one question has been bugging me:
Which is the greater innovation: Big Data, or Machine Learning & Vectorization?
When it comes to analytics, vectorization and machine learning are more innovative. Wait a minute: I don’t want to be biased, and I am not concluding here. But I would like to shed more light on the direction we take with our data in the analytics world. We have structured data, we have enterprise data, we have data that is still measurable and sufficient for analytical and advanced analytical needs. But how many business analytics teams use it smartly to make predictions? How many have applied different statistical algorithms to benefit from this data? How often has the available data been utilized to its potential? I guess in only 20% of cases. When we are still not making full use of structured, measurable data, why are we chasing so hard after unstructured, monster-sized data? In fact, big data needs more work than enterprise data.
I don’t advocate reaching saturation first and only then thinking about innovation or out-of-the-box ideas. NO. My emphasis is on making the best use of existing enterprise data while keeping innovation alive by experimenting with possible options for exploring data that is unexplored, or infeasible to handle with conservative technologies. Innovation doesn’t mean just thinking up and doing new things. Innovation is more meaningful when you do something meaningful for the world, something that other people acknowledge as valuable but call “Not feasible”.
I am not favoring either side here. I come from a world where I see data processing challenges, data storage challenges, data aggregation challenges, and a lot of challenges in sorting and searching. That is where I would look at Hadoop-related technologies. The query processing power that Hive provides, and the data storage and manipulation power that HBase provides, are indeed way beyond traditional RDBMSs. The MapReduce power they draw on is exemplary.
But all these Big Data technologies should enter an enterprise that is already mature in the analytics world, having fully utilized its enterprise data. If Hadoop itself claims that it is not a replacement for your current enterprise data warehouse, then why shouldn’t you first fully grind through the existing EDW data and only then look at Hadoop opportunities to give an edge to your enterprise’s competency?