Datum Engineering !

An engineered artwork to make decisions..

Data analysis drivers

Posted by datumengineering on February 11, 2013

I have been exploring data analysis and modeling techniques since months. There are lots of topics floating around in the space of data analysis like statistical modeling, predictive modeling. There have always been questions in mind which technique to choose? which is preferred way for data analysis? Some articles and lecture highlight machine learning or mathematical model over statistics modeling limitations. They mention mathematical modeling as a next step of accuracy and prediction. This kind of articles create more questions in mind of naive user.

Finally, i would thank to coursera.org for zero down this confusion and stating a clear picture of Data Analysis drivers. Now, things are pretty clear in terms of How to proceed on data analysis? Rather, defining “DATA ANALYSIS DRIVERS”. In one liner the answer is simple “Define a question or problem“. So, all depend upon how you define the problem.

To start with data analysis drivers here are steps in a data analysis

  1. Define the question
  2. Define the ideal data set
  3. Determine what data you can access
  4. Obtain the data
  5. Clean the data
  6. Exploratory data analysis
  7. Statistical prediction/modeling
  8. Interpret results
  9. Challenge results
  10. Synthesize/write up results
  11. Create reproducible code
  • Defining the question means how the business problem has stated and how you proceed on story telling on this  problem. Story telling on the problem will take you to the structuring the solution. So you should be good in story telling on the problem statement.
  • Defining the solution will help you to prepare the data (data set) for the solution.
  • Profile the source to identify what data you can access.
  • Next step is cleansing the data.
  • Now, once the data is cleansed it is either in one of the following standard: txt, csv, xml/html, json and database.
  • Based on the solution need we start building the model. Precisely, the solution will have requirement of Descriptive analysis, Inferential analysis or predictive analysis.

Henceforth, The data set and model may depend on your goal:

  1. Descriptive – a whole population.
  2. Exploratory – a random sample with many variables measured.
  3. Inferential – the right population, randomly sampled.
  4. Predictive – a training and test data set from the same population.
  5. Causal – data from a randomized study.
  6. Mechanistic – data about all components of the system.

From here knowledge on statistics, machine learning and mathematical algorithm works 🙂

Advertisements

6 Responses to “Data analysis drivers”

  1. Swathi said

    nice pitch for novice! 🙂

  2. […] Read more […]

  3. Johnd122 said

    Valuable information. Lucky me I found your site by accident, and I am shocked why this accident did not happened earlier! I bookmarked it. adcegefakkek

  4. Johnk916 said

    Magnificent website. Lots of useful information here. Im sending it to some friends ans also sharing in delicious. And obviously, thanks for your sweat! dfkbfakebedc

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: