Data science, data analytics, and business analytics are all about analyzing data, which is generated by many sources. Sources range from traditional databases to satellite signals to sensors in the Internet of Things, and the list goes on endlessly. An easier question to ask is, “Where is data not being generated?” Technology is also advancing at a pace that leaves us dumbstruck, and each advance brings new data that is generated relentlessly. Wearable devices, for example, track your heart rate, sleeping pattern (data is being generated even while we sleep!), calories consumed, etc.

Analyzing such a wide variety of data, generated at a rapid and continuous pace, requires strong reasoning and skills. To meet these needs, one should have knowledge of 4 important areas of study: Statistical Analysis, Data Mining, Forecasting (Time series) & Data Visualization.

MUST KNOW for Statistical Analysis includes:

  • Exploratory Data Analysis, because a large share of project time (often put at around 60%) is spent exploring data & it is one of the most important steps, which even a seasoned data scientist can miss
  • Hypothesis testing to determine which input variables have a statistically significant influence on the output variable
  • Regression techniques such as Linear, Logistic, Poisson & Negative Binomial regression to build predictive models
  • Imputation to deal with missing data, including Null values, NA values, etc.
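
To make two of these ideas concrete, here is a minimal sketch of mean imputation followed by a simple linear regression, using only the Python standard library. The data (advertising spend and sales) and the helper names `impute_mean` and `fit_line` are hypothetical and chosen only for illustration; a real project would more likely use a statistics library and a more careful imputation strategy.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: one spend value is missing (None).
spend = [1.0, 2.0, None, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

spend_filled = impute_mean(spend)          # the gap is filled with the mean of 1, 2, 4, 5
intercept, slope = fit_line(spend_filled, sales)
print(spend_filled, intercept, slope)
```

Mean imputation is the simplest option and can distort the spread of a variable, so it is worth comparing against alternatives during exploratory analysis.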

MUST KNOW for Data Mining Unsupervised Learning includes:

  • Clustering / Segmentation techniques such as K-means & Hierarchical clustering, which help in building strategies for specific groups of related things
  • Dimension Reduction techniques such as PCA & SVD to reduce the number of variables & so manage huge volumes of data effectively
  • Association Rules/Market Basket Analysis to establish relationships between the various items
  • Recommendation Systems to recommend the next item a customer is most likely to purchase
  • Network Analysis to identify which person/item is most important within the entire network
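
As an illustration of clustering, the following is a minimal sketch of K-means on two-dimensional points in plain Python. The data points and the function name `kmeans` are hypothetical, and using the first k points as the starting centroids is an assumption made here only to keep the result reproducible; library implementations normally use smarter initialization.

```python
import math

def kmeans(points, k, iterations=20):
    """Basic K-means: alternate assignment and centroid update steps."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers described by two numeric features.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Each label identifies a segment, and the centroid of a segment summarizes its typical member, which is what a segment-specific strategy would be built around.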

MUST KNOW for Data Mining Supervised Learning includes:

  • Decision Tree, Random Forest, Naive Bayes, K-NN, Neural Networks & SVM. All these techniques are used in predictive modeling & classification model building
  • Supervised learning is at the heart of Artificial Intelligence & machine learning, & with the advent of the Internet of Things the world will witness a huge demand for professionals with knowledge of Data Mining Supervised Learning techniques
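
Of the techniques listed, K-NN is the easiest to write out in full. The sketch below classifies a new point by majority vote among its k nearest labeled neighbours. The training data, the labels "churn" and "stay", and the function name `knn_predict` are hypothetical and only serve the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labeled customers described by two numeric features.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["churn", "churn", "churn", "stay", "stay", "stay"]

prediction_a = knn_predict(train_points, train_labels, (2, 2))
prediction_b = knn_predict(train_points, train_labels, (6, 5))
print(prediction_a, prediction_b)
```

The labeled examples are what make this supervised learning: the model only generalizes from outcomes that are already known.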

MUST KNOW for Forecasting/Time series includes:

  • AR, MA, ARMA & ARIMA should be understood to forecast future sales, profits, weather, or anything else based on data ordered in time
  • ARCH & GARCH are techniques used when the volatility (variance) of a series changes over time, which is common in high-frequency data, meaning data generated at a very frequent pace, such as stock market data
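
The simplest member of this family is the AR(1) model, where each value depends linearly on the previous one. The sketch below estimates its two parameters by least squares and produces a short forecast. The series is hypothetical and noise-free so the result is easy to check by hand, and the function names `fit_ar1` and `forecast_ar1` are made up for this example; real forecasting work would normally rely on a dedicated time series library.

```python
def fit_ar1(series):
    """Least squares estimate of c and phi in y[t] = c + phi * y[t-1] + noise."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    prev_bar = sum(prev) / n
    curr_bar = sum(curr) / n
    sxy = sum((p - prev_bar) * (q - curr_bar) for p, q in zip(prev, curr))
    sxx = sum((p - prev_bar) ** 2 for p in prev)
    phi = sxy / sxx
    c = curr_bar - phi * prev_bar
    return c, phi

def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted equation forward `steps` periods."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts

# Hypothetical series built from y[t] = 2 + 0.5 * y[t-1], starting at 10.
series = [10.0, 7.0, 5.5, 4.75, 4.375]

c, phi = fit_ar1(series)
forecast = forecast_ar1(series[-1], c, phi, steps=2)
print(c, phi, forecast)
```

ARMA and ARIMA extend the same idea with moving-average terms and differencing, while ARCH and GARCH apply a similar recursion to the variance rather than the level of the series.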

MUST KNOW for Data Visualization includes:

  • Top-notch tools such as Tableau help you visualize data & draw meaningful inferences for business benefit
  • Learning data visualization principles is pivotal to building visualizations/reports successfully & showcasing them to the various stakeholders in the most meaningful & engaging fashion

With a thorough understanding of all these concepts, one can become a successful Data Scientist.