Guide to self-study Data Science Part 2

Hello Everyone,

I hope you have gone through the “Guide to self-study Data Science Part 1” before coming here. I recommend reading Part 1 completely before continuing with Part 2.
Link: Guide to self-study Data Science Part 1

Let’s continue,

* Linear Regression Basics

Learn the fundamentals of simple and multiple linear regression analysis. Linear regression is a supervised learning method for predicting a continuous outcome from one or more input variables. Some tools for performing linear regression are given below:

Python: NumPy, Matplotlib (pylab), scikit-learn

R: caret package
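
To make the idea concrete, here is a minimal sketch of simple linear regression (one input variable) fitted with the closed-form least-squares formulas. It uses plain Python only so that nothing is hidden; in practice you would normally reach for NumPy or scikit-learn. The function names and the small dataset are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the outcome for a new input value."""
    return slope * x + intercept


# Illustrative (made-up) data that follows y = 2x + 1 exactly
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(hours, scores)
print(slope, intercept)
print(predict(6.0, slope, intercept))
```

Multiple linear regression follows the same principle with several input variables, which is where a library becomes much more convenient than hand-written formulas.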

* Machine Learning Basics

a) Supervised Learning (Continuous Variable Prediction)

  • Basic regression
  • Multiple regression analysis
  • Regularized regression

b) Supervised Learning (Discrete Variable Prediction)

  • Logistic Regression Classifier
  • Support Vector Machine (SVM) Classifier
  • K-nearest neighbor (KNN) Classifier
  • Decision Tree Classifier
  • Random Forest Classifier
  • Naive Bayes
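
As an illustration of one classifier from the list above, here is a rough sketch of k-nearest neighbors (KNN) in plain Python: a new point gets the most common label among its k closest training points. The function name `knn_predict` and the toy data are assumptions for this example; scikit-learn provides a full-featured version.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training point's Euclidean distance to the query with its label
    distances = sorted(
        ((math.dist(point, query), label)
         for point, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k neighbors wins
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy (made-up) data: two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.2, 1.5)))
print(knn_predict(points, labels, (6.8, 6.9)))
```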

c) Unsupervised Learning

  • K-means clustering algorithm

Python tools for machine learning: scikit-learn, PyTorch, TensorFlow.
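
For intuition, k-means can be sketched in a few lines of plain Python: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The starting centroids are passed in explicitly here to keep the example deterministic (real implementations usually pick them randomly or with a smarter scheme), and all names and data are made up for illustration.

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
        for point in points
    ]


def k_means(points, initial_centroids, max_iterations=100):
    """Run the basic k-means loop; return (centroids, cluster index per point)."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        assignments = assign_clusters(points, centroids)
        new_centroids = []
        for i, old_centroid in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                # New centroid is the coordinate-wise mean of the cluster's points
                new_centroids.append(
                    tuple(sum(coords) / len(members) for coords in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid
                new_centroids.append(old_centroid)
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assign_clusters(points, centroids)


# Toy (made-up) data with two obvious groups
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, cluster_ids = k_means(data, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(cluster_ids)
```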

* Time Series Analysis Basics

Time series analysis is used to build predictive models when the outcome depends on time, e.g., predicting stock prices. There are 3 basic methods for analyzing time-series data:

  • Exponential Smoothing
  • ARIMA (Auto-Regressive Integrated Moving Average), which includes simple exponential smoothing as a special case
  • GARCH (Generalized Auto-Regressive Conditional Heteroskedasticity), which is an ARIMA-like model for the variance of a series.

These three techniques can be implemented in Python and R.
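
The simplest of the three, simple exponential smoothing, can be sketched in plain Python as follows. Each smoothed value is a weighted average of the newest observation and the previous smoothed value, with the weight `alpha` chosen between 0 and 1. The function name, the choice to start from the first observation, and the data are assumptions for this example; for real work, and for ARIMA or GARCH, a dedicated library is the better choice.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed series; the last value serves as the one-step-ahead forecast."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    # Initialize with the first observation (one common convention)
    smoothed = [series[0]]
    for value in series[1:]:
        # Blend the new observation with the previous smoothed level
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up observations for illustration
observations = [10.0, 12.0, 11.0, 13.0]
smoothed = simple_exponential_smoothing(observations, alpha=0.5)
forecast = smoothed[-1]
print(smoothed)
print(forecast)
```

A larger `alpha` makes the forecast react faster to recent values, while a smaller one gives a smoother, slower-moving result.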

* Productivity Tools Basics

Knowing how to use basic productivity tools such as RStudio, Jupyter Notebook, and GitHub is essential. For Python, the Anaconda distribution is a convenient way to install Python together with many commonly used data science packages. Cloud platforms such as AWS and Azure are more advanced tools that are also worth learning.

* Data Science Project Planning Basics

Learn the basics of how to plan a project. Before building any machine learning model, sit down and plan carefully what you want the model to accomplish. Before writing any code, make sure you understand the problem to be solved, the nature of the dataset, the type of model to build, and how the model will be trained, tested, and evaluated. Project planning and project organization are essential for increasing productivity when working on a data science project. Some resources for project planning and organization are provided below.

Good Luck!