What are the topics covered in Data Science?

The topics covered in Data Science depend on the level of the course, i.e. whether students are enrolling in a basic course or a more advanced one. Basic courses generally introduce the student to the concepts of statistics and probability, then proceed to preliminary hypothesis testing procedures such as the t-test, z-test, ANOVA, and F-test, and finally advance to cause-effect modelling such as regression, factor analysis, and logistic regression.
Advanced courses, on the other hand, start with regression analysis and then progress to more complicated algorithms such as decision trees, random forests, Bayesian methods, collaborative filtering, and so on.
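
To give a feel for what one of the hypothesis tests mentioned above involves, here is a minimal sketch of Welch's two-sample t statistic using only the Python standard library. The function name welch_t and the exam scores are made up for illustration; a real course would normally use a statistics package and also compute a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for two independent samples."""
    mean_a, mean_b = mean(sample_a), mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a, var_b = variance(sample_a), variance(sample_b)
    # Standard error of the difference between the two sample means.
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical exam scores for two study groups (illustrative data only).
group_a = [72, 75, 78, 80, 85]
group_b = [65, 70, 71, 74, 75]

t_stat = welch_t(group_a, group_b)
print(f"t statistic: {t_stat:.3f}")
```

The larger the absolute value of the statistic, the less plausible it is that both groups share the same mean; turning it into a p-value requires the t distribution, which is left out of this sketch.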

Data Science has very hazy boundaries, so there will be people arguing about where it starts and stops, but here are my two cents on the subject.

Data Science requires a solid understanding of mathematical concepts, such as:

  • Calculus, linear algebra, trigonometry, etc.
  • Probabilities and distributions

Then, you’ll obviously need to know about machine learning, including:

  • Inference, regression, clustering, tests
  • Time Series, Survival Analysis
  • CART, Random Forests, Map/Reduce, Feature selection
  • Model comparison
  • Neural networks, Deep learning
  • Computer vision, natural language processing, geolocation handling

Obviously, to apply this knowledge, you need to master IT tools, such as:

  • Programming (R, Python, SAS, Java, whichever)
  • Software (SPSS, Spark, etc.)

Since you’ll be wrangling data so much, you need to feel comfortable with:

  • Databases (SQL and NoSQL)
  • Web languages and web semantics to extract data from websites
  • Data visualization

And finally, there are other non-obvious skills you’ll need to be a Data Scientist:

  • Communication and presentation (you are a specialist, and will most probably need to explain your findings to non-technical people to get them to adopt your recommendations)
  • Law and ethics: with the tools and technical assets a data scientist has, it would be all too easy to take wrong or even illegal actions, so it is important to know some boundaries
  • IT architecture, IT security, cloud computing. These are not a data scientist’s direct responsibility, but a good one should be smart enough not to take risks, and to take advantage of the best resources around him/her.
  • Project management, etc.

Data Science is an amalgamation of various topics that come together to help you analyze data and draw better conclusions.

The following is a list of sub-topics for Data Science that you should cover:

Statistics: Statistics, in its most basic form, is the process of gathering data, analysing it, and drawing conclusions from it.

Linear Algebra: Linear Algebra is the branch of mathematics that deals with vectors, matrices, and linear transformations. Most machine learning algorithms are expressed in these terms.
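
As a small illustration of the kind of object linear algebra studies, the sketch below applies a matrix to vectors in plain Python and checks the defining property of a linear map: transforming a sum gives the same result as summing the transforms. The helper mat_vec and the example vectors are made up for this illustration; in practice a library such as NumPy would do this work.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# A 90-degree counter-clockwise rotation of the plane, written as a matrix.
rotate_90 = [[0, -1],
             [1, 0]]

u = [1, 0]
v = [0, 2]

rotated_u = mat_vec(rotate_90, u)
rotated_v = mat_vec(rotate_90, v)

# Linearity: rotating (u + v) equals rotating u and v separately, then adding.
u_plus_v = [a + b for a, b in zip(u, v)]
lhs = mat_vec(rotate_90, u_plus_v)
rhs = [a + b for a, b in zip(rotated_u, rotated_v)]
print(lhs == rhs)
```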

Programming: Data scientists perform many of the same activities as software developers, and they work with databases frequently. You need prior programming experience to be effective in the job.

Machine Learning: Machine learning lets computers learn patterns from data, so that we don’t have to write explicit instructions for certain tasks.
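
A minimal sketch of that idea: instead of hard-coding the rule that links inputs to outputs, the code below estimates it from example pairs using ordinary least squares, then uses the learned rule on a new input. The function fit_line and the data points are invented for illustration, and least squares is only one simple stand-in for "learning from data".

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common factor 1/n cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Example pairs that happen to follow y = 2x + 1; the code is never told this.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)

# Use the learned rule on an input that was not in the training examples.
prediction = slope * 10 + intercept
print(slope, intercept, prediction)
```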

Data Mining: Data mining is the practice of sifting through large amounts of data in order to extract useful information.
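
One classic, very small example of data mining is finding items that are often bought together. The sketch below counts how many shopping baskets contain each pair of items and keeps the pairs that reach a minimum count. The function frequent_pairs, the min_count threshold, and the baskets are all hypothetical; real tools use more scalable algorithms on far larger data.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_count):
    """Return item pairs that appear together in at least min_count baskets."""
    pair_counts = Counter()
    for basket in transactions:
        # sorted(set(...)) counts each pair once per basket, in a stable order.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: count for pair, count in pair_counts.items() if count >= min_count}


# Hypothetical shopping baskets (illustrative data only).
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]

result = frequent_pairs(baskets, min_count=2)
print(result)
```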
