Some common Machine Learning, Statistics and Data Science terms that start with B



|Word|Description|
|---|---|
|Backpropagation|In neural networks, when the estimated output is far from the actual output (high error), we update the weights and biases based on that error. This weight- and bias-updating process is known as backpropagation. Backpropagation (BP) algorithms work by computing the loss (or error) at the output and then propagating it backwards through the network, layer by layer. The weights are updated to minimize the error attributable to each neuron. The first step in minimizing the error is to compute the gradient (the partial derivatives) of the loss with respect to each weight and bias; the chain rule lets us do this starting at the output and working back towards the input.|
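The steps above can be sketched for a deliberately tiny network. This is a minimal illustration, assuming a single input, one tanh hidden unit, a linear output and squared-error loss (none of which the entry prescribes); `forward_backward` is a hypothetical helper name. It computes the loss at the output, pushes its gradient back one layer at a time with the chain rule, and then takes one gradient-descent step.

```python
import math

def forward_backward(x, y, w1, b1, w2, b2):
    """Hypothetical 1-hidden-unit network: h = tanh(w1*x + b1), y_hat = w2*h + b2.

    Returns the squared-error loss 0.5 * (y_hat - y)**2 and its gradients.
    """
    # Forward pass
    z = w1 * x + b1
    h = math.tanh(z)
    y_hat = w2 * h + b2
    loss = 0.5 * (y_hat - y) ** 2

    # Backward pass: propagate dLoss/d(.) from the output back towards the input
    d_yhat = y_hat - y             # dL/dy_hat
    d_w2 = d_yhat * h              # dL/dw2
    d_b2 = d_yhat                  # dL/db2
    d_h = d_yhat * w2              # dL/dh
    d_z = d_h * (1 - h ** 2)       # tanh'(z) = 1 - tanh(z)**2
    d_w1 = d_z * x                 # dL/dw1
    d_b1 = d_z                     # dL/db1
    return loss, {"w1": d_w1, "b1": d_b1, "w2": d_w2, "b2": d_b2}

# One gradient-descent update on a single (assumed) training example
params = {"w1": 0.5, "b1": 0.0, "w2": -0.3, "b2": 0.1}
x, y = 1.5, 1.0
learning_rate = 0.1

loss, grads = forward_backward(x, y, **params)
params = {k: v - learning_rate * grads[k] for k, v in params.items()}
new_loss, _ = forward_backward(x, y, **params)
print(f"loss before: {loss:.4f}, after one update: {new_loss:.4f}")
```

A useful sanity check for any hand-written backward pass is to compare each gradient with a finite-difference estimate of the loss.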
|Bagging|Bagging, or bootstrap aggregating, is a technique in which multiple models are trained on bootstrap samples of the data (random subsets drawn with replacement), and the final prediction is obtained by combining the predictions of all the models (for example, averaging for regression or majority voting for classification). Some of the algorithms that use the bagging technique are:
  • Bagging meta-estimator
  • Random Forest|
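A minimal sketch of the bagging idea, assuming a 1-nearest-neighbour regressor as the base model and plain averaging to combine predictions; the function names and data are hypothetical. Libraries provide this ready-made: in scikit-learn, for example, `sklearn.ensemble.BaggingRegressor` and `BaggingClassifier` implement the bagging meta-estimator.

```python
import random
import statistics

def fit_nearest_neighbour(xs, ys):
    """Hypothetical base model: 1-nearest-neighbour regressor (low bias, high variance)."""
    def predict(x0):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
        return ys[i]
    return predict

def fit_bagging(xs, ys, n_models=25, seed=0):
    """Train n_models base models on bootstrap samples; predict with their average."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = rng.choices(range(n), k=n)  # bootstrap sample: n row indices drawn with replacement
        models.append(fit_nearest_neighbour([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x0):
        # Combine the individual predictions by averaging (regression case)
        return statistics.fmean(m(x0) for m in models)

    return predict

# Hypothetical noisy data scattered around the line y = 2x
rng = random.Random(1)
xs = [float(i) for i in range(10)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

bagged = fit_bagging(xs, ys)
print(bagged(4.5))
```

Averaging many models trained on different bootstrap samples mainly reduces variance, which is why bagging pairs well with flexible, high-variance base models such as deep decision trees (the basis of Random Forest).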

|Bar Chart|Bar charts are a type of graph used to display and compare counts, frequencies or other measures (e.g. the mean) for different discrete categories of data, so they are used for categorical variables. (Figure: a simple example of a bar chart.)|
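Since the original chart image is not available, here is a minimal text-only sketch of what a bar chart encodes: one bar per category, with its length proportional to that category's frequency. The function name and the survey data are hypothetical; in practice a plotting library would normally be used (for example matplotlib's `pyplot.bar`).

```python
from collections import Counter

def text_bar_chart(values, width=30):
    """Hypothetical helper: text bar chart of category counts; the longest bar is `width` chars."""
    counts = Counter(values)
    longest = max(counts.values())
    label_width = max(len(str(c)) for c in counts)
    lines = []
    for category, count in counts.most_common():   # most frequent category first
        bar = "#" * round(width * count / longest)
        lines.append(f"{str(category):<{label_width}} | {bar} {count}")
    return "\n".join(lines)

# Hypothetical categorical variable: one answer per respondent
answers = ["Yes", "No", "Yes", "Maybe", "Yes", "No"]
print(text_bar_chart(answers, width=12))
```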

|Bayes Theorem|Bayes’ theorem is used to calculate conditional probabilities. Conditional probability is the probability of an event ‘B’ occurring given that a related event ‘A’ has already occurred.

For example, let’s say a clinic wants to treat cancer in the patients visiting the clinic.

A represents an event “Person has cancer”

B represents an event “Person is a smoker”

The clinic wishes to calculate the proportion of smokers among the patients diagnosed with cancer, that is, P(B | A).

To do so, the clinic can use Bayes’ theorem (also known as Bayes’ rule), which is as follows:
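$$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}$$

Here P(A | B) is the probability that a smoker has cancer, P(B) is the proportion of patients who smoke, and P(A) is the proportion of patients who have cancer.|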
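Once the three inputs are known, Bayes’ rule is simple arithmetic. The sketch below plugs in made-up clinic figures (hypothetical, not real medical statistics): P(A) = 0.10, P(B) = 0.05 and P(A | B) = 0.20, which give P(B | A) = 0.20 × 0.05 / 0.10 = 0.10, i.e. 10% of the patients diagnosed with cancer would be smokers. The helper name `bayes_rule` is illustrative.

```python
def bayes_rule(p_a_given_b: float, p_b: float, p_a: float) -> float:
    """Return P(B | A) = P(A | B) * P(B) / P(A)."""
    if not 0 < p_a <= 1:
        raise ValueError("P(A) must be in (0, 1]")
    return p_a_given_b * p_b / p_a

# Hypothetical figures for illustration only
p_cancer = 0.10                # P(A): a patient has cancer
p_smoker = 0.05                # P(B): a patient is a smoker
p_cancer_given_smoker = 0.20   # P(A | B): a smoker has cancer

p_smoker_given_cancer = bayes_rule(p_cancer_given_smoker, p_smoker, p_cancer)
print(f"P(smoker | cancer) = {p_smoker_given_cancer:.2f}")
```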
|Bayesian Statistics|Bayesian statistics is a mathematical procedure that applies probabilities to statistical problems. It gives people the tools to update their beliefs in light of new data. It differs from the classical frequentist approach and is based on the use of Bayesian probabilities to summarize evidence.|
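One common way to see "updating beliefs in light of new data" is the Beta-Binomial model, used here as an illustrative choice rather than the only Bayesian approach. A belief about an unknown success probability p is represented as a Beta(α, β) distribution; after observing s successes and f failures, the updated (posterior) belief is Beta(α + s, β + f). The prior Beta(2, 2), the observed counts and the class name are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaBelief:
    """Hypothetical helper: belief about an unknown probability p as a Beta(alpha, beta)."""
    alpha: float
    beta: float

    def mean(self) -> float:
        """Expected value of p under this belief."""
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, failures: int) -> "BetaBelief":
        """Posterior after new data: Beta(alpha + successes, beta + failures)."""
        return BetaBelief(self.alpha + successes, self.beta + failures)

prior = BetaBelief(alpha=2, beta=2)                 # assumed prior: p is probably near 0.5
posterior = prior.update(successes=7, failures=3)   # hypothetical new evidence
print(prior.mean(), posterior.mean())
```

Updating in two batches (say 3 successes and 1 failure, then 4 and 2) yields the same posterior as one update with all the data, which is what makes this a natural model of accumulating evidence.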
|Bias-Variance Trade-off|The expected (squared) error of any model can be broken down mathematically into components: bias, variance and irreducible error (noise in the data itself).

The two components that depend on the model are:

  1. Bias quantifies how much, on average, the predicted values differ from the actual values.
  2. Variance quantifies how much the predictions for the same observation differ from one another when the model is trained on different samples of the data.

A model with high bias underperforms because it keeps missing important trends (underfitting). A model with high variance overfits the training data and performs badly on any observation outside it. Reducing one usually increases the other, so a good model balances bias and variance; this balance is the bias-variance trade-off.|
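Both components can be estimated by simulation. The sketch below assumes a true function f(x) = sin(2πx) with Gaussian noise, and two deliberately extreme hypothetical models: one that always predicts the training mean (high bias) and a 1-nearest-neighbour model (high variance). Each model is retrained on many fresh training sets, and its predictions at a single point x0 are used to estimate squared bias and variance.

```python
import math
import random
import statistics

def true_f(x):
    """Assumed underlying signal that the models try to learn."""
    return math.sin(2 * math.pi * x)

def make_training_set(rng, n=20, noise_sd=0.3):
    """Draw n noisy observations y = f(x) + Gaussian noise, with x uniform on [0, 1)."""
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def constant_model(xs, ys, x0):
    """Too simple (high bias): ignores x and predicts the training mean."""
    return statistics.fmean(ys)

def nearest_neighbour_model(xs, ys, x0):
    """Very flexible (high variance): copies the label of the closest training point."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]

def bias_variance_at(model, x0=0.25, trials=2000, seed=0):
    """Estimate squared bias and variance of `model` at x0 over many training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs, ys = make_training_set(rng)
        preds.append(model(xs, ys, x0))
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance

for name, model in [("constant", constant_model), ("1-NN", nearest_neighbour_model)]:
    bias_sq, variance = bias_variance_at(model)
    print(f"{name:>8}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
```

Under these assumptions the constant model shows a large squared bias and a small variance, and the 1-nearest-neighbour model shows the reverse; models of intermediate flexibility sit between the two.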
|Big Data|Big data is a term that describes large volumes of data, both structured and unstructured. But it’s not the amount of data that’s important; it’s how organizations use this data to generate insights. Companies use various tools, techniques and resources to make sense of this data and derive effective business strategies.|
|Binary Variable|Binary variables are variables that can take only two unique values. For example, a variable “Smoking Habit” can contain only two values, such as “Yes” and “No”.|
|Binomial Distribution|The binomial distribution applies only to discrete random variables. It is a method of calculating probabilities for experiments with a fixed number of trials.

A binomial experiment has the following properties:

  1. The experiment has a fixed, finite number of trials.
  2. Each trial has exactly two outcomes: success and failure.
  3. The trials are independent.
  4. The probability of success (p) remains constant from trial to trial.

For a distribution to qualify as binomial, all of these properties must be satisfied.

So, which kinds of distributions would be considered binomial? Let’s answer that with a few examples:

  1. Suppose you need to find the probability of hitting the bull’s eye with a dart. Can this be modelled with a binomial distribution? No, because the number of trials isn’t fixed: I could hit the bull’s eye on the 1st attempt or the 3rd attempt, or I might not hit it at all.
  2. A football match can end in three ways: win, lose or draw. So if we are asked for the probability of winning, the binomial distribution cannot be used, because each trial has more than two outcomes.
  3. Tossing a fair coin 20 times is a binomial experiment: there is a fixed number of trials (20), each with only two outcomes, “Head” or “Tail”. The trials are independent, and the probability of success is 1/2 in every trial.

The probability of getting exactly r successes in n trials is given by the binomial formula:

$$P(X = r) = \binom{n}{r}\, p^{r} (1 - p)^{n - r}$$

  • n: number of trials
  • r: number of successes
  • p: probability of success
  • 1 - p: probability of failure
  • nCr = $\binom{n}{r}$: the binomial coefficient, $\frac{n!}{r!\,(n - r)!}$|
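The formula translates directly into code. This sketch uses Python's `math.comb` (Python 3.8+) for the binomial coefficient; `binomial_pmf` is a hypothetical helper name. Applied to the coin example above, the probability of exactly 10 heads in 20 fair tosses is C(20, 10) / 2^20 = 184756 / 1048576, roughly 0.176.

```python
import math

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) = C(n, r) * p**r * (1 - p)**(n - r) for a binomial(n, p) variable."""
    if not 0 <= r <= n:
        return 0.0                      # impossible number of successes
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

# Coin example: 20 tosses of a fair coin, probability of exactly 10 heads
print(binomial_pmf(10, 20, 0.5))

# Sanity check: the probabilities of all outcomes r = 0..n sum to 1
print(sum(binomial_pmf(r, 20, 0.5) for r in range(21)))
```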
|Boosting|Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model. The succeeding models are dependent on the previous model. Some of the boosting algorithms are:

  • AdaBoost
  • GBM (gradient boosting machine)
  • XGBoost
  • LightGBM
  • CatBoost|
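To illustrate how each model corrects the errors of the previous ones, here is a minimal gradient-boosting sketch for regression, assuming squared-error loss and one-split regression trees (stumps) as base models. Each new stump is fitted to the current residuals, i.e. to the errors the ensemble still makes. This is not how any of the libraries listed above are implemented; the helper names and data are hypothetical.

```python
from statistics import fmean

def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold on x, a constant on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:       # the largest x would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = fmean(left), fmean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Squared-loss gradient boosting: each new stump is fitted to the current residuals."""
    base = fmean(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    history = []                          # training MSE before each new stump is added
    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        history.append(fmean([r * r for r in residuals]))
        stumps.append(fit_stump(xs, residuals))   # the new model targets the remaining errors
    return predict, history

# Hypothetical 1-D data following y = x^2
xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]
model, history = gradient_boost(xs, ys)
print(f"training MSE: {history[0]:.4f} -> {history[-1]:.4f}")
```

Because each stump is a least-squares fit to the residuals, any learning rate between 0 and 2 can only lower (or keep) the training error from round to round; in practice smaller rates with more rounds are common.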

|Bootstrapping|Bootstrapping is the process of drawing multiple random samples from the dataset with replacement. Each sample is the same size as the original dataset. These samples are called bootstrap samples.|
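A common use of bootstrap samples is estimating the uncertainty of a statistic. This sketch assumes the statistic of interest is the sample mean and estimates its standard error as the standard deviation of the means of many bootstrap samples; the data and helper names are hypothetical.

```python
import random
import statistics

def bootstrap_samples(data, n_samples, seed=None):
    """Draw n_samples bootstrap samples, each len(data) items drawn with replacement."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_samples)]

def bootstrap_standard_error(data, n_samples=1000, seed=0):
    """Estimate the standard error of the mean from the spread of bootstrap means."""
    means = [statistics.fmean(s) for s in bootstrap_samples(data, n_samples, seed)]
    return statistics.stdev(means)

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]   # hypothetical measurements
print(bootstrap_standard_error(data))
```

Because sampling is with replacement, a bootstrap sample usually repeats some observations and omits others, which is what makes the samples differ from each other and from the original dataset.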
|Box Plot|A box plot displays the full range of variation (from min to max), the likely range of variation (the interquartile range, IQR) and a typical value (the median). (Figure: a box plot.)

Some of the inferences that can be made from a box plot:

  • Median: the line inside the box marks the median (the second quartile).
  • Box: the box spans the middle 50% of the data.
  • First quartile: 25% of the data falls below this line (the lower edge of the box).
  • Third quartile: 75% of the data falls below this line (the upper edge of the box).|
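The values a box plot draws are the five-number summary: minimum, first quartile, median, third quartile and maximum. This sketch computes them with Python's `statistics.quantiles` (Python 3.8+); `five_number_summary` is a hypothetical helper name. Quartile conventions differ between tools (`statistics.quantiles` uses the "exclusive" method by default and also accepts `method="inclusive"`), so the quartiles may differ slightly from those a particular plotting library draws. For the data 1 to 9 below, the default method gives Q1 = 2.5, median 5.0 and Q3 = 7.5.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a box plot is built from."""
    q1, median, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' method
    return min(data), q1, median, q3, max(data)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
lo, q1, med, q3, hi = five_number_summary(data)
print(lo, q1, med, q3, hi, "IQR =", q3 - q1)
```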
|Business Analytics|Business analytics is the practical methodology an organization follows to explore its data and gain insights. The methodology focuses on statistical analysis of the data.|
|Business Intelligence|Business intelligence is a set of strategies, applications, data and technologies used by an organization for data collection, analysis and generating insights, in order to identify strategic business opportunities.|