Some common Machine Learning, Statistics and Data Science terms starting with M

M

[su_table]

|Word|Description|
|Machine Learning|Machine Learning refers to the techniques for handling vast amounts of data intelligently (by developing algorithms) to derive actionable insights. With these techniques, we expect the algorithms to learn by themselves without being explicitly programmed.|
|Mahout|Mahout is an open source project from Apache that is used for creating scalable machine learning algorithms. It implements popular machine learning techniques such as recommendation, classification, and clustering.

Features of Mahout:

  • Mahout offers a framework for doing data mining tasks on large volumes of data

  • Mahout lets applications analyze large data sets effectively and quickly

  • It also offers distributed fitness function capabilities for evolutionary programming

  • It includes several MapReduce-enabled clustering implementations such as k-means, fuzzy k-means, Dirichlet, and Mean-Shift|
|MapReduce|Hadoop MapReduce is a software framework for writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

A MapReduce framework is usually composed of three operations:

  1. Map: each worker node applies the map function to its local data and writes the output to temporary storage. A master node ensures that only one copy of redundant input data is processed.
  2. Shuffle: worker nodes redistribute data based on the output keys (produced by the map function), such that all data belonging to one key is located on the same worker node.
  3. Reduce: worker nodes now process each group of output data, per key, in parallel.|
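
The three operations above can be sketched in a few lines of Python. This is only a single-process illustration of the idea, not the Hadoop API: the function names `map_phase`, `shuffle_phase` and `reduce_phase` and the sample documents are made up for this example, and a real cluster would run each phase on many worker nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: bring all values that share the same key together."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key (here, by summing them)."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input: each string stands for the local data of one worker.
documents = ["big data big clusters", "data on clusters"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

Word count is the usual introductory example because the reduce step is a simple sum per key; other jobs only change what the map and reduce functions do.
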

|Market Basket Analysis|Market Basket Analysis (MBA) is a technique widely used by marketers to identify combinations of products or services that customers frequently buy together. It is also called product association analysis. Association analysis is mostly done using the Apriori algorithm. The outcome of this analysis is a set of association rules, which marketers use to plan their recommendations.

When two or more products are purchased, Market Basket Analysis checks whether the purchase of one product increases the likelihood of the purchase of other products. Marketers use this knowledge to bundle products or to plan a cross-sell to a customer.|
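
To make the idea of an association rule concrete, here is a minimal Python sketch that scores one rule using support, confidence and lift, the measures commonly used in association analysis. It is not an implementation of the Apriori algorithm (which also searches for frequent item sets efficiently); the transactions and function names are invented for illustration.

```python
# Hypothetical baskets: each set holds the products bought in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how often `consequent` is bought anyway."""
    return confidence(antecedent, consequent) / support(consequent)

print("support(bread, milk):", support({"bread", "milk"}))
print("confidence(bread -> milk):", confidence({"bread"}, {"milk"}))
print("lift(bread -> milk):", lift({"bread"}, {"milk"}))
```

A lift above 1 suggests that buying the first product raises the likelihood of buying the second, while a lift below 1 suggests the opposite.
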
|Market Mix Modeling|Market Mix Modeling is an analytical approach that uses historical information, such as point-of-sale data, to quantify the impact of various components (such as pricing, distribution and promotion) on sales.

Suppose total sales are $100. This total can be broken into sub-components, e.g. $60 base sales, $20 from pricing, $18 from distribution and $2 from promotional activity. These numbers can be derived using various logical methods, and every method can give a different breakup. Hence, it becomes very important to standardize the process of breaking up total sales into these components. This formal technique is known as Market Mix Modeling (MMM).|
|Maximum Likelihood Estimation|Maximum likelihood estimation is a method for finding the parameter values that maximize the likelihood of the observed data. The resulting values are called maximum likelihood estimates (MLE).|
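
As a small illustration, the Python sketch below estimates the probability of heads for a coin from made-up flip data. It assumes each flip is an independent Bernoulli trial and simply tries many candidate values of the parameter, keeping the one with the highest log-likelihood; in practice a closed-form solution or a numerical optimizer would be used instead of a grid.

```python
import math

# Made-up observations: 1 = heads, 0 = tails.
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def log_likelihood(p, data):
    """Log-likelihood of independent coin flips with heads probability p."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

# Try candidate values 0.001, 0.002, ..., 0.999 and keep the best one.
candidates = [i / 1000 for i in range(1, 1000)]
p_hat = max(candidates, key=lambda p: log_likelihood(p, flips))

# For this model the estimate also has a closed form: the share of heads.
closed_form = sum(flips) / len(flips)
print(p_hat, closed_form)
```

The grid search and the closed form agree here, which is a handy sanity check when writing a likelihood function by hand.
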
|Mean|For a dataset, the mean is the average value of all the numbers. It can sometimes be used as a representation of the whole data.

For instance, if you have the marks of all the students in a class and are asked how well the class is performing, listing the marks of every single student would be impractical. Instead, you can find the mean of the class, which serves as a representative value for class performance.
To find the mean, sum all the numbers and then divide by the number of items in the set.

For example, if the numbers are 1,2,3,4,5,6,7,8,8 then the mean would be 44/9 = 4.89.|
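
The same calculation can be checked with a short Python sketch (the variable names are chosen only for this illustration):

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 8]

# Sum all the numbers, then divide by how many there are.
mean = sum(numbers) / len(numbers)

print(sum(numbers), len(numbers))  # prints 44 9
print(round(mean, 2))              # prints 4.89
```

Python's standard library also provides `statistics.mean`, which performs the same computation.
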
|Median|The median of a set of numbers is the middle value when the numbers are arranged in order. When the set contains an even number of values, the median is the average of the two middle values. The median is a measure of central tendency.

To calculate the median for a set of numbers, follow these steps:

  1. Arrange the numbers in ascending or descending order
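  2. Find the middle value: if n (the count of numbers in the set) is odd, it is the value at position (n + 1)/2; if n is even, it is the average of the values at positions n/2 and n/2 + 1|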
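
The two steps above can be sketched in Python as follows. The function name `median` and the sample inputs are chosen for illustration; note that Python lists are indexed from 0, so the positions differ by one from the 1-based positions used in the steps.

```python
def median(values):
    """Return the median of a non-empty collection of numbers."""
    if not values:
        raise ValueError("median of an empty set is undefined")
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))     # prints 5
print(median([4, 1, 3, 2]))  # prints 2.5
```
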
|MIS|A management information system (MIS) is a computer system consisting of hardware and software that serves as the backbone of an organization’s operations. An MIS gathers data from multiple online systems, analyzes the information, and reports data to aid in management decision-making.

Objectives of MIS:

  • To improve decision-making, by providing up-to-date, accurate data on a variety of organizational assets

  • To correlate multiple data points in order to strategize ways to improve operations|
|ML-as-a-Service (MLaaS)|Machine learning as a service (MLaaS) is an array of services that provide machine learning tools as part of cloud computing services. This can include tools for data visualization, facial recognition, natural language processing, image recognition, predictive analytics, and deep learning. Some of the top ML-as-a-service providers are:

  • Microsoft Azure Machine Learning Studio

  • AWS Machine Learning

  • IBM Watson Machine Learning

  • Google Cloud Machine Learning Engine

  • BigML|
|Mode|The mode is the most frequently occurring value in the population. It is a measure of central tendency, i.e. a way of expressing, in a (usually) single number, important information about a random variable or a population.

The mode can be calculated using the following steps:

  • Count the number of times each value appears
  • Take the value which appears the most

Let us understand it with an example:

Suppose we have a dataset having 10 data points, listed below:

4,5,2,8,4,7,6,4,6,3

So now we will calculate the number of times each value has appeared.

Value Count
2 1
3 1
4 3
5 1
6 2
7 1
8 1

So we see that the value 4 is repeating the most, i.e., 3 times. So, the mode of this dataset will be 4.|
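
The counting in this example can be reproduced with Python's standard `collections.Counter`; the variable names are chosen only for this sketch.

```python
from collections import Counter

data = [4, 5, 2, 8, 4, 7, 6, 4, 6, 3]

# Step 1: count how many times each value appears.
counts = Counter(data)

# Step 2: take the value with the highest count.
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # prints 4 3
```

If several values tie for the highest count, the data has more than one mode, and `most_common(1)` returns only one of them.
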
|Model Selection|Model selection is the task of selecting a statistical model from a set of known models. Various methods that can be used for choosing the model are:

  • Exploratory Data Analysis
  • Scientific Methods

Some of the criteria for selecting the model can be:

  • Akaike Information Criterion (AIC)
  • Adjusted R²
  • Bayesian Information Criterion (BIC)
  • Likelihood ratio test|
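
As an illustration of two of these criteria, the Python sketch below computes AIC and BIC from their standard definitions, AIC = 2k − 2 ln(L) and BIC = k ln(n) − 2 ln(L), where k is the number of parameters, n the number of observations and ln(L) the maximized log-likelihood. The two candidate models and their numbers are hypothetical; lower values of either criterion indicate the preferred model.

```python
import math

def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood

def bic(log_likelihood, num_params, num_observations):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return num_params * math.log(num_observations) - 2 * log_likelihood

# Hypothetical fitted models: name -> (maximized log-likelihood, parameters).
candidates = {"simple": (-120.0, 2), "complex": (-118.5, 5)}
n = 100  # assumed number of observations used to fit both models

best_by_aic = min(candidates, key=lambda name: aic(*candidates[name]))
best_by_bic = min(candidates, key=lambda name: bic(*candidates[name], n))
print(best_by_aic, best_by_bic)
```

In this made-up case the complex model fits slightly better, but not by enough to pay for its three extra parameters, so both criteria pick the simple model.
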
|Monte Carlo Simulation|The idea behind Monte Carlo Simulation is to use random samples of parameters or inputs to explore the behavior of a complex process. Monte Carlo simulations sample from a probability distribution for each variable to produce hundreds or thousands of possible outcomes. The results are analyzed to get probabilities of different outcomes occurring.|
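
A minimal Python sketch of the idea: estimate the probability that two fair dice sum to 10 or more by simulating many rolls. The problem, the function name and the fixed seed are chosen for illustration; this case is simple enough to solve exactly (6 of 36 outcomes), which makes it easy to see how close the simulation gets.

```python
import random

def estimate_probability(num_trials, seed=42):
    """Estimate P(sum of two dice >= 10) from `num_trials` simulated rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total >= 10:
            hits += 1
    return hits / num_trials

estimate = estimate_probability(100_000)
exact = 6 / 36
print(estimate, exact)
```

Increasing the number of trials generally brings the estimate closer to the exact value, at the cost of more computation.
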
|Multi-Class Classification|Problems in which the target variable has more than two classes are called multi-class classification problems.

For example, suppose the target is to predict the quality of a product, which can be excellent, good, average, fair or bad. In this case, the target variable has 5 classes, hence it is a 5-class classification problem.|
|Multivariate Analysis|Multivariate analysis is the process of comparing and analyzing the dependency of multiple variables on each other.

For example, we can perform bivariate analysis on a pair of continuous features to find the relationship between them.

|
|Multivariate Regression|Multivariate, as the word suggests, refers to ‘multiple dependent variables’. A regression model designed to deal with multiple dependent variables is called a multivariate regression model.

Consider an example: given a set of details about a student’s interests, previous subject-wise scores, etc., you want to predict the GPA for all the semesters (GPA1, GPA2, …). This problem can be addressed using multivariate regression, since we have more than one dependent variable.|