Data Mining Implementation Process
Let’s study the data mining implementation process in detail.
Business understanding:
In this phase, business and data-mining goals are established.
• First, you need to understand the business and client objectives. You need
to define what your client wants (which, many times, even the client does
not know).
• Take stock of the current data mining scenario. Factor resources,
assumptions, constraints, and other significant factors into your assessment.
• Using business objectives and current scenario, define your data mining
goals.
• A good data mining plan is very detailed and should be developed to
accomplish both business and data mining goals.
Data understanding:
In this phase, a sanity check is performed on the data to determine whether it
is appropriate for the data mining goals.
• First, data is collected from multiple data sources available in the
organization.
• These data sources may include multiple databases, flat files, or data cubes.
Issues such as object matching and schema integration can arise during the
data integration process. It is a complex and tricky process, because data
from various sources is unlikely to match easily. For example, table A
contains an attribute named cust_no whereas another table B contains an
attribute named cust-id.
Ahmed Yasir Khan Page 3 of 11
• Therefore, it is quite difficult to determine whether these two objects refer
to the same value. Metadata should be used here to reduce errors in the
data integration process.
• The next step is to explore the properties of the acquired data. A good way
to explore the data is to answer the data mining questions (decided in the
business understanding phase) using query, reporting, and visualization tools.
• Based on the query results, the data quality should be ascertained.
Missing data, if any, should be acquired.
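The integration and quality checks described above can be sketched as follows. This is only an illustration, not a prescribed method: the sample tables, the COLUMN_METADATA mapping, and the function names are hypothetical, and the sketch assumes the metadata already records that cust_no and cust-id denote the same customer key.

```python
# Hypothetical metadata: maps each source-specific column name to a shared name.
COLUMN_METADATA = {
    "cust_no": "customer_id",  # name used in table A
    "cust-id": "customer_id",  # name used in table B
}

def harmonize(rows):
    """Rename source-specific columns to the shared names from the metadata."""
    return [{COLUMN_METADATA.get(col, col): val for col, val in row.items()}
            for row in rows]

def integrate(table_a, table_b, key="customer_id"):
    """Merge two tables on the shared key, keeping rows found in either table."""
    merged = {}
    for row in harmonize(table_a) + harmonize(table_b):
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())

def missing_counts(rows, columns):
    """Count the rows in which each column is absent or None."""
    return {col: sum(1 for row in rows if row.get(col) is None)
            for col in columns}

# Illustrative data only.
table_a = [{"cust_no": 1, "name": "Ann"}, {"cust_no": 2, "name": "Bob"}]
table_b = [{"cust-id": 1, "age": 34}, {"cust-id": 3, "age": None}]

customers = integrate(table_a, table_b)
report = missing_counts(customers, ["name", "age"])
```

The report gives a first indication of data quality: columns with many missing values point to data that still has to be acquired.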
Data preparation:
In this phase, data is made production ready.
The data preparation process can consume as much as 90% of the project's time.
The data from different sources should be selected, cleaned, transformed,
formatted, anonymized, and constructed (if required).
Data cleaning is the process of “cleaning” the data by smoothing noisy data and
filling in missing values.
For example, in a customer demographics profile, the age may be missing. Such
data is incomplete and should be filled in. In some cases, there may be
outliers. For instance, age has a value of 300. Data can also be inconsistent.
For instance, the name of the same customer differs between tables.
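A minimal sketch of the age example follows. The plausible range of 0 to 120 and the choice of the median as the fill value are assumptions made for illustration; other strategies, such as acquiring the real value, may suit a given project better.

```python
from statistics import median

def clean_ages(ages, low=0, high=120):
    """Replace missing or implausible ages with the median of the valid ones.

    The bounds `low` and `high` are assumed, not taken from any standard.
    Raises statistics.StatisticsError if no valid age is present.
    """
    def is_valid(age):
        return age is not None and low <= age <= high

    fill = median(a for a in ages if is_valid(a))
    return [a if is_valid(a) else fill for a in ages]

# None marks a missing value; 300 is the outlier from the text.
ages = [25, None, 41, 300, 33]
cleaned = clean_ages(ages)
```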
Data transformation:
Data transformation operations change the data to make it useful for data
mining and contribute toward the success of the mining process. The following
transformations can be applied:
Smoothing: It helps to remove noise from the data.
Aggregation: Summary or aggregation operations are applied to the data. For
example, weekly sales data is aggregated to calculate monthly and yearly totals.
Generalization: In this step, low-level data is replaced by higher-level concepts
with the help of concept hierarchies. For example, the city is replaced by the
county.
Normalization: Normalization is performed when the attribute data are scaled up
or scaled down to fit a given range. Example: data should fall in the range
-2.0 to 2.0 after normalization.
Attribute construction: New attributes are constructed from the given set of
attributes and added to it where they are helpful for data mining.
The result of this process is a final data set that can be used in modeling.
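The transformations listed above can be sketched in a few small functions. All names, the trailing-window smoothing, the min-max form of normalization, the city-to-county table, and the revenue attribute are assumptions chosen for illustration; the text does not prescribe a particular technique for any of them.

```python
from collections import defaultdict

def moving_average(values, window=3):
    """Smoothing: replace each value by the mean of a trailing window."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def aggregate_by_month(weekly_sales):
    """Aggregation: sum (month, amount) pairs into monthly totals."""
    totals = defaultdict(float)
    for month, amount in weekly_sales:
        totals[month] += amount
    return dict(totals)

# Hypothetical concept hierarchy for generalization: city -> county.
CITY_TO_COUNTY = {"Oxford": "Oxfordshire", "Banbury": "Oxfordshire"}

def generalize(city):
    """Generalization: replace a city by its higher-level concept."""
    return CITY_TO_COUNTY.get(city, "unknown")

def min_max_scale(values, new_min=-2.0, new_max=2.0):
    """Normalization: linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map everything to the midpoint
        return [(new_min + new_max) / 2 for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def add_revenue(rows):
    """Attribute construction: derive revenue from price and quantity."""
    return [{**row, "revenue": row["price"] * row["quantity"]} for row in rows]
```

For instance, min_max_scale([0, 5, 10]) maps the values onto -2.0, 0.0, and 2.0, which matches the range used in the normalization example.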
Modelling:
In this phase, mathematical models are used to determine data patterns.
• Based on the business objectives, suitable modeling techniques should be
selected for the prepared dataset.
• Create a scenario to test the quality and validity of the model.
• Run the model on the prepared dataset.
• Results should be assessed by all stakeholders to make sure that the model
can meet the data mining objectives.
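One common way to set up a test scenario is to hold out part of the prepared dataset and measure the model on it. The sketch below assumes this holdout approach and uses a deliberately trivial majority-class model as a stand-in for whatever technique is selected; the data, the labels, and all function names are hypothetical.

```python
import random
from collections import Counter

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_majority_class(train_rows):
    """Stand-in model: always predict the most common training label."""
    counts = Counter(label for _features, label in train_rows)
    majority = counts.most_common(1)[0][0]
    return lambda features: majority

def accuracy(model, test_rows):
    """Fraction of held-out rows whose label the model predicts correctly."""
    hits = sum(model(features) == label for features, label in test_rows)
    return hits / len(test_rows)

# Illustrative data: (features, label) pairs, one in four labelled "churn".
data = [((i,), "churn" if i % 4 == 0 else "stay") for i in range(40)]

train, test = train_test_split(data)
model = fit_majority_class(train)
score = accuracy(model, test)
```

The resulting score is the kind of figure stakeholders can compare against the data mining objectives; a real project would replace the stand-in model and would usually report more than one metric.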
Evaluation:
In this phase, patterns identified are evaluated against the business objectives.
• Results generated by the data mining model should be evaluated against the
business objectives.
• Gaining business understanding is an iterative process. In fact, while the
results are being evaluated, new business requirements may be raised because
of what data mining has revealed.
• A go or no-go decision is taken on moving the model to the deployment phase.
Deployment:
In the deployment phase, you ship your data mining discoveries to everyday
business operations.
• The knowledge or information discovered during the data mining process
should be made easy to understand for non-technical stakeholders.
• A detailed deployment plan for shipping, maintenance, and monitoring of
data mining discoveries is created.
• A final project report is created with lessons learned and key experiences
during the project. This helps to improve the organization’s business policy.