An example of an everyday decision problem that resists a manually defined solution is the discrimination of spam email from non-spam email.
How would you write a program to filter emails as they come into your email account and decide whether to put them in the spam folder or the inbox folder?
Some of my thoughts on how to do this are:
- I’d collect examples of emails I knew to be spam or non-spam
- I’d read the emails I had collected and write down any patterns I saw in either group
- I’d think about abstracting those patterns into more general rules I could program
- I’d look for emails that I could safely and quickly categorize as either spam or non-spam
- I’d write tests for my program to ensure it was making accurate decisions
- I’d monitor the deployed system and keep an eye on the decisions it was making
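To make the manual approach concrete, here is a minimal Python sketch of the kind of hand-coded filter those steps might produce. The phrases in `SPAM_PHRASES` and the `threshold` parameter are hypothetical. They were invented for illustration and not drawn from real email data.

```python
# Hypothetical hand-written rules: each phrase stands in for a pattern
# "noticed" while reading collected emails. Not real, tuned rules.
SPAM_PHRASES = ["free money", "act now", "winner", "click here"]


def is_spam(email_text: str, threshold: int = 1) -> bool:
    """Return True if the email contains at least `threshold` of the hand-coded phrases."""
    text = email_text.lower()
    hits = sum(phrase in text for phrase in SPAM_PHRASES)
    return hits >= threshold


# Simple tests, in the spirit of the "write tests" step.
assert is_spam("You are a WINNER! Click here to claim.")
assert not is_spam("Meeting moved to 3pm tomorrow.")
```

Every new trick spammers invent means another hand-edited entry in `SPAM_PHRASES`, and every legitimate email that happens to contain "winner" becomes a false positive to work around.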
I could write a program to do this, and so could you. It would take a lot of time. A lot of emails would have to be read. The problem would need to be thought about very deeply. It would take a lot of development and testing time before the system could be trusted enough to be put into operation. Once in operation, there would be so many hard-coded rules specific to the emails I had read that it would be a maintenance nightmare.
The process above also describes a machine learning solution to the problem of discriminating spam email from non-spam email. The punch line is that machine learning methods can automate the process for you, learning the patterns from labeled examples rather than from rules you write by hand.
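As an illustration of learning from examples instead of writing rules, here is a minimal sketch of one common technique, a multinomial naive Bayes classifier, written from scratch in Python. It is not the only or necessarily the best method, and it rests on simplifying assumptions: whitespace tokenization, two labels named `"spam"` and `"ham"`, add-one (Laplace) smoothing, and training data that contains at least one example of each label. The four training emails are made up.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


def train(examples):
    """examples: list of (text, label) pairs; labels assumed to be "spam" or "ham"."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def predict(model, text):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        # Log prior: fraction of training emails with this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so unseen (word, label) pairs don't zero the score.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data: the "collect labeled examples" step.
training = [
    ("win free money now", "spam"),
    ("click here to claim your prize", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch tomorrow at noon", "ham"),
]
model = train(training)
print(predict(model, "claim your free prize"))  # → spam
```

The "patterns" here are word frequencies estimated from the labeled emails, so adapting to new spam means adding labeled examples and retraining rather than editing rules by hand.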
Pro Tip: Approaching complex problems in this way is an incredibly valuable skill that will serve you well later on in preparing data and selecting the right machine learning method. Thinking through the process of “how would I manually write a program to solve this” is a master skill that is often overlooked and forgotten by professionals.