In the code below, we’ll show the following:
We can search for the best scaler. Instead of using only StandardScaler(), we can also try MinMaxScaler(), Normalizer() and MaxAbsScaler().
We can search for the best variance threshold to use in the selector, i.e., VarianceThreshold().
We can search for the best value of k for the KNeighborsClassifier().
The parameters variable below is a dictionary of key:value pairs. Each key must be written with a double underscore __ separating the name of the step that we chose in the Pipeline() from the name of that step's parameter. Note the following:
The scaler has no double underscore, as we have specified a list of objects there.
We search for the best threshold for the selector, i.e., VarianceThreshold(), so we have specified a list of values [0, 0.001, 0.01] to choose from.
Different values are specified for the n_neighbors, p and leaf_size parameters of the KNeighborsClassifier().
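As a quick check on the naming rule above, a pipeline can list every key it accepts through get_params(). The sketch below is only an illustration: it assumes the pipeline steps are named scaler, selector and classifier (inferred from the keys used in the parameters dictionary), and the StandardScaler() placed in the scaler step is likewise an assumption.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold
from sklearn.neighbors import KNeighborsClassifier

# Assumed pipeline: the step names are inferred from the keys in the parameter grid.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('selector', VarianceThreshold()),
    ('classifier', KNeighborsClassifier()),
])

# get_params() returns the step names themselves (e.g. 'scaler') as well as
# the nested '<step>__<parameter>' names (e.g. 'selector__threshold').
valid_keys = sorted(pipe.get_params().keys())
for key in valid_keys:
    print(key)
```

Any key in the parameter grid that does not appear in this list would be rejected when the search is fitted, so printing the list is a cheap way to catch typos in step or parameter names.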
…
parameters = {'scaler': [StandardScaler(), MinMaxScaler(),
                         Normalizer(), MaxAbsScaler()],
              'selector__threshold': [0, 0.001, 0.01],
              'classifier__n_neighbors': [1, 3, 5, 7, 10],
              'classifier__p': [1, 2],
              'classifier__leaf_size': [1, 5, 10, 15]
              }
The pipe, along with the above dictionary of parameters, is then passed to a GridSearchCV() object, which searches the parameter space for the best set of parameters, as shown below:
…
grid = GridSearchCV(pipe, parameters, cv=2).fit(X_train, y_train)
print('Training set score: ' + str(grid.score(X_train, y_train)))
print('Test set score: ' + str(grid.score(X_test, y_test)))
Running the example, you should see the following:
Training set score: 0.8928571428571429
Test set score: 0.8571428571428571
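Once the search has finished, the fitted GridSearchCV object also records which combination of parameters won. The following is a minimal sketch of how that could be inspected. It assumes a synthetic dataset from make_classification(), the step names scaler, selector and classifier, and a reduced grid so that it runs quickly, so its scores will not match the ones above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in data (an assumption; not the dataset used in this tutorial).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assumed pipeline with the step names used by the parameter keys.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('selector', VarianceThreshold()),
    ('classifier', KNeighborsClassifier()),
])

# Reduced grid, kept small so that the sketch runs in a few seconds.
parameters = {'scaler': [StandardScaler(), MinMaxScaler()],
              'selector__threshold': [0, 0.001, 0.01],
              'classifier__n_neighbors': [1, 3, 5]}

grid = GridSearchCV(pipe, parameters, cv=2).fit(X_train, y_train)

# Winning parameter combination and its mean cross-validated score.
best_params = grid.best_params_
print(best_params)
print(grid.best_score_)

# The pipeline refitted on the whole training set with the winning parameters.
print(grid.best_estimator_)
```

Because refit is enabled by default, grid.score() and grid.predict() delegate to this refitted best_estimator_, which is why the grid object can be scored directly on the test set.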