Creating and Training a Model
Creating the model
In your notebook, create a RandomForestRegressor model and chain it after the full_pipe pipeline that you created earlier, so that both form a single pipeline.
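from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

n_estimators = 100  # number of trees in the forest
max_depth = 6       # maximum depth of each tree
max_features = 3    # number of features considered at each split

# Chain the preprocessing pipeline and the regressor into one estimator
rf = make_pipeline(
    full_pipe,
    RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth, max_features=max_features),
)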
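As a rough, self-contained sketch of how such a pipeline is assembled, the example below uses a hypothetical stand-in for full_pipe (a ColumnTransformer that scales three made-up columns). The column names are assumptions for illustration only and are not the ones built earlier in this tutorial.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the tutorial's full_pipe: scale three numeric columns.
# The column names are invented for this sketch.
full_pipe = ColumnTransformer(
    [("num", StandardScaler(), ["amount", "age", "hour"])]
)

rf = make_pipeline(
    full_pipe,
    RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3),
)

# make_pipeline names each step after its lowercased class name.
step_names = [name for name, _ in rf.steps]
print(step_names)
```

The step name 'columntransformer' is the same name that appears at the start of the Pipeline output when the model is trained.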
Training the model
To train the model, call its fit method in your notebook. The transforms in the pipeline are fitted and applied to the training data before the model is trained:
rf.fit(X_train, y_train)
Output:
Pipeline(steps=[('columntransformer',
...
...
...
)]
)
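To illustrate what "the transforms are applied before the model is trained" means, here is a minimal sketch on synthetic data. It assumes a StandardScaler as a stand-in for the tutorial's preprocessing and compares fitting the pipeline with doing the two steps by hand. The data, the scaler, and the fixed random_state are assumptions for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data (an assumption; not the tutorial's dataset).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(float)

params = dict(n_estimators=20, max_depth=6, max_features=3, random_state=0)

# Option 1: the pipeline transforms the data, then trains the model.
pipe = make_pipeline(StandardScaler(), RandomForestRegressor(**params))
pipe.fit(X_train, y_train)

# Option 2: the same two steps performed manually.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)
model = RandomForestRegressor(**params)
model.fit(X_scaled, y_train)

# With the same random_state, both routes should give matching predictions.
same = np.allclose(pipe.predict(X_train), model.predict(scaler.transform(X_train)))
print(same)
```

Keeping the transforms inside the pipeline means the same preprocessing is reused automatically when predict is called later.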
Make predictions using the testing data
To get predictions, call the model's predict() method in your notebook:
predictions = rf.predict(X_test)
print(predictions)
Sample Output:
[0.32 0.36]
Each prediction value will be between 0 and 1. The higher the value, the higher the estimated probability of the transaction being fraudulent. The value can also be exactly 0 or 1, because the RandomForestRegressor model averages the outputs of its individual trees, and all of the trees may agree.
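The following sketch shows why the values stay in this range, assuming the training labels are 0 (not fraudulent) and 1 (fraudulent). A RandomForestRegressor prediction is the mean of the predictions of its trees, so it cannot leave the range spanned by the labels. The synthetic data here is an assumption for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with 0/1 labels (an assumption; not the tutorial's dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0.5).astype(float)

model = RandomForestRegressor(n_estimators=25, max_depth=6, random_state=0)
model.fit(X, y)

preds = model.predict(X)

# The forest's prediction is the average of the individual tree predictions.
tree_mean = np.mean([tree.predict(X) for tree in model.estimators_], axis=0)

print(preds.min(), preds.max())
print(np.allclose(preds, tree_mean))
```

If you need a hard fraudulent or not-fraudulent decision, you would compare each value against a threshold of your choosing. The tutorial does not prescribe one.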
Get the error in the prediction
Compute the error between the predictions made by the model (which was trained on the training data) and the actual values in the testing data (y_test):
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, predictions)  # lower values mean a better fit
print(mse)
Sample Output:
0.11599999999999999
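The mean squared error is the average of the squared differences between the actual and predicted values. As a small worked sketch, the example below takes the sample predictions [0.32 0.36] shown earlier and assumes that both actual labels are 0. The tutorial does not show y_test, so these labels are an assumption chosen for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

predictions = np.array([0.32, 0.36])  # sample predictions from this tutorial
y_test = np.array([0.0, 0.0])         # assumed labels; not shown in the tutorial

# MSE computed by hand: mean of the squared differences.
mse_manual = np.mean((y_test - predictions) ** 2)

# The same quantity computed by scikit-learn.
mse_sklearn = mean_squared_error(y_test, predictions)

print(mse_manual, mse_sklearn)
```

Under that assumption the result is (0.32² + 0.36²) / 2 = (0.1024 + 0.1296) / 2 ≈ 0.116, which is consistent with the sample output above.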
Iterating on your model
Based on the prediction error of the trained model, you may choose to iterate on the model by updating your features, using different model parameters, or choosing a different model. Iterating on the model is not covered in this tutorial.