Using Your Models Outside of AstronomicAL

Loading your model

The other tutorials show how you can use AstronomicAL to create accurate and robust classifiers using significantly less training data than normally required. This tutorial will explain the steps that will allow you to use your trained classifiers outside of AstronomicAL.

After every iteration of active learning or whenever you press the checkpoint button during training, your model is saved inside models/. As the models are created using Scikit Learn it is a quick and straightforward process to reload the models and use them on any new data.

To load your model in another python file or jupyter notebook run the following code:

1from joblib import load
2clf = load('models/your_classifier_model.joblib')

You can then simple run clf.predict(some_new_data) to generate new predictions. If you want the probability outputs of the predictions you can run clf.predict_proba(some_new_data) instead.

Note

You must use data that contains the same number of features that you trained with your model. This includes any feature combinations that you generated in the settings.

If you do not use the same amount of features, you will receive the following error:

ValueError: X has # features, but Classifier is expecting # features as input.

Loading a Committee

If you created a committee of classifiers during training, you would need to continue using the committee as a whole to get the same performance. When your committee is saved, rather than just saving as a single model, each individual model is saved and bundled into a directory within data/.

By default, the committees in AstronomicAL return the class probabilities averaged across each learner (known as consensus probabilities) and then the maximum probability is chosen to assign the prediction. It is recommended to recreate this behaviour outside of AstronomicAL to ensure that the model performs as expected.

 1from joblib import load
 2import glob
 3import numpy as np
 4
 5classifiers = []
 6
 7committee_dir = "models/committee_dir"
 8
 9classifiers = {} # Dictionary to hold all the individual classifiers in the committee
10for i, filename in enumerate(glob.iglob(f"{committee_dir}/*.joblib")):
11
12    # This is only required if you are using a scalar during training
13    if "SCALER" in filename:
14        continue
15
16    classifiers[f"{i}"] = load(filename)
17
18for i, clf in enumerate(classifiers):
19    pred_clf = classifiers[clf].predict_proba(some_new_data) # get the probability output for each classifier
20    if i == 0:
21        predictions_total = pred_clf
22    else:
23        predictions_total = predictions_total + pred_clf
24
25predictions_avg = predictions_total / len(classifiers.keys()) # This is the averaged probabilities for each class
26predictions = np.argmax(predictions_avg, axis=1) # This is the final predictions made

Scaling New Data

During the settings assignment, you had the option to scale your features. If this option was selected, you must scale any new data with the same scaler. Whenever your model is saved, the scaler (if used) is saved along with your updated models. The code below explains how to apply this scaler to your new data.

scaler = load("models/your_classifier_model-SCALER.joblib")

new_data_scaled = scaler.transform(some_new_data)