machine learning - Reducing the false-positive rate of a scikit-learn random forest
I am working with the scikit-learn random forest classifier and want to reduce the false-positive (FP) rate by increasing the share of trees needed for a successful vote from more than 50% to, say, 75%. After reading the documentation I am not sure how to do this. Does anyone have suggestions? (I think there should be a way, because according to the documentation the predict method of the classifier decides based on a majority vote.) Any help is appreciated, thanks!
Let's say you have a classifier that requires 75% agreement among its estimators. If it gets a new sample and the odds are 51%-49% in favour of one class, what do you want it to do?
The reason the 50% rule is used is that the decision rule you propose may lead to cases where the classifier says "I cannot predict the label of these samples".
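To make the 51%-49% case concrete, here is a minimal sketch in plain NumPy. The helper name `vote_with_quorum` and the hand-made vote array are hypothetical and only for illustration; nothing here is part of scikit-learn.

```python
import numpy as np


def vote_with_quorum(votes, quorum=0.75):
    """Return the winning class (0 or 1) if its share of the votes
    reaches `quorum`, otherwise None (no decision)."""
    share = np.bincount(votes, minlength=2) / len(votes)
    winner = int(np.argmax(share))
    return winner if share[winner] >= quorum else None


# 100 hypothetical tree votes: 51 for class 1, 49 for class 0.
votes = np.array([1] * 51 + [0] * 49)

print(vote_with_quorum(votes, quorum=0.5))   # class 1 wins under the 50% rule
print(vote_with_quorum(votes, quorum=0.75))  # no class reaches 75%, so None
```

With a 75% quorum the sample gets no label at all, which is the situation described above.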
What you can do is wrap the results of the classifier and do whatever calculations you wish:
    import numpy as np
    from sklearn import datasets
    from sklearn.ensemble import RandomForestClassifier


    def my_decision_function(arr):
        # arr holds the two class probabilities of each sample (binary case).
        diff = np.abs(arr[:, 0] - arr[:, 1])
        # If diff > 0.5, one class has a probability of more than 0.75;
        # otherwise the sample is marked as undecided.
        arr[diff < 0.5] = [-1, -1]
        return arr


    X, y = datasets.make_classification(n_samples=100000, n_features=20,
                                        n_informative=2, n_redundant=2)

    train_samples = 100  # samples used for training the models

    X_train = X[:train_samples]
    X_test = X[train_samples:]
    y_train = y[:train_samples]
    y_test = y[train_samples:]

    clf = RandomForestClassifier().fit(X_train, y_train)
    print(my_decision_function(clf.predict_proba(X_train)))
Now, each sample for which neither class reaches a probability of 0.75 gets a [-1, -1] prediction. Some adjustments must be made if you use multi-label classification (or more than two classes), but I hope the idea is clear.
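If the goal is only to cut false positives, and not to leave samples unlabeled, one variant is to predict the positive class only when its probability reaches 0.75 and fall back to the negative class otherwise. The sketch below assumes a binary problem with labels 0 and 1; the helper `predict_with_threshold`, the dataset sizes and the `random_state` values are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def predict_with_threshold(clf, X, threshold=0.75):
    """Predict class 1 only when its predicted probability reaches `threshold`."""
    proba_pos = clf.predict_proba(X)[:, 1]  # column 1 belongs to clf.classes_[1]
    return (proba_pos >= threshold).astype(int)


X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=2, n_redundant=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

default_pred = clf.predict(X_test)                       # usual 50% rule
strict_pred = predict_with_threshold(clf, X_test, 0.75)  # 75% rule

# In a binary confusion matrix, entry [0, 1] counts the false positives.
fp_default = confusion_matrix(y_test, default_pred)[0, 1]
fp_strict = confusion_matrix(y_test, strict_pred)[0, 1]
print("false positives at 0.50:", fp_default)
print("false positives at 0.75:", fp_strict)
```

Every sample labeled positive under the stricter rule is also labeled positive under the default rule, so the false-positive count cannot go up. The price is that some true positives are now predicted as negative.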