python - scikit-learn - Gaussian Naive Bayes implementation
I have started using scikit-learn and I am trying to train and predict with a Gaussian Naive Bayes classifier. I don't really know what I'm doing wrong and would appreciate some help.
The problem: I enter x quantity of items of type 1 and I get a response of type 0.
How I did it: in order to generate the training data I do this:
# this one is of type 1
ganado = {
    "hora": "16:43:35",
    "fecha": "19/06/2015",
    "tiempo": 10,
    "brazos": "der",
    "sentado": "no",
    "puntuacion final pasteles": 50,
    "nombre": "usuario1",
    "puntuacion final botellas": 33
}
# this one is of type 0
perdido = {
    "hora": "16:43:35",
    "fecha": "19/06/2015",
    "tiempo": 10,
    "brazos": "der",
    "sentado": "no",
    "puntuacion final pasteles": 4,
    "nombre": "usuario1",
    "puntuacion final botellas": 3
}

train = []
for repeticion in range(0, 400):
    train.append(ganado)
for repeticion in range(0, 1):
    train.append(perdido)
I label the data with a weak condition:
listlabel = []
for data in train:
    condition = data["puntuacion final pasteles"] + data["puntuacion final botellas"]
    if condition < 20:
        listlabel.append(0)
    else:
        listlabel.append(1)
And I generate the test data like this:
# this one should be type 1
pruebaganado = {
    "hora": "16:43:35",
    "fecha": "19/06/2015",
    "tiempo": 10,
    "brazos": "der",
    "sentado": "no",
    "puntuacion final pasteles": 10,
    "nombre": "usuario1",
    "puntuacion final botellas": 33
}
# this one should be type 0
pruebaperdido = {
    "hora": "16:43:35",
    "fecha": "19/06/2015",
    "tiempo": 10,
    "brazos": "der",
    "sentado": "no",
    "puntuacion final pasteles": 2,
    "nombre": "usuario1",
    "puntuacion final botellas": 3
}

test = []
for repeticion in range(0, 420):
    test.append(pruebaganado)
    test.append(pruebaperdido)
After that, I use train and listlabel to train the classifier:
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import GaussianNB

vec = DictVectorizer()
x = vec.fit_transform(train)
gnb = GaussianNB()
trained = gnb.fit(x.toarray(), listlabel)
Once I have the trained classifier, I use it on the test data:
testx = vec.fit_transform(test)
predicted = trained.predict(testx.toarray())
Finally, the results are all 0. Can you tell me what I did wrong and how to fix it, please?
First of all, since your data has features that are not informative (they take the same value in every sample), I cleaned it up a bit:
ganado = {"a": 50, "b": 33}
perdido = {"a": 4, "b": 3}
pruebaganado = {"a": 10, "b": 33}
pruebaperdido = {"a": 2, "b": 3}
All the rest is not important, and cleaning up the code lets us focus on what actually counts.
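To reproduce the numbers below, here is a minimal sketch of the refit on the cleaned dicts (my reconstruction of the glue code, assuming the same DictVectorizer/GaussianNB setup as in the question; the names vec and model are the ones used in the snippets that follow, and get_feature_names_out() is called get_feature_names() on older scikit-learn releases):

from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import GaussianNB

# rebuild the 400:1 training set with the cleaned dicts
train = [ganado] * 400 + [perdido]
listlabel = [1] * 400 + [0]

vec = DictVectorizer()
x = vec.fit_transform(train)
print(vec.get_feature_names_out())  # ['a', 'b']: numeric values pass through as one column each
                                    # (string values would become one-hot indicator columns)

model = GaussianNB()
model.fit(x.toarray(), listlabel)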
Now, about the Gaussian Naive Bayes probability: as you may notice, the classifier is essentially telling you that:
P((a,b)=(10,33) | class=0) * P(class=0)  >  P((a,b)=(10,33) | class=1) * P(class=1)
because it assumes that both a and b follow a normal distribution, and since those likelihoods come out so low in this case, the priors you gave it (1 to 400) become negligible. You can see the formula here. By the way, you can get the exact probabilities:
t = [pruebaganado, pruebaperdido]
t = vec.fit_transform(t)
print(model.predict_proba(t.toarray()))
# prints:
# [[ 1.  0.]
#  [ 1.  0.]]
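Those extreme probabilities come straight from the per-feature Gaussians the model fitted. A minimal sketch for inspecting them (attribute names from current scikit-learn; var_ is called sigma_ on older releases):

# per feature and per class, GaussianNB evaluates a normal density:
#   P(x | class) = 1 / sqrt(2*pi*var) * exp(-(x - mean)**2 / (2*var))
print(model.theta_)        # per-class feature means: about (4, 3) for class 0 and (50, 33) for class 1
print(model.var_)          # per-class feature variances: essentially zero here (only the smoothing term)
print(model.class_prior_)  # class priors: [1/401, 400/401]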
So the classifier is absolutely sure that 0 is the right class. Now, let's change the test data a bit and raise the first score from 10 to 20:
pruebaganado = {"a": 20, "b": 33}
Now we get:
[[ 0.  1.]
 [ 1.  0.]]
So you did nothing wrong; it is simply a matter of how the calculation works out. By the way, I challenge you to replace GaussianNB with MultinomialNB and see how the priors change everything.
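A minimal sketch of that swap, reusing x, listlabel and t from the snippets above (MultinomialNB expects non-negative, count-like features, which these scores are):

from sklearn.naive_bayes import MultinomialNB

mnb = MultinomialNB()
mnb.fit(x.toarray(), listlabel)
print(mnb.predict_proba(t.toarray()))  # compare how much weight the 400:1 prior carries here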
Also, unless you have a specific reason to use GaussianNB here, consider some kind of tree-based classifier, which in my opinion may suit this problem better.
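For example, a minimal sketch with DecisionTreeClassifier (just one possible tree-based choice), again reusing x, listlabel and t from above:

from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier()
tree.fit(x.toarray(), listlabel)
print(tree.predict(t.toarray()))  # splits on feature thresholds instead of fitting per-class Gaussians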