python - scikit-learn: Gaussian Naive Bayes implementation


I have started using scikit-learn and I am trying to train a Gaussian Naive Bayes classifier and predict with it. I don't know what I'm doing wrong and would appreciate it if someone could help me.

Problem: I enter X items of type 1 and the response I get back is type 0.

How I did it: to generate the training data I do this:

    # this one is of type 1
    ganado = {
        "hora": "16:43:35",
        "fecha": "19/06/2015",
        "tiempo": 10,
        "brazos": "der",
        "sentado": "no",
        "puntuacion final pasteles": 50,
        "nombre": "usuario1",
        "puntuacion final botellas": 33,
    }
    # this one is of type 0
    perdido = {
        "hora": "16:43:35",
        "fecha": "19/06/2015",
        "tiempo": 10,
        "brazos": "der",
        "sentado": "no",
        "puntuacion final pasteles": 4,
        "nombre": "usuario1",
        "puntuacion final botellas": 3,
    }
    train = []
    for repeticion in range(0, 400):
        train.append(ganado)

    for repeticion in range(0, 1):
        train.append(perdido)

I label the data with this weak condition:

    listlabel = []
    for data in train:
        condition = data["puntuacion final pasteles"] + data["puntuacion final botellas"]
        if condition < 20:
            listlabel.append(0)
        else:
            listlabel.append(1)

And I generate the test data like this:

    # this one should be type 1
    pruebaganado = {
        "hora": "16:43:35",
        "fecha": "19/06/2015",
        "tiempo": 10,
        "brazos": "der",
        "sentado": "no",
        "puntuacion final pasteles": 10,
        "nombre": "usuario1",
        "puntuacion final botellas": 33,
    }
    # this one should be type 0
    pruebaperdido = {
        "hora": "16:43:35",
        "fecha": "19/06/2015",
        "tiempo": 10,
        "brazos": "der",
        "sentado": "no",
        "puntuacion final pasteles": 2,
        "nombre": "usuario1",
        "puntuacion final botellas": 3,
    }
    test = []
    for repeticion in range(0, 420):
        test.append(pruebaganado)
        test.append(pruebaperdido)

After that, I use train and listlabel to train the classifier:

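    from sklearn.feature_extraction import DictVectorizer
    from sklearn.naive_bayes import GaussianNB

    vec = DictVectorizer()
    x = vec.fit_transform(train)
    gnb = GaussianNB()
    trained = gnb.fit(x.toarray(), listlabel)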

Once I have the trained classifier, I use the test data:

    testx = vec.fit_transform(test)
    predicted = trained.predict(testx.toarray())

In the end, all the results are 0. Can you tell me what I did wrong and how to fix it, please?

First of all, since your data has features that are not informative (the same value for all data points), I cleaned it up a bit:

    ganado = {
        "a": 50,
        "b": 33,
    }
    perdido = {
        "a": 4,
        "b": 3,
    }
    pruebaganado = {
        "a": 10,
        "b": 33,
    }
    pruebaperdido = {
        "a": 2,
        "b": 3,
    }

All the rest is not important, and cleaning up your code will help you focus on what counts.

Now, Gaussian Naive Bayes is all about probability. As you may notice, the classifier is trying to tell you that:

    P((a,b)=(10,33)|class=0)*P(class=0)   >   P((a,b)=(10,33)|class=1)*P(class=1)

because it assumes that both a and b are normally distributed, and the probabilities in this case are so low that the priors you gave it, (1, 400), are negligible. You can see the formula itself here. By the way, you can get the exact probabilities:

    t = [pruebaganado, pruebaperdido]
    t = vec.fit_transform(t)
    # model is the fitted GaussianNB classifier
    print(model.predict_proba(t.toarray()))
    # prints:
    # [[ 1.  0.]
    #  [ 1.  0.]]
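To see roughly where those probabilities come from, here is a worked version of the comparison for the first test point. It is a sketch under an assumption: every training point within a class is identical, so each class's sample variance is zero and only a small stabilizing term is left, which I treat as one shared, very small variance \sigma^2 for both features and both classes. The symbols \mu_{c,j} (mean of feature j in class c) and \sigma^2 are notation introduced here, not names from the code.

```latex
\begin{align*}
\log\big[P(x \mid c)\,P(c)\big]
  &= \log P(c) - \sum_{j \in \{a,b\}}
     \left[ \tfrac{1}{2}\log(2\pi\sigma^2)
            + \frac{(x_j - \mu_{c,j})^2}{2\sigma^2} \right] \\[4pt]
% test point x = (10, 33); class means are (4, 3) for class 0 and (50, 33) for class 1
\sum_j (x_j - \mu_{0,j})^2 &= (10-4)^2 + (33-3)^2 = 36 + 900 = 936 \\
\sum_j (x_j - \mu_{1,j})^2 &= (10-50)^2 + (33-33)^2 = 1600 + 0 = 1600 \\[4pt]
% what the 400-to-1 prior contributes in favour of class 1
\log\frac{P(c=1)}{P(c=0)} &= \log 400 \approx 5.99
\end{align*}
```

Class 0 is ahead by (1600 - 936) / (2\sigma^2) in the likelihood term. With a tiny \sigma^2 that gap is enormous compared with the 5.99 that the prior adds for class 1, which is why the output is a flat [1, 0].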

So the classifier is sure that 0 is the right class. Now, let's change the test data a bit:

    pruebaganado = {
        "puntuacion final pasteles": 20,
        "puntuacion final botellas": 33,
    }

Now you have:

    [[ 0.  1.]
     [ 1.  0.]]
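Here is a minimal sketch that reproduces both results end to end with the cleaned-up a/b data. The names `labels`, `model` and `predict` are mine, for illustration only. It also calls `vec.transform` on the test dicts instead of `fit_transform`, so the feature columns learned from the training data are reused; with these dicts the columns come out the same either way.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import GaussianNB

# Cleaned-up training data: 400 "ganado" samples (type 1), 1 "perdido" sample (type 0)
ganado = {"a": 50, "b": 33}
perdido = {"a": 4, "b": 3}
train = [ganado] * 400 + [perdido]
labels = [1] * 400 + [0]  # same result as the a + b < 20 labelling rule

vec = DictVectorizer(sparse=False)  # dense output, so no .toarray() is needed
model = GaussianNB().fit(vec.fit_transform(train), labels)


def predict(samples):
    """Vectorize with the already-fitted vocabulary and return the predicted classes."""
    return model.predict(vec.transform(samples)).tolist()


# Original test pair, then the same pair with "a" raised from 10 to 20
pred_10 = predict([{"a": 10, "b": 33}, {"a": 2, "b": 3}])
pred_20 = predict([{"a": 20, "b": 33}, {"a": 2, "b": 3}])
print(pred_10, pred_20)
```

The first pair should both come out as class 0, and raising a to 20 should flip the first sample to class 1, matching the probabilities shown above.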

So you did nothing wrong; it is all a matter of calculation. By the way, I challenge you to replace GaussianNB with MultinomialNB and see how the priors change it all.
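A quick sketch of that challenge, assuming the cleaned-up a/b data and MultinomialNB's default smoothing. The variable names are illustrative, and treating the scores as counts is an assumption of the multinomial model rather than something the question states.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

# Same cleaned-up data as before: 400 samples of type 1, 1 sample of type 0
train = [{"a": 50, "b": 33}] * 400 + [{"a": 4, "b": 3}]
labels = [1] * 400 + [0]
test = [{"a": 10, "b": 33}, {"a": 2, "b": 3}]

vec = DictVectorizer(sparse=False)
X_train = vec.fit_transform(train)
X_test = vec.transform(test)  # reuse the columns learned from the training data

mnb = MultinomialNB().fit(X_train, labels)
mnb_pred = mnb.predict(X_test).tolist()

print(mnb.class_log_prior_)       # log of the 1:400 class frequencies
print(mnb_pred)
print(mnb.predict_proba(X_test))  # one row per test sample, columns are classes 0 and 1
```

Here the per-feature likelihoods of the two classes are close to each other, so the 400-to-1 prior carries real weight and the predictions lean toward class 1, even for the sample that looks like perdido.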

Also, unless you have a very good reason to use GaussianNB here, I would consider using some kind of tree classification, which in my opinion may suit your problem better.
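A rough sketch of the tree idea, under assumptions of mine: the training set below is a hypothetical grid of every integer score pair from 0 to 50, labelled with the question's own rule (a sum below 20 is type 0). None of this grid comes from the question; it is only there so that the tree has varied examples to split on.

```python
from itertools import product

from sklearn.tree import DecisionTreeClassifier

# Hypothetical training grid: every (pasteles, botellas) score pair in 0..50
points = [list(pair) for pair in product(range(51), repeat=2)]
# The question's labelling rule: sum of both scores below 20 -> type 0, else type 1
labels = [0 if a + b < 20 else 1 for a, b in points]

tree = DecisionTreeClassifier(random_state=0).fit(points, labels)

# The two test samples from the question, as [pasteles, botellas]
tree_pred = tree.predict([[10, 33], [2, 3]]).tolist()
print(tree_pred)
```

A tree learns threshold splits on the scores directly and makes no normality assumption, so a point like (10, 33) is judged by which side of the learned thresholds it falls on rather than by its distance to a class mean.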

