python - scikit-learn: Gaussian Naive Bayes Implementation -


I have started using scikit-learn and I am trying to train a Gaussian Naive Bayes classifier and predict with it. I don't know what I'm doing wrong, and I would appreciate it if someone could help me.

Problem: I enter X items of type 1 and get the response that they are of type 0.

How I did it: to generate the training data I do this:

    # This is of type 1
    ganado = {
        "hora": "16:43:35",
        "fecha": "19/06/2015",
        "tiempo": 10,
        "brazos": "der",
        "sentado": "no",
        "puntuacion final pasteles": 50,
        "nombre": "usuario1",
        "puntuacion final botellas": 33,
    }
    # This is of type 0
    perdido = {
        "hora": "16:43:35",
        "fecha": "19/06/2015",
        "tiempo": 10,
        "brazos": "der",
        "sentado": "no",
        "puntuacion final pasteles": 4,
        "nombre": "usuario1",
        "puntuacion final botellas": 3,
    }
    train = []
    for repeticion in range(0, 400):
        train.append(ganado)
    for repeticion in range(0, 1):
        train.append(perdido)

I label the data with a weak condition:

    listlabel = []
    for data in train:
        # Total score under 20 counts as type 0, anything else as type 1
        condition = data["puntuacion final pasteles"] + data["puntuacion final botellas"]
        if condition < 20:
            listlabel.append(0)
        else:
            listlabel.append(1)

And I generate the test data like this:

    # This should be of type 1
    pruebaganado = {
        "hora": "16:43:35",
        "fecha": "19/06/2015",
        "tiempo": 10,
        "brazos": "der",
        "sentado": "no",
        "puntuacion final pasteles": 10,
        "nombre": "usuario1",
        "puntuacion final botellas": 33,
    }
    # This should be of type 0
    pruebaperdido = {
        "hora": "16:43:35",
        "fecha": "19/06/2015",
        "tiempo": 10,
        "brazos": "der",
        "sentado": "no",
        "puntuacion final pasteles": 2,
        "nombre": "usuario1",
        "puntuacion final botellas": 3,
    }
    test = []
    for repeticion in range(0, 420):
        test.append(pruebaganado)
        test.append(pruebaperdido)

After that, I use train and listlabel to train the classifier:

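    from sklearn.feature_extraction import DictVectorizer
    from sklearn.naive_bayes import GaussianNB

    vec = DictVectorizer()
    X = vec.fit_transform(train)  # sparse matrix; strings are one-hot encoded
    gnb = GaussianNB()
    trained = gnb.fit(X.toarray(), listlabel)  # GaussianNB needs a dense array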
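To see what the classifier actually receives, it can help to inspect the columns DictVectorizer produces. The following is a minimal sketch; the two records are shortened, illustrative versions of the training dicts, not the full data.

```python
from sklearn.feature_extraction import DictVectorizer

# Two toy records shaped like the training dicts (shortened for illustration).
records = [
    {"brazos": "der", "sentado": "no", "tiempo": 10, "puntuacion final pasteles": 50},
    {"brazos": "der", "sentado": "no", "tiempo": 10, "puntuacion final pasteles": 4},
]

vec = DictVectorizer()
X = vec.fit_transform(records).toarray()

# String values become one-hot columns named "key=value"; numbers pass through.
print(vec.feature_names_)
print(X)

# Columns whose value never changes carry no information for the classifier.
constant = (X.std(axis=0) == 0).tolist()
print(constant)
```

In this sketch only the score column varies between rows; the string fields and the constant tiempo column are identical in every record.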

Once I have trained the classifier, I use the test data:

    testX = vec.fit_transform(test)
    predicted = trained.predict(testX.toarray())
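One caveat about calling fit_transform on the test set: it relearns the columns from the test data. That happens to give the same columns here because the test dicts reuse the same keys and string values, but in general transform is the safer call. A small sketch, where the "izq" value is a hypothetical unseen category that does not appear in the data above:

```python
from sklearn.feature_extraction import DictVectorizer

train_records = [{"brazos": "der", "tiempo": 10}]
test_records = [{"brazos": "izq", "tiempo": 10}]  # hypothetical unseen category

vec = DictVectorizer()
vec.fit(train_records)
train_columns = list(vec.feature_names_)

# transform() keeps the training columns; the unseen "brazos=izq" is ignored.
aligned = vec.transform(test_records).toarray()

# fit_transform() on the test set relearns the columns from the test data,
# so column 0 now means something different from what the model was trained on.
refit = DictVectorizer()
refit.fit_transform(test_records)
refit_columns = list(refit.feature_names_)

print(train_columns, aligned, refit_columns)
```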

Finally, the results are all 0. Can you tell me what I did wrong and how to fix it, please?

First of all, since your data has features that are not informative (they have the same value in every record), I cleaned it up a bit:

    ganado = {"a": 50, "b": 33}
    perdido = {"a": 4, "b": 3}
    pruebaganado = {"a": 10, "b": 33}
    pruebaperdido = {"a": 2, "b": 3}

All the rest is not important, and cleaning up your code helps you focus on what counts.

Now, Gaussian Naive Bayes is all about probability. As you may notice, the classifier is trying to tell you that:

    P((a,b)=(10,33) | class=0) * P(class=0)  >  P((a,b)=(10,33) | class=1) * P(class=1)

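because it assumes that both a and b are normally distributed within each class. The likelihoods in this case are so low that the priors you gave it (1 versus 400 samples) become negligible. You can see the formula itself here. By the way, you can get the exact probabilities: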

    # model is the fitted GaussianNB (named trained in the question)
    t = [pruebaganado, pruebaperdido]
    t = vec.fit_transform(t)
    print(model.predict_proba(t.toarray()))
    # prints:
    # [[ 1.  0.]
    #  [ 1.  0.]]
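Why the probabilities come out as exactly 1 and 0 can be checked by hand. The sketch below recomputes the per-class score with NumPy. Because every training row in a class is identical, the per-class variance is zero plus a small smoothing term; the value 1e-9 used here is an assumption for illustration, not necessarily the exact value the library uses.

```python
import numpy as np

# Class means from the cleaned training data: every class-1 row is (50, 33)
# and the single class-0 row is (4, 3).
means = {0: np.array([4.0, 3.0]), 1: np.array([50.0, 33.0])}
priors = {0: 1 / 401, 1: 400 / 401}

# Zero spread within each class, so the variance is only a smoothing term.
# 1e-9 is an assumed value for illustration.
var = 1e-9

def log_joint(x, label):
    """log P(x | class) + log P(class) under independent Gaussians."""
    mu = means[label]
    log_likelihood = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return log_likelihood + np.log(priors[label])

def predict(x):
    """Return the class with the larger log joint score."""
    x = np.asarray(x, dtype=float)
    return max(means, key=lambda label: log_joint(x, label))

# Squared distance of (10, 33) to each class mean decides everything:
# 6**2 + 30**2 = 936 for class 0 versus 40**2 + 0**2 = 1600 for class 1.
print(predict([10, 33]), predict([2, 3]))
```

With such a tiny variance, the squared distances are divided by a very small number, so the difference between the two scores is enormous and the log prior (about -6 for class 0) cannot make up for it.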

So the classifier is sure that 0 is the right class. Now, let's change the test data a bit:

    # Same keys as the cleaned data above; only "a" changes from 10 to 20
    pruebaganado = {"a": 20, "b": 33}

Now we have:

    [[ 0.  1.]
     [ 1.  0.]]
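Both results can be reproduced end to end with the cleaned data. This is a minimal sketch; the helper name classify is introduced here for illustration, and it uses transform rather than fit_transform on the test rows.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import GaussianNB

ganado = {"a": 50, "b": 33}   # class 1
perdido = {"a": 4, "b": 3}    # class 0
train = [ganado] * 400 + [perdido]
labels = [1] * 400 + [0]

vec = DictVectorizer()
model = GaussianNB().fit(vec.fit_transform(train).toarray(), labels)

def classify(rows):
    """Vectorize rows with the already fitted columns and predict classes."""
    return model.predict(vec.transform(rows).toarray()).tolist()

original = classify([{"a": 10, "b": 33}, {"a": 2, "b": 3}])
changed = classify([{"a": 20, "b": 33}, {"a": 2, "b": 3}])
print(original, changed)
```

The first test row flips from class 0 to class 1 once a moves from 10 to 20, because (20, 33) is closer to the class-1 mean (50, 33) than to the class-0 mean (4, 3).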

So you did nothing wrong; it is all a matter of calculation. By the way, I challenge you to replace GaussianNB with MultinomialNB and see how the priors change it all.
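A minimal sketch of that challenge, using the cleaned a and b scores as plain arrays. The second model with fit_prior=False is an addition for comparison: it replaces the learned priors with uniform ones so the effect of the priors is visible.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X_train = np.array([[50, 33]] * 400 + [[4, 3]])
y_train = np.array([1] * 400 + [0])
X_test = np.array([[10, 33], [2, 3]])

# Default: class priors are learned from the data (1/401 versus 400/401).
with_priors = MultinomialNB().fit(X_train, y_train)

# fit_prior=False swaps the learned priors for uniform ones.
uniform = MultinomialNB(fit_prior=False).fit(X_train, y_train)

pred_with_priors = with_priors.predict(X_test).tolist()
pred_uniform = uniform.predict(X_test).tolist()
print(pred_with_priors, pred_uniform)
```

Here the likelihoods of the two classes are close to each other, so the 400-to-1 prior decides: with the learned priors both test rows come out as class 1, while with uniform priors both come out as class 0. Neither matches the intended labels, which says more about the two-point training set than about the classifier.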

Also, unless you have a very good reason to use GaussianNB here, I would consider some kind of tree classifier, which in my opinion may suit your problem better.
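As a rough sketch of the tree suggestion: the training set in the question contains only two distinct points, which is too little for any model to recover the labeling rule, so the example below assumes a richer, hypothetical training grid labeled with the same rule (total score under 20 means class 0). The grid and the variable names are assumptions, not part of the original data.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training grid: every integer score pair from 0 to 60, labeled
# with the rule from the question (total under 20 means class 0).
X_train = [[a, b] for a in range(61) for b in range(61)]
y_train = [0 if a + b < 20 else 1 for a, b in X_train]

# A fully grown tree; random_state only fixes tie-breaking between splits.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The two test records from the question, as (pasteles, botellas) pairs.
predictions = tree.predict([[10, 33], [2, 3]]).tolist()
print(predictions)
```

A tree makes no distributional assumption about the scores, so a threshold-style rule like this one is a more natural fit for it than for a Gaussian model.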

