Sklearn Label Encoder - Not getting desired output based on prediction and inverse transform

I'm new to machine learning in Python with scikit-learn. I'm trying to build a model from a dataset with three columns: pets, owner and location.

import pandas
import joblib
from sklearn.tree import DecisionTreeClassifier
from collections import defaultdict
from sklearn import preprocessing 

df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'], 
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'], 
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego', 'San_Diego', 
                 'New_York']
})

Now I encode the entire DataFrame with a label encoder.

le = preprocessing.LabelEncoder()
df_encoded = df.apply(le.fit_transform)
df_array=df_encoded.values

Next, I split the encoded array into an input set (pets and owner) and an output set (location).

IpSet = df_array[:,0:2]
Opset = df_array[:,2:3]

Then I create a decision tree classifier and fit it on the input and output sets.

model = DecisionTreeClassifier()
model.fit(IpSet,Opset)

Now I'm trying to predict the location for a new DataFrame with the model. I'm using the same label encoder as before.

df_Predict = pandas.DataFrame({
    'pets': ['cat'], 
    'owner': ['Champ']})
df_encoded_Predict = df_Predict.apply(le.fit_transform)
predictions_train = model.predict(df_encoded_Predict)
print(le.inverse_transform(predictions_train)[:1])

With this, I expect to see the value 'San_Diego', but I get 'Champ' as the output instead, and I'm not sure why.

Could someone help me through this?

There is 1 answer

sulhi

The logic you are following is not correct.

    df_encoded = df.apply(le.fit_transform)

Here the same encoder (le) is refitted on every column in turn, so once this line has executed, le holds only the classes of the last column, location.
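
This can be checked by inspecting the encoder's classes_ attribute after the call. A minimal sketch using the DataFrame from the question (the fitted_classes name is only illustrative):

```python
import pandas
from sklearn import preprocessing

df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'],
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego', 'San_Diego',
                 'New_York']
})

le = preprocessing.LabelEncoder()
# Each column refits the same encoder, overwriting what it learned before.
df_encoded = df.apply(le.fit_transform)

# Only the classes of the last column (location) are left in the encoder.
fitted_classes = list(le.classes_)
print(fitted_classes)  # ['New_York', 'San_Diego']
```

The pets and owner mappings are gone at this point, so le cannot encode new pets or owner values consistently with the training data.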

When you need to reuse an already fitted encoder, call its .transform() method instead of the following, which refits the encoder on the new data:

    df_encoded_Predict = df_Predict.apply(le.fit_transform)
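
One way to put this together is to keep a separate fitted encoder for each column and reuse it at prediction time. The following is a minimal sketch, not the only possible fix; the encoders dict and the variable names are assumptions introduced for illustration, and random_state=0 is only there to make the run repeatable.

```python
import pandas
from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier

df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'],
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego', 'San_Diego',
                 'New_York']
})

# Fit one encoder per column and keep them all (hypothetical helper dict).
encoders = {col: preprocessing.LabelEncoder().fit(df[col]) for col in df.columns}

# Encode the training data with the fitted encoders.
df_encoded = pandas.DataFrame(
    {col: encoders[col].transform(df[col]) for col in df.columns})

X = df_encoded[['pets', 'owner']].values
y = df_encoded['location'].values

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Encode the new data with transform() only, so nothing is refitted.
df_predict = pandas.DataFrame({'pets': ['cat'], 'owner': ['Champ']})
df_predict_encoded = pandas.DataFrame(
    {col: encoders[col].transform(df_predict[col]) for col in df_predict.columns})

predictions = model.predict(df_predict_encoded.values)

# Decode with the encoder that was fitted on the location column.
predicted_location = encoders['location'].inverse_transform(predictions)[0]
print(predicted_location)  # San_Diego
```

Note that transform() raises a ValueError for a label the encoder did not see during fitting, so new pets or owner values need separate handling.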