I am trying to build a prediction model, but I keep getting this error: ValueError: Input contains NaN. I tried to check for missing values with np.any(np.isnan(dataframe)) and np.all(np.isfinite(dataframe)), but that just produces new errors, for example: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''.
Here is the code so far:
```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

dataframe = pd.read_csv('file.csv', delimiter=',')

# Label-encode every column; astype(str) turns NaN into the string 'nan'
le = LabelEncoder()
dfle = dataframe  # same object as dataframe, not a copy
dfle2 = dfle.apply(lambda col: le.fit_transform(col.astype(str)), axis=0, result_type='expand')
newdf = dfle2[['column1', 'column2', 'column3', 'column4', 'column5', 'column6', 'column7']]

# Features and target come from the raw dataframe, so any NaN there is still present
X = dataframe[['column1', 'column2', 'column4', 'column5', 'column6', 'column7']].values
y = dfle.column3

ohe = OneHotEncoder()
# Built but never assigned or used
ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')

# Checks I tried on the raw frame:
# np.all(np.isfinite(dfle))
# np.any(np.isnan(dfle))
X = ohe.fit_transform(X).toarray()
```
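The TypeError from those commented-out checks comes from NumPy rather than from the data: np.isnan and np.isfinite only accept numeric arrays, and a DataFrame with text columns converts to an object array. A minimal sketch on a hypothetical toy frame (column names and values are made up, not taken from file.csv) shows the failure next to pandas' own DataFrame.isna(), which works on any dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one text column and one numeric column, each with a gap
df = pd.DataFrame({
    "column1": ["a", None, "b"],
    "column2": [1.0, 2.0, np.nan],
})

# With a text column present, the frame converts to an object array,
# and np.isnan / np.isfinite raise TypeError on object arrays
try:
    np.isnan(df.to_numpy())
    numpy_check_failed = False
except TypeError:
    numpy_check_failed = True

# pandas' own check handles any dtype and counts missing values per column
missing_per_column = {col: int(n) for col, n in df.isna().sum().items()}
has_missing = bool(df.isna().to_numpy().any())

print(numpy_check_failed)   # True
print(missing_per_column)   # {'column1': 1, 'column2': 1}
print(has_missing)          # True
```

On the real data, dataframe.isna().sum() lists which columns still need filling before encoding.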
There are a few ways to deal with this error. First, you can fill the NaN values with 0 when reading the file:

```python
dataframe = pd.read_csv('file.csv', delimiter=',').fillna(0)
```

Or you can use one of scikit-learn's imputation techniques to fill the NaN values: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.impute
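For text (categorical) columns like the ones being label-encoded here, a minimal sketch of both routes follows, assuming hypothetical toy columns in place of file.csv. It fills with a string placeholder instead of 0, because filling a text column with 0 mixes str and int values, which OneHotEncoder rejects. The scikit-learn route uses SimpleImputer with strategy="most_frequent", which also accepts text columns:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical columns standing in for file.csv
df = pd.DataFrame({
    "column1": ["red", np.nan, "blue", "red"],
    "column2": ["s", "m", np.nan, "m"],
})

# Option 1: pandas fillna with a string placeholder, so each column stays all-text
# (fillna(0) on a text column would mix str and int, which OneHotEncoder rejects)
filled = df.fillna("missing")

# Option 2: scikit-learn's SimpleImputer; "most_frequent" also works on text columns
imputer = SimpleImputer(strategy="most_frequent")
imputed = imputer.fit_transform(df)

# Neither result contains NaN, so both can be one-hot encoded
X_filled = OneHotEncoder().fit_transform(filled).toarray()    # 3 + 3 categories -> 6 columns
X_imputed = OneHotEncoder().fit_transform(imputed).toarray()  # 2 + 2 categories -> 4 columns
```

The placeholder keeps "missing" as its own category, while the most-frequent strategy folds gaps into the commonest value; which is better depends on whether missingness itself carries information.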
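Several imputation techniques are available there, but I would recommend KNNImputer, which fills each missing value from the most similar rows (its nearest neighbours) instead of a single constant. Note that it needs numeric input, so text columns must be encoded first.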
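A minimal KNNImputer sketch, assuming the features are already numeric (the small matrix below is hypothetical). n_neighbors is set to 2 only because the toy data has four rows; the library default is 5:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical numeric feature matrix with two gaps
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each gap is replaced by the mean of that feature over the 2 nearest rows,
# with distances computed only on the features both rows have
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)

print(X_imputed[0, 2])  # 4.0  (mean of 3.0 and 5.0 from rows 1 and 2)
print(X_imputed[2, 0])  # 5.5  (mean of 3.0 and 8.0 from rows 1 and 3)
```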