scikit-learn fit raises "Expected a 2-dimensional container but got <class 'pandas.core.series.Series'> instead."


Hello, I tried to look up this error online but couldn't find much information about it. As I understand it, I have to combine X_train and y_train into a single DataFrame, but I don't know how to do that. Both variables are Series, so do I have to create a new DataFrame, or is there a different way?

The full error message: ValueError: Expected a 2-dimensional container but got <class 'pandas.core.series.Series'> instead. Pass a DataFrame containing a single row (i.e. single sample) or a single column (i.e. single feature) instead.

My code (the error is raised at the line model.fit(X_train, y_train)):

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

df = pd.read_excel('Month.xlsx', usecols='W, AJ')
data = pd.read_excel('Month.xlsx')

x = df.iloc[:,0]
y = df.iloc[:,1]

x = x.replace(r'^\s*$', 0, regex=True)
y = y.replace(r'^\s*$', 0, regex=True)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Create the linear regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Predict with the trained model
y_pred = model.predict(X_test)

# Visualize the regression line
plt.scatter(X_test, y_test, color='gray')
plt.plot(X_test, y_pred, color='red', linewidth=2)
plt.xlabel('Humidity')
plt.ylabel('Rainfall')
plt.show()

There is 1 answer

Answer by nemo:

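scikit-learn's fit expects the features X to be two-dimensional, with shape (n_samples, n_features); in your case that is an n x 1 array. df.iloc[:,0] selects a single column as a one-dimensional Series, which is what triggers the error. The target y can stay a Series. So make x a DataFrame (not a Series) by replacing the assignment x = df.iloc[:,0] with: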

x = df.iloc[:,0:1]
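
For reference, here is a minimal sketch of the fix in a complete script. It assumes synthetic data in place of Month.xlsx, and the column names humidity and rainfall are made up for illustration (loosely based on the plot labels in the question). It also shows two other common ways to get a single-column 2-D container, to_frame() and reshape(-1, 1), which should behave the same way for fit.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the two Excel columns (hypothetical names and values).
rng = np.random.default_rng(0)
humidity = rng.uniform(40, 100, size=50)
rainfall = 0.3 * humidity + rng.normal(0, 2, size=50)
df = pd.DataFrame({"humidity": humidity, "rainfall": rainfall})

# Three ways to get a 2-D feature container with a single column.
X_slice = df.iloc[:, 0:1]                           # DataFrame, shape (50, 1)
X_frame = df.iloc[:, 0].to_frame()                  # DataFrame, shape (50, 1)
X_array = df.iloc[:, 0].to_numpy().reshape(-1, 1)   # ndarray, shape (50, 1)

# The target may stay one-dimensional.
y = df.iloc[:, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X_slice, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)       # no ValueError: X_train is 2-D
y_pred = model.predict(X_test)    # X_test is 2-D as well, since it comes from X_slice

print(X_slice.shape, y.shape)     # (50, 1) (50,)
```

Because the split is done on the 2-D x, both X_train and X_test keep their single column, so predict works without further changes. plt.scatter and plt.plot also accept a single-column DataFrame, so the plotting code from the question should not need to change.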