Here is the code from the beginning:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
import seaborn as sns
sns.set_style("whitegrid")
from sklearn.model_selection import train_test_split  # splits data into train and test sets
from xgboost import XGBRegressor
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import accuracy_score
from sklearn.metrics import r2_score  # R2 score, used to evaluate the model
This is the head of the dataset:
      Name  Year  Seller_type  Transmission  km_driven  selling_price      owner
0  X-Trail  2015     Individu     Automatic     128729      191000000  1st owner
1   Terios  2019     Individu     Automatic      76361      202000000  1st owner
2     HR-V  2017     Individu     Automatic      45992      266000000  1st owner
3     City  2021     Individu     Automatic       3544      269000000  1st owner
4     BR-V  2018     Individu     Automatic      85512      179000000  1st owner
Tail of the dataset:
       Name  Year  Seller_type  Transmission  km_driven  selling_price      owner
694  Terios  2021     Individu     Automatic     104087      198422198  1st owner
695  Terios  2013     Individu     Automatic     172253      283860358  1st owner
696   Xenia  2020     Showroom     Automatic      62043      405074059  2nd owner
697  Terios  2021     Individu     Automatic      28409      120411290  2nd owner
698  Terios  2014     Individu     Automatic     187904      244024577  2nd owner
Separate the independent and dependent variables
X = car_dataset.drop('selling_price', axis=1)
y = car_dataset['selling_price']  # store the target column, selling_price, in y
Split into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state=42)
MODEL TRAINING
xg = XGBRegressor()
xg.fit(X_train, y_train)
y_pred = xg.predict(X_test)
accuracy_score(y_test, y_pred)
Result: 0.0
R2 score
r2_score(y_test, y_pred)
Result: -0.0441435169799782
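As a point of comparison for the two results above, here is a minimal sketch of error metrics that suit a continuous target such as selling_price. It uses only NumPy. The four target values are copied from the dataset head, but the predicted values are invented for illustration and do not come from the model above. Accuracy counts exact matches between prediction and target, which would be consistent with a score of 0.0 on real-valued predictions. scikit-learn's mean_absolute_error and mean_squared_error compute the same kind of quantities as the hand-written lines here.

```python
import numpy as np

# Target prices taken from the dataset head; predictions are hypothetical.
y_true = np.array([191_000_000, 202_000_000, 266_000_000, 269_000_000], dtype=float)
y_hat = np.array([185_500_000.5, 210_250_000.0, 259_000_000.25, 275_000_000.0])

# Fraction of predictions equal to the target to the last digit.
# This is what an accuracy-style metric measures.
exact_match = float(np.mean(y_true == y_hat))

# Regression metrics measure how far off the predictions are instead.
errors = y_true - y_hat
mae = float(np.mean(np.abs(errors)))         # mean absolute error
rmse = float(np.sqrt(np.mean(errors ** 2)))  # root mean squared error

# R2 compares the squared error with that of always predicting the mean.
ss_res = float(np.sum(errors ** 2))
ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot

print("exact match:", exact_match)
print("MAE:", mae)
print("RMSE:", rmse)
print("R2:", r2)
```

An R2 below zero, like the one reported above, means the squared error is larger than it would be if the model always predicted the mean selling price of the test set.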
What I have tried:
Note: Before separating the data into independent and dependent variables, I converted the text columns into numerical values.
Please, I need your help.
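The conversion step mentioned in the note is not shown in the post, so the following is only a sketch of one common way to do it, assuming one-hot encoding with pandas.get_dummies. The actual method used may have been different (for example label encoding or OneHotEncoder). The small frame below is built from a few rows of the head and tail shown earlier, and text_columns is a name introduced here for illustration.

```python
import pandas as pd

# A few rows copied from the dataset head and tail shown in the post.
car_dataset = pd.DataFrame({
    "Name": ["X-Trail", "Terios", "HR-V", "Xenia"],
    "Year": [2015, 2019, 2017, 2020],
    "Seller_type": ["Individu", "Individu", "Individu", "Showroom"],
    "Transmission": ["Automatic", "Automatic", "Automatic", "Automatic"],
    "km_driven": [128729, 76361, 45992, 62043],
    "selling_price": [191000000, 202000000, 266000000, 405074059],
    "owner": ["1st owner", "1st owner", "1st owner", "2nd owner"],
})

# Assumption: these are the text columns that need converting.
text_columns = ["Name", "Seller_type", "Transmission", "owner"]

# One 0/1 column per category; numeric columns are left unchanged.
encoded = pd.get_dummies(car_dataset, columns=text_columns, dtype=int)

# Same feature/target separation as in the post.
X = encoded.drop("selling_price", axis=1)
y = encoded["selling_price"]

print(X.columns.tolist())
print(X.dtypes.unique())
```

One-hot encoding avoids implying an order between car names, which mapping each name to an arbitrary integer would do.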