There is a huge range of possible reasons why a model over-fits; I would like to address a few common ones.
Before getting into the answer, I would like to give a short explanation of what Dropout is, based on the research paper published by the original authors:
Dropout
Dropout:
A simple method to prevent over-fitting in neural networks by randomly removing (dropping) units, also called neurons, during training. That way the model cannot become dependent on any particular neurons or units.
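To build some intuition, here is a minimal NumPy sketch of inverted dropout (the variant Keras applies internally); the function name and example values are mine, just for illustration:

import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.2, training=True):
    # Inverted dropout: zero out a random `rate` fraction of units and
    # scale the survivors by 1/(1 - rate), so the expected activation
    # is unchanged and inference can skip dropout entirely.
    if not training:
        return activations
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

h = np.ones((1, 5))
print(dropout(h, rate=0.4))  # something like [[1.67 0. 1.67 1.67 0.]]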
Since you are facing an over-fitting problem, you should add dropout layers alongside your dense layers.
Stochastic Gradient Descent Tricks explains why SGD works well for training on larger datasets.
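If you wanted to try plain SGD yourself, Keras exposes the usual knobs; the learning rate and momentum below are just illustrative values, not a recommendation for your data:

from keras.optimizers import SGD

sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)
# ...later: classifier.compile(optimizer=sgd, loss="binary_crossentropy", metrics=["accuracy"])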
Also, since yours is a binary classification problem, I would go with
binary_crossentropy as my loss function.
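For reference, binary cross-entropy is just the mean negative log-likelihood of the true labels; a small hand-rolled sketch (the eps clipping value is an arbitrary choice to keep log() away from zero):

import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # BCE = -mean( y*log(p) + (1-y)*log(1-p) )
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_crossentropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.8])))  # ~0.184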
Okay, now let's get into the code:
Add our imports first:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
Read the dataset:
dataset = pd.read_csv("DIABETES DATA.csv")
Our scalers and encoders:
encoder = LabelEncoder()
scaler = StandardScaler()
Our independent and dependent variables:
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
Encode the categorical variable and scale X:
X.iloc[:, 0] = encoder.fit_transform(X.iloc[:, 0])  # LabelEncoder expects a 1-D array, so pass the column itself
X.iloc[:, :] = scaler.fit_transform(X.iloc[:, :])
X = np.array(X)
Our neural network:
classifier = Sequential()
classifier.add(Dense(1000, input_dim=9, activation="relu"))
classifier.add(Dropout(0.2))
classifier.add(Dense(500, activation="relu"))
classifier.add(Dropout(0.2))
classifier.add(Dense(250, activation="relu"))
classifier.add(Dropout(0.2))
classifier.add(Dense(100, activation="relu"))
classifier.add(Dropout(0.2))
classifier.add(Dense(1, activation="sigmoid"))  # sigmoid, not relu, so the output is a probability in [0, 1]
classifier.summary()
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 1000) 10000
_________________________________________________________________
dropout_1 (Dropout) (None, 1000) 0
_________________________________________________________________
dense_2 (Dense) (None, 500) 500500
_________________________________________________________________
dropout_2 (Dropout) (None, 500) 0
_________________________________________________________________
dense_3 (Dense) (None, 250) 125250
_________________________________________________________________
dropout_3 (Dropout) (None, 250) 0
_________________________________________________________________
dense_4 (Dense) (None, 100) 25100
_________________________________________________________________
dropout_4 (Dropout) (None, 100) 0
_________________________________________________________________
dense_5 (Dense) (None, 1) 101
=================================================================
Total params: 660,951
Trainable params: 660,951
Non-trainable params: 0
Compiling with adam, an adaptive variant of SGD (which, as noted above, works well for larger datasets):
classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
Finally:
classifier.fit(X, y, batch_size=20, epochs=100)
Gives:
768/768 [==============================] - 1s 679us/step - loss: 0.4177 - acc: 0.7539
Epoch 99/100
768/768 [==============================] - 1s 677us/step - loss: 0.4609 - acc: 0.7552
Epoch 100/100
768/768 [==============================] - 1s 712us/step - loss: 0.4449 - acc: 0.8008
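One caveat: the run above trains and reports accuracy on all 768 rows, so it cannot by itself tell you whether the over-fitting is gone. A common way to check is to hold out a validation split and stop once val_loss stops improving; a sketch, where the 0.2 split and patience of 10 are just example values (restore_best_weights needs a reasonably recent Keras):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
classifier.fit(X, y, batch_size=20, epochs=100,
               validation_split=0.2,  # hold out 20% of rows to compare val_loss against loss
               callbacks=[early_stop])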