Approaching Classification With Neural Networks

Updated December 2023.

This article was published as a part of the Data Science Blogathon.

Introduction on Classification

Classification is one of the most basic tasks a machine can be trained to perform: predicting from weather data whether it will rain today, determining a person's expression from a facial image, or inferring the sentiment of a review from its text. It is applied so widely that it is one of the most fundamental tasks in supervised machine learning.

Before we start with classification, let's look at the dataset.

About the Dataset

The dataset we use to train our model is the Iris dataset. It consists of 150 samples belonging to 3 species of Iris flower: Iris Setosa, Iris Versicolour and Iris Virginica. It is a multivariate dataset with 4 features per sample: sepal length, sepal width, petal length and petal width. We need to use these 4 features to classify the species, so we train a multi-class classification model on this dataset. More information about this dataset can be found here.

Getting Started with Classification

Let's get started by importing the required libraries:

import os
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers
from tensorflow.keras import losses
from tensorflow.keras import metrics
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

Check the installed TensorFlow version with:

print(tf.__version__)


Next, we need to download and extract the dataset from here. Then move it to the location of notebook/script or copy the location of the dataset. Now read the CSV file from that location,

file_path = 'iris_dataset.csv'
df = pd.read_csv(file_path)
df.head()

We can see that our dataset has 4 input features and 1 target variable. The target variable consists of 3 classes: ‘Iris-setosa’, ‘Iris-versicolor’ and ‘Iris-virginica’. Now let's further prepare our dataset for model training.

Data Preparation

First, let's check whether our dataset contains any null values.

df.isnull().sum()


There are no null values. Therefore we can continue to separate the inputs and targets.

X = df.drop('target', axis=1)
y = df['target']

Since now we have separated the input features (X) and target labels (y), let’s split the dataset into training and validation sets. For this purpose let’s use Scikit-Learn’s train_test_split method to split our dataset.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("The length of training dataset: ", len(X_train))
print("The length of validation dataset: ", len(X_test))

In the above code, we have split the dataset such that the validation data contains 20% of the randomly selected samples from the whole dataset. Let’s now further do some processing before we create the model.
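To make the idea of a random 80/20 split concrete, here is a minimal plain-Python sketch. It is not Scikit-Learn's implementation; the helper name `split_indices` and its rounding rule are assumptions made for illustration.

```python
import random

def split_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle sample indices and split them into train/validation parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_test = int(round(n_samples * test_size))    # number of validation samples
    return indices[n_test:], indices[:n_test]

# 150 samples, as in the Iris dataset
train_idx, val_idx = split_indices(150, test_size=0.2)
print(len(train_idx), len(val_idx))  # 120 30
```

Fixing the seed plays the same role as `random_state=42` above: the same samples end up in the validation set on every run.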

Data Processing

Since we have the data split ready, let's now do some basic processing: feature scaling and label encoding. The input features are the length and width of the petals and sepals in centimetres. These numerical features need to be standardized, i.e. transformed so that each has a mean of 0 and a standard deviation of 1.

Let's use Scikit-learn's StandardScaler class to do this. The scaler is fitted on the training data only and then applied to both splits.

features_encoder = StandardScaler()
# Learn the mean and standard deviation from the training data only
features_encoder.fit(X_train)
X_train = features_encoder.transform(X_train)
X_test = features_encoder.transform(X_test)
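What standardization computes can be sketched by hand for a single feature column. This is a simplified illustration rather than Scikit-learn's code; the helper names and the sample values are made up.

```python
import statistics

def fit_scaler(train_column):
    """Learn the mean and population standard deviation from training data."""
    return statistics.fmean(train_column), statistics.pstdev(train_column)

def transform(column, mean, std):
    """Apply (x - mean) / std using previously learned statistics."""
    return [(x - mean) / std for x in column]

train_lengths = [5.1, 4.9, 6.3, 5.8, 7.1]   # made-up sepal lengths in cm
test_lengths = [5.0, 6.7]                   # made-up validation values

mean, std = fit_scaler(train_lengths)
train_scaled = transform(train_lengths, mean, std)
test_scaled = transform(test_lengths, mean, std)   # reuses the training statistics

print([round(v, 3) for v in train_scaled])
print([round(v, 3) for v in test_scaled])
```

The validation values are scaled with the training mean and standard deviation, so no information from the validation set leaks into preprocessing.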

Now we should encode the categorical target labels. This is because our model won’t be able to understand if the categories are represented in strings. Therefore let’s encode the labels using Scikit-learn’s LabelEncoder module.

label_encoder = LabelEncoder()
# Learn the label-to-integer mapping from the training labels
label_encoder.fit(y_train)
y_train = label_encoder.transform(y_train).reshape(-1, 1)
y_test = label_encoder.transform(y_test).reshape(-1, 1)
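Label encoding itself is a small idea: each distinct class name is mapped to an integer. The sketch below mimics that behaviour in plain Python, assuming classes are numbered in sorted order; `fit_label_encoder` is a hypothetical helper, not part of Scikit-learn.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer, in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}

species = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]
mapping = fit_label_encoder(species)
encoded = [mapping[name] for name in species]

print(mapping)
print(encoded)  # [2, 0, 1, 0]
```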

Now let’s check the shapes of the datasets,

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

Great! Now we are ready to define and train our model.

Creating Model

Let’s define the model for classification using the Keras Sequential API. We can stack the required layers and define the model architecture. For this model, let’s define Dense layers to define the input, output and intermediate layers.

model = Sequential([
    layers.Dense(8, activation="relu", input_shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(3, activation="softmax")
])

In the above model, we have defined 4 Dense layers. The output layer consists of 3 neurons i.e. equal to the number of output labels present. We are using the softmax activation function at the final layer because it enables the model to provide probabilities for each of the labels. The output label that has the highest probability is the output prediction determined by the model. In other layers, we have used the relu activation function.
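As a rough illustration of what the softmax output layer does, the following plain-Python sketch turns three raw scores into probabilities and picks the most likely class. The scores are invented for the example; Keras performs the equivalent computation internally.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                # made-up raw outputs for the 3 species
probs = softmax(logits)
predicted = probs.index(max(probs))     # index of the highest probability

print([round(p, 3) for p in probs], predicted)
```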

Now let’s compile the model by defining the loss function, optimizer and metrics.

model.compile(optimizer=optimizers.SGD(),
              loss=losses.SparseCategoricalCrossentropy(),
              metrics=[metrics.SparseCategoricalAccuracy()])

According to the above code, we have used SGD (Stochastic Gradient Descent) as the optimizer with its default learning rate of 0.01. The SparseCategoricalCrossentropy loss function is used rather than CategoricalCrossentropy because our output categories are in integer format. CategoricalCrossentropy would be a good choice when the categories are one-hot encoded. Finally, we are using SparseCategoricalAccuracy as the metric that is tracked.
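The two loss functions compute the same quantity and differ only in how the target is supplied. The sketch below shows this for a single sample, using hand-written helpers and made-up probabilities rather than the Keras classes.

```python
import math

def categorical_crossentropy(one_hot_target, probs):
    """Cross-entropy when the target is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))

def sparse_categorical_crossentropy(class_index, probs):
    """Cross-entropy when the target is an integer class index."""
    return -math.log(probs[class_index])

probs = [0.7, 0.2, 0.1]     # made-up softmax output for one sample

loss_one_hot = categorical_crossentropy([1, 0, 0], probs)
loss_sparse = sparse_categorical_crossentropy(0, probs)

print(round(loss_one_hot, 4), round(loss_sparse, 4))
```

With integer labels, as produced by LabelEncoder, the sparse variant avoids building one-hot vectors.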

Now let’s train the model…

Model Training and Evaluation

Now let’s train our model using the processed training data for 200 epochs and provide the test dataset for validation.

history = model.fit(x=X_train, y=y_train, epochs=200, validation_data=(X_test, y_test), verbose=0)

Now we have trained our model using the training dataset. Before evaluation let’s check the summary of the model we have defined.

# Check model summary
model.summary()

Now let’s evaluate the model on the testing dataset.

# Perform model evaluation on the test dataset
model.evaluate(X_test, y_test)

Those are great results. Now let's define some helper functions to plot the accuracy and loss.

# Plot history

# Function to plot loss
def plot_loss(history):
    plt.plot(history.history['loss'], label='loss')
    plt.plot(history.history['val_loss'], label='val_loss')
    plt.ylim([0, 10])
    plt.xlabel('Epoch')
    plt.ylabel('Error (Loss)')
    plt.legend()
    plt.grid(True)

# Function to plot accuracy
def plot_accuracy(history):
    plt.plot(history.history['sparse_categorical_accuracy'], label='accuracy')
    plt.plot(history.history['val_sparse_categorical_accuracy'], label='val_accuracy')
    plt.ylim([0, 1])
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True)

Now let’s pass the model training history and check the model performance on the dataset.

plot_loss(history)
plot_accuracy(history)

We can see from the graphs below that the model has learnt over time to classify different species almost accurately.

Save and Load Model

Since we have the trained model, we can export it for further use, deploy it in applications, or continue training from where we left off. We can do this with the save method, exporting the model in H5 format.

# Save the model
model.save("trained_classifier_model.h5")

We can load the saved model checkpoint by using the load_model method.

# Load the saved model and perform classification
loaded_model = models.load_model('trained_classifier_model.h5')

Now let’s try to find predictions from the loaded model. Since the model contains softmax as the output activation function, we need to use the np.argmax() method to pick the class with the highest probability.

# The results the model returns are softmax outputs i.e. the probabilities of each class.
results = loaded_model.predict(X_test)
preds = np.argmax(results, axis=1)

Now we can evaluate the predictions by using metric functions.

# Predictions
print(accuracy_score(y_test, preds))
print(classification_report(y_test, preds))

Awesome! Our results match the previous ones.

Till now we have trained a deep neural network using TensorFlow to perform basic classification tasks using tabular data. By using the above method, we can train classifier models on any tabular dataset with any number of input features. By leveraging the different types of layers available in Keras, we can optimize and have more control over the model training, thus improving the metric performance. It is recommended to try replicating the above procedure on other datasets and experiment by changing different hyperparameters like learning rate, the number of layers, optimizers etc until we get desirable model performance.



Intent Classification With Convolutional Neural Networks

This article was published as a part of the Data Science Blogathon


Text classification is a machine-learning approach that groups text into pre-defined categories. It is an integral tool in Natural Language Processing (NLP) used for varied tasks like spam and non-spam email classification, sentiment analysis of movie reviews, detection of hate speech in social media posts, etc. Although there are a lot of machine learning algorithms available for text classification like Naive Bayes, Support Vector Machines, Logistic Regression, etc., in this article we will be using a deep-learning-based convolutional neural network architecture to perform intent classification of text commands.

  What are CNNs?

Though CNNs are associated more frequently with computer vision problems, recently they have been used in NLP with interesting results. CNNs are just several layers of convolutions with non-linear activation functions like ReLU or tanh or SoftMax applied to the results.

A 1-D convolution is shown in the above image. A filter/kernel of size 3 is passed over the input of size 6. Convolution is a mathematical operation where the elements in the filter are multiplied element-wise with the input over which the filter is currently present and the corresponding products are summed up to obtain the output element (as is shown by c3 = w1i2 + w2i3 + w3i4). The filter keeps going over the input, performing convolutions, and obtaining the output elements. We need 2-D convolutions in image processing tasks since images are 2-D vectors, but 1-D convolutions are enough for 1-D text manipulations. A convolutional neural network is simply a neural network where layers that perform convolutions are present. There can be multiple filters present in a single convolutional layer, which help to capture information about different input features.
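The sliding multiply-and-sum can be written in a few lines of plain Python. This is a minimal sketch assuming no padding and a stride of 1, so an input of size 6 and a filter of size 3 give 4 outputs; the input and filter values are made up.

```python
def conv1d(inputs, kernel):
    """Slide the kernel over the inputs (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, inputs[start:start + k]))
        for start in range(len(inputs) - k + 1)
    ]

inputs = [1, 0, 2, 3, 0, 1]     # made-up input of size 6
kernel = [1, 0, -1]             # made-up filter of size 3

print(conv1d(inputs, kernel))   # [-1, -3, 2, 2]
```

A convolutional layer with several filters simply runs this operation once per filter, each with its own learned weights.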

  Why CNNs in text classification?

The filters/kernels in CNNs can help identify relevant patterns in text data – bigrams, trigrams, or n-grams (contiguous sequence of n words) depending on kernel size. Since CNNs are translation invariant, they can detect these patterns irrespective of their position in the sentence. Local order of words is not that important in text classification, so CNNs can perform this task effectively. Each filter/kernel detects a specific feature, such as if the sentence contains positive (‘good’, ‘amazing’) or negative (‘bad’, ‘terrible’) terms in the case of sentiment analysis. Like sentiment analysis, most text classification tasks are determined by the presence or absence of some key phrases present anywhere in the sentence. This can be effectively modelled by CNNs which are good at extracting local and position-invariant features from data. Hence we have chosen CNNs for our intent classification task.

  Loading the Dataset

import pandas as pd
commands = pd.read_csv('TextCommands.csv')
commands.columns = ['text', 'label', 'misc']
commands.head()

The dataset looks like this :

Source: Author’s Jupyter notebook

The different intents/labels are numbered from 1 to 26. The dataset is pretty balanced among the different labels. The dataset should ideally be balanced because a severely imbalanced dataset can be challenging to model and require specialized techniques.

  Data Preprocessing

Data preprocessing is a particularly important task in NLP. We apply three main pre-processing methods here :

Tokenizing: Keras' built-in Tokenizer API is fitted on the dataset. It splits the sentences into words and builds a dictionary that maps each unique word to a unique integer. Each sentence is then converted into an array of integers representing the words it contains.

Sequence Padding: The array representing each sentence in the dataset is filled with zeroes to the left to make the size of the array 10 and bring all arrays to the same length.

Finally, the labels are converted into one-hot vectors using the to_categorical function from Keras.utils library.
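The three steps can be sketched in plain Python to show what they do to the data. This is a simplified stand-in, not the Keras implementation: the helper names and example commands are hypothetical, and words are numbered in order of first appearance rather than by frequency.

```python
def build_word_index(sentences):
    """Assign each unique lower-cased word an integer, starting at 1."""
    word_index = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            word_index.setdefault(word, len(word_index) + 1)
    return word_index

def to_padded_sequence(sentence, word_index, max_len=10):
    """Convert a sentence to integers and left-pad with zeros to max_len."""
    ids = [word_index[w] for w in sentence.lower().split() if w in word_index]
    ids = ids[-max_len:]                       # keep at most the last max_len tokens
    return [0] * (max_len - len(ids)) + ids

def one_hot(label, num_classes):
    """Return a vector of zeros with a single 1 at position `label`."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector

commands = ["turn on the lights", "play some music"]   # made-up commands
word_index = build_word_index(commands)

print(to_padded_sequence("play the music", word_index))  # [0, 0, 0, 0, 0, 0, 0, 5, 3, 7]
print(one_hot(3, 5))                                     # [0, 0, 0, 1, 0]
```

Index 0 is reserved for padding, which is why word numbering starts at 1.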

The corresponding code :

import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

MAX_SEQUENCE_LENGTH = 10
MAX_NUM_WORDS = 5000

tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(commands['text'])
sequences = tokenizer.texts_to_sequences(commands['text'])
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))

data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
labels = to_categorical(np.asarray(commands['label']))
print('Shape of data tensor:', data.shape)
print('Shape of label tensor:', labels.shape)

142 unique tokens are found in our dataset. Next, we need to split the data into train and test sets. The random shuffling of indices is used to split the dataset into roughly 90% training data and the rest test data.

VALIDATION_SPLIT = 0.1
indices = np.arange(data.shape[0])
np.random.shuffle(indices)
data = data[indices]
labels = labels[indices]
num_validation_samples = int(VALIDATION_SPLIT * data.shape[0])
x_train = data[:-num_validation_samples]
y_train = labels[:-num_validation_samples]
x_val = data[-num_validation_samples:]
y_val = labels[-num_validation_samples:]

  Model Building

We start by importing the necessary packages to build the model and creating an embedding layer.

from keras.layers import Dense, Input, GlobalMaxPooling1D
from keras.layers import Conv1D, MaxPooling1D, Embedding, Flatten
from keras.models import Model
from keras.models import Sequential
from keras.initializers import Constant

EMBEDDING_DIM = 60
num_words = min(MAX_NUM_WORDS, len(word_index) + 1)
embedding_layer = Embedding(num_words, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH, trainable=True)

A Keras functional model is implemented. It has the following layers :

An input layer that takes the array of length 10 representing a sentence.

An embedding layer of dimension 60 whose weights can be updated during training. It helps to convert each word into a fixed-length dense vector of size 60. The input dimension is set as the size of the vocabulary and the output dimension is 60. Each word in the input will hence get represented by a vector of size 60.

Two convolutional layers (Conv1D) with 64 filters each, kernel size of 3, and relu activation.

A max-pooling layer(MaxPooling1D) with pool size 2. Max Pooling in CNN is an operation that selects the maximum element from the region of the input which is covered by the filter/kernel. Pooling reduces the dimensions of the output, but it retains the most important information.

A flatten layer to flatten the input without affecting batch size. If the input to the flatten layer is a tensor of shape 1 X 3 X 64, the output will be a tensor of shape 1 X 192.

A dense (fully connected) layer of 100 units and relu activation.

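A dense layer with softmax activation that outputs the final probabilities of belonging to each of the 26 classes. The code below gives this layer 27 units rather than 26 because the labels are numbered from 1 to 26 and to_categorical also allocates a column for the unused index 0. Softmax activation is used here since it goes best with categorical cross-entropy loss, which is the loss we are going to be using to train the model.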
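The shapes quoted in the list above can be checked with a little arithmetic. The sketch assumes the Keras defaults of no padding and stride 1 for the convolutions and non-overlapping windows for pooling; the helper names are made up.

```python
def conv1d_length(length, kernel_size):
    """Output length of a 1-D convolution with no padding and stride 1."""
    return length - kernel_size + 1

def pool1d_length(length, pool_size):
    """Output length of non-overlapping 1-D max pooling."""
    return length // pool_size

length = 10                            # padded sentence length
length = conv1d_length(length, 3)      # first Conv1D  -> 8
length = conv1d_length(length, 3)      # second Conv1D -> 6
length = pool1d_length(length, 2)      # MaxPooling1D  -> 3
flattened = length * 64                # 64 filters per position

print(length, flattened)               # 3 192
```

This matches the 1 X 3 X 64 tensor that the flatten layer turns into 1 X 192.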

The model architecture is shown below :

Source: Created by Author

The code for building the model :

sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(64, 3, activation='relu')(embedded_sequences)
x = Conv1D(64, 3, activation='relu')(x)
x = MaxPooling1D(2)(x)
x = Flatten()(x)
x = Dense(100, activation='relu')(x)
preds = Dense(27, activation='softmax')(x)
model = Model(sequence_input, preds)
model.summary()

The model is compiled with categorical cross-entropy loss and the rmsprop optimizer. Categorical cross-entropy is a loss function commonly used for multi-class classification tasks. The rmsprop optimizer is a gradient-based optimization technique that divides each gradient by a moving average of recent squared gradients. This keeps the step size in a useful range when gradient magnitudes differ widely between parameters or shrink during training. Accuracy is used as the main performance metric. The model summary can be seen below :

Source: Author’s Jupyter Notebook
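To give a feel for how rmsprop normalizes the gradient, here is a minimal single-parameter sketch of the update rule applied to the toy function f(w) = w². The function, starting point and hyperparameter values are chosen purely for illustration and are not taken from the article's model.

```python
def rmsprop_step(param, grad, avg_sq, lr=0.01, rho=0.9, eps=1e-7):
    """One RMSprop update for a single scalar parameter."""
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2        # moving average of squared gradients
    param = param - lr * grad / (avg_sq ** 0.5 + eps)    # gradient scaled by its recent magnitude
    return param, avg_sq

# Minimise f(w) = w**2, whose gradient is 2 * w
w, avg_sq = 5.0, 0.0
for _ in range(2000):
    w, avg_sq = rmsprop_step(w, 2 * w, avg_sq)

print(abs(w) < 0.1)
```

Because the gradient is divided by its own recent magnitude, the step size stays close to the learning rate whether the raw gradient is large or small.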

Model Training and Evaluation

The model is trained for 30 epochs with batch size 50.

s = 0.0
for i in range(1, 50):
    model.fit(x_train, y_train, batch_size=50, epochs=30, validation_data=(x_val, y_val))
    # evaluate the model
    scores = model.evaluate(x_val, y_val, verbose=0)
    s = s + (scores[1] * 100)

The model is evaluated by calculating its accuracy. Accuracy of classification is calculated by dividing the number of correct predictions by the total number of predictions.

# evaluate the model
scores = model.evaluate(x_val, y_val, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

The accuracy of our model comes out to be 94.87%! You can try improving the accuracy further by playing around with the model hyperparameters, further tuning the model architecture or changing the train-test split ratio.

  Using the model to classify a new unseen text command

We can use our trained model to classify new text commands not present in the dataset into one of the 26 different labels. Each new text has to be tokenized and padded before being fed as input to the model. The model.predict() function returns the probabilities of the data belonging to each of the 26 classes. The class with the greatest probability is the predicted class.

# new instances where we do not know the answer
# (Xnew is the list of new text commands; its definition is not shown here)
sequences_new = tokenizer.texts_to_sequences(Xnew)
data = pad_sequences(sequences_new, maxlen=MAX_SEQUENCE_LENGTH)
# make a prediction
yprob = model.predict(data)
yclasses = yprob.argmax(axis=-1)
# show the inputs and predicted outputs
for text, predicted in zip(Xnew, yclasses):
    print("X=%s, Predicted=%s" % (text, predicted))

The output from the above code is :

Source: Author’s Jupyter notebook



To conclude, Natural Language Processing is a continuously expanding field filled with emerging technologies and applications. It has a massive impact in areas like chatbots, social media monitoring, recommendation systems, machine translation, etc. Now, you have learned how to use CNNs for text classification, go ahead and try to apply them in other areas of Natural Language Processing. The results might end up surprising you!

Thank you for reading.

Read here about NLP using CNNs for Sentence Classification!

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.


Image Classification Using Convolutional Neural Network With Python

This article was published as a part of the Data Science Blogathon

Hello guys! In this blog, I am going to discuss everything about image classification.

In the past few years, deep learning has proved to be a very powerful tool because of its ability to handle huge amounts of data. The use of hidden layers has overtaken traditional techniques, especially for pattern recognition. One of the most popular deep neural networks is the convolutional neural network (CNN).

A convolutional neural network (CNN) is a type of artificial neural network (ANN) used in image recognition and processing, specially designed for processing pixel data.

Before moving further, we need to understand what a neural network is. Let's go…

Neural Network:

A neural network is constructed from several interconnected nodes called “neurons”.  Neurons are arranged into the input layer, hidden layer, and output layer. The input layer corresponds to our predictors/features and the Output layer to our response variable/s.

Multi-Layer Perceptron(MLP):

A neural network with an input layer, one or more hidden layers, and one output layer is called a multi-layer perceptron (MLP). The perceptron on which it is based was introduced by Frank Rosenblatt in 1957. The MLP given below has 5 input nodes, two hidden layers with 5 hidden nodes, and one output node.

How does this Neural Network work?

– Input layer neurons receive incoming information from the data which they process and distribute to the hidden layers.

– That information, in turn, is processed by hidden layers and is passed to the output neurons.

– The information in this artificial neural network (ANN) is processed by means of an activation function, which loosely imitates the behaviour of neurons in the brain.

– Each neuron has an activation function and a threshold value.

– The threshold value is the minimum value that must be possessed by the input so that it can be activated.

– The task of the neuron is to perform a weighted sum of all the input signals and apply the activation function on the sum before passing it to the next(hidden or output) layer.

What is a weighted sum?

Say we have input values a_1, a_2, a_3, a_4 and weights w_1, w_2, w_3, w_4 feeding one of the hidden-layer neurons, say n_j. Then the weighted sum is

$$S_j = \sum_{i=1}^{4} w_i a_i + b_j$$

where b_j is the bias of node n_j.
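The weighted sum of a single neuron can be computed directly. The input values, weights and bias below are made up for illustration.

```python
def weighted_sum(inputs, weights, bias):
    """Return the sum of w_i * a_i over all inputs, plus the bias."""
    return sum(w * a for w, a in zip(weights, inputs)) + bias

a = [1.0, 2.0, 3.0, 4.0]        # made-up inputs a_1..a_4
w = [0.5, -1.0, 0.25, 0.0]      # made-up weights w_1..w_4
b = 0.5                         # made-up bias b_j

s_j = weighted_sum(a, w, b)
print(s_j)  # -0.25
```

The neuron then applies its activation function to this value before passing it to the next layer.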

What are the Activation Functions?

These functions are needed to introduce a non-linearity into the network. The activation function is applied and that output is passed to the next layer.

*Possible Functions*

• Sigmoid: Sigmoid function is differentiable. It produces output between 0 and 1.

• Hyperbolic Tangent: Hyperbolic Tangent is also differentiable. This Produces output between -1 and 1.

• ReLU: ReLU is the most popular activation function and is used widely in deep learning. It outputs 0 for negative inputs and the input itself otherwise.

• Softmax: The softmax function is used for multi-class classification problems. It is a generalization of the sigmoid function. Each of its outputs lies between 0 and 1, and together they sum to 1.
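The four functions listed above are short enough to write out in plain Python. This is an illustrative sketch, not a library implementation; the last lines check the claim that softmax generalizes sigmoid by comparing a two-class softmax with the sigmoid of the same score.

```python
import math

def sigmoid(x):
    """Squash x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squash x into the range (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Return 0 for negative inputs and x otherwise."""
    return max(0.0, x)

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(scores)                            # for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

x = 1.5
print(sigmoid(x), tanh(x), relu(x), relu(-x))

# With two classes and scores [x, 0], softmax reduces to sigmoid(x)
two_class = softmax([x, 0.0])
print(abs(two_class[0] - sigmoid(x)) < 1e-12)  # True
```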

Now, let’s go with our topic CNN…


Now imagine there is an image of a bird, and you want to identify whether it is really a bird or something else. The first thing you should do is feed the pixels of the image, in the form of arrays, to the input layer of the neural network (MLP networks are used to classify such things). The hidden layers carry out feature extraction by performing various calculations and operations. There are multiple hidden layers, such as the convolution, ReLU, and pooling layers, that perform feature extraction from your image. Finally, there is a fully connected layer that identifies the object in the image. You can understand this very easily from the following figure:


Convolution Operation involves matrix arithmetic operations and every image is represented in the form of an array of values(pixels).

Let us understand this with an example:

a = [2,5,8,4,7,9]

b = [1,2,3]

In the convolution operation, the arrays are multiplied element-wise, and the products are summed to create one element of a new array that represents a*b.

The first three elements of array a are multiplied by the elements of array b. The products are summed, and the result is stored in the new array a*b.

The window then slides along a by one element at a time, and the process repeats until the end of a is reached.
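Worked out for the arrays a and b above, and assuming the window moves one element at a time with no padding, the four elements of a*b are:

```latex
\begin{aligned}
(a*b)_1 &= 2\cdot 1 + 5\cdot 2 + 8\cdot 3 = 36\\
(a*b)_2 &= 5\cdot 1 + 8\cdot 2 + 4\cdot 3 = 33\\
(a*b)_3 &= 8\cdot 1 + 4\cdot 2 + 7\cdot 3 = 37\\
(a*b)_4 &= 4\cdot 1 + 7\cdot 2 + 9\cdot 3 = 45
\end{aligned}
```

So a*b = [36, 33, 37, 45]. An input of length 6 and a filter of length 3 give 6 − 3 + 1 = 4 outputs.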


into a feed-forward neural network which is also called a Multi-Layer Perceptron.

Up to this point, we have seen the concepts that are important for building our CNN model.

Now we will move forward to see a case study of CNN.

1) Here we import the libraries required for building and training the CNN.

import numpy as np
%matplotlib inline
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import tensorflow as tf

2) The following code defines the CNN model.

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation="relu", input_shape=(180,180,3)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(32, (3,3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(64, (3,3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(128, (3,3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(550, activation="relu"),      # Adding the hidden layers
    tf.keras.layers.Dropout(0.1, seed=2023),
    tf.keras.layers.Dense(400, activation="relu"),
    tf.keras.layers.Dropout(0.3, seed=2023),
    tf.keras.layers.Dense(300, activation="relu"),
    tf.keras.layers.Dropout(0.4, seed=2023),
    tf.keras.layers.Dense(200, activation="relu"),
    tf.keras.layers.Dropout(0.2, seed=2023),
    tf.keras.layers.Dense(5, activation="softmax")      # Adding the output layer
])


The output of a convolution can be too large, so pooling is used to reduce its size while keeping the most important features or patterns.

The network is initialized using the Sequential model from Keras.

Flatten()- Flattening transforms the multi-dimensional feature maps into a single vector of features.
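A small plain-Python sketch shows what 2×2 max pooling followed by flattening does to one feature map. The 4×4 values are made up, and the helper is an illustration rather than the Keras layer itself.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a list-of-lists feature map."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            max(image[2 * r][2 * c], image[2 * r][2 * c + 1],
                image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1])
            for c in range(cols)
        ]
        for r in range(rows)
    ]

feature_map = [              # made-up 4x4 feature map
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]

pooled = max_pool_2x2(feature_map)
flattened = [value for row in pooled for value in row]

print(pooled)      # [[4, 2], [2, 7]]
print(flattened)   # [4, 2, 2, 7]
```

Each 2×2 block is replaced by its largest value, which halves the height and width of the feature map.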

3) Now let's see a summary of the CNN model

model.summary()


It will print the following output

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 178, 178, 16)      448
max_pooling2d (MaxPooling2D) (None, 89, 89, 16)        0
conv2d_1 (Conv2D)            (None, 87, 87, 32)        4640
max_pooling2d_1 (MaxPooling2 (None, 43, 43, 32)        0
conv2d_2 (Conv2D)            (None, 41, 41, 64)        18496
max_pooling2d_2 (MaxPooling2 (None, 20, 20, 64)        0
conv2d_3 (Conv2D)            (None, 18, 18, 128)       73856
max_pooling2d_3 (MaxPooling2 (None, 9, 9, 128)         0
flatten (Flatten)            (None, 10368)             0
dense (Dense)                (None, 550)               5702950
dropout (Dropout)            (None, 550)               0
dense_1 (Dense)              (None, 400)               220400
dropout_1 (Dropout)          (None, 400)               0
dense_2 (Dense)              (None, 300)               120300
dropout_2 (Dropout)          (None, 300)               0
dense_3 (Dense)              (None, 200)               60200
dropout_3 (Dropout)          (None, 200)               0
dense_4 (Dense)              (None, 5)                 1005
=================================================================
Total params: 6,202,295
Trainable params: 6,202,295
Non-trainable params: 0
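The output shapes and parameter counts in this summary can be reproduced with simple arithmetic. The sketch assumes 3×3 kernels without padding, 2×2 pooling, and one bias per filter or unit; the helper names are made up.

```python
def conv_out(size, kernel=3):
    """Spatial size after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping pooling."""
    return size // pool

def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels

def dense_params(n_in, n_out):
    """Weights plus one bias per unit."""
    return (n_in + 1) * n_out

size, channels, total = 180, 3, 0
for filters in (16, 32, 64, 128):
    total += conv_params(channels, filters)
    size = pool_out(conv_out(size))       # 180 -> 89 -> 43 -> 20 -> 9
    channels = filters

units = size * size * channels            # 9 * 9 * 128 = 10368 after Flatten
for width in (550, 400, 300, 200, 5):
    total += dense_params(units, width)
    units = width

print(size, total)                        # 9 6202295
```

Most of the parameters sit in the first dense layer, which connects all 10,368 flattened features to 550 units.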

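4) Now we specify the optimizer, loss function and metrics.

from tensorflow.keras.optimizers import RMSprop, SGD, Adam
adam = Adam(learning_rate=0.001)
# compile step inferred from the notes below and the 'acc' metric in the training log
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['acc'])

The optimizer is used to reduce the cost calculated by cross-entropy.

The loss function is used to calculate the error.

The metrics term is used to measure the performance of the model.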

5) In this step, we will see how to set the data directory and generate image data.

bs = 30  # Setting batch size
train_dir = "D:/Data Science/Image Datasets/FastFood/train/"       # Setting training directory
validation_dir = "D:/Data Science/Image Datasets/FastFood/test/"   # Setting testing directory

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255.
train_datagen = ImageDataGenerator(rescale=1.0/255.)
test_datagen = ImageDataGenerator(rescale=1.0/255.)

# Flow training images in batches of bs (30) using the train_datagen generator.
# flow_from_directory lets the classifier identify the labels directly
# from the names of the directories the images lie in.
train_generator = train_datagen.flow_from_directory(train_dir, batch_size=bs, class_mode='categorical', target_size=(180,180))

# Flow validation images in batches of bs (30) using the test_datagen generator
validation_generator = test_datagen.flow_from_directory(validation_dir, batch_size=bs, class_mode='categorical', target_size=(180,180))

The output will be:

Found 1465 images belonging to 5 classes.
Found 893 images belonging to 5 classes.

6) The final step is fitting the model.

# 150 // bs = 5 steps per epoch, matching the "5/5" in the log below
history = model.fit(train_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=150 // bs,
                    epochs=30,
                    validation_steps=50 // bs,
                    verbose=2)

The output will be:

Epoch 1/30 5/5 - 4s - loss: 0.8625 - acc: 0.6933 - val_loss: 1.1741 - val_acc: 0.5000
Epoch 2/30 5/5 - 3s - loss: 0.7539 - acc: 0.7467 - val_loss: 1.2036 - val_acc: 0.5333
Epoch 3/30 5/5 - 3s - loss: 0.7829 - acc: 0.7400 - val_loss: 1.2483 - val_acc: 0.5667
Epoch 4/30 5/5 - 3s - loss: 0.6823 - acc: 0.7867 - val_loss: 1.3290 - val_acc: 0.4333
Epoch 5/30 5/5 - 3s - loss: 0.6892 - acc: 0.7800 - val_loss: 1.6482 - val_acc: 0.4333
Epoch 6/30 5/5 - 3s - loss: 0.7903 - acc: 0.7467 - val_loss: 1.0440 - val_acc: 0.6333
Epoch 7/30 5/5 - 3s - loss: 0.5731 - acc: 0.8267 - val_loss: 1.5226 - val_acc: 0.5000
Epoch 8/30 5/5 - 3s - loss: 0.5949 - acc: 0.8333 - val_loss: 0.9984 - val_acc: 0.6667
Epoch 9/30 5/5 - 3s - loss: 0.6162 - acc: 0.8069 - val_loss: 1.1490 - val_acc: 0.5667
Epoch 10/30 5/5 - 3s - loss: 0.7509 - acc: 0.7600 - val_loss: 1.3168 - val_acc: 0.5000
Epoch 11/30 5/5 - 4s - loss: 0.6180 - acc: 0.7862 - val_loss: 1.1918 - val_acc: 0.7000
Epoch 12/30 5/5 - 3s - loss: 0.4936 - acc: 0.8467 - val_loss: 1.0488 - val_acc: 0.6333
Epoch 13/30 5/5 - 3s - loss: 0.4290 - acc: 0.8400 - val_loss: 0.9400 - val_acc: 0.6667
Epoch 14/30 5/5 - 3s - loss: 0.4205 - acc: 0.8533 - val_loss: 1.0716 - val_acc: 0.7000
Epoch 15/30 5/5 - 4s - loss: 0.5750 - acc: 0.8067 - val_loss: 1.2055 - val_acc: 0.6000
Epoch 16/30 5/5 - 4s - loss: 0.4080 - acc: 0.8533 - val_loss: 1.5014 - val_acc: 0.6667
Epoch 17/30 5/5 - 3s - loss: 0.3686 - acc: 0.8467 - val_loss: 1.0441 - val_acc: 0.5667
Epoch 18/30 5/5 - 3s - loss: 0.5474 - acc: 0.8067 - val_loss: 0.9662 - val_acc: 0.7333
Epoch 19/30 5/5 - 3s - loss: 0.5646 - acc: 0.8138 - val_loss: 0.9151 - val_acc: 0.7000
Epoch 20/30 5/5 - 4s - loss: 0.3579 - acc: 0.8800 - val_loss: 1.4184 - val_acc: 0.5667
Epoch 21/30 5/5 - 3s - loss: 0.3714 - acc: 0.8800 - val_loss: 2.0762 - val_acc: 0.6333
Epoch 22/30 5/5 - 3s - loss: 0.3654 - acc: 0.8933 - val_loss: 1.8273 - val_acc: 0.5667
Epoch 23/30 5/5 - 3s - loss: 0.3845 - acc: 0.8933 - val_loss: 1.0199 - val_acc: 0.7333
Epoch 24/30 5/5 - 3s - loss: 0.3356 - acc: 0.9000 - val_loss: 0.5168 - val_acc: 0.8333
Epoch 25/30 5/5 - 3s - loss: 0.3612 - acc: 0.8667 - val_loss: 1.7924 - val_acc: 0.5667
Epoch 26/30 5/5 - 3s - loss: 0.3075 - acc: 0.8867 - val_loss: 1.0720 - val_acc: 0.6667
Epoch 27/30 5/5 - 3s - loss: 0.2820 - acc: 0.9400 - val_loss: 2.2798 - val_acc: 0.5667
Epoch 28/30 5/5 - 3s - loss: 0.3606 - acc: 0.8621 - val_loss: 1.2423 - val_acc: 0.8000
Epoch 29/30 5/5 - 3s - loss: 0.2630 - acc: 0.9000 - val_loss: 1.4235 - val_acc: 0.6333
Epoch 30/30 5/5 - 3s - loss: 0.3790 - acc: 0.9000 - val_loss: 0.6173 - val_acc: 0.8000

The above function trains the neural network on the training set and evaluates its performance on the test set. For each epoch it reports two metrics, ‘acc’ and ‘val_acc’, which are the prediction accuracy on the training set and on the test set respectively.


Hence, we see that the test accuracy reaches 0.80 by the final epoch, although it fluctuates considerably from one epoch to the next. You can try to improve on this by rerunning the model with more epochs or different hyperparameters.

I hope you liked my article. Do share with your friends, colleagues.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Plant Seedlings Classification Using CNN – With Python Code

This article was published as a part of the Data Science Blogathon


Hello Readers!!

The dataset has 12 sets of images, and our ultimate goal is to classify plant species from an image.

If you want to learn more about the dataset, check this Link. We are going to perform multiple steps: importing the libraries and modules, reading and resizing the images, cleaning the images, preprocessing the images, building the model, training the model, reducing overfitting, and finally making predictions on the testing dataset.















This dataset is provided by the Aarhus University Signal Processing group. This is a typical image recognition problem statement. We are given a dataset of plant photos taken at various stages of growth. Each photo has its unique id and filename. The dataset contains 960 unique plants that are from 12 plant species. The final aim is to build a classifier that is capable of determining the plant species from a photo.

List of Species

Black-grass

Charlock

Cleavers

Common Chickweed

Common wheat

Fat Hen

Loose Silky-bent

Maize

Scentless Mayweed

Shepherds Purse

Small-flowered Cranesbill

Sugar beet


First, import all the necessary libraries for our further analysis. We are going to use NumPy, Pandas, Matplotlib, OpenCV, Keras, and scikit-learn. Check the below commands for importing all the required libraries.

import numpy as np  # MATRIX OPERATIONS
import pandas as pd  # EFFICIENT DATA STRUCTURES
import matplotlib.pyplot as plt  # GRAPHING AND VISUALIZATIONS
import math  # MATHEMATICAL OPERATIONS
import cv2  # IMAGE PROCESSING - OPENCV
from glob import glob  # FILE OPERATIONS
import itertools

# KERAS AND SKLEARN MODULES
from keras.utils import np_utils
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers import BatchNormalization
from keras.callbacks import ModelCheckpoint,ReduceLROnPlateau,CSVLogger
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# GLOBAL VARIABLES
scale = 70
seed = 7

GETTING THE DATA AND RESIZING THE IMAGES

For training our model, we need to read the data first. Our dataset has different sizes of images, so we are going to resize the images. Reading the data and resizing them are performed in a single step. Check the below code for complete information on how to perform different operations.

# One sub-folder per species; the wildcards are assumed, since the label is read from the parent folder name
path_to_images = 'plant-seedlings-classification/train/*/*.png'
images = glob(path_to_images)
trainingset = []
traininglabels = []
num = len(images)
count = 1

# READING IMAGES AND RESIZING THEM
for i in images:
    print(str(count)+'/'+str(num), end='\r')
    trainingset.append(cv2.resize(cv2.imread(i), (scale, scale)))
    traininglabels.append(i.split('/')[-2])  # folder name = species label
    count = count + 1

trainingset = np.asarray(trainingset)
traininglabels = pd.DataFrame(traininglabels)

CLEANING THE IMAGES AND REMOVING THE BACKGROUND

Cleaning the images is a very important and computationally intensive step. We will perform the following steps in order to clean the images:

Blur the images in order to remove the noise

Convert the blurred images from BGR (the channel order OpenCV uses when reading files) into HSV

Create a mask in order to remove the background
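The last step, applying the mask, is easy to try out on its own. The cleaning code indexes each image with a boolean array; the sketch below assumes that array comes from testing the mask for non-zero pixels, and uses a tiny made-up image and mask instead of real OpenCV output.

```python
import numpy as np

# A tiny 2 x 2 BGR "image" with made-up pixel values.
image = np.array([[[10, 200, 30], [90, 80, 70]],
                  [[60, 50, 40], [20, 220, 25]]], dtype=np.uint8)

# A mask in the style returned by cv2.inRange: 255 where the pixel falls
# inside the colour range, 0 elsewhere (these values are hypothetical).
mask = np.array([[255, 0],
                 [0, 255]], dtype=np.uint8)

boolean = mask > 0                 # True for the pixels we want to keep
cleaned = np.zeros_like(image)     # start from an all-black image
cleaned[boolean] = image[boolean]  # copy over only the masked pixels

print(cleaned)
```

Pixels outside the mask stay black, which is how the background disappears from the seedling photos.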

new_train = []
sets = []
getEx = True
for i in trainingset:
    blurr = cv2.GaussianBlur(i, (5,5), 0)
    hsv = cv2.cvtColor(blurr, cv2.COLOR_BGR2HSV)
    # GREEN PARAMETERS
    lower = (25,40,50)
    upper = (75,255,255)
    mask = cv2.inRange(hsv, lower, upper)
    struc = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11,11))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, struc)
    boolean = mask > 0  # boolean version of the mask: True where the pixel is green
    new = np.zeros_like(i, np.uint8)
    new[boolean] = i[boolean]
    new_train.append(new)
    if getEx:
        plt.subplot(2,3,1); plt.imshow(i)  # ORIGINAL
        plt.subplot(2,3,2); plt.imshow(blurr)  # BLURRED
        plt.subplot(2,3,3); plt.imshow(hsv)  # HSV CONVERTED
        plt.subplot(2,3,4); plt.imshow(mask)  # MASKED
        plt.subplot(2,3,5); plt.imshow(boolean)  # BOOLEAN MASKED
        plt.subplot(2,3,6); plt.imshow(new)  # NEW PROCESSED IMAGE
        getEx = False

new_train = np.asarray(new_train)

# CLEANED IMAGES
for i in range(8):
    plt.subplot(2,4,i+1)
    plt.imshow(new_train[i])

CONVERTING THE LABELS INTO NUMBERS

The labels are strings, which the network cannot process directly. So we’ll convert these labels into one-hot encoded vectors.

Each label is represented by an array of 12 numbers which follows the condition:

0 if the species is not detected.

1 if the species is detected.

Example: If Blackgrass is detected, the array will be = [1,0,0,0,0,0,0,0,0,0,0,0]
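To see the encoding without any Keras or scikit-learn calls, here is a minimal NumPy sketch. The three label strings are only a hypothetical stand-in for the folder names read from disk; the real dataset has 12 classes.

```python
import numpy as np

# Hypothetical label strings standing in for the species folder names.
species = np.array(["Maize", "Black-grass", "Charlock", "Maize", "Black-grass"])

# np.unique returns the sorted class names and, with return_inverse=True,
# the integer index of each sample's class (the job LabelEncoder does).
class_names, encoded = np.unique(species, return_inverse=True)

# Row k of the identity matrix is the one-hot vector for class k.
one_hot = np.eye(len(class_names), dtype=int)[encoded]

print(class_names)  # sorted class names
print(encoded)      # one integer per sample
print(one_hot)      # one row per sample, a single 1 in each row
```

Each row contains exactly one 1, in the column of the species that the image belongs to.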

labels = preprocessing.LabelEncoder()
labels.fit(traininglabels[0])
print('Classes'+str(labels.classes_))
encodedlabels = labels.transform(traininglabels[0])
clearalllabels = np_utils.to_categorical(encodedlabels)
classes = clearalllabels.shape[1]
print(str(classes))
traininglabels[0].value_counts().plot(kind='pie')

DEFINING OUR MODEL AND SPLITTING THE DATASET

In this step, we are going to split the training dataset for validation. We are using the train_test_split() function from scikit-learn. Here we are splitting the dataset keeping the test_size=0.1. It means 10% of total data is used as testing data and the other 90% as training data. Check the below code for splitting the dataset.

new_train = new_train/255
x_train,x_test,y_train,y_test = train_test_split(new_train,clearalllabels,test_size=0.1,random_state=seed,stratify=clearalllabels)

PREVENTING OVERFITTING

Overfitting is a problem in machine learning in which our model performs very well on training data but performs poorly on testing data.

The problem is especially severe in deep learning, because deep neural networks have enough capacity to memorize the training set, and it hurts our end results badly.

To reduce it, we use the ImageDataGenerator() function, which randomly changes the characteristics of the images (rotation, zoom, shifts, and flips) so that the model sees a slightly different version of each image in every epoch. Check the below code on how to reduce overfitting
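The idea behind this kind of augmentation can be sketched in plain NumPy. This is a deliberately simplified stand-in, not what ImageDataGenerator does internally: it only applies random flips and 90-degree rotations, and the helper name random_flip_rotate is made up for this illustration.

```python
import numpy as np

def random_flip_rotate(image, rng):
    """Return a randomly flipped and rotated copy of an H x W x C image."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]   # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :, :]   # vertical flip
    k = int(rng.integers(0, 4))     # number of 90-degree turns (0 to 3)
    return np.rot90(image, k=k, axes=(0, 1)).copy()

rng = np.random.default_rng(7)
image = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)

# Five randomly transformed versions of the same image.
augmented = [random_flip_rotate(image, rng) for _ in range(5)]
print([a.shape for a in augmented])
```

Every variant contains exactly the same pixels as the original, only rearranged, so the label stays valid while the network never gets to memorize one fixed orientation.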

generator = ImageDataGenerator(rotation_range = 180,
                               zoom_range = 0.1,
                               width_shift_range = 0.1,
                               height_shift_range = 0.1,
                               horizontal_flip = True,
                               vertical_flip = True)

DEFINING THE CONVOLUTIONAL NEURAL NETWORK

Our dataset consists of images, where simple models such as linear regression, logistic regression, or decision trees applied to raw pixels tend to perform poorly, so we use a deep neural network instead. In this problem, we are going to use a convolutional neural network (CNN). It takes an image as input and outputs a probability for each species. The architecture uses 6 convolution layers and 3 fully connected layers; these counts were chosen arbitrarily. We also use multiple building blocks: Sequential(), Conv2D(), BatchNormalization(), MaxPooling2D(), Dropout(), and Flatten().

We are using a convolutional neural network for training.

This model has 6 convolution layers.

This model has 3 fully connected layers.
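It helps to trace how the image size shrinks through the network. The model definition that follows stacks three blocks, each with two 5x5 convolutions and one 2x2 max pooling. The sketch below assumes the Keras defaults (no padding, stride 1 for convolutions, pooling stride equal to the pool size); the helper names are made up for this illustration.

```python
def conv_out(size, kernel=5):
    """Output width of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width of max pooling with a pool x pool window and the same stride."""
    return size // pool

size = 70          # the `scale` value used when resizing the images
trace = [size]
for block in range(3):           # three blocks of conv, conv, pool
    size = conv_out(size)
    trace.append(size)
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

filters = 256                    # filters in the last convolution block
flattened = size * size * filters

print(trace)
print(flattened)
```

Under these assumptions the 70 x 70 input is reduced to a 1 x 1 feature map with 256 channels, so Flatten() hands a vector of 256 values to the first dense layer. A larger `scale` would leave more spatial detail at that point.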

np.random.seed(seed)

model = Sequential()

model.add(Conv2D(filters=64, kernel_size=(5, 5), input_shape=(scale, scale, 3), activation='relu'))
model.add(BatchNormalization(axis=3))
model.add(Conv2D(filters=64, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(BatchNormalization(axis=3))
model.add(Dropout(0.1))

model.add(Conv2D(filters=128, kernel_size=(5, 5), activation='relu'))
model.add(BatchNormalization(axis=3))
model.add(Conv2D(filters=128, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(BatchNormalization(axis=3))
model.add(Dropout(0.1))

model.add(Conv2D(filters=256, kernel_size=(5, 5), activation='relu'))
model.add(BatchNormalization(axis=3))
model.add(Conv2D(filters=256, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(BatchNormalization(axis=3))
model.add(Dropout(0.1))

model.add(Flatten())

model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Dense(classes, activation='softmax'))

model.summary()

FITTING THE CNN ONTO THE DATA

Next, we fit the CNN model to our dataset so that the model learns from the training data and its weights get updated. The trained model can then be used to get the final predictions on our testing dataset. There are a few things to set up first: reducing the learning rate when progress stalls, keeping the best weights found so far, and saving the latest weights so that we can reuse them for testing and predictions.

We need the following:

Best weights for the model

Reduce learning rate

Save the last weights of the model

# Compile before training or evaluating. The loss, optimizer and metric name are assumptions;
# the metric is called 'acc' so that the callbacks below can monitor 'val_acc'.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

# Lower the learning rate when validation accuracy stops improving
lrr = ReduceLROnPlateau(monitor='val_acc', patience=3, verbose=1, factor=0.4, min_lr=0.00001)

# Save the best weights seen so far
filepath="drive/DataScience/PlantReco/weights.best_{epoch:02d}-{val_acc:.2f}.hdf5"
checkpoints = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')

# Save the weights after every epoch
filepath="drive/DataScience/PlantReco/weights.last_auto4.hdf5"
checkpoints_full = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=False, mode='max')

callbacks_list = [checkpoints, lrr, checkpoints_full]

# MODEL
# hist = model.fit_generator(generator.flow(x_train, y_train, batch_size=75),
#                            epochs=35, validation_data=(x_test, y_test),
#                            steps_per_epoch=x_train.shape[0], callbacks=callbacks_list)

# LOADING MODEL
model.load_weights("../input/plantrecomodels/weights.best_17-0.96.hdf5")
dataset = np.load("../input/plantrecomodels/Data.npz")
data = dict(zip(("x_train","x_test","y_train", "y_test"), (dataset[k] for k in dataset)))
x_train = data['x_train']
x_test = data['x_test']
y_train = data['y_train']
y_test = data['y_test']

print(model.evaluate(x_train, y_train))  # Evaluate on train set
print(model.evaluate(x_test, y_test))  # Evaluate on test set

CONFUSION MATRIX

A confusion matrix is a way to check how our model performs on data. It is a good way to analyse the error in the model. Check the below code for the confusion matrix
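Before looking at the scikit-learn call, it can help to build a confusion matrix by hand. The sketch below uses a made-up 3-class example with one-hot labels and softmax-style outputs; the function name confusion_from_one_hot is hypothetical.

```python
import numpy as np

def confusion_from_one_hot(y_true, y_prob):
    """Rows are actual classes, columns are predicted classes."""
    n_classes = y_true.shape[1]
    actual = np.argmax(y_true, axis=1)      # index of the 1 in each label row
    predicted = np.argmax(y_prob, axis=1)   # most probable class per sample
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (actual, predicted), 1)  # count each (actual, predicted) pair
    return matrix

# Made-up labels and predicted probabilities for five samples.
y_true = np.eye(3, dtype=int)[[0, 1, 2, 2, 1]]
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],   # a class 2 sample mistaken for class 1
    [0.1, 0.2, 0.7],
    [0.3, 0.5, 0.2],
])

cm = confusion_from_one_hot(y_true, y_prob)
per_class_recall = cm.diagonal() / cm.sum(axis=1)

print(cm)
print(per_class_recall)
```

The off-diagonal entries show which species get mixed up with which, and dividing the diagonal by the row sums gives the recall of each class.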

# PREDICTIONS
y_pred = model.predict(x_test)
y_class = np.argmax(y_pred, axis = 1)
y_check = np.argmax(y_test, axis = 1)

cmatrix = confusion_matrix(y_check, y_class)
print(cmatrix)

GETTING PREDICTIONS

In the final part, we are getting our predictions on the testing dataset. Check the below code for getting the predictions using the trained model

path_to_test = '../input/plant-seedlings-classification/test/*.png'
pics = glob(path_to_test)
testimages = []
tests = []
count = 1
num = len(pics)

for i in pics:
    print(str(count)+'/'+str(num), end='\r')
    tests.append(i.split('/')[-1])
    testimages.append(cv2.resize(cv2.imread(i), (scale, scale)))
    count = count + 1

testimages = np.asarray(testimages)

newtestimages = []
sets = []
getEx = True
for i in testimages:
    blurr = cv2.GaussianBlur(i, (5,5), 0)
    hsv = cv2.cvtColor(blurr, cv2.COLOR_BGR2HSV)
    lower = (25,40,50)
    upper = (75,255,255)
    mask = cv2.inRange(hsv, lower, upper)
    struc = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11,11))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, struc)
    boolean = mask > 0  # boolean version of the mask: True where the pixel is green
    masking = np.zeros_like(i, np.uint8)
    masking[boolean] = i[boolean]
    newtestimages.append(masking)
    if getEx:
        plt.subplot(2,3,1); plt.imshow(i)
        plt.subplot(2,3,2); plt.imshow(blurr)
        plt.subplot(2,3,3); plt.imshow(hsv)
        plt.subplot(2,3,4); plt.imshow(mask)
        plt.subplot(2,3,5); plt.imshow(boolean)
        plt.subplot(2,3,6); plt.imshow(masking)
        getEx = False

newtestimages = np.asarray(newtestimages)

# OTHER MASKED IMAGES
for i in range(6):
    plt.subplot(2,3,i+1)
    plt.imshow(newtestimages[i])

newtestimages = newtestimages/255  # scale to [0, 1], as was done for the training images
prediction = model.predict(newtestimages)

# PREDICTION TO A CSV FILE
pred = np.argmax(prediction, axis=1)
predStr = labels.classes_[pred]
result = {'file':tests, 'species':predStr}
result = pd.DataFrame(result)
result.to_csv("Prediction.csv", index=False)

End Notes

So in this article, we had a detailed discussion on Plants Seedlings Classification Using CNN. Hope you learn something from this blog and it will help you in the future. Thanks for reading and your patience. Good luck!


The media shown in this article are not owned by Analytics Vidhya and is used at the Author’s discretion.


Beginner’s Guide On How To Train A Classification Model With Tensorflow

This article was published as a part of the Data Science Blogathon


In this article, we will cover everything from gathering data to preparing the steps for model training and evaluation. Deep learning algorithms can have huge functional uses when provided with quality data to sort through. Diverse fields such as sales forecasting and extrapolation use deep learning algorithms to perfect their process. Fields such as the evaluation of skin diseases from image data also use deep learning to deliver results.

Deep learning and TensorFlow can be your best friends while creating projects using deep learning concepts. To understand the process of building a classification model using tabular datasets, keep reading this article.

Prerequisites that you may need:

TensorFlow 2+





The dataset that you use can make your life easy or give you endless headaches. Make sure that you have the right datasets for your projects. Kaggle contains clean, well-designed datasets that you can use to work on this project that we have covered in this article. Here, we have the wine quality dataset from Kaggle.

Kaggle Dataset for Wine

The dataset here is well designed. However, it doesn’t classify the wines as good or bad. Here, the wines are rated on a scale depending on their quality. To follow along, you may download it and take the CSV onto your machine. Next, you can open up JupyterLab. You may use any other IDE as well. However, we have worked on JupyterLab and will include screenshots from the same.

Phase One: Data Exploration and Preparation

First, you need to import Numpy and Pandas and then import the dataset as well. The code snippet given below is an example that you can follow. The code snippet also prints a random sample containing 5 rows.


import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

df = pd.read_csv('winequalityN.csv')
df.sample(5)


Here’s a look into what the dataset looks like right now:

To get to the results, we still have some more work to do.

Basic preparation

The dataset that we are working with has a few defects in the form of missing values, but the problem is not significant because there is a large sample of 6497 rows in total.




You can use a code similar to the one below to remove all the defects:


df = df.dropna()
df.isna().sum()


All the features are numerical except for the type column, which can be either white or red. The following code converts it into a binary column called “is_white_wine”, where the value is 1 for white wine and 0 for red wine.


df['is_white_wine'] = [
    1 if typ == 'white' else 0 for typ in df['type']
]
df.head()


So after adding the feature we also need to make the target variable binary and convert it fully into a binary classification problem.

Changing it to a problem of binary classification

All the wines in the dataset are graded on a scale from 3 to 9, where a higher value denotes a better wine. The following code splits the wines by type and displays the quality distribution of each in a graphical manner.


from plotly.subplots import make_subplots

white = df[df['type']=='white']
red = df[df['type'] == 'red']

fig = make_subplots(rows=1, cols=2, column_widths=[0.35, 0.35],
                    subplot_titles=['White Wine Quality', 'Red Wine Quality'])

# Bar chart of quality counts for white wine
fig.append_trace(go.Bar(x=white['quality'].value_counts().index,
                        y=white['quality'].value_counts(),
                        text=white['quality'].value_counts(),
                        marker=dict(color='snow', line=dict(color='black', width=1)),
                        name=''),
                 1, 1)

# Bar chart of quality counts for red wine
fig.append_trace(go.Bar(x=red['quality'].value_counts().index,
                        y=red['quality'].value_counts(),
                        text=red['quality'].value_counts(),
                        marker=dict(color='coral', line=dict(color='red', width=1)),
                        name=''),
                 1, 2)

fig.update_traces(textposition='outside')
fig.update_layout(margin={'b':0,'l':0,'r':0,'t':100},
                  paper_bgcolor='rgb(248, 248, 255)',
                  plot_bgcolor='rgb(248, 248, 255)',
                  showlegend=False,
                  title={'font': {'family':'monospace', 'size': 22, 'color':'grey'},
                         'text':'Quality Distribution In Red & White Wine',
                         'x':0.50, 'y':1})


We will simplify this into two classes: any wine with a grade higher than 6 is labelled good (1), and all other wines are labelled bad (0). The following code does the task.


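# Threshold follows the text above: a grade higher than 6 counts as good
df['is_good_wine'] = [
    1 if quality > 6 else 0 for quality in df['quality']
]
df.drop('quality', axis=1, inplace=True)
df.drop('type', axis=1, inplace=True)
df.head()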


So now our dataset looks like this after all the transformation and changes and now we will move on to the next phase.

Phase Two: Training the classification model

We will stick to a general split rule of 80 and 20. The following code will do that task.


from sklearn.model_selection import train_test_split

X = df.drop('is_good_wine', axis=1)
y = df['is_good_wine']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train, y_train


After the split, the training set has 5170 rows and the test set has 1293 rows, which should be a decent amount for training a neural network. Before we begin training, we must also scale the data. Let’s do that now. You can follow along if you have all the prerequisites.

Scale the Data

The features in this dataset are not on the same scale, or even close to it. We may end up confusing the neural network if we leave the dataset like this, so we need to scale the data. We use StandardScaler from Scikit-Learn to fit and transform the data; afterwards all the values fall in a much closer range, which makes them ready for our neural network.


from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from the training set only
X_test_scaled = scaler.transform(X_test)  # reuse the training statistics on the test set
X_train_scaled


This is what the scaled data looks like.

The value range is much narrower, which suits a neural network well, so we now move on to training it with TensorFlow.
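What StandardScaler does to each column can be sketched in plain NumPy: subtract the training mean and divide by the training standard deviation. The numbers below are made up and only stand in for two wine features on very different scales.

```python
import numpy as np

# Made-up values for two features with very different ranges.
X_train = np.array([[7.0, 0.30],
                    [6.0, 0.50],
                    [8.0, 0.40]])
X_test = np.array([[7.0, 0.60]])

# Statistics are computed on the training set only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)   # population standard deviation (ddof=0)

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std   # the test set reuses the training statistics

print(X_train_scaled)
print(X_test_scaled)
```

After scaling, every training column has mean 0 and standard deviation 1. The test set is transformed with the training statistics so that no information from it leaks into preprocessing.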

You need to remember a few things before you begin to train your model, which are as follows.

Class Balance – If you do not have a roughly equal number of good and bad wines, accuracy alone can be misleading, so you should track precision and recall as well.

Loss Function – Use binary cross-entropy, which is the standard loss for a two-class problem with a single sigmoid output. Do not confuse it with categorical cross-entropy, which is meant for multi-class problems.
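To make the loss function less of a black box, here is a minimal NumPy sketch of binary cross-entropy. The clipping constant `eps` is an assumption added to avoid taking log(0); it is not taken from the article.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(losses))

# Same labels, once with confident correct predictions and once with confident wrong ones.
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])

print(confident_right)
print(confident_wrong)
```

The loss stays small when the predicted probability is close to the true label and grows quickly when the model is confidently wrong, which is the behaviour we want from a training signal.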

Now we will move on to defining the neural architecture and remember the above key points.

The Neural Network

The following architecture was chosen arbitrarily, so you can adjust it however you want. The model goes from 12 input features to a first hidden layer of 128 neurons, followed by 2 more hidden layers of 256 neurons each, and ends with a single output neuron. The hidden layers use ReLU as the activation function, and the output layer uses a sigmoid function. The following code demonstrates it.


import tensorflow as tf
tf.random.set_seed(42)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(
    loss=tf.keras.losses.binary_crossentropy,
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.03),
    metrics=[
        tf.keras.metrics.BinaryAccuracy(name='accuracy'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall')
    ]
)

history = model.fit(X_train_scaled, y_train, epochs=100)


This image shows the final 5 epochs of the model. Each epoch takes around 1 second on average to train on Google Colab.

We also have kept track of the accuracy, loss, precision, and recall function during training and saved them to history. We can now visualize the various metrics so that we can get a sense of how the whole model is doing.

Phase Three: Visualisation and Evaluation of the classification model using TensorFlow

We will begin by first importing some important modules like Matplotlib and changing the settings a bit. The following code shows how to plot the results.


import matplotlib.pyplot as plt
from matplotlib import rcParams

rcParams['figure.figsize'] = (18, 8)
rcParams['axes.spines.top'] = False
rcParams['axes.spines.right'] = False

plt.plot(
    np.arange(1, 101),
    history.history['loss'], label='Loss'
)
plt.plot(
    np.arange(1, 101),
    history.history['accuracy'], label='Accuracy'
)
plt.plot(
    np.arange(1, 101),
    history.history['precision'], label='Precision'
)
plt.plot(
    np.arange(1, 101),
    history.history['recall'], label='Recall'
)
plt.title('Evaluation metrics', size=20)
plt.xlabel('Epoch', size=14)
plt.legend();


Note: Here we are plotting multiple lines together for the loss, accuracy, precision, and also recall. They all share the same X-Axis which is actually the corresponding epoch number. The normal behavior is that the loss should decrease and all the remaining parameters should increase.

In our model, we can see that it follows this trend: the loss decreases while the other metrics increase. There are some occasional spikes that would smooth out if you were to train the model for more epochs. Since the curves have not yet reached a plateau, you can still train the model for more epochs. The important question to answer next is whether we are overfitting.

Predictions for Classification Model with TensorFlow

Now we move onto the prediction part where we will use the predict() function to predict the output on the scaled data of testing. The following code demonstrates it.


predictions = model.predict(X_test_scaled)
predictions


The model outputs probabilities, so you need to convert them to the corresponding classes. The logic is simple: if the predicted probability is more than 0.5, we assign 1 (a good wine), and 0 (a bad wine) otherwise. The following code checks how close 0.5 is to the optimal threshold. It computes the ROC_AUC score both manually and via the built-in function.


from sklearn.metrics import roc_curve
from sklearn.metrics import auc
from sklearn.metrics import roc_auc_score

def plot_roc_curve(fpr, tpr):
    plt.plot(fpr, tpr, color='orange', label='ROC')
    plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend()

# Computing manually fpr, tpr, thresholds and roc auc
fpr, tpr, thresholds = roc_curve(y_test, predictions)
roc_auc = auc(fpr, tpr)
print("ROC_AUC Score : ",roc_auc)
print("Function for ROC_AUC Score : ",roc_auc_score(y_test, predictions))  # Function present

# The optimal threshold maximizes the gap between TPR and FPR
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold = thresholds[optimal_idx]
print("Threshold value is:", optimal_threshold)
plot_roc_curve(fpr, tpr)


ROC_AUC Score : 0.8337780313224288
Function for ROC_AUC Score : 0.8337780313224288
Threshold value is: 0.5035058
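The threshold search above picks the point where TPR minus FPR is largest, a quantity known as Youden's J statistic. Here is a small NumPy sketch of the same idea on made-up labels and scores; the helper name rates_at is hypothetical, and the candidate thresholds are simply the distinct scores.

```python
import numpy as np

def rates_at(threshold, y_true, y_score):
    """True positive rate and false positive rate when predicting 1 for score >= threshold."""
    pred = y_score >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)
    fpr = fp / np.sum(y_true == 0)
    return tpr, fpr

# Made-up labels and predicted probabilities.
y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 0.85, 0.9])

candidates = np.unique(y_score)
j_values = []
for t in candidates:
    tpr, fpr = rates_at(t, y_true, y_score)
    j_values.append(tpr - fpr)          # Youden's J at this threshold

best_threshold = float(candidates[int(np.argmax(j_values))])
print(best_threshold, max(j_values))
```

In this toy example the best threshold is 0.4: it catches every positive while letting through only one of the four negatives.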

So now we have found the optimal threshold value, we will proceed to the next step.


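# 0.5 is the cut-off described in the text; the optimal threshold found above is very close to it
prediction_classes = [
    1 if prob > 0.5 else 0 for prob in np.ravel(predictions)
]
prediction_classes[:20]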


This is what the first 20 values of the output look like. Now we need to move on to the evaluation of the model. We will begin with the confusion matrix, which can be computed with the following code.


from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_test, prediction_classes))


The confusion matrix shows more false negatives (185) than false positives (109), so the recall on the test set will be lower than the precision. The below code can be used to print the accuracy, precision, and recall on any test set.


from sklearn.metrics import accuracy_score, precision_score, recall_score

print(f'Accuracy: {accuracy_score(y_test, prediction_classes):.2f}')
print(f'Precision: {precision_score(y_test, prediction_classes):.2f}')
print(f'Recall: {recall_score(y_test, prediction_classes):.2f}')


As we can see, the model leans slightly towards overfitting, but it is a decent model for a quick build and test. With more epochs and better data exploration, you can further enhance the model. You can find all the above code at the following link.


So that is all you need to know about how to train and test a neural network for binary classification. The dataset we have used here is almost ready to use and needs very little preparation, but real-world data is often messier. There is room for improvement: training for longer can make the model better, and adding layers or increasing the number of neurons may help as well. I hope you can now build your first TensorFlow model and begin coding right away. If you run into any roadblock, feel free to hit me up or drop a mail.


Arnab Mondal (LinkedIn)


The media shown in this article on Classification Model with TensorFlow is not owned by Analytics Vidhya and are used at the Author’s discretion.


Evaluating A Classification Model For Data Science

This article was published as a part of the Data Science Blogathon.

Machine Learning tasks are mainly divided into three types

Supervised Learning — In Supervised learning, the model is first trained using a Training set(it contains input-expected output pairs). This trained model can be later used to predict output for any unknown input.

Unsupervised Learning — In unsupervised learning, the model by itself tries to identify patterns in the training set.

Reinforcement Learning —  This is an altogether different type. Better not to talk about it.

Supervised learning tasks mainly consist of Regression & Classification. In Regression, the model predicts continuous variables, whereas in Classification the model predicts class labels.

For this entire article, let’s assume you’re a Machine Learning Engineer working at Google. You are asked to evaluate a handwritten alphabet recognizer. A trained classifier model and the training & test sets are provided to you.

The first evaluation metric anyone would use is the “Accuracy” metric. Accuracy is the ratio of correct prediction count by total predictions made. But wait a minute . . .

Is Accuracy enough to evaluate a model?

Short answer: No

So why is accuracy not enough? you may ask

So there are four distinct possibilities as shown below

The above table is self-explanatory. But just for the sake of some revision let’s briefly discuss it.

If the model predicts “A” as an “A”, then the case is called True Positive.

If the model predicts “A” as a “Not A”, then the case is called False Negative.

If the model predicts “Not A” as an “A”, then the case is called False Positive.

If the model predicts “Not A” as a “Not A”, then the case is called True Negative

Another easy way of remembering this is by referring to the below diagram.

As some of you may have already noticed, the Accuracy metric does not represent any information about False Positive, False Negative, etc. So there is substantial information loss as these may help us evaluate & upgrade our model.
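A tiny made-up example shows how much accuracy can hide. Assume a hypothetical test set in which only 5 of 100 letters are "A", and a useless model that never predicts "A".

```python
# Hypothetical test set: 5 letters are "A" (1), 95 are "Not A" (0).
y_true = [1] * 5 + [0] * 95

# A useless model that always answers "Not A".
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # "A" predicted as "A"
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # "Not A" predicted as "Not A"
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # "Not A" predicted as "A"
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # "A" predicted as "Not A"

accuracy = (tp + tn) / len(y_true)

print("accuracy:", accuracy)
print("TP, TN, FP, FN:", tp, tn, fp, fn)
```

The accuracy looks excellent, yet the model misses every single "A". Only the breakdown into the four cases reveals that all 5 positives ended up as false negatives.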

Okay, so what are other useful evaluation metrics?

Confusion Matrix for Evaluation of Classification Model

A confusion matrix is an n x n matrix (where n is the number of labels) used to describe the performance of a classification model. Each row in the confusion matrix represents an actual class whereas each column represents a predicted class.

scikit-learn's confusion_matrix() takes two arguments: 1) True Target labels, 2) Predicted Target labels

## dummy example
from sklearn.metrics import confusion_matrix
y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
>>> array([[2, 0, 0],
           [0, 0, 1],
           [1, 0, 2]])

We will take a tiny section of the confusion matrix above for a better understanding.

Precision =  TP/(TP+FP)

scikit-learn's precision_score() takes two arguments: 1) True Target labels, 2) Predicted Target labels

## dummy example
from sklearn.metrics import precision_score
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 0, 1]
precision_score(y_true, y_pred)
>>> 0.5

Precision in itself will not be enough as a model can make just one correct positive prediction & return the rest as negative. So the precision will be 1/(1+0)=1. We need to use precision along with another metric called “Recall”.


Recall = TP/(TP+FN)

Recall is also called “True Positive Rate” or “sensitivity”.

scikit-learn's recall_score() takes two arguments: 1) True Target labels, 2) Predicted Target labels

## dummy example
from sklearn.metrics import recall_score
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 0, 1]
recall_score(y_true, y_pred)
>>> 0.333333

Hybrid of both

There is another classification metric that is a combination of both Recall & Precision. It is called the F1 score. It is the harmonic mean of recall & precision. The harmonic mean is more sensitive to low values, so the F1 will be high only when both precision & recall are high.
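To see why the harmonic mean is the stricter choice, the sketch below computes precision, recall, and F1 by hand for the same dummy labels used in the precision and recall examples, and then compares the harmonic and arithmetic means for a lopsided case. The helper name f1_from is made up for this illustration.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = f1_from(precision, recall)

# A lopsided case: perfect precision but very poor recall.
lopsided_f1 = f1_from(1.0, 0.1)
lopsided_average = (1.0 + 0.1) / 2

print(precision, recall, f1)
print(lopsided_f1, lopsided_average)
```

In the lopsided case the plain average still looks respectable at 0.55, while the F1 score drops to roughly 0.18 because one of the two values is low.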

scikit-learn's f1_score() takes two arguments: 1) True Target labels, 2) Predicted Target labels

## dummy example (multilabel: one F1 score per label column)
from sklearn.metrics import f1_score
y_true = [[0, 0, 0], [1, 1, 1], [0, 1, 1]]
y_pred = [[0, 0, 0], [1, 1, 1], [1, 1, 0]]
f1_score(y_true, y_pred, average=None)
>>> array([0.66666667, 1. , 0.66666667])

Ideal Recall or Precision

We can play with the classification model threshold to adjust recall or precision. In reality, there is no ideal recall or precision; it all depends on what kind of classification task it is. For example, in a cancer detection system you’ll prefer high recall even at the cost of lower precision, because missing a real case is worse than a false alarm. In an abusive word detector, by contrast, you’ll prefer high precision even at the cost of lower recall.

Precision/Recall Trade-off

Sadly, increasing recall will generally decrease precision & vice versa. This is called the Precision/Recall Trade-off.

Precision & Recall vs Threshold

We can plot precision & recall vs threshold to get information about how their value changes according to the threshold. Here below is a dummy graph example.

## dummy example
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve
# y_true holds the true binary labels, y_predicted the scores returned by the model
precisions, recalls, thresholds = precision_recall_curve(y_true, y_predicted)
plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
plt.plot(thresholds, recalls[:-1], "g-", label="Recall")

As you can see, precision increases as the threshold increases, but at the cost of recall. From this graph, one can pick a suitable threshold as per their requirements.
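The effect of the threshold can also be checked without any plotting. The sketch below sweeps three thresholds over made-up labels and scores; treating precision as 1.0 when nothing is predicted positive is a convention assumed here, and the function name precision_recall_at is hypothetical.

```python
# Made-up labels and predicted scores.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def precision_recall_at(threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for t, s in pairs if s >= threshold and t == 1)
    fp = sum(1 for t, s in pairs if s >= threshold and t == 0)
    fn = sum(1 for t, s in pairs if s < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn)
    return precision, recall

low = precision_recall_at(0.2)
mid = precision_recall_at(0.75)
high = precision_recall_at(0.85)

print("threshold 0.20:", low)
print("threshold 0.75:", mid)
print("threshold 0.85:", high)
```

Recall can only fall as the threshold rises, while precision tends to rise but may dip along the way, as it does at 0.75 in this toy data. That is why it is worth looking at the whole curve before settling on a threshold.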

Precision vs Recall

Another way to represent the Precision/Recall trade-off is to plot precision against recall directly. This can help you to pick a sweet spot for your model.

ROC Curve for Evaluation of Classification Model 

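The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at different thresholds.

1. TPR is the ratio of Positive classes correctly being classified as positive, that is, the recall.

2. FPR is the ratio of Negative classes inaccurately being classified as positive.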


Below is a dummy code for ROC curve.

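import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve
# y_true holds the true binary labels, y_predicted the scores returned by the model
fpr, tpr, thresholds = roc_curve(y_true, y_predicted)
plt.plot(fpr, tpr, linewidth=2, label="ROC curve")
plt.plot([0, 1], [0, 1], 'k--')  # diagonal of a purely random classifier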

In the below example graph, we have compared ROC curves for SGD & Random Forest Classifiers.

The ROC curve is mainly used to evaluate and compare multiple learning models. As in the graph above, SGD & random forest models are compared. The curve of a perfect classifier passes through the top-left corner. The curve of any good classifier should be as far as possible from the straight line passing through (0,0) & (1,1), towards the top-left. In the above graph, you can observe that the Random Forest model is working better compared to SGD. The PR curve is preferred over the ROC curve when the positive class is rare or when you care more about false positives than false negatives.



The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion

