# ML Interpretability Using LIME in R


Overview

Merely building a model is not enough if stakeholders cannot interpret its outputs

In this article, understand how to interpret your model using LIME in R

Introduction

As a beginner, I thought spending hours preprocessing the data was the most worthwhile thing in data science. That was my misconception. Now I realize that being able to explain your predictions and model to a layperson, who does not understand much about machine learning or the other jargon of the field, is even more rewarding.

Consider this scenario – your problem statement deals with predicting whether a patient has cancer or not. Painstakingly, you obtain and clean the data, build a model on it, and after much effort, experimentation, and hyperparameter tuning, you arrive at an accuracy of over 90%. That’s great! You walk up to a doctor and tell them that you can predict with 90% certainty whether a patient has cancer or not.

However, the doctor asks one question that leaves you stumped – “How can I and the patient trust your prediction when each patient is different from the other and multiple parameters can decide between a malignant and a benign tumor?”

This is where model interpretability comes in – nowadays, there are multiple tools to help you explain your model and model predictions efficiently without getting into the nitty-gritty of the model’s cogs and wheels. These tools include SHAP, Eli5, LIME, etc. Today, we will be dealing with LIME.

In this article, I am going to explain LIME and how it makes interpreting your model easy in R.

What is LIME?

LIME stands for Local Interpretable Model-Agnostic Explanations. First introduced in 2016, the paper which proposed the LIME technique was aptly named “Why Should I Trust You?”: Explaining the Predictions of Any Classifier by its authors, Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin.

Built on this basic but crucial tenet of trust, the idea behind LIME is to answer the ‘why’ of each prediction and of the entire model. The creators of LIME outline four basic criteria for explanations that must be satisfied:

The explanations for the predictions should be understandable, i.e. interpretable by the target demographic.

We should be able to explain individual predictions. The authors call this local fidelity

The method of explanation should be applicable to all models. This is termed by the authors as the explanation being model-agnostic

Along with the individual predictions, the model should be explainable in its entirety, i.e. global perspective should be considered

How does LIME work?

Expanding on how LIME works, the main assumption behind it is that every model behaves like a simple linear model at the local scale, i.e. at the level of an individual row of data. The paper and the authors do not set out to prove this, but we can go by the intuition that at an individual level, we can fit a simple model on the row and that its prediction will be very close to our complex model’s prediction for that row. Interesting, isn’t it?

Further, LIME extends this phenomenon by fitting such simple models around small changes in this individual row and then extracting the important features by comparing the simple model and the complex model’s predictions for that row.
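To make this intuition concrete, here is a minimal, self-contained R sketch of the local-surrogate idea. It is illustrative only and not the actual LIME algorithm (which samples features differently and fits a regularized model); the iris data, the noise level, and the exponential proximity kernel are all assumptions made just for the example.

library(randomForest)

set.seed(42)
data(iris)

# a "complex" model predicting sepal length from the other measurements
complex_model <- randomForest(Sepal.Length ~ ., data = iris[, 1:4])

# the single row we want to explain
row_of_interest <- iris[1, 2:4]

# perturb the row with small Gaussian noise
n_perm <- 500
perturbed <- row_of_interest[rep(1, n_perm), ]
perturbed <- perturbed + matrix(rnorm(n_perm * 3, sd = 0.3), ncol = 3)

# the complex model's predictions for the perturbations
preds <- predict(complex_model, perturbed)

# weight each perturbation by its proximity to the original row
dists <- sqrt(rowSums((perturbed - row_of_interest[rep(1, n_perm), ])^2))
weights <- exp(-dists^2)

# a simple weighted linear model approximates the complex model locally;
# its coefficients act as the "explanation" for this one row
local_model <- lm(preds ~ ., data = perturbed, weights = weights)
coef(local_model)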

LIME works on tabular/structured data as well as on text data.

You can read more about how LIME works using Python here; in this article, we will cover how it works using R.

So fire up your notebooks or RStudio, and let us get started!

Using LIME in R

Step 1: The first step is to install LIME and all the other libraries which we will need for this project. If you have already installed them, you can skip this and start with Step 2

install.packages('lime')
install.packages('MASS')
install.packages("randomForest")
install.packages('caret')
install.packages('e1071')

Step 2: Once you have installed these libraries, we will first import them:

library(lime)
library(MASS)
library(randomForest)
library(caret)
library(e1071)

Since we took up the example of explaining the predictions of whether a patient has cancer or not, we will be using the biopsy dataset. This dataset contains information on 699 patients and their biopsies of breast cancer tumors.

Step 3: We will import this data and also have a look at the first few rows:

data(biopsy)

Step 4: Data Exploration

4.1) We will first remove the ID column since it is just an identifier and of no use to us

biopsy$ID <- NULL

4.2) Let us rename the rest of the columns so that while visualizing the explanations we have a clearer idea of the feature names as we understand the predictions using LIME.

names(biopsy) <- c('clump thickness', 'uniformity cell size', 'uniformity cell shape', 'marginal adhesion','single epithelial cell size', 'bare nuclei', 'bland chromatin', 'normal nucleoli', 'mitoses','class')

4.3) Next, we will check if there are any missing values. If so, we will first have to deal with them before proceeding any further.

sum(is.na(biopsy))

4.4) Now, here we have 2 options. We can either impute these values, or we can use the na.omit() function to drop the rows containing missing values. We will be using the latter option since cleaning the data is beyond the scope of this article.

biopsy <- na.omit(biopsy)
sum(is.na(biopsy))

Finally, let us confirm our dataframe by looking at the first few rows:

head(biopsy, 5)

Step 5: We will divide the dataset into train and test. We will check the dimensions of the data

## 75% of the sample size
smp_size <- floor(0.75 * nrow(biopsy))

## set the seed to make your partition reproducible - similar to random state in Python
set.seed(123)

train_ind <- sample(seq_len(nrow(biopsy)), size = smp_size)
train_biopsy <- biopsy[train_ind, ]
test_biopsy <- biopsy[-train_ind, ]

Let us check the dimensions:

cat(dim(train_biopsy), dim(test_biopsy))

Thus, there are 512 rows in the train set and 171 rows in the test set.

Step 6: We will train a random forest model using the caret library. We will not be performing any hyperparameter tuning, just 10-fold cross-validation repeated 5 times and a basic random forest model. So sit back while we train and fit the model on our training set.

I encourage you to experiment with these parameters using other models as well

model_rf <- caret::train(class ~ ., data = train_biopsy,
                         method = "rf", # random forest
                         trControl = trainControl(method = "repeatedcv",
                                                  number = 10,
                                                  repeats = 5,
                                                  verboseIter = FALSE))

Let us view the summary of our model

model_rf

Step 7: We will now apply the predict function of this model on our test set and build a confusion matrix

biopsy_rf_pred <- predict(model_rf, test_biopsy)
confusionMatrix(biopsy_rf_pred, as.factor(test_biopsy$class))

Step 8: Now that we have our model, we will use LIME to create an explainer object. This object is associated with the rest of the LIME functions we will be using for viewing the explanations as well.

Just like we train the model and fit it on the data, we use the lime() function to train this explainer, and then new predictions are made using the explain() function

explainer <- lime(train_biopsy, model_rf)

Let us explain a few new observations (rows 15 to 20) from the test set using only 5 of the features. Feel free to experiment with the n_features parameter. You can also pass

the entire test set, or

a single row of the test set

explanation <- explain(test_biopsy[15:20, ], explainer, n_labels = 1, n_features = 5)

The other parameters you can experiment with are listed below; an example call follows the list:

n_permutations: The number of permutations to use for each explanation.

feature_select: The algorithm to use for selecting features. We can choose among

“auto”: If n_features <= 6 use "forward_selection" else use "highest_weights"

“none”: Ignore n_features and use all features.

“forward_selection”: Add one feature at a time until n_features is reached, based on the quality of a ridge regression model.

“highest_weights”: Fit a ridge regression and select the n_features with the highest absolute weight.

“lasso_path”: Fit a lasso model and choose the n_features whose lars path converge to zero at the latest.

“tree”: Fit a tree to select n_features (which needs to be a power of 2). It requires the latest version of XGBoost.

dist_fun: The distance function to use. We will use this to compare our local model’s prediction for a row with the global model’s (random forest) prediction for that row. The default is Gower’s distance, but we can also use euclidean, manhattan, etc.

kernel_width: The width of the kernel used to convert the distances calculated above into similarity scores; permutations closer to the original observation receive higher weights.
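For instance, a call that sets some of these knobs explicitly might look like this (the particular values are purely illustrative):

explanation <- explain(test_biopsy[15:20, ], explainer,
                       n_labels = 1, n_features = 5,
                       n_permutations = 5000,
                       dist_fun = "manhattan",
                       kernel_width = 3,
                       feature_select = "highest_weights")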

    Step 9: Let us visualize this explanation for a better understanding:
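The plotting call itself is not shown above; with the lime package, the feature-level plot would typically be produced with:

plot_features(explanation)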

    How to interpret and explain this result?

    Blue/Red color: Features that have positive correlations with the target are shown in blue, negatively correlated features are shown in red.

    Uniformity cell shape <=1.5: lower values positively correlate with a benign tumor.

    Bare nuclei <= 7: lower bare nuclei values negatively correlate with a malignant tumor.

    Cases 65, 67, and 70 are similar, while the benign case 64 has unusual parameters

    The uniformity of cell shape and the single epithelial cell size are unusual in this case.

    Despite these deviating values, the tumor is still benign, indicating that the other parameter values of this case compensate for this abnormality.

    Let us visualize a single case as well with all the features:

explanation <- explain(test_biopsy[93, ], explainer, n_labels = 1, n_features = 10)
plot_features(explanation)

On the contrary, uniformity of cell size <= 5.0 and marginal adhesion <= 4: low values of these two parameters contribute negatively to the prediction of a malignant tumor. Thus, the lower these values are, the lower the chances of the tumor being malignant.

    Thus, from the above, we can conclude that higher values of the parameters would indicate that a tumor has more chances of being malignant.

    We can confirm the above explanations by looking at the actual data in this row:
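Assuming the same indexing as in the explain() call above, a simple way to pull up that row is:

test_biopsy[93, ]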

    End Notes

To conclude, we explored LIME and how to use it to interpret the individual results of our model. These explanations make for better storytelling and help us explain to a person who might have domain expertise, but no technical know-how of model building, why the model made certain predictions. Moreover, using it is pretty much effortless and requires only a few lines of code once we have our final model.

However, this is not to say that LIME has no drawbacks. The lime CRAN package we have used is not a direct replication of the original Python implementation presented with the paper and thus does not support image data like its Python counterpart. Another drawback is that the local model might not always be accurate.

I look forward to exploring more of LIME using different datasets and models, as well as exploring other interpretability techniques in R. Which tools have you used to interpret your models in R? Do share how you used them and your experiences with LIME below!



    Reproducible Ml Reports Using Yaml Configs (With Codes)

    This article was published as a part of the Data Science Blogathon

Research is to see what everybody else has seen and to think what nobody else has thought – Albert Szent-Gyorgyi

    Introduction:

The data science lifecycle is an iterative process where every step is visited again and again at various stages. This is mainly due to the research/experiment-based approach the field demands, and most times there is no right or wrong result. Every result has its relevance based on the data, the approach, the assumptions made along the way, the factors considered or skipped, etc.

    Finally, the approach which gives us relatively better results and the one that makes business sense makes it to production. But the cycle doesn’t stop there, even post-production one needs to constantly monitor the model performance and make revisions as often as appropriate.

As businesses have realized the importance of data and the benefits of its right usage, the size of data science teams has increased over the years. More teams are carrying out various experiments, revisions, and optimizations. Things can become very complex in no time if a process is not put in place where every experiment is tracked and measured and the results are documented for reference. This goes a long way in avoiding redundant research and experiments.

To achieve this, replicability and reproducibility play an important role, i.e. the ability to perform a data analysis and achieve the same results as someone else.

    Why do we need reproducible reports?

    In this article, we will explore the process of building and managing machine learning reports using configuration files and generate HTML reports. For this simple machine learning project, I will use the Breast Cancer Wisconsin (Diagnostic) Data Set. The objective of this ML project is to predict whether a person has a benign or malignant tumour.

    Let’s get started !!

    We will first conventionally build a classification model.

    We will build the same model using the YAML configuration file.

    Finally, we will generate an HTML report and save it.

    Classification model – without config file:

Let’s create a Jupyter notebook named notebook.ipynb and put the below code in it. I am using VS Code as my editor; it gives a nice and easy way to create a Jupyter notebook.

# import important packages
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import joblib

# path to the dataset
# filename = "../Data/breast-cancer-wisconsin.data"
filename = "./Data/breast-cancer-wisconsin.csv"

# load data
data = pd.read_csv(filename)

# replace "?" with -99999
data = data.replace('?', -99999)

# drop id column
data = data.drop(['id'], axis=1)

# Define X (independent variables) and y (target variable)
X = data.drop(['diagnosis', 'Unnamed: 32'], axis=1)
y = data['diagnosis']

# split data into train and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# call our classifier and fit to our data
classifier = KNeighborsClassifier(n_neighbors=5, weights="uniform", algorithm="auto",
                                  leaf_size=25, p=1, metric="minkowski", n_jobs=-1)

# training the classifier
classifier.fit(X_train, y_train)

# test our classifier
result = classifier.score(X_test, y_test)
print("Accuracy score is. {:.1f}".format(result))

# save our classifier in the model directory
joblib.dump(classifier, './Model/knn.pkl')

    If you notice, in the above code there are various hardcoded numbers, file names, model parameters, train/test split percentage, etc. If you wish to experiment then you can make changes in the code and re-run it.

    YAML file formats are popular for their ease of readability. YAML is relatively easy to write and within simple YAML files, there are no data formatting items, such as braces and square brackets; most of the relations between items are defined using indentation.

    Let’s create our config file in YAML – Config.YAML

#INITIAL SETTINGS
data_directory: "./Data/"
data_name: "breast-cancer-wisconsin.csv"
drop_columns: ["id", "Unnamed: 32"]
target_name: "diagnosis"
test_size: 0.3
random_state: 123
model_directory: "./Model"
model_name: KNN_classifier.pkl

#kNN parameters
n_neighbors: 3
weights: uniform
algorithm: auto
leaf_size: 15
p: 2
metric: minkowski
n_jobs: 1

    Now that we have built our model the conventional way, let’s move to the next section where we will do it slightly differently.

    Classification model – with a config file: 

    There are two major changes compared to the last approach.

    Loading and reading of the YAML file.

    Replacing all the hardcoded parameters with variables from the YAML config file.

Let’s add the below chunk of code to notebook.ipynb, which will load Config.yaml.

# folder to load config file
CONFIG_PATH = "./"

# Function to load yaml configuration file
def load_config(config_name):
    with open(os.path.join(CONFIG_PATH, config_name)) as file:
        config = yaml.safe_load(file)
    return config

config = load_config("Config.yaml")

    Now, let’s proceed to replace the hardcoded parameter with variables from the config file. For example, we will modify the train/test split code.

# split data into train and test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=config["test_size"], random_state=config["random_state"]
)

    Here are the changes we made:

    The test_size = 0.2 is replaced with config[“test_size”]

    The random state = 42  is replaced with config[“random_state”]

    After making similar changes across, the final file would look like this.

# Import important packages
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import joblib
import os
import yaml

# folder to load config file
CONFIG_PATH = "./"

# Function to load yaml configuration file
def load_config(config_name):
    with open(os.path.join(CONFIG_PATH, config_name)) as file:
        config = yaml.safe_load(file)
    return config

config = load_config("Config.yaml")

# load data
data = pd.read_csv(os.path.join(config["data_directory"], config["data_name"]))

# replace "?" with -99999
data = data.replace("?", -99999)

# drop id column
data = data.drop(config["drop_columns"], axis=1)

# Define X (independent variables) and y (target variable)
X = np.array(data.drop(config["target_name"], 1))
y = np.array(data[config["target_name"]])

# split data into train and test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=config["test_size"], random_state=config["random_state"]
)

# call our classifier and fit to our data
classifier = KNeighborsClassifier(
    n_neighbors=config["n_neighbors"],
    weights=config["weights"],
    algorithm=config["algorithm"],
    leaf_size=config["leaf_size"],
    p=config["p"],
    metric=config["metric"],
    n_jobs=config["n_jobs"],
)

# training the classifier
classifier.fit(X_train, y_train)

# test our classifier
result = classifier.score(X_test, y_test)
print("Accuracy score is. {:.1f}".format(result))

# save our classifier in the model directory
joblib.dump(classifier, os.path.join(config["model_directory"], config["model_name"]))

    You can find the entire code on Github.

    So far, we have successfully built a classification model, built a YAML config file, loaded the config file on Jupyter notebook, and parameterized our entire code. Now, if you make changes to the config file and run the notebook.ipynb, you will see the model results very similar to the conventional approach.

    We will move to the last section where we will generate a report of everything we have done so far.

    Generating reports:

    Here are the steps to be followed to generate the report:

    Open the terminal as administrator and navigate to your project folder.

    We will be using nbconvert library for report generation. If it is not installed then do a pip install nbconvert or conda install nbconvert

Type jupyter nbconvert --execute --to html notebook.ipynb in the terminal. The --execute flag executes all the cells in the Jupyter notebook.

A notebook.html file will be generated and saved in your project folder.

If you wish to experiment with your model, then instead of making changes in your code directly, make changes to Config.yaml and follow the above steps to regenerate the report.

    Conclusion:

Now we understand the importance of using a configuration file in a machine learning project. In this article, we learned what a configuration file is, the importance of the configuration file in your machine learning project, and how to create a YAML file and use it in your ML project. Now you can start using a configuration file in your next machine learning project.

    If you learned something new or enjoyed reading this article, please share it so that others can see it.

    Happy learnings !!!!

    You can connect with me – Linkedin

    You can find the code for reference – Github


The media shown in this article are not owned by Analytics Vidhya and are used at the author’s discretion.


    Build High Performance Time Series Models Using Auto Arima In Python And R

    Introduction

    Picture this – You’ve been tasked with forecasting the price of the next iPhone and have been provided with historical data. This includes features like quarterly sales, month-on-month expenditure, and a whole host of things that come with Apple’s balance sheet. As a data scientist, which kind of problem would you classify this as? Time series modeling, of course.

    From predicting the sales of a product to estimating the electricity usage of households, time series forecasting is one of the core skills any data scientist is expected to know, if not master. There are a plethora of different techniques out there which you can use, and we will be covering one of the most effective ones, called Auto ARIMA, in this article.

    We will first understand the concept of ARIMA which will lead us to our main topic – Auto ARIMA. To solidify our concepts, we will take up a dataset and implement it in both Python and R.

If you are familiar with time series and its techniques (like moving average, exponential smoothing, and ARIMA), you can skip directly to section 4. For beginners, start from the section below, which is a brief introduction to time series and various forecasting techniques.

What is a time series?

Before we learn about the techniques to work on time series data, we must first understand what a time series actually is and how it is different from any other kind of data. Here is the formal definition of a time series – it is a series of data points measured at consistent time intervals. This simply means that particular values are recorded at a constant interval, which may be hourly, daily, weekly, every 10 days, and so on. What makes time series different is that each data point in the series is dependent on the previous data points. Let us understand the difference more clearly by taking a couple of examples.

    Example 1:

    Suppose you have a dataset of people who have taken a loan from a particular company (as shown in the table below). Do you think each row will be related to the previous rows? Certainly not! The loan taken by a person will be based on his financial conditions and needs (there could be other factors such as the family size etc., but for simplicity we are considering only income and loan type) . Also, the data was not collected at any specific time interval. It depends on when the company received a request for the loan.

    Example 2:

    Let’s take another example. Suppose you have a dataset that contains the level of CO2 in the air per day (screenshot below). Will you be able to predict the approximate amount of CO2 for the next day by looking at the values from the past few days? Well, of course. If you observe, the data has been recorded on a daily basis, that is, the time interval is constant (24 hours).

You must have got an intuition about this by now – the first case is a simple regression problem and the second is a time series problem. Although the time series puzzle here can also be solved using linear regression, that isn’t really the best approach, as it neglects the relationship of the values with all the relative past values. Let’s now look at some of the common techniques used for solving time series problems.

    Methods for time series forecasting

    There are a number of methods for time series forecasting and we will briefly cover them in this section. The detailed explanation and python codes for all the below mentioned techniques can be found in this article: 7 techniques for time series forecasting (with python codes).

    Naive Approach: In this forecasting technique, the value of the new data point is predicted to be equal to the previous data point. The result would be a flat line, since all new values take the previous values.

    Simple Average: The next value is taken as the average of all the previous values. The predictions here are better than the ‘Naive Approach’ as it doesn’t result in a flat line but here, all the past values are taken into consideration which might not always be useful. For instance, when asked to predict today’s temperature, you would consider the last 7 days’ temperature rather than the temperature a month ago.

Moving Average: This is an improvement over the previous technique. Instead of taking the average of all the previous points, the average of the ‘n’ previous points is taken to be the predicted value (see the short R sketch after this list).

    Weighted moving average : A weighted moving average is a moving average where the past ‘n’ values are given different weights.

    Simple Exponential Smoothing: In this technique, larger weights are assigned to more recent observations than to observations from the distant past.

    Holt’s Linear Trend Model: This method takes into account the trend of the dataset. By trend, we mean the increasing or decreasing nature of the series. Suppose the number of bookings in a hotel increases every year, then we can say that the number of bookings show an increasing trend. The forecast function in this method is a function of level and trend.

    Holt Winters Method: This algorithm takes into account both the trend and the seasonality of the series. For instance – the number of bookings in a hotel is high on weekends & low on weekdays, and increases every year; there exists a weekly seasonality and an increasing trend.

    ARIMA: ARIMA is a very popular technique for time series modeling. It describes the correlation between data points and takes into account the difference of the values. An improvement over ARIMA is SARIMA (or seasonal ARIMA). We will look at ARIMA in a bit more detail in the following section.
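Before diving into ARIMA, here is a minimal base R sketch of the simpler baselines above, computed on a made-up series (the numbers are purely illustrative):

# a made-up series for illustration
y <- c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119)

# naive approach: the next value equals the last observed value
naive_fc <- tail(y, 1)

# simple average: the mean of all previous values
simple_avg_fc <- mean(y)

# moving average: the mean of the last n values (here n = 3)
n <- 3
moving_avg_fc <- mean(tail(y, n))

# weighted moving average: recent values get larger weights (weights sum to 1)
w <- c(0.2, 0.3, 0.5)
weighted_ma_fc <- sum(w * tail(y, n))

c(naive = naive_fc, simple_average = simple_avg_fc,
  moving_average = moving_avg_fc, weighted_ma = weighted_ma_fc)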

    Introduction to ARIMA

In this section we will do a quick introduction to ARIMA, which will be helpful in understanding Auto ARIMA. A detailed explanation of ARIMA, its parameters (p, d, q), plots (ACF, PACF), and implementation is included in this article: Complete tutorial to Time Series.

    ARIMA is a very popular statistical method for time series forecasting. ARIMA stands for Auto-Regressive Integrated Moving Averages. ARIMA models work on the following assumptions –

    The data series is stationary, which means that the mean and variance should not vary with time. A series can be made stationary by using log transformation or differencing the series.

    The data provided as input must be a univariate series, since arima uses the past values to predict the future values.

    ARIMA has three components – AR (autoregressive term), I (differencing term) and MA (moving average term). Let us understand each of these components –

    AR term refers to the past values used for forecasting the next value. The AR term is defined by the parameter ‘p’ in arima. The value of ‘p’ is determined using the PACF plot.

The MA term defines the number of past forecast errors used to predict the future values. The parameter ‘q’ in ARIMA represents the MA term. The ACF plot is used to identify the correct ‘q’ value.

The order of differencing specifies the number of times the differencing operation is performed on the series to make it stationary. Tests like ADF and KPSS can be used to determine whether the series is stationary and help in identifying the d value (a short R sketch follows this list).
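As a rough sketch of how these pieces are checked in R (assuming the forecast and tseries packages are installed, and using the built-in AirPassengers series purely for illustration):

library(tseries)    # adf.test()
library(forecast)   # Acf(), Pacf()

y <- AirPassengers

# log transform and first difference to move towards stationarity (d = 1)
y_stat <- diff(log(y))

adf.test(y_stat)   # ADF test: a small p-value suggests the series is stationary
Acf(y_stat)        # significant lags in the ACF help choose q (the MA term)
Pacf(y_stat)       # significant lags in the PACF help choose p (the AR term)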

    Steps for ARIMA implementation

    The general steps to implement an ARIMA model are –

    Load the data

    The first step for model building is of course to load the dataset

    Preprocessing

    Depending on the dataset, the steps of preprocessing will be defined. This will include creating timestamps, converting the dtype of date/time column, making the series univariate, etc.

    Make series stationary

    In order to satisfy the assumption, it is necessary to make the series stationary. This would include checking the stationarity of the series and performing required transformations

    Determine d value

    For making the series stationary, the number of times the difference operation was performed will be taken as the d value

    Create ACF and PACF plots

    This is the most important step in ARIMA implementation. ACF PACF plots are used to determine the input parameters for our ARIMA model

    Determine the p and q values

    Read the values of p and q from the plots in the previous step

    Fit ARIMA model

    Using the processed data and parameter values we calculated from the previous steps, fit the ARIMA model

    Predict values on validation set

    Predict the future values

    Calculate RMSE

To check the performance of the model, compute the RMSE using the predictions and the actual values on the validation set (a minimal R sketch of the last few steps follows this list).
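A bare-bones R sketch of steps 7 to 9, on the built-in AirPassengers series and with an arbitrary (p, d, q) chosen only for illustration, could look like this:

library(forecast)

y <- log(AirPassengers)                  # built-in monthly series, log-transformed
train <- window(y, end = c(1958, 12))    # training portion
valid <- window(y, start = c(1959, 1))   # validation portion

# fit an ARIMA model with the (p, d, q) values read off the ACF/PACF plots
fit <- Arima(train, order = c(1, 1, 1))

# predict values on the validation set
fc <- forecast(fit, h = length(valid))

# calculate RMSE on the validation set
sqrt(mean((fc$mean - valid)^2))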

    What is Auto ARIMA?

    Auto ARIMA (Auto-Regressive Integrated Moving Average) is a statistical algorithm used for time series forecasting. It automatically determines the optimal parameters for an ARIMA model, such as the order of differencing, autoregressive (AR) terms, and moving average (MA) terms. Auto ARIMA searches through different combinations of these parameters to find the best fit for the given time series data. This automated process saves time and effort, making it easier for users to generate accurate forecasts without requiring extensive knowledge of time series analysis.

    Why do we need Auto ARIMA?

    Although ARIMA is a very powerful model for forecasting time series data, the data preparation and parameter tuning processes end up being really time consuming. Before implementing ARIMA, you need to make the series stationary, and determine the values of p and q using the plots we discussed above. Auto ARIMA makes this task really simple for us as it eliminates steps 3 to 6 we saw in the previous section. Below are the steps you should follow for implementing auto ARIMA:

    Load the data: This step will be the same. Load the data into your notebook

    Preprocessing data: The input should be univariate, hence drop the other columns

    Fit Auto ARIMA: Fit the model on the univariate series

    Predict values on validation set: Make predictions on the validation set

    Calculate RMSE: Check the performance of the model using the predicted values against the actual values

As you can see, we completely bypassed the selection of the p and q parameters. What a relief! In the next section, we will implement Auto ARIMA using a toy dataset.

    Implementation in Python and R

    

#building the model
from pyramid.arima import auto_arima
model = auto_arima(train, trace=True, error_action='ignore', suppress_warnings=True)
model.fit(train)

forecast = model.predict(n_periods=len(valid))
forecast = pd.DataFrame(forecast, index=valid.index, columns=['Prediction'])

#plot the predictions for validation set
plt.plot(train, label='Train')
plt.plot(valid, label='Valid')
plt.plot(forecast, label='Prediction')
plt.show()

#calculate rmse
from math import sqrt
from sklearn.metrics import mean_squared_error
rms = sqrt(mean_squared_error(valid, forecast))
print(rms)

output - 76.51355764316357

    Below is the R Code for the same problem:

# loading packages
library(forecast)
library(Metrics)

# reading data
data = read.csv("international-airline-passengers.csv")

# splitting data into train and valid sets
train = data[1:100, ]
valid = data[101:nrow(data), ]

# removing "Month" column
train$Month = NULL

# training model
model = auto.arima(train)

# model summary
summary(model)

# forecasting
forecast = predict(model, 44)

# evaluation
rmse(valid$International.airline.passengers, forecast$pred)

How does Auto ARIMA select the best parameters?

In the above code, we simply used the .fit() command to fit the model without having to select the combination of p, q, and d. But how did the model figure out the best combination of these parameters? Auto ARIMA takes into account the AIC and BIC values generated (as you can see in the code) to determine the best combination of parameters. AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are estimators used to compare models. The lower these values, the better the model.
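To see what “lower is better” means in practice, here is a small illustrative comparison of two candidate orders in R (the orders and the built-in AirPassengers series are chosen arbitrarily for the example):

# two candidate ARIMA fits on the built-in AirPassengers series
fit1 <- arima(log(AirPassengers), order = c(1, 1, 1))
fit2 <- arima(log(AirPassengers), order = c(2, 1, 2))

# lower AIC/BIC indicates a better trade-off between fit and model complexity
AIC(fit1, fit2)
BIC(fit1, fit2)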

    Check out these links if you are interested in the maths behind AIC and BIC.


    I have found auto ARIMA to be the simplest technique for performing time series forecasting. Knowing a shortcut is good but being familiar with the math behind it is also important. In this article I have skimmed through the details of how ARIMA works but do make sure that you go through the links provided in the article. For your easy reference, here are the links again:

    I would suggest practicing what we have learned here on this practice problem: Time Series Practice Problem. You can also take our training course created on the same practice problem, Time series forecasting, to provide you a head start.


    How To Work Ai/Ml And Edge Computing In Iot

    The “edge” is magical. For example, environmental science studies habitat borders where certain plant varieties grow strongly at the edge and not further. Similar phenomena have been observed in astronomy at the edges of the universe.

    Human societies are no exception. A new revolution is underway with high computing powers that move to the edge, a phenomenon becoming increasingly known as Edge Computing.

    Industry 4.0, IoT

IoT is about collecting and analyzing data, generating insights, and automating processes that involve machines, people, things, places, and other objects. IoT is therefore a mixture of sensors, actuators, and connectivity. It also includes storage and cloud computing.

    Industry 4.0, also called the fourth industrial revolution, is heavily dependent on IoT tech. It will reshape automotive, transport, and healthcare as well as commerce. Their merging is a natural phenomenon, as AI and ML are the driving forces behind all 5G/IoT innovations.

    5G supports strong IoT

    One of the most important innovations in 5G is its strong support for IoT. This includes support for low-cost and long-battery-life sensors.

    Data storage and computing will become a ‘fabric continuum’. That fabric touches all industries and a slew of AI, ML and Edge Compute solutions will eventually discover their optimum zones and roles depending on the use case.

    Hyperscalers

Google, Amazon, Microsoft, and other cloud service providers have introduced a new type of IaaS (Infrastructure as a Service) with manageability, transparency, and competitive pricing.


    Connectivity

    Edge Computing

Some other transformations can be seen more clearly in terms of player engagements: telecom operators using hyperscalers (mentioned previously), and the transition of Operations Technology (OT, such as machine control systems) into the IT domain.

IoT applications are becoming more popular, in the hope that consumers and enterprises will use and pay for them. It is evident that, for a variety of reasons, delivering them from the location closest to the customer will be most effective. This zone is also a hub of activity.

    All these players seek to control the space between the enterprise and the consumers.

    This implies a shift from the Capex to an Opex model which will have an impact on enterprise financial models.

    Edge Cloud in IoT World

    This can be seen as a positive change in the value chain. In what is now known as the Edge Cloud industry, the cloud is moving towards the edge.

    An easy explanation of Edge Cloud

The Edge Cloud is simply a combination of smart edge devices, including sensors, nodes, and gateways, with software (algorithms and security stacks, connectivity modules, sensing and actuating parts, and a processor in a full stack) to manage hundreds of sensors per gateway.

    Where does the tech for the cloud come from?

    These technologies are available and are being used in scalable ways. The protocols are standardized to allow data to be offloaded to the cloud for non-critical data processing but to retain real-time processing at Edge Cloud.

    Can AI and ML be stopped?

We expect millions of IoT devices and services to use Edge Clouds to help human life as silicon prices fall and volumes increase. AI/ML moving toward the edge is an unstoppable process, and in fact, it’s a requirement.

    These technologies have clear benefits for the following industries:

    RT processing ensures low latency responses (improves safety, reduces defect rate)

    Data security at the local level

    Data sorting, filtering and pre-processing (eases cloud load).

    Combining licensed and unlicensed spectrum results in more efficient transport systems.

    AI image processing, object detection and audio/video recognition capabilities also have improved dramatically. These capabilities are now available with some Silicon vendors as an additional capability. As deployments increase, we expect them to be more common and better priced.

    Is this Edge Cloud real?

    Are there overhypes about Edge Computing? There are many assumptions that underlie this incredible boom in Edge cloud theory.

It is not clear if Edge Cloud will grow as big as it seems, though there is some evidence that it is a high-growth industry. The ownership of this service is still being determined: telco versus hyperscaler, specialist enterprise system integrator, or Cloud Edge specialist.

    Edge Cloud is not a one-size-fits-all solution. It is not clear what applications will require Edge Cloud capabilities, such as those in manufacturing RPA or healthcare. Companies are still evaluating business cases and pricing models.

    Edge Computing is a solution searching for an application. Are consumers less willing to spend because of the short round trip delay? Is it possible to confuse edge storage with edge computing, and treat all applications the same?

    What about the sunk costs of central data centers? They could be underutilized, or even end up storing non-critical data. If Edge Computing is all about Real-Time, then a 5G slice could be an important part of the solution. What business model will be sustainable between hyperscalers and telcos?

    Conclusion

    These new revelations will be revealed soon. As the space develops, it will be more obvious how AI/ML technology will align in the new world order. This could lead to new positions for service providers and their alliances with the solution providers.

    AI/ML will eventually combine to create a unique landscape that will attract users and consumers over the next decade.

    How To Create Polynomial Regression Model In R?

A polynomial regression model is a model in which the dependent variable does not have a linear relationship with the independent variables; rather, the relationship is of the nth degree. For example, a dependent variable y can depend on the square of an independent variable x. There are two ways to create a polynomial regression in R: the first one is using the polym() function and the second one is using the I() function.

Example1

set.seed(322)
x1 <- rnorm(20, 1, 0.5)
x2 <- rnorm(20, 5, 0.98)
y1 <- rnorm(20, 8, 2.15)

Method1

Model1 <- lm(y1 ~ polym(x1, x2, degree = 2, raw = TRUE))
summary(Model1)

Output

Call:
lm(formula = y1 ~ polym(x1, x2, degree = 2, raw = TRUE))

Residuals:
    Min      1Q  Median      3Q     Max
-4.2038 -0.7669 -0.2619  1.2505  6.8684

Coefficients:
                                          Estimate Std. Error t value Pr(>|t|)
(Intercept)                                11.2809    17.0298   0.662    0.518
polym(x1, x2, degree = 2, raw = TRUE)1.0   -2.9603     6.5583  -0.451    0.659
polym(x1, x2, degree = 2, raw = TRUE)2.0    1.9913     1.9570   1.017    0.326
polym(x1, x2, degree = 2, raw = TRUE)0.1   -1.3573     6.1738  -0.220    0.829
polym(x1, x2, degree = 2, raw = TRUE)1.1   -0.5574     1.2127  -0.460    0.653
polym(x1, x2, degree = 2, raw = TRUE)0.2    0.2383     0.5876   0.406    0.691

Residual standard error: 2.721 on 14 degrees of freedom
Multiple R-squared: 0.205, Adjusted R-squared: -0.0789
F-statistic: 0.7221 on 5 and 14 DF, p-value: 0.6178

Method2

Model_1_M2 <- lm(y1 ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2)
summary(Model_1_M2)

Output

Call:
lm(formula = y1 ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2)

Residuals:
    Min      1Q  Median      3Q     Max
-4.2038 -0.7669 -0.2619  1.2505  6.8684

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  11.2809    17.0298   0.662    0.518
x1           -2.9603     6.5583  -0.451    0.659
x2           -1.3573     6.1738  -0.220    0.829
I(x1^2)       1.9913     1.9570   1.017    0.326
I(x2^2)       0.2383     0.5876   0.406    0.691
x1:x2        -0.5574     1.2127  -0.460    0.653

Residual standard error: 2.721 on 14 degrees of freedom
Multiple R-squared: 0.205, Adjusted R-squared: -0.0789
F-statistic: 0.7221 on 5 and 14 DF, p-value: 0.6178

Example2

    Third degree polynomial regression model −

Model1_3degree <- lm(y1 ~ polym(x1, x2, degree = 3, raw = TRUE))
summary(Model1_3degree)

Output

Call:
lm(formula = y1 ~ polym(x1, x2, degree = 3, raw = TRUE))

Residuals:
    Min      1Q  Median      3Q     Max
-4.4845 -0.8435 -0.2514  0.8108  6.7156

Coefficients:
                                          Estimate Std. Error t value Pr(>|t|)
(Intercept)                                63.0178   115.9156   0.544    0.599
polym(x1, x2, degree = 3, raw = TRUE)1.0   33.3374    83.3353   0.400    0.698
polym(x1, x2, degree = 3, raw = TRUE)2.0  -10.2012    42.4193  -0.240    0.815
polym(x1, x2, degree = 3, raw = TRUE)3.0   -1.4147     6.4873  -0.218    0.832
polym(x1, x2, degree = 3, raw = TRUE)0.1  -42.6725    72.9322  -0.585    0.571
polym(x1, x2, degree = 3, raw = TRUE)1.1   -8.9795    22.7650  -0.394    0.702
polym(x1, x2, degree = 3, raw = TRUE)2.1    2.8923     7.6277   0.379    0.712
polym(x1, x2, degree = 3, raw = TRUE)0.2    9.6863    14.2095   0.682    0.511
polym(x1, x2, degree = 3, raw = TRUE)1.2    0.2289     2.6744   0.086    0.933
polym(x1, x2, degree = 3, raw = TRUE)0.3   -0.6544     0.8341  -0.785    0.451

Residual standard error: 3.055 on 10 degrees of freedom
Multiple R-squared: 0.2841, Adjusted R-squared: -0.3602
F-statistic: 0.441 on 9 and 10 DF, p-value: 0.8833

Example3

Model1_4degree <- lm(y1 ~ polym(x1, x2, degree = 4, raw = TRUE))
summary(Model1_4degree)

Output

Call:
lm(formula = y1 ~ polym(x1, x2, degree = 4, raw = TRUE))

Residuals:
       1        2        3        4        5        6        7        8
 4.59666 -0.41485 -0.62921 -0.62414 -0.49045  2.15614 -0.42311 -0.12903
       9       10       11       12       13       14       15       16
 2.27613  0.60005 -1.94649  1.79153  0.01765  0.03866 -1.40706  0.85596
      17       18       19       20
 0.51553 -3.71274  0.05606 -3.12731

Coefficients:
                                           Estimate Std. Error t value Pr(>|t|)
(Intercept)                               -1114.793   2124.374  -0.525    0.622
polym(x1, x2, degree = 4, raw = TRUE)1.0   -263.858   2131.701  -0.124    0.906
polym(x1, x2, degree = 4, raw = TRUE)2.0   -267.502   1250.139  -0.214    0.839
polym(x1, x2, degree = 4, raw = TRUE)3.0    317.739    433.932   0.732    0.497
polym(x1, x2, degree = 4, raw = TRUE)4.0     -6.803     40.546  -0.168    0.873
polym(x1, x2, degree = 4, raw = TRUE)0.1    967.989   2009.940   0.482    0.650
polym(x1, x2, degree = 4, raw = TRUE)1.1    256.227    869.447   0.295    0.780
polym(x1, x2, degree = 4, raw = TRUE)2.1   -125.888    473.845  -0.266    0.801
polym(x1, x2, degree = 4, raw = TRUE)3.1    -59.450     70.623  -0.842    0.438
polym(x1, x2, degree = 4, raw = TRUE)0.2   -314.183    674.159  -0.466    0.661
polym(x1, x2, degree = 4, raw = TRUE)1.2    -18.033    112.576  -0.160    0.879
polym(x1, x2, degree = 4, raw = TRUE)2.2     34.781     57.232   0.608    0.570
polym(x1, x2, degree = 4, raw = TRUE)0.3     41.854     91.862   0.456    0.668
polym(x1, x2, degree = 4, raw = TRUE)1.3     -4.360      9.895  -0.441    0.678
polym(x1, x2, degree = 4, raw = TRUE)0.4     -1.763      4.178  -0.422    0.690

Residual standard error: 3.64 on 5 degrees of freedom
Multiple R-squared: 0.4917, Adjusted R-squared: -0.9315
F-statistic: 0.3455 on 14 and 5 DF, p-value: 0.9466

    Exploring Recommendation System (With An Implementation Model In R)

    Introduction

We are subconsciously exposed to recommendation systems when we visit websites such as Amazon, Netflix, IMDb, and many more. Apparently, they have become an integral part of online marketing (pushing products online). Let’s learn more about them here.

In this article, I’ve explained the working of a recommendation system using a real-life example, just to show you that this is not limited to online marketing. It is being used by all industries. Also, we’ll learn about its various types, followed by a practical exercise in R. The terms ‘recommendation engine’ and ‘recommendation system’ have been used interchangeably. Don’t get confused!

    Recommendation System in Banks – Example

Today, every industry is making full use of recommendation systems with its own tailored version. Let’s take the banking industry as an example.

    Bank X wants to make use of the transactions information and accordingly customize the offers they provide to their existing credit and debit card users. Here is what the end state of such analysis looks like:

Customer Z walks into a Pizza Hut. He pays the food bill through bank X’s card. Using all the past transaction information, bank X knows that customer Z likes to have an ice cream after his pizza. Using this transaction information at Pizza Hut, the bank has located the exact location of the customer. Next, it finds 5 ice cream stores which are close enough to the customer, 3 of which have ties with bank X.

    This is the interesting part. Now, here are the deals with these ice-cream store:

    Store 5 : Bank profit – $4, Customer expense – $11, Propensity of customer to respond – 20%

Let’s assume the marked price is proportional to the desire of the customer to have that ice cream. Hence, the customer struggles with the trade-off of whether to fulfil his desire at the extra cost or buy the cheaper ice cream. Bank X wants the customer to go to store 3, 4, or 5 (higher profits). It can increase the propensity of the customer to respond if it gives him a reasonable deal. Let’s assume that discounts are always whole numbers. For now, the expected value is:

    Expected value = 20%*{2 + 2 + 5 + 6 + 4 } = $ 19/5 = $3.8

Can we increase the expected value by giving out discounts? Here is how the propensity at stores 3, 4, and 5 varies with discounts:

    Store 3 : Discount of $1 increases propensity by 5%, a discount of $2 by 7.5% and a discount of $3 by 10%

    Store 4 : Discount of $1 increases propensity by 25%, a discount of $2 by 30%, a discount of $3 by 35% and a discount of $4 by 80%

    Store 5 : No change with any discount

Banks cannot give multiple offers at the same time with competing merchants. You need to assume that an increase in one store’s propensity causes an equal total percentage-point decrease spread across all the other propensities. Here is the calculation for the most intuitive case – give a discount of $2 at store 4.

    Expected value = 50%/4 * (2 +  2 + 5 + 4) + 50% * 5 = $ 13/8 + $2.5 = $1.6 + $2.5 = $4.1 

    Think Box : Is there any better option available which can give bank a higher profit? I’d be interested to know!

You see, making recommendations isn’t about extracting data, writing code, and being done with it. Instead, it requires mathematics (apparently), logical thinking, and a flair for using a programming language. Trust me, the third one is the easiest of all. Feeling confident? Let’s proceed.

    What exactly is the work of a recommendation engine?

The previous example would have given you a fair idea. It’s time to make it crystal clear. Let’s understand what all a recommendation engine can do in the context of the previous example (bank X):

It finds out the merchants/items which a customer might be interested in after buying something else.

It estimates the profit and loss when many competing items can be recommended to the customer. Based on the profile of the customer, it then recommends a customer-centric or product-centric offering. For a high-value customer, whose wallet share other banks are also interested in gaining, you might want to bring out the best of your offers.

It can enhance customer engagement by providing offers which can be highly appealing to the customer. The customer might have purchased the item anyway, but with an additional offer the bank can win the interest and attribution of such a customer.

    What are the types of Recommender Engines ?

There are broadly two types of recommender engines, and the choice between them depends on the industry. We have explained each of these algorithms in our previous articles, but here I try to put forward a practical explanation to help you understand them easily.

    I’ve explained these algorithms in context of the industry they are used in and what makes them apt for these industries.

    Context based algorithms:

As the name suggests, these algorithms are strongly based on deriving the context of the item. Once you have gathered this context-level information on items, you try to find look-alike items and recommend them. For instance, on YouTube you can find the genre, language, and cast of a video. Based on this information, we can find look-alikes (related videos). Once we have the look-alikes, we simply recommend these videos to a customer who originally saw only the first video. Such algorithms are very common in online video channels, online song stores, etc. The plausible reason is that such context-level information is far easier to get when the product/item can be described with a few dimensions.

    Collaborative filtering algorithms:

This is one of the most commonly used algorithms because it is not dependent on any additional information. All you need is the transaction-level information of the industry. For instance, e-commerce players like Amazon and banks like American Express often use these algorithms to make merchant/product recommendations. Further, there are several types of collaborative filtering algorithms:

User-User Collaborative filtering: Here we find look-alike customers for every customer and offer products which the first customer’s look-alikes have chosen in the past. This algorithm is very effective but takes a lot of time and resources, since it requires computing information for every customer pair. Therefore, for platforms with a big user base, this algorithm is hard to implement without a very strong parallelizable system.

Item-Item Collaborative filtering: It is quite similar to the previous algorithm, but instead of finding customer look-alikes, we try to find item look-alikes. Once we have the item look-alike matrix, we can easily recommend similar items to a customer who has purchased any item from the store. This algorithm is far less resource-consuming than user-user collaborative filtering. Hence, for a new customer, the algorithm takes far less time than user-user collaborative filtering, as we don’t need all similarity scores between customers. And with a fixed number of products, the product-product look-alike matrix is fixed over time.

Other simpler algorithms: There are other approaches, like market basket analysis, which generally have lower predictive power than the algorithms above.

    How do we decide the performance metric of such engines?

    Good question! We must know that performance metrics are strongly driven by business objectives. Generally, there are three possible metrics which you might want to optimise:

    Based on dollar value:

    If your overall aim is to increase profit/revenue metric using recommendation engine, then your evaluation metric should be incremental revenue/profit/sale with each recommended rank. Each rank should have an unbroken order and the average revenue/profit/sale should be over the expected benefit over the offer cost.

    Based on propensity to respond:

    If the aim is just to activate customers, or make customers explore new items/merchant, this metric might be very helpful. Here you need to track the response rate of the customer with each rank.

    Based on number of transactions:

    Some times you are interested in activity of the customer. For higher activity, customer needs to do higher number of transactions. So we track number of transaction for the recommended ranks.

    Other metrics:

There are other metrics which you might be interested in, like satisfaction rate or number of calls to the call centre, etc. These metrics are rarely used as they generally won’t give you results for the entire portfolio but only for a sample.

    Building an Item-Item collaborative filtering Recommendation Engine using R

    Let’s get some hands-on experience building a recommendation engine. Here, I’ve demonstrated building an item-item collaborative filter recommendation engine. The data contains just 2 columns namely individual_merchant and individual_customer. The data is available to download – Download Now.

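A minimal base R sketch of an item-item (merchant-merchant) collaborative filter on such two-column data might look like the following; the file name is an assumption, and plain cosine similarity is used here purely for illustration.

# read the two-column transactions data (file name assumed)
txns <- read.csv("transactions.csv")

# binary customer x merchant matrix: 1 if the customer transacted with the merchant
mat <- unclass(table(txns$individual_customer, txns$individual_merchant))
mat <- (mat > 0) * 1

# merchant-merchant cosine similarity (the item-item look-alike matrix)
item_norms <- sqrt(colSums(mat^2))
similarity <- crossprod(mat) / outer(item_norms, item_norms)
diag(similarity) <- 0

# score merchants for one customer by similarity to merchants already visited
cust <- mat[1, ]
scores <- as.vector(similarity %*% cust)
names(scores) <- colnames(mat)

# recommend the top 5 merchants the customer has not visited yet
head(sort(scores[cust == 0], decreasing = TRUE), 5)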

    End Notes

Recommendation engines have become extremely common because they solve a business case commonly found across industries. Substitutes for these recommendation engines are very difficult to build because they predict for multiple items/merchants at the same time. Classification algorithms struggle to take in so many classes as the output variable.

    In this article, we learnt about the use of recommendation systems in Banks. We also looked at implementing a recommendation engine in R. No doubt, they are being used across all sectors of industry, with a common aim to enhance customer experience.

    You can test your skills and knowledge. Check out Live Competitions and compete with best Data Scientists from all over the world.

