Build High Performance Time Series Models Using Auto ARIMA in Python and R


Introduction

Picture this – You’ve been tasked with forecasting the price of the next iPhone and have been provided with historical data. This includes features like quarterly sales, month-on-month expenditure, and a whole host of things that come with Apple’s balance sheet. As a data scientist, which kind of problem would you classify this as? Time series modeling, of course.

From predicting the sales of a product to estimating the electricity usage of households, time series forecasting is one of the core skills any data scientist is expected to know, if not master. There are a plethora of different techniques out there which you can use, and we will be covering one of the most effective ones, called Auto ARIMA, in this article.

We will first understand the concept of ARIMA which will lead us to our main topic – Auto ARIMA. To solidify our concepts, we will take up a dataset and implement it in both Python and R.

If you are familiar with time series and its techniques (like moving average, exponential smoothing, and ARIMA), you can skip directly to section 4. For beginners, start from the below section, which is a brief introduction to time series and various forecasting techniques.

What is a time series?

Before we learn about the techniques to work on time series data, we must first understand what a time series actually is and how it is different from any other kind of data. Here is the formal definition of a time series – it is a series of data points measured at consistent time intervals. This simply means that particular values are recorded at a constant interval, which may be hourly, daily, weekly, every 10 days, and so on. What makes a time series different is that each data point in the series is dependent on the previous data points. Let us understand the difference more clearly by taking a couple of examples.

Example 1:

Suppose you have a dataset of people who have taken a loan from a particular company (as shown in the table below). Do you think each row will be related to the previous rows? Certainly not! The loan taken by a person will be based on their financial condition and needs (there could be other factors such as family size, but for simplicity we are considering only income and loan type). Also, the data was not collected at any specific time interval. It depends on when the company received a request for the loan.

Example 2:

Let’s take another example. Suppose you have a dataset that contains the level of CO2 in the air per day (screenshot below). Will you be able to predict the approximate amount of CO2 for the next day by looking at the values from the past few days? Well, of course. If you observe, the data has been recorded on a daily basis, that is, the time interval is constant (24 hours).

You must have got an intuition about this by now – the first case is a simple regression problem and the second is a time series problem. Although the time series puzzle here can also be solved using linear regression, that isn’t really the best approach, as it neglects the relation of the values with the relevant past values. Let’s now look at some of the common techniques used for solving time series problems.

Methods for time series forecasting

There are a number of methods for time series forecasting and we will briefly cover them in this section. The detailed explanation and python codes for all the below mentioned techniques can be found in this article: 7 techniques for time series forecasting (with python codes).

Naive Approach: In this forecasting technique, the value of the new data point is predicted to be equal to the previous data point. The result would be a flat line, since all new values take the previous values.

Simple Average: The next value is taken as the average of all the previous values. The predictions here are better than the ‘Naive Approach’ as it doesn’t result in a flat line but here, all the past values are taken into consideration which might not always be useful. For instance, when asked to predict today’s temperature, you would consider the last 7 days’ temperature rather than the temperature a month ago.

Moving Average: This is an improvement over the previous technique. Instead of taking the average of all the previous points, the average of ‘n’ previous points is taken to be the predicted value.

Weighted Moving Average: A weighted moving average is a moving average where the past ‘n’ values are given different weights.

Simple Exponential Smoothing: In this technique, larger weights are assigned to more recent observations than to observations from the distant past (a short code sketch of these simpler baselines follows this list).

Holt’s Linear Trend Model: This method takes into account the trend of the dataset. By trend, we mean the increasing or decreasing nature of the series. Suppose the number of bookings in a hotel increases every year; then we can say that the number of bookings shows an increasing trend. The forecast function in this method is a function of level and trend.

Holt Winters Method: This algorithm takes into account both the trend and the seasonality of the series. For instance – the number of bookings in a hotel is high on weekends & low on weekdays, and increases every year; there exists a weekly seasonality and an increasing trend.

ARIMA: ARIMA is a very popular technique for time series modeling. It describes the correlation between data points and takes into account the differencing of the values. An improvement over ARIMA is SARIMA (or seasonal ARIMA). We will look at ARIMA in a bit more detail in the following section.
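Before moving on to ARIMA, here is a minimal pandas sketch of the simpler baselines described above. The series values are made up purely for illustration; in practice you would use your own univariate series and tune the window size and smoothing factor.

import pandas as pd

# A short, made-up series standing in for any univariate time series
y = pd.Series([112, 118, 132, 129, 121, 135, 148, 148, 136, 119])

naive_forecast = y.iloc[-1]                           # Naive: repeat the last observed value
simple_average = y.mean()                             # Simple average of all past values
moving_average = y.rolling(window=3).mean().iloc[-1]  # Moving average of the last n = 3 points
exp_smoothing = y.ewm(alpha=0.6).mean().iloc[-1]      # Exponential smoothing: recent points weighted more

print(naive_forecast, simple_average, moving_average, exp_smoothing)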

Introduction to ARIMA

In this section we will do a quick introduction to ARIMA, which will be helpful in understanding Auto ARIMA. A detailed explanation of ARIMA, its parameters (p, d, q), the ACF and PACF plots, and its implementation is included in this article: Complete tutorial to Time Series.

ARIMA is a very popular statistical method for time series forecasting. ARIMA stands for Auto-Regressive Integrated Moving Average. ARIMA models work on the following assumptions –

The data series is stationary, which means that the mean and variance should not vary with time. A series can be made stationary by using log transformation or differencing the series.

The data provided as input must be a univariate series, since ARIMA uses the past values to predict the future values.

ARIMA has three components – AR (autoregressive term), I (differencing term) and MA (moving average term). Let us understand each of these components –

The AR term refers to the past values used for forecasting the next value. The AR term is defined by the parameter ‘p’ in ARIMA. The value of ‘p’ is determined using the PACF plot.

The MA term defines the number of past forecast errors used to predict the future values. The parameter ‘q’ in ARIMA represents the MA term. The ACF plot is used to identify the correct ‘q’ value.

The order of differencing specifies the number of times the differencing operation is performed on the series to make it stationary. Tests like ADF and KPSS can be used to determine whether the series is stationary and help in identifying the ‘d’ value (a brief statsmodels sketch of these checks follows this list).
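As a quick illustration, here is a minimal statsmodels sketch of these checks. The series below is synthetic and only demonstrates the workflow; substitute your own univariate series for y.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Synthetic trend + seasonality series standing in for real data
y = pd.Series(np.arange(60, dtype=float) + 10 * np.sin(np.arange(60) * 2 * np.pi / 12))

print('ADF p-value:', adfuller(y)[1])   # p > 0.05 suggests the series is not stationary

y_diff = y.diff().dropna()              # first-order differencing, i.e. d = 1

plot_acf(y_diff)                        # the lag where the ACF cuts off suggests 'q'
plot_pacf(y_diff)                       # the lag where the PACF cuts off suggests 'p'
plt.show()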

Steps for ARIMA implementation

The general steps to implement an ARIMA model are –

Load the data

The first step for model building is of course to load the dataset

Preprocessing

Depending on the dataset, the steps of preprocessing will be defined. This will include creating timestamps, converting the dtype of date/time column, making the series univariate, etc.

Make series stationary

In order to satisfy the assumption, it is necessary to make the series stationary. This would include checking the stationarity of the series and performing required transformations

Determine d value

For making the series stationary, the number of times the difference operation was performed will be taken as the d value

Create ACF and PACF plots

This is the most important step in ARIMA implementation. ACF PACF plots are used to determine the input parameters for our ARIMA model

Determine the p and q values

Read the values of p and q from the plots in the previous step

Fit ARIMA model

Using the processed data and parameter values we calculated from the previous steps, fit the ARIMA model

Predict values on validation set

Predict the future values

Calculate RMSE

To check the performance of the model, compute the RMSE using the predictions and the actual values on the validation set (a compact statsmodels sketch of these steps is shown below).
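Put together, the steps above come down to only a few lines with statsmodels. This is a bare-bones sketch: the numbers in the series and the order (p, d, q) = (2, 1, 2) are placeholders for values you would derive from your own data, plots, and stationarity tests.

from math import sqrt

import pandas as pd
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

# Placeholder univariate series, split into train and validation sets
series = pd.Series([112, 118, 132, 129, 121, 135, 148, 148, 136, 119,
                    104, 118, 115, 126, 141, 135, 125, 149, 170, 170])
train, valid = series[:15], series[15:]

# Fit ARIMA with the (p, d, q) order read off the ACF/PACF plots and stationarity tests
model = ARIMA(train, order=(2, 1, 2)).fit()

# Forecast over the validation window and compute the RMSE
forecast = model.forecast(steps=len(valid))
rms = sqrt(mean_squared_error(valid, forecast))
print(rms)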

What is Auto ARIMA?

Auto ARIMA (Auto-Regressive Integrated Moving Average) is a statistical algorithm used for time series forecasting. It automatically determines the optimal parameters for an ARIMA model, such as the order of differencing, autoregressive (AR) terms, and moving average (MA) terms. Auto ARIMA searches through different combinations of these parameters to find the best fit for the given time series data. This automated process saves time and effort, making it easier for users to generate accurate forecasts without requiring extensive knowledge of time series analysis.

Why do we need Auto ARIMA?

Although ARIMA is a very powerful model for forecasting time series data, the data preparation and parameter tuning processes end up being really time consuming. Before implementing ARIMA, you need to make the series stationary, and determine the values of p and q using the plots we discussed above. Auto ARIMA makes this task really simple for us as it eliminates steps 3 to 6 we saw in the previous section. Below are the steps you should follow for implementing auto ARIMA:

Load the data: This step will be the same. Load the data into your notebook

Preprocessing data: The input should be univariate, hence drop the other columns

Fit Auto ARIMA: Fit the model on the univariate series

Predict values on validation set: Make predictions on the validation set

Calculate RMSE: Check the performance of the model using the predicted values against the actual values

As you can see, we completely bypassed the selection of the p, d, and q parameters. What a relief! In the next section, we will implement Auto ARIMA using a toy dataset.

Implementation in Python and R
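The loading step isn’t reproduced in the Python snippet below, so here is a minimal sketch of it. It assumes the same international airline passengers CSV that the R code further down reads, with a ‘Month’ column and a monthly passenger count, and uses the same split (first 100 rows for training, the rest for validation).

import pandas as pd
import matplotlib.pyplot as plt

# Read the monthly data and use the 'Month' column as a datetime index
data = pd.read_csv('international-airline-passengers.csv')
data['Month'] = pd.to_datetime(data['Month'])
data = data.set_index('Month')

# Same split as in the R example below: first 100 rows train, remaining rows validation
train = data[:100]
valid = data[100:]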



#building the model
from pyramid.arima import auto_arima
model = auto_arima(train, trace=True, error_action='ignore', suppress_warnings=True)
model.fit(train)

forecast = model.predict(n_periods=len(valid))
forecast = pd.DataFrame(forecast, index=valid.index, columns=['Prediction'])

#plot the predictions for validation set
plt.plot(train, label='Train')
plt.plot(valid, label='Valid')
plt.plot(forecast, label='Prediction')
plt.show()

#calculate rmse
from math import sqrt
from sklearn.metrics import mean_squared_error
rms = sqrt(mean_squared_error(valid, forecast))
print(rms)

output -
76.51355764316357
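Note that the pyramid-arima package used above has since been renamed pmdarima; with recent versions the import becomes:

from pmdarima import auto_arima

The rest of the code works the same way.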

Below is the R Code for the same problem:

# loading packages
library(forecast)
library(Metrics)

# reading data
data = read.csv("international-airline-passengers.csv")

# splitting data into train and valid sets
train = data[1:100,]
valid = data[101:nrow(data),]

# removing "Month" column
train$Month = NULL

# training model
model = auto.arima(train)

# model summary
summary(model)

# forecasting
forecast = predict(model, 44)

# evaluation
rmse(valid$International.airline.passengers, forecast$pred)

How does Auto ARIMA select the best parameters?

In the above code, we simply used the .fit() command to fit the model without having to select the combination of p, d, and q. But how did the model figure out the best combination of these parameters? Auto ARIMA takes into account the AIC and BIC values generated (as you can see in the trace output) to determine the best combination of parameters. AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are estimators used to compare models; the lower these values, the better the model.
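For reference, both criteria are computed from the maximised likelihood L of the fitted model, the number of estimated parameters k, and the number of observations n (these are the standard definitions, not anything specific to Auto ARIMA):

AIC = 2k - 2 ln(L)
BIC = k ln(n) - 2 ln(L)

Both reward goodness of fit while penalising extra parameters; BIC penalises model complexity more heavily as the sample size grows, so the two criteria can occasionally favour different (p, d, q) combinations.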

Check out these links if you are interested in the maths behind AIC and BIC.

End Notes

I have found auto ARIMA to be the simplest technique for performing time series forecasting. Knowing a shortcut is good but being familiar with the math behind it is also important. In this article I have skimmed through the details of how ARIMA works but do make sure that you go through the links provided in the article. For your easy reference, here are the links again:

I would suggest practicing what we have learned here on this practice problem: Time Series Practice Problem. You can also take our training course created on the same practice problem, Time series forecasting, to provide you a head start.



AI with Python – Analyzing Time Series Data

Predicting the next value in a given input sequence is another important concept in machine learning. This chapter gives you a detailed explanation of analyzing time series data.

Introduction

Time series data means data that comes in a series of particular time intervals. If we want to build sequence prediction in machine learning, then we have to deal with sequential data and time; time series data is an abstraction of sequential data. Ordering of the data is an important feature of sequential data.

Basic Concept of Sequence Analysis or Time Series Analysis

Sequence analysis or time series analysis is the task of predicting the next element in a given input sequence based on the previously observed values. The prediction can be of anything that may come next: a symbol, a number, the next day's weather, the next term in speech, etc. Sequence analysis can be very handy in applications such as stock market analysis, weather forecasting, and product recommendations.

Example

Consider the following example to understand sequence prediction. Here A,B,C,D are the given values and you have to predict the value E using a Sequence Prediction Model.

Installing Useful Packages

For time series data analysis using Python, we need to install the following packages −

Pandas

Pandas is an open-source, BSD-licensed library which provides high-performance, easy-to-use data structures and data analysis tools for Python. You can install Pandas with the help of the following command −

pip install pandas

If you are using Anaconda and want to install by using the conda package manager, then you can use the following command −

conda install -c anaconda pandas

hmmlearn

It is an open source BSD-licensed library which consists of simple algorithms and models to learn Hidden Markov Models(HMM) in Python. You can install it with the help of the following command −

pip install hmmlearn

If you are using Anaconda and want to install by using the conda package manager, then you can use the following command −

conda install -c omnia hmmlearn

PyStruct

It is a structured learning and prediction library. Learning algorithms implemented in PyStruct have names such as conditional random fields(CRF), Maximum-Margin Markov Random Networks (M3N) or structural support vector machines. You can install it with the help of the following command −

pip install pystruct

CVXOPT

It is used for convex optimization based on Python programming language. It is also a free software package. You can install it with the help of following command −

pip install cvxopt

If you are using Anaconda and want to install by using the conda package manager, then you can use the following command −

conda install -c anaconda cvxopt

Pandas: Handling, Slicing and Extracting Statistics from Time Series Data

Pandas is a very useful tool if you have to work with time series data. With the help of Pandas, you can perform the following −

Create a range of dates by using the pd.date_range() function

Index the data with dates by using the pd.Series class

Perform re-sampling by using the resample() method

Change the frequency

Example

The following example shows you handling and slicing the time series data by using Pandas. Note that here we are using the Monthly Arctic Oscillation data, which can be downloaded from monthly.ao.index.b50.current.ascii and can be converted to text format for our use.

Handling time series data

For handling time series data, you will have to perform the following steps −

The first step involves importing the following packages −

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Next, define a function which will read the data from the input file, as shown in the code given below −

def read_data(input_file):
    input_data = np.loadtxt(input_file, delimiter = None)

Now, convert this data to a time series. For this, create the range of dates of our time series. In this example, we use one month as the frequency of the data. Our file has data starting from January 1950.

    dates = pd.date_range('1950-01', periods = input_data.shape[0], freq = 'M')

In this step, we create the time series data with the help of Pandas Series, as shown below −

    output = pd.Series(input_data[:, index], index = dates)
    return output

if __name__=='__main__':

Enter the path of the input file as shown here −

    input_file = "/Users/admin/AO.txt"

Now, convert the column to timeseries format, as shown here −

    timeseries = read_data(input_file)

Finally, plot and visualize the data, using the commands shown −

    plt.figure()
    timeseries.plot()
    plt.show()

You will observe the plots as shown in the following images −

Slicing time series data

Slicing involves retrieving only some part of the time series data. As a part of the example, we are slicing the data only from 1980 to 1990. Observe the following code that performs this task −

timeseries['1980':'1990'].plot()
plt.show()

When you run the code for slicing the time series data, you can observe the following graph as shown in the image here −

Extracting Statistic from Time Series Data

You will have to extract some statistics from given data in cases where you need to draw important conclusions. Mean, variance, correlation, maximum value, and minimum value are some such statistics. You can use the following code if you want to extract such statistics from given time series data −

Mean

You can use the mean() function, for finding the mean, as shown here −

timeseries.mean()

Then the output that you will observe for the example discussed is −

-0.11143128165238671

Maximum

You can use the max() function, for finding maximum, as shown here −

timeseries.max()

Then the output that you will observe for the example discussed is −

3.4952999999999999

Minimum

You can use the min() function, for finding minimum, as shown here −

timeseries.min()

Then the output that you will observe for the example discussed is −

-4.2656999999999998

Getting everything at once

If you want to calculate all statistics at a time, you can use the describe() function as shown here −

timeseries.describe()

Then the output that you will observe for the example discussed is −

count   817.000000
mean     -0.111431
std       1.003151
min      -4.265700
25%      -0.649430
50%      -0.042744
75%       0.475720
max       3.495300
dtype: float64

Re-sampling

You can resample the data to a different time frequency. The two parameters for performing re-sampling are −

Time period

Method

Re-sampling with mean()

You can use the following code to resample the data with the mean() method, which is the default method −

timeseries_mm = timeseries.resample("A").mean() timeseries_mm.plot(style = 'g--') plt.show()

Then, you can observe the following graph as the output of resampling using mean() −

Re-sampling with median()

You can use the following code to resample the data using the median() method −

timeseries_mm = timeseries.resample("A").median() timeseries_mm.plot() plt.show()

Then, you can observe the following graph as the output of re-sampling with median() −

Rolling Mean

You can use the following code to calculate the rolling (moving) mean −

timeseries.rolling(window = 12, center = False).mean().plot(style = '-g')
plt.show()

Then, you can observe the following graph as the output of the rolling (moving) mean −

Analyzing Sequential Data by Hidden Markov Model (HMM)

HMM is a statistical model which is widely used for data with a sequential nature, such as time series stock market analysis, health checkups, and speech recognition. This section deals in detail with analyzing sequential data using the Hidden Markov Model (HMM).

Hidden Markov Model (HMM)

HMM is a stochastic model built upon the concept of a Markov chain, based on the assumption that the probability of future states depends only on the current process state rather than on any state that preceded it. For example, when tossing a coin, we cannot say that the result of the fifth toss will be a head. This is because a coin does not have any memory and the next result does not depend on the previous result.

Mathematically, HMM consists of the following variables −

States (S)

It is a set of hidden or latent states present in a HMM. It is denoted by S.

Output symbols (O)

It is a set of possible output symbols present in a HMM. It is denoted by O.

State Transition Probability Matrix (A)

It is the probability of making transition from one state to each of the other states. It is denoted by A.

Observation Emission Probability Matrix (B)

It is the probability of emitting/observing a symbol at a particular state. It is denoted by B.

Prior Probability Matrix (Π)

It is the probability of starting at a particular state from various states of the system. It is denoted by Π.

Hence, a HMM may be defined as 𝝀 = (S,O,A,B,𝝅),

where,

S = {s1,s2,…,sN} is a set of N possible states,

O = {o1,o2,…,oM} is a set of M possible observation symbols,

A is an N×N state Transition Probability Matrix (TPM),

B is an N×M observation or Emission Probability Matrix (EPM),

π is an N-dimensional initial state probability distribution vector (a small NumPy illustration of these quantities follows).
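To make these quantities concrete, here is a small NumPy sketch of a two-state HMM and the forward algorithm used to score an observation sequence. All probability values are invented purely for illustration.

import numpy as np

# Hypothetical HMM with N = 2 hidden states and M = 2 output symbols
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])       # N x N state transition probability matrix
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # N x M observation emission probability matrix
pi = np.array([0.6, 0.4])        # initial (prior) state probability vector

def forward(obs):
    """Return P(observation sequence | HMM) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]            # initialisation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction: propagate through A, re-weight by emissions
    return alpha.sum()                   # termination: sum over the final hidden states

print(forward([0, 1, 0]))                # likelihood of observing symbols 0, 1, 0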

Example: Analysis of Stock Market data

In this example, we are going to analyze the data of stock market, step by step, to get an idea about how the HMM works with sequential or time series data. Please note that we are implementing this example in Python.

Import the necessary packages as shown below −

import datetime
import warnings

Now, use the stock market data from the matplotlib.finance package, as shown here (note that matplotlib.finance has been removed from recent matplotlib versions; this example reflects the original tutorial) −

import numpy as np
from matplotlib import cm, pyplot as plt
from matplotlib.dates import YearLocator, MonthLocator
try:
    from matplotlib.finance import quotes_historical_yahoo_och1
except ImportError:
    from matplotlib.finance import (
        quotes_historical_yahoo as quotes_historical_yahoo_och1)
from hmmlearn.hmm import GaussianHMM

Load the data from a start date and end date, i.e., between two specific dates as shown here −

start_date = datetime.date(1995, 10, 10)
end_date = datetime.date(2024, 4, 25)
quotes = quotes_historical_yahoo_och1('INTC', start_date, end_date)

In this step, we will extract the closing quotes every day. For this, use the following command −

closing_quotes = np.array([quote[2] for quote in quotes])

Now, we will extract the volume of shares traded every day. For this, use the following command −

volumes = np.array([quote[5] for quote in quotes])[1:]

Here, take the percentage difference of closing stock prices, using the code shown below −

diff_percentages = 100.0 * np.diff(closing_quotes) / closing_quotes[:-1]
dates = np.array([quote[0] for quote in quotes], dtype = np.int)[1:]
training_data = np.column_stack([diff_percentages, volumes])

In this step, create and train the Gaussian HMM. For this, use the following code −

hmm = GaussianHMM(n_components = 7, covariance_type = 'diag', n_iter = 1000)
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    hmm.fit(training_data)

Now, generate data using the HMM model, using the commands shown −

num_samples = 300
samples, _ = hmm.sample(num_samples)

Finally, in this step, we plot and visualize the difference percentage and volume of shares traded as output in the form of graph.

Use the following code to plot and visualize the difference percentages −

plt.figure()
plt.title('Difference percentages')
plt.plot(np.arange(num_samples), samples[:, 0], c = 'black')

Use the following code to plot and visualize the volume of shares traded −

plt.figure()
plt.title('Volume of shares')
plt.plot(np.arange(num_samples), samples[:, 1], c = 'black')
plt.ylim(ymin = 0)
plt.show()


ML Interpretability Using LIME in R

Overview

Merely building the model is not enough if stakeholders are not able to interpret the outputs of your model

In this article, understand how to interpret your model using LIME in R

Introduction

As a beginner, I was under the misconception that spending hours preprocessing the data is the most worthwhile thing in data science. Now I realize that what is even more rewarding is being able to explain your predictions and model to a layman who does not understand much about machine learning or other jargon of the field.

Consider this scenario – your problem statement deals with predicting if a patient has cancer or not. Painstakingly, you obtain and clean the data, build a model on it, and after much effort, experimentation, and hyperparameter tuning, you arrive at an accuracy of over 90%. That’s great! You walk up to a doctor and tell them that you can predict with 90% certainty whether a patient has cancer or not.

However, one question the doctor asks that leaves you stumped – “How can I and the patient trust your prediction when each patient is different from the other and multiple parameters can decide between a malignant and a benign tumor?”

This is where model interpretability comes in – nowadays, there are multiple tools to help you explain your model and model predictions efficiently without getting into the nitty-gritty of the model’s cogs and wheels. These tools include SHAP, Eli5, LIME, etc. Today, we will be dealing with LIME.

In this article, I am going to explain LIME and how it makes interpreting your model easy in R.

What is LIME?

LIME stands for Local Interpretable Model-Agnostic Explanations. First introduced in 2016, the paper which proposed the LIME technique was aptly named “Why Should I Trust You?”: Explaining the Predictions of Any Classifier by its authors, Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin.

Built on this basic but crucial tenet of trust, the idea behind LIME is to answer the ‘why’ of each prediction and of the entire model. The creators of LIME outline four basic criteria for explanations that must be satisfied:

The explanations for the predictions should be understandable, i.e. interpretable by the target demographic.

We should be able to explain individual predictions. The authors call this local fidelity

The method of explanation should be applicable to all models. This is termed by the authors as the explanation being model-agnostic

Along with the individual predictions, the model should be explainable in its entirety, i.e. global perspective should be considered

How does LIME work?

Expanding more on how LIME works, the main assumption behind it is that every model works like a simple linear model at the local scale, i.e. at individual row-level data. The paper and the authors do not set out to prove this, but we can go by the intuition that at an individual level, we can fit this simple model on the row and that its prediction will be very close to our complex model’s prediction for that row. Interesting isn’t it?

Further, LIME extends this phenomenon by fitting such simple models around small changes in this individual row and then extracting the important features by comparing the simple model and the complex model’s predictions for that row.

LIME works both on tabular/structured data and on text data as well.

You can read more on how LIME works using Python here, we will be covering how it works using R.

So fire up your Notebooks or R studio, and let us get started!

Using LIME in R

Step 1: The first step is to install LIME and all the other libraries which we will need for this project. If you have already installed them, you can skip this and start with Step 2

install.packages('lime') install.packages('MASS') install.packages("randomForest") install.packages('caret') install.packages('e1071')

Step 2: Once you have installed these libraries, we will first import them:

library(lime) library(MASS) library(randomForest) library(caret) library(e1071)

Since we took up the example of explaining the predictions of whether a patient has cancer or not, we will be using the biopsy dataset. This dataset contains information on 699 patients and their biopsies of breast cancer tumors.

Step 3: We will import this data and also have a look at the first few rows:

data(biopsy)

Step 4: Data Exploration

4.1) We will first remove the ID column since it is just an identifier and of no use to us

biopsy$ID <- NULL

4.2) Let us rename the rest of the columns so that while visualizing the explanations we have a clearer idea of the feature names as we understand the predictions using LIME.

names(biopsy) <- c('clump thickness', 'uniformity cell size', 'uniformity cell shape', 'marginal adhesion','single epithelial cell size', 'bare nuclei', 'bland chromatin', 'normal nucleoli', 'mitoses','class')

4.3) Next, we will check if there are any missing values. If so, we will first have to deal with them before proceeding any further.

sum(is.na(biopsy))

4.4) Now, here we have 2 options. We can either impute these values, or we can use the na.omit() function to drop the rows containing missing values. We will be using the latter option since cleaning the data is beyond the scope of the article.

biopsy <- na.omit(biopsy) sum(is.na(biopsy))

Finally, let us confirm our dataframe by looking at the first few rows:

head(biopsy, 5)

Step 5: We will divide the dataset into train and test. We will check the dimensions of the data

## 75% of the sample size
smp_size <- floor(0.75 * nrow(biopsy))

## set the seed to make your partition reproducible - similar to random state in Python
set.seed(123)
train_ind <- sample(seq_len(nrow(biopsy)), size = smp_size)

train_biopsy <- biopsy[train_ind, ]
test_biopsy <- biopsy[-train_ind, ]

Let us check the dimensions:

cat(dim(train_biopsy), dim(test_biopsy))

Thus, there are 512 rows in the train set and 171 rows in the test set.

Step 6: We will build a random forest model using the caret library. We will also not be performing any hyperparameter tuning, just a 10-fold CV repeated 5 times and a basic Random Forest model. So sit back, while we train and fit the model on our training set.

I encourage you to experiment with these parameters using other models as well

model_rf <- caret::train(class ~ ., data = train_biopsy,
                         method = "rf",  # random forest
                         trControl = trainControl(method = "repeatedcv",
                                                  number = 10,
                                                  repeats = 5,
                                                  verboseIter = FALSE))

Let us view the summary of our model

model_rf

Step 7: We will now apply the predict function of this model on our test set and build a confusion matrix

biopsy_rf_pred <- predict(model_rf, test_biopsy)
confusionMatrix(biopsy_rf_pred, as.factor(test_biopsy$class))

Step 8: Now that we have our model, we will use LIME to create an explainer object. This object is associated with the rest of the LIME functions we will be using for viewing the explanations as well.

Just like we train the model and fit it on the data, we use the lime() function to train this explainer, and then new predictions are made using the explain() function

explainer <- lime(train_biopsy, model_rf)

Let us explain 5 new observations from the test set using only 5 of the features. Feel free to experiment with the n_features parameter. You can also pass

the entire test set, or

a single row of the test set

explanation <- explain(test_biopsy[15:20, ], explainer, n_labels = 1, n_features = 5)

The other parameters you can experiment with are:

n_permutations: The number of permutations to use for each explanation.

feature_select: The algorithm to use for selecting features. We can choose among

“auto”: If n_features <= 6 use "forward_selection" else use "highest_weights"

“none”: Ignore n_features and use all features.

“forward_selection”: Add one feature at a time until n_features is reached, based on the quality of a ridge regression model.

“highest_weights”: Fit a ridge regression and select the n_features with the highest absolute weight.

“lasso_path”: Fit a lasso model and choose the n_features whose lars path converge to zero at the latest.

“tree”: Fit a tree to select n_features (which needs to be a power of 2). It requires the last version of XGBoost.

dist_fun: The distance function to use. We will use this to compare our local model prediction for a row and the global model(random forest) prediction for that row. The default is Gower’s distance but we can also use euclidean, manhattan, etc.

kernel_width: The distances of the predictions of individual permutations with the global predictions are calculated from above, and converted to a similarity score.

Step 9: Let us visualize this explanation for a better understanding:

How to interpret and explain this result?

Blue/Red color: Features that have positive correlations with the target are shown in blue, negatively correlated features are shown in red.

Uniformity cell shape <=1.5: lower values positively correlate with a benign tumor.

Bare nuclei <= 7: lower bare nuclei values negatively correlate with a malignant tumor.

Cases 65, 67, and 70 are similar, while the benign case 64 has unusual parameters

The uniformity of cell shape and the single epithelial cell size are unusual in this case.

Despite these deviating values, the tumor is still benign, indicating that the other parameter values of this case compensate for this abnormality.

Let us visualize a single case as well with all the features:

explanation <- explain(test_biopsy[93, ], explainer, n_labels = 1, n_features = 10) plot_features(explanation)

On the contrary, uniformity of cell size <= 5.0 and marginal adhesion <= 4: low values of these 2 parameters correlate negatively with a malignant tumor. Thus, the lower these values are, the lower the chances of the tumor being malignant.

Thus, from the above, we can conclude that higher values of the parameters would indicate that a tumor has more chances of being malignant.

We can confirm the above explanations by looking at the actual data in this row:

End Notes

Concluding, we explored LIME and how to use it to interpret the individual results of our model. These explanations make for better storytelling and help us to explain why certain predictions were made by the model to a person who might have domain expertise, but no technical know-how of model building. Moreover, using it is pretty much effortless and requires only a few lines of code after we have our final model.

However, this is not to say that LIME has no drawbacks. The LIME CRAN package we have used is not a direct replication of the original Python implementation that was presented with the paper and thus does not support image data like its Python counterpart. Another drawback could be that the local model might not always be accurate.

I look forward to exploring more on LIME using different datasets and models, as well, exploring other techniques in R. Which tools have you used to interpret your model in R? Do share how you used them and your experiences with LIME below!


How to Find Common Keys in a List and Dictionary Using Python

In this article, we will learn how to find common keys in a list and dictionary in Python.

Methods Used

The following are the various methods to accomplish this task −

Using the ‘in’ operator and List Comprehension

Using set(), intersection() functions

Using keys() function & in operator

Using the Counter() function

Example

Assume we have taken an input dictionary and list. We will find the common elements in the input list and keys of a dictionary using the above methods.

Input

inputDict = {"hello": 2, "all": 4, "welcome": 6, "to": 8, "tutorialspoint": 10}
inputList = ["hello", "tutorialspoint", "python"]

Output

Resultant list: ['hello', 'tutorialspoint']

In the above example, ‘hello‘ and ‘tutorialspoint‘ are the common elements in the input list and keys of a dictionary. Hence they are printed.

Method 1: Using the ‘in’ operator and List Comprehension

List Comprehension

When you wish to build a new list based on the values of an existing list, list comprehension provides a shorter/concise syntax.

Python ‘in’ keyword

The in keyword works in two ways −

The in keyword is used to determine whether a value exists in a sequence (list, range, string, etc).

It is also used to iterate through a sequence in a for loop

Algorithm (Steps)

Following are the Algorithm/steps to be followed to perform the desired task –.

Create a variable to store the input dictionary.

Create another variable to store the input list.

Traverse through the input list and check whether any input list element matches the keys of a dictionary using list comprehension.

Print the resultant list.

Example

The following program returns common elements in the input list and dictionary keys using the ‘in’ operator and list comprehension –

# input dictionary
inputDict = {"hello": 2, "all": 4, "welcome": 6, "to": 8, "tutorialspoint": 10}

# printing input dictionary
print("Input dictionary:", inputDict)

# input list
inputList = ["hello", "tutorialspoint", "python"]

# printing input list
print("Input list:", inputList)

# checking whether any input list element matches the keys of a dictionary
outputList = [e for e in inputDict if e in inputList]

# printing the resultant list
print("Resultant list:", outputList)

Output

On executing, the above program will generate the following output –

Input dictionary: {'hello': 2, 'all': 4, 'welcome': 6, 'to': 8, 'tutorialspoint': 10}
Input list: ['hello', 'tutorialspoint', 'python']
Resultant list: ['hello', 'tutorialspoint']

Method 2: Using set(), intersection() functions

set() function − creates a set object. A set list will appear in random order because the items are not ordered. It removes all the duplicates.

intersection() function − A set containing the similarity between two or more sets is what the intersection() method returns.

This means that only items present in both sets, or in all sets if more than two sets are being compared, are included in the returned set.

Example

The following program returns common elements in the input list and dictionary keys using set() and intersection() –

# input dictionary
inputDict = {"hello": 2, "all": 4, "welcome": 6, "to": 8, "tutorialspoint": 10}

# printing input dictionary
print("Input dictionary:", inputDict)

# input list
inputList = ["hello", "tutorialspoint", "python"]

# printing input list
print("Input list:", inputList)

# Converting the input dictionary and input List to sets
# getting common elements in input list and input dictionary keys
# from these two sets using the intersection() function
outputList = set(inputList).intersection(set(inputDict))

# printing the resultant list
print("Resultant list:", outputList)

Output

On executing, the above program will generate the following output –

Input dictionary: {'hello': 2, 'all': 4, 'welcome': 6, 'to': 8, 'tutorialspoint': 10}
Input list: ['hello', 'tutorialspoint', 'python']
Resultant list: {'hello', 'tutorialspoint'}

Method 3: Using keys() function & in operator

keys() function − the dict.keys() method provides a view object that displays a list of all the keys in the dictionary in order of insertion.

Example

The following program returns common elements in the input list and dictionary keys using the keys() function and in operator–

# input dictionary
inputDict = {"hello": 2, "all": 4, "welcome": 6, "to": 8, "tutorialspoint": 10}

# printing input dictionary
print("Input dictionary:", inputDict)

# input list
inputList = ["hello", "tutorialspoint", "python"]

# printing input list
print("Input list:", inputList)

# empty list for storing the common elements in the list and dictionary keys
outputList = []

# getting the list of keys of a dictionary
keysList = list(inputDict.keys())

# traversing through the keys list
for k in keysList:
    # checking whether the current key is present in the input list
    if k in inputList:
        # appending that key to the output list
        outputList.append(k)

# printing the resultant list
print("Resultant list:", outputList)

Output

Input dictionary: {'hello': 2, 'all': 4, 'welcome': 6, 'to': 8, 'tutorialspoint': 10}
Input list: ['hello', 'tutorialspoint', 'python']
Resultant list: ['hello', 'tutorialspoint']

Method 4: Using the Counter() function

Counter() function − a dict subclass for counting hashable objects. It implicitly creates a hash table of an iterable when called/invoked.

Here the Counter() function is used to get the frequency of input list elements.

Example

The following program returns common elements in the input list and dictionary keys using the Counter() function –

# importing a Counter function from the collections module
from collections import Counter

# input dictionary
inputDict = {"hello": 2, "all": 4, "welcome": 6, "to": 8, "tutorialspoint": 10}

# printing input dictionary
print("Input dictionary:", inputDict)

# input list
inputList = ["hello", "tutorialspoint", "python"]

# printing input list
print("Input list:", inputList)

# getting the frequency of input list elements as a dictionary
frequency = Counter(inputList)

# empty list for storing the common elements of the list and dictionary keys
outputList = []

# getting the list of keys of a dictionary
keysList = list(inputDict.keys())

# traversing through the keys list
for k in keysList:
    # checking whether the current key is present in the input list
    if k in frequency.keys():
        # appending/adding that key to the output list
        outputList.append(k)

# printing the resultant list
print("Resultant list:", outputList)

Output

Input dictionary: {'hello': 2, 'all': 4, 'welcome': 6, 'to': 8, 'tutorialspoint': 10}
Input list: ['hello', 'tutorialspoint', 'python']
Resultant list: ['hello', 'tutorialspoint']

Conclusion

We studied four different methods in this article for displaying the Common keys in the given list and dictionary. We also learned how to get a dictionary of the iterables’ frequencies using the Counter() function.

iPhone 14 Series Preview: A Total of 4 Models – May Rise in Price

iPhone 14 series speculated prices

From the rumours so far, below are the prices of the iPhone 14 series. The prices below are the starting price for the base model of the smartphone

Apple iPhone 14 – $799

iPhone 14 Max – $899

iPhone 14 Pro – $1,099

Apple iPhone 14 Pro Max – $1,199

In contrast, below are the starting prices of the iPhone 13 series

Apple iPhone 13 mini – $699

iPhone 13 – $799

iPhone 13 Pro – $999

Apple iPhone 13 Pro Max – $1099

Comparing the starting prices of the iPhone 13 series and the iPhone 14 series, we can see a pattern: each price point in the iPhone 14 lineup is only $100 higher than the corresponding price point in the iPhone 13 lineup.

iPhone 14 series four models

We already know that the iPhone 14 series will have four versions. They include iPhone 14, iPhone 14 Max, iPhone 14 Pro, and iPhone 14 Pro Max. There will be small changes from one model to another. The Pro models (iPhone 14 Pro and 14 Pro Max) will come with the best features that Apple has to offer.


In terms of hardware, the whole series may come with the latest A16 Bionic processor. However, there are speculations that the iPhone 14 and 14 Max will use the A15 Bionic while the Pro models will use the A16 Bionic chip. This will most likely not be the case when the series eventually arrives. This is because the iPhone SE 3 already uses the A15 Bionic chip. This will put a lot of pressure on the chip inventory.

In this context, Apple started the work of developing its own baseband chips. In 2019, it acquired Intel’s mobile phone baseband chip division for $1 billion and obtained 8,500 related patents and 2,200 Intel employees. According to media sources, on the iPhone 15 series, we may see signal issues resolved to some extent.

The Pro models of the iPhone 14 series will also upgrade to a 48MP main camera and support multiple zoom functions. The iPhone 14 series will support IP68 waterproof, 3D structured light face unlock, 5G network, X-axis linear motor, dual speakers, NFC, high refresh rate and other functions. There are many highlights in functionality.

iPhone 14 series: no suspense in industrial design

However, all four models will not use the same design. While the Pro versions will come with a new pill and punch hole screen solution, the regular models will use the small notch design. Furthermore, the Pro models will significantly reduce their side bezels. This will make these iPhones have the highest screen ratio in Apple’s history.

As for the display, the regular models will use a 6.1-inch screen while the Pro models will have 6.7-inch displays. Some reports claim that the iPhone 14 and 14 Pro will use a 6.1-inch display, while the 14 Max and 14 Pro Max will come with a 6.7-inch display. Unfortunately, there are reports that only the Pro models will support the 120Hz high refresh rate. As for the regular models, they will have to make do with the 60Hz refresh rate.

Although Apple no longer reports details on device sales, iPhone revenue in the first quarter of 2022 hit $50.57 billion, according to its latest earnings call. That’s up from $47.9 billion in the same quarter a year earlier. This means that Apple’s iPhone revenue has grown year-over-year.

Algo Trading With Python: Build Indicators And Manage Risks

Are you a trader who is interested in learning how to build their own trading algorithm?

If so, this course will teach you the basics of Python – you will learn about all the native data types in Python, and know how to work with control flow structure so that you can have decision logic built into your code.

Trading is hard, but it is also highly rewarding. With Python, you can put in a methodical system to build your own rule-based algorithm in order to get to your goal efficiently.

Python is also the perfect language for AI-based algorithms using a variety of machine learning techniques. The AI-based courses are coming soon.

This complete course of algo trading with Python will teach all the Python syntax you need to know, and get you ready for working with Pandas, taking care of DateTime objects, and handling errors. You will then immediately learn how to use Python to connect to the MetaTrader5 terminal, and get market data as well as account information programmatically from your broker directly. You will also learn about constructing indicators that work for your style of trading, by using Pandas, Ta-Lib, or writing your own user-defined functions.

You will learn how to programmatically compute position size: first decide where you would like to place your stop loss level, and then query some account information to complete the computation. You will learn about the specific ways that you will communicate with your brokers, using Python, to enter and/or exit trades with the computed size for your risk tolerance. Further, you will also learn how to set the stop loss and take profit, how to move the stop loss to breakeven, and how to update (trail) your stop-loss level after that point.

Once you have learned all of these topics, you will be able to build as many elaborate risk management strategies as you want, in a way that suits how you like to trade.
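As a flavour of the position-sizing logic described above, here is a minimal, hypothetical sketch of fixed-fractional sizing. The function name, parameters, and numbers are illustrative assumptions, not code from the course.

def position_size(account_balance, risk_pct, entry_price, stop_price, value_per_point=1.0):
    """Size a position so that hitting the stop loss risks a fixed fraction of the account."""
    risk_amount = account_balance * risk_pct        # capital you are willing to lose on this trade
    stop_distance = abs(entry_price - stop_price)   # distance between entry and stop loss
    if stop_distance == 0:
        raise ValueError("Stop loss must differ from the entry price")
    return risk_amount / (stop_distance * value_per_point)

# Example: risk 1% of a 10,000 account with a 50-point stop -> 2.0 units
print(position_size(10_000, 0.01, entry_price=1950.0, stop_price=1900.0))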

At the end of this course, you will have a firm understanding of what goes into building an algorithmic trading strategy from scratch. You will have not only all the tools necessary to create your own algorithmic strategies, but you will also know how to manage your positions, as well as take what you already know and set up a backtesting environment for yourself, so that you are able to systematically build and test strategies on an on-going basis.

The goal of the series is to give you an understanding of what goes into building an algorithmic trading strategy from scratch. By the end of the three-part series, you should have not only all the tools necessary to create your own algorithmic strategies, but you will also know how to manage your positions.

Goals

Build a complete algorithm, from scanning market, placing trades, to managing trades

Learn how to build indicators that work for you

Learn how to interact with your brokers programmatically and directly

Use code to be systematic in trading and get your time back

Learn how to enter a trade, exit a trade, and get other account data programmatically

Learn the Python skills to protect your accounts and profits

Learn all the skills needed to build a backtesting system for your algorithm from this course

Prerequisites

No programming experience is required. I will teach you everything you need to know.
