You are reading the article Binary Search Algorithm With Example updated in December 2023 on the website Katfastfood.com. We hope that the information we have shared is helpful to you. If you find the content interesting and meaningful, please share it with your friends and continue to follow and support us for the latest updates. Suggested January 2024 Binary Search Algorithm With Example
Before we learn Binary search, let’s learn:
What is Search?Search is a utility that enables its user to find documents, files, media, or any other type of data held inside a database. Search works on the simple principle of matching the criteria with the records and displaying it to the user. In this way, the most basic search function works.
What is Binary Search?In this algorithm tutorial, you will learn:
How Binary Search Works?The binary search works in the following manner:
The search process initiates by locating the middle element of the sorted array of data
After that, the key value is compared with the element
If the key value is smaller than the middle element, then searches analyses the upper values to the middle element for comparison and matching
In case the key value is greater than the middle element then searches analyses the lower values to the middle element for comparison and matching
Example Binary Search
Let us look at the example of a dictionary. If you need to find a certain word, no one goes through each word in a sequential manner but randomly locates the nearest words to search for the required word.
The above image illustrates the following:
You have an array of 10 digits, and the element 59 needs to be found.
All the elements are marked with the index from 0 – 9. Now, the middle of the array is calculated. To do so, you take the left and rightmost values of the index and divide them by 2. The result is 4.5, but we take the floor value. Hence the middle is 4.
The algorithm drops all the elements from the middle (4) to the lowest bound because 59 is greater than 24, and now the array is left with 5 elements only.
Now, 59 is greater than 45 and less than 63. The middle is 7. Hence the right index value becomes middle – 1, which equals 6, and the left index value remains the same as before, which is 5.
At this point, you know that 59 comes after 45. Hence, the left index, which is 5, becomes mid as well.
These iterations continue until the array is reduced to only one element, or the item to be found becomes the middle of the array.
Example 2Let’s look at the following example to understand the binary search working
You have an array of sorted values ranging from 2 to 20 and need to locate 18.
The average of the lower and upper limits is (l + r) / 2 = 4. The value being searched is greater than the mid which is 4.
The array values less than the mid are dropped from search and values greater than the mid-value 4 are searched.
This is a recurrent dividing process until the actual item to be searched is found.
Why Do We Need Binary Search?The following reasons make the binary search a better choice to be used as a search algorithm:
Binary search works efficiently on sorted data no matter the size of the data
Instead of performing the search by going through the data in a sequence, the binary algorithm randomly accesses the data to find the required element. This makes the search cycles shorter and more accurate.
Binary search performs comparisons of the sorted data based on an ordering principle than using equality comparisons, which are slower and mostly inaccurate.
After every cycle of search, the algorithm divides the size of the array into half hence in the next iteration it will work only in the remaining half of the array
Learn our next tutorial of Linear Search: Python, C++ Example
Summary
Binary search is commonly known as a half-interval search or a logarithmic search
It works by dividing the array into half on every iteration under the required element is found.
The binary algorithm takes the middle of the array by dividing the sum of the left and rightmost index values by 2. Now, the algorithm drops either the lower or upper bound of elements from the middle of the array, depending on the element to be found.
The algorithm randomly accesses the data to find the required element. This makes the search cycles shorter and more accurate.
Binary search performs comparisons of the sorted data based on an ordering principle than using equality comparisons that are slow and inaccurate.
A binary search is not suitable for unsorted data.
You're reading Binary Search Algorithm With Example
Sieve Of Eratosthenes Algorithm: Python, C++ Example
The Sieve of Eratosthenes is the simplest prime number sieve. It is a Prime number algorithm to search all the prime numbers in a given limit. There are several prime number sieves. For example- the Sieve of Eratosthenes, Sieve of Atkin, Sieve of Sundaram, etc.
The word “sieve” means a utensil that filters substances. Thus, the sieve algorithm in Python and other languages refers to an algorithm to filter out prime numbers.
This algorithm filters out the prime number in an iterative approach. The filtering process starts with the smallest prime number. A Prime is a natural number that is greater than 1 and has only two divisors, viz., 1 and the number itself. The numbers that are not primes are called composite numbers.
In the sieve of the Eratosthenes method, a small prime number is selected first, and all the multiples of it get filtered out. The process runs on a loop in a given range.
For example:
Let’s take the number range from 2 to 10.
After applying the Sieve of Eratosthenes, it will produce the list of prime numbers 2, 3, 5, 7
Algorithm Sieve of Eratosthenes
Here is the algorithm for the Sieve of Eratosthenes:
Step 1) Create a list of numbers from 2 to the given range n. We start with 2 as it is the smallest and first prime number.
Step 2) Select the smallest number on the list, x (initially x equals 2), traverse through the list, and filter the corresponding composite numbers by marking all the multiples of the selected numbers.
Step 3) Then choose the next prime or the smallest unmarked number on the list and repeat step 2.
Step 4) Repeat the previous step until the value of x should be lesser than or equal to the square root of n (x<=
).
).
Note: The mathematical reasoning is quite simple. The number range n can be factorized as-
n=a*b
Again, n =
*
= (factor smaller than
) * (factor larger than )
) * (factor larger than
So at least one of the prime factors or both must be <=
. Thus, traversing to will be enough.
. Thus, traversing towill be enough.
Step 5) After those four steps, the remaining unmarked numbers would be all the primes on that given range n.
Example:
Let’s take an example and see how it works.
For this example, we will find the list of prime numbers from 2 to 25. So, n=25.
Step 1) In the first step, we will take a list of numbers from 2 to 25 as we selected n=25.
Step 2) Then we select the smallest number on the list, x. Initially x=2 as it is the smallest prime number. Then we traverse through the list and mark the multiples of 2.
The multiples of 2 for the given value of n is: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24.
Note: Blue color denotes the selected number, and pink color denotes the eliminated multiples
Step 3) Then we choose the next smallest unmarked number, which is 3, and repeat the last step by marking the multiples of 3.
Step 4) We repeat step 3 in the same way until x=
or 5.
Step 5) The remaining non-marked numbers would be the prime numbers from 2 to 25.
Pseudo-Code Begin Declare a boolean array of size n and initialize it to true For all numbers i : from 2 to sqrt(n) IF bool value of i is true THEN i is prime For all multiples of i (i<n) mark multiples of i as composite Print all unmarked numbers End Sieve of Eratosthenes C/C++ code Exampleusing namespace std; void Sieve_Of_Eratosthenes(int n) { bool primeNumber[n + 1]; memset(primeNumber, true, sizeof(primeNumber)); for (int j = 2; j * j <= n; j++) { if (primeNumber[j] == true) { for (int k = j * j; k <= n; k += j) primeNumber[k] = false; } } for (int i = 2; i <= n; i++) if (primeNumber[i]) cout << i << ” “; } int main() { int n = 25; Sieve_Of_Eratosthenes(n); return 0; }
Output:
2 3 5 7 11 13 17 19 23 Sieve of Eratosthenes Python Program Example def SieveOfEratosthenes(n): # Create a boolean array primeNumber = [True for i in range(n+2)] i = 2 while (i * i <= n): if (primeNumber[i] == True): # Update all multiples of i as false for j in range(i * i, n+1, i): primeNumber[j] = False i += 1 for i in range(2, n): if primeNumber[i]: print(i) n = 25 SieveOfEratosthenes(n)Output:
2 3 5 7 11 13 17 19 23 Segmented SieveWe have seen that the Sieve of Eratosthenes is required to run a loop through the whole number range. Thus, it needs O(n) memory space to store the numbers. The situation becomes complicated when we try to find primes in a huge range. It is not feasible to allocate such a large memory space for a bigger n.
The algorithm can be optimized by introducing some new features. The idea is to divide the number range into smaller segments and compute prime numbers in those segments one by one. This is an efficient way to reduce space complexity. This method is called a segmented sieve.
The optimization can be achieved in the following manner:
Use a simple sieve to find prime numbers from 2 to and store them in an array.
Divide the range [0…n-1] into multiple segments of size at most .
For every segment, iterate through the segment and mark the multiple of prime numbers found in step 1. This step requires O() at max.
The regular sieve requires O(n) auxiliary memory space, whereas the segmented sieve requires O(
), which is a big improvement for a large n. The method has a downside also because it does not improve time complexity.
Complexity Analysis), which is a big improvement for a large n. The method has a downside also because it does not improve time complexity.
Space Complexity:
O(
) auxiliary space.
) auxiliary space.
Time Complexity:
The time complexity of a regular sieve of eratosthenes algorithm is O(n*log(log(n))). The reasoning behind this complexity is discussed below.
For a given number n, the time required to mark a composite number (i.e., nonprime numbers) is constant. So, the number of times the loop runs is equal to-
n/2 + n/3 + n/5 + n/7 + ……∞
= n * (1/2 + 1/3 + 1/5 + 1/7 +…….∞)
The harmonic progression of the sum of the primes can be deducted as log(log(n)).
(1/2 + 1/3 + 1/5 + 1/7 +…….∞) = log(log(n))
So, the time complexity will be-
T(n) = n * (1/2 + 1/3 + 1/5 + 1/7 + ……∞)
= n * log(log(n))
Thus the time complexity O(n * log(log(n)))
Next, you’ll learn about Pascal’s Triangle
Summary:
The Sieve of Eratosthenes filter out the prime numbers in a given upper limit.
Filtering a prime number starts from the smallest prime number, “2”. This is done iteratively.
The iteration is done up to the square root of n, where n is the given number range.
After these iterations, the numbers that remain are the prime numbers.
Python Zip File With Example
Python allows you to quickly create zip/tar archives.
Following command will zip entire directory
shutil.make_archive(output_filename, 'zip', dir_name)Following command gives you control on the files you want to archive
ZipFile.write(filename)Here are the steps to create Zip File in Python
Step 1) To create an archive file from Python, make sure you have your import statement correct and in order. Here the import statement for the archive is from shutil import make_archive
Code Explanation
Import make_archive class from module shutil
Use the split function to split out the directory and the file name from the path to the location of the text file (guru99)
Then we call the module “shutil.make_archive(“guru99 archive, “zip”, root_dir)” to create archive file, which will be in zip format
After then we pass in the root directory of things we want to be zipped up. So everything in the directory will be zipped
When you run the code, you can see the archive zip file is created on the right side of the panel.
Now your chúng tôi file will appear on your O.S (Windows Explorer)
Step 4) In Python we can have more control over archive since we can define which specific file to include under archive. In our case, we will include two files under archive “guru99.txt” and “guru99.txt.bak”.
Code Explanation
Import Zipfile class from zip file Python module. This module gives full control over creating zip files
We create a new Zipfile with name ( “testguru99.zip, “w”)
Creating a new Zipfile class, requires to pass in permission because it’s a file, so you need to write information into the file as newzip
We used variable “newzip” to refer to the zip file we created
Using the write function on the “newzip” variable, we add the files “guru99.txt” and “guru99.txt.bak” to the archive
When you execute the code you can see the file is created on the right side of the panel with name “guru99.zip”
Note: Here we don’t give any command to “close” the file like “newzip.close” because we use “With” scope lock, so when program falls outside of this scope the file will be cleaned up and is closed automatically.
Here is the complete code
Python 2 Example
import os import shutil from zipfile import ZipFile from os import path from shutil import make_archive def main(): # Check if file exists if path.exists("guru99.txt"): # get the path to the file in the current directory src = path.realpath("guru99.txt"); # rename the original file os.rename("career.guru99.txt","guru99.txt") # now put things into a ZIP archive root_dir,tail = path.split(src) shutil.make_archive("guru99 archive", "zip", root_dir) # more fine-grained control over ZIP files with ZipFile("testguru99.zip","w") as newzip: newzip.write("guru99.txt") newzip.write("guru99.txt.bak") if __name__== "__main__": main()Python 3 Example
import os import shutil from zipfile import ZipFile from os import path from shutil import make_archive # Check if file exists if path.exists("guru99.txt"): # get the path to the file in the current directory src = path.realpath("guru99.txt"); # rename the original file os.rename("career.guru99.txt","guru99.txt") # now put things into a ZIP archive root_dir,tail = path.split(src) shutil.make_archive("guru99 archive","zip",root_dir) # more fine-grained control over ZIP files with ZipFile("testguru99.zip", "w") as newzip: newzip.write("guru99.txt") newzip.write("guru99.txt.bak") Summary
To zip entire directory use command “shutil.make_archive(“name”,”zip”, root_dir)
To select the files to zip use command “ZipFile.write(filename)”
Beginner’s Guide To Automl With An Easy Autogluon Example
This article was published as a part of the Data Science Blogathon
You will find two sections in this guide for easier understanding. The first section deals with the background information on AutoML while the second section covers an end-to-end example use case for AutoGluon – one of the AutoML frameworks.
Follow along this guide to familiarize yourself with the concepts, get to know some existing AutoML frameworks, and try out an example based on AutoGluon. Additionally, find the answers to some interesting questions such as-
Why is AutoML required?
What benefits does AutoML offer over the normal method of selecting conventional ML models?
Who can use AutoML?
Where can it be used?
How does AutoML work?
When to use and when to avoid using AutoML?
Understanding the Basics of AutoML Need for AutoMLSo beginning with the first question, do we need AutoML and why? Machine learning algorithms have become increasingly popular in recent years and their success in assisting the decision-making process in a wide range of applications is well-known. A typical machine learning model includes the following steps:
Data Collection
Data Preparation
Algorithm selection
Model Training
Model Evaluation
Parameter Tuning
Model Prediction and Interpretation
The figure below shows a generic representation of the steps carried out in a traditional Machine Learning process from Data collection to Model Predictions.
Steps in Traditional Machine Learning (Image Source: Author)
From the above image, we can guess why many businesses find it difficult to implement traditional ML models. This is due to the complexity and the amount of learning involved in implementing the machine learning model. It also requires extensive domain expertise to generate and compare multiple models before selecting the best one. AutoML promises to simplify these challenges.
Thus, in a broader sense, AutoML has been introduced for automating the process of building an entire Machine Learning pipeline, with minimal human intervention.
Benefits of using AutoMLIn general, an AutoML model aims to automate all time-consuming operations like the selection of algorithms, writing the code, pipeline development, and hyperparameter tuning thereby allowing the data scientists to focus more on speedily resolving the business challenges at hand. Within the pipeline, the AutoML framework
Considers and selects multiple machine learning algorithms from the available ones like the random forest, k-Nearest Neighbor, SVMs, etc.
Performs data preprocessing steps like missing value imputation, feature scaling, feature selection, etc..,
Optimization or the hyperparameter tuning for all of the models
Decides/Tries multiple ways to ensemble or stack the algorithms
Currently available AutoML frameworksThe AutoML technology and the AutoML frameworks are quite new. Currently, there are various AutoML frameworks that can work with a variety of data, available in both open-source and paid versions.
Some popular AutoML packages are:
AutoGluon (2023): This popular AutoML open-source toolkit developed by AWS helps in getting a strong predictive performance in various machine learning and deep learning models on text, image, and tabular data. Installation is supported for the Linux & Mac operating systems whereas Windows is not an officially supported OS for this toolkit.
MLBox (2023): Another well-known open-source Python-based AutoML library is MLBox. However, it is a powerful library that offers three sub-packages related to Pre-processing (to read and pre-process data), Optimization (to test and/or optimize the models) and Prediction (to predict the outcomes on a test dataset). Additionally, it can perform feature selection, hyper-parameter optimization, automatic model selection for classification and regression as well as predicting the target variables for selected models.
AutoWEKA (2013): Auto-WEKA uses a fully automated approach and leverages the Bayesian optimization to select a machine learning algorithm and set its hyperparameters. In other words, it helps in applying the best parameter settings for a given classification or regression task automatically to get a nice model.
Auto-sklearn (2023): This AutoML framework has been developed by Matthias Feurer, et al and according to the official documentation is based on Bayesian optimization, meta-learning, and ensemble construction. Algorithm selection and hyperparameter tuning are automated using this framework. This framework only supports sklearn based models i.e. this is not suitable for graphical models or sequence prediction problems.
Auto-PyTorch (2023): This framework has been developed by the AutoML Groups of the University of Freiburg and Hannover. It is based on the PyTorch deep learning framework and it supports the tabular data of classification and regression. Also, it can be applied to image data for classification.
Autokeras (2023): It is an open-source library used in deep learning for automating tasks. It helps you in getting good neural network models for classification and regression tasks. The AutoML Toolkit has been built on top of the deep learning framework Keras, developed by the Datalab team at Texas A&M University. Since Auto-Keras follows the classic Scikit-Learn API design, the syntax is similar and easy to use.
TPOT (2023): Tree-based Pipeline Optimization Tool (TPOT) is also one of the popular open-source AutoML frameworks which use sci-kit learn library as a part of its ML menu. According to the official documentation, it uses genetic programming to intelligently explore thousands of possible pipelines to find out a top-performing model pipeline for a given dataset. It is important to note here that TPOT does not perform any pre-processing of the dataset and hence, expects the fed dataset to be clean. However, it can perform feature processing, model selection, and hyperparameter optimization to return the best-performing model. It is a good choice for regression and classification problems but it is not suitable for NLP.
H2O AutoML (2023): The H2O 3 AutoML framework is an open-source toolkit best suited to both traditional neural networks and machine learning models. It can be used to automate the machine learning workflow i.e. model training and hyperparameter tuning of models within a specified time duration. It can perform data preprocessing, model selection, and hyperparameter tuning. Additionally, it returns with a leaderboard view of the model along with its performance. However, it is important to note that a java runtime environment is required since H2O AutoML has been developed in java.
MLJAR (2023): This toolkit works with tabular datasets and offers transparency to the users in each step of the AutoML training. All information about the trained models is saved in the hard drive and is accessible to the ML user.
There are some more AutoML libraries that are not mentioned in the article like –
Open-source: AdaNet, TransmogrifAI, Azure Machine Learning, Ludwig
Commercially available:Darwin,DataRobot,Google AutoML
Who can use AutoML and where can it be used?In a broader sense, the idea is to reduce the need for human interaction with the help of AutoML. During training, AutoML focuses on optimizing not only the model weights but also the architecture. The goal is to automate the process of selecting an architecture, which is currently done by experienced Data Scientists. Auto-ML will perform all the above-mentioned tasks without asking many questions and that too in a shorter time. Thus, making the Machine Learning tasks easier. Automated Machine Learning offers different processes and techniques to make Machine Learning easily available and makes it simple for non-Machine Learning experts. That is why you will find that most of the AutoML frameworks mentioned earlier in this guide have been developed by the tech giants for the greater good. Thus, AutoML can be used by Software Engineers to develop applications without the need to know the details on working of ML algorithms, Data Scientists to build ML pipelines in a low-code environment, ML Engineers to speed up their work, and last but not least AI Enthusiasts to explore the capabilities of AutoML.
How does AutoML work?AutoML frameworks begin by connecting to the provided dataset. It is important to note that the selected dataset contains enough data to develop a supervised machine learning model for classification or regression. This dataset should particularly include the target variable as well as any other data that will be used as features for the model to use as input for its predictions. It is possible to drop non-relevant attributes when feeding the dataset to the AutoML framework. Users also need to specify the target column as well when using an AutoML tool. Further, the AutoML framework produces a data profile similar to the outcome of an EDA after the input dataset has been set up. We can find the descriptive statistics for each variable in the dataset, such as mean, median, quartiles, and so on, in this data profile. The AutoML tool determines whether variables are numeric vs. categorical and counts missing values for each variable as part of this data profiling process. Next, AutoML tools experiment with multiple models and perform optimization as well. Most hyperparameter tuning begins with some random sampling. So, most of the AutoML tools tend to use a strategy for intelligently refining samples. Such a trained and optimized model can then be deployed in a production environment using Rest APIs.
When to use and when to skip using AutoML?AutoML performs well for structured data i.e., when the columns are clearly labeled and the data is well-formatted. As these tools perform imputation and normalization, they can easily handle missing values or skewness in the dataset.
AutoML performs extremely well when we need a quick assessment of the model. Next, small to medium size datasets would obviously be trained quicker using AutoML as compared to larger datasets. Choosing AutoML for larger and complex datasets might be a tough choice as it can prove expensive due to the use of more resources or it could be extremely slow due to multiple experiments for hyperparameter tuning and model optimization.
Steps covered in an AutoML framework (Image Source: Author)
Section II: End-to-End AutoML example using AutoGluonIn this section of the guide, we will explore AutoGluon’s different features which automate the machine learning tasks. Additionally, we will experience the implementation of tabular prediction using AutoGluon along with the other prediction categories it supports. Further, we will try to find out how to get the best suitable model for a particular machine learning task when using AutoGluon. Let’s start exploring this AutoML tool.
AutoGluon Logo (Image Source: Official Website)
The following are the benefits of using the ‘AutoGluon’ library:
Simplicity: Training of classification and regression models and deployment can be achieved with a few lines of code.
Robustness: Without doing any feature engineering or data manipulation, users should be able to use raw data.
Predictable-timing: Getting the best model under a specified time constraint.
Fault-tolerance: Training can be resumed even if interrupted and the users can inspect all the intermediate steps.
AutoGluon is designed for both beginners and experts in machine learning. Deep learning, automated stack ensembling, and real-world applications for text, image, and tabular data are covered by this tool.
Tool installation requirementsAutoGluon requires Python version 3.6 or higher. Currently, Linux and Mac are the only operating systems that are fully supported. All the details on the installation of AutoGluon and its different versions are here. AutoGluon can be used for the following categories:
Tabular Prediction
Image Prediction
Object Detection
Text Prediction
Multimodal Prediction
Example#1- TabularPrediction(Classification) with AutoGluonIn this example, we will use the Stroke prediction dataset. You can download the dataset from Kaggle.
We start by importing all the necessary packages
Python Code:
from sklearn.model_selection import train_test_split #splitting the dataset from autogluon.tabular import TabularDataset, TabularPredictor #to handle tabular data and train modelsNext, we split the dataset into train and test sets.
# split into train and test sets df_train,df_test=train_test_split(df,test_size=0.33,random_state=1) df_train.shape,df_test.shape df.head()
We need to drop the outcome column from the newly created test set
test_data=df_test.drop(['stroke'],axis=1) test_data.head()
Now, we build a predictor to train for classifying whether an individual with a given set of conditions will probably be at risk of a stroke. For this, we specify the outcome column as ‘stroke’ and ask the predictor to fit the algorithms on the train dataset. Arguments (optional) ‘verbosity=2’ will display all the steps the predictor is taking to arrive at the best model while ‘presets= best quality’ will ensure that the best model is selected from the trained ones. There are other additional arguments mentioned in the official documentation which can be used to fine-tune the model.
predictor= TabularPredictor(label =’stroke’).fit(train_data = df_train, verbosity = 2,presets='best_quality')Let us scroll through the log from the above command. (As the log is quite long, only the snippets from the log wherever required have been included in this section).
We can notice that even though we did not specify the type of problem, AutoGluon perfectly understands that this is a binary classification problem based on the two unique labels ‘0’ & ‘1’ in the outcome column.
Further, we can also see that AutoGluon aptly selects the ‘accuracy’ metric for this classification task.
Once the classifier training is complete, we can print a summary of the models it has trained using the following command
predictor.fit_summary()
In this case, AutoGluon trained 24 models but we would be more interested to find out which is the best model as selected by AutoGluon. To display this, simply use the leaderboard() command which ranks the trained models in order.
predictor.leaderboard(df_train, silent=True)
Additionally, we can also check for the feature importance using
predictor.feature_importance(data=df_train)
Here we can see that it has identified age and bmi to be the most important factors in the prediction of the outcome.
Next, we feed the test data to the classifier for prediction and we can store it in a DataFrame
y_pred = predictor.predict(test_data) y_pred=pd.DataFrame(y_pred,columns=['stroke']) y_pred #print the DataFrame
To understand the evaluation metric ‘accuracy’, let us print the details for it.
predictor.evaluate(df_test)
Data preprocessing and Feature Engineering were carried out by AutoGluon. The trained model includes cross-validation as well. So, we got the trained classifier at 95% accuracy with just two lines of code (for the classifier to train and predict). Now, that’s impressive! If it were a traditional ML model, we would be spending a long time completing the entire process including EDA, data cleaning as well as coding to set up multiple models. AutoGluon made this quite simple for us.
Example#2- TabularPrediction(Regression) with AutoGluonLet us try another example to explore how AutoGluon’s TabularPrediction handles a regression problem. For this, we will use the ‘Boston prices’ dataset from the sk-learn dataset library. We follow the same set of steps from the previous example
#importing the dataset import numpy as np import pandas as pd from sklearn.datasets import load_boston boston= load_boston() boston.keys() print(boston.DESCR)#use this command to know more about the datasetCreating a DataFrame from the loaded dataset
#create a dataframe from the dataset df=pd.DataFrame(data=boston.data,columns=boston.feature_names) df.head()Append the target column to the dataframe
#adding price column to the dataframe df['PRICE'] = boston.target df.head()
Splitting the dataset
# split into train test sets df_train,df_test = train_test_split(df, test_size=0.33, random_state=1) df_train.shape,df_test.shapeWe will drop the target column from the test dataset.
test_data=df_test.drop(['PRICE'],axis=1) test_data.head()Setup the predictor (regressor)
predictor= TabularPredictor(label =’PRICE’).fit(train_data = df_train, verbosity = 2,presets='best_quality') predictor.leaderboard(df_train, silent=True)# Making Predictions y_pred = predictor.predict(test_data)
In this example too, AutoGluon correctly identified the type of problem as Regression based on the dtype=float for the column and the presence of multiple unique values. Next, it also aptly selected the evaluation metric as ‘root_mean_squared_error’
For the regression problem, AutoGluon trained 11 models and recommended kNN (KNeighborsDist_BAG_L1) as the best model followed by XGBoost (XGBoost_BAG_L1).
The syntax for the predictor is the same in both classification and regression problems. AutoGluon TabularPrediction task seems to work nicely on different datasets. To keep the tutorial simple we selected smaller datasets, but it would be interesting to see how it performs when a bigger dataset is used.
Other use cases in AutoGluonBefore we wrap up the guide, let us briefly look at the other available options in AutoGluon.
Image Prediction: Like Tabular prediction, AutoGluon uses a simple ‘fit()’ command for classifying images based on their content which automatically produces high-quality image classification models.
Object Detection: Object detection is an important task in computer vision involving the process of detecting and localizing objects in an image. Here too, AutoGluon gives an option of calling a simple ‘fit()’ command which will automatically generate a high-quality object detection model for identifying the presence and location of objects in images.
Text Prediction: Likewise for the prediction of text data in supervised learning, we can use a simple ‘fit()’ command to automatically generate high-quality text prediction models. Each training example in the data may be a sentence, a short paragraph, some additional numeric/categorical features present in the text. A single call to ‘predictor.fit()’ command can train highly accurate neural networks on the given text dataset where the target values or labels used to predict may be continuous values or individual categories. Even though the TextPredictor is designed for classification and regression tasks only, it can directly be used for other NLP tasks also if the data is properly formatted into a data table. The TextPredictor uses only Transformer neural network models. These are fit to the provided data via transfer learning from a pre-trained list of NLP models like BERT, ALBERT, and ELECTRA. It also allows training on multi-modal data tables which contain text, numeric and categorical columns, and the neural network hyperparameter which can be automatically tuned with Hyperparameter Optimization (HPO).
Multimodal Prediction: Multimodal tabular data consisting of text, numeric, and categorical columns can also be handled by AutoGluon. Raw text data is observed as a first-class citizen of data tables in AutoGluon. It can help you train and match a wide variety of models including classical tabular models like LightGBM, RF, CatBoost as well as the pre-trained NLP model-based multimodal network.
ConclusionIt is quite interesting to see the effectiveness of AutoML frameworks. It could be used to reduce the time it takes to create production-ready ML models with remarkable simplicity and efficiency. This expedites the overall ML process; thereby freeing up time for data scientists so that they focus on finding the solution to real-life problems. The biggest benefit of using AutoML could be attributed to its ability of training and test multiple existing machine learning algorithms on a variety of data sets autonomously. Further, it is to be noted that using AutoML does not remove the need for training and some basic understanding of data, data annotation, and the desired outcome. Thus, AutoML’s success would likely depend on how soon it is accepted, adopted and the tangible benefits it brings to a certain industry. Nevertheless, we can say that AutoML is there to stay.
Author Bio:Devashree has an chúng tôi degree in Information Technology from Germany and a Data Science background. As an Engineer, she enjoys working with numbers and uncovering hidden insights in diverse datasets from different sectors to build beautiful visualizations to try and solve interesting real-world machine learning problems.
In her spare time, she loves to cook, read & write, discover new Python-Machine Learning libraries or participate in coding competitions.
You can follow her on LinkedIn, GitHub, Kaggle, Medium, Twitter.
The media shown in this article on Ensemble Modeling for Neural Networks is not owned by Analytics Vidhya and are used at the Author’s discretion.
Related
How To Use Offset Function In Excel Vba With Example?
Excel VBA OFFSET Function
As there are two things in this word, one is VBA and other is OFFSET. In this, I’ll be explaining how to use OFFSET function using VBA (Visual Basic for Applications).
VBA – It is a programming language for those who work in Excel and other Office programs, so one can automate tasks in Excel by writing Macros.
Watch our Demo Courses and Videos
Valuation, Hadoop, Excel, Mobile Apps, Web Development & many more.
OFFSET – It is a reference function in Excel. The OFFSET function returns a reference to a range that is a specific number of rows and columns from another range or cell. It is one of the most important notions in Excel.
Let’s consider we have a dataset which consists of columns Customer Name, Product, sales, Quantity, Discount.
Suppose on the chance that we need to move down from a particular cell to the particular number of rows and to choose that cell at that point of time OFFSET function is very useful. For example, from cell B1 we want to move down 5 cells and want to select 5th cell i.e. B6. Suppose, if you want to move down from B1 cell 2 rows and goes 2 columns to the right and select that cell i.e. cell D3.
To use OFFSET function in VBA, we have to use VBA RANGE object because OFFSET refers cells and from that RANGE object we can use OFFSET function. In Excel, RANGE refers to the range of cells.
Let’s take a look at how OFFSET is used with RANGE.
Range(“A1”).offset(5).select
How to Use the OFFSET Function in Excel VBA?Below are the different examples to use OFFSET Function in Excel using VBA Code.
You can download this VBA OFFSET Excel Template here – VBA OFFSET Excel Template
VBA OFFSET – Example #1Step 2: Drag the arrow at any cell to create a Command Button.
Code:
End Sub
Step 4: Inside this function, we have to write our code of OFFSET for selecting cells. As mentioned in the previously we have to use OFFSET function with RANGE in VBA.
Range(
End Sub
Step 5: In this code, we have to select the 5th cell of column Product i.e. B6. Cell1 in Range is B1 because we have to move down 5 cells from cell B1 to B6 i.e 5 cells down.
Code:
Range(“B1”).Offset(
End Sub
OFFSET function has two arguments:
RowOffset: How many rows we want to move from the selected row. We have to pass the number as an argument.
ColumnOffset: How many columns we want to move from the selected row.
Step 6: Now I want to select cell B6 i.e I have to move down 5 cells. So, we have to enter 5 as the parameter for Row Offset.
Code:
Range(“B1”).Offset(5)
End Sub
Step 7: After closing the bracket we have to put a (.) dot and write the Select method.
Code:
Range(“B1”).Offset(5).Select
End Sub
VBA OFFSET – Example #2In this example, we will see how to use Column OFFSET argument. We will be working on the same data. All the above steps will be the same but we need to make a change in code.
Since I want to move down 5 cells and take the right 3 columns to reach the cell E6.
Code:
Range(“B1”).Offset(5, 3).Select
End Sub
Things to Remember
It is a reference function in Excel. The OFFSET function returns a reference to a range that is a specific number of rows and columns from another range or cell.
VBA OFFSET is used with RANGE object in VBA.
Recommended ArticlesThis is a guide to VBA OFFSET. Here we discuss how to use OFFSET function in Excel using VBA code along with practical examples and downloadable excel template. You may also look at the following articles to learn more –
Queries On Number Of Binary Sub
In this problem, we are given a binary matrix bin[][] of size nXm. Our task is to solve all q queries. For query(x, y), we need to find the number of submatrix of size x*x such that all the elements of array y (binary number).
Problem descriptionHere, we need to count the total number of sub-matrix of a given size that consists of only one of the two bits i.e. sub-matrix will all elements 0/1.
Let’s take an example to understand the problem, Input n = 3 , m = 4 bin[][] = {{ 1, 1, 0, 1} { 1, 1, 1, 0} { 0, 1, 1, 1}} q = 1 q1 = (2, 1) Output 2 ExplanationAll sub-matrix of size 2 with all element 1 are −
{{ 1, 1, 0, 1} { 1, 1, 1, 0} { 0, 1, 1, 1}} {{ 1, 1, 0, 1} { 1, 1, 1, 0} { 0, 1, 1, 1}}The solution to this problem is using the Dynamic programming approach. To solve, we will maintain a 2D matric DP[][] to store the largest submatrix of the same bit. i.e. DP[i][j] will store the value of sub-matrix whose end index is (i, j), and all the elements are the same.
For your understanding, if DP[4][5] = 2, elements at bin[3][4], bin[3][5], bin[4][4] and bin[4][5] are same.
So, for finding DP[i][j], we have two cases −
Case 1 − if i = 0 orj = 0 : DP[i][j] = 1, as only one sub-matrix can be possible.
Case 2 − otherwise, bin[i-(k-1)][j] = bin[i][j – (k-1)] …: in this casae DP[i][j] = min(DP[i][j-1] , DP[i -1][j], DP[i-1][j-1] ) + 1. This will contribute to the sub-matrix like, we require. Let’s generalise, the case with k = 2, i.e. if we consider a sub-matrix of size 2X2. Then we need to check if bin[i][j] = bin[i] [j – 1] = bin[i- 1][j] = bin[i -1 ][j -1 ], if it is possible then, we will find the DP[i][j] for k =2 .
If the case 2 conditions are not met, we will set DP[i][j] = 1, which can be treated as a default value.
This value of DP[i][j] can be for a set bit or unset bit. We will check the using namespace std; #define N 3 #define M 4
int min(int a, int b, int c) { if (a <= b && a <= c) return a; else if (b <= a && b <= c) return b; else return c; }
int solveQuery(int n, int m, int bin[N][M], int x, int y){ int DP[n][m], max = 1; for (int i = 0; i < n; i++) { for (int j = 0; j < m; j++) { DP[i][j] = 1; else if ((bin[i][j] == bin[i – 1][j]) && (bin[i][j] == bin[i][j – 1]) && (bin[i][j] == bin[i – 1][j – 1])) { DP[i][j] = min(DP[i – 1][j], DP[i – 1][j – 1], DP[i][j – 1]) + 1; if (max < DP[i][j]) max = DP[i][j]; } else DP[i][j] = 1; } } int zeroFrequency[n+m] = { 0 }, oneFrequency[n+m] = { 0 }; for (int i = 0; i < n; i++) { for (int j = 0; j < m; j++) { if (bin[i][j] == 0) zeroFrequency[DP[i][j]]++; else oneFrequency[DP[i][j]]++; } } zeroFrequency[i] += zeroFrequency[i + 1]; oneFrequency[i] += oneFrequency[i + 1]; } if (y == 0) return zeroFrequency[x]; else return oneFrequency[x]; } int main(){ int n = 3, m = 4; int mat[N][M] = {{ 1, 1, 0, 1}, { 1, 1, 1, 0}, { 0, 1, 1, 1}}; int Q = 2; int query[Q][2] = {{ 2, 1}, { 1, 0}}; for(int i = 0; i < Q; i++){ cout<<“For Query “<<(i+1)<<“: The number of Binary sub-matrices of Given size is ” <<solveQuery(n, m, mat, query[i][0], query[i][1])<<“n”; } return 0; }
Output For Query 1: The number of Binary sub-matrices of Given size is 2 For Query 2: The number of Binary sub-matrices of Given size is 3Update the detailed information about Binary Search Algorithm With Example on the Katfastfood.com website. We hope the article's content will meet your needs, and we will regularly update the information to provide you with the fastest and most accurate information. Have a great day!