Lean Manufacturing Tools and Techniques

As companies and manufacturing units upgrade to newer and more efficient practices, the need for tools and techniques that support this transition becomes evident. Manufacturing is gaining momentum as world markets grow each day. With new businesses entering the market and global demand increasing, companies must put their best foot forward to meet ever-growing needs.


What is Lean Manufacturing?

Often referred to simply as "Lean," Lean Manufacturing is essentially a method used to eliminate waste (or Muda) along the value stream of a manufacturing workflow. Lean Manufacturing aims to distribute work evenly, balance the time available for work, and ensure that production does not squander the main business factors of quality, time, cost, and resources. A distinctive feature of Lean Manufacturing is that it also accounts for the overburdening of resources (Muri) and the unevenness of the workload (Mura) across the value stream.

Derived from the Toyota Production System, Lean Manufacturing aims to enhance the overall customer experience and focuses on reducing or eliminating the seven types of waste, remembered by the acronym TIMWOOD:

Transport

Inventory

Motion

Waiting

Overproduction

Over-processing

Defects

6 Lean Manufacturing Tools and Techniques

Now that you know the best way to enhance your manufacturing and production process, it is essential to understand the different tools and techniques used within Lean Manufacturing, how to use them, and where to use them. So, let's dive into the six most-used, basic, yet effective tools and techniques you need to know.

1. 5S

Within the paradigm of Lean Manufacturing, 5S is a simple yet powerful Japanese tool to systematically organize a workplace, keeping it clean and safe. This organizing enhances your productivity and work standardization efforts and helps visual management.

5S ensures that a manufacturing or production unit experiences standardization throughout its workflow at all levels of the process. This way, iterations can take place at a higher speed, thus, promoting higher levels of production. With standard operational practices in tow, it becomes easier for work to proceed efficiently, safely, and repeatably.

For an organization implementing 5S, this tool becomes the foundation on which all the other Lean Manufacturing tools can be used and organized effectively. The 5S tool works methodically in 5 phases. These phases are named in Japanese and are transliterated into English to form five "S" terms. They are as follows:

5S Seiri – or Sort, is the first step of 5S. It involves sorting out the mess and clutter within the workplace and keeping only the important and most useful items in the work area.

5S Seiton – or Straighten, is the next step, which dictates arranging the decluttered items efficiently using the principles of ergonomics. This step ensures that every item has its place and that items return to their place.

5S Seiso – or Sweep, is the step that involves a thorough cleaning of the work area, the tools, and all the systems, machines, and equipment used in the company's manufacturing unit. It ensures that all the apparatus used during production and assembly are as good as new, eliminating any non-conformity that may arise from technical difficulties.

Standardization is a key component within Lean Manufacturing. 5S Seiketsu – or Standardize, ensures that whatever work was conducted in the first 3 steps is now standardized accordingly. This builds common standards for how the team must work, which makes it a crucial phase.

5S Shitsuke – or Sustain, is the final stage, which ensures that the company keeps up with the standards it has adopted. This stage involves housekeeping and auditing the processes, tools, and equipment. It is during this stage that the work routine becomes a culture.

2. Cellular Manufacturing

To understand the basics of this technique, we first need to understand what a cell is. A cell consists of work areas/workstations and equipment arranged suitably to facilitate the smooth flow of materials and elements through a process, with trained operators qualified to work within it. Cellular manufacturing makes it possible to handle a varied mix of products in a single manufacturing unit while producing minimal waste.

The pace of the process is also defined according to the customer's needs and demand rate.

Cellular manufacturing addresses the issue of catering to the multiple product lines required by customers. This technique groups similar products together so they can be processed in the same sequence and on the same equipment. It reduces the time lost in changeovers between different products and offers the production line smaller, more manageable units of products. Cellular manufacturing also ensures that space is effectively utilized in all production instances, contributes to reducing lead time, and improves the productivity of the production line. In addition, this technique enhances flexibility and transparency between different product lines and improves teamwork and communication between various departments.

3. Continuous Improvement

Continuous Improvement follows the quality cycle called the Deming Cycle, or PDCA cycle, which comprises 4 phases the product or process must go through. They are as follows:

Plan – During this stage, an opportunity for change is identified, and planning is carried out to implement the change within the system.

Do – Once the planning is completed and verified, the plan is executed to implement the change within the system.

Check – In this stage, data is collected and reviewed to check the success of the implemented change. The team analyzes the results to determine the success of the change brought about.

Act – Once the change is determined to be successful, the plan is implemented on a much wider scale, and continuous assessment takes place. Again, the check stage follows after large-scale implementation.

4. Jidoka

In Japanese, the term "Jidoka" can be defined as "automation with human intervention". The term gained importance in the late 19th century when Sakichi Toyoda, founder of the Toyota group of companies, began operating his self-powered, automatic loom. This mechanical loom would stop whenever it detected a break in the thread during the weaving process. The operator handling the loom would then intervene, fix the thread, and resume the operation.

This meant that the production process would halt temporarily every time a break was detected, until the breakage was fixed.

This way, 100% quality was ensured to customers, as no defective product was produced in the first place. Also, a single operator could handle the entire operation, which made it cost-effective and improved the productivity of the process. In short, the process puts into effect all the principles and philosophies of Lean Manufacturing, and it looks something like this:

↳ The system detects an abnormality – a deviation from the normal workflow – and communicates it to the main system

↳ Production halts

↳ The operator intervenes and fixes the issue

↳ The changes made are incorporated into the standard workflow

This way, all defects and abnormalities are fed back into the system. Whenever a workflow deviates from the standardized flow, the system can immediately notify you so the anomaly can be rectified.

5. Total Productive Maintenance

Machine downtime is a serious concern on a production line and can cause detrimental issues if the problem isn’t resolved on time. Lean Manufacturing addresses the concern of machine and equipment reliability on the manufacturing line with the help of the tool Total Productive Maintenance. Setting up a Total Productive Maintenance program is necessary for a Lean Manufacturing environment.

The Total Productive Maintenance program comprises 3 components, which boost the working of the production/manufacturing line. They are as follows:

Preventive Maintenance – These are regular, planned maintenance activities, not mere random checks conducted by the workers. The crew performs periodic and complete equipment maintenance on all the machines to check that everything functions as expected. This ensures that sudden breakdowns do not occur and that the throughput of each piece of equipment increases.

Corrective Maintenance – This kind of maintenance revolves around deciding whether there is a need for fixing or purchasing new equipment altogether. It makes sense to examine and completely replace some machines that experience frequent breakdowns to avoid further loss of money, resources, or even quality.

Maintenance Prevention – This component ensures that the machines purchased are the right ones. A machine that is hard to maintain will only cause more trouble and a loss of investment for the organization. Workers will find it difficult to maintain it continuously, resulting in serious losses.

6. Total Quality Management

An important Lean Manufacturing technique, Total Quality Management is a continuous quality program aimed at building teamwork among departments so they can come together and ensure a self-reliant workflow that outputs products of optimum quality. TQM deals with participative management and focuses on customer needs and demands, aligning the production process and timelines accordingly. Total Quality Management looks at the following key components as part of its technique definition:

Employee involvement and training

Problem-solving teams

Statistical methods

Process and not people

Focus on long-term goals

The needs of the customers define quality.

Direct participation of the top management is essential to bring about change and increase steps taken toward quality

The quality increment is a continuous effort and one that needs to be continued as a long-term plan

Improvement in the work process and the maintenance of the production line

Systematic analysis after requirement gathering is essential

Requirement gathering should take place with each department involved and all the employees within that department

Summary

These tools and techniques together offer a complete Lean Manufacturing system. While 5S and Continuous Improvement, along with other approaches such as Kaizen, form the foundation of Lean Manufacturing, Jidoka and tools such as JIT (Just-In-Time) are its pillars, providing the necessary support to the qualitative structure it promotes.

Cellular Manufacturing comes across as a solid methodology within the Lean Manufacturing world. It offers the production line a great way to reduce time and cost and to use resources and space effectively. Total Productive Maintenance keeps machines and processes delivering their throughput, while TQM addresses quality across the organization.

Lean Manufacturing is an important management approach within the production/manufacturing world, and its concepts have slowly and steadily entered the wider business world, proving beneficial in all strata of these businesses. Using Lean Manufacturing is all about understanding the concepts behind these tools and techniques. Once you're familiar with these concepts, implementation can be adapted to your work culture and production style, as Lean Manufacturing has succeeded across many different sectors and forms of business.


The 5 Phases of Lean Sigma (DMAIC Process)

DMAIC stands for Define, Measure, Analyze, Improve, and Control – the five critical phases of Six Sigma that aim to bring efficiency to business operations. No matter the size and type of your organization, the final goal of every company is to boost its sales and profits and improve the quality of its services. With the competition growing at an exponential rate, businesses are keen on adopting any new methodology or technology that could speed up their growth.

What is the DMAIC Process?

Before getting started with Six Sigma, companies conduct an additional step called “recognize” to figure out whether DMAIC is the right approach. Even though it’s not formally part of the process, it’s often applied to lean Sigma because DMAIC is quite a comprehensive procedure. It is not applicable to all situations. Businesses need to carefully recognize the situation before implementing this approach. The question is, how do you decide if Lean Sigma is an appropriate choice? It depends on three crucial factors −

You notice defects in your existing products or inefficiency in the processes.

There is room for improvement − you can increase the profitability and reduce the cost of operation simultaneously if you adopt a different method for business operations.

You can measure the output in numbers, data, or other quantifiable terms.

You need to compare your processes with these factors to determine if you should apply DMAIC to your business and how effective the process will be.

Five Critical Phases of DMAIC

Define − The first step is to identify the problems in the current processes, products, and technology. During this step, you are supposed to identify the scope of the project and how certain things might affect the stakeholders and your project’s outcome. Experts also identify the areas they should improve. Once you identify the stakeholders, objectives, and potential challenges, it will be easier to proceed.

Measure − This is where you measure the performance of the processes. In this stage, you have to think of the data collection methods to compare different processes, identify input and output indicators, analyze the current data, determine the failure rates, and use various measurement tools (charts, comparison tables, graphs, etc.) to get a clear picture of the performance metrics.

Analyze − It’s impossible to find an accurate solution to a problem when you don’t look into the root cause of the issue. The third phase of the DMAIC process is about analyzing the underlying cause of the problem so that you can figure out an effective method to resolve the issue. The analysis should start with a complete root-cause analysis. Experts use a multi-vari chart to collect a visual representation of the problems in the processes. Once you are done with the analysis, you will have a clear understanding of the areas you can improve. Based on this, you can deploy an effective plan.

Improve − This is where the magic happens. After you have identified the areas you need to improve and collected the data, the next step is to improve. Start with brainstorming ideas that can bring innovation or change to your processes. Deploy new solutions and inform stakeholders about the same. You can also use the improvement management software to streamline this part of the project management and ensure accuracy in all operations.

Control − Once you have implemented the new processes and you see the improvement in your business performance, now is the time to think of a long-term control strategy so that your operations remain effective. You need to come up with a quality control plan to ensure your team stays on the right track and follows the same techniques.

After these five steps of the DMAIC process, you will see a change in your business operations. You can also document these changes and measure them in quantifiable terms. This includes the change in efficiency, customer satisfaction, output, cost reduction, profitability, etc.

Benefits of the DMAIC Approach

If your existing processes or management operations seem ineffective, or you see scope for cost reduction and increased profit, it's always better to revisit your management strategies and identify the problems. DMAIC shows you the right way to achieve your objectives by improving the problematic areas. Along with being a simple concept, it is well structured and quite comprehensive. You might implement as many processes as you like, but without a DMAIC approach, it's hard to keep track of the progress and problems. There's no way to figure out which process is generating results and which should be eliminated.

As it’s a detailed and structured approach that requires every little detail to be documented, the DMAIC will eventually increase productivity. In addition, the data gathered during the above phases can be used to improve other processes. Teams can use this data (especially lessons learned reports) to improve their performance in current or future projects.

DMAIC vs. DMADV

DMADV is another methodology used in Six Sigma. Although both are similar on some level, the purpose of each is different. DMAIC focuses on improving profits and reducing costs by implementing the most reliable techniques or making changes in the current procedures.

DMADV, on the other hand, is concerned with designing a new product, process, or service while incorporating relevant phases −including Define, Measure, Analyze, Design, and Verify. While DMAIC depicts specific solutions, DMADV is regarded as a part of the solution design process.

DMAIC is most suitable in areas that need improvement and where it’s clear that an improvement in the processes can generate better results. DMADV is responsible for addressing the design process.

Conclusion

A successful DMAIC approach gives you a better understanding of the problems in your current approach, tools, and techniques. This helps you establish an appropriate management plan that could produce better results in the future and help you achieve your targets efficiently. It is not a one-time process but a continuous management approach that helps you reduce operating costs and increase your profit by implementing the right processes.

Finding Discrepancies In Excel – 5 Easy Techniques Explained

When working with large datasets in Microsoft Excel, encountering discrepancies in the data is inevitable. Identifying and fixing these inconsistencies is critical for ensuring accurate reporting and analysis.

By incorporating these techniques into your everyday workflow, you'll become more efficient in managing and analyzing your data, ultimately leading to better decision-making and improved outcomes for your projects.

Discrepancies in rows and columns can result from human input errors, duplicate entries, or irregularities in data formatting.

To find and manage these discrepancies, you can use the following Excel features:

Filters: Apply filters to specific columns to view unique values and quickly spot inconsistent data.

Conditional Formatting: Use formatting rules to highlight cells that meet specific criteria or mismatched data between columns.

Advanced Excel Functions: Utilize functions such as MATCH and VLOOKUP to compare data sets and identify discrepancies between them.

By understanding how to use these features in Excel, you can efficiently spot and rectify discrepancies in your data sets.

There are 5 main ways we like to tackle discrepancies in Excel:

Manual Review

Filtering

Conditional Formatting

Advanced Excel Functions

Excel Add-ins

Let’s run through each of them so that you can work out the best way that suits you.

When working with Excel, a manual review can be a helpful method for identifying discrepancies in your data. This process requires careful examination of the data sets to spot any inconsistencies or errors.

Here are some steps you can follow to perform a manual review:

Begin by clearly understanding the data sets you are working with. Make yourself familiar with the rows, columns, and formatting.

Scroll through the data and pay close attention to the information presented in each cell. You can use tools like cell highlighting with fill color, text formatting, or conditional formatting to make data easier to read.

Compare one row against another or one column against another. Look for mismatched information, duplicate entries, etc. that may cause inaccuracies in your Excel sheets.

One of the easiest ways to detect inconsistent data is by using Excel’s Filter.

With Filter, we’ll isolate the incorrect data in the Gender column in the following data set:

1. Select the Sort & Filter in the Home tab.

2. Choose Filter – ensure your cursor is in the range of the data.

Another way to identify column and row differences is by using conditional formatting.

You can access rules like Highlight Cells, Top/Bottom, Data Bars, Color Scales & Icon Sets. Moreover, you can create a new rule and manage or clear previous rules.

We’ll use a sample table to show you the power of conditional formatting:

1. Select your table or preferred rows and columns. We are choosing Column A (Company) and Column B (Country). Then, select Conditional Formatting on the Home tab.

2. Select a rule type and then the rule. We are picking Highlight Cell Rules and Duplicate Values…

3. In the Duplicate Values dialog, choose a highlight format (the default is Light Red Fill with Dark Red Text) and select OK.

4. All the duplicate values are now highlighted with Light Red Fill with Dark Red Text.

To compare cells and identify differences, you can utilize IF and IS functions.

For example, we have used the formula =IF(A2=B2,"MATCH","NOT MATCHED") to compare cells A2 and B2, and dragged the formula down to apply it to the other cells. Identical cells return MATCH, otherwise NOT MATCHED.

You can combine IF with IS functions like ISNUMBER or ISTEXT to check for specific data types. For example, the formula =IF(ISNUMBER(B2), "Number", "NaN") will return Number in column C if B2 is numeric and NaN otherwise.

VLOOKUP (Vertical) and HLOOKUP (Horizontal) functions are useful when comparing values in different columns or rows of two tables. To do it, use the following formula: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]) or =HLOOKUP(lookup_value, table_array, row_index_num, [range_lookup]).

Replace lookup_value with the value you want to find in the other table, table_array with the range of cells in the second table, col_index_num or row_index_num with the index number of the column or row you want to return a value from, and [range_lookup] with FALSE for exact matches or TRUE for an approximate match.

In the following tutorial, we are using the VLOOKUP function on Column E to compare Employee Names in Column A and Column D to pull off their Salaries from Column B if a match is found.

XLOOKUP is a refreshed version of VLOOKUP and HLOOKUP available in Excel 2021 and Microsoft 365. The XLOOKUP formula is: =XLOOKUP(lookup_value, lookup_array, return_array, [if_not_found]).

Substitute lookup_value with the value you want to find in the other table, lookup_array with the range of cells to search in the second table, return_array with the array you want to return a value from, and [if_not_found] with a text value to show if no matching values are found.

Here’s a quick sample of XLOOKUP with the same tables:

The MATCH function can also be used to compare two lists for discrepancies.

The MATCH function, =MATCH(lookup_value, lookup_array, [match_type]), searches for a specified item in a range and returns the relative position of the item within that range.

Here, we are finding the position of Watermelons in the array A2:A8 using MATCH. We are using match_type as 0 to find the first value that’s exactly equal to the lookup_value.

Excel offers various add-ins and built-in tools that can help you detect and analyze discrepancies more efficiently. In this section, we will discuss some of those useful add-ins.

One of these is the Inquire add-in. Once you enable it (via File > Options > Add-ins > COM Add-ins), navigate to the Inquire tab to use the Compare Files command.

In summary, Excel offers various add-ins to help you detect and handle discrepancies in your data. You can improve your data accuracy by familiarizing yourself with these tools and formulas.

To create data validation rules in Excel that can help prevent discrepancies, follow these steps:

1. Select the cells you want to restrict. This can be a single cell, a range of cells, or an entire column.

2. On the Data tab, select Data Validation (in the Data Tools group).

3. In the Data Validation window, ensure you are on the Settings tab. Here, you will have the opportunity to define your validation criteria based on your needs.

There are various types of validation criteria available in Excel, some of which include:

Whole number – allows only whole numbers within a specified range.

Decimal – permits only decimal numbers within a specified range.

List – restricts input to a predefined list of acceptable values.

Date – requires dates within a specific date range.

Time – accepts only times within a particular time range.

Text length – enforces constraints on the length of the text entered.

Custom – enables you to create custom validation rules using Excel formulas.

After selecting the appropriate validation criteria, specify the parameters as needed. For instance, if you chose Whole Number, you would have to set the Minimum and Maximum together with a data criterion such as between, equal to, less than, etc.

In addition to the validation criteria, you can also set up custom error messages and input hints to help users understand and adhere to your validation rules. To do this, switch to the Input Message and Error Alert tabs to enter your desired messages.

By implementing data validation rules, you significantly reduce the likelihood of data inconsistencies and improve the overall accuracy of your Excel worksheets.

In this article, you’ve learned various methods to find and address discrepancies in your Excel data. Let’s recap those key strategies:

Using Filter drop-downs in columns to identify unique values and inconsistent data.

Employing Conditional Formatting in selected cells to highlight discrepancies using a range of rule templates.

Using advanced functions such as IF, VLOOKUP, XLOOKUP, and MATCH to compare data sets.

Using Excel add-ins to detect discrepancies.

Setting up data validation rules to prevent errors in data.

With these techniques in your Excel skillset, your data will be more accurate and trustworthy.

How to Fix Error 0x80070652 and Install the Latest Windows Updates



Many users claimed that error 0x80070652 in Windows 10 prevents them from installing updates.

The first step is to restart your computer and run the Windows Troubleshoot tool.

It may also help to uninstall the latest updates and check again for upgrades.

You can also use the Media Creation tool to install the necessary patches.


Besides the abundance of the new features, Windows 10 also has some distinctive problems that were rarely seen in the previous system editions.

One of those troubling segments is the update errors that are sometimes hard to cope with.

To make things even harder, there's no way to ignore updates, as was the case in some other Windows versions.

One such error goes by the code 0x80070652, and if you've encountered it, you should definitely check the workarounds we have provided below.

How can I fix the error code 0x80070652 in Windows 10?

1. Restart the PC and run the Windows Troubleshoot tool

The first obvious step is the PC reboot. On more than one occasion, troubled users resolved update issues by a simple restart.

A restart can clear problems caused by some of the system's components, such as third-party programs or stalled update services.

Another thing you should do as soon as possible is hidden under the redesigned Troubleshoot menu that came with the Creators Update.

We now have troubleshooting tools that cover most of the system errors in one spot. You can run the Windows Update troubleshooter from Settings > Update & Security > Troubleshoot.

2. Uninstall the latest updates

With Windows 10, we got a bunch of mandatory updates installed on an (almost) daily basis, and it's quite hard, almost impossible, to prevent them from appearing.

But, you can at least uninstall them if something goes wrong and check for updates again. If your problem goes deeper than this, you’ll probably want to check the remaining solutions.

3. Use a third-party tool for Windows issues

Clutter in your system and damaged registry files can sometimes lead to errors like 0x80070652. Restoring the registry settings to their default values and performing an in-depth cleanup might prevent these issues.


4. Run the batch script

It is not uncommon for the Windows Update services to become unresponsive. Luckily, there's a way to reset them.

You can do that manually by resetting certain update services, or use a pre-created batch script that does it for you. You can use the batch file in a few simple steps.

If you want to create the script on your own, the complete instructions can be found in our comprehensive guide.

5. Install the problematic update manually

Go to the Microsoft Update Catalog.

Write the KB number of the update in the search bar.

Download the file and run it.

After the installation is finished, restart your PC.

If you have a problem with a major patch (build) you may need to restore your system in order to start from scratch. Follow the above instructions to reinstall the update file manually.

Luckily, that’s not the case with the small security patches or cumulative updates. You can download those from Microsoft’s official site and manually install them.

In the end, if all of these steps weren’t enough to overcome the error, you can use the final step to force the updates.

6. Use the Media Creation Tool

The Media Creation Tool was introduced with Windows 10 to vastly improve the digital delivery of the system.

And it’s more than a welcomed tool for the multitude of upgrade/installation procedures.

Additionally, you can use it to force the updates and surpass the issues brought by the standard over-the-air update system.


Practicing Machine Learning Techniques in R with the MLR Package

Introduction

In R, we often use multiple packages for doing various machine learning tasks. For example: we impute missing value using one package, then build a model with another and finally evaluate their performance using a third package.

The problem is, every package has a set of specific parameters. While working with many packages, we end up spending a lot of time to figure out which parameters are important. Don’t you think?

To solve this problem, I researched and came across an R package named MLR, which is absolutely incredible at performing machine learning tasks. This package includes all of the ML algorithms which we use frequently.

In this tutorial, I've taken up a classification problem and tried improving its accuracy using machine learning. I haven't explained the ML algorithms theoretically; the focus is kept on their implementation. By the end of this article, you are expected to become proficient at implementing several ML algorithms in R – but only if you practice alongside.

Note: This article is meant only for beginners and early starters with Machine Learning in R. Basic statistic knowledge is required. 

Table of Content

Getting Data

Exploring Data

Missing Value Imputation

Feature Engineering

Outlier Removal by Capping

New Features

Machine Learning

Feature Importance

QDA

Logistic Regression

Cross Validation

Decision Tree

Cross Validation

Parameter Tuning using Grid Search

Random Forest

SVM

GBM (Gradient Boosting)

Cross Validation

Parameter Tuning using Random Search (Faster)

XGBoost (Extreme Gradient Boosting)

Feature Selection

Machine Learning with MLR Package

Until recently, R didn't have any package/library similar to Scikit-Learn from Python, wherein you could get all the functions required to do machine learning. But now, R users have the mlr package, using which they can perform most of their ML tasks.

Let’s now understand the basic concept of how this package works. If you get it right here, understanding the whole package would be a mere cakewalk.

The entire structure of this package relies on this premise:

Create a Task. Make a Learner. Train Them.

Creating a task means loading data in the package. Making a learner means choosing an algorithm ( learner) which learns from task (or data). Finally, train them.
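To make that premise concrete, here is a minimal sketch of the pattern on a built-in dataset (iris); the names and data are purely illustrative, not the tutorial's data.

```r
# the general mlr pattern, illustrated on iris
library(mlr)
task    <- makeClassifTask(data = iris, target = "Species")  # 1. create a task
learner <- makeLearner("classif.rpart")                      # 2. make a learner
model   <- train(learner, task)                              # 3. train the learner on the task
```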

MLR package has several algorithms in its bouquet. These algorithms have been categorized into regression, classification, clustering, survival, multiclassification and cost sensitive classification. Let’s look at some of the available algorithms for classification problems:
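One way to see the available classification learners is sketched below; the exact columns printed depend on your mlr version.

```r
# list classification learners and the packages they come from
listLearners("classif")[c("class", "package")]
```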

22 classif.xgboost                 xgboost

And, there are many more. Let’s start working now!

1. Getting Data

For this tutorial, I've taken up one of the popular ML problems from DataHack (a one-time login is required to get the data).

After you’ve downloaded the data, let’s quickly get done with initial commands such as setting the working directory and loading data.
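A minimal sketch of these initial commands; the working directory and file names are assumptions, so adjust them to wherever you saved the downloaded files.

```r
# set the working directory and load the train/test files (paths are placeholders)
setwd("path/to/loan_data")
train <- read.csv("train.csv", na.strings = c("", " ", NA), stringsAsFactors = TRUE)
test  <- read.csv("test.csv",  na.strings = c("", " ", NA), stringsAsFactors = TRUE)
```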

2. Exploring Data

Once the data is loaded, you can access it using:
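The summary shown below presumably comes from mlr's summarizeColumns(), which reports the type, number of missing values, and dispersion of each variable:

```r
summarizeColumns(train)
```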

Loan_Status      factor   0      NA         0.3127036     NA        NA    192    422    2

This function gives a much more comprehensive view of the data set than the base str() function. Shown above is one row of the result (the full output has one row per variable). Similarly, you can run it on the test data.

From these outputs, we can make the following inferences:

In the data, we have 12 variables, out of which Loan_Status is the dependent variable and rest are independent variables.

Train data has 614 observations. Test data has 367 observations.

In train and test data, 6 variables have missing values (can be seen in na column).

ApplicantIncome and CoapplicantIncome are highly skewed variables. How do we know that? Look at their min, max, and median values. We'll have to normalize these variables.

LoanAmount, ApplicantIncome, and CoapplicantIncome have outlier values, which should be treated.

Credit_History is an integer-type variable. But, being binary in nature, we should convert it to a factor.

Also, you can check the presence of skewness in variables mentioned above using a simple histogram.

As you can see in charts above, skewness is nothing but concentration of majority of data on one side of the chart. What we see is a right skewed graph. To visualize outliers, we can use a boxplot:

Similarly, you can create a boxplot for CoapplicantIncome and LoanAmount as well.
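A quick sketch of these checks (base R plotting; the number of breaks is an arbitrary choice):

```r
# histograms to check skewness, boxplots to spot outliers
hist(train$ApplicantIncome, breaks = 300, main = "ApplicantIncome")
hist(train$CoapplicantIncome, breaks = 100, main = "CoapplicantIncome")
boxplot(train$ApplicantIncome)
boxplot(train$CoapplicantIncome)
boxplot(train$LoanAmount)
```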

Let’s change the class of Credit_History to factor. Remember, the class factor is always used for categorical variables.
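A sketch of the conversion:

```r
train$Credit_History <- as.factor(train$Credit_History)
test$Credit_History  <- as.factor(test$Credit_History)
```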

To check the changes, you can do:
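For instance, checking the class of the converted column:

```r
class(train$Credit_History)
```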

[1] "factor"

You can further scrutinize the data using:
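For example, a plain summary will show the factor levels and their counts:

```r
summary(train)
```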

We find that the variable Dependents has a level 3+ which shall be treated too. It’s quite simple to modify the name levels in a factor variable. It can be done as:
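One way to do this (the exact renaming used in the original is not shown, so treat this as an assumption) is:

```r
# replace the "3+" level with "3", assuming Dependents was read in as a factor
levels(train$Dependents)[levels(train$Dependents) == "3+"] <- "3"
levels(test$Dependents)[levels(test$Dependents) == "3+"]   <- "3"
```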

3. Missing Value Imputation

Not just beginners, even good R analysts struggle with missing value imputation. The MLR package offers a nice and convenient way to impute missing values using multiple methods. Now that we are done with the much-needed modifications to the data, let's impute the missing values.

In our case, we’ll use basic mean and mode imputation to impute data. You can also use any ML algorithm to impute these values, but that comes at the cost of computation.

This function is convenient because you don’t have to specify each variable name to impute. It selects variables on the basis of their classes. It also creates new dummy variables for missing values. Sometimes, these (dummy) features contain a trend which can be captured using this function. dummy.classes says for which classes should I create a dummy variable. dummy.type says what should be the class of new dummy variables.

$data attribute of imp function contains the imputed data.
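A sketch of the imputation described above; the choice of mean for integer columns and mode for factors follows the text, while the object names are assumptions.

```r
# mode imputation for factors, mean imputation for integer columns,
# plus numeric dummy indicators for columns that had missing values
imp <- impute(train, classes = list(factor = imputeMode(), integer = imputeMean()),
              dummy.classes = c("integer", "factor"), dummy.type = "numeric")
imp1 <- impute(test, classes = list(factor = imputeMode(), integer = imputeMean()),
               dummy.classes = c("integer", "factor"), dummy.type = "numeric")

# the imputed data lives in the $data slot
imp_train <- imp$data
imp_test  <- imp1$data
```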

Now, we have the complete data. You can check the new variables using:
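For example:

```r
summarizeColumns(imp_train)
summarizeColumns(imp_test)
```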

Did you notice a disparity among both data sets? No ? See again. The answer is Married.dummy variable exists only in imp_train and not in imp_test. Therefore, we’ll have to remove it before modeling stage.

Optional: You might be excited or curious to try out imputing missing values using a ML algorithm. In fact, there are some algorithms which don’t require you to impute missing values. You can simply supply them missing data. They take care of missing values on their own. Let’s see which algorithms are they:
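You can ask mlr for learners that can handle missing values natively; a sketch:

```r
# classification learners that accept missing values directly
listLearners("classif", properties = "missings")[c("class", "package")]
```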

8 classif.rpart                  rpart


4. Feature Engineering

Feature Engineering is the most interesting part of predictive modeling. Feature engineering has two aspects: Feature Transformation and Feature Creation. We'll try to work on both aspects here.

At first, let’s remove outliers from variables like ApplicantIncome, CoapplicantIncome, LoanAmount. There are many techniques to remove outliers. Here, we’ll cap all the large values in these variables and set them to a threshold value as shown below:
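A sketch of the capping step. The ApplicantIncome cap of 33000 is taken from the text below; the other thresholds and the cd_train / cd_test object names are assumptions.

```r
cd_train <- imp_train
cd_test  <- imp_test

# cap extreme values at chosen thresholds
cd_train$ApplicantIncome   <- pmin(cd_train$ApplicantIncome, 33000)
cd_train$CoapplicantIncome <- pmin(cd_train$CoapplicantIncome, 16000)
cd_train$LoanAmount        <- pmin(cd_train$LoanAmount, 520)

cd_test$ApplicantIncome   <- pmin(cd_test$ApplicantIncome, 33000)
cd_test$CoapplicantIncome <- pmin(cd_test$CoapplicantIncome, 16000)
cd_test$LoanAmount        <- pmin(cd_test$LoanAmount, 520)
```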

I’ve chosen the threshold value with my discretion, after analyzing the variable distribution. To check the effects, you can do summary(cd_train$ApplicantIncome) and see that the maximum value is capped at 33000.

In both data sets, we see that all dummy variables are numeric in nature. Being binary in form, they should be categorical. Let’s convert their classes to factor. This time, we’ll use simple for and if loops.


These loops say: for every column from column 14 to 20 of the cd_train / cd_test data frames, if the class of that variable is numeric, take the unique values of the column as levels and convert it into a factor (categorical) variable. A sketch of these loops is shown below.
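```r
# convert the numeric dummy columns (columns 14 to 20, per the text) into factors
for (i in names(cd_train)[14:20]) {
  if (is.numeric(cd_train[[i]])) cd_train[[i]] <- as.factor(cd_train[[i]])
}
for (i in names(cd_test)[14:20]) {
  if (is.numeric(cd_test[[i]])) cd_test[[i]] <- as.factor(cd_test[[i]])
}
```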

Let’s create some new features now.

While creating new features (if they are numeric), we must check their correlation with existing variables, as there is often a high chance of correlation. Let's see if our new variables happen to be correlated:
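For example, for the Total_Income feature discussed below:

```r
# a combined income feature, then its correlation with ApplicantIncome
cd_train$Total_Income <- cd_train$ApplicantIncome + cd_train$CoapplicantIncome
cor(cd_train$Total_Income, cd_train$ApplicantIncome)
```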

As we see, there exists a very high correlation of Total_Income with ApplicantIncome. It means that the new variable isn’t providing any new information. Thus, this variable is not helpful for modeling data.

Now we can remove the variable.

There is still enough potential left to create new variables. Before proceeding, I want you to think deeper on this problem and try creating newer variables. After doing so much modifications in data, let’s check the data again:

5. Machine Learning

Until here, we’ve performed all the important transformation steps except normalizing the skewed variables. That will be done after we create the task.

As explained in the beginning, for mlr, a task is nothing but the data set on which a learner learns. Since, it’s a classification problem, we’ll create a classification task. So, the task type solely depends on type of problem at hand.
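A minimal sketch of task creation, assuming the processed training data is in cd_train and the dependent variable is Loan_Status:

```r
trainTask <- makeClassifTask(data = cd_train, target = "Loan_Status")
```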

Let’s check trainTask
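For example, printing the task shows its summary, including the positive class:

```r
trainTask
```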

Positive class: N

As you can see, it provides a description of cd_train data. However, an evident problem is that it is considering positive class as N, whereas it should be Y. Let’s modify it:
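mlr lets you set the positive class when creating the task; a sketch:

```r
trainTask <- makeClassifTask(data = cd_train, target = "Loan_Status", positive = "Y")
```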

For a deeper view, you can check your task data using str(getTaskData(trainTask)).

Now, we will normalize the data. For this step, we’ll use normalizeFeatures function from mlr package. By default, this packages normalizes all the numeric features in the data. Thankfully, only 3 variables which we have to normalize are numeric, rest of the variables have classes other than numeric.
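A sketch, assuming standardization (centering and scaling) is the method wanted:

```r
trainTask <- normalizeFeatures(trainTask, method = "standardize")
```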

Before we start applying algorithms, we should remove the variables which are not required.
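For example, the ID column and the stray Married.dummy variable mentioned earlier can be dropped (the exact variable list is an assumption):

```r
trainTask <- dropFeatures(task = trainTask, features = c("Loan_ID", "Married.dummy"))
```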

MLR package has an in built function which returns the important variables from data. Let’s see which variables are important. Later, we can use this knowledge to subset out input predictors for model improvement. While running this code, R might prompt you to install ‘FSelector’ package, which you should do.
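A sketch of the importance calculation; depending on your mlr version, the filter may be named "information.gain" or "FSelector_information.gain".

```r
im_feat <- generateFilterValuesData(trainTask, method = "information.gain")
plotFilterValues(im_feat, n.show = 20)
```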

If you are still wondering about information.gain, let me provide a simple explanation. Information gain is generally used in context with decision trees. Every node split in a decision tree is based on information gain. In general, it tries to find out variables which carries the maximum information using which the target class is easier to predict.

Let's start modeling now. I won't explain these algorithms in detail, but I've provided links to helpful resources. We'll take up the simpler algorithms first and end this tutorial with the more complex ones.

With MLR, we can choose & set algorithms using makeLearner. This learner will train on trainTask and try to make predictions on testTask.

1. Quadratic Discriminant Analysis (QDA).

In general, qda is a parametric algorithm. Parametric means that it makes certain assumptions about data. If the data is actually found to follow the assumptions, such algorithms sometime outperform several non-parametric algorithms. Read More.
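A minimal QDA sketch; the object names are illustrative, and testTask is assumed to be a task built from the processed test data in the same way as trainTask.

```r
qda.learner <- makeLearner("classif.qda", predict.type = "response")
qmodel   <- train(qda.learner, trainTask)
qpredict <- predict(qmodel, testTask)
```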

Upload this submission file and check your leaderboard rank (it won't be great). Our accuracy is ~71.5%. I understand this submission might not put you at the top of the leaderboard, but there's a long way to go. So, let's proceed.

2. Logistic Regression

This time, let’s also check cross validation accuracy. Higher CV accuracy determines that our model does not suffer from high variance and generalizes well on unseen data.

Similarly, you can perform CV for any learner. Isn't it incredibly easy? Here, I've used stratified sampling with 3-fold CV. I'd always recommend using stratified sampling in classification problems, since it maintains the proportion of the target class in the n folds. We can check CV accuracy by:
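A sketch of the learner and the stratified 3-fold cross-validation:

```r
logistic.learner <- makeLearner("classif.logreg", predict.type = "response")

cv.logistic <- crossval(learner = logistic.learner, task = trainTask, iters = 3,
                        stratify = TRUE, measures = acc, show.info = FALSE)
cv.logistic$aggr   # aggregated (average) accuracy across the folds
```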

This is the average accuracy calculated over the 3 folds. To see the respective accuracy of each fold, we can do this:
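```r
cv.logistic$measures.test   # per-fold accuracy
```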

3  3    0.7598039

Now, we’ll train the model and check the prediction accuracy on test data.
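A sketch:

```r
fmodel  <- train(logistic.learner, trainTask)
fpmodel <- predict(fmodel, testTask)
```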

Woah! This algorithm gave us a significant boost in accuracy. Moreover, this is a stable model since our CV score and leaderboard score matches closely. This submission returns accuracy of 79.16%. Good, we are improving now. Let’s get ahead to the next algorithm.

A decision tree is said to capture non-linear relations better than a logistic regression model. Let's see if we can improve our model further. This time we'll tune the tree's hyperparameters to achieve optimal results. To get the list of parameters for any algorithm, simply write (in this case rpart):
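```r
getParamSet("classif.rpart")
```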

This will return a long list of tunable and non-tunable parameters. Let’s build a decision tree now. Make sure you have installed the rpart package before creating the tree learner:
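A sketch of the learner and the resampling strategy:

```r
makeatree <- makeLearner("classif.rpart", predict.type = "response")
set_cv <- makeResampleDesc("CV", iters = 3L)
```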

I’m doing a 3 fold CV because we have less data. Now, let’s set tunable parameters:

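A sketch of the parameter set; the ranges below are assumptions.

```r
gs <- makeParamSet(
  makeIntegerParam("minsplit",  lower = 10, upper = 50),
  makeIntegerParam("minbucket", lower = 5,  upper = 50),
  makeNumericParam("cp", lower = 0.001, upper = 0.2)
)
```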

As you can see, I've set 3 parameters. minsplit represents the minimum number of observations a node must have for a split to take place. minbucket sets the minimum number of observations to keep in terminal nodes. cp is the complexity parameter: the smaller it is, the more specific relations the tree will learn from the data, which might result in overfitting. The grid-search tuning run itself is sketched below.
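```r
# grid search over the parameter set defined above
gscontrol <- makeTuneControlGrid()
stune <- tuneParams(learner = makeatree, resampling = set_cv, task = trainTask,
                    par.set = gs, control = gscontrol, measures = acc)
```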

You may go and take a walk until the parameter tuning completes. May be, go catch some pokemons! It took 15 minutes to run at my machine. I’ve 8GB intel i5 processor windows machine.

# [1] 0.001

It returns a list of best parameters. You can check the CV accuracy with:
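```r
stune$y   # cross-validated accuracy of the best parameter combination
```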

0.8127132

Using setHyperPars function, we can directly set the best parameters as modeling parameters in the algorithm.
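A sketch, reusing the tuned values and training the final tree:

```r
t.tree  <- setHyperPars(makeatree, par.vals = stune$x)
t.rpart <- train(t.tree, trainTask)
tpmodel <- predict(t.rpart, testTask)
```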

getLearnerModel(t.rpart)

Decision Tree is doing no better than logistic regression. This algorithm has returned the same accuracy of 79.14% as of logistic regression. So, one tree isn’t enough. Let’s build a forest now.

Random Forest is a powerful algorithm known to produce astonishing results. Its predictions derive from an ensemble of trees: it averages the predictions given by the individual trees and produces a generalized result. From here, most of the steps are similar to those followed above, but this time I've done a random search instead of a grid search for parameter tuning, because it's faster.


Though random search is faster than grid search, it can sometimes be less efficient. In a grid search, the algorithm tunes over every possible combination of the parameters provided. In a random search, we specify the number of iterations and it randomly samples parameter combinations. In this process, it might miss some important combination of parameters which could have returned maximum accuracy, who knows. A sketch of this setup is shown below.
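```r
# a random forest learner tuned with random search; ranges and maxit are assumptions
rf <- makeLearner("classif.randomForest", predict.type = "response")

rf_param <- makeParamSet(
  makeIntegerParam("ntree",    lower = 50, upper = 500),
  makeIntegerParam("mtry",     lower = 3,  upper = 10),
  makeIntegerParam("nodesize", lower = 10, upper = 50)
)
rancontrol <- makeTuneControlRandom(maxit = 50L)

rf_tune <- tuneParams(learner = rf, resampling = set_cv, task = trainTask,
                      par.set = rf_param, control = rancontrol, measures = acc)
```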

Now, we have the final parameters. Let’s check the list of parameters and CV accuracy.
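For example:

```r
rf_tune$y   # cross-validated accuracy
rf_tune$x   # the tuned parameter values
```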

0.8192571

[1] 168

[1] 6

[1] 29

Let’s build the random forest model now and check its accuracy.
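A sketch:

```r
rf.tree <- setHyperPars(rf, par.vals = rf_tune$x)
rforest <- train(rf.tree, trainTask)
rfmodel <- predict(rforest, testTask)
```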

Support Vector Machines (SVM) is also a supervised learning algorithm used for regression and classification problems. In general, it creates a hyperplane in n dimensional space to classify the data based on target class. Let’s step away from tree algorithms for a while and see if this algorithm can bring us some improvement.

Since most of the steps are similar to those performed above, I don't think understanding the code will be a challenge for you anymore.
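A sketch of an SVM learner (kernlab's ksvm) with a small grid over the cost and kernel-width parameters; the value grids are assumptions.

```r
ksvm_learner <- makeLearner("classif.ksvm", predict.type = "response")

pssvm <- makeParamSet(
  makeDiscreteParam("C",     values = 2^c(-8, -4, -2, 0, 2, 4, 8)),   # cost parameter
  makeDiscreteParam("sigma", values = 2^c(-8, -4, 0, 4, 8))           # RBF kernel width
)
ctrl_grid <- makeTuneControlGrid()

svm_tune <- tuneParams(ksvm_learner, task = trainTask, resampling = set_cv,
                       par.set = pssvm, control = ctrl_grid, measures = acc)
svm_tune$y
```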


0.8062092

This model returns an accuracy of 77.08%. Not bad, but less than our highest score. Don't feel hopeless here; this is core machine learning. ML doesn't work well unless it gets some good variables. Maybe you should think longer about the feature engineering aspect and create more useful variables. Let's do boosting now.

6. GBM

Now you are entering the territory of boosting algorithms. GBM performs sequential modeling, i.e., after one round of prediction, it checks for incorrect predictions, assigns them relatively more weight, and predicts them again until they are predicted correctly.


n.minobsinnode refers to the minimum number of observations in a tree node. shrinkage is the regularization parameter, which dictates how fast or slow the algorithm should learn. A sketch of the setup is shown below.
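```r
# a GBM learner with random-search tuning; the parameter ranges are assumptions
g.gbm <- makeLearner("classif.gbm", predict.type = "response")

gbm_par <- makeParamSet(
  makeDiscreteParam("distribution", values = "bernoulli"),
  makeIntegerParam("n.trees",           lower = 100, upper = 1000),
  makeIntegerParam("interaction.depth", lower = 2,   upper = 10),
  makeIntegerParam("n.minobsinnode",    lower = 10,  upper = 80),
  makeNumericParam("shrinkage",         lower = 0.01, upper = 1)
)
rancontrol <- makeTuneControlRandom(maxit = 50L)

gbm_tune  <- tuneParams(g.gbm, task = trainTask, resampling = set_cv,
                        par.set = gbm_par, control = rancontrol, measures = acc)
final_gbm <- setHyperPars(g.gbm, par.vals = gbm_tune$x)
gbm_model <- train(final_gbm, trainTask)
gbm_pred  <- predict(gbm_model, testTask)
```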

The accuracy of this model is 78.47%. GBM performed better than SVM, but couldn’t exceed random forest’s accuracy. Finally, let’s test XGboost also.

Xgboost is considered to be better than GBM because of its inbuilt properties including first and second order gradient, parallel processing and ability to prune trees. General implementation of xgboost requires you to convert the data into a matrix. With mlr, that is not required.

As I said in the beginning, a benefit of using the MLR package is that you can follow the same set of commands to implement different algorithms. A sketch for xgboost follows.
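```r
# an xgboost learner in mlr; the fixed values and tuning ranges are assumptions
xg_learner <- makeLearner("classif.xgboost", predict.type = "response")
xg_learner <- setHyperPars(xg_learner,
                           par.vals = list(objective = "binary:logistic",
                                           eval_metric = "error", nrounds = 250))
xg_ps <- makeParamSet(
  makeNumericParam("eta",              lower = 0.01, upper = 0.3),
  makeIntegerParam("max_depth",        lower = 3,    upper = 10),
  makeNumericParam("subsample",        lower = 0.5,  upper = 1),
  makeNumericParam("colsample_bytree", lower = 0.5,  upper = 1)
)
rancontrol <- makeTuneControlRandom(maxit = 50L)

xg_tune  <- tuneParams(xg_learner, task = trainTask, resampling = set_cv,
                       par.set = xg_ps, control = rancontrol, measures = acc)
xg_final <- setHyperPars(xg_learner, par.vals = xg_tune$x)
xg_model <- train(xg_final, trainTask)
xg_pred  <- predict(xg_model, testTask)
```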


Terrible XGBoost. This model returns an accuracy of 68.5%, even lower than qda. What could have happened? Overfitting. The model returned a CV accuracy of ~80%, but the leaderboard score declined drastically because the model couldn't predict correctly on unseen data.

What can you do next? Feature Selection ?

For improvement, let's do this. Until now, we've used trainTask for model building. Let's use the knowledge of important variables: take the first 6 important variables and train the models on them. You can expect some improvement. To create a task selecting only the important variables, do this:
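A sketch using mlr's filterFeatures() to keep the top 6 variables by information gain (the filter name may differ by mlr version):

```r
top_task <- filterFeatures(trainTask, method = "information.gain", abs = 6)
```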

Also, try to create more features. The current leaderboard winner is at ~81% accuracy. If you have followed me till here, don’t give up now.

End Notes

The motive of this article was to get you started with machine learning techniques. These techniques are commonly used in industry today, so make sure you understand them well. Don't use these algorithms as black-box approaches; understand them well. I've provided links to resources.

What happened above, happens a lot in real life. You’d try many algorithms but wouldn’t get improvement in accuracy. But, you shouldn’t give up. Being a beginner, you should try exploring other ways to achieve accuracy. Remember, no matter how many wrong attempts you make, you just have to be right once.

You might have to install packages while loading these models, but that’s one time only. If you followed this article completely, you are ready to build models. All you have to do is, learn the theory behind them.


JavaScript SEO: Best Practices and Debugging Tools

JavaScript is a great option to make website pages more interactive and less boring.

But it’s also a good way to kill a website’s SEO if implemented incorrectly.

Here’s a simple truth: Even the best things in the world need a way to be found.

No matter how great your website is, if Google can’t index it due to JavaScript issues, you’re missing out on traffic opportunities.

In this post, you’ll learn everything you need to know about JavaScript SEO best practices as well as the tools you can use to debug JavaScript issues.

Why JavaScript Is Dangerous for SEO: Real-World Examples

“Since redesigning our website in React, our traffic has dropped drastically. What happened?”

This is just one of the many questions I’ve heard or seen on forums.

You can replace React with any other JS framework; it doesn’t matter. Any of them can hurt a website if implemented without consideration for the SEO implications.

Here are some examples of what can potentially go wrong with JavaScript.

Example 1: Website Navigation Is Not Crawlable

What’s wrong here:

The links in the navigation are not in accordance with web standards. As a result, Google can’t see or follow them.

Why it’s wrong:

It makes it harder for Google to discover the internal pages.

The authority within the website is not properly distributed.

There’s no clear indication of relationships between the pages within the website.

As a result, a website with links that Googlebot can’t follow will not be able to utilize the power of internal linking.

Example 2: Image Search Has Decreased After Improper Lazy Load Implementation

What’s wrong here:

While lazy loading is a great way to decrease page load time, it can also be dangerous if implemented incorrectly.

In this example, lazy loading prevented Google from seeing the images on the page.

Why it’s wrong:

The content “hidden” under lazy loading might not be discovered by Google (when implemented incorrectly).

If the content is not discovered by Google, the content is not ranked.

As a result, image search traffic can suffer a lot. It’s especially critical for any business that heavily relies on visual search.

Example 3: The Website Was Switched to React With No Consideration of SEO

What’s wrong here:

This is my favorite example from a website I audited a while ago. The owner came to me after all traffic just tanked. It’s like they unintentionally tried to kill their website:

The URLs were not crawlable.

The images were not crawlable.

The title tags were the same across all website pages.

There was no text content on the internal pages.

Why it’s wrong:

If Google doesn’t see any content on the page, it won’t rank this page.

If multiple pages look the same to Googlebot, it can choose just one of them and canonicalize the rest to it.

In this example, the website pages looked exactly the same to Google, so it deduplicated them and used the homepage as a canonical version.

A Few Things You Need to Know About Google–JavaScript Relationships

When it comes to how Google treats your content, there are a few main things you should know.

Google Doesn’t Interact With Your Content

Googlebot can see only the content available in rendered HTML without any additional interaction.

For example, if you have an expandable text section, and its text is available in the source code or rendered HTML, Google will index it.

Google Doesn’t Scroll

Googlebot does not behave like a usual user on a website; it doesn’t scroll through the pages. So if your content is “hidden” behind an endless amount of scrolls, Google won’t see it.

See: Google’s Martin Splitt on Indexing Pages with Infinite Scroll.

Google Doesn't See Client-Side-Only Content

Google doesn't see content that is rendered only in the browser rather than on the server.

That’s why client-side rendering is a bad idea if you want Google to index and rank your website (and you do want it if you need traffic and sales).

Ok, so is JavaScript really that bad?

Not if JavaScript is implemented on a website using best practices.

And that’s exactly what I’m going to cover below.

JavaScript SEO Best Practices

Add Links According to the Web Standards

While "web standards" can sound intimidating, in reality, it just means you should link to internal pages using a standard anchor tag with an href attribute, for example <a href="/page-url">anchor text</a>.

This way, Google can easily find the links and follow them (unless you add a nofollow attribute to them, but that’s a different story).

Don’t use the following techniques to add internal links on your website:

window.location.href=‘/page-url‘

#page-url

By the way, the last option can still be successfully used on a page if you want to bring people to a specific part of this page.

But Google will not index all individual variations of your URL with “#” added to it.

See: Google SEO 101: Do’s and Don’ts of Links & JavaScript.

Add Images According to the Web Standards

As with internal links, image usage should also follow web standards so that Googlebot can easily discover and index images.

To be discovered, an image should be referenced in the src attribute of an img tag, for example: <img src="/images/example.jpg" alt="Example">.

Lazy loading helps with page speed optimization and works well if implemented correctly.

During the recent Google Search Central Live event, I did a live case study of how to debug issues with images lazy-loaded using a JavaScript Library.

Alternatively, you can eliminate the JavaScript by using native lazy loading (the loading="lazy" attribute), which is now supported by many browsers.

Use Server-Side Rendering

If you want Google to read and rank your content, you should make sure this content is available on the server, not just in a user’s browser.

Alternatively, you can use dynamic rendering which implies detecting search engines and serving them static HTML pages while users are served HTML + JavaScript content in their browsers.

Make Sure That Rendered HTML Has All the Main Information You Want Google to Read

You need to make sure that rendered HTML shows the right information such as:

Copy on the page.

Images.

Canonical tag.

Title & meta description.

Meta robots tag.

Structured data.

hreflang.

Any other important tags.

Tools for Debugging JavaScript Implementation for SEO

Gone are the days when you only needed to look at the source code of a page to check if it includes the right content.

JavaScript has made it more complicated, in that it can add, remove or change different elements. Looking at the source code is not enough; you need to check the rendered HTML instead.

Step 1: Check How Much a Website Relies on JavaScript to Serve the Content

The first thing that I usually do when I see a website that relies on JavaScript is to check how much it depends on it. The easiest way to do this is to disable JS in your browser.

I use the Web Developer Chrome extension for that.

Once you do it, you’ll see how a page would look without any JS.

In the example above, you can see that no content is available without JavaScript.

Note that this method just gives you an overview of how much JavaScript influences content delivery. It does not tell you if Google will index it or not.

Even if you see a blank page like above, it doesn’t mean that nothing’s working. It just means that a website heavily relies on JavaScript.

That’s why you need to test the rendered HTML with the tools I’ll show you in the next step.

Step 2: Check if Googlebot Is Served the Right Content and Tags

Google Mobile-friendly Test Tool

Google’s Mobile-friendly Test Tool is one of the best and most reliable tools when it comes to checking mobile-rendered HTML because you get information right from Google.

What you need to do:

Load the Mobile-friendly tool.

Check your URL.

Look at the info in the HTML tab:

That’s where the technical SEO side comes in, as you’ll have to check the code to make sure it has the right information.

Note: you can use the Rich Results Test tool to do these checks, too:

URL Inspection Tool in Google Search Console

The URL Inspection tool also gives you access to the raw HTML of your page that Googlebot uses for evaluating your page content:

The Mobile-friendly Test Tool vs URL Inspection Tool

Ok, so what’s the difference between these tools and which is preferred?

The short answer is that there’s no difference in the output since the Mobile-Friendly Test and URL inspection tool use the same core technology.

There are some differences in other aspects, though:

To use the URL Inspection Tool, you need to have access to the Google Search Console of the website you’re checking. If you don’t have such access, use the Mobile-Friendly Test (or Rich Results Test).

The URL inspection tool can show you two versions of the same page — the last crawled version and the live version. It’s handy if something has just been broken by JavaScript and you can compare the new implementation to the previous one.

The Mobile-Friendly Test and Rich Results Test give you the output for your current live page version only.

Other Debugging Tools

View Rendered Source Chrome Extension

I love this extension as it shows the difference between the source code and rendered HTML. It gives you an overview of what JavaScript changes on the page:

Note: Make sure you check mobile rendered HTML vs desktop.

To do this, you need to first load a mobile view in the Chrome inspection tool and then use the View Rendered Source extension:

JavaScript Rendering Check

I think this is the most user-friendly JS debugging tool as you don’t even need to check the code.

It checks the main elements in the page source code for you and compares them to the same elements in the rendered HTML (again, make sure to check the mobile version):

In this example, I see that JavaScript changes the main elements on the page such as the Title Tag, canonical, internal links.

It’s not always a bad thing but as an SEO professional, you’ll need to investigate whether these changes harm the page you’re checking or not.

You can also use the SEO Pro extension to see the Title tag and other important tags found in rendered HTML, not source code:

I prefer using a combination of the tools mentioned above to investigate JavaScript SEO issues or ensure that best practices are implemented.


