Knime Tutorial – A Friendly Introduction To Components Using The Knime Analytics Platform
This article was published as a part of the Data Science Blogathon.
In the last article, A Friendly Introduction to KNIME Analytics Platform, I provided a brief insight into the open-source software KNIME Analytics Platform and what it is capable of. With the help of a customer segmentation example, I showed the general functions of KNIME Analytics Platform.
This article takes up a topic that was briefly mentioned at the end of the last article: components. I’ll provide an in-depth explanation of what components are, what functionalities they have, and why they are useful. Additionally, I’ll show you how to create and set up your very own component in just a few steps.
Component vs. Metanode in KNIME
A component is not to be confused with a metanode. At first glance, they might look similar, as they both incorporate a group of nodes. However, the only purpose of metanodes is to visually improve your workflow by collapsing logical groups of nodes. This gives the workflow a clearer structure, which is especially helpful when sharing the workflow with your peers.
Components on the other hand have sophisticated characteristics, which make them a very powerful feature in KNIME Analytics Platform. They encapsulate functionalities, even very complex functionalities, that can be reused in other workflows at any time. In the following, I will highlight the aspects which differentiate a component from a metanode.
1. Components have an Interactive View
In the case of the Scatter Plot node, a separate window pops up displaying the scatter plot, where you can, for example, zoom in and out, change the attributes displayed on the x- and y-axis, or rename the plot and its axes.
As soon as a component contains a node with an Interactive View, this interactive view becomes part of the component’s interactive view and can be accessed from a web browser when running on the KNIME Server. Metanodes, on the other hand, do not have that feature (Fig. 2).
Fig. 2. Component vs. Metanode: Interactive View.
2. Components can also have a configuration window
Nodes in KNIME Analytics Platform usually need to be configured (or at least there is the option to configure them). Components can have a configuration window, too. It isn’t required, but by placing configuration nodes inside your component you enable a configuration window for it. This means you can change the parameters of the component without changing the settings of the nodes inside it. Fig. 3 shows the configuration dialog of the Optimized k-Means (Silhouette Coefficient) component, which I already introduced in the previous article and which can be accessed via the KNIME Hub. It’s comparable to the configuration window of a KNIME node. Metanodes can’t be configured.
Fig. 3. The configuration window of the Optimized k-Means (Silhouette Coefficient) component.
3. Components can enclose flow variables
You may ask yourself: what are flow variables? Flow variables are parameters that can be of any data type (e.g., string, integer, etc.). They are used to automatically update specific settings of a node. For example, if you always need to filter a dataset on the current date, use a flow variable. A flow variable of data type Date&Time, updated at every workflow execution with the current date, can be used to overwrite the setting of the filter node. You find all sorts of flow variable nodes in the Node Repository under the Workflow Control category (see Fig. 4).
Fig. 4. The Node Repository and where to find flow variable nodes.
Fig. 5. The Component Output dialog.
To learn more about flow variables and workflow control in general, have a look at the KNIME Documentation.
4. Components can be shared
One last thing: components can be reused and shared with others. The component template can be stored somewhere and a linked instance created by dragging and dropping the component template into the workflow editor. The component can then be used like any other KNIME node. Whenever the workflow is opened, a check is carried out to see if any updates have been made to the component template. There are three places to store a component template:
1. Local Workspace
This option saves the component template to your local workspace, making it accessible only to you for further use. This is better than simply copy-pasting the component because updates to the component template are automatically propagated to all linked instances.
2. KNIME Hub
You can save the component template either to your private space or to your public space on the KNIME Hub. By uploading it to your public space, other KNIME users can create linked instances of your component. Either way, you control the component template by logging in to your KNIME Hub account, which makes it independent of your local installation.
3. KNIME Server
A third option is to share the component via a KNIME Server. In this case you will be able to access it from any connected KNIME Server client.
The Example: Encapsulating Functionalities of a Workflow in KNIME
Now that you’ve learned about components and their functionalities, it’s time to show you how to build one. To make things easy, let’s recall the example workflow I showed in the last article (you can download the workflow from the KNIME Hub). There, I showed you how to cluster customer data according to the two attributes Annual Income (k$) and Spending Score (1-100), using k-Means clustering.
Let’s now build a component that bundles together some functionalities of that workflow, and share it so that it can be reused in other workflows.
1. Creating the Component
To create a component, select the nodes you want to encapsulate, right-click the selection, and choose Create Component.
Fig. 6. Creating a component from all the selected nodes, i.e., all nodes in this workflow besides the reader node.
As you can see, creating a component is not complicated at all. The important part though is setting up the component appropriately.
2. Setting up the Component
1. Component Output
Fig. 7. The Setup Component Wizard.
2. Component Description
3. Component Configuration Dialog
As already mentioned above, components can be configured via a configuration window. To create such a configuration window, the Configuration nodes are used, which are located under the Workflow Abstraction category in the Node Repository (Fig. 9).
Fig. 9. The Node Repository and where to find the Configuration Nodes.
Let’s demonstrate how this works with an example. One very important parameter of the k-Means clustering algorithm is k – the number of clusters. It makes sense to have the option to adjust k from the component configuration window without changing the settings of the k-Means node inside the component. We do this with the Integer Configuration node. In the configuration window of the node (Fig. 10), you can label and describe the option, give the variable a name, and define a minimum, a maximum, and a default value.
Fig. 10. The Configuration Window of the Integer Configuration node.
Additionally, we want our component to generalize to different input data. Depending on the data, it’s not always feasible to include all attributes for clustering, because some attributes won’t affect the clustering (e.g., a column assigning an ID to each customer) and the k-Means clustering algorithm generally can’t handle non-numerical data. Thus, an include/exclude panel in the configuration dialog of the component will be helpful. The Column Filter Configuration node (Fig. 11) also lets you specify which types of inputs are allowed and the minimum number of required columns.
Fig. 11. The Column Filter Configuration node. Fig. 12. The Configuration Window of the component.
4. Component Composite View
The Composite View of a component contains all the Interactive Views that are part of the component. Besides placing a node with an Interactive View inside the component, nothing else needs to be done. In this example, the Scatter Plot node is placed inside the component, and its view can be seen in the component’s composite view upon successful execution. Go on – try it out!
Note: The layout of the Composite View can be customized, which for example becomes important when you want to create an Interactive Dashboard. But this is a topic for another time… let’s first complete this article on components before tackling new topics.
Summary of our KNIME Article
Components are a powerful feature of KNIME Analytics Platform. Indeed, if a node you need is missing, you can build it yourself: even if you cannot code, you can create your new node via a component. A component is the natural evolution of a metanode.
With this article, I tried to give you a brief insight into the topic of components within KNIME Analytics Platform. The example showed in this article containing the component can be downloaded from the KNIME Hub.
If you wish to learn more about components and KNIME Analytics Platform, check out the KNIME Documentation or the learning materials on our KNIME site.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
What Analytics Metrics Should A Video Streaming Platform Provider Track?
Successful OTT video streaming platform growth starts with insights into content performance and user behavior analysis. Without tracking analytics and constantly enhancing your service, you can lose your viewers. People will simply find a new service that is more comfortable to use.
What are the benefits of tracking analytics?
From our point of view, they are the following:
Getting to know your viewers. A good-quality OTT solution will provide you with analytics tools. You will have deep insights into your audience’s behavior, preferences, and interests. You will be able to learn more information about them and understand what they like about your platform and content and what they don’t like. It is the way to optimize the content you create in the future. Moreover, you will be able to offer more personalized recommendations to your viewers. Consequently, users will be more satisfied with your platform.
Enhancing your service. When tracking analytics, a video streaming platform provider will find potential problems with the service. For example, you may find that a video doesn’t play when a user requests it, or videos are poorly categorized, and people cannot find what they are looking for. That can be the reason why they leave the platform.
As we said above, analytics is a way to optimize and enhance service. Don’t neglect it.
Metrics that a video streaming service provider should track
The first group of metrics describes the performance of the service – how well the service delivers the videos and how well the delivery infrastructure operates. For example, metrics can be the following:
Bit rate. It helps you understand what video quality your viewers experience. It is an essential metric as many people prefer watching a high-quality video, sometimes even in 4K.
Buffer fill. When users press play on a video, they wait for some time before it actually starts. Tracking this metric will help you understand how long your viewers wait until a video plays.
Rebuffering. Viewers expect to watch a video in one playback. But sometimes a video halts, and the viewer needs to wait until it continues playing. Some customers will wait patiently for a video to keep going, and some will immediately leave the service.
You also need to track data about your users’ preferences, interests, and behavior – metrics that can tell you more about your customers. For example, these can include (a small computation sketch follows the list):
Plays and views. It is the number of times your video has been played. You can understand whether a video is watched or not and why. There can be a technical issue, or this video is not played in a particular browser, and so on.
Watch time. This metric shows whether viewers watch the whole video or several seconds of it. The watch time metric shows the general amount of time that all your users watched your video. Combined with other metrics, watch time can show you more specific information. For example, maybe people using tablets cannot play a particular video.
Audience retention and engagement. This metric can show you users’ behavior during the video. You can understand what parts of the video are the most popular and what parts users prefer to rewind.
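To make these user-behavior metrics concrete, here is a minimal pandas sketch of how plays, watch time, and retention could be computed from a playback-event log. The events table, its column names, and the 30-second play threshold are all illustrative assumptions, not any specific platform’s schema:

import pandas as pd

# Hypothetical playback-event log: one row per viewing session
events = pd.DataFrame({
    "video_id":        ["a", "a", "b", "a", "b"],
    "video_length_s":  [120, 120, 300, 120, 300],
    "seconds_watched": [120, 15, 290, 60, 10],
})

# Plays: sessions that ran past an assumed 30-second threshold
plays = events[events.seconds_watched >= 30].groupby("video_id").size()

# Watch time: total seconds viewed per video across all sessions
watch_time = events.groupby("video_id").seconds_watched.sum()

# Retention: average fraction of the video actually watched
retention = (events.seconds_watched / events.video_length_s).groupby(events.video_id).mean()

print(plays, watch_time, retention, sep="\n\n")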
Also, it is important to track the traffic, the number of new users, the bounce rate, the devices people use, their geographic location, and so on. When analyzed, this data can help your service thrive.
The Bottom Line
OTT analytics provides data that, when analyzed, can show your weaknesses and strengths. Many businesses neglect to track it; as a result, they don’t know how to optimize their service, and soon they drop out of the competition. By neglecting the analytics of your video streaming service, you lose the chance to enhance it, as analytics can provide you with multiple ideas. While your competitors track their analytics and constantly improve the quality of their service, your business can soon fail and stop operating.
A Quick Introduction to K-Nearest Neighbor (KNN) Classification Using Python
This article was published as a part of the Data Science Blogathon.
Introduction
This article concerns one of the supervised ML classification algorithms: the KNN (K-Nearest Neighbors) algorithm. It is one of the simplest and most widely used classification algorithms, in which a new data point is classified based on its similarity to a specific group of neighboring data points. This gives competitive results.
Working
For a given data point, the algorithm finds the K data points in the dataset closest to it and votes for the category that occurs most frequently among them. Usually, the Euclidean distance is taken as the measure of distance. Thus, the resulting model is just the labeled data placed in a space. The algorithm is popularly used in various applications like genetics, forecasting, etc. It is at its best when many features are present, and it can outperform SVM in this case.
A well-chosen K lets KNN reduce overfitting. So how do we choose K? Generally, we use the square root of the number of samples in the dataset as the value for K. An optimal value has to be found, since a lower value may lead to overfitting, while a higher value makes the distance computations expensive. An error plot may help here; another method is the elbow method (see the sketch below). You can take the square root as a starting point or follow the elbow method.
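As a quick illustration of picking K with an error plot, here is a minimal scikit-learn sketch; it assumes train/test splits named X_train, X_test, y_train, y_test, as produced in the example later in this article:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

errors = []
k_values = range(1, 21)
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # Misclassification rate on the held-out test set
    errors.append(np.mean(knn.predict(X_test) != y_test))

# Pick the K where the curve flattens out (the "elbow")
plt.plot(k_values, errors, marker='o')
plt.xlabel('K')
plt.ylabel('Test error')
plt.show()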
Let’s dive deep into the different steps of K-NN for classifying a new data point
Step 1: Select the value of K neighbors (say K=5)
Step 2: Find the K (5) nearest data points for our new data point based on the Euclidean distance (which we discuss later)
Step 3: Among these K data points, count the data points in each category
Step 4: Assign the new data point to the category that has the most neighbors of the new data point (see the short sketch below)
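The four steps above can be translated almost line for line into a toy NumPy implementation (a sketch for illustration only; the actual example below uses scikit-learn):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, new_point, k=5):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - new_point) ** 2).sum(axis=1))
    # Indices of the k nearest training points (Step 1 fixes k)
    nearest = np.argsort(distances)[:k]
    # Steps 3 and 4: count categories among the neighbors, return the majority
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two clusters labeled 0 and 1
X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([2, 2])))  # -> 0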
Example
Let’s start the programming by importing essential libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn
Importing of the dataset and slicing it into independent and dependent variables
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values
Since our dataset contains categorical variables, we have to encode them using LabelEncoder
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 0] = le.fit_transform(X[:, 0])
We are performing a train-test split on the dataset. We set the test size to 0.20, which means our training sample contains 320 records and our test sample contains 80 records
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
Next, we apply feature scaling to the training and test sets of independent variables to bring the values into a smaller, comparable range
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Now we have to create and train the K Nearest Neighbor model with the training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)
We are using 3 parameters in the model creation. n_neighbors is set to 5, which means 5 neighborhood points are required for classifying a given point. The distance metric we are using is Minkowski, whose general form is:

D(x, y) = ( Σ_i |x_i − y_i|^p )^(1/p)

As per the equation, we also have to select the p-value.
In our problem, we choose p as 2, which makes the Minkowski metric equivalent to the Euclidean distance (you can also set the metric to 'euclidean' directly).
Our Model is created, now we have to predict the output for the test set
y_pred = classifier.predict(X_test)
y_test
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1], dtype=int64)

y_pred
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int64)

We can evaluate our model using the confusion matrix and accuracy score by comparing the predicted and actual test values
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)
confusion matrix –

[[64  4]
 [ 3 29]]

accuracy is 0.93 – the off-diagonal entries of the matrix (4 + 3) are the 7 misclassified points out of the 100 test samples
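If you want a richer summary than a single accuracy number, scikit-learn’s classification_report prints per-class precision, recall, and F1-score in one call:

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))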
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the K-NN model on the Training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
KoboldAI Tutorial: The Guide to AI
KoboldAI offers several notable features that make it a valuable tool for AI-assisted writing. Let’s take a closer look at some of these features:
KoboldAI Lite is a volunteer-based version of the platform that generates tokens for users. This feature enables users to access the core functionality of KoboldAI and experience its capabilities firsthand.
To utilize KoboldAI, you need to install the software on your own computer. The installation process involves downloading the software from the KoboldAI GitHub page and installing the necessary dependencies. However, please note that the installation process may vary depending on your operating system.
KoboldAI boasts a collection of local and remote AI models specifically designed for writing purposes. These models are developed by the KoboldAI community and are either uploaded by their original finetune authors or with their permission. The availability of multiple models provides users with a range of options to suit their specific writing needs.
KoboldAI has a dedicated subreddit where users can interact, seek assistance, and discuss various aspects of the platform. With over 6.5K members, the subreddit reflects the platform’s vibrant and engaged user base.
To begin your journey with KoboldAI, you must first install the platform on your computer. Follow these steps to get started:
Visit the KoboldAI GitHub page by navigating to the official repository.
Select the “Download ZIP” option to download the software package.
Once the ZIP file is downloaded, extract its contents to a suitable location on your computer.
For more detailed instructions on the installation process, you can refer to the step-by-step guide available on Cloudbooklet.
Once you have successfully installed KoboldAI, you can begin using the platform to generate text. Follow these steps to run KoboldAI:
Open the console application on your computer.
Launch a new tab in your preferred web browser.
When you first start KoboldAI, you will be greeted with a prompt that says, “Welcome to KoboldAI! You are running ReadOnly. Please load or import a story to read.” To load or import a story, follow these steps:
Open the KoboldAI web interface in your browser.
Choose the desired file from your computer to load it into KoboldAI.
Alternatively, you can also copy and paste your existing text into the editor.
Once the story is loaded or imported, you can proceed to interact with KoboldAI and explore its various writing capabilities.
KoboldAI provides an interactive interface where you can engage with the AI model and generate text. Here’s how you can use KoboldAI to enhance your writing process:
After loading or importing a story, you will see the existing text in the editor. You can make edits, add new content, or simply use it as a starting point for your writing.
You can experiment with different inputs and explore the suggestions provided by KoboldAI. It’s important to note that the AI-generated text may require further refinement and editing to align with your writing style and intentions.
If you encounter any issues or need assistance, you can refer to the KoboldAI subreddit or community forums to seek help from other users.
To make the most out of your KoboldAI experience, consider the following tips:
Experiment with Different Prompts: KoboldAI responds to the prompt you provide. Try different prompts to explore various writing directions and generate diverse ideas.
Edit and Refine: The AI-generated text is a starting point. Take time to edit and refine the output to ensure it aligns with your desired style and message.
Utilize Specific Instructions: If you have specific requirements or guidelines for your writing, provide clear instructions in your prompt to steer the AI’s response in the desired direction.
Collaborate and Share: Engage with the KoboldAI community to share your experiences, seek feedback, and collaborate with other writers. The community can provide valuable insights and support.
KoboldAI offers a powerful AI-assisted writing platform that can enhance your creative process and help you generate compelling text. With its diverse AI models, active community, and user-friendly interface, KoboldAI provides a valuable resource for writers of all levels. By installing and using KoboldAI, you can unlock new possibilities, explore different writing styles, and refine your skills. Embrace the potential of AI-assisted writing with KoboldAI and let your creativity flourish.
The Best IoT Platform as a Service Solution
IoT Platforms: What Are They?
The Internet of Things platform is a multi-layer technology that controls and automates linked devices. In other words, it is a service that enables you to deliver tangible goods digitally. You can use the services offered by this platform to link gadgets for machine-to-machine communication.
The Internet of Things (IoT) software connects edge devices, access points, and data networks to the other end, which is typically the end-user application.
Platforms for the Internet of Things
IoT platforms are available to address every area of creating an IoT product.
For the creation of Internet of Things (IoT) devices, hardware development platforms include physical development boards that include microcontrollers, microprocessors, systems on chip (SoC), and systems on module (SoM).
Platforms for developing apps act as an integrated development environment (IDE) with tools and functionalities.
Platforms for connectivity offer the communication tools needed to send information between physical items and data centers (on-premise or in the cloud). MQTT, DDS, AMQP, Bluetooth, ZigBee, WiFi, cellular, LoRaWAN, and other prevalent connectivity protocols and standards for the Internet of Things are just a few examples (see the short MQTT sketch after this list).
Analytics platforms use sophisticated algorithms to analyze collected data and transform it into customer-actionable insights.
All facets of IoT goods, from creation and communication to data administration and visualization, are covered by end-to-end IoT platforms.
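As a small illustration of the connectivity layer mentioned above, here is a minimal sketch of a device publishing a sensor reading over MQTT with the paho-mqtt Python library. The broker host and topic name are assumptions for the example, not the endpoints of any particular platform:

import json
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("broker.example.com", 1883)  # hypothetical broker host

# Publish a JSON-encoded temperature reading to an assumed telemetry topic
reading = {"device_id": "sensor-42", "temperature_c": 21.7}
client.publish("devices/sensor-42/telemetry", json.dumps(reading), qos=1)
client.disconnect()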
Amazon Web Services IoT platform
In the consumer cloud industry, Amazon is king. In 2004, they were the first to genuinely make cloud computing a commodity. Since then, they have worked hard to innovate and add capabilities, and they now likely have the most complete set of tools on the market. It is a very scalable platform, promising to handle trillions of interactions between billions of devices. The cost is determined by how many messages AWS IoT sends and receives. Every IoT interaction can be considered a conversation between a server and a device. Amazon assesses a fee per million messages sent or received between the endpoints. There are no minimum fees, and you won’t be charged for messages sent to the following AWS services:
Amazon S3
Amazon DynamoDB
AWS Lambda
Amazon Kinesis
Amazon SNS
Amazon SQS
They also offer a software development kit (SDK) for building and running apps on AWS, which developers can use.
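For instance, publishing a message to an AWS IoT Core topic from Python takes only a few lines with the boto3 SDK; the region, topic name, and payload below are illustrative assumptions:

import json
import boto3

# 'iot-data' is the data-plane client for AWS IoT Core
client = boto3.client("iot-data", region_name="us-east-1")

client.publish(
    topic="factory/line1/telemetry",  # hypothetical topic
    qos=1,
    payload=json.dumps({"rpm": 1480, "temp_c": 63.2}),
)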
Blynk IoT platform
Blynk is an integrated suite of low-code software for building and managing connected electronic devices at any scale. It is the only platform that provides native mobile apps for your devices together with a complete IoT development infrastructure, allowing an easy transition from quick prototyping with ready-to-use IoT capabilities to production-grade solutions that support complicated enterprise use cases.
Native low-code mobile app builder; apps may be distributed to app stores under a white label.
Wide hardware compatibility: runs on over 400 hardware modules and allows connections through many different libraries.
Supported connectivity protocols: WiFi, Ethernet, Cellular, Serial, USB, and Bluetooth (BETA).
Strong web console with an intuitive and tidy user interface.
Reliable cloud infrastructure for the creation of IoT products of any size.
Management, analytics, data, and logical visualization.
Ready-to-use widgets have many helpful features and come with simple configuration instructions.
Salesforce IoT Cloud
Salesforce focuses on customer relationship management and expertly uses IoT technology to improve this market.
The Salesforce IoT Cloud platform collects important data from connected devices to provide clients with personalized experiences and foster deeper relationships. Salesforce CRM is used in conjunction with it; data from linked assets are supplied straight to the CRM platform, where context-based actions are started immediately.
For instance, if sensors identify a problem with a windmill’s performance, the CRM dashboard immediately displays the information. The system can either automatically adjust the settings or generate a service ticket.
The Salesforce IoT Cloud’s primary attributes are −
Complete CRM, customer, and product integration
Proactive response to client requirements
Oracle IoT
Oracle offers endpoint data management and real-time IoT data analysis to assist businesses in making crucial decisions. Utilizing the Oracle IoT cloud platform has additional benefits, such as quick device data analytics and device visualization.
Oracle IoT pricing is determined per device. Each device is limited to a certain number of messages each month, and there is an extra fee if you send more than that.
Particle
Particle provides hardware solutions, including development kits, production modules, asset tracking devices, and an IoT edge-to-cloud platform for managing devices and enabling worldwide connectivity. You can build your product from conception to manufacturing with the help of Particle’s team of IoT specialists, who offer end-to-end professional services.
The Particle platform’s main characteristics are −
REST API integration with outside services
Cloud with firewall protection
The ability to work with data from Microsoft Azure or Google Cloud
No technical knowledge required to use the platform
Conclusion
IoT clouds from Salesforce and Particle are both simple to use. Due to the flexibility cloud infrastructures bring to corporate operations, businesses rely on them more and more. Hosting, running, and maintaining hardware and software components is no longer your responsibility, nor that of your technical staff.
A Gentle Introduction To Bokeh: Interactive Python Plotting Library
This article was published as a part of the Data Science Blogathon.
Data visualization is an important and useful stage in a Data Science project. It gives a skimmed view of the dataset and helps trace out all the possible strategies to manipulate the data.
There are many visualization libraries on the market right now. If you’re in the initial stages of a Data Science project, you may already be familiar with matplotlib or seaborn. Every beginner starts with these libraries, and they are indeed a good starting point for understanding what plotting is all about and discovering different types of plots.
As we progress with our exploration journey, we always want to upgrade ourselves in terms of new skills and that’s where this article will add a new skill to your current knowledge! Bokeh is a Python library that facilitates the creation of interactive graphs within your Jupyter notebooks or you can also create a standalone web app. Let’s discover this library. Also, I will share a code snippet for each type of plot.
Introduction to Bokeh
Bokeh is an interactive visualization library made for Python users. It provides a Python API to create visual data applications in the browser without necessarily writing any JavaScript code. Bokeh can help anyone who would like to quickly and easily make interactive plots, dashboards, and data applications. The installation of this library is simple and can be done via pip:
pip install bokeh

Open up a new Jupyter notebook and configure the output of the graphs as shown below:
from bokeh.io import output_notebook, show
output_notebook()

The Dataset
The dataset we will be exploring today is Trending YouTube Video Statistics. The hyperlink will land you on Kaggle, from where you can directly download the dataset. One thing to note here is that the dataset has CSVs for multiple countries. For this article, we will be exploring India’s trending videos.
The corresponding CSV to that is “INvideos.csv”. Let’s look at the df.info() for all the information about columns:
Date-Time conversion:
The current date in the dataset looks like this: 17.14.11. It is in the format year.day.month, which pandas may not recognize. That’s why we will specify the format while converting:
df["trending_date"] = pd.to_datetime(df.trending_date, format='%y.%d.%m')Mapping Categories: The column category_id has numbers between 1 to 44. As the name suggests, these are ids of the categories of a video. These include entertainment, news, films, or trailers. In the Kaggle dataset, there is a JSON file named IN_category_id.json which contains the mapping of these ids to relevant categories for India. In the code below, the JSON is loaded, extracted the required information and then appended changes:
with open("IN_category_id.json", 'r') as f: categories = json.load(f) mappedCategories = {} for i in categories['items']: mappedCategories[i['id']] = i['snippet']['title'] df['category_id'] = df.category_id.astype(str).map(mappedCategories)We have ample features to be plotted for various kinds of plots available. Let’s see how to implement each type of plot using the Bokeh library.
Different Kinds of Plots
I am setting the theme of the plots to dark mode using these lines of code:

from bokeh.io import curdoc
curdoc().theme = 'dark_minimal'

1. Bar Plot
In this bar plot, we will plot the channel names against the number of times they appeared in the trending section. Many unique channels appeared in the trending section, so I applied two criteria: a channel must have a count greater than 150, and the result is sliced down to the top 10 entries.
from math import pi
from bokeh.io import show
from bokeh.plotting import figure

# 'temp' is assumed to hold the filtered channel counts described above, e.g.:
# temp = df.channel_title.value_counts()
# temp = temp[temp > 150][:10]

fig = figure(x_range=temp.index.tolist(), title="Major Channels Which Made it To Trending List", plot_width=950)
fig.vbar(x=temp.index.tolist(), top=temp.values.tolist(), width=0.9)
fig.ygrid.visible = False
fig.xgrid.visible = False
fig.xaxis.major_label_orientation = pi/4
fig.xaxis.axis_label = 'Channels'
fig.yaxis.axis_label = 'Number of times in Trending'
show(fig)

2. Pie Chart
Pie charts help in looking at a category contribution in a feature. The area covered by each category helps us to determine the overall impact of that category on other ones. It can be called a visual presentation of the percentages. In the plot below, we are looking at different categories of videos that made it to the trending section.
from math import pi
from bokeh.io import show
from bokeh.palettes import Category20c
from bokeh.plotting import figure
from bokeh.transform import cumsum

temp = df.category_id.value_counts()
data = pd.Series(temp).reset_index(name='value').rename(columns={'index': 'categories'})
data['angle'] = data['value'] / data['value'].sum() * 2 * pi
data['color'] = Category20c[len(temp)]

p = figure(title="Categories Contribution to Trending Section", toolbar_location=None,
           tools="hover", tooltips="@categories: @value")
p.wedge(x=0, y=1, radius=0.6,
        start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
        line_color="white", fill_color='color', legend_field='categories', source=data)
p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None
show(p)

We can clearly see that entertainment videos are most commonly found in the trending section.
3. Scatter Plot
Scatter plots are very useful for analyzing a trend over a range of values. The plot below shows the number of videos trending on each day. One thing to note here is that the date column was not supported as a string in the hover tools, which is why I had to create a separate column containing the string form of the dates. Look at the implementation below:
from bokeh.plotting import figure, show
from bokeh.models import DatetimeTickFormatter

temp = df.trending_date.value_counts()
data = pd.Series(temp).reset_index(name='value').rename(columns={'index': 'dates'})
data['hoverX'] = data.dates.astype(str)

p = figure(title="Trending Video Each Day", x_axis_type="datetime",
           tools="hover", tooltips="@hoverX: @value")
p.scatter(x='dates', y='value', line_width=2, source=data)
p.xaxis.major_label_orientation = pi/4
p.xaxis.axis_label = 'Timeline'
p.yaxis.axis_label = 'Number of Videos Trending'
show(p)

Looking at these code snippets, you must be thinking that this is not an easy task and requires a lot of input for each element of a graph. That’s why many users use pandas-bokeh, which provides a Bokeh plotting backend for pandas. It is very easy to use and doesn’t require this much code!
Plots using pandas-bokeh
Pandas-Bokeh is a great module that allows you to plot Bokeh graphs directly from your data frames, with all the hover tools, labeled axes, and much more! In the first step, you need to install this module:

pip install pandas-bokeh

Then import it:
import pandas_bokehNext, we also need to set the output of these graphs in our notebook:
pandas_bokeh.output_notebook()

See an example plot below (using pandas-bokeh):
# The start of this snippet was lost during extraction; a plausible reconstruction,
# assuming the pie chart is drawn from the counts of the comments_disabled column:
df.comments_disabled.value_counts().plot_bokeh.pie(
    title='Comments Disabled or Not?');
Yes, only this much code was required for this pie chart! You can compare it with the pie chart made with pure bokeh in the above section.
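For comparison, the bar chart from the pure-Bokeh section above also shrinks to roughly one call with pandas-bokeh (a sketch, assuming temp is the same top-channels Series as before and pandas_bokeh has been imported as shown):

temp.plot_bokeh.bar(title="Major Channels Which Made it To Trending List",
                    xlabel="Channels", ylabel="Number of times in Trending")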
Conclusion
In this article, I walked you through the plotting library Bokeh. Plotting graphs is an important aspect of a data science project and allows you to filter out important features. For instance, a box plot can help in eliminating outliers: the box spans the middle 50% of the data, and points beyond the whiskers are considered outliers. Likewise, a histogram helps in analyzing the distribution of the data.
If you have any doubts, queries, or potential opportunities, then you can reach out to me via
1. Linkedin – in/kaustubh-gupta/
2. Twitter – @Kaustubh1828
3. GitHub – kaustubhgupta
4. Medium – @kaustubhgupta1828