A Simple User Guide For Ubuntu Oneiric

Ubuntu Oneiric, while adding many improvements to its Unity desktop, also brought a lot of changes that make it more confusing and difficult to use. Users coming from previous versions of Ubuntu will find that some of their favourite applications are missing, replaced with alternatives that are either more resource-intensive or harder to use. New users will also be confused about where to find all their applications and system settings. In this tutorial, we provide a simple user guide to help you familiarize yourself with Ubuntu Oneiric.

1. How to Add/Remove Applications to/from the Launcher Bar

2. How to Open an Application in the Unity Desktop

The default method: open the Dash by clicking the Ubuntu button at the top of the Launcher (or pressing the Super key), type the name of the application, and press Enter.

Alternative method 1: Install ClassicMenu Indicator

The ClassicMenu Indicator is a third-party appindicator that brings the classic Gnome menu back. To install:

sudo add-apt-repository ppa:diesch/testing
sudo apt-get update
sudo apt-get install classicmenu-indicator

Run ClassicMenu Indicator from the Dash. You should now be able to access your applications from the system tray.

Alternative method 2: Use a quick launcher app like GNOME Do or Synapse

Quick launcher apps like GNOME Do or Synapse let you open applications very quickly. You just press a hotkey (Ctrl + Space) to activate the search box, then search for the app you want.
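If you don't have either installed, GNOME Do is available straight from the Ubuntu repositories; Synapse may require a third-party PPA on older releases, so treat the second command as optional:

sudo apt-get install gnome-do
sudo apt-get install synapse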

3. How to Add a Quicklist to the Launcher Bar

4. How to Configure the Unity Desktop

The default installation of Ubuntu Oneiric doesn’t come with any option to configure the Unity desktop. You can, however, install CompizConfig Settings Manager to access the configuration menu.
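If CompizConfig Settings Manager is not already installed, you can pull it in from the standard repositories (the package name below is the one used in the Ubuntu archive):

sudo apt-get install compizconfig-settings-manager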

If you are using an old computer that doesn’t support Compiz, most probably you are running Unity 2D instead of the usual Unity desktop. To configure Unity 2D, you have to install dconf-tools by using the following command in the terminal:

sudo apt-get install dconf-tools

Next, in the terminal, type:

dconf-editor

Then navigate to com > canonical > unity-2d to adjust the Unity 2D settings.

5. How to Restore the Synaptic Package Manager

The useful and popular Synaptic Package Manager was removed from Ubuntu Oneiric in favour of Ubuntu Software Center. Luckily you can easily restore it by installing the Synaptic application. Type the following command in the terminal:

sudo apt-get install synaptic

6. How to Install a .deb File Without Using Ubuntu Software Center

7. How to Change the Login Screen Background

To change the background of the login screen (LightDM), open a terminal and type:

sudo nano /etc/lightdm/unity-greeter.conf

At the field starting with “background=…“, change the background path to your favourite wallpaper. Leave everything else untouched.
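For example, after editing, the line might look like this (the path is purely illustrative; point it at any image the system can read):

background=/usr/share/backgrounds/my-wallpaper.jpg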

Once done, press “Ctrl + O” followed by Enter to save. Lastly, press “Ctrl + X” to exit.

Log out. You should see your new login background in action.

8. How to Restart Ubuntu Oneiric

9. How to Restore Gnome Classic

If you don’t like the Unity desktop, you can switch back to the classic Gnome desktop by installing the gnome-session-fallback package:

sudo apt-get install gnome-session-fallback

Log out and choose “Gnome Classic” in the login screen.

Note: Gnome Classic runs on the Gnome 3 platform, so don’t expect everything to be the same as the old Gnome desktop.

Damien

Damien Oh has been writing tech articles since 2007 and has over 10 years of experience in the tech industry. He is proficient in Windows, Linux, Mac, Android, and iOS, and has worked as a part-time WordPress developer. He is currently the owner and Editor-in-Chief of Make Tech Easier.


A Simple Guide To How Quantum Computers Work

The Basics of Quantum Computing

Even if you have a fibre internet connection, the speed of online services is determined by the computing power of both the device you use and the servers that host those services.

While increasing computing power is the first solution that comes to mind, it has physical limits. The processors in the computers and smartphones we use are subject to the constraints of the physical world: for example, they cannot shrink below a certain size. But if you change your point of view completely, you can get incredible speed in any business that uses computers: this is the promise of quantum computing.

The difference between “bit” and “qubit”

The computers we currently use store and process information as 1s and 0s. These are simply called “bits”. For example, the web page you are viewing right now is, in “bit language”, a long sequence of 1s and 0s. A bit can hold only one state at a time. For practical purposes, you can think of 1 as “yes” and 0 as “no”: traditional computers have to constantly switch between “yes” and “no” states when processing information.

For example, a traditional computer would view a DNA sequence as a very long 1-0 code and would take too long to decode the entire sequence because it would have to process each bit in it one by one. Such a job can take years. It is possible to reduce the processing time by upgrading the processor, or by using multiple processors at the same time, but in any case, limitations in the physical world will eventually cause a limit to be reached.

Quantum computers, on the other hand, are built on the “qubit” principle, short for “quantum bit”. In this technology, bits do not have a single state: each qubit can be both 1 and 0 at the same time. This is inherent in quantum physics and is very difficult to explain simply. Quantum physics basically says that an object can exist in a superposition of states, and the famous Schrödinger’s cat thought experiment is based on this. The cat can be both alive and dead – both possibilities are valid until an observer checks the cat’s state and collapses the superposition.

Everything everywhere all at once

This means that quantum computers do not have to process bits one by one to work. A quantum computer can process all the data in one go, because the state of the bits that make up that data is not “fixed”. Let’s use the DNA sequence example again: a quantum computer can see that sequence as a whole, without having to break it down into bits, and find the fastest way to decode it. This is what lies behind estimates of being 158 million times faster than today’s supercomputers, or, to give a more practical example, doing in about four minutes what a conventional computer would need 10,000 years to do.

How To Schedule LinkedIn Posts: A Quick And Simple Guide

Learn how to schedule LinkedIn posts, and free up more time in your day to focus on creating engaging content.

Can you schedule posts on LinkedIn? Yes! It’s actually pretty simple to do — and there are a couple of ways to go about it.


How to schedule posts on LinkedIn

In November 2023, LinkedIn started rolling out a simple native scheduling tool.

To schedule a post on LinkedIn, follow these steps:

Step 1. Sign in to your LinkedIn account and start creating a post

Type out your post, add hashtags, include a photo or link… you know the drill. Create a post the same way you normally would.

Step 2. Click the scheduling icon (it’s right next to the Post button)

Step 3. Select a day and time for your post to go live

Step 4. Hit the Schedule button

You will have a chance to review your post before you do.

That’s it!

How to see and edit scheduled posts on LinkedIn

You can delete scheduled posts or change the publication time using the clock and garbage can icons in the top right corner.

Note that you currently can’t edit scheduled posts on LinkedIn (but you can in Hootsuite — more on that below).

How to schedule LinkedIn posts with Hootsuite

If you’re looking for a more robust tool to handle your LinkedIn marketing, Hootsuite is the way to go. With Hootsuite, you can schedule all your social posts in one place — that includes LinkedIn, Facebook, Instagram, Twitter, TikTok, Pinterest, and YouTube. Plus, you get personalized recommendations for the best times to post to maximize reach and engagement, and you can easily track your performance.

Here’s how to schedule LinkedIn posts in Hootsuite:

Step 1. Add your LinkedIn account to your Hootsuite dashboard

First up, you need to connect Hootsuite and LinkedIn. Note that you can add both LinkedIn profiles and LinkedIn pages to your Hootsuite account.

You only need to do this once. Next time you want to schedule LinkedIn posts, you can skip ahead to step 2.

Open a new browser window and log out of your LinkedIn account.

Your LinkedIn account is now connected to Hootsuite, and you’re ready to start scheduling.

Step 2. Compose and schedule a LinkedIn post

Under Publish to, choose your LinkedIn page or profile. Then enter the content of your post: text, links, images, and so on.

Tip: This is what the LinkedIn scheduling tool looks like in a free Hootsuite account. With a Professional, Team, Business, or Enterprise account, this stage will be a little different. You’ll see recommended times to post in the scheduling box, rather than having to choose your time manually. Of course, you can always choose your time manually if that’s what you prefer.

That’s it! Your LinkedIn post is now scheduled and will go live at the time you selected.

How to see and edit scheduled LinkedIn posts in Hootsuite

Once you’ve scheduled your LinkedIn content, you have a couple of options if you want to view them or make changes.

Option 1: List view in the Hootsuite dashboard

When you added your LinkedIn account to Hootsuite, it automatically created a new LinkedIn Board. By default, this board contains two streams:

Scheduled, which shows a list of all the content you have scheduled to post to LinkedIn, along with the upcoming posting time for each

Option 2: Calendar view in Hootsuite Planner

For a more comprehensive view of your scheduled LinkedIn posts, including how they fit into your overall social media posting schedule, use the Hootsuite Planner.

Select the Week or Month view and use the arrows or the date selection box to move through your content calendar.

Here’s a quick video with more information about how to use Hootsuite Publisher:

How to schedule multiple LinkedIn posts at once

With the Hootsuite Bulk Composer (available in paid plans), you can schedule up to 350 posts at the same time. These posts can be split between your LinkedIn profile and LinkedIn pages (and your other social accounts).

Step 1. Prepare your bulk post file

Enter the scheduled date and time of your post in Column A, the text of your post in Column B, and an optional link in Column C.
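As a sketch, one row of that file could look like the line below; the exact date format has to match what your Hootsuite account expects, and the text and link are placeholders:

29/11/2024 09:30,Our new LinkedIn scheduling guide is live!,https://example.com/blog/linkedin-guide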

Step 2. Upload your bulk post file

For more details, check out our full blog post on using the Hootsuite bulk composer.

3 tips for scheduling LinkedIn posts

1. Schedule at the right time to increase engagement

Hootsuite’s research shows the best time to post on LinkedIn is 9:00 a.m. on Tuesdays and Wednesdays. But that’s just an average. The exact right time to post for your audience will vary based on location, demographics, and other factors.

As we mentioned above, Hootsuite’s Best Time to Post feature can show you the best time to schedule posts on LinkedIn for your specific audience. You’ll see recommendations right in the scheduling box, but you can also dive into Hootsuite Analytics for more specific scheduling data.

Choose the LinkedIn page or profile you want to analyze. You can see recommendations for the best time to schedule your posts based on various goals:

Increase engagement: Pages and profiles

Drive traffic: Pages and profiles

Build awareness: Pages only

You’ll see a heat map showing when your LinkedIn posts have performed best for the selected goal. You can point to any square to see the average response to your posts for that given day and time.


You can also use LinkedIn Analytics to find out more about your LinkedIn followers, which can give you some insight into when they are most likely to be online.

2. Know when to pause your LinkedIn posts

Scheduling LinkedIn posts ahead of time is a great way to save time while maintaining a consistent LinkedIn presence. However, this is not a situation where you can just set it and forget it.

We live and work in a fast-moving world, and it’s important to be aware of major news events, trends, and potential crises that could impact your scheduled posts or make pre-created content inappropriate. (Tip: Social listening is a good way to stay on top of the zeitgeist.)

We’ve already talked about how you can edit, reschedule, or delete individual scheduled LinkedIn posts, but in some situations, it might be best to pause all scheduled content.

In Publisher, all posts will be marked with a Suspended yellow alert and will not publish at their scheduled time.

3. Promote and target scheduled LinkedIn posts

Everything we’ve talked about so far focuses on scheduling organic LinkedIn posts. But you can use the same steps to create scheduled LinkedIn sponsored posts for your business page. You’ll still get the recommended times to post, so you can make the most of your LinkedIn ad budget.

Set up your post following the steps in the first section of this blog post. In Composer, check the box next to Promote this post.

For more details on all the targeting and budget options when scheduling a sponsored LinkedIn post, check out our complete tutorial.


Noteable ChatGPT Plugin: User Guide With Examples

With ChatGPT plugins, you can summarize web pages, create personalized content, and analyze data without writing any code. As you explore the world of natural language processing, you may come across the Noteable ChatGPT plugin.

The Noteable ChatGPT plugin is an add-on you can pair with GPT-4 that allows you to quickly and easily query your data using natural language. It can be used to perform data transformations and identify patterns or trends in a user-friendly manner.

With this plugin, you can supercharge your work in a collaborative environment, obtain real-time insights, and solve complex data problems without being a skilled programmer. It’s undoubtedly a beneficial addition to your toolkit.

In this article, we’ll look at the Noteable ChatGPT plugin and how you can use it to simplify your data analysis.

Let’s get into it!

Noteable is a collaborative notebook platform built on Project Jupyter, the non-profit, open-source project that supports interactive data science and scientific computing across all programming languages. It’s a revolutionary tool designed to make data analysis and manipulation more accessible to people of all skill levels.

On May 11, 2023, Noteable announced the launch of a cutting-edge plugin that integrates ChatGPT with Noteable Notebooks. It offers enhanced data exploration, visualization, and machine-learning capabilities.

The Noteable plugin has a lot to offer, so let’s go over its core features in the next section!

We’ve listed the main features offered by the plugin below:

Data Exploration: You can use ChatGPT to quickly and easily query your data, perform data transformations, and identify patterns or trends. This feature simplifies the process of understanding your data and helps you identify important insights.

Collaboration: With Noteable ChatGPT Plugin, you can collaborate with others using a shared workspace, enabling teams to work together efficiently and effectively on data-driven projects.

No-Code Data Visualization: Along with the ChatGPT integration, Noteable data notebooks also offer no-code data visualization support for the Modern Data Stack. This means you can create visually appealing and informative graphics from your data without having to write code or use specialized tools.

The Noteable ChatGPT Plugin is a valuable resource for users who want to explore their data with the power of ChatGPT. It makes data analysis and visualization tasks efficient while maintaining a friendly user interface.

Now that we’ve gone over the basics, let’s discuss how you can use the Noteable ChatGPT plugin for data analysis.

In this section, we’ve listed a step-by-step approach to setting up the Noteable ChatGPT plugin.

To use the Noteable ChatGPT plugin, you must first be a ChatGPT Plus user.

OpenAI is slowly introducing plugin access to all users, so make sure your account is eligible.

If you’re already a ChatGPT Plus user, open up ChatGPT’s chat prompt. Once you’re there, hover over GPT-4 and select “Plugins.”

This will open up the plugin store within ChatGPT where you can find and install a wide variety of plugins to integrate into ChatGPT.

This will take you to the Noteable plugin sign-up page, where you need to set up your account. Here, select a login method to set up your account.

After you are done logging in, you will be redirected to ChatGPT’s homepage, where you will see the Noteable plugin installed.

Once you have access to the Noteable ChatGPT plugin, you can easily create, edit, and manage notebooks within your projects.

Before you can carry out your analyses, you need to create a new project in your Noteable workspace.

Let’s say you’re working on creating a customer churn report with a bank to help them identify customers that are at risk of churn. Go ahead and create a new project as shown below:

After creating a new project, upload the CSV file you want to analyze. Once you’re done uploading, head back to ChatGPT to start your analysis.

Using the plugin, you can empower ChatGPT with data manipulation capabilities. It can successfully transform data, query it, and identify patterns or trends.

To start the analyses, you can feed a prompt similar to the following to ChatGPT:

I am a data analyst working with a bank and I am creating a customer churn report. The purpose is to identify what customers are at high risk of churn, and what can be done to prevent them from churning.

I have created a Noteable workspace which contains the CSV file. The name of the CSV file is Bank Customer Churn Prediction.csv.

I want you to create a complete churn report on the provided dataset.

After feeding the above prompt to ChatGPT, it will start analyzing your dataset and give you a complete report on customer churn.

You can check the complete report by going to your Noteable workspace and opening the .ipynb file. This file will contain all the code that ChatGPT has written for you, with graphs and visuals that you can use in your findings.
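To give a feel for the kind of code that ends up in that notebook, here is a minimal pandas sketch of a churn summary; the column names (churn, country) are assumptions about the uploaded dataset, not something the plugin guarantees:

import pandas as pd

# Hypothetical: the CSV uploaded to the Noteable project, with an assumed binary "churn" column.
df = pd.read_csv("Bank Customer Churn Prediction.csv")

overall_churn = df["churn"].mean()  # fraction of customers who churned
churn_by_country = df.groupby("country")["churn"].mean().sort_values(ascending=False)

print(f"Overall churn rate: {overall_churn:.1%}")
print(churn_by_country)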

Furthermore, you can also ask ChatGPT questions based on the analyses it has carried out for you. This is helpful when you want to drill down into a particular cause of customer churn.

Now let’s take a step further and create data-driven documents that you can share with your team.

By integrating Noteable with ChatGPT, you no longer have to write your own code, but instead, you can ask ChatGPT to write it for you!

Noteable excels at enabling collaboration among team members.

You can share your projects and notebooks with ease, discuss your findings, and optimize your team’s workflow.

Using the ChatGPT plugin with your Noteable projects helps to create a seamless experience that integrates analysis, visualization, and collaboration.

This empowers your team to make data-driven decisions and manage projects without extensive technical knowledge.

With the Noteable plugin integrated into your ChatGPT environment, you can use it for a number of tasks.

We’ve listed some of the most common use cases of the Noteable plugin below:

With the Noteable ChatGPT plugin, you can tap into the power of data science and machine learning easily.

This plugin provides you with an accessible interface to perform a variety of tasks, such as building predictive models and training classifiers.

By using this plugin, you can quickly identify patterns and trends in your data for your data science projects. This enables you to make data-driven decisions with confidence.

The ChatGPT plugin also offers support for natural language processing (NLP) tasks.

This means you can analyze, understand, and generate human language with ChatGPT. The possibilities are endless as you find ways to gain insights from unstructured text data.

Harnessing NLP allows you to process and analyze large volumes of textual data. This aids in myriad use cases, including customer reviews and content generation.

Exploratory data analysis is crucial for getting a general understanding of your dataset. The Noteable ChatGPT plugin empowers you to perform these tasks with minimal effort.

You can carry out data transformations, filter data, and calculate descriptive statistics to identify trends and patterns and make data-informed decisions with ease.

It makes preliminary data analysis easy, streamlining your workflow and accelerating time to insights.

The Noteable ChatGPT plugin also revolutionizes access to no-code data visualization for users of all skill levels.

When working with data, visualization is key to conveying information clearly and concisely.

With this plugin, you can create visually stunning charts, graphs, and tables without writing a single line of code.

You simply ask ChatGPT to generate a specific type of visualization for your data. This will elevate your presentations and make your findings more comprehensible.

The Noteable ChatGPT plugin stands to become a game-changer for data projects. It helps you navigate complex data analysis tasks with ease.

This plugin allows you to focus on deriving insights instead of getting tangled up in writing code. Furthermore, it makes data analysis accessible regardless of your coding proficiency. As data becomes increasingly integral to decision-making, being able to harness its power is a valuable skill.

In this section, you’ll find some of the frequently asked questions that users new to Noteable Plugin have.

The Noteable ChatGPT plugin is an AI-powered tool that assists you in performing data analysis tasks directly within a chat interface. It’s like having a personal data science assistant.

You interact with the plugin using natural language. Simply type your data analysis queries or commands in the chat, and the plugin will execute them for you.

While having some coding knowledge can be beneficial, the plugin is designed to be user-friendly and accessible to both coders and non-coders alike.

Yes, the plugin can handle a wide range of tasks, from basic data manipulation and visualization to more complex machine learning model building and evaluation.

Yes, the plugin is designed with data privacy and security in mind. It operates within your Noteable workspace and does not store or share your data.

How To Submit Websites & Pages To Search Engines: A Simple Guide

Knowing how to submit websites and individual pages to search engines is an essential skill for SEO professionals and webmasters alike.

Whether you’re building a new website or simply adding new content, knowing the ins and outs of indexation is key.

What You’ll Need Before Submitting

First, you’ll need access to edit your website.

Some people may refute this and claim backend web access is not necessary to submit a website to search engines. Well, they’re right.

However, there are some cases where you’ll need access to a website’s backend.

Situations Where You’ll Need Backend Access

The website doesn’t have a sitemap.

The website doesn’t have a robots.txt file.

The website doesn’t have Tag Manager or a way to verify Google Search Console/Bing Webmaster Tools access.

If your client or IT team doesn’t allow you to have access to their backend, or your CMS has certain limitations, see if you’re able to obtain FTP access. This will come in handy later in this article.

Get Access to Google Search Console & Bing Webmaster Tools

If you really want to maximize your organic traffic potential, make sure to submit your website to as many relevant search engines as possible. This may seem pretty obvious, but a little reminder is always nice to have.

So, what will we need?

Most search engines have their own set of webmaster tools to help us manage our web presence. However, the big two you really need are:

Google Search Console

Bing Webmaster Tools

Setting up Google Search Console

Before you can submit your website to Google, you’ll need to set up a Search Console account and verify website ownership. You use your Gmail account for this.

If you manage multiple website domains, you will be able to manage all of them from the same account.

Once your account is set up, make sure to follow Google’s guidelines on verifying your website property. There will be prompts that provide you with multiple options to verify your website. If your account is the same as your Analytics account, your website will be auto-verified.
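One of those verification options is an HTML tag: Google gives you a short meta tag to paste into the <head> of your homepage. It looks something like this (the content value is a placeholder for the token Google generates for you):

<meta name="google-site-verification" content="your-unique-token" />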

Setting up Bing Webmaster Tools

The first step here is to set up an account. Bing makes this easy by allowing you to sync your existing email accounts to quickly create a Bing Webmaster Tools profile.

Once you’re logged in to Bing’s Webmaster Tools, you’re ready for the next step of submitting your website.

How to Submit an Entire Website

1. Create a Sitemap Index with Categorized Sitemaps

When managing the indexation for an entire website, it’s important to know how to manage it at scale. Having an optimized sitemap can help make this process much easier for webmasters, and most importantly search engines.

When I refer to a sitemap here, I’m referring to an XML file, not an HTML sitemap.

Depending on the nature of your site, it may make sense to create multiple sitemaps to help silo your content into relevant categories. You can even create an image sitemap to help boost your image optimization strategy.

You can use a sitemap index as a root, and link to each sitemap from there.
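As a minimal sketch, a sitemap index is just an XML file that points to the individual sitemaps; the URLs below are placeholders following the sitemaps.org protocol:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-posts.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-images.xml</loc>
  </sitemap>
</sitemapindex>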

If you use the Yoast plugin, all of this can be done automatically.

If you aren’t on WordPress, there are many other tools that can help you create your own sitemap. Screaming Frog is my go-to for situations where there’s no automatic sitemap tool.

Uploading XML Sitemap Through FTP

If you don’t have access to your website’s backend, having FTP access can really save you here.

If this is your first time accessing the backend of your website, this can be tricky. Once you’re connected to your FTP, follow these steps to upload your XML sitemap.

Search for your public_html directory.

Open your public_html directory.

Upload your sitemaps to that directory.

Easy!

Now you need to test your site to make sure it has been uploaded correctly.

To test, simply copy your file name and add it to the end of your website URL. For example:

https://www.yourdomain.com/sitemap_index.xml

or

https://www.yourdomain.com/sitemap.xml

2. Optimize Your Robots.txt

So you’ve created and optimized your sitemap. What does a robots.txt file have to do with this?

Well, there are a few simple steps to make sure that search engines are able to crawl and index your website.

Add a Link to Your Sitemap

You can actually add a link to your sitemap file in your robots.txt file. This helps search engines quickly locate your sitemap and may improve your crawl rates.
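For example, a single extra line in robots.txt is enough (swap in your own sitemap URL):

Sitemap: https://www.example.com/sitemap_index.xml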

Double Check Disallow Directives

If you’re launching a new website, make sure your robots.txt DOES NOT say this:

User-agent: *
Disallow: /

Those two simple lines of text will block all major search engines from crawling your website. You’d be surprised how often web developers forget to check for this when launching websites.

3. Submit Sitemap to Google & Bing

Remember at the beginning of this article when we set up Search Console and Bing Webmaster Tools? Well, that’s about to come in handy.

Both of these platforms allow us to submit sitemap links. This is a quick way to tell search engines which pages you want them to crawl.

You can submit individual sitemaps, but it may be a bit quicker to just submit your sitemap index.

How to Submit Individual Pages

Did you know that both Google and Bing allow you to submit individual pages?

That’s right! But don’t get too excited.

While some pages can get indexed quickly, sometimes search engines may take longer to index your submitted pages.

Google Search Console – URL Inspection Tool

Google’s new Search Console platform has one of my favorite new tools, the URL Inspection Tool. This is a fairly comprehensive tool that allows webmasters to get instant feedback about how Google perceives certain aspects of a webpage.

One of the best features of this tool is the ability to request indexing. Sound familiar?

That’s because the URL Inspection Tool has replaced the Fetch as Google tool from the old Search Console.

I’ve personally seen instant indexation using this tool, but other SEO professionals have reported slower results.

Bing Webmaster Tools – Submit URLs

In a recent announcement, Bing said that they’re allowing webmasters to submit up to 10,000 URLs per day.

By using the Submit URLs tool in Bing Webmaster tools, you’re helping Bing save crawling resources. Makes sense, but it goes against how search engines typically work.

If Bing is openly encouraging webmasters to submit their URLs for indexation, then why not add this to your marketing checklist?

Bonus Tips!

We covered some direct methods for getting your links indexed, but there are also some indirect methods for getting links indexed.

For those who are new to SEO, it’s important to note that most search engine crawlers discover new webpages through links.

Let’s throw search engines a bone and help them discover our content!

Google Publishers Search API

Optimize Internal Links

Optimizing your internal links is a vital part of every essential SEO checklist. Having a structured linking scheme on your site helps search engines discover new pages on your site.

Ramp Up Your Link Building Efforts

Yes, link building should already be on your SEO checklist.

However, ramping up your link building efforts can help Google find pages on your site. Try earning new links or even reclaim broken backlinks.

The Wrap Up

Some SEO experts may consider implementing all of these tactics to be a bit overkill. However, every little bit helps.

Just like anything in SEO, better performance comes from a culmination of factors, not just one.


A Comprehensive Guide On Databricks For Beginners

This article was published as a part of the Data Science Blogathon

Overview

Databricks, in simple terms, is a web-based data warehousing and machine learning platform developed by the creators of Spark. But Databricks is much more than that. It’s a one-stop product for all data needs: from data storage, to analyzing data and deriving insights using Spark SQL, to building predictive models using SparkML. It also provides active connections to visualization tools such as Power BI, Tableau, and Qlikview. It can be viewed as the Facebook of big data.

Databricks is integrated with Amazon Web Services, Microsoft Azure, and Google Cloud Platform, making it easy to adopt on these major cloud computing infrastructures. It is used by major firms such as Starbucks.


This article focuses on Databricks for data science enthusiasts. For more information, check out the Databricks YouTube page.

Table of content

Prior reading.

About Databricks community edition.

DataLake/Lakehouse.

Role-based Databricks adoption.

Advantages of Databricks.

Apache Spark

Step-by-step guide.

Useful resources.

Databricks certification.

Endnotes.

Prior Readings on Analytics Vidhya

Analytics Vidhya has quality content around PySpark (the go-to language on Databricks), SQL, and the Apache Spark framework, and it would be immensely helpful to skim through those articles as well.

About Databricks community edition

The Databricks community version is hosted on AWS and is free of cost.

IPython notebooks can be imported onto the platform and used as usual.

15 GB clusters, a cluster manager, and the notebook environment are provided, and there is no time limit on usage.

Supports SQL, Scala, Python, and PySpark.

Provides an interactive notebook environment.

The Databricks paid version has a 14-day trial period but needs to be used alongside AWS, Azure, or GCP.


Data Lake

Lakehouse, or data lake, is a marketing term used in Databricks for a storage layer that can accommodate structured or unstructured, streaming or batch data. It’s a single platform to store all data. The Databricks data lake is called Delta Lake. Below are a few of Delta Lake’s features.

It’s based on the Parquet file format.

Compatible with Apache Spark.

Versioning of data.

ACID transactions –  (atomicity, consistency, isolation, durability) ensure data durability and consistency.

Batch and streaming data sink

Supports deleting and upserting into tables using APIs (see the SQL sketch after this list).

Able to query millions of files using Spark
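As a rough illustration of that upsert support, a Delta table can be kept in sync with a single SQL MERGE statement; the customers and updates table names below are purely hypothetical:

MERGE INTO customers AS target
USING updates AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *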

 

Data lake vs. data lakehouse vs. data warehouse

| | Data lake | Data lakehouse | Data warehouse |
| --- | --- | --- | --- |
| Types of data | All types: structured, semi-structured, unstructured (raw) data | All types: structured, semi-structured, unstructured (raw) data | Structured data only |
| Cost | $ | $ | $$$ |
| Format | Open format | Open format | Closed, proprietary format |
| Scalability | Scales to hold any amount of data at low cost, regardless of type | Scales to hold any amount of data at low cost, regardless of type | Scaling up becomes exponentially more expensive due to vendor costs |
| Intended users | Limited: data scientists | Unified: data analysts, data scientists, machine learning engineers | Limited: data analysts |
| Ease of use | Difficult: exploring large amounts of raw data can be difficult without tools to organize and catalog the data | Simple: provides the simplicity and structure of a data warehouse with the broader use cases of a data lake | Simple: the structure of a data warehouse enables users to quickly and easily access data for reporting and analytics |


Role-based Databricks adoption

Data Analyst/Business Analyst: Since analyses, RACs, and visualizations are the bread and butter of analysts, the focus needs to be on BI integration and Databricks SQL. Read about the Tableau visualization tool here.

Data Scientist: Data scientists have well-defined roles in larger organizations, but in smaller organizations a data scientist wears various hats and can own all three roles: analyst, data engineer, and BI visualizer. In a well-defined role, data scientists are responsible for sourcing data (a skill grossly neglected in the face of modern ML algorithms), building predictive models, managing model deployment, and monitoring data drift.

Important skills

Sourcing data – Identify data sources, and leverage them to build holistic models.

Build predictive models.

Model lifecycle.

Model deployment.

Data Engineer: Largely responsible for building ETLs and managing the constant flow of ever-increasing data: processing, cleaning, and quality-checking the data before pushing it to operational tables. Model deployment and platform support are other responsibilities entrusted to data engineers.

Databricks has to be combined with either Azure, AWS, or GCP, and due to its relatively higher cost, its adoption among small and medium startups in India is quite low.

Advantages of Databricks

Support for frameworks (scikit-learn, TensorFlow, Keras), libraries (matplotlib, pandas, NumPy), scripting languages (e.g. R, Python, Scala, or SQL), tools, and IDEs (JupyterLab, RStudio).

Databricks delivers a Unified Data Analytics Platform: data engineers, data scientists, data analysts, and business analysts can all work in tandem on the same notebook.

Flexibility across different ecosystems – AWS, GCP, Azure.

Data reliability and scalability through delta lake.

Basic built-in visualizations.

AutoML and model lifecycle management through MLflow.

Hyperparameter tuning support through Hyperopt.

GitHub and Bitbucket integration.

10x faster ETLs.

Apache Spark

Spark is a tool to coordinate tasks/jobs across a cluster of computers. These clusters of machines are managed by a cluster manager, which could be YARN (Yet Another Resource Negotiator), Mesos, or Spark’s own standalone cluster manager. It supports languages such as Scala, Python, SQL, Java, and R. A Spark application consists of one driver and several executors; a minimal PySpark sketch follows the lists below.

The driver node is responsible for three things:

Maintaining information about the Spark application;

Responding to a user’s program.

Analyzing, distributing, and scheduling work across the executors.

The executors are responsible for two things:

Executing code assigned to it by the driver.

Reporting the state of the computation, on that executor, back to the driver node.

The cluster manager is responsible for:

Controlling the physical machines; and

Allocating resources to Spark applications.
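To make the driver/executor split concrete, here is a minimal PySpark sketch (in a Databricks notebook a SparkSession already exists as spark, so getOrCreate() simply reuses it; the app name is arbitrary):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("driver-executor-demo").getOrCreate()

# The driver builds the execution plan for this job...
numbers = spark.sparkContext.parallelize(range(1_000_000), numSlices=8)

# ...while the executors square and sum their own partitions, after which
# the driver assembles the final result.
total = numbers.map(lambda x: x * x).sum()
print(total)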

Check out this article for an in-depth understanding of Spark: Understand the Internal Working of Apache Spark.


Step by step guide to Databricks

Databricks community edition is free to use and has two main roles: 1. Data Science and Engineering, and 2. Machine Learning. The machine learning role adds a model registry and an experiment registry, where experiments can be tracked using MLflow. Databricks provides Jupyter-style notebooks to work in, which can be shared across teams, making it easy to collaborate.

Create a cluster:

For the notebooks to work, they have to be attached to a cluster. Databricks provides 1 driver (15.3 GB memory, 2 cores, 1 DBU) for free.

Provide a cluster name.

Select the Databricks Runtime version – 9.1 (Scala 2.12, Spark 3.1.2) or another runtime. GPUs aren’t available in the free version.

Select the availability zone as AUTO, it will configure the nearest zone available.

It might take a few minutes before the cluster is up and running.

The cluster will automatically terminate after an idle period of two hours.

To close the cluster there are two options: terminate it and restart it later, or delete the cluster entirely. Deleting the cluster will not delete the notebook, as notebooks can be mounted on any available cluster relevant to the job at hand.

Alternatively

Select Compute

Select Create Cluster and then follow from step 2 above.

Create a notebook:

Provide a relevant name to the notebook.

Select the language of preference – SQL, Scala, Python, or R.

Select a cluster for the notebook to run on.

Publish workbook:

Once the analysis is complete, Databricks notebooks can be published (made publicly available), and the links will stay live for 6 months.

Import published notebook:

Published Databricks notebooks can be imported using a URL as well as from physical files. To import using a URL:

Select Workspace and move to the folder to which the file needs to be saved.

Use the link to import a Spark SQL tutorial into the workspace.

Run SQL on Databricks

Create a new notebook and select SQL as the language. In the notebook, select Upload Data and upload the CSV file.

Write the data to events002

Create a SQL table using the below code:

DROP TABLE IF EXISTS diamonds;

CREATE TABLE diamonds
USING DELTA
LOCATION '/mnt/delta/events002/'

Run SQL commands to query data:

select * from diamonds limit 10

select manufacturer, count(*) as freq from diamonds group by 1 order by 2 desc

Visualize the SQL output on the Databricks notebook

The output data-frames can be visualized directly in the notebook. Select the bar icon below and choose the appropriate chart. A total of 11 chart types are available.

Bar chart

Scatter chart

Maps

Line chart

Area chart

Pie chart

Quantile chart

Histogram

Box plot

Q-Q plot

Pivot (Excel-like pivot chart interface. )

End to end machine learning classification on Databricks

Databricks machine learning support is growing day by day, MLlib is Spark’s machine learning (ML) library developed for machine learning activities on Spark. Below is a classification example to predict the quality of Portuguese “Vinho Verde” wine based on the wine’s physicochemical properties.

Download the data using the link: download both winequality-white.csv and winequality-red.csv to your local machine, and upload the CSVs using the Upload Data command in the toolbar.

import pandas as pd

# Assumes the two uploaded CSVs have already been read into DataFrames, e.g.:
# white_wine = pd.read_csv("winequality-white.csv", sep=";")
# red_wine = pd.read_csv("winequality-red.csv", sep=";")
white_wine['is_red'] = 0.0
red_wine['is_red'] = 1.0

# Combine the two datasets; later cells refer to the combined frame as data.
data = pd.concat([white_wine, red_wine], axis=0)

Plotting :

Plot a histogram of the Y label:

import seaborn as sns

sns.distplot(data.quality, kde=False)

Box plots to compare features and Y label:

import matplotlib.pyplot as plt

# high_quality is assumed to be the binarized label, e.g. high_quality = (data.quality >= 7)
dims = (3, 4)
f, axes = plt.subplots(dims[0], dims[1], figsize=(25, 15))
axis_i, axis_j = 0, 0
for col in data.columns:
    if col == 'is_red' or col == 'quality':
        continue  # Box plots cannot be used on indicator variables
    sns.boxplot(x=high_quality, y=data[col], ax=axes[axis_i, axis_j])
    axis_j += 1
    if axis_j == dims[1]:
        axis_i += 1
        axis_j = 0

Split train test data:

from sklearn.model_selection import train_test_split

train, test = train_test_split(data, random_state=123)
X_train = train.drop(["quality"], axis=1)
X_test = test.drop(["quality"], axis=1)
y_train = train.quality
y_test = test.quality

Build a baseline Model:

import mlflow
import mlflow.pyfunc
import mlflow.sklearn
import numpy as np
import sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from mlflow.models.signature import infer_signature
from mlflow.utils.environment import _mlflow_conda_env
import cloudpickle
import time

# The predict method of sklearn's RandomForestClassifier returns a binary classification (0 or 1).
# The following code creates a wrapper class, SklearnModelWrapper, that uses
# the predict_proba method to return the probability that the observation belongs to each class.
class SklearnModelWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, model):
        self.model = model

    def predict(self, context, model_input):
        return self.model.predict_proba(model_input)[:, 1]

# mlflow.start_run creates a new MLflow run to track the performance of this model.
# Within the context, you call mlflow.log_param to keep track of the parameters used, and
# mlflow.log_metric to record metrics like accuracy.
with mlflow.start_run(run_name='untuned_random_forest'):
    n_estimators = 10
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=np.random.RandomState(123))
    model.fit(X_train, y_train)

    # predict_proba returns [prob_negative, prob_positive], so slice the output with [:, 1]
    predictions_test = model.predict_proba(X_test)[:, 1]
    auc_score = roc_auc_score(y_test, predictions_test)
    mlflow.log_param('n_estimators', n_estimators)
    # Use the area under the ROC curve as a metric.
    mlflow.log_metric('auc', auc_score)

    wrappedModel = SklearnModelWrapper(model)
    # Log the model with a signature that defines the schema of the model's inputs and outputs.
    # When the model is deployed, this signature will be used to validate inputs.
    signature = infer_signature(X_train, wrappedModel.predict(None, X_train))

    # MLflow contains utilities to create a conda environment used to serve models.
    # The necessary dependencies are added to a conda.yaml file which is logged along with the model.
    conda_env = _mlflow_conda_env(
        additional_conda_deps=None,
        additional_pip_deps=["cloudpickle=={}".format(cloudpickle.__version__),
                             "scikit-learn=={}".format(sklearn.__version__)],
        additional_conda_channels=None,
    )
    mlflow.pyfunc.log_model("random_forest_model", python_model=wrappedModel,
                            conda_env=conda_env, signature=signature)

Derive feature importance:

feature_importances = pd.DataFrame(model.feature_importances_,
                                   index=X_train.columns.tolist(),
                                   columns=['importance'])
feature_importances.sort_values('importance', ascending=False)

Experiment with XGBoost and Hyperopt:

Hyperopt is a hyperparameter tuning framework based on Bayesian optimization. Grid search is time-consuming, and random search, while better than grid search, fails to provide optimal results. There is a Hyperopt know-how article on Analytics Vidhya.

from hyperopt import fmin, tpe, hp, SparkTrials, Trials, STATUS_OK
from hyperopt.pyll import scope
from math import exp
import mlflow.xgboost
import numpy as np
import xgboost as xgb

search_space = {
    'max_depth': scope.int(hp.quniform('max_depth', 4, 100, 1)),
    'learning_rate': hp.loguniform('learning_rate', -3, 0),
    'reg_alpha': hp.loguniform('reg_alpha', -5, -1),
    'reg_lambda': hp.loguniform('reg_lambda', -6, -1),
    'min_child_weight': hp.loguniform('min_child_weight', -1, 3),
    'objective': 'binary:logistic',
    'seed': 123,  # Set a seed for deterministic training
}

def train_model(params):
    # With MLflow autologging, hyperparameters and the trained model are automatically logged to MLflow.
    mlflow.xgboost.autolog()
    with mlflow.start_run(nested=True):
        train = xgb.DMatrix(data=X_train, label=y_train)
        test = xgb.DMatrix(data=X_test, label=y_test)
        # Pass in the test set so xgb can track an evaluation metric. XGBoost terminates training
        # when the evaluation metric is no longer improving.
        booster = xgb.train(params=params, dtrain=train, num_boost_round=1000,
                            evals=[(test, "test")], early_stopping_rounds=50)
        predictions_test = booster.predict(test)
        auc_score = roc_auc_score(y_test, predictions_test)
        mlflow.log_metric('auc', auc_score)

        signature = infer_signature(X_train, booster.predict(train))
        mlflow.xgboost.log_model(booster, "model", signature=signature)

        # Set the loss to -1*auc_score so fmin maximizes the auc_score
        return {'status': STATUS_OK, 'loss': -1*auc_score, 'booster': booster.attributes()}

# Greater parallelism will lead to speedups, but a less optimal hyperparameter sweep.
# A reasonable value for parallelism is the square root of max_evals.
spark_trials = SparkTrials(parallelism=10)

# Run fmin within an MLflow run context so that each hyperparameter configuration is logged
# as a child run of a parent run called "xgboost_models".
with mlflow.start_run(run_name='xgboost_models'):
    best_params = fmin(
        fn=train_model,
        space=search_space,
        algo=tpe.suggest,
        max_evals=96,
        trials=spark_trials,
        rstate=np.random.RandomState(123)
    )


Finally, retrieve the best model from the MLflow runs:

best_run = mlflow.search_runs(order_by=['metrics.auc DESC']).iloc[0]
print(f'AUC of Best Run: {best_run["metrics.auc"]}')

Practice – Market Basket Analysis on Databricks

Use the online notebook to analyse the Instacart grocery data and recommend upselling/cross-selling opportunities using market basket analysis.
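If you want to sketch the core of such an analysis yourself, Spark ML ships an FP-growth implementation; the tiny hand-made orders DataFrame below is only an illustration of the expected input shape (one array of items per basket), not the actual Instacart data:

from pyspark.ml.fpm import FPGrowth

# Toy stand-in for the grocery orders: one row per basket.
orders = spark.createDataFrame(
    [
        (1, ["bananas", "strawberries"]),
        (2, ["bananas", "spinach"]),
        (3, ["bananas", "strawberries", "spinach"]),
    ],
    ["order_id", "items"],
)

fp = FPGrowth(itemsCol="items", minSupport=0.3, minConfidence=0.5)
model = fp.fit(orders)

model.freqItemsets.show()       # frequently co-purchased item sets
model.associationRules.show()   # rules that suggest cross-sell candidates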

Dashboarding on Databricks:

Databricks has a feature to create an interactive dashboard using the already existing codes, images and output.

Move to View menu and select + New Dashboard

Provide a name to the dashboard.

It will show the available dashboard for the notebook.

If the code block or image needs to be on the dashboard, tick the box.

All ticked cells will appear on the dashboard.

The cells can easily be organised as necessary with the drag-and-drop feature.

Useful resources and references

Databricks Certification

Databricks provides live instructor-led training as well as self-paced programs to help individuals understand the platform better. The self-paced course is priced at $2,000. It also provides certifications based on role fitment. The common career tracks are business leader, platform admin, SQL analyst, data engineer, and data scientist.

There are four certifications, namely

Databricks certified associate developer for Apache Spark – For spark developers.

Databricks Certified Professional Data Scientist – For all things ML.

Azure Databricks Certified Associate Platform Administrator – an exam that assesses the understanding of the basics of network infrastructure and security, identity and access, cluster usage, and automation with the Azure Databricks platform.

Databricks Certified Professional Data Engineer – All things ETL, pipelines, and deployment.


End Notes

This article just scratches the surface of what Databricks is capable of. Databricks can do a lot more that is not explored in this article, and for data enthusiasts it is quite a treasure trove. So practice and always keep learning.

Good luck! Here is my Linkedin profile in case you want to connect with me. I’ll be happy to be connected with you. Check out my other articles on data science and analytics here.


