The Artificial Intelligence Basics
In February on our parent company’s blog, we wrote about artificial intelligence and cybersecurity. Today, we’re looking at the artificial intelligence basics.
The artificial intelligence basics: What is artificial intelligence?
Artificial intelligence refers to a machine performing a task that is traditionally performed by humans. There are many different branches of artificial intelligence. There is theoretical AI (AI that does not currently exist, but may exist in the future), and there is the type of AI that is possible in 2020. AI that is possible today is sometimes called narrow or weak AI, meaning a machine can perform a narrow, specific task very well.
A few examples of AI we encounter in our everyday life are:
- Intelligent personal assistants like Siri, Cortana, and Google Assistant
- Sophisticated ecommerce platforms such as Amazon or eBay that can make related recommendations based on past purchases or views
- Music streaming platforms such as Spotify and Pandora
- Video streaming like Netflix and Hulu
- Social networks
- Ridesharing services
- Manufacturing robots
- Chatbot tools
Some of these AI applications we use regularly rely on machine learning, and some don’t.
Webshrinker makes use of machine learning, which is a subset of artificial intelligence and simply refers to the method by which an AI learns. Using machine learning, Webshrinker can identify patterns and make decisions based on them. The amount of human supervision (and intervention) varies depending on how the AI was programmed.
If a system doesn’t use any form of artificial intelligence, then that system has been built with specific instructions from the programmer and is not as flexible or able to adapt to changing conditions.
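As a rough illustration of this contrast, the Python sketch below compares a hand-written rule with a cutoff learned from labeled samples. The feature (the number of digits in a domain name), the sample data, and the function names are all invented for illustration; this is not a description of how Webshrinker works.

```python
# Hypothetical toy data: (digits in a domain name, is_malicious).
TRAINING = [(0, False), (1, False), (2, False), (5, True), (6, True), (8, True)]
# Later, conditions change: malicious domains start using fewer digits.
SHIFTED = [(0, False), (1, False), (2, True), (3, True), (4, True)]


def fixed_rule(digit_count):
    """Hand-written instruction: the programmer chose the cutoff once."""
    return digit_count > 3


def learn_cutoff(samples):
    """Pick the cutoff that misclassifies the fewest labeled samples."""
    def errors(cutoff):
        return sum((count > cutoff) != label for count, label in samples)

    candidates = sorted({count for count, _ in samples})
    return min(candidates, key=errors)


def missed(predict, samples):
    """Count malicious samples that a predictor lets through."""
    return sum(label and not predict(count) for count, label in samples)


old_cutoff = learn_cutoff(TRAINING)
new_cutoff = learn_cutoff(SHIFTED)
print("learned cutoffs:", old_cutoff, new_cutoff)
print("fixed rule misses:", missed(fixed_rule, SHIFTED))
print("retrained rule misses:", missed(lambda n: n > new_cutoff, SHIFTED))
```

On this toy data the learned cutoff moves from 2 to 1 once it is retrained on the shifted samples, while the fixed rule keeps its original cutoff and lets two malicious samples through.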
How does machine learning-based AI work?
Since Webshrinker uses machine learning, let’s focus on defining how AI works in a generic machine learning model. Exact order and the tasks themselves might vary, but this is essentially how machine learning-based AI works:
- Once the task for the AI system has been identified, large amounts of training data, or samples, need to be collected
- Testing is performed to find the right type of AI to use for the task
- The AI is given the data necessary to learn how to perform its task (e.g., examples of malicious sites and benign sites)
- It performs the task it was programmed to do
- The AI is judged on how well it performed that task
- Based on how well it performed the task, additional training data may need to be collected to make it better
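The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the two numeric features per site, the synthetic data, the nearest-centroid model, and the 0.95 quality bar are all hypothetical stand-ins, and the model-selection step is skipped by simply choosing one simple model.

```python
import random


def collect_samples(n, seed):
    """Step 1: gather labeled samples (synthetic stand-ins here)."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        malicious = i % 2 == 0
        # Two invented numeric features per site; malicious sites score higher.
        center = (6.0, 7.0) if malicious else (2.0, 3.0)
        features = tuple(rng.gauss(c, 1.0) for c in center)
        samples.append((features, malicious))
    return samples


def train(samples):
    """Step 3: learn the average feature vector (centroid) of each class."""
    centroids = {}
    for label in (True, False):
        rows = [features for features, y in samples if y == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids


def predict(centroids, features):
    """Step 4: pick the class whose centroid is nearest."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))

    return min(centroids, key=squared_distance)


def accuracy(centroids, samples):
    """Step 5: judge the model on samples held out from training."""
    hits = sum(predict(centroids, features) == y for features, y in samples)
    return hits / len(samples)


data = collect_samples(400, seed=0)
train_set, test_set = data[:300], data[300:]
model = train(train_set)
score = accuracy(model, test_set)
# Step 6: a hypothetical quality bar that decides whether to collect more data.
needs_more_data = score < 0.95
print(f"held-out accuracy: {score:.2f}, collect more data: {needs_more_data}")
```

Holding back part of the data for evaluation matters here: judging the model only on samples it was trained on would overstate how well it performs on new sites.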
We’ll reuse the example from the DNSFilter blog, because it illustrates the process well.
An AI is tasked to create a picture of a hamburger. It’s given thousands of pictures of hamburgers so it understands what a hamburger generally looks like. It then creates a picture of a hamburger, repeatedly.
Here’s what it looks like when a GAN (Generative Adversarial Network, a type of machine learning) learns how to make a hamburger by using competing neural networks:
The two brains
A GAN essentially has two brains: the brain that learns and the brain that does the work. In GAN terminology, these are the discriminator and the generator. Each time a task is completed, the two brains speak to one another. The brain that does the work hands over its result, and the brain that learns compares it with past attempts and judges how closely it matches the original dataset. The brain that learns then tells its counterpart how to improve the next time it performs its task.
Let’s look at how these roles might play out if filled by humans to explain this a little further. Imagine that a detective is working with a counterfeiter to create fake bills and pass them off as the real thing. The detective has years of experience and can look at a bill and judge whether it is real or fake. The counterfeiter creates the counterfeit bills. These are the learning and working brains, respectively. Each time the counterfeiter creates a new bill, the detective examines the work and states either “This looks like a real bill” or “This looks like a fake bill.” The detective then tells the counterfeiter how they were able to determine whether the bill was real or fake, enabling the counterfeiter to make more believable fake bills going forward.
The GAN will do whatever it can to make the task it performs closely match the examples it was given.
The part of the GAN that judges its own work might look at those pictures of hamburgers and recommend making the burgers look a little less messy if it thinks that will make them match the original data better.
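To make the detective and counterfeiter loop concrete, here is a deliberately tiny sketch in plain Python. It assumes the "bills" are single numbers: real ones cluster around 4.0, the counterfeiter (generator) controls one offset `b`, and the detective (discriminator) is a one-feature logistic model. All names, values, and the learning rate are assumptions chosen for illustration. Real GANs use neural networks and image data, and this sketch makes no claim about reliable convergence.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.05, real_mean=4.0, seed=0):
    """Alternate detective and counterfeiter updates; return b over time."""
    rng = random.Random(seed)
    w, c = 0.0, 0.0  # detective: D(x) = sigmoid(w * x + c), "how real is x?"
    b = 0.0          # counterfeiter: fake = b + noise
    history = []
    for _ in range(steps):
        real = rng.gauss(real_mean, 0.5)
        fake = b + rng.gauss(0.0, 0.5)

        # Detective update: gradient ascent on log D(real) + log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w += lr * ((1.0 - d_real) * real - d_fake * fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Counterfeiter update: gradient ascent on log D(fake), using the
        # detective's feedback to shift its output toward "looks real".
        d_fake = sigmoid(w * fake + c)
        b += lr * (1.0 - d_fake) * w
        history.append(b)
    return history


history = train_toy_gan()
print(f"counterfeiter offset started at 0.0 and ended at {history[-1]:.2f}")
```

In this setup the offset should drift away from 0 and toward the values the real samples take, because the detective's feedback is the only signal the counterfeiter receives. Adversarial training can oscillate rather than settle, so the final value should be read as indicative only.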
What’s the benefit of machine learning?
You might be wondering: Why even rely on machines in the first place to do the thinking? The easy answer is automation. Training an AI to learn, instead of programming it to perform specific tasks without the ability to learn, frees up time for people to concentrate on other aspects of their work. It takes normally tedious tasks and runs them in the background.
Doing this can expedite processes that previously took months or years of work. Computers can also be taught to process information and learn on a greater scale than humans usually do. For instance, Webshrinker processes domains and categorizes them. It’s not realistic for a person (or even a team of people) to categorize the web. It would take years, and the information would quickly become outdated. The resources necessary to take on these tasks would be astronomical. By employing machine learning, we’re able to quickly and accurately process large numbers of domains.
This brings us to how machine learning can have cybersecurity benefits. Domain categorization isn’t just placing websites into categories like news, gambling, or social media. It’s also about categorizing threats based on the types of malicious content a site might host, such as phishing, malware, ransomware, or botnets. This is what Webshrinker does.
We can train machine learning algorithms to notice subtle differences between a legitimate site and a malicious one. A trained model can examine a website as an end user would and also look at the site’s code. As it views more and more sites, it finds similarities between malicious sites and improves its ability to categorize them. This would simply not be possible if we required people to scan and categorize sites. Not only would the categorization take a long time, but the ability to differentiate between malicious and non-malicious sites would be poor, since certain indicators are easier for an AI to notice than for a person.
Done with the artificial intelligence basics and ready to see machine learning in action? Get your free trial of Webshrinker.