Machine-learning code predicts whether connections are legit or likely to result in a bad day for someone
JPMorgan Chase is integrating AI into its internal security systems to thwart malware infections within its own networks.
A formal paper [PDF] emitted this month by techies at the mega-bank describes how deep learning can be used to identify malicious activity, such as spyware on staff PCs attempting to connect to hackers’ servers on the public internet. It can also finger URLs in received emails as suspicious. And it’s not just an academic exercise: some of these AI-based programs are already in production use within the financial giant.
The aim is, basically, to detect and neutralize malware that an employee may have accidentally installed on their workstation after, say, opening a booby-trapped attachment in a spear-phishing email. It can also block web-browser links that would lead the employee to pages that try to install malware on their computer.
Neural networks can be trained to act as classifiers that predict whether connections to the outside world are legit or fake. Bogus connections may well be, for example, attempts by snoopware on an infected PC to reach hackers’ servers, or a link to a drive-by-download site. The decision is based on the URL or domain name used to open the connection. Specifically, the long short-term memory (LSTM) networks used in the bank’s AI software predict whether a particular URL or domain name is real or fake. The engineers trained theirs using a mixture of private and public datasets.
The public datasets included a list of real domains scraped from the top million websites as ranked by Alexa; they also used 30 different Domain Generation Algorithms (DGAs), of the kind typically used by malware, to spin up a million fake malicious domains. For the URL data, they took 300,000 benign URLs from the DMOZ Open Directory Project dataset and 267,418 phishing URLs from the Phishtank dataset. The researchers didn’t specify the proportion of data used for training, validation, and testing.
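For the uninitiated: a DGA lets malware and its operators compute the same list of throwaway domains independently, often from a shared seed plus the current date, so defenders can’t simply blocklist one fixed address. Here’s a toy sketch of the idea, assuming a SHA-256-based scheme; the function `toy_dga` and its parameters are hypothetical and are not one of the 30 algorithms the researchers used.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12, tld: str = ".com") -> list[str]:
    """Hypothetical toy DGA: derive `count` domain names from a seed and a date."""
    domains = []
    for i in range(count):
        # Hash seed + date + counter, so anyone holding the seed gets the same list
        digest = hashlib.sha256(f"{seed}:{day.isoformat()}:{i}".encode()).digest()
        # Map the first `length` hash bytes onto lowercase letters
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


print(toy_dga("example-seed", date(2018, 1, 1)))
```

Change the date and the whole list changes, which is why the output looks like random jumbles of letters rather than real words.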
You might think that simply firewalling off and logging all network traffic from bank workers’ PCs to the outside world would do the trick in catching naughty connections. Clearly, though, JPMorgan doesn’t mind its staff reading the likes of El Reg at lunch, so it seems to have turned to machine learning to improve its network monitoring while still allowing outside connections.
How it works
First, the string of characters in the URL or domain name to be checked is converted into vectors and fed into the LSTM as input. The model then spits out a score, or probability, that the URL or domain name is bogus.
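To make that pipeline concrete, here’s a minimal pure-Python sketch of a character-level LSTM classifier. It assumes one-hot character vectors (the paper’s exact encoding isn’t described here; learned embeddings are a common alternative), and the vocabulary, `TinyLSTM` class and layer sizes are all hypothetical. The weights are random and untrained, so the output is meaningless until the model is trained on labeled data like the datasets above.

```python
import math
import random

# Hypothetical vocabulary; index 0 is reserved for unknown characters
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(VOCAB)}


def encode(text):
    """Turn each character into a one-hot vector (unknown characters map to index 0)."""
    size = len(VOCAB) + 1
    vectors = []
    for ch in text.lower():
        v = [0.0] * size
        v[CHAR_TO_ID.get(ch, 0)] = 1.0
        vectors.append(v)
    return vectors


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with random (untrained) weights plus a logistic output unit."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden_size
        cols = input_size + hidden_size + 1  # current input, previous hidden state, bias
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g)
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size + 1)]

    def _affine(self, gate, z):
        return [sum(w * x for w, x in zip(row, z)) for row in self.W[gate]]

    def predict(self, vectors):
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state
        for x in vectors:  # read the string one character at a time
            z = x + h + [1.0]
            i = [sigmoid(a) for a in self._affine("i", z)]
            f = [sigmoid(a) for a in self._affine("f", z)]
            o = [sigmoid(a) for a in self._affine("o", z)]
            g = [math.tanh(a) for a in self._affine("g", z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Final hidden state -> probability that the string is bogus
        return sigmoid(sum(w * hj for w, hj in zip(self.w_out, h + [1.0])))


model = TinyLSTM(input_size=len(VOCAB) + 1, hidden_size=8)
print(model.predict(encode("xkqzvbwpjrtl.com")))
```

The LSTM’s appeal here is that it reads the string in order and carries state between characters, so it can, in principle, pick up on character sequences that look machine-generated without anyone hand-coding rules for them.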
The LSTM achieved a score of 0.9956 (with one being the optimal result) when classifying phishing URLs, and 91 per cent accuracy for DGA domains with a 0.7 per cent false positive rate. AI is well suited to discovering the common patterns and techniques used in malicious software, and can even be more effective than traditional URL and domain-name filters.
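Accuracy and false positive rate measure different things: accuracy is the share of all domains classified correctly, while the false positive rate is the share of benign domains wrongly flagged as malicious. A quick sketch of both calculations, using a made-up ten-domain example rather than any of the bank’s data:

```python
def evaluate(labels, predictions):
    """Return (accuracy, false positive rate); 1 = malicious, 0 = benign."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # malicious, flagged
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # benign, allowed
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # benign, wrongly flagged
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # malicious, missed
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    false_positive_rate = fp / (fp + tn)
    return accuracy, false_positive_rate


# Hypothetical labels: four DGA domains, six benign ones
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(evaluate(labels, predictions))  # accuracy 0.8, FPR 1/6
```

Even a small-looking false positive rate adds up at scale: at a hypothetical one million benign domain lookups, 0.7 per cent works out to roughly 7,000 false alarms.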
We asked the eggheads to describe what features the model learned when identifying whether something is benign or malicious, but they declined to comment. It’s probably things like typos in words or random snippets of characters and numbers jumbled together.
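For comparison, here’s one hand-crafted heuristic for spotting “random snippets of characters”: Shannon entropy, which is higher for strings whose characters are spread evenly and unpredictably. This is an illustrative sketch of the kind of manual feature engineering the researchers say deep learning reduces; it is not what the bank’s model learned, which the researchers didn’t disclose.

```python
import math
from collections import Counter


def shannon_entropy(label):
    """Bits of entropy per character in a domain label (0.0 for empty or repetitive strings)."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(shannon_entropy("google"))    # repeated letters, roughly 1.92 bits
print(shannon_entropy("xkqzvbwp"))  # eight distinct letters, 3.0 bits
```

A single threshold on a score like this is easy to fool and misfires on legitimately odd names, which is part of the case for letting a model learn its own features.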
“Advanced Artificial Intelligence (AI) techniques, such as Deep learning, Graph analysis, play a more significant role in reducing the time and cost of manual feature engineering and discovering unknown patterns for Cyber security analysts,” the researchers said.
Next, they hope to experiment with other types of neural networks, such as convolutional neural networks and other recurrent neural network designs (LSTMs are themselves a kind of recurrent network), to clamp down on the spread of malware even further. Watch this space. ®