Attributing malware to its attackers is always a challenging task for malware analysts. Most researchers extract technical artifacts from malware binaries and apply data mining to determine the identity of the attackers, or at least to fingerprint partial information about the malware authors.
To study Windows malware, the PE headers are first deconstructed.
The extracted metadata are then categorized according to defined rules, stored in a SQL database and analyzed further (Yonts, 2012). Some malware analysts extend their work to contextual analysis by obtaining attributes or “genes” from different “layers”, such as the exploits or shellcode used, the PE metadata, the TCP port numbers connected to and the C2 network infrastructure (Xecure-Lab, 2012). Some researchers take a step back and extract metadata from email headers when the malware was distributed through spear-phishing emails (Lee, M. & Lewis, D.).
Some malware analysts group the attackers by means of a proprietary reverse engineering and behavioural analysis technology (Digital DNA).
All of them claim to have successfully identified some attacker groups. Sometimes they give the groups code names, such as APT1 (Mandiant), Comment Crew (Hoglund, G.), Soysauce (HBGary) or Deep Panda (CrowdStrike).
They disclose some hints about how they categorize the groups, but none of them, except Mandiant in its APT1 Report, provides the complete details of the work.
Since Mandiant published the APT1 Report, quite a number of researchers have studied the attackers’ command and control (C2) infrastructure in a similar way.
They use visualization tools such as Maltego to illustrate how these attacker groups are associated (HBGary, 2012). Maltego graphs illustrate the findings well for analysts, but the information displayed is only a static representation of the analytical results at the time of the queries. Our work starts in a similar way, but we put more emphasis on monitoring the continuous changes in the network infrastructure. We developed an automated solution that simplifies gathering the information and storing it as a knowledge base for future analysis. Once the initial set of malicious DNS-IP address pairs, “parked domains” and “whois information” is identified, the database can be called to perform updates manually. This database can then be used for further analysis with a visualization tool, and for identifying the possible identities or personas of the attackers. In our studies, we used Maltego for the analysis.
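The knowledge base described above can be pictured as an append-only store of timestamped observations. The following is a minimal sketch, not our actual implementation: the table name `dns_observation`, its columns and the helper functions are hypothetical, and SQLite is assumed only for the sake of a self-contained example.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per observation, so history is never overwritten.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dns_observation (
    domain      TEXT NOT NULL,
    ip_address  TEXT NOT NULL,
    observed_at TEXT NOT NULL   -- ISO 8601 timestamp, UTC
);
"""

def open_knowledge_base(path=":memory:"):
    """Open (or create) the knowledge base and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def record_observation(conn, domain, ip_address, observed_at=None):
    """Append one DNS-IP observation; earlier rows are kept as history."""
    if observed_at is None:
        observed_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO dns_observation (domain, ip_address, observed_at) "
        "VALUES (?, ?, ?)",
        (domain.lower(), ip_address, observed_at),
    )
    conn.commit()

def history(conn, domain):
    """Return (observed_at, ip_address) rows for a domain, oldest first."""
    rows = conn.execute(
        "SELECT observed_at, ip_address FROM dns_observation "
        "WHERE domain = ? ORDER BY observed_at",
        (domain.lower(),),
    )
    return rows.fetchall()

def parked_domains(conn, ip_address):
    """Return every domain ever observed on the given IP address."""
    rows = conn.execute(
        "SELECT DISTINCT domain FROM dns_observation "
        "WHERE ip_address = ? ORDER BY domain",
        (ip_address,),
    )
    return [row[0] for row in rows]
```

Storing every observation as a new row, rather than updating a single row per domain, is what allows later analysis to ask how the infrastructure changed over time instead of only what it looks like at the time of a query.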
To keep the command and control (C2) network redundant, APT malware is generally embedded with multiple DNS names (second-level domains). An intuitive view is that APT attackers keep and control a large number of DNS-IP address pairs. We believe the attackers may have assigned a team to register large numbers of DNS domain names for this purpose. With every fresh registration, they leave some identifiable personal information on the “whois” servers. The “whois” servers may provide reliable links, especially in the period between the registration and the first launch of the malware.
We study two aspects of the network infrastructure: the DNS-IP address pairs and the “whois information”. We believe this information carries the highest possible intelligence value for linking the malware to the human actors.
At Black Hat 2010, Hoglund presented his theory in a talk entitled Malware Attribution: Tracking Cyber Spies & Digital Criminals (Hoglund, 2010). He believed that Social Cyberspace (i.e. DIGINT) and Physical Surveillance (i.e. HUMINT) were the keys to determining who is behind malicious attacks. He also pointed out, however, that it was nearly impossible to find the human actors with definitive intelligence such as the social security number or physical location of the attackers. Nevertheless, he described some forensic marks that could be extracted from raw data in three intelligence layers: (a) Net Recon (C2), (b) Developer Fingerprints and (c) Tactics, Techniques & Procedures (TTP). Of these three layers, TTP carries the highest intelligence value for identifying the human attackers.
Based on Hoglund’s theory, Boman developed a tool called VXCage to extract technical metadata from binaries and store the artifacts in a relational database for further analysis.
Pfeffer et al. (Pfeffer, 2012) investigated how to extract “genetic information” about malware by reverse engineering critical PE header information, and how to follow the evolutionary traces and functional linguistics in binaries.
However, these works do not provide direct links to DIGINT or HUMINT at the higher end of the Intelligence Spectrum described by Hoglund in his Black Hat presentation.
Our work, on the other hand, tracks the changes in DNS domains and “whois information” not only at the time a domain is first found, but over a longer period after its identification. Our analysis confirmed that the “whois information” retained in the DNS domain network infrastructure remains more stable than the fast-flux infrastructure used by publicly distributed malware. By monitoring the network infrastructure and “whois information” of the “parked domains” on the malicious IP addresses of APT attacks, the knowledge base may provide more relevant direct links to the DIGINT or HUMINT of the attackers.
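Tracking changes over a longer period amounts to comparing successive snapshots of the same record. The sketch below illustrates the idea under stated assumptions: each “whois” snapshot is taken to be already parsed into a dictionary, and the field names used in the usage example (`registrant_email`, `name_server`) are hypothetical rather than taken from any particular whois server.

```python
def diff_whois(old, new):
    """Return {field: (old_value, new_value)} for every field that differs.

    A field missing from one snapshot is reported with None on that side.
    """
    changes = {}
    for field in sorted(set(old) | set(new)):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes[field] = (before, after)
    return changes

def change_log(snapshots):
    """List the changes between consecutive snapshots.

    `snapshots` is a list of (timestamp, record) tuples sorted by time.
    Only snapshots that differ from their predecessor produce an entry.
    """
    log = []
    for (_, previous), (timestamp, current) in zip(snapshots, snapshots[1:]):
        delta = diff_whois(previous, current)
        if delta:
            log.append((timestamp, delta))
    return log
```

A usage example with placeholder data:

```python
snapshots = [
    ("2013-01-01", {"registrant_email": "a@example.com", "name_server": "ns1.example.net"}),
    ("2013-02-01", {"registrant_email": "a@example.com", "name_server": "ns1.example.net"}),
    ("2013-03-01", {"registrant_email": "b@example.com", "name_server": "ns1.example.net"}),
]
print(change_log(snapshots))
```

The length of the resulting log relative to the number of snapshots gives a rough measure of how stable a record is, which is the property contrasted with fast-flux infrastructure above.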
3. APT attack attributes and patterns
Most current research describes the APT cycle as a series of processes: (a) defining the target, (b) sending spear-phishing emails, (c) dropping a backdoor, (d) making the initial outbound connection, (e) gathering data and (f) exfiltration (Fig. 1).
APT life cycle (Fig. 1)
We suggest that the life cycle be extended at both ends. It should begin with social engineering of the victims (or “human reconnaissance”, carried out to prepare the spear-phishing emails), and it should continue after the initial exfiltration with lateral movement (pushing more tools to the victim’s machine and selectively exfiltrating intelligence from it) (Fig. 2). These processes require more human interaction, which may provide direct links to DIGINT or HUMINT.
Extended APT life cycle (Fig. 2)
3.2 APT infrastructure tactics
Advanced Persistent Threat (APT) attacks are highly organized and are launched for prolonged periods. APT attacks exhibit discernible attributes or patterns. Notably, we find that the APT attackers have tended to set up their DNS domain network infrastructure in the following workflow:
- Domain registration
- Naming of domains: The domain naming convention is not typosquatting, but follows a pattern of meaningful Chinese Pinyin (拼音)
- Creation of second-level domain name and IP address pairs
- Engaging a ‘friendly ISP’ to use a portion of its class C subnet of IP addresses located where the targeted victims are domiciled (to evade blacklisting or to make the addresses geographically plausible)
- Reuse of DNS names and IP addresses: The DNS names and IP addresses may be cycled for reuse (aka campaigns), which may provide indications of, or links to, the attacker groups
- Embedding multiple DNS A records in exploits
- Preparing the spear-phishing email contents after reconnaissance of the targeted victims
- Launching the malicious attachments through spear-phishing emails
- Collection of intelligence: The exploits drop binaries that extract the embedded DNS records and start communicating with the C2 by resolving the IP addresses from DNS servers. The C2 servers or C2 proxies register the infections in the C2 database, and the intelligence analysts of the attacker groups review the preliminary information collected on the targeted victims through C2 portals. They then give further instructions to the infected machines to exfiltrate further intelligence.
- Manipulation of domains: The infrastructure technicians of the attacker groups change (domain manipulation) the DNS-IP address pairs, the domain name registration information (whois information) and the “parked domains” from time to time or whenever an incident happens (such as the takedown of the C2 proxies by law enforcement or identification as bogus by a local CERT)
- Frequency of change of information: In contrast to the Fast-Flux Service Networks described by the Honeynet Project, the information is not changed at high frequency
- Monitoring of DNS-IP address pairs and whois information: Because of the domain manipulation activities, the contents of the DNS-IP address pairs and whois information should be monitored immediately after they are identified.
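Two of the tactics listed above, the use of a portion of a class C subnet and the reuse of addresses across campaigns, suggest a simple way of linking domains: group the observed DNS-IP address pairs by their /24 network. The following is a rough sketch under stated assumptions, not the method used in our study: the function names are hypothetical, only IPv4 addresses are considered, and the sample data uses reserved documentation addresses and example domains.

```python
import ipaddress
from collections import defaultdict

def subnet_24(ip):
    """Return the /24 network (a former class C block) containing an IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def group_by_subnet(pairs):
    """Group domains by the /24 network of the IP address they resolved to.

    `pairs` is an iterable of (domain, ip_address) tuples. Only networks
    shared by more than one domain are returned, since those are the
    candidates for common infrastructure.
    """
    groups = defaultdict(set)
    for domain, ip in pairs:
        groups[subnet_24(ip)].add(domain)
    return {
        network: sorted(domains)
        for network, domains in groups.items()
        if len(domains) > 1
    }

# Placeholder observations (documentation address ranges, not real C2 data).
pairs = [
    ("a.example.com", "192.0.2.10"),
    ("b.example.net", "192.0.2.77"),
    ("c.example.org", "198.51.100.5"),
]
print(group_by_subnet(pairs))
```

A shared /24 network is only an indication, not proof, of a common operator, since unrelated customers of the same ISP may sit in the same block; such groupings are best treated as leads to be corroborated with the whois information.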