Malware Detection Using Yara And YarGen

Vickie Li

Malware can often be detected by scanning for a particular string or sequence of bytes that identifies a malware family. Yara is a tool that helps you do that. “Yara rules” describe the characteristics to look for, and Yara uses them to search files for patterns that might indicate the file is malicious. Let’s take a look at this example rule, taken from Yara’s official documentation.

rule silent_banker
{
   meta:
       description = "This is just an example"
       threat_level = 3
       in_the_wild = true
   strings:
       $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
       $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
       $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
   condition:
       $a or $b or $c
}

The rule above tells Yara that any file containing at least one of the following strings should be flagged as Silent Banker, a Trojan that steals banking credentials.

6A 40 68 00 30 00 00 6A 14 8D 91
8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9
"UVODFRYSIHLNWPEJXQZAKCBGMT"

We will look at how to read these rules later on in this post, but first, let’s install and run Yara!

Using Yara

Yara is multiplatform and supports both Windows and Unix-based systems. You can use it both as a command-line tool and a Python extension to use in your Python scripts. Please refer to the official documentation for a complete guide for installing Yara on different platforms and installing the Python extension.

Getting A Set Of Rules To Use

While you could write your own rules, there are plenty of well-defined Yara rule files available for download from GitHub repositories. You can simply search for the type of rules that you want and use that file in your Yara command.

Search: Yara rule executables files

Once you’ve found the appropriate rule file, download it with the command below, where FILENAME is the local name the downloaded file will be saved under and LINK_TO_FILE is the address of the raw file online.

curl -o FILENAME LINK_TO_FILE
curl -o executables.yar https://raw.githubusercontent.com/Xumeiquer/yara-forensics/master/file/executables.yar


Running Yara

To run Yara from the command line, run the command:

yara [OPTIONS] RULES_FILE TARGET

RULES_FILE points to a file that stores the Yara rules you want to use, while TARGET points to a file, a folder, or a process to be scanned. For example, let’s use Yara to check whether a given file is a PDF! You would first need to download the rules file that identifies PDFs from the yara-forensics repository on GitHub.

curl -o pdf.yara https://raw.githubusercontent.com/Xumeiquer/yara-forensics/master/file/pdf.yar


Then, run the Yara rules against the file we want to analyze:

yara pdf.yara TARGET_FILE_TO_ANALYZE
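
If you’d rather drive the command-line tool from a script, here’s a minimal Python sketch that wraps the command above. It assumes the yara binary is on your PATH and that it prints one “RULE_NAME TARGET_PATH” line per match, which is the default output when no extra options are passed. The function names parse_yara_output and scan are made up for this example.

```python
import subprocess


def parse_yara_output(output: str) -> list[tuple[str, str]]:
    """Split each 'RULE_NAME TARGET_PATH' line into a (rule, path) pair."""
    matches = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only, so paths containing spaces survive.
        rule, _, path = line.partition(" ")
        matches.append((rule, path))
    return matches


def scan(rules_file: str, target: str) -> list[tuple[str, str]]:
    """Run the yara command-line tool and return the rules that matched."""
    result = subprocess.run(
        ["yara", rules_file, target],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if yara exits with an error
    )
    return parse_yara_output(result.stdout)
```

Calling scan("pdf.yara", "TARGET_FILE_TO_ANALYZE") would return an empty list when no rule matches, and a list of (rule, path) pairs otherwise.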

Writing Your Own Yara Rules

Of course, if you can’t find Yara rules published online that suit your needs, you’ll need to write your own rules instead! To write a Yara rule, start by declaring the rule’s name using the following syntax:

rule RULE_NAME
{
// Rule definition goes here!
// Comments in Yara rules look like this!
}

Each Yara rule is composed of three sections: meta, strings, and condition. Let’s take a closer look at our previous example and break it down to see how this rule was written!

rule silent_banker
{
   meta:
       description = "This is just an example"
       threat_level = 3
       in_the_wild = true
   strings:
       $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
       $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
       $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
   condition:
       $a or $b or $c
}

The “meta” section of a rule contains the description, author, reference, date, hash, and any other relevant details of the rule. This section is optional and will not be used to classify malware.

meta:
       description = "This is just an example"
       threat_level = 3
       in_the_wild = true

The “strings” section contains string patterns that are used to identify malware. Each string in the “strings” section is identified with a variable name starting with a dollar sign.

strings:
       $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
       $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
       $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"

This is where you put signature strings that are indicative of the malware. Our example uses hex strings and text strings, but you can also use regex patterns.

strings:
       $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
       // Hex strings are enclosed within curly brackets.
       $b = "UVODFRYSIHLNWPEJXQZAKCBGMT"
       // Plain text strings are enclosed within double quotes.
       $c = /md5: [0-9a-fA-F]{32}/
       // Regex patterns are enclosed within slashes.
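
To get a feel for what each string type matches, here’s a rough Python analogue of the three definitions above. It only mirrors the idea and isn’t how Yara works internally. The function name matched_strings is invented for this sketch, and the regex is treated as a plain byte pattern.

```python
import re

# $a: a hex string becomes a raw byte sequence.
HEX_STRING = bytes.fromhex("6A 40 68 00 30 00 00 6A 14 8D 91")
# $b: a plain text string becomes its ASCII bytes.
TEXT_STRING = b"UVODFRYSIHLNWPEJXQZAKCBGMT"
# $c: a regex pattern, compiled to search bytes.
REGEX = re.compile(rb"md5: [0-9a-fA-F]{32}")


def matched_strings(data: bytes) -> set[str]:
    """Return the names of the strings that occur somewhere in data."""
    found = set()
    if HEX_STRING in data:
        found.add("$a")
    if TEXT_STRING in data:
        found.add("$b")
    if REGEX.search(data):
        found.add("$c")
    return found
```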

Finally, the “condition” section describes how the string patterns in the “strings” section should be used to identify a piece of malware. You can use boolean (and, or, not), relational (>, <, ==, and more), and arithmetic (+, -, *, \, %) expressions in this section. In our example, the rule specifies that if any one of the strings $a, $b, or $c is present, the file is a Silent Banker Trojan.

condition:
       $a or $b or $c

You can also define more complicated conditions like these.

condition:
       #a > 2 and $b
       // If $a occurs more than twice and if $b is present

condition:
       ($a and $b) or ($b and $c)
       // If both $a and $b are present, or both $b and $c are present
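
If the condition syntax feels abstract, the two conditions above can be sketched in Python like this. The helper names are hypothetical, and the sketch assumes that overlapping occurrences of a string each count toward #a.

```python
def count_occurrences(data: bytes, pattern: bytes) -> int:
    """Count every offset where pattern starts, including overlapping ones."""
    count = 0
    start = data.find(pattern)
    while start != -1:
        count += 1
        start = data.find(pattern, start + 1)
    return count


def condition_count(data: bytes, a: bytes, b: bytes) -> bool:
    """Mirror of: #a > 2 and $b"""
    return count_occurrences(data, a) > 2 and b in data


def condition_pairs(data: bytes, a: bytes, b: bytes, c: bytes) -> bool:
    """Mirror of: ($a and $b) or ($b and $c)"""
    return (a in data and b in data) or (b in data and c in data)
```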

There are many more ways to write Yara conditions. For more detailed specifications, check out the official Yara documentation.

Generating Yara Rules Using YarGen

Writing Yara rules manually often means that you risk writing rules that are either too specific or not specific enough. YarGen offers a fast alternative: it generates Yara rules that are both flexible and comprehensive, given a malware file or a directory of malware files as input. It does this by identifying the strings found in the malware file while removing known strings that also appear in non-malicious files. You can download the latest version of YarGen from the releases section of its GitHub page.
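
The core idea, extracting strings from a sample and discarding those that also show up in goodware, can be sketched in a few lines of Python. This is a simplified illustration and not YarGen’s actual implementation. The function names and the minimum string length of 6 are assumptions made for the example.

```python
import re

# Runs of 6 or more printable ASCII characters. The minimum length is an
# arbitrary choice for this sketch, not a YarGen default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")


def extract_strings(data: bytes) -> set[str]:
    """Pull printable ASCII strings out of a binary blob."""
    return {m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)}


def candidate_strings(malware: bytes, goodware_strings: set[str]) -> set[str]:
    """Keep only strings that are not already known from goodware."""
    return extract_strings(malware) - goodware_strings
```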

Running YarGen

The most basic usage of YarGen is this command.

python yarGen.py -m PATH_TO_MALWARE_DIRECTORY

This command will scan and create rules for the malware files under PATH_TO_MALWARE_DIRECTORY. A file named yargen_rules.yar will be created in the current directory, containing the rules generated.

(A sample rule generated by YarGen can be found in YarGen’s official documentation.)

Reading YarGen Rules

YarGen-generated rules look just like any typical Yara rule. However, YarGen groups the strings in the strings section by how likely they are to be indicators of malware. There are three categories, marked by $s, $x, and $z.

String names that start with $s are “Highly Specific Strings” that will not appear in legitimate software. These can include malicious server addresses, the names of hacking tools and malware, hacking tool output, and typos in common strings. For example, malware that tries to masquerade as legitimate software will sometimes contain misspelled words like “Micorsoft” or “Monnitor.” Strings that start with $x are “Specific Strings” that are likely to be indicators of malware but might also appear in legitimate files. Lastly, strings that start with $z are likely ordinary but are not currently included in the goodware string database.

YarGen uses a combination of a magic header, file size, and strings for the condition section. For example, the condition of the sample rule mentioned above specifies that a file needs to have the magic header 0x5a4d (the “MZ” signature of Windows executables), be smaller than 3785 KB, and contain all of the strings in the “strings” section to be classified as a “backdoor.”

condition:
   uint16(0) == 0x5a4d and filesize < 3785KB and all of them
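
Here’s a rough Python equivalent of that condition, in case the individual checks are easier to read that way. It relies on two details of Yara’s syntax: the 16-bit integer at offset 0 is read as little-endian, and the KB suffix multiplies by 1024. The function name matches_condition and the strings argument are invented for this sketch.

```python
import struct

MAX_SIZE = 3785 * 1024  # Yara's KB suffix multiplies by 1024


def matches_condition(data: bytes, strings: list[bytes]) -> bool:
    """Mirror of: 0x5a4d magic at offset 0, filesize < 3785KB, all of them."""
    if len(data) < 2:
        return False
    # Read a little-endian 16-bit integer at offset 0, so 0x5a4d
    # corresponds to the bytes 4D 5A ("MZ").
    (magic,) = struct.unpack_from("<H", data, 0)
    return (
        magic == 0x5A4D
        and len(data) < MAX_SIZE
        and all(s in data for s in strings)
    )
```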

Some Tips On Using YarGen

Now that you’ve got the basics down, here are a few tips to improve YarGen’s performance and help you generate the most accurate rules possible.

As mentioned earlier, YarGen removes known strings that also appear in non-malicious files. These known “good strings” live in YarGen’s built-in “Goodware database.” One way to fine-tune YarGen’s behavior is to use your own goodware database. This way, you can include more detailed goodware strings and update the database as you learn more about different malware and goodware samples. You can create a new local goodware database by using the “-c” flag. You also need to specify the “-g” flag and point YarGen to a directory of goodware to generate the database from.

python yarGen.py -c -g PATH_TO_GOODWARE_FILES -i DATABASE_IDENTIFIER

You can update a goodware database with new input files by using this command.

python yarGen.py -u -g PATH_TO_GOODWARE_FILES -i DATABASE_IDENTIFIER

Another good practice is to use YarGen scores intelligently. YarGen assigns a “score” to each string based on how likely it is to indicate malware. By default, YarGen uses the top 20 strings in a rule. You can see how a string is scored by using the “--score” flag.

python yarGen.py --score -m PATH_TO_MALWARE_DIRECTORY

You can tell YarGen to only use strings that have a certain minimum score using the “-z” flag.

python yarGen.py -z 5 -m PATH_TO_MALWARE_DIRECTORY
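
To picture what a minimum score does to the final string list, here’s a small Python sketch of the filtering step. It is only an illustration: the function name and the scores in the usage note are made up, and treating the cut-off as inclusive is an assumption rather than documented YarGen behavior.

```python
def select_strings(scores: dict[str, int], min_score: int, limit: int = 20) -> list[str]:
    """Keep strings scoring at least min_score, best first, capped at limit."""
    kept = [(s, score) for s, score in scores.items() if score >= min_score]
    # Sort by descending score; ties are broken alphabetically for stable output.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [s for s, _ in kept[:limit]]
```

With hypothetical scores such as {"evil-c2.example": 12, "Micorsoft": 8, "GetProcAddress": 1}, a minimum score of 5 would drop the common API name and keep the other two.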

You should first run YarGen with the “--score” flag to determine the top strings and their scores. Then, decide on an appropriate cut-off score and run YarGen again with a minimum score requirement. This will help you generate more concise and efficient rules.

A YarGen rule can be either a simple rule or a super rule. If multiple sample files are used, YarGen will try to identify the similarities between the samples and combine the identified strings into a “super rule.” Super rules can be identified by a line in the meta section of the rule:

meta:
       description = "This is a super rule for all files in the directory."
       super_rule = 1
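
The intuition behind a super rule can be sketched as a set intersection: keep the strings that every sample has in common. YarGen’s real logic is more involved than this, so treat the snippet as a simplified picture. The function name shared_strings is hypothetical.

```python
def shared_strings(samples: dict[str, set[str]]) -> set[str]:
    """Return the strings that appear in every sample."""
    string_sets = list(samples.values())
    if not string_sets:
        return set()
    return set.intersection(*string_sets)
```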

Super rules are useful when you have a couple of samples of the same type of malware. You can put them in the same directory and use YarGen to find the similarities. This way, you can avoid creating rules that generate lots of false positives or false negatives.

There are still plenty of options that you can use to customize the behavior of YarGen! To see all the command line parameters, you can run this command for help.

python yarGen.py --help

Finally, don’t forget that after YarGen generates a rule, you can always add new strings and conditions to it as you see fit!


Vickie Li is a professional investigator of nerdy stuff, with a primary focus on web security. She began her career as a web developer and fell in love with security in the process. Now, she spends her days hunting for vulnerabilities, writing, and blogging about her adventures hacking the web.