Interesting topic... I had just been rewatching the Matrix trilogy and got bored, so I switched to MalwareTips news, and voilà! The real game is better!
Back to the point and the video, with a quote from AVLab's methodology. I need to add a full description of how we choose a malware file for testing:
Question - What does the name of the "Advanced In-The-Wild Malware Test" mean?
avlab.pl
We use the following in-the-wild malware sources:
- MWDB project by CERT Poland.
- Malware Bazaar project by Abuse.ch.
- Starting from May, we will use public URLs from urlquery.net - Automated URL scanner
- Custom honeypots based on Dionaea. This is not a good source, though: it yields a lot of samples, but many of them are repeats with duplicate SHA hashes.
- Additional sources, if you can help us with that; we can then include them and automate them in our tests. We tried app.any.run, for example, but it does not have a suitable API for fetching only URLs.
In general, the whole industry used something like the WildList, which is no longer being developed. At AMTSO we are working on replacing it with our own list, but it's not easy: building something like this takes a lot of people, often working for free. AMTSO currently has its own RTTL list, but we do not use it.
What happens next?
1. Every URL potentially contains something that can be downloaded. We download it and check it (mal score.png):
a. The file is scanned by a Linux tool to identify its file type and is checked against the database for a duplicate SHA256 (if it is a duplicate, it is rejected and a new queue starts).
b. The file is scanned with some Yara rules:
- Rules included in packer_compiler_signatures.yar to detect broken or damaged portable executable files.
- Rules included in maldocs_index.yar to detect Microsoft Office files containing macros, whether good or bad.
- Rules included in anti_sandboxing.yar to detect anti-VM techniques that prevent a sample from executing in a Windows virtual environment.
c. Only after all of that does the sample run in the black box (Windows, without AV protection), where we check for potentially malicious changes based on Sysmon rules and logs, as mentioned by the author of the thread.
If the candidate file is "good" based on point c, its URL is passed as a parameter to the browser on all machines with security products installed. From this point onwards, the malware is analyzed on all of them at the same time and the response of the security software is checked.
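To make step a more concrete, here is a minimal sketch of the pre-filter in Python. It is only an illustration, not AVLab's actual code: the names `seen_hashes`, `looks_like_pe` and `triage` are hypothetical, the in-memory set stands in for the real database, and the two-byte "MZ" check is a simplified substitute for the Linux file-type tool (it would not accept Office documents, which the real pipeline also handles).

```python
import hashlib

# Hypothetical stand-in for the sample database: SHA256 digests already seen.
seen_hashes = set()


def sha256_of(data):
    """Return the hex SHA256 digest of a downloaded sample."""
    return hashlib.sha256(data).hexdigest()


def looks_like_pe(data):
    """Simplified type check: Windows executables begin with the bytes 'MZ'."""
    return data[:2] == b"MZ"


def triage(data):
    """Decide whether a downloaded sample moves on to the next step."""
    digest = sha256_of(data)
    if digest in seen_hashes:
        # Same SHA256 as an earlier sample: reject it.
        return "rejected: duplicate"
    seen_hashes.add(digest)
    if not looks_like_pe(data):
        return "rejected: unsupported type"
    return "queued for analysis"
```

Calling `triage()` twice with the same bytes queues the sample the first time and rejects it as a duplicate the second time, which is the filtering that makes the noisy honeypot feeds usable at all.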
In addition, after the May edition, we'll publish an external CSV with a third-party scanner's opinion on the malware. This is already implemented and it's not a secret: the technology provider is Arcabit/MKS_VIR from Poland. We do not test it, so I do not see a conflict of interest here. We will add this data to our changelog soon.
Hopefully, this will further help to exclude potentially useless, non-malware samples from AVLab tests (Advanced In The Wild Malware Test).
#####
If you are interested, I can make a video for you showing how it all works from the inside, step by step.