Q&A Few questions regarding malware analysis lab - how to do it properly?

hunter44

New Member
Thread author
Feb 13, 2022
8
I created a home malware analysis lab. I am a newbie in this area, so I decided to ask you a few questions about it:

1. My lab consists of Kali as the host OS, a REMnux VM (which acts as a gateway) and two Windows VMs: Win7 and Win10. When you analyse a sample, where do you usually do it? Is static analysis done in REMnux (or, in general, in another environment) and behavioural analysis in Windows? Or could everything be done in REMnux (e.g. with Cuckoo sandbox)? I'm asking because I do not understand why I should put all tools in every VM, or which environment I should use to run a malware sample. For me it might be enough to have only one VM (REMnux) and analyse everything there, but I'm not sure about Windows samples: can they be run on Linux too for dynamic analysis?

2. What is your general approach to analysing malware? Could you give some tips on what is worth doing first and what next? For example, say I get a malware sample and have no idea what it does. I would start with a Cuckoo analysis, then do some static analysis and then run it on a VM. Is that a good approach, or how should I choose what to do first?

3. I found some places to get malware samples, but is there any recommendation where I could get samples by difficulty level? I would like to start with some easy examples and then try more difficult ones, to learn new things.

Probably I will have more questions in the future, but for now I would like to know those basics, which would be very helpful.
 

upnorth

Moderator
Verified
Staff member
Malware Hunter
Well-known
Jul 27, 2015
4,881
Try to find analysis reports that provide malware hashes, and follow along with those reports while analysing the same sample.
Sadly, some analysts' reports leave out the hashes, which are pretty crucial: without them you risk ending up with nothing.
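
To check that the file you downloaded really is the sample a report talks about, you can compare its hash with the one quoted in the report. A minimal Python sketch, assuming Python 3; the function names `file_hashes` and `matches_report` are made up for illustration and do not come from any tool:

```python
import hashlib


def file_hashes(path, chunk_size=1 << 20):
    """Return MD5, SHA-1 and SHA-256 hex digests of a file, read in chunks."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def matches_report(path, reported_hash):
    """True if any digest of the file equals the hash quoted in a report."""
    wanted = reported_hash.strip().lower()
    return wanted in file_hashes(path).values()
```

Hashing only reads the file, so this can be done without ever executing the sample.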

@struppigel is one of the best on this forum to give you hints and tips on your topic, as he's a malware analyst at the AV company GData.
 

struppigel

Moderator
Verified
Staff member
Well-known
Apr 9, 2020
524
1. Using one VM for everything, static and dynamic analysis, is just fine and probably the most convenient option in your case. For dynamic analysis of Windows malware, you need Windows. So I recommend you use a Windows VM for everything, if your focus is on Windows malware. Malware analysts often separate the static analysis stuff because they work on machines that need access to internal systems on their company's network. But as a hobby analyst, you don't have this issue.

2. I made an entire video about that (summary starts at 30:00). But the tl;dr is: Extract strings, use Detect It Easy or file type identifiers, and skim through the file in a hex editor first. Simultaneously submit it to an automatic analysis system just to get a superficial overview of the behaviour. Then you decide based on your findings what tools are appropriate to go forward. E.g. if it is an SFX, you extract it first. If it is .NET, you check it in dnSpy. If it is UPX packed, you unpack it first, ...
So you always go from the superficial information and metadata to the details like actual code. Not the other way around, otherwise you end up disassembling code of, e.g., the Python runtime environment because you did not realize it is a PyInstaller file that you got. Reading the code and debugging is the last thing to do.


3. See what upnorth posted.
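
The "superficial first" order from point 2 can be sketched in a few lines of Python. This is only an illustration under assumptions: the magic-byte table and the marker list are tiny, hand-picked examples (real identifiers such as Detect It Easy know far more), and the names `identify`, `extract_strings` and `triage` are hypothetical.

```python
import re

# Illustrative magic-byte table, checked against the start of the file.
MAGIC = [
    (b"MZ", "DOS/PE executable"),
    (b"\x7fELF", "ELF executable"),
    (b"PK\x03\x04", "ZIP container"),
    (b"Rar!", "RAR archive"),
    (b"%PDF", "PDF document"),
]

# Byte markers that hint at which tool to reach for next (heuristics only).
HINTS = [
    (b"UPX!", "UPX marker: try unpacking before reading code"),
    (b"_CorExeMain", ".NET import: look at it in a .NET decompiler"),
    (b"MEI\x0c\x0b\x0a\x0b\x0e", "PyInstaller archive cookie: extract the Python code first"),
]


def identify(data):
    """Guess the file type from its first bytes."""
    for magic, label in MAGIC:
        if data.startswith(magic):
            return label
    return "unknown"


def extract_strings(data, min_len=5):
    """Return printable ASCII and UTF-16LE strings of at least min_len characters."""
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_pat.finditer(data)]
    return found


def triage(data):
    """Collect the cheap, superficial facts before any code is read."""
    return {
        "type": identify(data),
        "hints": [label for marker, label in HINTS if marker in data],
        "strings": extract_strings(data),
    }
```

Usage would be something like `triage(open(path, "rb").read())` on a sample that is only read, never executed. A hit in `hints` is a reason to unpack or extract first, not proof of anything.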
 

hunter44

Thank you very much! This is a great description of how to start, I will try to do it. Regarding VMs, I will keep REMnux as a gateway and network monitoring tool and will install all the needed tools on the Windows VM. Do you also have some tools to recommend (apart from the general ones like PeStudio, Process Hacker, Wireshark etc.)? Maybe something less known but also good and useful.

What do you mean by "analysts often separate static analysis stuff"? I thought static analysis is less risky than dynamic analysis and could be done without a specific environment. It is just a general overview of the malware without running it, so it sounds (to me) like a safer method.
 

struppigel

What do you mean by "analysts often separate static analysis stuff"? I thought static analysis is less risky than dynamic analysis and could be done without a specific environment. It is just a general overview of the malware without running it, so it sounds (to me) like a safer method.
Yes, it is safer, hence you can perform it on a system that has access to an internal company network.
I was just saying this because for you it does not make sense to use several VMs to separate static from dynamic analysis, whereas it can make sense for someone working at a company.
 

hunter44

Understood.
One more thing came to my mind when I watched your video. As far as I can see, you have a normal internet connection in the VM where you inspect malware. Is that common, or does it depend on the type of malware under investigation? Personally, I disabled the internet connection on my Windows VM; it can only communicate with the REMnux machine, where I look at the network traffic. Most of the tutorials, videos and articles I have read also say that this should be the first thing to do: do not connect your VM to the internet.
I download samples to my host and just move them to the VM to open them there. Maybe I should always download them in the VM? I'm also wondering what to do if I want to get reports from dynamic sandboxes. Should I open a connection from the VM to AnyRun or another online tool?
I also don't think that every malware sample completely rules out using the internet during analysis, but could you elaborate on this and explain when it is really safe to run a sample with the internet connection turned on and when it is not?

Also one more question: do you recommend any books on malware analysis? I have already read "Practical Malware Analysis" by Sikorski and Honig, but maybe there are other books worth reading on this topic.
 

struppigel

It is correct that you should not enable real Internet for dynamic analysis. The main reason is that worms may infect other systems connected to your network. But it may also be an issue if personal data gets exposed depending on what you have on the VM (you may expose internal setup or tools or shared folder contents). So generally you should avoid executing unknown samples or known worms with real Internet connection.

I have a different approach when it comes to creating videos than in my daily work, because I don't want to show my host system and it is inconvenient to switch to the host system during the recording. The samples I show on video are the ones I already analysed and I know exactly what they do.
When I am not creating videos, I do things like sandbox analysis submission on the host.

do you recommend some books of malware analysing?
I have not read it yet, but my colleagues like Malware Analysis and Detection Engineering: A Comprehensive Approach to Detect and Analyze Modern Malware.
 

hunter44

It is correct that you should not enable real Internet for dynamic analysis. The main reason is that worms may infect other systems connected to your network. But it may also be an issue if personal data gets exposed depending on what you have on the VM (you may expose internal setup or tools or shared folder contents). So generally you should avoid executing unknown samples or known worms with real Internet connection.
What is the right approach when choosing a sample? When I find a sample, I do not know whether it is a network worm or something else that could spread in my network. Is there any way to choose easier samples that do not cause a lot of damage? I looked at MalwareBazaar and, as far as I saw, there are no filters like difficulty level, so I think there is no rule for this and I can take anything. So imo the best way is to always keep the internet connection off.

I have not read it yet, but my colleagues like Malware Analysis and Detection Engineering: A Comprehensive Approach to Detect and Analyze Modern Malware.
Thanks. Will take a look at it.
 

hunter44

I decided to bump the topic, because I have a few questions regarding RE of malware.
1. I'm using IDA (free). What is the best way to find the main function in disassembled code? Is it even possible? I read about FLIRT signatures, but I'm not sure how to use them or whether they even work the way I want. With a huge codebase it is very difficult to find the main function of the program. I try to analyse function by function with all the info (like strings, methods, etc.) I found during static analysis of the file. But in the end I think I'm a little bit stuck, because I can analyse multiple functions but cannot reach a conclusion: where are they used, and what does the flow look like?
2. What is the best approach to analysing code statically? What are your tips and tricks to get as much information as you can?
3. What should be the main goal when reverse engineering malware? Do we only need to find out what functionality the malware has, or is there always a specific goal (apart from how it works)? What should I look at and search for from start to end to get a view of the whole functionality?
 

struppigel

1. There is no general answer to that. It depends on which compiler was used. I suggest you pair your RE learning efforts with programming small C or C++ programs. Important: Disable compiler optimizations when doing so. Write really small programs and compile them. Check the compiled sample in IDA, try to find that code again and understand what it looks like in the disassembly.
2. Do the easy stuff first. Check the strings, e.g., with strings.exe. Check it in a hex editor. Then look into the code. While analysing the code, make sure to add comments and rename functions along the way. Look up the API calls. Make sure you understand and correctly identify the calling convention that is used.
3. That depends entirely on you. You are the one who sets the goals. When I check malware for blog articles, I usually try to find everything that seems novel to me. Techniques I haven't seen before. Apart from that persistence and spreading techniques are usually important and so is the main functionality or damage it does to a system. Any hints to the threat actor are interesting as well.
 

hunter44

Hi, it's me again. I am coming back to you with some questions regarding malware analysis. I have gained some knowledge over the last 2 months: I read a few books, reversed my own programs and also did all the labs from the Practical Malware Analysis book, which gave me very good basics (I know the book is old, but it still teaches the basics very well). I understand a lot more, but still not enough ;).
I want to go further, so I decided to ask about a few things again.

1. What is a common or best approach to start analysing when you run a real sample? I tried to run many samples and most of them do nothing. They just start and that's all. No CPU usage, Procmon shows nothing, same with Autoruns. I'm wondering why? I have separate VMs (Windows to run samples and Linux to monitor the network, both on an isolated, private network with fake DNS) with all the needed tools and would like to do some basic dynamic analysis first, but often it is impossible, because I do not get any results.
For example this sample - MalwareBazaar | Browse malware samples
It doesn't seem to have anti-analysis stuff and when I ran it, it only started and nothing more: no registry activity, no network etc. Wondering why?

2. Anti-analysis techniques: do you have some advice regarding this? Are there articles or other material that could help to remove them from the code, so I can work on "clean" malware without any anti-analysis tricks? How do you deal with anti-analysis issues, and how do you prepare your sample to run and to work with? I see that I have a big problem recognizing what is "good" code and what is junk and anti-analysis code. How can I deal with that?

3. Static analysis with IDA: what is your approach to analysing code statically? I see that I often fall into a rabbit hole; I want to analyse everything step by step and in the end I know nothing about the analysed code. I think I'm doing something wrong, but it is difficult to find out what would be better. That's why I'm asking: how do I skip unnecessary stuff during analysis and focus only on the important parts? What are those important parts? For example, my first tactic is always to check strings, check general info about the file, run it on a VM and observe the behaviour. Then I look at some specific parts in IDA, like registry creation, network connections (or other internet stuff), file creation/modification, etc. But a lot of samples have a huge codebase, so it is difficult to focus on only one part and I often get lost in it.

4. Samples: are there any "easier" or better malware families for a beginner to analyse? I know that in general all malware is a lottery, but maybe some types are better to start with than others?
Do you advise filtering samples somehow?

5. Could you elaborate (e.g. step by step) on your own tactics and approaches when you start analysing samples? How do you start? What are you looking for first? When are you ready to go deeper, etc.? It would be great to have a general overview, because then I may understand more and create my own path.

Hope you find some time to answer. Maybe these are stupid questions, but I would like to understand the fundamentals well.
 

struppigel

1. Without analysing this sample: I can see that it is very old and a RAT. Those will generally not show much behaviour because they need a working C&C. Even two weeks can be too old for that. RATs always need that, and downloaders also need some working download locations. However it should be possible to see the network communication attempt, at least. Some samples have long sleeps and only run after some time. I sometimes let samples run over lunch and see what they were up to after my break.
Again, this is a generalized answer, I did not analyse the sample you gave. It is also possible there was an anti-reversing technique, that the sample is corrupt, buggy or needs some environmental factors (libraries, frameworks, other files, ...) that are not installed.
The best approach if it does not run is to do static in-depth analysis and debugging, see where it exits and what conditions were checked before that or what kind of exception is thrown.

2. To get deeper into that, check out this anti-reversing reference. Do not only read it but build your own samples by writing assembly or C code (I personally used FASM), then disassemble/debug them with IDA. Defeating a technique is usually not that hard once you are able to recognize it, because most of the time all you gotta do is change a conditional check or a jump.
I started this way. I didn't work through the whole book, but it still helped a lot to recreate some of them.
There are also ways to harden your VM to defeat some VM detection techniques. Google them.
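
As a rough illustration of "change a conditional check or a jump": on x86, a short `JNZ` is opcode `0x75` and a short `JZ` is `0x74`, so inverting a check is often a one-byte patch, and removing it is two `NOP` bytes (`0x90`). The snippet below is a sketch only. The byte sequence is a made-up example of a debugger check (`call`, `test eax, eax`, `jnz exit`), not taken from any real sample, and `patch_bytes` is a hypothetical helper name.

```python
# Short-jump opcodes on x86 (each is followed by a 1-byte relative offset).
JZ, JNZ, NOP = b"\x74", b"\x75", b"\x90"

# Hypothetical check: call [0x402000]; test eax, eax; jnz +7 (exit path).
SNIPPET = bytes.fromhex("ff 15 00 20 40 00 85 c0 75 07")


def patch_bytes(data, offset, expected, replacement):
    """Return a patched copy of data; refuse if the bytes are not what we expect."""
    if len(expected) != len(replacement):
        raise ValueError("patch must not change the length of the code")
    if data[offset:offset + len(expected)] != expected:
        raise ValueError("unexpected bytes at offset, refusing to patch")
    patched = bytearray(data)
    patched[offset:offset + len(replacement)] = replacement
    return bytes(patched)


# Option 1: remove the check by overwriting the 2-byte jump with NOPs.
patched = patch_bytes(SNIPPET, 8, JNZ + b"\x07", NOP * 2)

# Option 2: invert the condition (JNZ becomes JZ), keeping the jump target.
inverted = patch_bytes(SNIPPET, 8, JNZ, JZ)
```

In practice you would do the same edit in the debugger or in IDA's patch view. The point is only that the fix is small once you have recognized the check.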

3. Don't be too hard on yourself. You just started, so naturally you will get lost because you won't recognize what is in front of you. I recommend to set small goals. Concentrate on fully analysing one (smaller) function instead of the whole malware code. I recommend the book Reversing: Secrets of Reverse Engineering. It is an old book but still relevant, because it will open your eyes to how you go about reversing in general. How you determine the calling convention. How you find out what the code does if you have nothing else (no strings etc).
Currently your biggest issue is probably recognizing common functions and algorithms. This only comes with experience. Use comments and variable and function renaming in IDA to make the code more readable while you work through it.
If real samples are still too intimidating and you get lost too often, program smaller samples yourself. C or C++ are ideal candidates here. Disable compiler optimization, then compile the sample, then disassemble it and try to find and understand what each part of that code looks like in the disassembly.
Btw this is something I still do if there are new execution environments that I don't understand yet, e.g., Go binaries look a bit different than C and C++ applications, so you can help yourself to learn how they work by compiling your own program and disassembling it afterwards.
What you can also do is pick a sample that is referenced by a malware analysis article. Then try to find and understand the code pieces that the article mentions.

4. Generally stealers, downloaders and keyloggers tend to be easier. .NET assemblies, PyInstaller malware, Batch2EXE wrapped files also tend to be a bit more beginner friendly.
As a beginner I would keep my hands off of worms, viruses (file infectors) and ransomware. Not because they are hard, but because they are risky.
RATs are interesting, but not ideal to start with because they need a working C&C to see action in dynamic analysis.
Also, as a beginner, I would avoid Go, VB6 and Delphi binaries. C++ can be difficult if it uses a lot of object oriented code, but if not, it is much like C. Code written in assembly and C tends to be a bit more beginner friendly.

5. There is no generic approach to this. It highly depends on the sample. It depends on what you see in there and what your goal of the analysis is. In my daily work I mostly create detection signatures, so static analysis of file and memory dumps to find patterns for the signature is enough in 90% of the cases.
When I hunt and pick samples for blog articles, I spend a lot of time figuring out if the sample is already a known family or something new, so I research a lot and take detection names on VT, Intezer, YARA rule matches, sandbox reports and typical strings as a basis for the research.
When I finally have a sample to analyze in-depth, most of the time is usually spent on deobfuscation (if the sample is obfuscated). For my latest sample this task took me two weeks (with breaks, though, it would have been one week or so if I did not have other tasks in between). Reading the code once it is deobfuscated, is not really the issue.
I fear I cannot help you with this question because it is too broad.
Watch some malware analysis YouTube channels like OALabs. There you can see how people approach specific samples.
 

hunter44

Thank you for answers.

1. Without analysing this sample: I can see that it is very old and a RAT. Those will generally not show much behaviour because they need a working C&C. Even two weeks can be too old for that. RATs always need that, and downloaders also need some working download locations. However it should be possible to see the network communication attempt, at least. Some samples have long sleeps and only run after some time. I sometimes let samples run over lunch and see what they were up to after my break.
Again, this is a generalized answer, I did not analyse the sample you gave. It is also possible there was an anti-reversing technique, that the sample is corrupt, buggy or needs some environmental factors (libraries, frameworks, other files, ...) that are not installed.
The best approach if it does not run is to do static in-depth analysis and debugging, see where it exits and what conditions were checked before that or what kind of exception is thrown.
Understood. I ran it with x64dbg and indeed, it fell into some thread sleep loop. But you are probably right: with my fake DNS and redirection it would never reach the C&C server.

2. To get deeper into that, check out this anti-reversing reference. Do not only read it but build your own samples by writing assembly or C code (I personally used FASM), then disassemble/debug them with IDA. Defeating a technique is usually not that hard once you are able to recognize it, because most of the time all you gotta do is change a conditional check or a jump.
I started this way. I didn't work through the whole book, but it still helped a lot to recreate some of them.
There are also ways to harden your VM to defeat some VM detection techniques. Google them.
Yes, I know this article; I have already read it and keep it at hand all the time. I will look into VM detection, because that probably causes more problems for me: I often cannot work out where the anti-VM code is in the sample.
And yes, I create some small C/C++ programs and disassemble them to look for specific structures, behaviours or patterns. It is helpful, but it is also very different from the malware code in real samples.

3. Don't be too hard on yourself. You just started, so naturally you will get lost because you won't recognize what is in front of you. I recommend to set small goals. Concentrate on fully analysing one (smaller) function instead of the whole malware code. I recommend the book Reversing: Secrets of Reverse Engineering. It is an old book but still relevant, because it will open your eyes to how you go about reversing in general. How you determine the calling convention. How you find out what the code does if you have nothing else (no strings etc).
Currently your biggest issue is probably recognizing common functions and algorithms. This only comes with experience. Use comments and variable and function renaming in IDA to make the code more readable while you work through it.
If real samples are still too intimidating and you get lost too often, program smaller samples yourself. C or C++ are ideal candidates here. Disable compiler optimization, then compile the sample, then disassemble it and try to find and understand what each part of that code looks like in the disassembly.
Btw this is something I still do if there are new execution environments that I don't understand yet, e.g., Go binaries look a bit different than C and C++ applications, so you can help yourself to learn how they work by compiling your own program and disassembling it afterwards.
What you can also do is pick a sample that is referenced by a malware analysis article. Then try to find and understand the code pieces that the article mentions.
Yes, I'm trying. I also found a new sample which is much better to analyse; I can recognize many things in it and understand what it is doing and how it works. Also, function graphs in IDA are amazing, because I can choose an interesting path at the beginning and analyse it part by part.
I also started this book and am trying to apply those methods to my work. What I found when working with IDA is that I often do not know where, e.g., a given variable or parameter is set and what value it holds. That is hard for me. But I also tried to solve it by running the sample in a debugger and watching all the values set in the given function.

This is my common approach when I look for new samples: before I even download a sample, I read articles, Malpedia and other reports on the specific type of malware. Once I have all the theoretical background, I start to analyse the sample.


4. Generally stealers, downloaders and keyloggers tend to be easier. .NET assemblies, PyInstaller malware, Batch2EXE wrapped files also tend to be a bit more beginner friendly.
As a beginner I would keep my hands off of worms, viruses (file infectors) and ransomware. Not because they are hard, but because they are risky.
RATs are interesting, but not ideal to start with because they need a working C&C to see action in dynamic analysis.
Also, as a beginner, I would avoid Go, VB6 and Delphi binaries. C++ can be difficult if it uses a lot of object oriented code, but if not, it is much like C. Code written in assembly and C tends to be a bit more beginner friendly.

That's good to know. Keyloggers and stealers, as I see, are commonly recommended for beginners. I also found a new sample, the Ardamax keylogger (theZoo/malware/Binaries/Keylogger.Ardamax at master · ytisf/theZoo), and as I said above, I find it pretty friendly to analyse. It runs and does something. It was packed, but I unpacked it with a debugger and it seems to be a good one to start with.

I also tried to run some ransomware in my isolated env (GandCrab or WannaCry), but when I looked at the code in IDA it was quite big and difficult. I hope to come back to these samples in the future. I'm also focused only on C/C++ PE samples, so all the Delphi and .NET ones need to wait for now.
5. There is no generic approach to this. It highly depends on the sample. It depends on what you see in there and what your goal of the analysis is. In my daily work I mostly create detection signatures, so static analysis of file and memory dumps to find patterns for the signature is enough in 90% of the cases.
When I hunt and pick samples for blog articles, I spend a lot of time figuring out if the sample is already a known family or something new, so I research a lot and take detection names on VT, Intezer, YARA rule matches, sandbox reports and typical strings as a basis for the research.
When I finally have a sample to analyze in-depth, most of the time is usually spent on deobfuscation (if the sample is obfuscated). For my latest sample this task took me two weeks (with breaks, though, it would have been one week or so if I did not have other tasks in between). Reading the code once it is deobfuscated, is not really the issue.
I fear I cannot help you with this question because it is too broad.
Watch some malware analysis YouTube channels like OALabs. There you can see how people approach specific samples.
I thought so. Everything depends on the specific sample, and every approach would be different.
Regarding obfuscated programs: are there any common techniques/programs or good books for learning how to deobfuscate strings? As far as I know, a simple XOR or base64 is often used for this, but maybe there are more sophisticated algorithms which are more difficult to solve? 99% of the samples I see have obfuscation, so it would be nice to know how to deal with it.

Regarding videos, I mostly watch OALabs and cybercdh. Sometimes the John Hammond channel, but it is not strictly related to malware analysis.
 

struppigel

is there any common techniques/programs or some good books to learn how to deobfuscate strings?
String deobfuscation tends to be one of the easier tasks, since it is often just one or more functions called to decode the strings. So in many cases you can find that function and decode the string by checking its output. That even works if you have no idea what algorithm is used for decoding. A step up from that is reversing this algorithm and programming your own string decoder, e.g., in IDAPython for automatic decoding of all the strings.
XOR and base64 are common. I also use CyberChef to test hypotheses, because sometimes I see a string and think it looks like XOR. But often you have custom algorithms.
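
Testing an "it looks like XOR" or "it looks like base64" hypothesis can also be scripted. The following is a minimal Python sketch, not a general decoder: it assumes a single-byte XOR key and a known plaintext fragment (a "crib") such as `http` that you expect in the decoded string. The function names are hypothetical.

```python
import base64
import binascii


def xor_bytes(data, key):
    """XOR data with a repeating key (a one-byte key is the simplest case)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def find_single_byte_xor_keys(data, crib):
    """Return every one-byte key whose decoded output contains the crib."""
    return [key for key in range(256) if crib in xor_bytes(data, bytes([key]))]


def try_base64(blob):
    """Return the decoded bytes if blob is strict base64, otherwise None."""
    try:
        return base64.b64decode(blob, validate=True)
    except (binascii.Error, ValueError):
        return None
```

For example, `find_single_byte_xor_keys(encoded, b"http")` returns the candidate keys, and `xor_bytes(encoded, bytes([key]))` then gives the decoded string to inspect. Custom algorithms will not fall to this, in which case running the sample's own decode function and reading its output is the easier route.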

Control flow obfuscation is harder to deal with. If you are lucky, someone has already created a tool. But otherwise it is a pita.

I think you are already doing very well in regards to your malware analysis learning. You will reach learning plateaus where you think you learn nothing, but that is normal for any kind of learning. Keep going if that happens. At some point you will look back and realize how much you have actually learnt. Reversing needs quite some frustration resistance at times. Also, you will never stop learning, the field is so vast. I have been doing this professionally since 2015 and still feel like a noob sometimes.
 

hunter44

Got it. Will take a deeper look at these obfuscation techniques, thanks.

One more thing regarding the analysis of RATs/botnets or other types of malware that use a C&C. I understand it is hard to analyse old samples, because the domains of the C&C servers probably do not exist anymore, but how do you simulate this or prepare your environment to test/analyse this kind of sample? The connection to an external server is the "main" logic there, but you probably don't want to send requests directly to the attacker's domain, so what is the approach? Do you change hosts somehow to fake this connection and let the malware think it is connected to the real domain?
As of now my lab looks like this:
- Hyper-V with a configured network
- Ubuntu VM with an external and a private network: external to download samples, private for an isolated connection between this machine and the Windows machine. I use Burp and INetSim.
- Windows 10 VM (FLARE VM package) with only the private network configured (totally cut off from the internet), so all network requests from malware are redirected to INetSim and Burp.

As I understand it, this configuration does not give me a 100% chance to run RATs or botnets dynamically and see what they are doing? At some point, all such malware will sleep or wait for a connection to the C&C. So what should I change here to prepare the environment for this type of malware?
 

struppigel

One approach is to redirect any connections (to either localhost or a second VM) and run your own C&C server there, or fake the responses using INetSim or similar tools. You'd need to analyse the expected responses and write the server yourself. But tbh, I rarely do this. I mostly just analyse the code statically then. I have done this only two times or so. For me it is rarely worth the time.
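
For HTTP-based C&C traffic, a stand-in server of this kind can be very small. The sketch below uses only the Python standard library and rests on assumptions: the sample speaks plain HTTP, and `CANNED_RESPONSE` is a placeholder, because the reply a given family expects has to come out of your own analysis. `FakeC2Handler` and `start_fake_c2` are hypothetical names.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder reply; a real sample usually expects a family-specific format.
CANNED_RESPONSE = b"OK"


class FakeC2Handler(BaseHTTPRequestHandler):
    """Log every request and answer all of them with the canned response."""

    requests_seen = []  # shared log of (method, path, body) tuples

    def _handle(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length else b""
        self.requests_seen.append((self.command, self.path, body))
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(CANNED_RESPONSE)))
        self.end_headers()
        self.wfile.write(CANNED_RESPONSE)

    do_GET = _handle
    do_POST = _handle

    def log_message(self, fmt, *args):
        pass  # keep the console quiet; requests_seen is the log


def start_fake_c2(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), FakeC2Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a lab you would bind it to the address on the isolated network that the redirected traffic ends up at, then read `FakeC2Handler.requests_seen` to see what the sample sent. Anything that is not plain HTTP (custom TCP protocols, TLS with pinning) needs a different stand-in.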

As I understand it, this configuration does not give me a 100% chance to run RATs or botnets dynamically and see what they are doing?
You don't need 100% chance that they show behavior. If they don't show behavior, you continue with other methods of analysis.
In most cases you will see at least persistence and injection/unpacking behavior, though.