Advice Request I am head of research at Emsisoft. Ask me anything! :)

Fabian Wosar · Mar 6, 2019

motox781 said:
"AI" Next Gen AVs...Hype or real?

There is no AI AV out there. All of them use human-assisted machine learning, which by its very definition is not AI. Machine learning has been in use by pretty much every single major AV (what they call "legacy") in one form or another for literally over a decade. The oldest and most prominent one being systems that automatically recognise malware families, clustering samples and extracting appropriate signatures from them.

For example, we use machine learning extensively inside the anti-malware network. Our signature generation tools also automatically suggest certain functions and code fragments that would make good signatures.

So all those fancy machine learning based technologies are available in one way or another to most traditional anti-viruses as well. Except that they, in addition, also have all those other technologies at their disposal: emulation, behaviour monitoring, signature scanning, reputation-based anomaly detection, etc. You know, the ones that traditional AVs always had, making them, in general, more flexible.

Scorpion Illuminati · Mar 6, 2019

Fabian Wosar said:
PHP is just horrible. It doesn't scale particularly well compared to DotNet Core or even Python for example. Go is really interesting and definitely a good pick. Especially if you want to go into orchestration and devops.

Does your dev team use Git? I know the basic commands (add, commit, push, branch, status, etc.) and want to know whether that will help me.

Fabian Wosar · Mar 6, 2019

Talking about signature tools. I almost forgot. This is, for example, one of the tools we developed internally. It's called "Signature Maker". It's a clever name, I know. It's kind of like an IDE, except for creating detection signatures for our scan engine:

In general, signatures for the Emsisoft engine are essentially functions that are called by the scan engine depending on certain filter flags, like the file type for example. The signature flags you can see on the right are pretty much functions that perform certain tests. We can match signatures against certain version information fields, for example, or against specific PE header fields, things like imported or exported APIs. But there is also more advanced information. Programming languages like .NET or Delphi, for example, leave a bunch of meta information behind that our scan engine is capable of parsing and using as flags and information to feed into the actual detection functions (which is what signatures for our engine actually are).
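
To make the "signatures are functions selected by filter flags" idea concrete, here is a minimal Python sketch. It is not Emsisoft's engine or its signature language; `FileInfo`, `Signature`, `scan` and the example signature names are hypothetical and only illustrate the dispatch pattern described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FileInfo:
    """Hypothetical pre-parsed view of a file that signatures can inspect."""
    file_type: str                                   # e.g. "pe" or "script"
    imports: List[str] = field(default_factory=list)
    version_info: Dict[str, str] = field(default_factory=dict)


@dataclass
class Signature:
    name: str
    file_type: str                       # filter flag: only run on this file type
    detect: Callable[[FileInfo], bool]   # the signature itself is just a function


def scan(info: FileInfo, signatures: List[Signature]) -> List[str]:
    """Call only the signatures whose filter flag matches, collect the hits."""
    return [s.name for s in signatures
            if s.file_type == info.file_type and s.detect(info)]


# Made-up signatures for illustration only.
SIGNATURES = [
    Signature("Example.Injector", "pe",
              lambda f: "WriteProcessMemory" in f.imports
              and "CompanyName" not in f.version_info),
    Signature("Example.Script", "script", lambda f: True),
]

sample = FileInfo("pe", imports=["CreateRemoteThread", "WriteProcessMemory"])
print(scan(sample, SIGNATURES))
```

The point of the design is that the engine itself stays small: it parses the file once, then hands the parsed data to whichever signature functions apply.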

Fields can be matched using a variety of methods. The most obvious one is literal matching, so checking whether the value of the file to be scanned is exactly like a given value. But it's also possible to use wild cards or regular expressions, to create more complex strings to match against. This applies to binary strings as well by the way.
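
The three matching styles can be sketched in a few lines of Python using only the standard library. The function names are made up for illustration, and the field values are invented; this is not the syntax Signature Maker uses.

```python
import fnmatch
import re


def match_literal(value: str, expected: str) -> bool:
    """Exact comparison of a field value."""
    return value == expected


def match_wildcard(value: str, pattern: str) -> bool:
    """Case-sensitive wildcard match: '*' is any run of characters, '?' is one."""
    return fnmatch.fnmatchcase(value, pattern)


def match_regex(value: str, pattern: str) -> bool:
    """The whole field value must match the regular expression."""
    return re.fullmatch(pattern, value) is not None


def match_binary(data: bytes, pattern: bytes) -> bool:
    """Regex over raw bytes; with DOTALL, '.' acts as a single-byte wildcard."""
    return re.search(pattern, data, re.DOTALL) is not None


print(match_wildcard("setup_v12.exe", "setup_v*.exe"))
print(match_regex("1.0.0.42", r"1\.0\.0\.\d+"))
print(match_binary(b"\x55\x8b\xec\x83\xec\x10", rb"\x55\x8b\xec.\xec"))
```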

One way we apply machine learning, for example, is by automatically suggesting to our analysts flags and fields that are high-quality candidates for an actual signature, depending on which samples they are currently working on. You can see those red pins in front of some of the signature flags, which indicate attributes that are anomalies and are therefore likely to make a good signature.
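
As a very rough stand-in for that kind of suggestion, the sketch below flags attribute values of a sample that are rare among known-clean files. It is a plain frequency heuristic, not the machine learning model described above; `suggest_flags`, the attribute names and the 1% threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def suggest_flags(sample: Dict[str, str],
                  clean_corpus: List[Dict[str, str]],
                  max_clean_ratio: float = 0.01) -> List[Tuple[str, str]]:
    """Return (attribute, value) pairs of the sample that almost never
    occur in the clean corpus and are therefore anomaly candidates."""
    counts = Counter((k, v) for f in clean_corpus for k, v in f.items())
    total = max(len(clean_corpus), 1)  # avoid division by zero on an empty corpus
    return [(k, v) for k, v in sample.items()
            if counts[(k, v)] / total <= max_clean_ratio]


# Invented data: 200 clean files that all look alike.
clean = [{"compiler": "msvc", "first_section": ".text"}] * 200
suspicious = {"compiler": "msvc", "first_section": ".xyz1"}
print(suggest_flags(suspicious, clean))
```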

But we aren't limited to just these flags. Signatures can also be made up of, or contain, more complex patterns:

You can simply highlight the areas of the file that should be used for detection and specify how to locate that area, for example whether it should be relative to certain points of interest. Patterns can have ranges, so even if they move around in the file, they can still be found. Obviously doing those by hand is a bit tedious. So you can also, once again using machine learning techniques, let the tool figure out good candidates for you:
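
The ranged, anchor-relative patterns mentioned above can be illustrated with a small Python helper. This is a sketch under assumptions: `find_in_range`, the anchor value (think of it as an entry point offset) and the byte pattern are invented, and the real engine's pattern format is not shown in this thread.

```python
def find_in_range(data: bytes, pattern: bytes, anchor: int,
                  min_off: int, max_off: int) -> int:
    """Return the offset where `pattern` starts, provided that start lies
    between anchor + min_off and anchor + max_off (inclusive); else -1."""
    start = max(anchor + min_off, 0)
    # The pattern may start as late as anchor + max_off, so the search
    # window has to extend len(pattern) bytes beyond that point.
    end = min(anchor + max_off + len(pattern), len(data))
    return data.find(pattern, start, end)


# Invented file contents: the pattern sits 6 bytes after the anchor.
data = b"\x00" * 16 + b"\xde\xad\xbe\xef" + b"\x00" * 8
anchor = 10  # hypothetical point of interest, e.g. an entry point offset

print(find_in_range(data, b"\xde\xad\xbe\xef", anchor, 0, 8))
print(find_in_range(data, b"\xde\xad\xbe\xef", anchor, 0, 4))
```

With a range of 0 to 8 the pattern is found at offset 16; narrowing the range to 0 to 4 misses it, which is the trade-off between tolerance for moving code and precision.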

This one, for example, parses all the functions inside the code of the file and extracts the code blocks and fragments that are most unique and don't appear in other good files. But it also works for normal strings:
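
A heavily simplified version of that idea, for illustration only: instead of parsing functions, the sketch below uses fixed-length byte n-grams, keeps those shared by all malicious samples, and drops anything that also shows up in clean files. `ngrams` and `candidate_fragments` are hypothetical names, and the byte strings are invented.

```python
from typing import Iterable, List, Set


def ngrams(data: bytes, n: int) -> Set[bytes]:
    """All distinct byte sequences of length n in the data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def candidate_fragments(malicious: Iterable[bytes],
                        clean: Iterable[bytes],
                        n: int = 8) -> List[bytes]:
    """Fragments present in every malicious sample and in no clean file."""
    malicious = list(malicious)
    if not malicious:
        return []
    common = set.intersection(*(ngrams(m, n) for m in malicious))
    for c in clean:
        common -= ngrams(c, n)
    return sorted(common)


mal = [b"AAAAhelloEVILCODEzzzz", b"xxEVILCODEyyhello"]
good = [b"say hello world"]

print(candidate_fragments(mal, good, n=8))
print(candidate_fragments(mal, good, n=5))
```

With `n=5` the fragment `hello` is shared by both malicious samples but is discarded because it also occurs in the clean file, which is the false positive protection the approach is after.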

At the very end of all of this, whether you decided to create the signature manually or let all the machine learning stuff help you, you end up with a small function in our own domain-specific programming language that is used by our scan engine:

This function will then be compiled into native machine code. The code of hundreds of thousands of these signatures is then combined into signature files that are being shipped to our users.

This is just a very small portion of what Signature Maker can do, but it outlines roughly how we would go about adding detection of a new malicious file. Ultimately there are a whole bunch of additional features, especially for clustering vast amounts of samples to find all the samples that are related to each other for example, so we can extract a single signature that matches all of them (often tens of thousands of variations).
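
To give a feel for the clustering step, here is a toy Python sketch that groups samples by Jaccard similarity of their feature sets and then takes the features shared by a whole cluster as the starting point for one signature. The greedy algorithm, the 0.5 threshold and the feature names are assumptions for illustration; the thread does not say which clustering method is actually used.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two feature sets: size of overlap / size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(samples: Dict[str, Set[str]], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: a sample joins the first cluster whose
    representative (its first member) is similar enough."""
    clusters = []  # list of (representative features, member names)
    for name, feats in samples.items():
        for rep, members in clusters:
            if jaccard(feats, rep) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((feats, [name]))
    return [members for _, members in clusters]


samples = {
    "a": {"f1", "f2", "f3"},
    "b": {"f1", "f2", "f4"},
    "c": {"x1", "x2"},
}
groups = cluster(samples)
print(groups)

# Features common to every member of the first cluster are candidates
# for a single signature covering all of its variants.
shared = set.intersection(*(samples[n] for n in groups[0]))
print(sorted(shared))
```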

It also signifies something, that I don't think a lot of people realise: For a lot of AVs, there is no difference between the engine and the signatures. In many cases, the "engine" is just a loader or a virtual machine, that loads and executes the actual logic and functionality that is part of the signature files. I only showed you a very small amount of what we can do, but in general, it can get a lot crazier and "signatures", which are really just normal code running on your system, can end up being entire algorithms and perform complex operations (for unpacking for example) and can interact with the entire Windows API.

I hope that little excursion was interesting.

Fabian Wosar · Mar 6, 2019

Scorpion Illuminati said:
Does your dev team use Git? I know the basic commands (add, commit, push, branch, status, etc.) and want to know whether that will help me.

Yes. We do use Git. Some very old projects also use Subversion. But honestly, there is absolutely no point in learning any other version control system than Git.

Nightwalker · Mar 6, 2019

Fabian Wosar said:
There is no AI AV out there. All of them use human-assisted machine learning, which by its very definition is not AI. Machine learning has been in use by pretty much every single major AV (what they call "legacy") in one form or another for literally over a decade. The oldest and most prominent one being systems that automatically recognise malware families, clustering samples and extracting appropriate signatures from them.

For example, we use machine learning extensively inside the anti-malware network. Our signature generation tools also automatically suggest certain functions and code fragments that would make good signatures.

So all those fancy machine learning based technologies are available in one way or another to most traditional anti-viruses as well. Except that they, in addition, also have all those other technologies at their disposal: emulation, behaviour monitoring, signature scanning, reputation-based anomaly detection, etc. You know, the ones that traditional AVs always had, making them, in general, more flexible.

Great post, I wrote something similar in a discussion on the Wilders Security forum. For me, the marketing of "Next Gen AV" is an insult to the real malware analysts/developers/security specialists out there.

How effective are signatureless AVs like Panda Dome?

drakester · Mar 6, 2019

Impressive amount of insight on signatures and how they are built, thank you. A lot of other vendors wouldn't disclose as much.
Emsisoft support is absolutely great, another kudos to them.

No questions from me, just some props, thanks for doing this and being so close to users and potential users.

motox781 · Mar 6, 2019

Difficult question and touchy subject here. Is a 3rd party firewall necessary? And if so, when? (Assuming Windows 10 OS).

bjm_ · Mar 6, 2019

Fabian Wosar said:
Don't get me wrong, I understand that companies, who do elaborate tests, need some way to pay the bills as well. They have employees who need to get paid for example. But I think different price tiers that buy you more frequent testing or the ability to withhold test results (Matousec used to do that) or the ability to buy the performance data of the other participating products, for example, goes a little bit too far.

Malwarebytes Labs, November 27, 2018 article Why Malwarebytes decided to participate in AV testing.

We still do not believe in the “pay-to-play” model, and especially the “pay-to-see-what-you-missed” model that some organizations use. (AV companies, for an additional fee, can see the samples they did not catch in the test and develop fixes in the product for future tests/use.) Nonetheless, we want to give our customers some idea of what we are capable of, even when the playing field is skewed.

Scorpion Illuminati · Mar 6, 2019

Fabian Wosar said:
Yes. We do use Git. Some very old projects also use Subversion. But honestly, there is absolutely no point in learning any other version control system than Git.

What's wrong with subversion?

oldschool · Mar 6, 2019

Raiden said:
Thanks for a great post!

I agree wholeheartedly and that's one of the things I like most about Emsisoft. IMHO (and I am not just saying this because you are here), I honestly believe that Emsisoft has probably one of the best, if not THE best, customer service available out of all the security companies around.

Fabian Wosar said:
I hope that little excursion was interesting.

I'm a fairly new student of Windows and the world of security softs. I don't pay a yearly subscription for anything currently, but have purchased a few programs. This is to say if I were to buy a yearly/multi-year subscription for an AV, it would be yours based on the above. Wow, a company with some ethics, what a delight!

Burrito · Mar 6, 2019

I used to use Emsisoft. I've always enjoyed reading Fabian's posts.

While I no longer use it... I've always thought of Emsisoft as the 'good guys' of the industry.

Honest and fair brokers.

Thanks for your participation at MT @Fabian Wosar.

Scorpion Illuminati · Mar 6, 2019

I wish this Q&A would never end! Do you have any old projects from when you started coding that you are willing to share (C64, ZX Spectrum, DOS, Win 3.1, etc.)?

ForgottenSeer 72227 · Mar 6, 2019

Fabian Wosar said:
Talking about signature tools. I almost forgot. This is, for example, one of the tools we developed internally. It's called "Signature Maker". It's a clever name, I know. It's kind of like an IDE, except for creating detection signatures for our scan engine:

View attachment 210180

In general, signatures for the Emsisoft engine are essentially functions that are called by the scan engine depending on certain filter flags, like the file type for example. The signature flags you can see on the right are pretty much functions that perform certain tests. We can match signatures against certain version information fields, for example, or against specific PE header fields, things like imported or exported APIs. But there is also more advanced information. Programming languages like .NET or Delphi, for example, leave a bunch of meta information behind that our scan engine is capable of parsing and using as flags and information to feed into the actual detection functions (which is what signatures for our engine actually are).

Fields can be matched using a variety of methods. The most obvious one is literal matching, so checking whether the value of the file to be scanned is exactly like a given value. But it's also possible to use wild cards or regular expressions, to create more complex strings to match against. This applies to binary strings as well by the way.

One way we apply machine learning, for example, is by automatically suggesting to our analysts flags and fields that are high-quality candidates for an actual signature, depending on which samples they are currently working on. You can see those red pins in front of some of the signature flags, which indicate attributes that are anomalies and are therefore likely to make a good signature.

But we aren't limited to just these flags. Signatures can also be made up of, or contain, more complex patterns:

View attachment 210182

You can simply highlight the areas of the file that should be used for detection and specify how to locate that area, for example whether it should be relative to certain points of interest. Patterns can have ranges, so even if they move around in the file, they can still be found. Obviously doing those by hand is a bit tedious. So you can also, once again using machine learning techniques, let the tool figure out good candidates for you:

View attachment 210181

This one, for example, parses all the functions inside the code of the file and extracts the code blocks and fragments that are most unique and don't appear in other good files. But it also works for normal strings:

View attachment 210183

At the very end of all of this, whether you decided to create the signature manually or let all the machine learning stuff help you, you end up with a small function in our own domain-specific programming language that is used by our scan engine:

View attachment 210184

This function will then be compiled into native machine code. The code of hundreds of thousands of these signatures is then combined into signature files that are being shipped to our users.

This is just a very small portion of what Signature Maker can do, but it outlines roughly how we would go about adding detection of a new malicious file. Ultimately there are a whole bunch of additional features, especially for clustering vast amounts of samples to find all the samples that are related to each other for example, so we can extract a single signature that matches all of them (often tens of thousands of variations).

It also signifies something, that I don't think a lot of people realise: For a lot of AVs, there is no difference between the engine and the signatures. In many cases, the "engine" is just a loader or a virtual machine, that loads and executes the actual logic and functionality that is part of the signature files. I only showed you a very small amount of what we can do, but in general, it can get a lot crazier and "signatures", which are really just normal code running on your system, can end up being entire algorithms and perform complex operations (for unpacking for example) and can interact with the entire Windows API.

I hope that little excursion was interesting.

Another fantastic post @Fabian Wosar!

I really appreciate (and I am sure many people here do as well) you taking the time to do this. It's really interesting to see what makes Emsisoft tick, and it just goes to show what a great company Emsisoft truly is. While I know you can't tell us absolutely everything, it's just great to see little bits of what goes on behind the scenes. As I've said previously, I am very eager and excited to see the upcoming changes/improvements and how they will work.

I know in a previous post you mentioned that with the upcoming changes you will be able to provide signatures in real-time. Does this mean that you will have some form of machine learning alongside people creating the sigs? Also, am I safe to assume that if an Emsisoft user comes in contact with a new piece of malware, all other Emsisoft users will be protected as well, due to the fact that the signature was created in real-time, similar to what some of your competitors are doing?

goodjohnjr · Mar 6, 2019

Fabian Wosar said:
We may. The problem is, that ultimately with these free AVs you as a user pay with your data. That's generally speaking something we don't feel very comfortable with. Especially given that not a lot of people are even aware of it.

Recently I was kind of surprised to see that an otherwise super privacy conscious user had Traffic Light installed for example. It doesn't seem to be common knowledge that Traffic Light and a bunch of other browser extensions (Comodo Online Security Pro, Norton Safe Web, Avira Browser Safety, Avast Online Security being the biggest ones) like it will literally send every single URL you visit in clear text off to the vendor's server. The privacy policies aren't always clear and kinda sketchy at times. I am sure that some people don't mind. But I am also sure that a lot of people do mind, but simply don't know.

Hello Fabian Wosar,

Thank you for this Q & A, and for informing us that some security browser extensions are sending the URLs we visit over clear text instead of using something like SSL or whatever because I did not realize that even some (most) of those security extensions by major companies were doing that.

I am currently using the Emsisoft Browser Security and Windows Defender Browser Protection (WDBP) extensions, and I was wondering if the WDBP and Malwarebytes extensions are guilty of sending the URLs unencrypted as well and can you name any other extensions that you know of that are guilty of this?

That will really help me / some of us know which extensions to avoid for those of us worried about privacy / security.

Thank you,
-John Jr

Vasudev · Mar 7, 2019

Fabian Wosar said:
We may. The problem is, that ultimately with these free AVs you as a user pay with your data. That's generally speaking something we don't feel very comfortable with. Especially given that not a lot of people are even aware of it.

Recently I was kind of surprised to see that an otherwise super privacy conscious user had Traffic Light installed for example. It doesn't seem to be common knowledge that Traffic Light and a bunch of other browser extensions (Comodo Online Security Pro, Norton Safe Web, Avira Browser Safety, Avast Online Security being the biggest ones) like it will literally send every single URL you visit in clear text off to the vendor's server. The privacy policies aren't always clear and kinda sketchy at times. I am sure that some people don't mind. But I am also sure that a lot of people do mind, but simply don't know.

We have bigger players that live off telemetry or advanced diagnostics data, ahem, Win 10. So a free EAM would be a small fish that trades a minimal amount of data to and fro in exchange for better security for the non-paying user.

I use BD TL and for the most part, it isn't invasive. Tried a few and they become highly intrusive and bloated after updates. I recently threw out BD TS which was blocking windows updates, driver updates and everything else until I uninstalled it.

ForgottenSeer 72227 · Mar 7, 2019

Vasudev said:
We have bigger players that live off telemetry or advanced diagnostics data, ahem, Win 10. So a free EAM would be a small fish that trades a minimal amount of data to and fro in exchange for better security for the non-paying user.

It's a fair point for sure, but personally, Emsisoft's stance on privacy is one of the many reasons why I like the company. Sure, they could go down the same route as everyone else when it comes to free AV/AM, but just because everyone else does it doesn't mean they should. It's one of the many things that make them different from the rest. Sure, it may be "priced" higher, but I am willing to pay for it knowing I am going to get a great product that has excellent protection, amazing customer service and excellent privacy. Things in life shouldn't always be about getting everything for free. If you are unable to purchase the product for whatever reason, that's totally cool; there are other great options out there to choose from, and heck, if you're using W10 you technically already have a free AV/AM.

I used to really like Avast, but I can no longer stand them due to the way they treat their customers and how they data mine them. I don't care how good it is protection-wise, their data mining has turned me off completely. I really would not like to see Emsisoft go down this route at all. Keep it a paid product, and if you aren't able to purchase it, wait for either a potential giveaway or a sale.

Fabian Wosar · Mar 7, 2019

Nightwalker said:
Great post, I wrote something similar in a discussion on the Wilders Security forum. For me, the marketing of "Next Gen AV" is an insult to the real malware analysts/developers/security specialists out there.

Some of them do pretty good work. But especially once they try to tell you that all "legacy AVs" do is signatures, they are blatantly lying to your face.

drakester said:
No questions from me, just some props, thanks for doing this and being so close to users and potential users.

Thanks.

motox781 said:
Difficult question and touchy subject here. Is a 3rd party firewall necessary? And if so, when? (Assuming Windows 10 OS).

Well, we discontinued our firewall precisely because we don't see much benefit compared to even the firewall in Windows 7. The biggest issue with the Windows firewall is its lack of tamper protection, meaning everything running on your system can create rules and allow itself. EAM actually blocks that, so only applications you allow can interact with the Windows firewall.

bjm_ said:
Malwarebytes Labs, November 27, 2018 article Why Malwarebytes decided to participate in AV testing.

It's funny because for 2019 we decided to drop out of AV Comparatives.

Scorpion Illuminati said:
What's wrong with subversion?

Their slogan used to be: "CVS done right." There is no way you can do CVS right, hence why it was doomed to failure from the get go.

oldschool said:
I'm a fairly new student of Windows and the world of security softs. I don't pay a yearly subscription for anything currently, but have purchased a few programs. This is to say if I were to buy a yearly/multi-year subscription for an AV, it would be yours based on the above. Wow, a company with some ethics, what a delight!

Thanks.

Burrito said:
I used to use Emsisoft. I've always enjoyed reading Fabian's posts.

Honest question: Why did you stop?

Scorpion Illuminati said:
I wish this Q&A would never end! Do you have any old projects from when you started coding that you are willing to share (C64, ZX Spectrum, DOS, Win 3.1, etc.)?

I unfortunately no longer do. But my first stuff were small anti-virus tools that detected one specific virus and cleaned infected files. I then quickly moved to heuristic stuff, because I thought it was stupid to create new signatures and detections for every new virus. Back then there were literally only like a hundred or so of them in the first place though.

Raiden said:
I know in a previous post you mentioned that with the upcoming changes you will be able to provide signatures in real-time. Does this mean that you will have some form of machine learning alongside people creating the sigs?

As I showed in the tool we use, we already do that. There is no way we can keep up with the number of samples we get otherwise. We obtain more than 450,000 new malicious files every single day. What I showed you there was pretty much the "manual" mode.

Raiden said:
Also, am I safe to assume that if an Emsisoft user comes in contact with a new piece of malware, all other Emsisoft users will be protected as well, due to the fact that the signature was created in real-time, similar to what some of your competitors are doing?

That's already the case for the behaviour blocker. If we see a malicious file on a single system and it is being picked up by the behaviour blocker there, automatic blocks are issued for all other users using EAM already.

goodjohnjr said:
Thank you for this Q & A, and for informing us that some security browser extensions are sending the URLs we visit over clear text instead of using something like SSL or whatever because I did not realize that even some (most) of those security extensions by major companies were doing that.

Ah, sorry. They do use SSL. But they send the entire URL to their servers, while most browsers, or our extension for example, only send hashes and non-specific information that can't be turned back into URLs.
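
For readers wondering how a lookup can work without sending the URL, here is a generic hash-prefix sketch in Python. It is not Emsisoft's actual protocol (which is not described in this thread) and only illustrates the general technique: `host_hash`, `client_query`, `is_blocked`, the 8-character prefix and the in-memory "server" are all assumptions.

```python
import hashlib
from typing import Callable, Set
from urllib.parse import urlsplit


def host_hash(url: str) -> str:
    """SHA-256 of the lower-cased host name; path and query never leave the client."""
    host = (urlsplit(url).hostname or "").lower()
    return hashlib.sha256(host.encode("utf-8")).hexdigest()


def client_query(url: str, prefix_len: int = 8) -> str:
    """The only thing sent to the server: a short prefix of the hash."""
    return host_hash(url)[:prefix_len]


def is_blocked(url: str, server_lookup: Callable[[str], Set[str]]) -> bool:
    """Ask the server for all bad hashes sharing the prefix, compare locally."""
    return host_hash(url) in server_lookup(client_query(url))


# Stand-in for the vendor's server: a set of hashes of known bad hosts.
BLOCKLIST = {host_hash("http://bad.example/")}


def fake_server(prefix: str) -> Set[str]:
    return {h for h in BLOCKLIST if h.startswith(prefix)}


print(is_blocked("https://bad.example/login?user=me", fake_server))
print(is_blocked("https://good.example/", fake_server))
```

Because many different hosts share any given short prefix, the server cannot tell from the query alone which site was visited, and the final comparison happens on the client.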

goodjohnjr said:
I am currently using the Emsisoft Browser Security and Windows Defender Browser Protection (WDBP) extensions, and I was wondering if the WDBP and Malwarebytes extensions are guilty of sending the URLs unencrypted as well and can you name any other extensions that you know of that are guilty of this?

Both are fine.

Vasudev said:
I use BD TL and for the most part, it isn't invasive. Tried a few and they become highly intrusive and bloated after updates. I recently threw out BD TS which was blocking windows updates, driver updates and everything else until I uninstalled it.

Windows doesn't get a list of all the websites you look at. Unlike Traffic Light:

KevinYu0504 · Mar 7, 2019

I have a question too.
I already saw your explanation that Avira didn't want to partner with Emsisoft at first,
but why did Emsisoft finally choose Bitdefender as a partner? Any special reason?

I am from Taiwan, in Asia.
Bitdefender does not have any branch office or server in Asia,
so its detection rate and reaction speed are always behind others such as Kaspersky, ESET, Norton...

Is there any chance Emsisoft will use Kaspersky's database in the future?

Fabian Wosar · Mar 7, 2019

KevinYu0504 said:
I already saw your explanation that Avira didn't want to partner with Emsisoft at first,
but why did Emsisoft finally choose Bitdefender as a partner? Any special reason?

Mostly the ratio of detections to false positives, combined with relative affordability. We also approached ESET, but they didn't have an OEM program back then (and to my knowledge they still don't), as well as Kaspersky, who had just stopped doing OEM deals at the time.

KevinYu0504 said:
Is there any chance Emsisoft will use Kaspersky's database in the future?

Never say never, but it is unlikely.

ForgottenSeer 72227 · Mar 7, 2019

Fabian Wosar said:
As I showed in the tool we use, we already do that. There is no way we can keep up with the number of samples we get otherwise. We obtain more than 450,000 new malicious files every single day. What I showed you there was pretty much the "manual" mode.

That's already the case for the behaviour blocker. If we see a malicious file on a single system and it is being picked up by the behaviour blocker there, automatic blocks are issued for all other users using EAM already.

Thanks for the info and that's great to hear!

Thanks for taking the time to make it clear. Sorry if I am making you repeat yourself. I am still learning, so sometimes understanding how things are done does take a few goes.
