Serious Discussion: Why does the Comodo "Disappearing HIPS rules" bug require a complete source code rewrite?

How would you prevent this one-by-one backup creation from being terminated when Windows forcibly kills it on shutdown?

Furthermore, it could be the case that all rules are deleted first and then all rules are rewritten, even when only one new rule needs to be added.
When one manually adds a new HIPS rule in CIS and the HIPS rules list is (very) long, it takes quite a long time for the new rule to be added and for the HIPS rules window to close.

Yeah, they likely delete all rules as the first step, in preparation for a non-atomic write, and this can be easily verified: before adding a rule, open regedit at the HIPS rules location. Once Comodo starts processing the new rule, refresh regedit repeatedly and observe whether the rules disappear.
Observe whether they reappear one by one or as a single dump. You can also use tools like Process Monitor (procmon).
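A minimal sketch of how those refresh observations could be interpreted, assuming you write down the number of values under the HIPS rules key each time you refresh regedit. The function name and the result labels are hypothetical, chosen only for illustration:

```python
def classify_refresh_pattern(counts: list[int]) -> str:
    """Classify HIPS rule counts noted on successive regedit refreshes during a save."""
    if not counts:
        raise ValueError("need at least one observation")
    low = min(counts)
    if low >= counts[0]:
        return "no deletion observed"
    # Only look at what happens after the count bottomed out.
    after = counts[counts.index(low):]
    increases = sum(1 for prev, cur in zip(after, after[1:]) if cur > prev)
    if increases == 0:
        return "rules deleted and not (yet) rewritten"
    if increases == 1:
        return "rules deleted, then rewritten as a single dump"
    return "rules deleted, then rewritten one by one"
```

For example, counts of `[500, 0, 0, 501]` point to a bulk rewrite, while `[500, 0, 120, 340, 501]` point to values being written back one at a time.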

If all rules disappear and reappear, it all comes down to forceful termination: Windows will not wait forever for Comodo to rewrite its rules. Especially in paranoid mode, the rules will quickly become far too many.
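To illustrate why forced termination matters for a delete-then-rewrite save, here is a toy simulation. A plain dict stands in for the registry key, and the `budget` parameter is a hypothetical stand-in for how many single-value operations finish before Windows kills the process:

```python
def delete_all_then_rewrite(store: dict[str, str], rules: dict[str, str], budget: int) -> bool:
    """Non-atomic save: wipe every stored value, then write every rule back.

    `budget` models how many single-value registry operations complete before
    the process is killed at shutdown. Returns True only if the save finished;
    otherwise `store` keeps whatever had been written so far.
    """
    ops = 0
    for name in list(store):
        if ops == budget:
            return False
        del store[name]
        ops += 1
    for name, value in rules.items():
        if ops == budget:
            return False
        store[name] = value
        ops += 1
    return True


old = {str(n): f"rule {n}" for n in range(1000)}
new = {**old, "1000": "rule 1000"}  # the user added one rule
store = dict(old)
finished = delete_all_then_rewrite(store, new, budget=1500)
print(finished, len(store))  # prints: False 500 (half the old rules gone, new rule never written)
```

The point is that the damage scales with the size of the whole list, not with the size of the change.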

They could also be performing some rule maintenance operations that take this long.

The fix, as I suggested at the start, is first to optimise the writer to support a few operations: edit, delete, add.
Instead of always overwriting, you pass what you want it to do: the new rule, the old rule (or an empty string) and whatever else you want it to write. Then your shutdown routine is fixed. This writer is called in multiple places in the GUI (and elsewhere), so those calls need to be updated.
Any maintenance operations will need to be removed from this function; they belong in an executable run as a scheduled task.
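A minimal sketch of such an add/edit/delete writer, with a plain dict standing in for the registry key. `RuleOp` and `apply_op` are hypothetical names, and an explicit operation kind replaces the "new rule, old rule (or empty string)" convention described above:

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class RuleOp:
    kind: Literal["add", "edit", "delete"]
    name: str                    # registry value name, e.g. a policy number such as "1003"
    value: Optional[str] = None  # serialized rule data; ignored for "delete"


def apply_op(store: dict[str, str], op: RuleOp) -> None:
    """Apply one change, touching only the value it concerns."""
    if op.kind in ("add", "edit"):
        if op.value is None:
            raise ValueError(f"{op.kind} needs a value")
        exists = op.name in store
        if op.kind == "add" and exists:
            raise KeyError(f"rule {op.name} already exists")
        if op.kind == "edit" and not exists:
            raise KeyError(f"rule {op.name} does not exist")
        store[op.name] = op.value
    elif op.kind == "delete":
        store.pop(op.name, None)
    else:
        raise ValueError(f"unknown operation {op.kind!r}")
```

Because each call touches a single value, a shutdown in the middle of a save can at worst lose the change in flight, not the rest of the list.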
 
How would you prevent this one-by-one backup creation from being terminated when Windows forcibly kills it on shutdown?

The backup is incremental, so a shutdown or a crash can only corrupt the last change; all the others are untouched.
Anyway, it is not important how I would do it. It is already done in Xcitium. You would have to ask someone working for Xcitium or ask on the Xcitium forum.

However, I would store any HIPS setting as an independent registry value. A new setting would try to add a new registry value or delete an existing one without touching the others. If something goes wrong at moment X during the shutdown, the new settings will not be stored, but the rest, stored before moment X, will not be affected.
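One way to get the "a crash can only corrupt the last change" property (not necessarily what Xcitium does) is an append-only journal in which each change carries its own checksum. The sketch below is a hypothetical illustration using one text line per change with a CRC32; on restore, replay stops at the first damaged record:

```python
import json
import zlib


def encode_record(change: dict) -> str:
    """One journal line: CRC32 of the JSON payload, a tab, then the payload."""
    payload = json.dumps(change, sort_keys=True)  # json escapes tabs/newlines inside strings
    return f"{zlib.crc32(payload.encode()):08x}\t{payload}\n"


def replay(journal_text: str) -> dict[str, str]:
    """Rebuild the rule set from the journal, stopping at the first damaged record."""
    rules: dict[str, str] = {}
    for line in journal_text.splitlines():
        crc, sep, payload = line.partition("\t")
        if not sep or crc != f"{zlib.crc32(payload.encode()):08x}":
            break  # torn tail from a shutdown or crash: earlier records are still valid
        change = json.loads(payload)
        if change["op"] == "delete":
            rules.pop(change["name"], None)
        else:  # "add" or "edit"
            rules[change["name"]] = change["value"]
    return rules
```

If the process dies while appending the delete of "1003", replay still returns both earlier rules instead of an empty or half-written set.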
 
The backup is incremental, so a shutdown or a crash can only corrupt the last change; all the others are untouched.
This backup has to be executed at the very beginning of the function, because whatever comes after it seems to take a long time, and the writing logic is probably pushed to the end. That is my bet.

They could also be verifying all rules (whether the files exist, whether they were changed), because just writing to the registry doesn’t really take that long (not sure exactly how long; Pico knows).
 
However, I would store any HIPS setting as an independent registry value.
They are stored as independent registry values; see the image below.
Each number under Policy (e.g. the highlighted 1003) represents a HIPS rule belonging to a certain object (e.g. a file on disk or a file group). Adding one new HIPS rule shouldn't take much time, yet it takes quite a long time when the HIPS list is long (if the list is short, it takes little time).

[Image: HIPS_Policies.jpg, registry view of the HIPS Policy entries with 1003 highlighted]
 
They are stored as independent registry values; see the image below.
Each number under Policy (e.g. the highlighted 1003) represents a HIPS rule belonging to a certain object (e.g. a file on disk or a file group). Adding one new HIPS rule shouldn't take much time, yet it takes quite a long time when the HIPS list is long (if the list is short, it takes little time).

[Image attachment 291922]

If those registry keys are corrupted on shutdown, they require an incremental backup/restore, as in the solution I posted.
 
How to fix the HIPS issue without rewriting much of the code:
  1. The function that rewrites the HIPS rules should be extended with what the analogous Xcitium function does. CIS will still rewrite the HIPS settings as usual, but additionally, each setting will be stored one by one in a separate registry value without rewriting (as a backup).
  2. CIS will use an additional kernel driver that compares the HIPS settings with the backup and updates the HIPS settings if they are corrupted.
  3. The ELAM driver must be slightly modified to run this additional kernel driver as early as possible before activating HIPS rules.
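Whatever component ends up running it, the comparison in step 2 boils down to something like the following. This is a user-mode Python sketch of the logic only, not kernel-driver code; `restore_from_backup` is a hypothetical name and dicts stand in for the two sets of registry values:

```python
def restore_from_backup(primary: dict[str, str], backup: dict[str, str]) -> list[str]:
    """Copy every backed-up value that is missing or different back into `primary`.

    Treats the per-value backup as authoritative and returns the names of the
    repaired values. Values present only in `primary` are left alone; whether
    they should be dropped is a separate design decision.
    """
    repaired = []
    for name, value in backup.items():
        if primary.get(name) != value:
            primary[name] = value
            repaired.append(name)
    return sorted(repaired)
```

Run before HIPS loads its rules, this turns "no rules after reboot" into "rules restored from the last good backup".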

If point 1 is hard to apply, here is another solution:
  1. CIS will use an additional kernel driver and supporting DLL that monitor modifications of HIPS settings in the registry and create incremental backups (in the registry or in a backup file on disk). After Windows restarts, this kernel driver uses the backups to update the native HIPS settings in the registry if they are corrupted.
  2. The ELAM driver must be slightly modified to run this additional kernel driver as early as possible before activating HIPS rules.
This method avoids rewriting code (apart from small modifications to the ELAM driver).
The driver + DLLs can also be extended to fix other CIS configuration problems if necessary.

Post updated (DLL support added).
 
Maybe this is all easier said than done, as we don't fully understand or know exactly how CIS handles HIPS rule modifications (delete / rewrite / add), or needs to handle them to maintain continuous system protection, and which parts of HIPS need to be active during boot time to protect CIS and the system. Also, CIS containment may not work yet during boot time, so CIS system protection may rely solely on (parts of) HIPS during boot time.
Delaying HIPS activation during boot time seems to me not a good idea, as it may cause unprotected time period(s) during boot.
 
Maybe this is all easier said than done, as we don't fully understand or know exactly how CIS handles HIPS rule modifications (delete / rewrite / add), or needs to handle them to maintain continuous system protection, and which parts of HIPS need to be active during boot time to protect CIS and the system. Also, CIS containment may not work yet during boot time, so CIS system protection may rely solely on (parts of) HIPS during boot time.
Delaying HIPS activation during boot time seems to me not a good idea, as it may cause unprotected time period(s) during boot.
We are not saying we know how Comodo HIPS and hipsters work. We are enjoying a technical discussion. The HIPS and the bug belong to Comodo.
 
Delaying HIPS activation during boot time seems to me not a good idea, as it may cause unprotected time period(s) during boot.
Yes, delaying would not be good.

However, I doubt that the HIPS activation code is in the ELAM driver (you can ask on the Comodo forum). Most often, the protection layers are applied by standard kernel drivers or userland services. That is how I managed to dismantle the AVs in the AV challenge videos: by blocking kernel drivers.

In the case of CIS HIPS, one only needs to modify the ELAM driver so that it runs the "HIPS restore" driver before the drivers that are related to or dependent on HIPS.

If HIPS were managed via the ELAM driver, the ELAM driver would have to be rewritten, which can be a more complicated task.
 
you can ask on the Comodo forum
They will not disclose information about their technical implementations. The only place such information could exist is the mandatory legal documentation for patents (in cases where a developer solved a universal problem in a unique way). Other than that, the mods won’t even bother sending an email to the development team, and even if they do, no one will detail the fix.

Anyway, the corruption has the same effect as delaying: on the next boot there are no rules, so Comodo HIPS starts early but applies minimal rules (if any).

HIPS is not an essential protection layer in any case. It is switched off by default. The default configuration does not apply any rules on boot either, and Comodo recommends keeping it that way.

So in this context, “delaying it is not good” is not a valid reason not to parse the rules from userland as soon as that can be done.

Various other components, like the AV, also rely on user-mode services having started before they are fully active.
 
These are workarounds, not a true fix. They address the symptom, the loss of rules, but not the underlying disease, which is the race condition that causes the configuration file to become corrupted in the first place.

The consistent trigger for this bug is the "Create rules for safe applications" feature, which attempts to write to the configuration file during system shutdown. This isn't a simple, isolated bug that can be patched. It's a race condition rooted in the architecture of how Comodo's modules interact and how they handle configuration changes during critical system events.

The only way to truly fix this is to re-engineer the entire configuration management system to be atomic, meaning the entire save operation either completes successfully or not at all, leaving the existing configuration intact. This would require a fundamental redesign of how CmdAgent.exe and its related modules handle their data, which is why I maintain that a complete source code rewrite of this core component is necessary.

Anything less is just a band-aid on a foundational crack.
 
The only way to truly fix this is to re-engineer the entire configuration management system to be atomic, meaning the entire save operation either completes successfully or not at all

Microsoft fixed the problem you are describing a long time ago and has implemented atomic writes by default.
Developers who update the APIs they call have fewer problems.

This is exactly one of the reasons why software that is rarely (or never) updated is not recommended.

Ask AI about registry transactions.

You don’t need kernel drivers and ELAM, and you don’t need to re-engineer the whole module, because you can open the source of the settings DLL, the HIPS DLL or whichever DLL performs the write, and use CTRL+F to see where registry writes are made.
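That CTRL+F search can also be scripted. This sketch assumes access to the C/C++ sources (which only the vendor has) and looks for a handful of real Win32 and native registry-writing functions; `find_registry_writes`, the pattern list and the file suffixes are illustrative assumptions, not an exhaustive audit:

```python
import re
from pathlib import Path

# Win32 and native API calls that create, change or delete registry data.
# Not exhaustive: wrappers such as ATL's CRegKey methods are not matched.
REGISTRY_WRITERS = re.compile(
    r"\b(RegSetValueEx[AW]?|RegSetKeyValue[AW]?|RegCreateKeyEx[AW]?"
    r"|RegDeleteValue[AW]?|RegDeleteKey(?:Ex)?[AW]?|RegDeleteTree[AW]?"
    r"|(?:Nt|Zw)SetValueKey|(?:Nt|Zw)DeleteValueKey)\s*\("
)


def find_registry_writes(root: str, suffixes=(".c", ".cc", ".cpp", ".h", ".hpp")) -> list[tuple[str, int, str]]:
    """Return (file, line number, API name) for every registry-writing call under `root`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in REGISTRY_WRITERS.finditer(line):
                hits.append((str(path), lineno, match.group(1)))
    return hits
```

This only locates the call sites; working out the order in which they run during shutdown still requires reading the surrounding code.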
 
Basically, ELAM is used to allow/block other boot drivers by checking them against a database of malicious signatures stored in the registry; then it unloads and "passes the torch" to the regular protection driver (that is Microsoft's recommendation for ELAM/AM drivers). I don't think it could be used for anything else.
Microsoft has pretty strict performance requirements for ELAM drivers:
[Image: Microsoft's performance requirements for ELAM drivers]
It also stores the certificates required for the AM service to start as a PPL-AM (Protected Process Light, antimalware) process.
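To show how narrow that job is, here is a toy model of the per-driver decision. The enum values mirror the real `BDCB_CLASSIFICATION` names (the boot-critical "known bad" variant is omitted), but the dict lookup is a hypothetical stand-in for the hash and certificate data Windows passes to an ELAM callback:

```python
from enum import Enum


class BootDriverClassification(Enum):
    # Names taken from the Windows BDCB_CLASSIFICATION enumeration used by ELAM callbacks.
    UNKNOWN = "BdCbClassificationUnknownImage"
    KNOWN_GOOD = "BdCbClassificationKnownGoodImage"
    KNOWN_BAD = "BdCbClassificationKnownBadImage"


def classify_boot_driver(image_hash: str, signature_db: dict[str, bool]) -> BootDriverClassification:
    """Look one boot driver up in a hypothetical hash -> is_malicious table.

    A verdict per driver is all this step produces: no rule parsing and no
    configuration repair happen here.
    """
    verdict = signature_db.get(image_hash)
    if verdict is None:
        return BootDriverClassification.UNKNOWN
    return BootDriverClassification.KNOWN_BAD if verdict else BootDriverClassification.KNOWN_GOOD
```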
 
Microsoft fixed the problem you are describing a long time ago and has implemented atomic writes by default.
Developers who update the APIs they call have fewer problems.

This is exactly one of the reasons why software that is rarely (or never) updated is not recommended.

Ask AI about registry transactions.

You don’t need kernel drivers and ELAM, and you don’t need to re-engineer the whole module, because you can open the source of the settings DLL, the HIPS DLL or whichever DLL performs the write, and use CTRL+F to see where registry writes are made.
You bring up a valid technical point about atomic operations being available in modern Windows APIs. However, applying that concept here is an oversimplification that misses the forest for the trees.

It's not the tool, it's how you use it. You're correct that Microsoft provides mechanisms like Registry Transactions (TxR) and, historically, Transactional NTFS (TxF). But simply having these tools available doesn't mean a developer has used them correctly, or that they are even applicable to this specific problem.

File vs. registry: forensic analysis points to the corruption of a configuration file, not a registry hive. While registry transactions are powerful, they don't magically fix file I/O. For atomic file operations, a developer would need to use specific techniques like writing to a temporary file and then using `ReplaceFile`, or leveraging the now-deprecated Transactional NTFS (TxF). We have no evidence Comodo is doing this.

Logic over API calls: the core issue I've been describing is a race condition rooted in the application's logic. The problem isn't necessarily which API is being called, but when it's being called. The `CmdAgent.exe` process is initiating a save operation during a system-wide shutdown event. This is an inherently dangerous time to perform complex I/O that involves multiple modules. Even with an atomic write API, if the parent process is terminated prematurely, the transaction could be aborted, leading to the same result: no new file is written, and the old one might be deleted on the next startup if the application logic deems it invalid.

CTRL+F is not a solution: your suggestion to just open the source and use `CTRL+F` to find the registry writing calls trivializes the complexity of debugging a multi-threaded, enterprise-grade security application. The bug is likely not in a single, obvious `WriteRegKey()` call. It's in the intricate, high-level logic that coordinates the shutdown, gathers settings from various modules, and serializes them to disk. That's not something you find with a simple text search.

In short, blaming this on "outdated APIs" is a red herring. The evidence points to a fundamental "design flaw" in how Comodo handles its configuration state during a critical and time-sensitive OS event. Bolting on a modern API won't fix a broken process. The process itself needs to be re-architected.
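The write-to-a-temporary-file-then-replace technique mentioned above looks like this in Python, where `os.replace` performs the final rename (on Windows, the comparable native calls are `ReplaceFile`, or `MoveFileEx` with `MOVEFILE_REPLACE_EXISTING`). This is a generic sketch, not a claim about what Comodo does; `atomic_write` is a hypothetical name:

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so a reader sees either the complete old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename stays on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the new bytes to disk before the swap
        os.replace(tmp_path, path)  # rename over the old file in one step
    except BaseException:
        # Any failure before os.replace leaves the old file untouched; drop the partial copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process is killed before `os.replace` runs, the old file survives intact and only a stray temporary file is left behind. On POSIX systems the rename itself is atomic; full durability would additionally require syncing the containing directory.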
 
Comodo Containment works quite well against boot-time malware attacks.
But doesn't it require some running parts of HIPS to be fully functional?
Even if HIPS is switched off in the user settings, some parts of HIPS might still be active under the surface, because other CIS components like containment might need them.
The same is true for the AV part of CIS: if one installs CF without the AV option, parts of the AV engine are still active in CF.
 
CTRL+F is not a solution: your suggestion to just open the source and use `CTRL+F` to find the registry writing calls trivializes the complexity of debugging a multi-threaded, enterprise-grade security application. The bug is likely not in a single, obvious `WriteRegKey()` call. It's in the intricate, high-level logic that coordinates the shutdown, gathers settings from various modules, and serializes them to disk. That's not something you find with a simple text search.
Well, isn’t that what I was saying earlier (and you argued against)? You went off on tangents only to come back and post what I had said from the very beginning.

There are so many solutions; when a company wants to apply them, it will. Especially nowadays, when development is accelerated by AI.

The full solution:

Write rules to disk. Writing massive configurations to the registry is not optimal. Just as you don’t write antivirus definitions to the registry, HIPS rules, which grow by the minute and contain who knows how much information each, do not belong in the registry.
From the very beginning, the optimal solution was to operate on all rules in memory, put them in a nice buffer and dump them to a file at once.
So in this sense it can be concluded that the design is flawed.
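A minimal sketch of "operate on all rules in memory, dump them at once", assuming a JSON snapshot prefixed with a SHA-256 digest so that a half-written file is detected on load instead of being half-trusted. The `RuleSet` class and the file layout are hypothetical:

```python
import hashlib
import json


class RuleSet:
    """HIPS rules kept in memory and persisted only as one complete snapshot (hypothetical layout)."""

    def __init__(self) -> None:
        self.rules: dict[str, dict] = {}

    def dump(self) -> bytes:
        """Serialize every rule into one buffer: a SHA-256 header line, then the JSON body."""
        body = json.dumps(self.rules, sort_keys=True).encode("utf-8")
        return hashlib.sha256(body).hexdigest().encode("ascii") + b"\n" + body

    @classmethod
    def load(cls, blob: bytes) -> "RuleSet":
        """Rebuild the rule set, rejecting a truncated or damaged snapshot."""
        digest, sep, body = blob.partition(b"\n")
        if not sep or hashlib.sha256(body).hexdigest().encode("ascii") != digest:
            raise ValueError("rule file is incomplete or corrupted")
        rule_set = cls()
        rule_set.rules = json.loads(body)
        return rule_set
```

The bytes from `dump()` would then be written in one go, ideally through a temporary-file-and-rename step so the previous snapshot survives an interrupted write.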

Next, implementing parsers for such a massive volume of information in the kernel is not a good idea AT ALL.
Instead, the solution can monitor all startup hooks as soon as they are created and compile a config of boolean values. These can be basics like: Where is the file located? Is it in an unusual folder?
What is its reputation?
Is it properly signed?
That minimal config can be saved anywhere you want.

The kernel acts as the policy enforcement engine (because it is allowed to start early). Anything that passes the basic checks can start. Anything that doesn’t is delayed until HIPS and everything else is loaded and working. At that point, everything can be allowed/denied based on your rules.
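A sketch of the early-boot decision described above, written in Python for readability rather than as kernel code. The three fields are hypothetical encodings of the checks listed earlier, and `to_flags` packs them into one small integer to keep the kernel-side config minimal:

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DEFER = "defer until HIPS is running"


@dataclass(frozen=True)
class StartupEntry:
    # Minimal boolean facts recorded when a startup hook is created (hypothetical fields).
    unusual_folder: bool
    good_reputation: bool
    properly_signed: bool

    def to_flags(self) -> int:
        """Pack the three checks into bits 0-2 of one integer."""
        return (int(self.unusual_folder) << 0) | (int(self.good_reputation) << 1) | (int(self.properly_signed) << 2)


def early_boot_verdict(entry: StartupEntry) -> Verdict:
    """Allow only entries that pass every basic check; defer everything else to HIPS."""
    if entry.properly_signed and entry.good_reputation and not entry.unusual_folder:
        return Verdict.ALLOW
    return Verdict.DEFER
```

Deferring rather than blocking keeps the early component simple: the full rule evaluation still happens later, once HIPS and the user-mode services are up.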

If it’s not designed this way from the start, then you either have to rewrite the module or you apply a workaround.
There can be many workarounds; the most sensible one was already suggested in earlier posts.

In any case, “it is too complex for us to do it” is not a valid excuse, and either way, you will be required to manage your startup and shutdown properly. If it’s too difficult for you to manage your shutdown and startup routines (because of the kernel, the serialisation, this, that), you always have the option to stop offering antivirus software.
 
So in this context, “delaying it is not good” is not a valid reason not to parse the rules from userland as soon as that can be done.

I am not sure whether the "HIPS settings restore" can be done by a userland process. I doubt that all protection that uses information about HIPS is confined to userland. Furthermore, it would be hard to check.
However, if it is true, then the "HIPS settings restore" could be done without a kernel driver.
 
Well, isn’t that what I was saying earlier (and you argued against)? You went off on tangents only to come back and post what I had said from the very beginning.

There are so many solutions; when a company wants to apply them, it will. Especially nowadays, when development is accelerated by AI.

The full solution:

Write rules to disk. Writing massive configurations to the registry is not optimal. Just as you don’t write antivirus definitions to the registry, HIPS rules, which grow by the minute and contain who knows how much information each, do not belong in the registry.
From the very beginning, the optimal solution was to operate on all rules in memory, put them in a nice buffer and dump them to a file at once.
So in this sense it can be concluded that the design is flawed.

Next, implementing parsers for such a massive volume of information in the kernel is not a good idea AT ALL.
Instead, the solution can monitor all startup hooks as soon as they are created and compile a config of boolean values. These can be basics like: Where is the file located? Is it in an unusual folder?
What is its reputation?
Is it properly signed?
That minimal config can be saved anywhere you want.

The kernel acts as the policy enforcement engine (because it is allowed to start early). Anything that passes the basic checks can start. Anything that doesn’t is delayed until HIPS and everything else is loaded and working. At that point, everything can be allowed/denied based on your rules.

If it’s not designed this way from the start, then you either have to rewrite the module or you apply a workaround.
There can be many workarounds; the most sensible one was already suggested in earlier posts.

In any case, “it is too complex for us to do it” is not a valid excuse, and either way, you will be required to manage your startup and shutdown properly. If it’s too difficult for you to manage your shutdown and startup routines (because of the kernel, the serialisation, this, that), you always have the option to stop offering antivirus software.

I'm glad we're finally finding some common ground. You've concluded your post by stating that if the software isn't designed according to your proposal, then "you either have to rewrite the module or you apply a workaround."

This is the exact point I have been making all along. Where we seem to differ is in how we reached this conclusion. My "tangents," as you call them, were a necessary and detailed forensic analysis of the current, existing system to identify the specific point of failure: the race condition during shutdown. That analysis wasn't a tangent; it was the evidence proving why the design is flawed.

You, on the other hand, have described an excellent high-level design for a new, better system.

Operating rules in memory and dumping them to a file. Using a minimal kernel component for early-boot policy enforcement. Delaying the full, complex HIPS ruleset until the user-mode services are safely loaded.

This is a textbook example of a solid, modern security architecture. It's also a fundamental re-architecture of the existing module. The solution you've outlined is the "complete source code rewrite" I've been arguing for. It involves changing the core logic of how the kernel and user-mode components interact, how configuration is managed, and how the startup/shutdown sequences are handled.

So, you haven't been "saying this from the beginning." You've just arrived at the same destination. Your own proposed solution validates my initial thesis: a simple patch is insufficient. The problem is architectural, and the only proper fix is the kind of comprehensive redesign you just described.
 