Conversation

spoiicy
Member

@spoiicy spoiicy commented Jul 24, 2025

Closes #2738

Description

This PR updates the Flare Capa and Flare Floss analyzers by refactoring the existing implementation and using the pip packages for better support.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).

Checklist

  • I have read and understood the rules about how to Contribute to this project
  • The pull request is for the branch develop
  • A new plugin (analyzer, connector, visualizer, playbook, pivot or ingestor) was added or changed, in which case:
    • I strictly followed the documentation "How to create a Plugin"
    • Usage file was updated. A link to the PR to the docs repo has been added as a comment here.
    • Advanced-Usage was updated (in case the plugin provides additional optional configuration). A link to the PR to the docs repo has been added as a comment here.
    • I have dumped the configuration from Django Admin using the dumpplugin command and added it in the project as a data migration. ("How to share a plugin with the community")
    • If a File analyzer was added and it supports a mimetype which is not already supported, you added a sample of that type inside the archive test_files.zip and you added the default tests for that mimetype in test_classes.py.
    • If you created a new analyzer and it is free (does not require any API key), please add it in the FREE_TO_USE_ANALYZERS playbook by following this guide.
    • Check if it could make sense to add that analyzer/connector to other freely available playbooks.
    • I have provided the resulting raw JSON of a finished analysis and a screenshot of the results.
    • If the plugin interacts with an external service, I have created an attribute called precisely url that contains this information. This is required for Health Checks (HEAD HTTP requests).
    • If the plugin requires mocked testing, _monkeypatch() was used in its class to apply the necessary decorators.
    • I have added that raw JSON sample to the MockUpResponse of the _monkeypatch() method. This serves us to provide a valid sample for testing.
    • I have created the corresponding DataModel for the new analyzer following the documentation
  • I have inserted the copyright banner at the start of the file: # This file is a part of IntelOwl https://github.com/intelowlproject/IntelOwl # See the file 'LICENSE' for copying permission.
  • Please avoid adding new libraries as requirements whenever it is possible. Use new libraries only if strictly needed to solve the issue you are working for. In case of doubt, ask a maintainer permission to use a specific library.
  • If external libraries/packages with restrictive licenses were added, they were added in the Legal Notice section.
  • Linters (Black, Flake, Isort) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
  • I have added tests for the feature/bug I solved (see tests folder). All the tests (new and old ones) gave 0 errors.
  • If the GUI has been modified:
    • I have provided a screenshot of the result in the PR.
    • I have created new frontend tests for the new component or updated existing ones.
  • After you have submitted the PR, if DeepSource, Django Doctors or other third-party linters have triggered any alerts during the CI checks, I have solved those alerts.

Important Rules

  • If you fail to complete the Checklist properly, your PR won't be reviewed by the maintainers.
  • Every time you make changes to the PR and you think the work is done, you should explicitly ask for a review by using GitHub's reviewing system detailed here.

@spoiicy
Member Author

spoiicy commented Jul 24, 2025

Capa Results

[image: screenshot of the Capa results]

Floss Results

{
  "report": {
    "strings": {
      "stack_strings": [],
      "tight_strings": [],
      "static_strings": [],
      "decoded_strings": [],
      "language_strings": [],
      "language_strings_missed": []
    },
    "analysis": {
      "functions": {
        "library": 23,
        "discovered": 153,
        "analyzed_stack_strings": 104,
        "analyzed_tight_strings": 0,
        "analyzed_decoded_strings": 20,
        "decoding_function_scores": {
          "4198416": {
            "score": 0.6,
            "xrefs_to": 1
          },
          "4198640": {
            "score": 0.6,
            "xrefs_to": 1
          },
          "4198704": {
            "score": 0.6,
            "xrefs_to": 1
          },
          "4198848": {
            "score": 0.6,
            "xrefs_to": 1
          },
          "4203136": {
            "score": 0.607,
            "xrefs_to": 2
          },
          "4203440": {
            "score": 0.682,
            "xrefs_to": 10
          },
          "4203984": {
            "score": 0.571,
            "xrefs_to": 4
          },
          "4204592": {
            "score": 0.901,
            "xrefs_to": 72
          },
          "4204960": {
            "score": 0.714,
            "xrefs_to": 22
          },
          "4205664": {
            "score": 0.771,
            "xrefs_to": 2
          },
          "4206107": {
            "score": 0.693,
            "xrefs_to": 2
          },
          "4206162": {
            "score": 0.796,
            "xrefs_to": 25
          },
          "4206224": {
            "score": 0.7,
            "xrefs_to": 84
          },
          "4206869": {
            "score": 0.693,
            "xrefs_to": 2
          },
          "4207819": {
            "score": 0.732,
            "xrefs_to": 2
          },
          "4208097": {
            "score": 0.579,
            "xrefs_to": 4
          },
          "4208772": {
            "score": 0.85,
            "xrefs_to": 2
          },
          "4209782": {
            "score": 0.656,
            "xrefs_to": 2
          },
          "4210331": {
            "score": 0.745,
            "xrefs_to": 1
          },
          "4210406": {
            "score": 0.745,
            "xrefs_to": 1
          }
        }
      },
      "enable_stack_strings": true,
      "enable_tight_strings": true,
      "enable_static_strings": false,
      "enable_decoded_strings": true
    },
    "metadata": {
      "runtime": {
        "total": 6.5962,
        "vivisect": 3.2142,
        "start_date": "2025-07-22T04:05:41.936423Z",
        "find_features": 0.2296,
        "stack_strings": 0.2385,
        "tight_strings": 0.0003,
        "static_strings": 0,
        "decoded_strings": 2.6583,
        "language_strings": 0
      },
      "version": "3.1.1",
      "language": "unknown",
      "file_path": "/opt/deploy/files_required/06ebf06587b38784e2af42dd5fbe56e5",
      "imagebase": 4194304,
      "min_length": 4,
      "language_version": "version unknown",
      "language_selected": ""
    },
    "exceeded_max_number_of_strings": {}
  },
  "data_model": null,
  "errors": [],
  "parameters": {
    "rank_strings": {
      "stack_strings": true,
      "static_strings": false,
      "decoded_strings": false
    },
    "max_no_of_strings": {
      "stack_strings": 1000,
      "static_strings": 1000,
      "decoded_strings": 1000
    }
  }
}

@spoiicy
Member Author

spoiicy commented Jul 24, 2025

A couple of points to note about this PR.

  1. The flare-capa pip package, unlike the binary release, doesn't ship with capa-rules and signatures by default, so the update method dynamically pulls the latest versions of both.
  2. flare-capa usually takes a couple of seconds to analyze a given executable. However, when tested specifically on the main.out test file from tests.zip, it takes a very long time (around 1000 seconds).
    As a result, the soft_time_limit parameter has been raised to 1800 seconds, and a new timeout parameter has been added that users can set according to their requirements.
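
To illustrate how a user-configurable timeout like the one described above can bound a slow external run, here is a minimal sketch. It is not the PR's implementation: `run_with_timeout` is a hypothetical helper, and the command in the usage lines is a stand-in Python one-liner rather than a real capa invocation.

```python
import subprocess
import sys


def run_with_timeout(command: list[str], timeout: float) -> str:
    """Run `command` and return its stdout.

    Hypothetical helper: raises TimeoutError if the process is still
    running after `timeout` seconds (subprocess.run kills the child).
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            check=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeoutError(f"command exceeded {timeout} seconds") from exc
    return completed.stdout


# Stand-in for a fast analysis: finishes well inside the limit.
output = run_with_timeout([sys.executable, "-c", "print('done')"], timeout=30)
print(output.strip())
```

Keeping the limit as a parameter lets a deployment raise it for heavy samples without changing code, while an outer task limit (such as soft_time_limit) can still act as a backstop.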

@spoiicy spoiicy marked this pull request as ready for review July 24, 2025 20:09
@spoiicy spoiicy requested a review from fgibertoni July 31, 2025 13:50
@fgibertoni fgibertoni requested a review from drosetti August 1, 2025 14:35
@drosetti
Contributor

drosetti commented Aug 1, 2025

The code looks good to me. However, when I tried to replicate your analysis (main.out sample), I ran into some problems with the updates.

When I tried from the plugin pages, I got an error message. The logs show:

2025-08-01 16:07:22,183 - api_app.analyzers_manager.file_analyzers.capa_info - _unzip - INFO - Rules have been succesfully extracted
2025-08-01 16:07:22,184 - api_app.analyzers_manager.file_analyzers.capa_info - _download_signatures - INFO - Downloading signatures at /opt/deploy/files_required/capa/sigs now
2025-08-01 16:07:22,526 - api_app.analyzers_manager.file_analyzers.capa_info - _download_signatures - ERROR - Failed to download signature: Command '['/usr/bin/wget', '-O', '/opt/deploy/files_required/capa/sigs', 'https://raw.githubusercontent.com/mandiant/capa/master/sigs/1_flare_msvc_rtf_32_64.sig']' returned non-zero exit status 1.
2025-08-01 16:07:22,526 - api_app.analyzers_manager.file_analyzers.capa_info - update - ERROR - Failed to update capa rules with error: Failed to update signatures due to error: None

I tried to replicate the wget command from my command line and it seems to work. Am I missing something?

@spoiicy spoiicy force-pushed the floss_capa_refactor branch from 9400e1c to ca4d856 Compare August 1, 2025 17:05
@spoiicy
Member Author

spoiicy commented Aug 1, 2025

Hi @drosetti I've made some fixes. It should work now.

Contributor

@fgibertoni fgibertoni left a comment

Nice work!


for signature in signatures_list:
    try:
        subprocess.run(
Contributor

Just to be extra sure, I'd like to make shlex handle splitting and quoting of parameters before passing them into run()
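
As a rough sketch of what this suggestion could look like: each parameter is quoted with `shlex.quote`, the command string is built, and `shlex.split` turns it into an argv list for `subprocess.run` (no shell involved). The helper name `build_command` and the template format are assumptions for illustration, not code from the PR.

```python
import shlex
import subprocess
import sys


def build_command(template: str, **values: str) -> list[str]:
    """Quote every value, fill the template, then split into an argv list.

    Hypothetical helper: a value containing spaces or shell metacharacters
    ends up as exactly one argument.
    """
    quoted = {name: shlex.quote(value) for name, value in values.items()}
    return shlex.split(template.format(**quoted))


# A destination path with a space stays a single argument.
argv = build_command(
    "wget -O {dest} {url}",
    dest="/tmp/my sigs/example.sig",
    url="https://example.com/example.sig",
)
print(argv)

# Running a harmless stand-in command with the resulting argv list.
echo_argv = build_command(
    "{python} -c {code} {arg}",
    python=sys.executable,
    code="import sys; print(sys.argv[1])",
    arg="two words",
)
result = subprocess.run(echo_argv, capture_output=True, text=True, check=True)
print(result.stdout.strip())
```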

@drosetti
Contributor

drosetti commented Aug 5, 2025

@spoiicy now I can download the rules correctly, but when I run the analysis I get an error:
Analyzer for main.out failed with error: ERROR capa: [Errno 13] Permission denied: main.py:678\n '/opt/deploy/intel_owl/.cache/capa

            capture_output=True,
        )

    except subprocess.CalledProcessError as e:
Member

If possible, I would prefer to avoid using subprocess to call a tool (wget) just to perform HTTPS connections, which can be handled with classic HTTP libraries. Can you change this?
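
A minimal sketch of the idea, assuming a plain file download is all that wget was doing. It uses the standard library's `urllib.request` so the example has no third-party dependency (the code shown later in this thread uses `requests.get(file_url, stream=True)` instead). The helper name `download_file` and its parameters are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, destination: Path, timeout: float = 30.0) -> Path:
    """Stream `url` into `destination` without spawning an external process.

    Hypothetical helper: creates the parent directory if needed and copies
    the response body in chunks, so large files are not held in memory.
    """
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with destination.open("wb") as out_file:
            shutil.copyfileobj(response, out_file)
    return destination
```

Compared with shelling out, errors surface as Python exceptions (for example `urllib.error.URLError`) that can be logged with their message, and the destination is always an explicit file path.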

@spoiicy spoiicy requested a review from mlodic August 23, 2025 10:01
Contributor

@fgibertoni fgibertoni left a comment

Please also update migration number since we merged the other PR. Then we should be good 😃

@spoiicy spoiicy force-pushed the floss_capa_refactor branch from 767c04d to 918965d Compare August 24, 2025 17:41
Member

@mlodic mlodic left a comment

Good job! I have asked for additional work on the management of the rules, which is critical to the performance and actual effectiveness of this complex analyzer. Please let me know if you have any doubts about this.

AnalyzerConfig = apps.get_model("analyzers_manager", "AnalyzerConfig")

pm = PythonModule.objects.get(
    module="capa_info.CapaInfo",
Member

If I remember correctly, there is also a param called "docker_based" that should be set to False in the migration for both analyzers.

Member Author

Done

try:

    response = requests.get(file_url, stream=True)
    logger.info(f"Started downloading rules from {file_url}")
Member

log the version

Member Author

Done

    logger.error(f"Failed to download rules with error: {e}")
    raise AnalyzerRunException("Failed to download rules")

logger.info(f"Rules have been successfully downloaded at {RULES_LOCATION}")
Member

log the version

Member Author

Done


@classmethod
def _download_signatures(cls) -> None:
    logger.info(f"Downloading signatures at {SIGNATURE_LOCATION} now")
Member

The "signatures" are less important than the rules. They are almost never updated, while the "rules" are updated often. Plus, most of the time we don't want these signatures to be used at all, because they would slow down Capa's execution. The rules are always necessary because they are the core part of the tool, while the signatures may not be needed. Because of that, I would not re-download them once they are present, as you already do. But we need an additional parameter that lets the user enable them explicitly; otherwise it would be better for these signatures to be disabled by default.

Member Author

Plus, most of the time, we don't want these signatures to execute either cause it would slow the Capa execution.

Regarding your point: I actually tried executing flare-capa without the signatures, with only the rules, and it threw an error. So I think the signatures are necessary for its execution.

That said, I can definitely change the code so that the signatures are downloaded only once, or updated on demand by the user.

Member Author

Made the changes so that signatures are only downloaded the first time, or whenever force_pull_signatures is set to True.
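
The behaviour described here could be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: `ensure_signatures` is a hypothetical helper, the downloader is injected as a callable, and "already downloaded" is approximated as "the signatures directory exists and is not empty". Only the `force_pull_signatures` name comes from the comment above.

```python
from pathlib import Path
from typing import Callable


def ensure_signatures(
    signatures_dir: Path,
    download: Callable[[Path], None],
    force_pull_signatures: bool = False,
) -> bool:
    """Download signatures only if none are present or a refresh is forced.

    Hypothetical helper. Returns True when a download was performed.
    """
    already_present = signatures_dir.is_dir() and any(signatures_dir.iterdir())
    if already_present and not force_pull_signatures:
        # Signatures rarely change, so skip the network round trip.
        return False
    signatures_dir.mkdir(parents=True, exist_ok=True)
    download(signatures_dir)
    return True
```

Passing the downloader in as an argument keeps the decision logic easy to test without network access.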

@mlodic mlodic mentioned this pull request Aug 25, 2025
23 tasks
@spoiicy spoiicy force-pushed the floss_capa_refactor branch from 751159c to c57d234 Compare September 5, 2025 21:32
Contributor

@code-review-doctor code-review-doctor bot left a comment

Looks good. Worth considering though. View full project report here.



class AnalyzerRulesFileVersion(models.Model):
    last_downloaded_version = models.CharField(max_length=50, blank=True, null=True)
Contributor

Suggested change
- last_downloaded_version = models.CharField(max_length=50, blank=True, null=True)
+ last_downloaded_version = models.CharField(max_length=50, blank=True, default="")

null=True on a string field causes inconsistent data types because the value can be either str or None. This adds complexity and maybe bugs, but can be solved by replacing null=True with default="". More info.


class AnalyzerRulesFileVersion(models.Model):
    last_downloaded_version = models.CharField(max_length=50, blank=True, null=True)
    download_url = models.URLField(max_length=200, blank=True, null=True)
Contributor

Suggested change
- download_url = models.URLField(max_length=200, blank=True, null=True)
+ download_url = models.URLField(max_length=200, blank=True, default="")

Similarly, consider replacing null=True with default="" (and blank=True to pass validation checks).

@spoiicy spoiicy force-pushed the floss_capa_refactor branch from ddeecc4 to 045bc30 Compare September 6, 2025 03:11
@spoiicy spoiicy requested review from mlodic and fgibertoni September 7, 2025 06:01
Contributor

@fgibertoni fgibertoni left a comment

Greatly improved! I'll let @mlodic take a look at this to see if it's what he expected 😃

p1 = Parameter(
    name="timeout",
    type="float",
    description="Duration in seconds for which intelowl waits for capa to return results. Default is set to 15 seconds.",
Contributor

Maybe I am missing something, but have you set the default of the new parameter?

Member Author

@spoiicy spoiicy Sep 8, 2025

You are right, I haven't set a default for this one. No worries, I'll update it and add the default. :)

Contributor

Perfect, thank you!


4 participants