@mike-hunhoff
Collaborator

Vibe coding a switch to PyGhidra for the Ghidra backend. PyGhidra is Ghidra's native Python 3 support, so we should use it.

@github-actions github-actions bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@gemini-code-assist
Contributor

Summary of Changes

Hello @mike-hunhoff, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the Ghidra integration for capa by switching from the Ghidrathon extension to PyGhidra, Ghidra's official Python 3 binding. This change aims to provide a more robust and streamlined analysis experience, centralizing Ghidra API interactions through a new context management system and simplifying the user workflow by integrating directly into the capa CLI. The update also brings improvements to feature extraction, particularly for forwarded exports, and ensures better resource management.

Highlights

  • Migration to PyGhidra: The Ghidra backend has been migrated from using Ghidrathon-based scripting to PyGhidra, leveraging Ghidra's native Python 3 support for improved integration and stability.
  • Centralized Ghidra Context Management: A new capa/features/extractors/ghidra/context.py module has been introduced to centralize access to Ghidra's program, flat_api, and monitor objects, simplifying Ghidra API calls across the extractor.
  • Simplified User Experience: The Ghidra integration is now directly accessible via the capa command-line interface using capa -b ghidra /path/to/sample, eliminating the need for separate Ghidra scripts (capa_explorer.py and capa_ghidra.py) and their manual execution within Ghidra.
  • Enhanced Export Name Extraction: The file feature extractor now includes logic to detect and properly reformat forwarded exports, providing more accurate and detailed export information.
  • Improved Resource Management: The GhidraFeatureExtractor now manages the PyGhidra context and temporary directories, ensuring proper cleanup of resources after analysis. Test infrastructure has also been updated to handle PyGhidra session cleanup.
  • Ghidra Version Requirement Update: The minimum required Ghidra version for the capa integration has been updated to 11.4 or higher to align with PyGhidra compatibility.
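
The centralized context described in the highlights can be pictured with a minimal sketch. Only `get_flat_api()` appears in this pull request's quoted code; `GhidraContext`, `set_context()`, `get_context()`, `get_program()`, and `get_monitor()` are hypothetical names chosen for illustration, and the real `context.py` may be organized differently.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class GhidraContext:
    """Bundle of the three objects named in the summary (hypothetical shape)."""

    program: Any
    flat_api: Any
    monitor: Any


# Module-level holder; None until an analysis session registers its objects.
_context: Optional[GhidraContext] = None


def set_context(program: Any, flat_api: Any, monitor: Any) -> None:
    """Register the objects for the current analysis session."""
    global _context
    _context = GhidraContext(program, flat_api, monitor)


def get_context() -> GhidraContext:
    """Return the registered context, failing loudly if none was set."""
    if _context is None:
        raise RuntimeError("Ghidra context not set; call set_context() first")
    return _context


def get_program() -> Any:
    return get_context().program


def get_flat_api() -> Any:
    return get_context().flat_api


def get_monitor() -> Any:
    return get_context().monitor
```

With a holder like this, extractor code can call `get_flat_api().getBytes(...)` instead of relying on names injected into a Ghidra script's global namespace.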

Ignored Files

  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/tests.yml
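
The resource-management highlight in the summary above (an extractor that owns a session context and a temporary directory) might look roughly like the following sketch. The `__init__` signature matches the snippet quoted later in this review; `fake_session()` and `close()` are hypothetical and stand in for whatever PyGhidra session handling and cleanup hook the pull request actually uses.

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def fake_session(events):
    """Hypothetical stand-in for a PyGhidra session; it only records events."""
    events.append("open")
    try:
        yield "program"
    finally:
        events.append("close")


class Extractor:
    def __init__(self, ctx_manager=None, tmpdir=None):
        self.ctx_manager = ctx_manager
        self.tmpdir = tmpdir

    def close(self):
        """Hypothetical cleanup hook: exit the session, then remove the tmpdir."""
        if self.ctx_manager is not None:
            self.ctx_manager.__exit__(None, None, None)
            self.ctx_manager = None
        if self.tmpdir is not None:
            self.tmpdir.cleanup()
            self.tmpdir = None


events = []
tmpdir = tempfile.TemporaryDirectory()
project_dir = Path(tmpdir.name)
session = fake_session(events)
session.__enter__()

extractor = Extractor(ctx_manager=session, tmpdir=tmpdir)
extractor.close()
extractor.close()  # safe to call twice: both handles were reset to None
```

Resetting each handle to `None` after cleanup keeps `close()` idempotent, which is convenient when test fixtures and normal shutdown paths may both call it.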

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a significant and well-executed refactoring to switch the Ghidra backend to PyGhidra. This change greatly improves the user experience by allowing capa to be run directly from the command line with Ghidra as a backend, rather than requiring scripts to be executed within the Ghidra environment. The changes are comprehensive, covering application logic, testing, and documentation.

I've identified a few areas for minor improvements related to code style and robustness, such as moving local imports to the module level for better clarity, restoring type hints for maintainability, and using more specific exception handling. Overall, this is an excellent contribution.

    def __init__(self, ctx_manager=None, tmpdir=None):
        self.ctx_manager = ctx_manager
        self.tmpdir = tmpdir
        import capa.features.extractors.ghidra.helpers as ghidra_helpers

medium

The import of ghidra_helpers is local to the __init__ method, which requires other methods in this class to re-import it. To improve code clarity and adhere to standard Python style (PEP 8), this import should be moved to the top of the file.

After moving this import, please remove the redundant local imports in get_base_address (line 62) and get_function (line 93).
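
To illustrate why the local import forces re-imports elsewhere, here is a small self-contained sketch. It uses `json` purely as a stand-in for `capa.features.extractors.ghidra.helpers`, and both class names are hypothetical.

```python
import json as helpers  # module-level: visible to every method below


class ModuleLevelImport:
    def dump(self, value):
        return helpers.dumps(value)


class LocalImport:
    def __init__(self):
        # Bound only inside __init__; other methods cannot see this name.
        import json as local_helpers

        self.probe = local_helpers.dumps([])

    def dump(self, value):
        return local_helpers.dumps(value)  # NameError at call time
```

Calling `ModuleLevelImport().dump([1, 2])` works, while `LocalImport().dump([1, 2])` raises `NameError` because `local_helpers` exists only in the scope of `__init__`. A local import can still be a deliberate choice (for example, to defer loading a module that is only available inside a running Ghidra session), so whether the move is safe here depends on when the helpers module can be imported.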

-        return ints_to_bytes(getBytes(addr, length)) # type: ignore [name-defined] # noqa: F821
-    except RuntimeError:
+        return ints_to_bytes(get_flat_api().getBytes(addr, int(length)))
+    except Exception:

medium

Catching a broad Exception can hide unexpected errors. The original RuntimeError was likely more specific to exceptions translated from the Java layer. It's better to be as specific as possible with exception handling. Please consider reverting to RuntimeError or catching a more specific set of exceptions if other types are expected.

Suggested change
-    except Exception:
+    except RuntimeError:
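
The difference between the two handlers can be shown with a small sketch. `FakeFlatApi` is a hypothetical stand-in rather than Ghidra's real flat API, this `ints_to_bytes` is a simplified stand-in for the helper of the same name, and returning `b""` on failure is an assumption, since the body of the handler is not shown in the quoted diff.

```python
def ints_to_bytes(values):
    """Convert signed ints (as Java bytes are) to a Python bytes object."""
    return bytes(v & 0xFF for v in values)


class FakeFlatApi:
    """Hypothetical stand-in for the flat API; raises RuntimeError on bad reads."""

    def __init__(self, memory):
        self.memory = memory

    def getBytes(self, addr, length):
        if addr + length > len(self.memory):
            raise RuntimeError("unable to read memory")
        return self.memory[addr:addr + length]


def get_bytes(api, addr, length):
    try:
        return ints_to_bytes(api.getBytes(addr, int(length)))
    except RuntimeError:
        # Only the expected read failure is swallowed (assumed fallback value).
        return b""


api = FakeFlatApi([-1, 1, 2])
ok = get_bytes(api, 0, 2)       # successful read
missing = get_bytes(api, 2, 5)  # out-of-range read is handled
```

With the narrow handler, a programming mistake such as `get_bytes(api, "x", 1)` surfaces as a `TypeError` instead of being silently turned into an empty result, which is the behavior the reviewer is asking to preserve.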

@github-actions github-actions bot dismissed their stale review December 9, 2025 00:36

CHANGELOG updated or no update needed, thanks! 😄

mike-hunhoff and others added 2 commits December 8, 2025 17:37
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@mike-hunhoff mike-hunhoff marked this pull request as draft December 9, 2025 01:03
Collaborator

@mr-tz mr-tz left a comment

With tests passing (after fixing or xfailing them for now) this looks solid! May want to consult @colton-gabertan as well.
