feat: add support for GitHubRepoForkerTool #1968

Merged
24 commits
c3b4cfd
feat: add support for GitHubRepoForkerTool
srini047 Jun 17, 2025
549a548
Merge branch 'main' into github_repo_forker_tool_integration
srini047 Jun 17, 2025
efc1d8e
Merge branch 'main' into github_repo_forker_tool_integration
srini047 Jun 22, 2025
a28727e
Merge branch 'main' into github_repo_forker_tool_integration
mpangrazzi Jun 23, 2025
73859a2
fix: remove extra params
srini047 Jun 28, 2025
46652f8
Merge branch 'main' into github_repo_forker_tool_integration
srini047 Jun 28, 2025
f8d7e2d
Merge branch 'main' into github_repo_forker_tool_integration
mpangrazzi Jun 30, 2025
716584a
Merge branch 'main' into github_repo_forker_tool_integration
srini047 Jul 2, 2025
d293f27
fix: revert token check
srini047 Jul 2, 2025
cd5ff1d
fix: typing issues
srini047 Jul 2, 2025
0067644
Merge branch 'main' into github_repo_forker_tool_integration
srini047 Jul 4, 2025
47a1c1f
fix: test issue
srini047 Jul 12, 2025
4cd1b86
Merge branch 'main' into github_repo_forker_tool_integration
srini047 Jul 12, 2025
623f566
fix: revert as per comments
srini047 Jul 19, 2025
9460d55
Merge branch 'main' into github_repo_forker_tool_integration
srini047 Jul 19, 2025
e5cb0f6
Merge branch 'main' into github_repo_forker_tool_integration
srini047 Jul 24, 2025
cbf001d
Merge branch 'main' into github_repo_forker_tool_integration
srini047 Aug 1, 2025
651c304
Update integrations/github/src/haystack_integrations/components/conne…
sjrl Aug 7, 2025
38ac22e
Update integrations/github/src/haystack_integrations/tools/github/rep…
sjrl Aug 7, 2025
625dcb6
Merge branch 'main' into github_repo_forker_tool_integration
srini047 Aug 8, 2025
f238483
Merge branch 'main' into github_repo_forker_tool_integration
srini047 Aug 15, 2025
5b975c3
fix: test failures
srini047 Aug 15, 2025
78cf9d2
fix: formatting issue
srini047 Aug 15, 2025
df75847
Merge branch 'main' into github_repo_forker_tool_integration
mpangrazzi Aug 18, 2025
@@ -60,9 +60,9 @@ def __init__(
         :param auto_sync: If True, syncs fork with original repository if it already exists
         :param create_branch: If True, creates a fix branch based on the issue number
         """
-        error_message = "github_token must be a Secret"
         if not isinstance(github_token, Secret):
-            raise TypeError(error_message)
+            msg = "github_token must be a Secret"
+            raise TypeError(msg)

         self.github_token = github_token
         self.raise_on_failure = raise_on_failure
@@ -5,6 +5,7 @@
 from .file_editor_prompt import FILE_EDITOR_PROMPT, FILE_EDITOR_SCHEMA
 from .issue_commenter_prompt import ISSUE_COMMENTER_PROMPT, ISSUE_COMMENTER_SCHEMA
 from .pr_creator_prompt import PR_CREATOR_PROMPT, PR_CREATOR_SCHEMA
+from .repo_forker_prompt import REPO_FORKER_PROMPT, REPO_FORKER_SCHEMA
 from .repo_viewer_prompt import REPO_VIEWER_PROMPT, REPO_VIEWER_SCHEMA
 from .system_prompt import SYSTEM_PROMPT

@@ -16,6 +17,8 @@
     "ISSUE_COMMENTER_SCHEMA",
     "PR_CREATOR_PROMPT",
     "PR_CREATOR_SCHEMA",
+    "REPO_FORKER_PROMPT",
+    "REPO_FORKER_SCHEMA",
     "REPO_VIEWER_PROMPT",
     "REPO_VIEWER_SCHEMA",
     "SYSTEM_PROMPT",
@@ -0,0 +1,61 @@
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]>
#
# SPDX-License-Identifier: Apache-2.0
REPO_FORKER_PROMPT = """Haystack-Agent uses this tool to fork GitHub repositories in order to contribute to issues.
Haystack-Agent initiates a fork so it can freely make changes for contributions.
A fork is required to open a pull request to the upstream repository.
Haystack-Agent works by forking the repository associated with a given issue.

<usage>
Pass a `url` string for the GitHub issue you want to work on in a fork.
It is REQUIRED to pass `url` to use this tool.
The structure must be "https://github.com/<repo-owner>/<repo-name>/issues/<issue-number>".

Examples:

- {"url": "https://github.com/deepset-ai/haystack/issues/9343"}
- will fork the "deepset-ai/haystack" repository to work on issue 9343
- {"url": "https://github.com/deepset-ai/haystack-core-integrations/issues/1685"}
- will fork the "deepset-ai/haystack-core-integrations" repository to work on issue 1685
</usage>

Haystack-Agent uses the `repo_forker` tool to create a copy (fork) of the target repository into its own account.
Haystack-Agent ensures the issue URL is valid and points to a real GitHub issue.
It parses the URL to identify the correct repository.

<thinking>
- Does this issue belong to the repository I need to work on?
- Can I extract the owner and repository name from the URL?
- Why am I forking this repository? (e.g., to implement a fix, to add a feature)
- Is there anything special about the branch or base state I should be aware of?
</thinking>

Haystack-Agent reflects on the results after forking:
<thinking>
- Did the fork succeed? Is the fork visible in my account?
- Can I access, clone, and push to my fork?
- Are there any permissions or fork-specific settings to configure before proceeding?
- Which branch will I be working on in the fork?
</thinking>

IMPORTANT
Haystack-Agent ONLY forks the repository mentioned in the given issue URL.
Haystack-Agent does NOT attempt to fork organizations, user profiles, or non-issue URLs.
Haystack-Agent knows that forking is a prerequisite to contributing changes and creating pull requests.

Haystack-Agent takes notes after the fork:
<scratchpad>
- Record the URL of the forked repository
- Note the original issue being worked on
- Document any post-fork steps (e.g., git cloning, installing dependencies)
- Make note of any errors or special setup requirements
</scratchpad>
"""

REPO_FORKER_SCHEMA = {
    "properties": {
        "url": {"type": "string", "description": "URL of the GitHub issue to work on in the fork."},
    },
    "required": ["url"],
    "type": "object",
}
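
The prompt and schema above require a single `url` argument shaped like "https://github.com/<repo-owner>/<repo-name>/issues/<issue-number>". As a rough illustration of what that contract implies, the following sketch checks the argument and extracts the owner, repository, and issue number. The names `ISSUE_URL_PATTERN` and `parse_issue_url` are hypothetical and are not part of this PR; the actual parsing lives in the `GitHubRepoForker` component and may differ.

```python
import re
from typing import Any, Dict, Tuple

# Hypothetical pattern mirroring the URL structure described in REPO_FORKER_PROMPT:
# https://github.com/<repo-owner>/<repo-name>/issues/<issue-number>
ISSUE_URL_PATTERN = re.compile(
    r"^https://github\.com/(?P<owner>[^/\s]+)/(?P<repo>[^/\s]+)/issues/(?P<number>\d+)/?$"
)


def parse_issue_url(arguments: Dict[str, Any]) -> Tuple[str, str, int]:
    """Return (owner, repo, issue_number) from tool-call arguments (illustrative only)."""
    url = arguments.get("url")
    # The schema marks `url` as required and of type string.
    if not isinstance(url, str):
        raise ValueError("`url` is required and must be a string")
    match = ISSUE_URL_PATTERN.match(url)
    if match is None:
        # Organizations, user profiles, and other non-issue URLs end up here.
        raise ValueError(f"Not a GitHub issue URL: {url!r}")
    return match["owner"], match["repo"], int(match["number"])
```

For the first example in the prompt, `parse_issue_url({"url": "https://github.com/deepset-ai/haystack/issues/9343"})` yields the owner `deepset-ai`, the repository `haystack`, and issue number `9343`.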
@@ -5,12 +5,14 @@
 from .issue_commenter_tool import GitHubIssueCommenterTool
 from .issue_viewer_tool import GitHubIssueViewerTool
 from .pr_creator_tool import GitHubPRCreatorTool
+from .repo_forker_tool import GitHubRepoForkerTool
 from .repo_viewer_tool import GitHubRepoViewerTool

 __all__ = [
     "GitHubFileEditorTool",
     "GitHubIssueCommenterTool",
     "GitHubIssueViewerTool",
     "GitHubPRCreatorTool",
+    "GitHubRepoForkerTool",
     "GitHubRepoViewerTool",
 ]
@@ -0,0 +1,116 @@
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]>
#
# SPDX-License-Identifier: Apache-2.0
from typing import Any, Callable, Dict, Optional, Union

from haystack.core.serialization import generate_qualified_class_name
from haystack.tools import ComponentTool
from haystack.utils import Secret, deserialize_secrets_inplace

from haystack_integrations.components.connectors.github.repo_forker import GitHubRepoForker
from haystack_integrations.prompts.github.repo_forker_prompt import REPO_FORKER_PROMPT, REPO_FORKER_SCHEMA
from haystack_integrations.tools.github.utils import deserialize_handlers, serialize_handlers


class GitHubRepoForkerTool(ComponentTool):
    """
    A tool for forking a GitHub repository.
    """

    def __init__(
        self,
        *,
        name: Optional[str] = "repo_forker",
        description: Optional[str] = REPO_FORKER_PROMPT,
        parameters: Optional[Dict[str, Any]] = REPO_FORKER_SCHEMA,
        github_token: Secret = Secret.from_env_var("GITHUB_TOKEN"),
        raise_on_failure: bool = True,
        outputs_to_string: Optional[Dict[str, Union[str, Callable[[Any], str]]]] = None,
        inputs_from_state: Optional[Dict[str, str]] = None,
        outputs_to_state: Optional[Dict[str, Dict[str, Union[str, Callable]]]] = None,
    ):
        """
        Initialize the GitHub Repo Forker tool.

        :param name: Optional name for the tool.
        :param description: Optional description.
        :param parameters: Optional JSON schema defining the parameters expected by the Tool.
        :param github_token: GitHub personal access token for API authentication.
        :param raise_on_failure: If True, raises exceptions on API errors.
        :param outputs_to_string:
            Optional dictionary defining how tool outputs should be converted into a string.
            If the source is provided, only the specified output key is sent to the handler.
            If the source is omitted, the whole tool result is sent to the handler.
            Example: {
                "source": "docs", "handler": format_documents
            }
        :param inputs_from_state:
            Optional dictionary mapping state keys to tool parameter names.
            Example: {"repository": "repo"} maps state's "repository" to tool's "repo" parameter.
        :param outputs_to_state:
            Optional dictionary defining how tool outputs map to keys within state as well as optional handlers.
            If the source is provided, only the specified output key is sent to the handler.
            Example: {
                "documents": {"source": "docs", "handler": custom_handler}
            }
            If the source is omitted, the whole tool result is sent to the handler.
            Example: {
                "documents": {"handler": custom_handler}
            }
        """
        self.github_token = github_token
        self.raise_on_failure = raise_on_failure
        self.outputs_to_string = outputs_to_string
        self.inputs_from_state = inputs_from_state
        self.outputs_to_state = outputs_to_state

        repo_forker = GitHubRepoForker(
            github_token=github_token,
            raise_on_failure=raise_on_failure,
        )

        super().__init__(
            component=repo_forker,
            name=name,
            description=description,
            parameters=parameters,
            outputs_to_string=self.outputs_to_string,
            inputs_from_state=self.inputs_from_state,
            outputs_to_state=self.outputs_to_state,
        )

    def to_dict(self) -> Dict[str, Any]:
        """
        Serializes the tool to a dictionary.

        :returns:
            Dictionary with serialized data.
        """
        serialized = {
            "name": self.name,
            "description": self.description,
            "parameters": self.parameters,
            "github_token": self.github_token.to_dict() if self.github_token else None,
            "raise_on_failure": self.raise_on_failure,
            "outputs_to_string": self.outputs_to_string,
            "inputs_from_state": self.inputs_from_state,
            "outputs_to_state": self.outputs_to_state,
        }

        serialize_handlers(serialized, self.outputs_to_state, self.outputs_to_string)
        return {"type": generate_qualified_class_name(type(self)), "data": serialized}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "GitHubRepoForkerTool":
        """
        Deserializes the tool from a dictionary.

        :param data:
            Dictionary to deserialize from.
        :returns:
            Deserialized tool.
        """
        inner_data = data["data"]
        deserialize_secrets_inplace(inner_data, keys=["github_token"])
        deserialize_handlers(inner_data)
        return cls(**inner_data)
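
The `to_dict` / `from_dict` pair above relies on `serialize_handlers` and `deserialize_handlers` to make callables storable, and the tests in this PR expect a handler to appear in the serialized dictionary as a dotted path such as "haystack_integrations.tools.github.utils.message_handler". The sketch below shows one way such a round trip can work using only the standard library. The helper names `callable_to_path` and `path_to_callable` are hypothetical, and this is not the implementation in `haystack_integrations.tools.github.utils`; it assumes the handler is a module-level function.

```python
import importlib
import json
from typing import Callable


def callable_to_path(func: Callable) -> str:
    """Turn a module-level function into an importable dotted path (illustrative only)."""
    return f"{func.__module__}.{func.__qualname__}"


def path_to_callable(path: str) -> Callable:
    """Resolve a dotted path back into the function it names (illustrative only)."""
    # Split "package.module.function" into the module part and the attribute name.
    module_name, _, attribute = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Round trip with a standard-library function standing in for a handler.
path = callable_to_path(json.dumps)
restored = path_to_callable(path)
```

Storing a path instead of the function object keeps the serialized tool representable as plain JSON or YAML, at the cost of requiring the handler to be importable wherever the tool is deserialized.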
2 changes: 1 addition & 1 deletion integrations/github/tests/test_repo_forker.py
@@ -14,7 +14,7 @@ class TestGitHubRepoForker:
     def test_init_default(self, monkeypatch):
         monkeypatch.setenv("GITHUB_TOKEN", "test-token")

-        forker = GitHubRepoForker()
+        forker = GitHubRepoForker(github_token=Secret.from_env_var("GITHUB_TOKEN"))
         assert forker.github_token is not None
         assert forker.github_token.resolve_value() == "test-token"
         assert forker.raise_on_failure is True
128 changes: 128 additions & 0 deletions integrations/github/tests/test_repo_forker_tool.py
@@ -0,0 +1,128 @@
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]>
#
# SPDX-License-Identifier: Apache-2.0
from haystack.utils import Secret

from haystack_integrations.prompts.github.repo_forker_prompt import REPO_FORKER_PROMPT, REPO_FORKER_SCHEMA
from haystack_integrations.tools.github.repo_forker_tool import GitHubRepoForkerTool
from haystack_integrations.tools.github.utils import message_handler


class TestGitHubRepoForkerTool:
    def test_init(self, monkeypatch):
        monkeypatch.setenv("GITHUB_TOKEN", "test-token")

        tool = GitHubRepoForkerTool()
        assert tool.name == "repo_forker"
        assert tool.description == REPO_FORKER_PROMPT
        assert tool.parameters == REPO_FORKER_SCHEMA
        assert tool.github_token == Secret.from_env_var("GITHUB_TOKEN")
        assert tool.raise_on_failure is True
        assert tool.outputs_to_string is None
        assert tool.inputs_from_state is None
        assert tool.outputs_to_state is None

    def test_from_dict(self, monkeypatch):
        monkeypatch.setenv("GITHUB_TOKEN", "test-token")
        tool_dict = {
            "type": "haystack_integrations.tools.github.repo_forker_tool.GitHubRepoForkerTool",
            "data": {
                "name": "repo_forker",
                "description": REPO_FORKER_PROMPT,
                "parameters": REPO_FORKER_SCHEMA,
                "github_token": {"env_vars": ["GITHUB_TOKEN"], "strict": True, "type": "env_var"},
                "raise_on_failure": True,
                "outputs_to_string": None,
                "inputs_from_state": None,
                "outputs_to_state": None,
            },
        }
        tool = GitHubRepoForkerTool.from_dict(tool_dict)
        assert tool.name == "repo_forker"
        assert tool.description == REPO_FORKER_PROMPT
        assert tool.parameters == REPO_FORKER_SCHEMA
        assert tool.github_token == Secret.from_env_var("GITHUB_TOKEN")
        assert tool.raise_on_failure is True
        assert tool.outputs_to_string is None
        assert tool.inputs_from_state is None
        assert tool.outputs_to_state is None

    def test_to_dict(self, monkeypatch):
        monkeypatch.setenv("GITHUB_TOKEN", "test-token")
        tool = GitHubRepoForkerTool()
        tool_dict = tool.to_dict()
        assert tool_dict["type"] == "haystack_integrations.tools.github.repo_forker_tool.GitHubRepoForkerTool"
        assert tool_dict["data"]["name"] == "repo_forker"
        assert tool_dict["data"]["description"] == REPO_FORKER_PROMPT
        assert tool_dict["data"]["parameters"] == REPO_FORKER_SCHEMA
        assert tool_dict["data"]["github_token"] == {
            "env_vars": ["GITHUB_TOKEN"],
            "strict": True,
            "type": "env_var",
        }
        assert tool_dict["data"]["raise_on_failure"] is True
        assert tool_dict["data"]["outputs_to_string"] is None
        assert tool_dict["data"]["inputs_from_state"] is None
        assert tool_dict["data"]["outputs_to_state"] is None

    def test_to_dict_with_extra_params(self, monkeypatch):
        monkeypatch.setenv("GITHUB_TOKEN", "test-token")
        tool = GitHubRepoForkerTool(
            github_token=Secret.from_env_var("GITHUB_TOKEN"),
            raise_on_failure=False,
            outputs_to_string={"source": "docs", "handler": message_handler},
            inputs_from_state={"repository": "repo"},
            outputs_to_state={"documents": {"source": "docs", "handler": message_handler}},
        )
        tool_dict = tool.to_dict()
        assert tool_dict["type"] == "haystack_integrations.tools.github.repo_forker_tool.GitHubRepoForkerTool"
        assert tool_dict["data"]["name"] == "repo_forker"
        assert tool_dict["data"]["description"] == REPO_FORKER_PROMPT
        assert tool_dict["data"]["parameters"] == REPO_FORKER_SCHEMA
        assert tool_dict["data"]["github_token"] == {
            "env_vars": ["GITHUB_TOKEN"],
            "strict": True,
            "type": "env_var",
        }
        assert tool_dict["data"]["raise_on_failure"] is False
        assert (
            tool_dict["data"]["outputs_to_string"]["handler"]
            == "haystack_integrations.tools.github.utils.message_handler"
        )
        assert tool_dict["data"]["inputs_from_state"] == {"repository": "repo"}
        assert tool_dict["data"]["outputs_to_state"]["documents"]["source"] == "docs"
        assert (
            tool_dict["data"]["outputs_to_state"]["documents"]["handler"]
            == "haystack_integrations.tools.github.utils.message_handler"
        )

    def test_from_dict_with_extra_params(self, monkeypatch):
        monkeypatch.setenv("GITHUB_TOKEN", "test-token")
        tool_dict = {
            "type": "haystack_integrations.tools.github.repo_forker_tool.GitHubRepoForkerTool",
            "data": {
                "name": "repo_forker",
                "description": REPO_FORKER_PROMPT,
                "parameters": REPO_FORKER_SCHEMA,
                "github_token": {"env_vars": ["GITHUB_TOKEN"], "strict": True, "type": "env_var"},
                "raise_on_failure": False,
                "outputs_to_string": {"handler": "haystack_integrations.tools.github.utils.message_handler"},
                "inputs_from_state": {"repository": "repo"},
                "outputs_to_state": {
                    "documents": {
                        "source": "docs",
                        "handler": "haystack_integrations.tools.github.utils.message_handler",
                    }
                },
            },
        }
        tool = GitHubRepoForkerTool.from_dict(tool_dict)
        assert tool.name == "repo_forker"
        assert tool.description == REPO_FORKER_PROMPT
        assert tool.parameters == REPO_FORKER_SCHEMA
        assert tool.github_token == Secret.from_env_var("GITHUB_TOKEN")
        assert tool.raise_on_failure is False
        assert tool.outputs_to_string["handler"] == message_handler
        assert tool.inputs_from_state == {"repository": "repo"}
        assert tool.outputs_to_state["documents"]["source"] == "docs"
        assert tool.outputs_to_state["documents"]["handler"] == message_handler
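
The tests above assert that the serialized `github_token` is `{"env_vars": ["GITHUB_TOKEN"], "strict": True, "type": "env_var"}`, which means only the variable name is written out and never the token itself. The following standard-library sketch illustrates that idea. `EnvVarSecret` is a hypothetical stand-in and not Haystack's `Secret` class; in particular, the behavior attached to `strict` here (raise when no variable is set) is an assumption made for the example.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class EnvVarSecret:
    """Hypothetical env-var-backed secret; holds variable names, never the value."""

    env_vars: Tuple[str, ...]
    strict: bool = True

    def resolve_value(self) -> Optional[str]:
        # Look the value up lazily, at the moment it is needed.
        for name in self.env_vars:
            value = os.environ.get(name)
            if value is not None:
                return value
        if self.strict:
            raise ValueError(f"None of the environment variables {self.env_vars} are set")
        return None

    def to_dict(self) -> Dict[str, Any]:
        # Only the variable names are serialized, so the token cannot leak.
        return {"type": "env_var", "env_vars": list(self.env_vars), "strict": self.strict}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "EnvVarSecret":
        return cls(env_vars=tuple(data["env_vars"]), strict=data["strict"])
```

Because the value is resolved on demand, a test can set the variable with `monkeypatch.setenv` before calling `resolve_value()`, much as the tests in this PR do before constructing the tool.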