38 changes: 38 additions & 0 deletions src/agentlab/agents/dynamic_prompting.py
@@ -412,6 +412,7 @@ def __init__(self, obs, flags: ObsFlags) -> None:
visible=lambda: flags.use_html,
prefix="## ",
)
obs["axtree_txt"] = remove_ui_patterns(obs["axtree_txt"])
In-place mutation of observation dictionary (category: Functionality)

What is the issue?

The code modifies the observation dictionary in-place during object initialization, which could cause side effects if the same observation dictionary is used elsewhere or reused.

Why this matters

This mutation could lead to unexpected behavior in other parts of the system that expect the original unmodified observation data, potentially causing debugging difficulties and inconsistent state.

Suggested change

Create a copy of the observation or modify the axtree_txt after extraction:

from copy import copy

obs = copy(obs)  # Shallow copy so the caller's observation dict is left untouched
obs["axtree_txt"] = remove_ui_patterns(obs["axtree_txt"])

Or modify it in the AXTree constructor:

self.ax_tree = AXTree(
    remove_ui_patterns(obs["axtree_txt"]),
    visible_elements_only=flags.filter_visible_elements_only,
    visible=lambda: flags.use_ax_tree,
    coord_type=flags.extract_coords,
    visible_tag=flags.extract_visible_tag,
    prefix="## ",
)

self.ax_tree = AXTree(
obs["axtree_txt"],
visible_elements_only=flags.filter_visible_elements_only,
@@ -874,3 +875,40 @@ def obs_mapping(obs: dict):
return obs

return obs_mapping




import re

def remove_ui_patterns(text):
"""
Remove lines containing specific UI patterns for ServiceNow accessibility tree text.

Args:
text (str): The input string containing the accessibility tree

Returns:
str: The cleaned string with lines containing UI patterns removed
"""
Comment on lines +884 to +893
Missing Purpose in UI Pattern Removal Docstring (category: Documentation)

What is the issue?

The docstring does not explain why these UI patterns need to be removed or what impact the removal has.

Why this matters

Without understanding the purpose, maintainers may inadvertently modify or remove this filtering, which could degrade the accessibility tree's usability.

Suggested change

def remove_ui_patterns(text):
"""
Remove ServiceNow UI management patterns from accessibility tree text to focus on core page content.

Filters out widget management controls that are not relevant for task completion
and would add noise to the accessibility tree representation.

Args:
    text (str): The input string containing the accessibility tree

Returns:
    str: The cleaned string with UI management patterns removed
"""


# Words to look for
words_to_remove = ["Edit Widget", "Edit Widget Preferences", "Close Widget", "Add content"]
Hardcoded UI Pattern List (category: Readability)

What is the issue?

UI patterns to remove are hardcoded as a list within the function, making it difficult to modify or reuse these patterns elsewhere.

Why this matters

If UI patterns need to be updated or reused in other parts of the code, developers would need to modify them in multiple places, increasing maintenance burden and risk of inconsistency.

Suggested change

Move the UI patterns to a module-level constant or configuration:

UI_PATTERNS_TO_REMOVE = [
    "Edit Widget",
    "Edit Widget Preferences", 
    "Close Widget",
    "Add content"
]

def remove_ui_patterns(text, patterns=UI_PATTERNS_TO_REMOVE):
    # Same filtering as the loop below, parameterized over the pattern list
    return "\n".join(
        line for line in text.split("\n") if not any(p in line for p in patterns)
    )


# Split text into lines
lines = text.split('\n')

# Keep lines that don't contain any of the words
filtered_lines = []
for line in lines:
should_keep = True
for word in words_to_remove:
if word in line:
should_keep = False
break
if should_keep:
filtered_lines.append(line)

# Join the remaining lines back together
return '\n'.join(filtered_lines)
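
For illustration, a minimal usage sketch of remove_ui_patterns with a made-up accessibility-tree snippet (the node labels are hypothetical):

sample_tree = "\n".join(
    [
        'heading "Incident list"',
        'button "Edit Widget Preferences"',  # dropped: matches a UI pattern
        'link "Open record"',
    ]
)
print(remove_ui_patterns(sample_tree))
# heading "Incident list"
# link "Open record"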

42 changes: 42 additions & 0 deletions src/agentlab/agents/generic_agent/agent_configs.py
@@ -149,6 +149,48 @@
add_missparsed_messages=True,
)

# qwen3 default config - same as llama3-70b but with use_think_history=False
FLAGS_QWEN3 = GenericPromptFlags(
obs=dp.ObsFlags(
use_html=False,
use_ax_tree=True,
use_focused_element=True,
use_error_logs=False,
use_history=True,
use_past_error_logs=False,
use_action_history=True,
use_think_history=False, # Qwen3 doesn't include thinking history
use_diff=False,
html_type="pruned_html",
use_screenshot=False,
use_som=False,
extract_visible_tag=True,
extract_clickable_tag=False,
extract_coords="False",
filter_visible_elements_only=False,
),
action=dp.ActionFlags(
action_set=HighLevelActionSetArgs(
subsets=["bid"],
multiaction=False,
),
long_description=False,
individual_examples=True,
),
use_plan=False,
use_criticise=False,
use_thinking=True,
use_memory=False,
use_concrete_example=True,
use_abstract_example=True,
use_hints=True,
enable_chat=False,
max_prompt_tokens=40_000,
be_cautious=True,
extra_instructions=None,
add_missparsed_messages=True,
)
Comment on lines +153 to +192
Missing AGENT_QWEN3 configuration (category: Functionality)

What is the issue?

The FLAGS_QWEN3 configuration is defined but no corresponding AGENT_QWEN3 is created to use it.

Why this matters

Without a corresponding agent configuration, the FLAGS_QWEN3 cannot be used by the system, making the configuration effectively dead code that serves no functional purpose.

Suggested change

Add a corresponding agent configuration after the FLAGS_QWEN3 definition:

AGENT_QWEN3 = GenericAgentArgs(
    chat_model_args=CHAT_MODEL_ARGS_DICT["qwen3_model_key"],  # Replace with actual Qwen3 model key
    flags=FLAGS_QWEN3,
)

Comment on lines +152 to +192
Configuration Duplication Instead of Inheritance (category: Design)

What is the issue?

The code duplicates the entire configuration structure from FLAGS_LLAMA3_70B with only one parameter difference (use_think_history=False).

Why this matters

Code duplication increases maintenance burden and risk of inconsistencies when updates are needed. If the base configuration changes, both copies need to be updated.

Suggested change
from copy import deepcopy

# Create Qwen3 flags by deep-copying the LLAMA3_70B flags; a shallow copy would
# share the nested obs object, so the override below would leak into the base config
FLAGS_QWEN3 = deepcopy(FLAGS_LLAMA3_70B)
FLAGS_QWEN3.obs.use_think_history = False  # Override only the different parameter


AGENT_LLAMA3_70B = GenericAgentArgs(
chat_model_args=CHAT_MODEL_ARGS_DICT["openrouter/meta-llama/llama-3-70b-instruct"],
flags=FLAGS_LLAMA3_70B,
4 changes: 2 additions & 2 deletions src/agentlab/agents/generic_agent/generic_agent.py
@@ -30,7 +30,7 @@
class GenericAgentArgs(AgentArgs):
chat_model_args: BaseModelArgs = None
flags: GenericPromptFlags = None
max_retry: int = 4
max_retry: int = 1

def __post_init__(self):
try: # some attributes might be temporarily args.CrossProd for hyperparameter generation
@@ -77,7 +77,7 @@ def __init__(
self,
chat_model_args: BaseModelArgs,
flags: GenericPromptFlags,
max_retry: int = 4,
max_retry: int = 1,
):

self.chat_llm = chat_model_args.make_model()
71 changes: 70 additions & 1 deletion src/agentlab/llm/chat_api.py
@@ -324,7 +324,7 @@ def __call__(self, messages: list[dict], n_samples: int = 1, temperature: float
tracking.TRACKER.instance(input_tokens, output_tokens, cost)

if n_samples == 1:
res = AIMessage(completion.choices[0].message.content)
res = AIMessage(_extract_thinking_content_from_response(completion))
if self.log_probs:
res["log_probs"] = completion.choices[0].log_probs
return res
@@ -593,3 +593,72 @@ def make_model(self):
temperature=self.temperature,
max_tokens=self.max_new_tokens,
)




import logging

def _extract_thinking_content_from_response(response, wrap_tag: str = "think"):
Complex Function with Multiple Responsibilities (category: Design)

What is the issue?

The function is too complex and handles multiple responsibilities: message normalization, reasoning extraction, content parsing with multiple formats, and content wrapping.

Why this matters

The high complexity makes the code harder to maintain, test, and modify. Each responsibility could change for different reasons, violating the Single Responsibility Principle.

Suggested change

Split the function into smaller, focused functions:

def _normalize_message(response):
    message = response.choices[0].message
    return message.to_dict() if not isinstance(message, dict) else message

def _extract_reasoning(message):
    return message.get('reasoning') or message.get('reasoning_content')

def _parse_content(raw_content):
    # Handle string vs list content parsing logic
    ...

def _format_response(reasoning_text, msg_text, msg_content, wrap_tag):
    # Handle wrapping and combining the parts
    ...

"""
Extracts the content from the message, including reasoning if available.
It wraps the reasoning in <think>...</think> tags for easy identification of reasoning content,
when LLM produces 'text' and 'reasoning' in the same message.
Comment on lines +602 to +606
Missing response parameter structure documentation (category: Documentation)

What is the issue?

The docstring lacks information about the expected format/structure of the 'response' parameter.

Why this matters

Without clear documentation of the expected response structure, developers may pass incorrectly formatted responses leading to runtime errors.

Suggested change

def _extract_thinking_content_from_response(response, wrap_tag: str = "think"):
"""
Extracts the content from the message, including reasoning if available.
It wraps the reasoning in <think>...</think> tags for easy identification of reasoning content,
when LLM produces 'text' and 'reasoning' in the same message.

Args:
    response: OpenAI-style completion response object with choices[0].message containing
             either a dict or object with 'content', 'text', 'reasoning', or 'reasoning_content' fields.
    wrap_tag: The tag name to wrap reasoning content (default: "think").

Returns:
    str: The extracted content with reasoning wrapped in specified tags.
"""


Args:
response: The message object or dict containing content and reasoning.
wrap_tag: The tag name to wrap reasoning content (default: "think").

Returns:
str: The extracted content with reasoning wrapped in specified tags.
"""
# Normalize to dict
message = response.choices[0].message
if not isinstance(message, dict):
message = message.to_dict()

# --- Extract reasoning from either `reasoning` or `reasoning_content` (and optional metadata) ---
reasoning_text = (
message.get("reasoning")
or message.get("reasoning_content")
)

# --- Extract surface text/content (keeps your original behavior, but handles list-style `content`) ---
msg_text = message.get("text", "") # works for OpenRouter
raw_content = message.get("content", "")

if isinstance(raw_content, list):
# Concatenate text-like parts if provider returns content blocks
parts = []
for part in raw_content:
if isinstance(part, str):
parts.append(part)
elif isinstance(part, dict):
# Common shapes: {"type":"text","text":"..."} or {"type":"text","text":{"value":"..."}}
if part.get("type") == "text":
txt = part.get("text")
if isinstance(txt, dict) and isinstance(txt.get("value"), str):
parts.append(txt["value"])
elif isinstance(txt, str):
parts.append(txt)
else:
# Fallback: try a few likely keys
for k in ("content", "text", "value"):
v = part.get(k)
if isinstance(v, str):
parts.append(v)
break
else:
parts.append(str(part))
msg_content = "\n".join(p for p in parts if p)
Comment on lines +630 to +653
Inefficient nested iteration in content processing (category: Performance)

What is the issue?

Inefficient nested loops and repeated dictionary lookups when processing list-style content.

Why this matters

The nested loops with multiple dictionary get() calls and string type checks create O(n*m) complexity where n is the number of parts and m is the number of fallback keys. This will cause performance degradation when processing large content lists.

Suggested change

Extract the text extraction logic into a separate function and use early returns to avoid nested loops:

def _extract_text_from_part(part):
    if isinstance(part, str):
        return part
    if isinstance(part, dict):
        if part.get("type") == "text":
            txt = part.get("text")
            if isinstance(txt, dict):
                return txt.get("value", "")
            return txt or ""
        # Single loop for fallback keys
        for key in ("content", "text", "value"):
            value = part.get(key)
            if isinstance(value, str):
                return value
    return str(part)

# Then use: parts = [_extract_text_from_part(part) for part in raw_content]

else:
msg_content = raw_content if isinstance(raw_content, str) else str(raw_content or "")

# --- Wrap reasoning if present ---
if reasoning_text:
reasoning_wrapped = f"<{wrap_tag}>{reasoning_text}</{wrap_tag}>\n" if wrap_tag else (reasoning_text + "\n")
logging.debug("Extracting content from response.choices[i].message.(reasoning|reasoning_content)")
else:
reasoning_wrapped = ""

return f"{reasoning_wrapped}{msg_text}{msg_content}"