@kbenoit
Contributor

@kbenoit kbenoit commented Aug 26, 2025

Problem

AWS Bedrock occasionally returns the same JSON object twice during structured data extraction, causing
extract_data() to fail with the error:

  Data extraction failed: 2 data results received.

This causes parallel_chat_structured() to fail completely when processing batches of prompts.

Evidence

See also the reprex based on my own data, in #732 (comment).

Captured failure case from real-world usage (Spanish political manifesto analysis), before I simplified the type_object by flattening it, as in the reprex linked above:

# The Turn object contains two identical ContentJson objects:
turn@contents:
[[1]] <ellmer::ContentJson>
  @value: List of 9
   $ Country_LLM         : chr "Spain"
   $ Party_LLM           : chr "Partido Popular"
   $ is_populist         : logi FALSE
   $ is_populist_evidence: chr "The party does not consistently frame politics in populist terms..."
   $ antielitism         :List of 2
    ..$ evidence: chr "The party criticizes the current government's policy of 'cesiones humillantes'..."
    ..$ score   : int 4
   $ generalwill         :List of 2
    ..$ evidence: chr "The party emphasizes the importance of unity and the general will of the people..."
    ..$ score   : int 5
   $ indivisible         :List of 2
    ..$ evidence: chr "The party portrays Spain as a single nation with a common identity..."
    ..$ score   : int 6
   $ manichean           :List of 2
    ..$ evidence: chr "The party criticizes the current government's policies and actions..."
    ..$ score   : int 3
   $ peoplecentrism      :List of 2
    ..$ evidence: chr "The party emphasizes the importance of listening to the people..."
    ..$ score   : int 5

[[2]] <ellmer::ContentJson>
  @value: List of 9
   # IDENTICAL structure and values as [[1]]

# Verification: identical(json1, json2) == TRUE

Scale: I captured 11 failures versus 3 successes during batch processing, which suggests a significant
intermittent issue with Bedrock's tool-calling mechanism.

Solution

Updated extract_data() in R/chat-structured.R to gracefully handle multiple JSON responses:

  if (n == 2) {
    # Check if the two JSON objects are identical (duplicate case)
    if (identical(val1, val2)) {
      warning("Found duplicate JSON responses, using the first one", call. = FALSE)
      out <- val1
    } else {
      # Different JSON objects - use the last one (likely the final response)  
      warning("Found multiple different JSON responses, using the last one", call. = FALSE)
      out <- val2
    }
  }
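For readers who want to run the logic in isolation, here is a minimal, self-contained sketch of the branch above. It is not the actual extract_data() code: pick_json_result() is a hypothetical helper, and plain R lists stand in for the @value of each ContentJson.

```r
# Hypothetical helper illustrating the selection logic; not ellmer's extract_data().
# `values` is a list of already-parsed JSON results (plain R lists here).
pick_json_result <- function(values) {
  n <- length(values)
  if (n == 1) {
    return(values[[1]])
  }
  if (n == 2) {
    # Duplicate case: both results are identical, so either one will do
    if (identical(values[[1]], values[[2]])) {
      warning("Found duplicate JSON responses, using the first one", call. = FALSE)
      return(values[[1]])
    }
    # Different results: assume the last one is the final response
    warning("Found multiple different JSON responses, using the last one", call. = FALSE)
    return(values[[2]])
  }
  # Zero results or more than two results remain an error
  stop("Data extraction failed: ", n, " data results received.", call. = FALSE)
}

# Two identical results, as in the captured Bedrock failure
dup <- list(Country_LLM = "Spain", is_populist = FALSE)
res <- suppressWarnings(pick_json_result(list(dup, dup)))
```

With two identical inputs the helper warns and returns the first; with two different inputs it warns and returns the second; any other count other than one is an error.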

Tests

Added comprehensive tests in test-chat-structured.R:

  • Handle duplicate identical JSON responses with warning
  • Handle different JSON responses (use last one)
  • Include prompt index in warning messages for parallel operations
  • Error appropriately on >2 JSON responses

Impact

Before:

# This would fail completely
parallel_chat_structured(chat, prompts, type = my_type)
#> Error: Data extraction failed: 2 data results received.

After:

# This now succeeds with warning
result <- parallel_chat_structured(chat, prompts, type = my_type)
#> Warning: Found duplicate JSON responses, using the first one (prompt 43).
#> ... successful extraction continues

This ensures robust structured data extraction for production use, even when providers exhibit intermittent duplication behaviour.

Interactions with other (current) PRs

This works best when combined with #725. #705 does not fix this issue, but if the other two PRs are accepted, I will redo #705 and integrate them into a single, more robust parallel_chat_structured().

kbenoit and others added 5 commits August 26, 2025 11:44
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@hadley
Member

hadley commented Oct 16, 2025

I think a more general approach here would be to just warn if there are more than two ContentJsons and then just use the first one. I think the original code existed out of an abundance of caution, and I don't have any evidence that it's actually important. Do you want to have a go at implementing that? I'm also happy to do it if you don't have the time.
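A rough sketch of what that simpler rule might look like follows. It is an illustration only, not the merged ellmer code: first_json_result() is a hypothetical name, the error message for the empty case is invented, and the sketch assumes the intent is to warn whenever more than one result is present (adjust the threshold if "more than two" was meant literally).

```r
# Hypothetical sketch of the simpler rule; not the merged ellmer code.
# `values` is a list of already-parsed JSON results (plain R lists here).
first_json_result <- function(values) {
  n <- length(values)
  if (n == 0) {
    # Assumed message for the empty case
    stop("Data extraction failed: no data results received.", call. = FALSE)
  }
  if (n > 1) {
    # No comparison of the results: just report the count and keep the first
    warning("Found ", n, " JSON responses, using the first one", call. = FALSE)
  }
  values[[1]]
}

a <- list(score = 4L)
b <- list(score = 5L)
first <- suppressWarnings(first_json_result(list(a, b, a)))
```

The trade-off is that identical and differing results are no longer distinguished, which keeps the function independent of any one provider's behaviour.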

@kbenoit
Contributor Author

kbenoit commented Oct 17, 2025

I'm happy to have a go at this, once the other parallel error handling PRs have been merged into main, and I can update this branch.

@kbenoit
Contributor Author

kbenoit commented Oct 19, 2025

> I think a more general approach here would be to just warn if there are more than two ContentJsons and then just use the first one. I think the original code existed out of an abundance of caution, and I don't have any evidence that it's actually important. Do you want to have a go at implementing that? I'm also happy to do it if you don't have the time.

Implemented in bc75491, although there was something to be said for the more informative version that handled duplicated pairs and n > 2 separately.

@hadley
Member

hadley commented Oct 20, 2025

I've simplified your implementation back to the bare minimum. I think it's better to start there and get more complicated as we need it, since I don't think we should let buggy implementations overly guide the design of ellmer. If you do think it's worth informing the user about exactly which prompt had the problem, I think it would be better to attack the problem by wrapping the conditions, like this:

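library(rlang)  # provides warn()

withCallingHandlers(
  warn("hi"),
  warning = function(w) {
    # Signal a new warning that keeps the original as its parent
    warn("bye", parent = w)
    # Muffle the original so only the wrapped warning is shown
    invokeRestart("muffleWarning")
  }
)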

That would be helpful for errors too (assuming that it's actually the prompt that's causing the problem).
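To make the suggestion concrete, here is a hedged sketch of how the wrapping might be applied per prompt, covering both warnings and errors. It assumes rlang is installed; with_prompt_context() and the message wording are hypothetical and do not come from ellmer.

```r
library(rlang)

# Hypothetical helper: runs `fn` on each prompt and re-signals any warning or
# error with the prompt index, keeping the original condition as the parent.
with_prompt_context <- function(prompts, fn) {
  lapply(seq_along(prompts), function(i) {
    withCallingHandlers(
      fn(prompts[[i]]),
      warning = function(w) {
        warn(paste0("Problem with prompt ", i, "."), parent = w)
        # Muffle the original; evaluation of fn() then continues
        invokeRestart("muffleWarning")
      },
      error = function(e) {
        # abort() unwinds, so the original error never reaches the top level
        abort(paste0("Problem with prompt ", i, "."), parent = e)
      }
    )
  })
}

# Stand-in for a per-prompt extraction step that warns on one input
noisy <- function(x) {
  if (x == "b") warning("2 data results received")
  toupper(x)
}
out <- suppressWarnings(with_prompt_context(c("a", "b"), noisy))
```

Because the original condition is attached as the parent, callers can still inspect it, while the top-level message identifies which prompt was involved.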

@hadley hadley merged commit 2e34298 into tidyverse:main Oct 20, 2025
11 checks passed