Robust parallel_chat_structured(..., convert = TRUE) (fixes #864) #725

kbenoit · 2025-08-22T03:37:56Z

Solves #684

Root cause: When convert = TRUE and the schema is an array of objects that themselves contain nested objects (e.g., economic and social), convert_from_type() produces data.frame columns that are themselves data frames. This triggers list2DF() conversion failures and makes downstream handling (including tokens/cost) brittle.
Fix: Flatten nested object properties when converting arrays of objects:
- In R/chat-structured.R, convert_from_type(), for TypeArray(TypeObject) with declared properties, build columns for each property across items, then flatten any nested data.frame columns by prefixing with the parent property name before calling list2DF().
- Example: economic.score, economic.evidence, social.score, social.evidence; no nested data frames remain in the final data.frame.
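
A minimal base R sketch of the flattening idea described above. It is not the code in R/chat-structured.R: flatten_df_cols() is a hypothetical helper and the input values are made up for illustration.

```r
# Hypothetical helper (an assumption for illustration, not the PR's code):
# replace each data.frame column with one column per nested property,
# prefixed with the parent property name.
flatten_df_cols <- function(cols) {
  out <- list()
  for (name in names(cols)) {
    col <- cols[[name]]
    if (is.data.frame(col)) {
      for (child in names(col)) {
        out[[paste0(name, ".", child)]] <- col[[child]]
      }
    } else {
      out[[name]] <- col
    }
  }
  out
}

# Made-up columns, as they might look after collecting each property
# across three array items.
cols <- list(
  id = c("a", "b", "c"),
  economic = data.frame(score = c(1, 2, 3), evidence = c("x", "y", "z")),
  social = data.frame(score = c(4, 5, 6), evidence = c("p", "q", "r"))
)

# After flattening, every element is a plain vector of the same length,
# so list2DF() has nothing nested left to handle.
flat <- list2DF(flatten_df_cols(cols))
names(flat)
```

The prefixing keeps the parent name visible, so two nested objects that share property names (score, evidence) do not collide.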
Token robustness: In R/parallel-chat.R multi_convert(), coerce token fields to integer with as.integer(turn@tokens) before aggregation. Some providers (e.g., Gemini) return numeric doubles; this prevents the vapply “values must be type 'integer'” error under convert = TRUE.
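
A small sketch of why the coercion matters, using plain numeric vectors as stand-ins for turn@tokens. The values and the row names are assumptions for illustration; the real multi_convert() is not reproduced here.

```r
# Token counts as a provider might return them: doubles, not integers.
# (Made-up values standing in for turn@tokens.)
tokens_per_turn <- list(c(120, 45, 0), c(98, 30, 0))

# vapply() is strict about the declared result type, so doubles are rejected
# when the template is integer(3).
msg <- tryCatch(
  {
    vapply(tokens_per_turn, function(x) x, integer(3))
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
msg

# Coercing each turn's tokens first satisfies the integer template.
tokens <- vapply(tokens_per_turn, function(x) as.integer(x), integer(3))
rownames(tokens) <- c("input", "output", "cached_input")
t(tokens)
```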
Tests added:
- VCR-backed integration test (tests/testthat/test-parallel-chat-structured.R) using chat_openai_test() and short inline prompts validates:
  - convert = TRUE returns a flattened data.frame with the expected nested column names and includes input/output/cached_input token columns and cost.
  - convert = FALSE returns the raw list with the expected nested structure for comparison.
- Minimal offline unit test in the same file confirms flattening for one and multiple nested objects (no network needed).
- A “Gemini-like” unit test constructs assistant Turns with tokens as doubles, exercising the as.integer() coercion and ensuring token columns attach cleanly. I added this because chat_google_gemini() was giving me a different error that crashed the conversion to data.frame.
How it fixes #684 and relates to the linked example:
- For the schema the issue describes (top-level object with nested economic and social objects), the function now produces a flat, analysis-ready data.frame (e.g., economic.score, social.evidence), addressing the list/data.frame conversion failure reported in #684.
- It should also fix the problem reported by @dylanpieper in a comment on PR #705, "(partially) Fix/628 robust parallel chat structured", but as I didn't have access to that data, he will need to confirm.
Practical validation:
- Run the new integration test to see flattened columns and token/cost appended correctly.
- Alternatively, replicate the example from PR #705, "(partially) Fix/628 robust parallel chat structured", locally: define a type_object with nested economic/social dimension objects and call parallel_chat_structured(..., convert = TRUE); check that the resulting data.frame has columns like economic.score, economic.evidence, social.score, social.evidence and that tokens/cost attach without errors.

From tidyverse#684

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

hadley · 2025-08-27T13:55:36Z

I don't think flattening is the right solution here — if list2DF() doesn't work with data frame columns, then we should use a different helper.

kbenoit · 2025-08-28T02:41:37Z

I suppose it depends on the interpretation of "convert", which for me, implies that a list would be flattened into a data.frame. If that's not what's wanted, then a user should arguably not specify convert = TRUE. Someone very wise once explained that in a data.frame-like dataset, columns should be single variables and each cell should contain a single value...

I've reduced the problem to a simpler reprex, added here.

The fix seems to be limited to one of:

- flatten the data.frame, as in this PR;
- detect a nested type, override convert = TRUE, and return a list, instead of making the call and then crashing at conversion so that the result is lost (and explain this in the documentation, of course);
- use an alternative to list2DF() in convert_from_type() that resolves column names correctly for the nested object and supports the use cases (which I don't yet see) where the resulting tibble/data.frame contains cells holding lists or data.frames.

I'd be happy to revise the PR if you choose which fix you prefer. Leaving it without a fix though can cause frustrating (and costly) data loss when a user is unaware that a nested type will cause this error exit condition.

hadley · 2025-08-28T12:43:09Z

I think you can just leave this to me; I know exactly what data type this should be, and for now I'm pretty certain that's correct, even if it gives you a relatively unusual data structure. (There's no guarantee it will be tidy, but I don't think that's a guarantee that applies to tools like ellmer that need to interface with other systems; you certainly might want to use some tidyr afterward if you are looking for a tidy df).
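
For illustration only, a base R sketch of the kind of structure discussed above: a data frame whose column is itself a data frame, and one way to flatten it afterward. The data are made up, and this is not necessarily what ellmer returns. tidyr::unpack() is the tidyr counterpart of the flattening step; base R is used here so the sketch has no dependencies.

```r
# Made-up data. Assigning with `$<-` keeps the nested frame as one column.
df <- data.frame(id = c("a", "b", "c"))
df$economic <- data.frame(score = c(1, 2, 3), evidence = c("x", "y", "z"))

names(df)                   # the nested frame counts as a single column
is.data.frame(df$economic)

# One base R way to flatten afterward: data.frame() splices nested frames
# and prefixes their column names with the parent column name.
flat <- do.call(data.frame, df)
names(flat)
```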

kbenoit · 2025-08-28T13:24:27Z

OK, sounds good. The main thing is to modify the conversion so that users don't spend the cost and time only to have it crash when executing the conversion. Close the PR or do with it what you will. Hopefully my test conditions will help.

Fix convert = TRUE errors in parallel_chat_structured()

0b29651

From tidyverse#684

kbenoit changed the title ~~Structured conversion: Robust convert = TRUE for parallel_chat_structured() (fixes #864)~~ Robust parallel_chat_structured(..., convert = TRUE) (fixes #864) Aug 22, 2025

kbenoit and others added 13 commits August 22, 2025 12:06

Revert unintended changes to tests/testthat/test-batch-chat.R

b8b8f9c

Merge branch 'main' into fix/684-list2DF-failure

df9c73a

implement linting suggestions

2072502

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

implement linting suggestions

a8081ee

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

implement linting suggestions

ebbc19d

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

implement linting suggestions

ba64f1d

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

implement linting suggestions

07a13c1

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

implement linting suggestions

3310b21

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

implement linting suggestions

5e103ae

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

implement linting suggestions

d63d19d

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

implement linting suggestions

263d707

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

implement linting suggestions

727b7fd

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

implement linting suggestions

ca3c8df

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

kbenoit mentioned this pull request Aug 26, 2025

Fix #732 by de-duplicating duplicated JSON responses from AWS Bedrock in structured data extraction #733

Open

kbenoit mentioned this pull request Aug 28, 2025

list2DF() failure crashes parallel_chat_structured(..., convert = TRUE) #684

Open
