Skip to content

Conversation

kbenoit
Copy link

@kbenoit kbenoit commented Aug 22, 2025

Solves #684

  • Root cause: When convert = TRUE and the schema is an array of objects that themselves contain nested objects (e.g., economic and social), then convert_from_type() produces data.frame columns that were themselves data frames. This triggered list2DF() conversion failures and made downstream handling (including tokens/cost) brittle.
  • Fix: Flatten nested object properties when converting arrays of objects:
    • In R/chat-structured.R, convert_from_type(), for TypeArray(TypeObject) with declared properties, build columns for each property across items, then flatten any nested data.frame columns by prefixing with the parent property name before calling list2DF().
    • Example: economic.score, economic.evidence, social.score, social.evidence — no nested data frames remain in the final
      data.frame.
  • Token robustness: In R/parallel-chat.R multi_convert(), coerce token fields to integer with as.integer(turn@tokens) before
    aggregation. Some providers (e.g., Gemini) return numeric doubles; this prevents the vapply “values must be type 'integer'”
    error under convert = TRUE.
  • Tests added:
    • VCR-backed integration test (tests/testthat/test-parallel-chat-structured.R) using chat_openai_test() and short inline
      prompts validates:
    • convert = TRUE returns a flattened data.frame with the expected nested column names and includes input/output/cached_input token columns and cost.
    • convert = FALSE returns the raw list with the expected nested structure for comparison.
    • Minimal offline unit test in the same file confirms flattening for one and multiple nested objects (no network needed).
  • A “Gemini-like” unit test constructs assistant Turns with tokens as doubles, exercising the as.integer() coercion and ensuring token columns attach cleanly. I added this because I was getting a different error crashing the conversion to data.frame with chat_google_gemini().
  • How it fixes #864 and relates to the linked example:
  • Practical validation:
    • Run the new integration test to see flattened columns and token/cost appended correctly.
    • Alternatively, replicate the PR (partially) Fix/628 robust parallel chat structured #705 example locally: define a type_object with nested economic/social dimension objects
      and call parallel_chat_structured(..., convert=TRUE); check that the resulting data.frame has columns like economic.score,
      economic.evidence, social.score, social.evidence and that tokens/cost attach without errors.

@kbenoit kbenoit changed the title Structured conversion: Robust convert = TRUE for parallel_chat_structured() (fixes #864) Robust parallel_chat_structured(..., convert = TRUE) (fixes #864) Aug 22, 2025
kbenoit and others added 13 commits August 22, 2025 12:06
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@hadley
Copy link
Member

hadley commented Aug 27, 2025

I don't think flattening is the right solution here — if list2DF() doesn't work with data frame columns, then we should use a different helper.

@kbenoit
Copy link
Author

kbenoit commented Aug 28, 2025

I suppose it depends on the interpretation of "convert", which for me, implies that a list would be flattened into a data.frame. If that's not what's wanted, then a user should arguably not specify convert = TRUE. Someone very wise once explained that in a data.frame-like dataset, columns should be single variables and each cell should contain a single value...

I've reduced the problem to a simpler reprex, added here.

The fix seems to be limited to one of:

  • flatten the data.frame, as in this PR
  • detect a nested type and override convert = TRUE and return a list, instead of doing the call and the crashing before conversion and causing the result to be lost. And explain this in the documentation of course
  • use an alternative to list2DF() in convert_from_type(), that resolves column names correctly for the nested object and makes some sort of sense for use cases that make sense (that I don't yet see) where the resulting tibble/data.frame can contain cells with lists or data.frames.

I'd be happy to revise the PR if you choose which fix you prefer. Leaving it without a fix though can cause frustrating (and costly) data loss when a user is unaware that a nested type will cause this error exit condition.

@hadley
Copy link
Member

hadley commented Aug 28, 2025

I think you can just leave this to me; I know exactly what data type this should be, and for now I'm pretty certain that's correct, even if it gives you a relatively unusual data structure. (There's no guarantee it will be tidy, but I don't think that's a guarantee that applies to tools like ellmer that need to interface with other systems; you certainly might want to use some tidyr afterward if you are looking for a tidy df).

@kbenoit
Copy link
Author

kbenoit commented Aug 28, 2025

OK sounds good. The main thing is to modify the conversion avoid the cost and time only to have it crash on executing the convert. Close up the PR or do with it what you will. Hopefully my test conditions will help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants