Fix str encode in ractors #674

Closed

Conversation

casperisfine

Author is @luke-gruber, opening a draft to ease review.

encoding.c Outdated
Comment on lines 859 to 876
    size_t input_len = strlen(name);
    switch(input_len) {
      case 5:
        if (strncmp(name, string_UTF_8, 5) == 0) {
            return ENCINDEX_UTF_8;
        }
      case 8:
        if (strncmp(name, string_US_ASCII, 8) == 0) {
            return ENCINDEX_US_ASCII;
        }
      case 10:
        if (strncmp(name, string_ASCII_8BIT, 10) == 0) {
            return ENCINDEX_ASCII_8BIT;
        }
      default:
        break;
    }
Author

Is this really necessary? If one cares about perf, they'd use Encoding objects (e.g. Encoding::UTF_8), not encoding names. Also, all these encodings have tons of aliases, so I'm not even sure this optimization will match often.

>> Encoding.find("utf-8")
=> #<Encoding:UTF-8>
>> Encoding.find("UTF-8")
=> #<Encoding:UTF-8>
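
Illustration (not part of the PR): a minimal sketch of the two calling styles mentioned above. The helper names and the sample string are made up for this example. `String#encode` accepts either an `Encoding` object or a name, and only the name form has to be resolved through a lookup.

```ruby
# Transcode via an Encoding object: no name lookup is involved.
def to_latin1_by_object(str)
  str.encode(Encoding::ISO_8859_1)
end

# Transcode via a name: the name is resolved first, ignoring case.
def to_latin1_by_name(str)
  str.encode("iso-8859-1")
end

sample = "caf\u00E9" # arbitrary UTF-8 sample text

# Both styles yield the same string in the same encoding.
puts to_latin1_by_object(sample) == to_latin1_by_name(sample)

# Encoding.find is the name-to-object lookup and returns the same object.
puts Encoding.find("utf-8").equal?(Encoding::UTF_8)
```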

@luke-gruber Aug 5, 2025

It turns out it was going through this path fairly often, because when doing any kind of transcoding only the names of the encodings are passed, not the rb_encodings. I added an internal API (static functions) that does allow passing encodings now, so this should be less necessary. However, other files still reference these other functions, so this is still a win imo, especially with ractors.
Edit: I changed it to use STRCASECMP

Author

> I changed it to use STRCASECMP

I still really don't like this. We should fix whatever API is calling this so that it passes an rb_encoding * instead.

It's not just about capitalization:

>> Encoding.find("ascii")
=> #<Encoding:US-ASCII>
>> Encoding.find("us-ascii")
=> #<Encoding:US-ASCII>

I understand that this short circuit pays off, but it's really not clean.
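
Illustration (not part of the PR): a rough Ruby stand-in for the three-name short circuit, to show which names it would catch. `FAST_PATH_NAMES` and `fast_path_lookup` are hypothetical names, and the upcase comparison only approximates a STRCASECMP check. The point is that aliases fall through to the full lookup even when the match is case-insensitive.

```ruby
# Hypothetical stand-in for the three-name short circuit under review.
FAST_PATH_NAMES = {
  "UTF-8"      => Encoding::UTF_8,
  "US-ASCII"   => Encoding::US_ASCII,
  "ASCII-8BIT" => Encoding::ASCII_8BIT,
}.freeze

# Case-insensitive match against canonical names only.
def fast_path_lookup(name)
  FAST_PATH_NAMES[name.upcase]
end

# Canonical names hit the fast path; aliases miss it but still resolve
# through the full Encoding.find lookup.
%w[utf-8 US-ASCII ascii binary CP65001].each do |name|
  hit = fast_path_lookup(name) ? "hit" : "miss"
  puts "#{name}: fast path #{hit}, Encoding.find => #{Encoding.find(name).inspect}"
end
```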

@luke-gruber

I removed this and will have a better solution in a separate PR for encoding/transcoding perf.

@luke-gruber force-pushed the fix_str_encode_in_ractors branch 4 times, most recently from e7c1df8 to ff72419, on August 6, 2025 18:40
@luke-gruber

@Shopify/byroot It's ready for a re-review. I split my original branch into two: this PR now includes only the deadlock fixes (no perf gains). The other branch is built on top of this one and includes the perf gains. I will make a PR for the other branch when it's ready, if this gets merged. 🙇

@@ -6684,14 +6690,15 @@ env_shift(VALUE _)
VALUE result = Qnil;
VALUE key = Qnil;

rb_encoding *enc = env_encoding();
Author

So env_encoding calls rb_locale_encoding, which calls rb_locale_encindex(), which acquires the VM lock, so this isn't helping much.

I'd start by making rb_locale_encindex() lock-free in most cases. A simple atomic check to see if the locale encoding was initialized should suffice.
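
Illustration (not part of the PR): the real code is C inside the VM, so this is only a Ruby sketch of the suggested shape. A `Mutex` stands in for the VM lock, a plain instance-variable read stands in for the atomic check, and `LocaleEncodingCache` is a hypothetical name. Once the value is initialized, later calls return it without taking the lock.

```ruby
# Illustrative only: Mutex = VM lock, ivar read = atomic load (assumptions).
class LocaleEncodingCache
  attr_reader :lock_acquisitions

  def initialize(&resolver)
    @resolver = resolver
    @lock = Mutex.new
    @encoding = nil
    @lock_acquisitions = 0
  end

  def encoding
    cached = @encoding
    return cached if cached # fast path: already initialized, no lock taken

    @lock.synchronize do
      @lock_acquisitions += 1
      @encoding ||= @resolver.call # re-check under the lock, resolve once
    end
  end
end

cache = LocaleEncodingCache.new { Encoding.find("locale") }
3.times { cache.encoding }
puts cache.lock_acquisitions # prints 1
```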

@luke-gruber Aug 7, 2025

It was more that env_encoding could load the encoding, and that needed to be done outside the VM lock or else it deadlocks, but I'll look into making it atomic 👍 It shouldn't load the encoding often, but native extensions could call setlocale(), in which case it could.
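
Illustration (not part of the PR): a sketch of the general "load outside the lock" shape, in Ruby. A non-reentrant `Mutex` stands in for the VM lock and `load_entry` for a loader that needs the lock itself. All names are hypothetical, and the VM's actual failure mode is more involved than the recursive locking shown here.

```ruby
# Illustrative only: Mutex = VM lock, load_entry = a loader needing the lock.
LOCK = Mutex.new
LOADED = {}

# The loader takes the lock to register what it loaded.
def load_entry(name)
  LOCK.synchronize { LOADED[name] = true }
end

# Problematic shape: calling the loader while already holding the lock.
def ensure_loaded_holding_lock(name)
  LOCK.synchronize { load_entry(name) unless LOADED[name] }
end

# Safer shape: check under the lock, release it, then load.
def ensure_loaded(name)
  needs_load = LOCK.synchronize { !LOADED[name] }
  load_entry(name) if needs_load
  LOADED.key?(name)
end

begin
  ensure_loaded_holding_lock("example_transcoder")
rescue ThreadError => e
  puts "holding the lock: #{e.message}"
end
puts "released first: #{ensure_loaded("example_transcoder")}"
```

Ruby's `Mutex` detects the recursive acquisition and raises `ThreadError` instead of hanging, which makes the problem visible in a small script.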

Author

Ah I see. That makes sense.

Perhaps it doesn't matter, as this is only used when accessing ENV, which hopefully wouldn't be in a hotspot. But yeah, the amount of work done every time is surprising.

> I'll look into making it atomic

No need. Let's make it safe. We'll see about optimizing later.

@luke-gruber force-pushed the fix_str_encode_in_ractors branch 4 times, most recently from 49d1da5 to 6f5a722, on August 7, 2025 19:31
jhawthorn and others added 19 commits August 8, 2025 17:13
Previously, if GC was in progress when we're initially building the
id2ref table, it could see the empty table and then crash when trying to
remove ids from it. This commit fixes the bug by only publishing the
table after GC is done.

Co-authored-by: Aaron Patterson <[email protected]>
Small fix for a typo in the regular expression docs. The line of code above this change does not produce the output shown in the docs. With this change the docs will show the correct output for this example of using regex quantifiers.
The doc currently indicates the return value as `new_array`, but the first sentence then explains "always returns +self+ (never a new array)".
on null device
(ruby/stringio#137)

Fixes segmentation fault when calling `seek` with `SEEK_END` on null
device StringIO created by
  `StringIO.new(nil)`.

```bash
ruby -e "require 'stringio'; StringIO.new(nil).seek(0, IO::SEEK_END)"
```

I tested with below versions.

```bash
[koh@Kohs-MacBook-Pro] ~
% ruby -v;gem info stringio;sw_vers
ruby 3.4.5 (2025-07-16 revision ruby/stringio@20cda200d3) +PRISM [arm64-darwin24]

*** LOCAL GEMS ***

stringio (3.1.2)
    Authors: Nobu Nakada, Charles Oliver Nutter
    Homepage: https://github.com/ruby/stringio
    Licenses: Ruby, BSD-2-Clause
    Installed at (default): /Users/koh/.local/share/mise/installs/ruby/3.4.5/lib/ruby/gems/3.4.0

    Pseudo IO on String
ProductName:            macOS
ProductVersion:         15.5
BuildVersion:           24F74
[koh@Kohs-MacBook-Pro] ~
%
```

ruby/stringio@9399747bf9
In such a case the pointer needs to be cast.
`echo off` affects the batch files called from this file as well.
It is used in more steps than `sh`.
Because ruby/setup-ruby affects the test results.
BurdetteLamar and others added 6 commits August 11, 2025 09:24
Add locations to struct `RNode_IN`.

memo:

```bash
> ruby -e 'case 1; in 2 then 3; end' --parser=prism --dump=parsetree
@ ProgramNode (location: (1,0)-(1,24))
+-- locals: []
+-- statements:
    @ StatementsNode (location: (1,0)-(1,24))
    +-- body: (length: 1)
        +-- @ CaseMatchNode (location: (1,0)-(1,24))
            +-- predicate:
            |   @ IntegerNode (location: (1,5)-(1,6))
            |   +-- IntegerBaseFlags: decimal
            |   +-- value: 1
            +-- conditions: (length: 1)
            |   +-- @ InNode (location: (1,8)-(1,19))
            |       +-- pattern:
            |       |   @ IntegerNode (location: (1,11)-(1,12))
            |       |   +-- IntegerBaseFlags: decimal
            |       |   +-- value: 2
            |       +-- statements:
            |       |   @ StatementsNode (location: (1,18)-(1,19))
            |       |   +-- body: (length: 1)
            |       |       +-- @ IntegerNode (location: (1,18)-(1,19))
            |       |           +-- IntegerBaseFlags: decimal
            |       |           +-- value: 3
            |       +-- in_loc: (1,8)-(1,10) = "in"
            |       +-- then_loc: (1,13)-(1,17) = "then"
            +-- else_clause: nil
            +-- case_keyword_loc: (1,0)-(1,4) = "case"
            +-- end_keyword_loc: (1,21)-(1,24) = "end"
```
gc_config_set returned rb_gc_impl_config_get, but gc_config_get also added
the implementation key to the return value. This caused the return value
of GC.config to differ depending on whether the optional hash argument is
provided or not.
Make sure VM lock is not held when calling `load_transcoder_entry`, as
that causes deadlock inside ractors. `String#encode` now works inside
ractors, among others.

Atomic load the rb_encoding_list

Without this, wbcheck would sometimes hit a missing write barrier.

Co-authored-by: John Hawthorn <[email protected]>

Hold VM lock when iterating over global_enc_table.names

This st_table can be inserted into at runtime when autoloading
encodings.
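
Illustration (not part of the PR): a Ruby sketch of why iteration and insertion share one lock. A `Hash` stands in for the names table and a `Mutex` for the VM lock. `NameTable` and its methods are hypothetical names.

```ruby
# Illustrative only: Hash = names table, Mutex = VM lock (assumptions).
class NameTable
  def initialize
    @lock = Mutex.new
    @names = { "UTF-8" => 0, "US-ASCII" => 1 }
  end

  # Runtime insertion, e.g. when an encoding is autoloaded.
  def register(name, index)
    @lock.synchronize { @names[name] = index }
  end

  # Iterate with the lock held so no insertion can interleave.
  def each_name
    @lock.synchronize { @names.each_key { |name| yield name } }
  end
end

table = NameTable.new
writer = Thread.new { 100.times { |i| table.register("X-#{i}", i + 2) } }

seen = 0
table.each_name { seen += 1 } # sees a consistent snapshot of the table
writer.join

total = 0
table.each_name { total += 1 }
puts total # prints 102
```

In Ruby itself, adding a new key to a `Hash` while it is being iterated raises an error, so the lock here also keeps the writer thread from failing mid-iteration.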

minor optimization when calling Encoding.list
@luke-gruber force-pushed the fix_str_encode_in_ractors branch from d454447 to ec8327d on August 11, 2025 17:46
@luke-gruber

This was merged into ruby/ruby, closing.
