
Conversation

majocha (Contributor) commented Aug 29, 2025

implicitYieldExpressions, argInfoCache, typeDefinitelySubsumesNoCoerceCache, typeFeasibleEquivCache

github-actions bot (Contributor) commented Aug 29, 2025

❗ Release notes required

@majocha,

Caution

No release notes found for the changed paths (see table below).

Please make sure to add an entry with an informative description of the change, as well as a link to this pull request, the issue, and the language suggestion if applicable. Release notes for this repository are based on the Keep a Changelog format.

The following format is recommended for this repository:

* <Informative description>. ([PR #XXXXX](https://github.com/dotnet/fsharp/pull/XXXXX))

See examples in the files listed in the table below, or in the full documentation at https://fsharp.github.io/fsharp-compiler-docs/release-notes/About.html.

If you believe that release notes are not necessary for this PR, please add the NO_RELEASE_NOTES label to the pull request.

You can open this PR in the browser to add release notes: open in github.dev

| Change path | Release notes path | Description |
|---|---|---|
| src/Compiler | docs/release-notes/.FSharp.Compiler.Service/10.0.100.md | No release notes found or release notes format is not correct |
| vsintegration/src | docs/release-notes/.VisualStudio/18.0.md | No release notes found or release notes format is not correct |

```fsharp
// Is it quadratic or quasi-quadratic?
if ValueIsUsedOrHasEffect cenv (fun () -> (freeInExpr (CollectLocalsWithStackGuard()) bodyR).FreeLocals) (bindR, bindingInfo) then
let collect expr = (freeInExpr (CollectLocalsWithStackGuard()) expr).FreeLocals
if ValueIsUsedOrHasEffect cenv (fun () -> (getFreeLocalsCache cenv).GetOrAdd(bodyR, collect)) (bindR, bindingInfo) then
```
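The cached variant above replaces a repeated free-locals traversal with a get-or-add lookup keyed on the expression node. A minimal sketch of that pattern, in Python rather than the actual F# implementation (`free_locals` and `make_free_locals_cache` are illustrative names standing in for `freeInExpr` and the cache helper):

```python
# Sketch of the GetOrAdd memoization used above; `free_locals` stands in
# for the expensive freeInExpr traversal over the expression tree.
def make_free_locals_cache(free_locals):
    cache = {}  # keyed by object identity, like reference equality on the AST node

    def get_or_add(expr):
        key = id(expr)
        if key not in cache:
            cache[key] = free_locals(expr)
        return cache[key]

    return get_or_add
```

Repeated queries for the same expression object then hit the cache instead of re-walking the tree.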
majocha (Contributor, Author):

This gives a modest 20% hit ratio in tests. It may or may not help in real-world use; it needs benchmarking.

majocha (Contributor, Author):

Looking at the Optimization times when building FCS, I don't see any speedup.

```fsharp
/// Usage:
/// let getValueFor = WeakMap.getOrCreate (fun () -> expensiveInit())
/// let v = getValueFor someKey
let getOrCreate valueFactory =
```
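The helper lazily attaches a value to arbitrary keys through a weak table, so cached data is collected together with its key. A rough Python equivalent (a sketch only; the F# version presumably builds on .NET's `ConditionalWeakTable`, while this uses `weakref.WeakKeyDictionary`):

```python
import weakref

def get_or_create(value_factory):
    """Return a function that lazily attaches a value to any (weakly held) key.

    Values are created once per key and die together with the key,
    roughly mimicking .NET ConditionalWeakTable semantics.
    """
    table = weakref.WeakKeyDictionary()

    def get_value_for(key):
        if key not in table:
            table[key] = value_factory()
        return table[key]

    return get_value_for
```

Note that `WeakKeyDictionary` requires weak-referenceable keys and, unlike `ConditionalWeakTable`, is not thread-safe on its own.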
majocha (Contributor, Author):

This allows quickly and ad hoc attaching a cache instance in various places. Currently we thread various caches along with context objects/records, which is not that convenient when testing things out.

Comment on lines 958 to 961
```fsharp
let getArgInfoCache =
    let options = Caches.CacheOptions.getDefault() |> Caches.CacheOptions.withNoEviction
    let factory _ = new Caches.Cache<_, ArgReprInfo>(options, "argInfoCache")
    WeakMap.getOrCreate factory
```
Member:

Are these caches going to be cleared when asking FCS to clear its caches?

majocha (Contributor, Author) commented Aug 29, 2025:

Currently not. I think that since they're weakly attached to higher-level stuff, they'll just get GC'd with it. I captured some telemetry, and the caches get disposed quite often during the test run.

majocha (Contributor, Author) commented Aug 30, 2025

Ahh, there's that memory leak in the VS tests again 😔

```fsharp
env.eCachedImplicitYieldExpressions.Add(synExpr1.Range, (synExpr1, expr1Ty, cachedExpr))
try TcExpr cenv overallTy env tpenv otherExpr
finally env.eCachedImplicitYieldExpressions.Remove synExpr1.Range
(getImplicitYieldExpressionsCache cenv).AddOrUpdate(synExpr1, (expr1Ty, cachedExpr))
```
Member:

The semantics w.r.t. error handling are different now, aren't they?

Also, it feels like the previous version had a race condition (a FindAll and a later Add); possibly the multimap was chosen to work around it by storing a list rather than a single value?
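The check-then-act race being described can be shown in miniature: a lookup followed later by a separate insert leaves a window in which another thread can interleave, while an atomic get-or-add (the role `ConcurrentDictionary.GetOrAdd` plays) closes that window. A hedged Python sketch, with illustrative class names:

```python
import threading

class CheckThenAddMap:
    """Non-atomic find-then-add: two threads can both miss the lookup
    and both run the factory, with the last write winning."""
    def __init__(self):
        self.d = {}

    def find_then_add(self, key, make):
        if key not in self.d:   # window: another thread may insert here
            self.d[key] = make()
        return self.d[key]

class AtomicMap:
    """A lock makes lookup-or-insert one atomic step, analogous to
    ConcurrentDictionary.GetOrAdd closing the race window."""
    def __init__(self):
        self.d = {}
        self.lock = threading.Lock()

    def get_or_add(self, key, make):
        with self.lock:
            if key not in self.d:
                self.d[key] = make()
            return self.d[key]
```

With the atomic variant the factory runs at most once per key, regardless of interleaving.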

majocha (Contributor, Author) commented Sep 4, 2025:

The previous version had some concurrency problems; at some point Vlad added ConcurrentDictionary as the HashMultiMap backing store to fix them.

Semantics are different. Originally this just always removed the value in the finally block. I have to revisit this; I remember the change made sense to me, but I forgot why. I guess my thinking was that eviction would be sufficient here. Now I also notice this was attached to env, not cenv.

This is not really covered by the test suite, but there is a benchmark. I'll run it later to see if this still works.

majocha (Contributor, Author) commented Sep 19, 2025

I added printing of cache metrics when the --times option is set, to see what's going on during a one-off compilation.

T-Gro (Member) commented Sep 19, 2025

With the additional type-relations caches there seems to be a slight speedup against main:

DecentlySizedStandAloneFileBenchmark

cache-2

| Method | Mean     | Error    | StdDev   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 184.8 ms | 10.26 ms | 29.76 ms | 2000.0000 | 1000.0000 | 1000.0000 | 227.98 MB |

| Method | Mean     | Error    | StdDev   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 192.5 ms | 10.54 ms | 31.07 ms | 2000.0000 | 1000.0000 | 1000.0000 | 227.95 MB |

| Method | Mean     | Error    | StdDev   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 189.5 ms | 10.63 ms | 30.84 ms | 2000.0000 | 1000.0000 | 1000.0000 |    228 MB |

main

| Method | Mean     | Error   | StdDev   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|--------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 203.5 ms | 8.23 ms | 24.01 ms | 3000.0000 | 2000.0000 | 1000.0000 | 227.19 MB |

| Method | Mean     | Error   | StdDev   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|--------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 204.7 ms | 8.61 ms | 24.99 ms | 3000.0000 | 2000.0000 | 1000.0000 |    227 MB |

Seeing that it's for just a 1 MB increase in memory, this is a very nice tradeoff 👍.

(I assume the cache size counts into the allocated bucket)

majocha (Contributor, Author) commented Sep 20, 2025

> (I assume the cache size counts into the allocated bucket)

Yes, it allocates the few caches that are there for each run, but they are probably not filled up very much. I guess HistoricalBenchmark is not great for testing cache improvements: it just compiles a single, not very large file, and does not use the Transparent Compiler.

But I did run it for some older commits from around April, to check that things are not regressing unnoticed:

detached at f487d45 

| Method | Mean     | Error    | StdDev   | Median   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 269.1 ms | 18.50 ms | 53.98 ms | 241.1 ms | 5000.0000 | 3000.0000 | 2000.0000 | 293.12 MB |

detached at c549ebf

| Method | Mean     | Error    | StdDev   | Median   | Gen0      | Gen1      | Gen2      | Allocated |
|------- |---------:|---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Run    | 254.0 ms | 14.47 ms | 37.87 ms | 242.6 ms | 5000.0000 | 3000.0000 | 2000.0000 | 292.85 MB |

Thankfully, it's all good. We are better now than we were, both in allocations and speed.

ETA: this is not a benchmark that assesses performance reliably. We can have seriously degraded typecheck times in a real build that will not show up here at all.

majocha (Contributor, Author) commented Sep 20, 2025

MemoizationTable uses a ConcurrentDictionary backing store; we can swap it with a non-evicting Cache to see stats on how the various memoization tables perform in the compiler.

For example compiling FSharp.Compiler.Service\Release\net10.0

    ----------------------------------------------------------------------------------------------------------------------------
    | Cache name                          | hit-ratio |     adds |  updates |     hits |   misses | evictions | eviction-fails |
    ----------------------------------------------------------------------------------------------------------------------------
    |                     methodInfoCache |    76.99% |     5624 |        0 |    18820 |     5624 |         0 |              0 |
    | mostSpecificOverrideMethodInfoCache |    73.24% |      365 |        0 |      999 |      365 |         0 |              0 |
    |                      eventInfoCache |     0.00% |        1 |        0 |        0 |        1 |         0 |              0 |
    |                   propertyInfoCache |    38.66% |     1764 |        0 |     1112 |     1764 |         0 |              0 |
    |            implicitYieldExpressions |     0.00% |        0 |        0 |        0 |   717864 |         0 |              0 |
    |             implicitConversionCache |    92.72% |     1368 |        0 |    17436 |     1368 |         0 |              0 |
    |                    ilFieldInfoCache |    75.19% |       32 |        0 |       97 |       32 |         0 |              0 |
    |            entireTypeHierarchyCache |    90.04% |     3428 |        0 |    30983 |     3428 |         0 |              0 |
    |                      v_memoize_file |    99.90% |      380 |        0 |   383211 |      380 |         0 |              0 |
    |           recdOrClassFieldInfoCache |    46.21% |      795 |        0 |      683 |      795 |         0 |              0 |
    |              typeFeasibleEquivCache |    26.66% |    53985 |        0 |    19625 |    53985 |         0 |              0 |
    |                typeSubsumptionCache |    50.88% |   252652 |        0 |   261756 |   252652 |         0 |              0 |
    | typeDefinitelySubsumesNoCoerceCache |    63.60% |    46590 |        0 |    81421 |    46590 |         0 |              0 |
    |                     namedItemsCache |    56.63% |    15838 |        0 |    20681 |    15838 |         0 |              0 |
    |           primaryTypeHierarchyCache |    50.00% |       14 |        0 |       14 |       14 |         0 |              0 |
    |           rawDataValueTypeGenerator |    96.40% |       39 |        0 |     1045 |       39 |         0 |              0 |
    |                        argInfoCache |    26.43% |    42013 |        0 |    15096 |    42013 |         0 |              0 |
    ----------------------------------------------------------------------------------------------------------------------------
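The hit-ratio column in the table above is hits divided by total lookups. A quick sanity check against the first row, sketched in Python (the helper name is illustrative):

```python
def hit_ratio(hits, misses):
    """Hit ratio as printed in the metrics table: hits / (hits + misses)."""
    total = hits + misses
    return 0.0 if total == 0 else hits / total

# First row of the table: methodInfoCache, 18820 hits and 5624 misses.
ratio = hit_ratio(18820, 5624)  # ≈ 0.7699, printed as 76.99%
```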

majocha (Contributor, Author) commented Sep 20, 2025

I'll document here how I estimate the performance, because the existing benchmarks are not suitable.
The steps are:

  • make some changes
  • rebuild the bootstrap compiler
  • build FCS for net10.0, single-threaded
  • search for |Typecheck in the binlog to view the result generated with the --times switch

```shell
dotnet publish .\proto.proj -c Proto

dotnet build .\src\Compiler\ -c Release -f net10.0 -m:1 --no-incremental -bl

.\msbuild.binlog
```

T-Gro (Member) commented Sep 25, 2025
T-Gro commented Sep 25, 2025

> For example compiling FSharp.Compiler.Service\Release\net10.0

I like how this brings clarity into the various caches.
Even if some of them end up performing roughly the same, without an improvement, I would propose to keep them (with a non-evicting config) simply for the added insights 👍

T-Gro (Member) commented Sep 25, 2025

Also, globalizing them could help tooling scenarios with multi-project solutions, especially if there are many shared dependencies 👍.

majocha (Contributor, Author) commented Sep 25, 2025

Globalizing is something to try; currently the stats that show up are cumulative, simply by cache name.
There can be many instances: some caches are one per TcGlobals, and some MemoizeTables are recreated per checked file, IIRC. This PR is a testing ground for various things, but I could extract the MemoizationTable stuff. It should be very safe and negligible on performance, because a no-eviction cache is just a ConcurrentDictionary with some metrics.
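The cumulative per-name reporting described here can be sketched as a simple aggregation over instances sharing a cache name (illustrative Python, not the F# implementation; field names are assumptions):

```python
from collections import defaultdict

def aggregate_by_name(instances):
    """Sum the hit/miss counters of many cache instances sharing a name,
    the way the printed stats are cumulative per cache name."""
    totals = defaultdict(lambda: {"hits": 0, "misses": 0})
    for inst in instances:
        t = totals[inst["name"]]
        t["hits"] += inst["hits"]
        t["misses"] += inst["misses"]
    return dict(totals)
```

Two instances named "argInfoCache" would thus report as a single row with summed counters.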
