feat: Use "natural" stop sequences to condense GL derived limits #1271

jzimbel-mbta · 2025-06-13T12:21:27Z

Summary of changes

Asana Ticket: 🏹 Improve limits derivation for complex GL disruptions

Summary of change

The overall idea is to compare stops visited by exported service against "natural" stop sequences, instead of canonical stop sequences.

For example, a disruption extending from Babcock to North Station is not possible to describe against one GL route's stop sequence because Green-B turns around at Gov Ctr. The old logic would produce 2 (or more) limits for this: Babcock to Gov Ctr, and Gov Ctr to North Station.

The new logic uses @arkadyan's unrooted_polytree data structure to convert a set of canonical stop sequences for a line to a set of "natural" stop sequences that represent the longest possible runs from one end of the line to the other, ignoring how many intra-line transfers you'd need to make.
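To make the idea concrete, here is a minimal sketch. It is not @arkadyan's unrooted_polytree implementation or the project's code. It assumes all canonical stop sequences are oriented the same direction, merges them into one directed graph of stop-to-stop edges, and treats every path from a start-of-line stop (no predecessor) to an end-of-line stop (no successor) as a "natural" stop sequence. The CANONICAL stop lists and the natural_sequences name are hypothetical and heavily simplified.

```python
from collections import defaultdict

# Hypothetical, heavily simplified canonical stop sequences, all oriented
# toward the north end of the line. Real GL routes have many more stops.
CANONICAL = {
    "Green-B": ["Babcock", "Kenmore", "Copley", "Boylston", "Gov Ctr"],
    "Green-C": ["Cleveland Cir", "Kenmore", "Copley", "Boylston", "Gov Ctr"],
    "Green-D": ["Riverside", "Kenmore", "Copley", "Boylston", "Gov Ctr",
                "North Station", "Union Sq"],
    "Green-E": ["Heath St", "Copley", "Boylston", "Gov Ctr",
                "North Station", "Medford/Tufts"],
}


def natural_sequences(canonical_seqs):
    """Return every start-of-line to end-of-line path through the merged graph.

    Assumes the merged graph is acyclic (a polytree), so the walk terminates.
    """
    successors = defaultdict(list)  # stop -> stops that directly follow it
    has_predecessor = set()
    stops_in_order = []             # first-seen order keeps output deterministic
    for seq in canonical_seqs:
        for stop in seq:
            if stop not in stops_in_order:
                stops_in_order.append(stop)
        for a, b in zip(seq, seq[1:]):
            if b not in successors[a]:
                successors[a].append(b)
            has_predecessor.add(b)

    sources = [s for s in stops_in_order if s not in has_predecessor]
    paths = []

    def walk(path):
        nxt = successors.get(path[-1], [])
        if not nxt:                 # no successor: reached an end of the line
            paths.append(path)
        for b in nxt:
            walk(path + [b])

    for source in sources:
        walk([source])
    return paths


for seq in natural_sequences(CANONICAL.values()):
    print(" -> ".join(seq))
```

With these inputs there are 8 natural sequences (4 start-of-line stops times 2 end-of-line stops), and one of them runs Babcock -> Kenmore -> Copley -> Boylston -> Gov Ctr -> North Station -> Union Sq, even though no single canonical sequence visits both Babcock and North Station.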

Then, we compare exported service against these and do some additional steps to condense the resulting limits into a minimal number of maximally-long segments.
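One possible way to do the comparison and condensing, sketched under stated assumptions: exported service is reduced to a set of served stop-to-stop segments (unordered pairs), each natural sequence is scanned for maximal runs of unserved segments, and runs whose stops are a strict subset of another run's stops (or duplicate another run) are dropped. The function names, stop lists and served segments are hypothetical, and the PR's actual condensing steps may differ.

```python
# Hypothetical natural sequences (simplified) and served segments.
NATURAL = [
    ["Babcock", "Kenmore", "Copley", "Boylston", "Gov Ctr", "North Station", "Union Sq"],
    ["Heath St", "Copley", "Boylston", "Gov Ctr", "North Station", "Medford/Tufts"],
    ["Riverside", "Kenmore", "Copley", "Boylston", "Gov Ctr", "North Station", "Union Sq"],
]
SERVED = {frozenset(p) for p in [("Riverside", "Kenmore"),
                                 ("North Station", "Union Sq"),
                                 ("North Station", "Medford/Tufts")]}


def unserved_runs(sequence, served):
    """Return (start, end, stops) for each maximal run of unserved segments."""
    runs, current = [], None
    for a, b in zip(sequence, sequence[1:]):
        if frozenset((a, b)) in served:
            if current:
                runs.append(current)  # a served segment closes the open run
            current = None
        else:
            if current is None:
                current = [a]
            current.append(b)
    if current:
        runs.append(current)
    return [(run[0], run[-1], frozenset(run)) for run in runs]


def condense(natural_seqs, served):
    """Keep only maximal, non-duplicate runs across all natural sequences."""
    candidates = [r for seq in natural_seqs for r in unserved_runs(seq, served)]
    limits = []
    for start, end, stops in candidates:
        subsumed = any(stops < other for _, _, other in candidates)
        duplicate = any(stops == kept for _, _, kept in limits)
        if not subsumed and not duplicate:
            limits.append((start, end, stops))
    return [(start, end) for start, end, _ in limits]


print(condense(NATURAL, SERVED))
```

Here the Kenmore -> North Station run found on the Riverside sequence is dropped because its stops are contained in the Babcock -> North Station run, leaving two limits. Whether dropping subsumed runs is desirable is exactly the trade-off discussed in the review comments.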

Example

Here's how this improves limits derivation for 2025-spring-GLBabcockNorthStation-v2.zip (see the associated Asana ticket).

Left is the output of the existing logic in the main branch; right is the output of the new logic.

TO DO

Update ExportUploadTest.build_gtfs/1 to add platform <-> parent station relations for all inserted stops, since these are required by the new logic.
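For reference, GTFS links a platform to its station through the parent_station column of stops.txt (location_type 1 marks a station). A minimal sketch of resolving platform stop IDs to parent stations, assuming that is the relation the new logic relies on; the stop IDs and the parent_station_map helper are hypothetical and not taken from ExportUploadTest.

```python
import csv
import io

# Hypothetical stops.txt excerpt. In GTFS, location_type 1 marks a station and
# a platform (location_type 0 or empty) points at its station via parent_station.
STOPS_TXT = """stop_id,stop_name,location_type,parent_station
station-ns,North Station,1,
platform-ns-wb,North Station - Westbound,0,station-ns
platform-ns-eb,North Station - Eastbound,0,station-ns
"""


def parent_station_map(stops_csv):
    """Map each stop_id to its parent station, or to itself if it has none."""
    reader = csv.DictReader(io.StringIO(stops_csv))
    return {row["stop_id"]: row["parent_station"] or row["stop_id"] for row in reader}


parents = parent_station_map(STOPS_TXT)
print(parents["platform-ns-wb"])
```

A test fixture that inserts only platforms (with no parent_station) would map each platform to itself, so two platforms at the same station would never be recognized as the same stop.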

Reviewer Checklist

Meets ticket's acceptance criteria
Any new or changed functions have typespecs
Tests were added for any new functionality (don't just rely on Codecov)
This branch was deployed to the staging environment and is currently running with no unexpected increase in warnings, and no errors or crashes.

To see the specific tasks where the Asana app for GitHub is being used, see below:
- https://app.asana.com/0/0/1210734138209410

Whoops · 2025-06-23T17:28:03Z

Outside of the code, I would like @shantigonzales and @fsaid90 to chime in on this. Whether this is better really depends on what this feature is designed to be. Basically, my concern is this: limits (as in limits we put in Arrow, not (necessarily) these derived limits) are a fairly simple, crude concept. If I have a limit between A and B, that limit looks for trips that visit both A and B and cuts out the A->B segment, replacing each such trip with 0 to 2 new trips. A limit won't affect trips that don't visit both stops. So if, for example, we have a limit between Government Center and Heath Street (E line), it won't affect B, C, or D trips between Government Center and Kenmore, because they don't hit Heath Street, even though they share a segment that's probably closed (Government Center -> Copley). So, to fully model the example outage, you need two limits: GC->Copley (B, C, D, E) and Copley->Heath (E).
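The limit rule described above can be sketched as follows, assuming a trip is just a list of stop names in visit order. apply_limit is a hypothetical helper illustrating the rule as stated in this comment; how Arrow actually truncates trips may differ in detail.

```python
def apply_limit(trip, a, b):
    """Apply an A<->B limit to one trip (a list of stop names in visit order).

    Trips that don't visit both a and b are returned unchanged. Otherwise the
    segment between the two endpoints is cut out, leaving 0 to 2 new trips.
    """
    if a not in trip or b not in trip:
        return [trip]
    i, j = sorted((trip.index(a), trip.index(b)))
    before = trip[: i + 1]  # runs up to and including the first endpoint
    after = trip[j:]        # resumes at the second endpoint
    # A piece with fewer than two stops isn't a trip, so it is dropped.
    return [piece for piece in (before, after) if len(piece) >= 2]


e_trip = ["Heath Street", "Copley", "Boylston", "Government Center", "North Station"]
b_trip = ["Babcock", "Kenmore", "Copley", "Boylston", "Government Center"]

print(apply_limit(e_trip, "Government Center", "Heath Street"))
print(apply_limit(b_trip, "Government Center", "Heath Street"))
```

The E trip is truncated to Government Center -> North Station, while the B trip passes through untouched even though it runs over the closed Government Center -> Copley segment, which is why a second GC->Copley limit is needed.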

So, looking at the "new" derived limits of this feature, we see:

Babcock (B) -> Copley (B, C, D, E) = B trips affected
Babcock (B) -> North Station (D, E) = No trips affected
Heath Street (E) -> North Station (D, E) = E trips affected

If we think of this feature as helping someone looking at a track diagram understand what's closed, this is definitely easier to follow than the original (more so if the redundant Babcock -> Copley is removed; it's a subset of Babcock -> North Station, but let's not get obsessive here).

On the other hand, if we think of this as literal documentation of the equivalent "limits", someone trying to apply these limits manually to replicate the outage would end up missing the C and D line trips, because none of those trips hit both endpoints of any of these limits. That is what the original implementation captures:

North Station (D, E) -> Government Center (B, C, D, E) = (D, E)
Government Center (B, C, D, E) -> Boylston (B, C, D, E) = (B, C, D, E)
Copley (B, C, D, E) -> Heath Street (E) = (E)
Government Center (B, C, D, E) -> Babcock (B) = (B)
Government Center (B, C, D, E) -> Kenmore (B, C, D) = (B, C, D)
North Station (D, E) -> Kenmore (B, C, D) = (D)
North Station (D, E) -> Heath Street (E) = (E)
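The branch lists in both comparisons can be reproduced mechanically. A sketch, assuming each branch is reduced to the set of key stops mentioned in this thread (BRANCH_STOPS is hypothetical; real branches have many more stops), that lists which branches visit both endpoints of a limit, i.e. which trips the limit would touch under the rule described above.

```python
# Hypothetical, simplified: only the stops named in this discussion.
BRANCH_STOPS = {
    "B": {"Babcock", "Kenmore", "Copley", "Boylston", "Government Center"},
    "C": {"Cleveland Circle", "Kenmore", "Copley", "Boylston", "Government Center"},
    "D": {"Riverside", "Kenmore", "Copley", "Boylston", "Government Center", "North Station"},
    "E": {"Heath Street", "Copley", "Boylston", "Government Center", "North Station"},
}


def affected_branches(a, b):
    """Branches whose trips visit both limit endpoints (and so would be cut)."""
    return sorted(br for br, stops in BRANCH_STOPS.items() if a in stops and b in stops)


NEW_LIMITS = [("Babcock", "Copley"), ("Babcock", "North Station"),
              ("Heath Street", "North Station")]
OLD_LIMITS = [("North Station", "Government Center"), ("Government Center", "Boylston"),
              ("Copley", "Heath Street"), ("Government Center", "Babcock"),
              ("Government Center", "Kenmore"), ("North Station", "Kenmore"),
              ("North Station", "Heath Street")]

covered_new = set().union(*(affected_branches(a, b) for a, b in NEW_LIMITS))
covered_old = set().union(*(affected_branches(a, b) for a, b in OLD_LIMITS))
print(sorted(covered_new), sorted(covered_old))
```

Under this simplification the new limits only reach B and E trips, while the original set reaches all four branches, matching the lists above.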

Whoops · 2025-06-23T17:41:42Z

No code concerns. For now, I'm withholding the ✅ on confirmation from our stakeholders that they prefer this version, but barring that I think this is good to go.

fsaid90 · 2025-06-23T20:37:46Z

Those are good points Walton - and I'd definitely want to hear Shanti's thoughts as well before I give my own thoughts (I don't want to bias her!), BUT I do also wonder:

Since we are deriving the actual GL branch, perhaps grouping them in the UI per branch (and indicating the actual branch) might make things more neat and tidy in general (per Jon's original limits derivation implementation).

What do you both think? (Jon and Walton, while Shanti's out :) )

Whoops · 2025-07-07T14:17:16Z

One factor that occurred to me in the discussion this morning: this new version is much closer to how we talk about limits and put them in tickets. So if the purpose is just validating that the HASTUS export does what we expect it to do, this is no doubt superior.

shantigonzales · 2025-07-07T17:16:56Z

My 2c, which echoes what Walton said to some extent: this is much more intuitive to me as a non-technical user, and will be helpful in streamlining validation. In Walton's example, I'm not entirely sure how those C and D line trips get captured, but that's more of a technical question than a program one. From a non-code perspective, I like how this is approaching the problem.

jzimbel-mbta and others added 4 commits June 13, 2025 07:54

feat: Use "natural" stop sequences to condense GL derived limits

7e62737

Update tests & fixtures for new derived limits logic

cb7b2a2

Appease credo: reduce function body nesting

62f6cec

Merge branch 'master' into jz-better-derived-limits

d623a03

jzimbel-mbta marked this pull request as ready for review June 23, 2025 11:35

jzimbel-mbta requested a review from a team as a code owner June 23, 2025 11:35

jzimbel-mbta requested review from Whoops and removed request for a team June 23, 2025 11:35

jzimbel-mbta mentioned this pull request Jun 26, 2025: fix: non-deterministic tests in export_upload_test #1270 (Merged)

jzimbel-mbta marked this pull request as draft July 8, 2025 14:42

5 participants
