1. The correspondence between the CMP segment network and stops (from GTFS?) should be pulled out into a separate step, with the result written to a CSV file. Then we don't need to re-run this part every time during testing.
2. Why is the auto-matching between stops and CMP segments so bad? Dig deeper.
3. Why are we looping over dataframes with Python? That defeats the entire purpose of using dataframes/arrays.
4. How is the postprocessing_overlap_pairs CSV file generated and used? (It's for stops/segments that cannot be auto-matched, so a more robust auto-matching step (item 2) could potentially solve this problem too, and we could maybe remove the need for this file.)
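
For items 1 and 3, here is a minimal sketch of what I have in mind: build the stop-to-segment correspondence once, cache it as a CSV, and do the matching with array operations instead of a Python loop. Everything named here is an assumption for illustration, not the current code: the column names (`stop_id`, `cmp_segid`, `x`, `y`), the helper names, and above all the matching rule. Nearest representative point in planar coordinates is only a stand-in for whatever geometric rule the real matching should use.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def match_stops_to_segments(stops: pd.DataFrame, segments: pd.DataFrame) -> pd.DataFrame:
    """Assign each stop to its nearest segment point without a Python loop.

    Hypothetical inputs: stops has stop_id, x, y; segments has cmp_segid, x, y,
    with x/y in the same projected (planar) coordinate system.
    """
    # Broadcasting gives pairwise differences of shape (n_stops, n_segments).
    dx = stops["x"].to_numpy()[:, None] - segments["x"].to_numpy()[None, :]
    dy = stops["y"].to_numpy()[:, None] - segments["y"].to_numpy()[None, :]
    dist = np.hypot(dx, dy)

    # Column index of the closest segment for every stop.
    nearest = dist.argmin(axis=1)
    return pd.DataFrame(
        {
            "stop_id": stops["stop_id"].to_numpy(),
            "cmp_segid": segments["cmp_segid"].to_numpy()[nearest],
            "distance": dist[np.arange(len(stops)), nearest],
        }
    )


def load_or_build_correspondence(path, stops, segments, rebuild=False):
    """Return the cached correspondence if the CSV exists, else build and save it."""
    path = Path(path)
    if path.exists() and not rebuild:
        return pd.read_csv(path)
    matches = match_stops_to_segments(stops, segments)
    matches.to_csv(path, index=False)
    return matches
```

During testing, downstream steps would call `load_or_build_correspondence(...)` and only pass `rebuild=True` when the stops or the segment network change. The pairwise distance matrix uses memory proportional to stops times segments, so for large networks a spatial index or a spatial join would be the better tool. Keeping a `distance` column in the CSV also gives us something concrete to inspect when looking into item 2.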