Explaining Dashboard Metrics: Subway Reliability

What does it mean for the subway to be reliable?

Our goal in measuring performance at the MBTA is to accurately measure the quality of the passenger experience on MBTA buses and trains using the data we have. This post will explain the details behind how we measure subway and light rail reliability, which is the most advanced of our metrics. Future posts will explain bus and commuter rail reliability metrics.

Traditional metrics for reliability measure the arrival times of trains or the intervals between train arrivals at a station (the headway). On the MBTA Performance Dashboard, we use a more advanced metric that shows the percent of passengers who waited less time than the scheduled headway. We call this metric Wait Time Reliability. To measure it, we need three main types of data: where passengers enter the system, where trains are, and the scheduled time between trains, which serves as the threshold to measure against. We’ll start with the last of these.

MBTA subway services are scheduled to run at least every 10 minutes (on the “trunk” of branched routes) throughout most of the day: around every 5 minutes on most lines during rush hour, and every 8 to 10 minutes the rest of the day and on weekends. Most subway passengers don’t plan around a specific train arrival time, but rather expect to arrive at a station and have a train show up within a few minutes. When customers arrive at stations expecting that a train will arrive shortly, the evenness of service matters more than the actual time the train arrives. Good service, then, means that trains arrive consistently at the scheduled interval.

Diagram illustrating train arrivals within the scheduled interval and beyond the scheduled interval.

Calculating reliability for each train

The next input used to calculate Wait Time Reliability comes from the MBTA’s train location data. Every time a train arrives at or departs from a station, a record is written to a database. From there we can calculate the time since the last train left the same station and compare it to the scheduled headway. If the actual interval is no longer than the scheduled interval, then we assume that no one who boarded that train waited longer than they were supposed to. However, if the interval between trains is longer than what’s scheduled, we then estimate how many people had an excessive wait time.
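The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the MBTA's actual code: the function name excess_headways, the sample departure times, and the five-minute scheduled headway are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def excess_headways(departures, scheduled_headway):
    """Return the excess interval (actual minus scheduled, floored at zero)
    for each train after the first at one station in one direction.

    `departures` is a hypothetical, time-sorted list of datetimes.
    """
    excess = []
    for previous, current in zip(departures, departures[1:]):
        actual = current - previous  # time since the last train left
        excess.append(max(actual - scheduled_headway, timedelta(0)))
    return excess


# Hypothetical departures from one platform.
departures = [
    datetime(2024, 1, 8, 8, 0),
    datetime(2024, 1, 8, 8, 4),   # 4-minute gap: within the schedule
    datetime(2024, 1, 8, 8, 11),  # 7-minute gap: 2 minutes over
]
result = excess_headways(departures, timedelta(minutes=5))
```

Here the second train produces no excess interval, while the third produces two minutes of excess, which is the portion of the gap that matters for the passenger estimate.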

One known limitation of this approach is that it assumes everyone is able to board the same train, which we know is not necessarily true when there are large gaps during high demand periods. Additionally, this methodology does not currently account for major disruptions or diversions where stations are not served at all. MBTA staff are working with researchers at the Massachusetts Institute of Technology (MIT) to model and address these limitations.

Estimating the number of passengers affected

Because passenger arrival patterns at each stop are predictable (in the aggregate), we use an estimate of how many people are arriving at each station per minute to estimate the number of people who had excess wait time. If the scheduled interval between trains is five minutes, but a train is seven minutes behind the previous one, then people who arrived during the first two minutes of that interval waited too long. Passengers arriving when the train is less than five minutes away wait less than the headway and are not counted here.

Diagram comparing wait times between two hypothetical passengers.

If the arrival rate at that station going in that direction at that time of day is 20 passengers per minute, we can estimate that this two-minute delay caused 40 people to wait too long. Doing this for every train at every station over the course of the day produces an estimate of people with excess waits, which we divide by the total number of people riding the line to get the percent who waited too long. Subtracting that from 100 gives the percent of people with acceptable waits, which is our performance metric.
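To make the arithmetic concrete, here is a small Python sketch of the calculation. It is an illustration under stated assumptions rather than our production method: the function name wait_time_reliability and the input tuples are hypothetical, and total ridership is approximated as everyone who arrived during the observed intervals.

```python
def wait_time_reliability(intervals):
    """Estimate the percent of passengers with acceptable waits.

    `intervals` is a hypothetical list of tuples:
    (actual_headway_min, scheduled_headway_min, arrivals_per_min).
    Returns (passengers_with_excess_wait, percent_with_acceptable_wait).
    """
    excess_passengers = 0.0
    total_passengers = 0.0
    for actual, scheduled, rate in intervals:
        # Only passengers who arrived in the first (actual - scheduled)
        # minutes of the gap waited longer than the scheduled headway.
        excess_passengers += max(actual - scheduled, 0) * rate
        # Assumption: total riders = everyone who arrived during the gap.
        total_passengers += actual * rate
    percent_excess = 100.0 * excess_passengers / total_passengers
    return excess_passengers, 100.0 - percent_excess


# Three trains at a 5-minute scheduled headway, 20 passengers per minute.
intervals = [(4, 5, 20), (7, 5, 20), (5, 5, 20)]
excess, reliability = wait_time_reliability(intervals)
print(excess, reliability)  # 40.0 87.5
```

Only the seven-minute gap contributes excess waits (2 minutes x 20 passengers per minute = 40 people), out of 320 estimated riders, so 87.5 percent had acceptable waits.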

Now let’s go back to the rate of people per minute arriving at a station, known as the passenger arrival rate. This is a critical piece of data that makes the metric work. It’s what separates this metric from others that simply measure the intervals between trains. Using the ODX model (short for Origin, Destination, and Transfer), a piece of software developed by researchers at MIT, we estimate different arrival rates for each station in the system. These rates also vary by time of day.* With different passenger arrival rates we can take into account that two extra minutes between trains at rush hour downtown affect more passengers than two extra minutes at 11 a.m. near the end of a line.
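The effect of station-specific and time-specific rates can be shown with a short Python sketch. Everything here is made up for illustration: the station and period labels, the rates in ARRIVAL_RATES, and the helper passengers_with_excess_wait are hypothetical and do not come from the ODX model.

```python
# Hypothetical arrival rates in passengers per minute, keyed by
# (station, time period). Real rates would come from a ridership model.
ARRIVAL_RATES = {
    ("downtown", "am_rush"): 20.0,
    ("end_of_line", "midday"): 2.0,
}


def passengers_with_excess_wait(station, period, excess_minutes,
                                rates=ARRIVAL_RATES):
    """Estimate how many passengers waited longer than the scheduled
    headway, given the excess minutes in a gap between trains."""
    return rates[(station, period)] * excess_minutes


# The same two extra minutes, weighted by very different demand.
downtown = passengers_with_excess_wait("downtown", "am_rush", 2)
outlying = passengers_with_excess_wait("end_of_line", "midday", 2)
```

With these illustrative rates, the downtown gap affects ten times as many passengers as the identical gap near the end of the line, which is the weighting the metric is designed to capture.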


So why do we go through all this trouble? Because it allows us to more accurately reflect our passengers’ experiences on the MBTA by weighting performance by the number of passengers affected. Taking into account the fact that people arrive at different times between trains and are not equally affected by gaps in service, as well as the variations in demand from station to station and hour to hour over the course of the day, results in a more comprehensive and nuanced reliability metric for subway service. This metric reflects the true passenger experience in the best way we currently can.

* In fact, this arrival rate for a station can be further broken down into a rate for each station people are going to – e.g. number of passengers entering South Station bound for Quincy Center Station at 5:30 p.m., which allows us to take into account the fact that passengers who live on one branch of a line cannot take a train serving the other branch, and thus must wait until a train for their branch arrives.