Update: The code for this post is available at https://github.com/samurdhilbk/Schedule-a-Ride.
1. To maximize any figure, just click on it. If this doesn't work, refresh the page and try again.
2. I am using the Python version of the igraph library for this post. This will allow us to focus on the problem solving aspect and ease the implementation load.
3. If you need help coding, email me at samu dot karu at outlook dot com and I'll see what I can do.

We know that Uber has a “schedule a ride” option where you can schedule a ride days-in-advance by giving a start time, pick-up location and destination. Given a set of “schedule a ride” requests for a given day, engineers at Uber have to figure out

1. 1.

Whether or not all those requests can be satisfied

2. 2.

How many drivers will be needed if all requests are to be satisfied?

3. 3.

Given that only $K$ drivers are available, how many requests can be satisfied at maximum?

In this post, we will try to figure out how such a task may be accomplished given the Los Angeles Uber Movement dataset obtained and prepared as below.

Go to Uber Movement website and download data of Travel Times by Month (All Days), 2019 Quarter 4, for Los Angeles area. The dataset contains pairwise traveling time statistics between most pairs of points in the Los Angeles area. Points on the map are represented by unique IDs. To understand the correspondence between map IDs and areas, download Geo Boundaries file from the same website. This file contains latitudes and longitudes of the corners of the polygons circumscribing each area. To be specific, if an area is represented by a polygon with 5 corners, then you have a 5x2 matrix of the latitudes and longitudes, each row of which represents latitude and longitude of one corner.

Read the dataset at hand, and build a graph (in igraph) in which nodes correspond to locations, and undirected weighted edges correspond to the mean traveling times between each pair of locations (only December). Add the mean of the coordinates of each region's polygon corners (a 2-D vector) as an attribute to the corresponding vertex. The graph will contain some isolated nodes (extra nodes existing in the Geo Boundaries JSON file) and a few small connected components. Remove such nodes and just keep the largest connected component of the graph. In addition, merge duplicate edges by averaging their weights. We will refer to this cleaned graph as $\hat{G}$ afterwards.

1 Problem formulation

$\hat{G}$ (derived above) denotes the universe of possible pick-up locations (sources) and destinations, and the average time it takes to travel between a source and destination. We are given $N$ customer requests $R$ in the form {start time, source, destination}. i.e. $R=\{(t,s,d)\in T\times S\times D;t\in T,s\in S,d\in D\}$ where the $t\in T$ are within a 24-hour period starting and ending at midnight. As with any other company, Uber wants to avoid negative reviews and complaints by customers as much as possible—so much so that it is willing to deny some requests if it expects that any given customer will have to wait more than $\Delta$ minutes for their driver to arrive.

2 Generating $R$

The set of customer requests $R$ is not something available in the Uber dataset. So we will attempt to generate a realistic $R$ that takes into account everyday traffic conditions. Specifically, we want to simulate

1. 1.

Morning rush-hour traffic peaking at around 8.30 AM.

2. 2.

Lunch-hour traffic peaking at around 12.30 PM.

3. 3.

Evening traffic peaking at around 5.30 PM.

Taking these factors into account, we sample $N=1000$ pick-up times $T$ from the tri-modal distribution depicted in Fig. 1a, generated by combining three Gaussian distributions with means $\{510\leavevmode\nobreak\ \text{min},780\leavevmode\nobreak\ \text{min},1050% \leavevmode\nobreak\ \text{min}\}$, corresponding to {8.30 AM, 12.30 PM, 5.30 PM} respectively (note that times in this post will be in the range $[0,1440)$, denoting a 24-hour period as previously mentioned). The corresponding $S$ and $D$ are generated randomly from $\hat{G}$. A histogram representing $T$ is depicted in Fig. 1b; as expected, it resembles the sampling distribution in Fig. 1a.

3 Generating the feasibility graph $G$

To integrate the constraints such as travel time and maximum wait time ($\Delta$) we form a feasibility graph $G$ in the following way.

• Each vertex represents a unique request $r\in R$

• A directed edge from vertex $r_{i}=\{t_{i},s_{i},d_{i}\}$ to vertex $r_{j}=\{t_{j},s_{j},d_{j}\}$ exists if a driver can finish trip $r_{i}$ and be at the start point of trip $r_{j}$ so that that customer does not have to wait more than $\Delta$ minutes.

That is $G=(V,E)$ where

$\begin{gather*} V=R\\ E=\{(r_i,r_j) \in R \times R; t_i+\text{shortest}(s_i,d_i)+\text{shortest}(d_i,s_j) <= t_j+\Delta\} \end{gather*}$

Here, $\text{shortest}(\cdot)$ is a function that gives the shortest distance between two vertices in $\hat{G}$. Note that for ($\Delta<<\text{minimum travel time between any two locations in }\hat{G}$), $G$ is a directed acyclic graph.

4 Subtask 1: What is the maximum number of requests that can be satisfied if only one driver is available?

Although not a realistic concern that Uber may face, a good starting point is to evaluate how many of these $N$ requests we can satisfy if it has only 1 driver available (so that Uber can reject the rest of the requests). Given $G$, the solution is simply the longest path in $G$, which we can find using topological ordering and dynamic programming as described in Algorithm 1.

The set of pick-up locations and destinations corresponding to requests in $R$ is plotted in Fig. 2, where an edge of each shade of blue represents a single request. It is clear that these requests uniformly cover locations in $\hat{G}$. The path corresponding to the solution obtained by using Algorithm 1 is plotted in Fig. 3, where increasing width of the line indicates progression (start and end points are also marked). The driver will cover 36 customers in this path, which as expected with only one driver, is much less than the 1000 requests received. The wait times experienced by each customer are plotted in Fig. 4, where the maximum wait time is $4.905\leavevmode\nobreak\ \text{min}\leq(\Delta=5)$. We can see in a majority of cases, the driver arrives less than 10 minutes before the pick-up time, meaning that optimal solution squeezes as much customers as possible by minimizing the idle time of the driver.

5 Subtask 2: What is the minimum number of drivers needed if all requests are to be accomodated?

Uber also has a logistics concern of making sure that enough drivers are available on a given day so that the requests for that day could be accommodated. So the next part is to find the minimum number of drivers needed to satisfy all requests.

It is clear that the solution to this problem is the minimum vertex-disjoint path cover of $G$. i.e. We need to find a set of vertex-disjoint paths $P$ that cover all vertices in $G$. This ensures that all requests are accommodated and that a single request is not assigned to more than one driver.

We note that the minimum vertex-disjoint path cover of $G$ could be found by formulating it as a maximum bipartite matching. To see this, first form a bipartite graph $G^{\prime\prime}$ where the first partition is denoted by $\{x_{1},x_{2},\dots,x_{N}\}$ and the second partition is denoted by $\{y_{1},y_{2},\dots,y_{N}\}$ where $x_{i}$ and $y_{i}$ both represent the node $v_{i}\in V$. Now draw an edge $(x_{i},y_{j})$ if the edge $(v_{i},v_{j})$ exists in $G$. Now, in the maximum bipartite matching (MBM) of $G^{\prime\prime}$, each remaining edge will represent an edge belonging to some path in the minimum path cover. Figuring out which edges belong to which path is also straightforward. If $y_{i}$ does not have an incident edge in the maximum bipartite matching, then $v_{i}$ must be a starting point of one of the paths (similarly, if $x_{i}$ does not have an outgoing edge then $v_{i}$ must be an ending point). For such a $y_{i}$, the second vertex in the path will be $v_{j}$, where $x_{i}$ is matched to $y_{j}$ in MBM. The third vertex will be $v_{k}$, where $x_{j}$ is matched to $y_{k}$ in the MBM. When we finally find a $y_{h}$ where $x_{h}$ is not matched to any $y$ in the MBM, we know that $v_{h}$ is the the final vertex of this path. There could also be degenerate cases where both $x_{i}$ and $y_{i}$ do not have any edges connected to them in the MBM, which means that $v_{i}$ will be covered by a zero-length path (i.e. a driver will have to be assigned exclusively for serving request $r_{i}$).

To find the maximum bipartite matching between $\{x_{1},x_{2},\dots,x_{n}\}$ and $\{y_{1},y_{2},\dots,y_{n}\}$ using max-flow algorithms, we form a graph $G^{\prime}=(V^{\prime},E^{\prime})$ where

$\begin{gather*} V'=\{v_0\} \cup \{x_1,x_2,\dots,x_n\} \cup \{y_1,y_2,\dots,y_n\} \cup \{v_{\infty}\}\\ E'=\{(v_0,x_i) | v_i \in V\} \cup \{(y_i, v_{\infty}) | v_i \in V\} \cup \{(x_i,y_j); (v_i,v_j) \in E\} \end{gather*}$

That is, we simply introduce a source node $v_{0}$ and draw edges from $v_{0}$ to all vertices in the first partition of $G^{\prime\prime}$ and introduce a sink node $v_{\infty}$ and draw edges from all vertices in the second partition of $G^{\prime\prime}$ to $v_{\infty}$. If each edge in $G^{\prime}$ is assigned a capacity of 1, $(x_{i},y_{j})$ will be an edge in the maximum bipartite matching of $G^{\prime\prime}$ if and only if edge $(x_{i},y_{j})$ has a flow of 1 in the max-flow solution for $G^{\prime}$. This completes our solution.

To evaluate our solution, we form $G^{\prime}$ and find the max-flow solution by using the maxflow() function available in the igraph library. We found that 83 drivers are needed to satisfy all customer requests. From Fig. 6a it is clear that all requests in Fig. 2 are covered. Fig. 5b denotes the number of customers each driver is assigned on his/her path. We can see that while most drivers are assigned more than 10 customers, some only get to take 2,3 or 4 customers. Some of these longer paths are visualized in Fig. 6b and some of these shorter paths are visualized in Fig. 6c. Fig. 5a gives an idea about the maximum wait times experienced by a customer in each driver’s path.

6 Subtask 3: If only $K$ drivers are available, what is the maximum number of requests that can be accomodated?

A much more realistic concern for Uber is the limited number of drivers available on a given day. It has an idea about the number $K$ of drivers available on the day, and wants to know how many requests can be satisified at maximum with these $K$ drivers. It would also like to know which requests are satisfiable, so that it can reject the rest.

It is straightforward to see that this problem is equivalent to finding the maximum number of vertices than can be covered by using exactly $K$ vertex-disjoint paths in $G$. To attempt a solution, first form the graph $H=(V_{H},E_{H})$ with edge-flows $f(\cdot)$, edge-costs $w(\cdot)$ and edge capacities $c(\cdot)$ as follows:

1. 1.

Split each vertex $v$ in $G$ to two verices $v^{\text{in}}\in V_{H}$ and $v^{\text{out}}\in V_{H}$

2. 2.

Introduce a source node $s\in V_{H}$ and sink node $t\in V_{H}$

3. 3.

For each $v\in V$ in $G$

1. (a)

Draw an edge $(v^{\text{in}},v^{\text{out}})\in E_{H}$ with cost $w((v^{\text{in}},v^{\text{out}}))=-1$

2. (b)

Draw an edge $(s,v^{\text{in}})\in E_{H}$ with cost $w((s,v^{\text{in}}))=0$

3. (c)

Draw an edge $(v^{\text{out}},t)\in E_{H}$ with cost $w((v^{\text{out}},t))=0$

4. 4.

For each edge $e=(u,v)\in E$ in $G$ draw an edge $(u^{\text{out}},v^{\text{in}})\in E_{H}$ with cost $w((u^{\text{out}},v^{\text{in}}))=0$

5. 5.

For each edge $e\in E_{H}$ set $c(e)=1$

Then we send a min-cost flow of $K$ from $s$ to $t$. Now, for edge $(u,v)\in E$, if $(u^{\text{out}},v^{\text{in}})\in E_{H}$ has a flow of 1 in the min-cost flow solution, $(u,v)$ will be a part of the maximum vertex cover of $G$. If the flow is 0 instead, $(u,v)$ will not be a part of the maximum vertex cover. Once we know which edges are part of the maximum vertex cover, it is straightforward to recover the paths they form. This means that we have found both the number of requests that can be satisfied with $K$ drivers as well which requests they are.

The optimally of the above solution can be explained simply. A set of exactly $K$ vertex-disjoint paths covering the maximum number of vertices in $G$ is equivalent to a set of exactly $K$ edge-disjoint paths covering the maximum number of vertices in $H$. We know the solution to the min-cost flow problem is a set of edge-disjoint paths in $H$. Since we set the cost of edges of the form $(v^{\text{in}},v^{\text{out}})\in E_{H}$ to -1, the min-cost flow solution will try to make these paths as long as possible so that the flow can go through as many -1 cost edges as possible. This will result in as many vertices as possible being covered by these $K$ paths.

igraph does not have an inbuilt min-cost flow algorithm implementation. So we attempt to solve this problem by representing the min-cost flow problem as a linear program.

\begin{align*} \begin{array}{lll} \text{minimize} & \sum_{(u,v) \in E_{H}} f(u,v) \cdot w(u,v) & \\ \text{subject to}& f(u,v) \leq c(u,v) & \forall (u,v) \in E_{H} \\ & \sum_{(u,v) \in E_{H}} f(u,v) = \sum_{(v,w) \in E_{H}} f(v,w) & \forall v \in V_{H}\\ & \sum_{(s,v) \in E_{H}} f(s,v) = \sum_{(w,t) \in E_{H}} f(w,t) = K & \end{array} \end{align*}

It is known that for integer inputs of $w$, $c$ and $K$, the min-cost flow always has a solution where all flows take integer values. However, there are also non-integer solutions, and simply solving this linear program will likely yield those solutions. So we used cvxpy, with the mixed-integer solver GPLK_MI provided by the cvxopt package.

We first evaluate the solution for $K=50$. In total, 894 requests could be satisfied by using these 50 drivers. The paths they took are plotted in Fig. 7a, where black lines denote requests that could not be satisfied. For clarity, a few sample paths are plotted in Fig. 7b. From Fig. 8a we can see that every driver was assigned at least 14 drivers, which is in sharp contrast to Fig. 5b, where they were drivers with as few as 2,3 customers. This justifies our previous argument that our min-cost flow formulation will force longer paths to be selected.

For completeness, and as a sanity check, we finally evaluated the solution with the number of drivers $K$ set to $\{50,55,60,65,70,75,80,81,82,83,85\}$. We can see that as expected, the number of customers that can be satisfied rises monotonously with the number of drivers, and that all 1000 customers can be satisfied starting from when 83 drivers are available. Recall that 83 was the number of drivers found as the solution to Subtask 2.

2+

Published in Algorithms