Optimizing linear inequalities over iid random variables and mixed integer programming
Tobias Fritz and I wrote a preprint on optimizing linear inequalities over iid random variables. A non-trivial instance discussed in the paper, found by Tobias, is the following: maximize, with respect to a probability measure \(\mu\) on \([0,+\infty)\), the probability \[
P(
X_1 + X_2 + X_3 < 2X_4
)
\] where \(X_1,X_2,X_3,X_4\) are iid random variables with distribution \(\mu\). More precisely, the goal is to find the exact value, or alternatively a lower and upper bound as sharp as possible, of the quantity \[
C^* = \sup_{\mu}
P_{X_1,...,X_4\sim^{iid}\mu}(
X_1 + X_2 + X_3 < 2X_4
).
\]
This page focuses on upper bounds on \(C^*\): we would like to show that no probability measure on \([0,+\infty)\) can make this probability larger than some threshold. A first naive observation, assuming that the distribution is continuous to avoid ties for simplicity, is that \(C^*\le 0.5\) thanks to the following argument: if \(X_1+X_2+X_3 < 2X_4\) holds then the rank of \(X_4\) among \(X_1,X_2,X_3,X_4\) is 3 or 4, since otherwise at least two of \(X_1,X_2,X_3\) would exceed \(X_4\) and the non-negativity of the third would give \(X_1+X_2+X_3\ge 2X_4\). A rank of 3 or 4 happens with probability 0.5 by symmetry. Here, symmetry refers to the fact that since the \(X_i\) are iid, any permutation of the indices \((1,2,3,4)\) leads to the same distribution and the rank of \(X_4\) is uniformly distributed on \(\{1,2,3,4\}\).
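As a quick illustration, here is a Monte Carlo sanity check of this bound for one particular choice of \(\mu\). The exponential distribution is an arbitrary choice made here for illustration, not one taken from the paper.
import numpy as np

# Monte Carlo estimate of p_mu for mu = Exponential(1), an arbitrary
# distribution on [0, +infinity); the estimate should stay below 0.5
rng = np.random.default_rng(0)
X = rng.exponential(size=(10**6, 4))
p_hat = (X[:, 0] + X[:, 1] + X[:, 2] < 2 * X[:, 3]).mean()
print(p_hat)  # about 8/27 ≈ 0.296 for the exponential distribution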
General upper bounds by exchangeability
Sample \(X_1,...,X_m\) iid from a given distribution \(\mu\) and let \(p_\mu = P(X_{1} + X_{2} + X_{3} < 2X_{4})\). Then since all variables are distributed according to \(\mu\), \[
p_\mu
=
P(X_{i} + X_{j} + X_{k} < 2X_{l}
)
\] as long as \(i,j,k,l\in \{1,...,m\}\) are distinct indices. Thus for any injective function \(h: \{1,2,3,4\} \to \{1,...,m\}\), \[p_\mu = P(X_{h(1)} + X_{h(2)} + X_{h(3)} < 2X_{h(4)}) =\mathbb{E}[I\{X_{h(1)} + X_{h(2)} + X_{h(3)} < 2X_{h(4)}\} ] \] where \(\mathbb{E}[\cdot]\) is the expectation sign and \(I\{\cdot\}\) is the indicator function with values in \(\{0,1\}\). The equality thus holds for the average over all such injective functions: \[
p_\mu = \frac{1}{N} \mathbb{E}\Big[ \sum_{\substack{h:\{1,2,3,4\}\to\{1,...,m\}\\ \text{injective}}}
I\{X_{h(1)} + X_{h(2)} + X_{h(3)} < 2X_{h(4)}\} \Big]
\] where \(N=m(m-1)(m-2)(m-3)\) is the number of injective functions from a set of size 4 to a set of size \(m\). At this point, we can bound from above the random variable inside the expectation by the supremum over all deterministic non-negative numbers \(x_1,...,x_m\): \[
p_\mu
\le \sup_{x_1,...,x_m\ge 0}
\frac{1}{N}\Big[ \sum_{\substack{h:\{1,2,3,4\}\to\{1,...,m\}\\ \text{injective}}}
I\{x_{h(1)} + x_{h(2)} + x_{h(3)} < 2x_{h(4)}\} \Big]
\] The inequality now holds between two deterministic quantities. The optimization on the right-hand side is over non-negative reals \(x_1,...,x_m\ge 0\) and amounts to satisfying as many of the inequalities \(x_{h(1)} + x_{h(2)} + x_{h(3)} < 2x_{h(4)}\) as possible, where \(h:\{1,2,3,4\}\to\{1,...,m\}\) ranges over injective functions. Without loss of generality, we assume from now on that \[
0 \le x_1 \le x_2 \le \dots \le x_m
since the average over all injective functions is invariant under permutations of the indices \(\{1,...,m\}\).
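To make the right-hand side concrete, here is a small brute-force evaluator of this average for a fixed vector \(x\); the helper name is ours and the approach is only feasible for small \(m\).
from itertools import permutations

def fraction_satisfied(x):
    # Fraction of injective tuples (a, b, c, d) of distinct indices
    # satisfying x[a] + x[b] + x[c] < 2 * x[d]
    m = len(x)
    N = m * (m - 1) * (m - 2) * (m - 3)
    count = sum(
        x[a] + x[b] + x[c] < 2 * x[d]
        for a, b, c, d in permutations(range(m), 4)
    )
    return count / N

print(fraction_satisfied([0.0, 0.5, 0.75, 1.0]))  # small example with m = 4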
Some injective functions \(h:\{1,2,3,4\}\to\{1,...,m\}\) can be grouped together as they lead to the same inequality. Indeed \[
x_{h(1)} + x_{h(2)} + x_{h(3)} < 2x_{h(4)}
\] only depends on \(h(1),h(2),h(3)\) through the set \(\{h(1),h(2),h(3)\}\) but not on a particular ordering of these three indices among all 6 possible orderings. Grouping together all such injections with \(\{i,j,k\} = \{h(1),h(2),h(3)\}\) and \(l=h(4)\), we get \[
p_\mu
\le \sup_{x_1,...,x_m\ge 0}
\frac{6}{N}\Big[ \sum_{1\le i<j<k \le m} \sum_{\substack{l=1 \\ l\notin\{i,j,k\}}}^{m}
I\{x_i + x_j + x_k < 2 x_l\} \Big].
\] Half the terms in the sum can be discarded right away: if \(l < j\) then the inequality \(x_i + x_j + x_k < 2 x_l\) cannot hold, since \(x_j \ge x_l\) and \(x_k \ge x_l\) already give \(x_i + x_j + x_k \ge 2x_l\). This leaves only the terms with \(j < l\). Exactly half of the terms are discarded this way because \(l < j\) if and only if the rank of \(l\) among \(\{i,j,k,l\}\) is 1 or 2, and the rank of \(l\) is uniformly distributed among \(\{1,2,3,4\}\) by symmetry.
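As a sanity check of this reduction, the grouped count that keeps only \(l > j\) and multiplies by 6 recovers the brute-force fraction computed by the fraction_satisfied helper above (again an illustrative snippet, not code from the paper).
from itertools import combinations

def fraction_satisfied_grouped(x):
    # Grouped version: sets {i < j < k} and l with l > j, each counted 6 times
    m = len(x)
    N = m * (m - 1) * (m - 2) * (m - 3)
    count = sum(
        x[i] + x[j] + x[k] < 2 * x[l]
        for i, j, k in combinations(range(m), 3)
        for l in range(m)
        if l not in (i, j, k) and l > j
    )
    return 6 * count / N

x = sorted([0.0, 0.1, 0.2, 0.6, 0.9, 1.0])
assert fraction_satisfied_grouped(x) == fraction_satisfied(x)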
Upper bounds by mixed integer linear programming
We now maximize the right-hand side above over non-negative reals \(x_1,...,x_m\ge 0\), taking \(m=12\) below to illustrate how such maximization problems can be solved using mixed integer linear programming (MILP). We use the Gurobi solver via its Python interface.
Start by initializing the MILP model and defining the variables \(x_1,x_2,...,x_m\).
import gurobipy as gp
from gurobipy import GRB

m = 12
model = gp.Model("MILP_violation_indicators")
xs = model.addVars(range(1, m + 1), vtype=GRB.CONTINUOUS, lb=0, ub=1, name="x")

# Generate the order constraints x_1 <= x_2 <= ... <= x_m = 1
for i in range(1, m):
    model.addConstr(xs[i] <= xs[i + 1], f"order_{i}")

# Fix x_m to 1 without loss of generality
model.addConstr(xs[m] == 1, "x_m_is_1")
Next, for each term \(x_i + x_j + x_k < 2 x_l\) with \(1 \le i < j < k \le m\), \(l \notin \{i,j,k\}\) and \(j < l\), we define a binary variable \(y_{i,j,k,l}\) which can equal 1 only if the inequality is satisfied. Because numerical solvers do not handle strict inequalities, we introduce a small threshold.
# Define the y variables: one variable y_t for each tuple t = (i,j,k,l) in T,
# with corresponding inequality X_i + X_j + X_k < 2 X_l
T = [
    (i, j, k, l)
    for i in range(1, m + 1)
    for j in range(1, m + 1)
    for k in range(1, m + 1)
    for l in range(1, m + 1)
    if i < j and j < k and l not in [i, j, k] and j < l
]
ys = model.addVars(
    T,
    vtype=GRB.BINARY,
    name=[f"X{t[0]} + X{t[1]} + X{t[2]} < 2*X{t[3]}" for t in T],
)
threshold = 2e-5
for i, j, k, l in T:
    # https://docs.gurobi.com/projects/optimizer/en/12.0/reference/cpp/model.html#_CPPv4N8GRBModel21addGenConstrIndicatorE6GRBVariRK13GRBTempConstr6string
    model.addGenConstrIndicator(
        ys[(i, j, k, l)],  # Indicator variable
        1,  # If y[i, j, k, l] == 1, then the following inequality must hold:
        xs[i] + xs[j] + xs[k] - 2 * xs[l],
        GRB.LESS_EQUAL,
        -threshold,
        name=f"indicator_{i}_{j}_{k}_{l}",
    )
It remains to define the objective function (maximize the number of satisfied inequalities, i.e. the sum of the indicator variables) and to solve the MILP.
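A minimal sketch of this step (solver parameters such as time limits are omitted):
# Maximize the number of satisfied inequalities, i.e. the sum of the indicators
model.setObjective(gp.quicksum(ys.values()), GRB.MAXIMIZE)
model.optimize()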
We can further analyze the results: print the values of the \(x_i\) variables found as well as the list of satisfied and violated inequalities.
if model.status == GRB.OPTIMAL:
    print(f'Optimal Objective Value: {model.objVal}\n')
    # Values of the x variables which achieve a maximum feasible subsystem
    # (shift the indices by one to match the paper's notation)
    for i in range(1, m + 1):
        print(f'x_{i} = {xs[i].x}')
    # Summarize, using that the total number of versions of the inequality is 2 * |T|
    print("\nIn total, the fraction of satisfied inequalities \nis up to",
          int(sum(y.x for y in ys.values())), "/", 2 * len(T),
          f'≈ {sum(y.x for y in ys.values()) / (2 * len(T)):.3f} at m = {m}')
Optimal Objective Value: 850.0
x_1 = 0.0
x_2 = 0.0
x_3 = 4.0000000000110856e-05
x_4 = 6.0000000000201704e-05
x_5 = 6.000000000029256e-05
x_6 = 0.5000700000000002
x_7 = 0.7500750000000002
x_8 = 0.8750775000000002
x_9 = 0.9375787500000002
x_10 = 0.9688293750000001
x_11 = 0.9844546875000003
x_12 = 1.0
In total, the fraction of satisfied inequalities
is up to 850 / 1980 ≈ 0.429 at m = 12
if model.status == GRB.OPTIMAL:
    # List the satisfied and violated inequalities separately
    print(f'\nSatisfied inequalities (modulo threshold {threshold}):')
    for y in ys.values():
        if y.x > 0.5:
            print(f'{y.varName}')
    print(f'\nViolated inequalities (modulo threshold {threshold}):')
    for y in ys.values():
        if y.x <= 0.5:
            print(f'{y.varName}')
(Output not shown here for brevity.)
Redundant constraints
A common technique in MILP is to add redundant constraints. Redundant constraints do not change the feasible set of solutions, but can help the solver to prune branches of the search tree more efficiently. One interpretation is that such redundant constraints can inform the solver about the structure of the problem.
Let us define the same MILP as above:
import gurobipy as gp
from gurobipy import GRB

m = 12
model = gp.Model("MILP_violation_indicators")
xs = model.addVars(range(1, m + 1), vtype=GRB.CONTINUOUS, lb=0, ub=1, name="x")

# Generate the order constraints x_1 <= x_2 <= ... <= x_m = 1
for i in range(1, m):
    model.addConstr(xs[i] <= xs[i + 1], f"order_{i}")

# Fix x_m to 1 without loss of generality
model.addConstr(xs[m] == 1, "x_m_is_1")

T = [
    (i, j, k, l)
    for i in range(1, m + 1)
    for j in range(1, m + 1)
    for k in range(1, m + 1)
    for l in range(1, m + 1)
    if i < j and j < k and l not in [i, j, k] and j < l
]
ys = model.addVars(
    T,
    vtype=GRB.BINARY,
    name=[f"X{t[0]} + X{t[1]} + X{t[2]} < 2*X{t[3]}" for t in T],
)
threshold = 2e-5
for i, j, k, l in T:
    model.addGenConstrIndicator(
        ys[(i, j, k, l)],  # Indicator variable
        1,  # If y[i, j, k, l] == 1, then the following inequality must hold:
        xs[i] + xs[j] + xs[k] - 2 * xs[l],
        GRB.LESS_EQUAL,
        -threshold,
        name=f"indicator_{i}_{j}_{k}_{l}",
    )
A first family of redundant constraints is obtained by looking for pairs of tuples \((i,j,k,l)\) and \((a,b,c,d)\) in \(T\) such that the implication \(x_a + x_b + x_c < 2x_d \Rightarrow x_i + x_j + x_k < 2x_l\) holds merely by the monotonicity \(0 \le x_1 \le ... \le x_m = 1\).
def children(t):
    i, j, k, l = t
    possible_children = [
        (i + 1, j, k, l),
        (i, j + 1, k, l),
        (i, j, k + 1, l),
        (i, j, k, l - 1),
    ]
    return [tt for tt in possible_children if tt in set(T)]

for t in T:
    y = ys[t]
    for tt in children(t):
        yy = ys[tt]
        # If t = (i,j,k,l) does not hold (y=0) then tt = (a,b,c,d) cannot hold either
        model.addConstr(yy <= y)
Another family of redundant constraints is obtained by noting that, given six indices \(i_0<i_1<\dots<i_5\), the inequalities \(x_{i_0} + x_{i_1} + x_{i_4} < 2 x_{i_2}\) and \(x_{i_2} + x_{i_3} + x_{i_5} < 2 x_{i_4}\) are not simultaneously satisfiable: summing them gives \(x_{i_0}+x_{i_1}+x_{i_3}+x_{i_5} < x_{i_2}+x_{i_4}\), which contradicts \(x_{i_3}\ge x_{i_2}\) and \(x_{i_5}\ge x_{i_4}\). Hence the sum of the two corresponding indicator variables is at most 1.
from itertools import combinations

for subset in combinations(range(1, m + 1), 6):
    i0, i1, i2, i3, i4, i5 = sorted(subset)
    # The pair of inequalities indexed by (i0, i1, i4, i2) and (i2, i3, i5, i4)
    # is not jointly satisfiable
    y = ys[(i0, i1, i4, i2)]
    yy = ys[(i2, i3, i5, i4)]
    model.addConstr(y + yy <= 1)  # not jointly satisfiable
We now set the objective function and solve the MILP as before to measure the speedup obtained by adding these redundant constraints.
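A minimal sketch mirroring the earlier solve; model.Runtime is Gurobi's wall-clock solve time, which can be compared with and without the redundant constraints.
# As before: maximize the sum of the indicator variables and solve
model.setObjective(gp.quicksum(ys.values()), GRB.MAXIMIZE)
model.optimize()
print(f"Solve time: {model.Runtime:.1f} seconds")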
We can further analyze the results: print the values of the \(x_i\) variables found.
if model.status == GRB.OPTIMAL:
    print(f'Optimal Objective Value: {model.objVal}\n')
    for i in range(1, m + 1):
        print(f'x_{i} = {xs[i].x}')
    print("\nIn total, the fraction of satisfied inequalities \nis up to",
          int(sum(y.x for y in ys.values())), "/", 2 * len(T),
          f'≈ {sum(y.x for y in ys.values()) / (2 * len(T)):.3f} at m = {m}')
Optimal Objective Value: 850.0
x_1 = 0.0
x_2 = 0.0
x_3 = 3.999999999990898e-05
x_4 = 5.999999999981796e-05
x_5 = 5.999999999981796e-05
x_6 = 0.5132116699219463
x_7 = 0.7632166699219464
x_8 = 0.8882191699219466
x_9 = 0.9507204199219468
x_10 = 0.9819710449219468
x_11 = 1.0
x_12 = 1.0
In total, the fraction of satisfied inequalities
is up to 850 / 1980 ≈ 0.429 at m = 12
Going further
The preprint Bellec and Fritz (2024) contains more details on this approach, including how to scale the resolution of such MILPs to \(m\approx 25\), how to obtain verifiable upper bounds from numerical solutions such as the one above, and the latest lower and upper bounds known to us for this problem. The final version of the MILP above, which we use for larger values of \(m\), is available in the ancillary files of the arXiv preprint https://arxiv.org/abs/2412.15179.
References
Bellec, Pierre C., and Tobias Fritz. 2024. “Optimizing over Iid Distributions and the Beat the Average Game.” arXiv preprint arXiv:2412.15179.