Bi-level optimization problems seek a minimizer of an outer objective function over the set of minimizers of an inner optimization problem. These problems, in both the convex and non-convex settings, have diverse applications in machine learning, signal processing, and many other fields. Since these problems inherently fail to satisfy Slater's condition, exact solutions are usually difficult to find; consequently, existing algorithms focus on approximating them. Most algorithms for bi-level optimization problems are based on regularization techniques, which regularize the outer objective function with the inner objective function in a certain way. Moreover, in the convex setting, the proposed algorithms provide a convergence rate result (if any) only in terms of the inner objective function, and only under restrictive assumptions such as smoothness and/or strong convexity of the outer objective function (with the exception of a very recent work that proposes a rather complicated algorithm).
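To fix ideas, a common schematic of such regularization (illustrative notation only, not the scheme of any particular work) solves, for an increasing sequence of penalty parameters $\lambda_k$,
\[
\min_{x}\; \varphi(x) + \lambda_k\, \psi(x), \qquad \lambda_k \to \infty,
\]
where $\varphi$ and $\psi$ denote the outer and inner objective functions, respectively, so that, under suitable conditions, minimizers of the regularized problems approach the solution set of the bi-level problem as the weight on the inner objective grows.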
In this thesis, we propose the Bi-Sub-Gradient (Bi-SG) method, a generalization of the classical sub-gradient method to the setting of bi-level optimization problems. Bi-SG is a first-order method that is very easy to implement, in the sense that it requires only the computation of the associated proximal mapping or of a sub-gradient of the outer objective function, in addition to a proximal gradient step on the inner optimization problem (a step shared by many algorithms). We show, under very mild assumptions, that Bi-SG tackles bi-level optimization problems with several theoretical guarantees. First, Bi-SG enjoys sub-linear rates in terms of both the inner and outer objective functions. Moreover, if the outer objective function is additionally strongly convex (though possibly still non-smooth), the outer rate can be improved to a linear rate. Lastly, we prove that the distance of the generated sequence to the set of optimal solutions of the bi-level problem converges to zero.
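To illustrate the structure of such an iteration, suppose the inner problem has the composite form $\min_{x}\, f(x) + g(x)$ with $f$ smooth and $g$ proximable, and let $\varphi$ denote the outer objective. The following two-step update is only a schematic sketch under these assumptions; the step sizes $t_k, \alpha_k$ and the exact coupling are placeholders rather than the precise choices analyzed in this thesis:
\[
y^{k} = \operatorname{prox}_{t_k g}\!\left(x^{k} - t_k \nabla f(x^{k})\right), \qquad
x^{k+1} = \operatorname{prox}_{\alpha_k \varphi}\!\left(y^{k}\right) \;\text{ or }\; x^{k+1} = y^{k} - \alpha_k v^{k}, \quad v^{k} \in \partial\varphi(y^{k}).
\]
The first step is the proximal gradient step on the inner problem, while the second step handles the outer objective through either its proximal mapping or one of its sub-gradients.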
It should be noted that our mild assumptions include neither differentiability of the outer objective function nor its strong convexity; for instance, any smooth or Lipschitz continuous outer objective function satisfies the required assumptions. Finally, we demonstrate Bi-SG's performance in an extensive numerical comparison.