Abstract
This paper considers a class of constrained stochastic composite optimization problems whose objective function is the sum of a differentiable (possibly nonconvex) component and a non-differentiable (but convex) component. To solve these problems, we propose a randomized stochastic projected gradient (RSPG) algorithm, in which a proper mini-batch of samples is taken at each iteration, with the batch size depending on the total budget of stochastic samples allowed. The RSPG algorithm also employs a general distance function, which allows it to exploit the geometry of the feasible region. The complexity of this algorithm is established in a unified setting, showing that it is nearly optimal for convex stochastic programming. A post-optimization phase is also proposed to significantly reduce the variance of the solutions returned by the algorithm. In addition, building on the RSPG algorithm, we discuss a stochastic gradient-free algorithm that uses only stochastic zeroth-order information. Some preliminary numerical results are provided as well.
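To make the method concrete, the following is a minimal sketch of one plausible instantiation in Python, assuming the Euclidean distance function, a zero non-differentiable component, and a constant step size; the oracle names (`stoch_grad`, `f_noisy`) and the data in the usage example are illustrative assumptions, not the paper's notation. The helper `zo_grad` illustrates the kind of Gaussian-smoothing finite-difference estimator on which a gradient-free variant can be built.

```python
import numpy as np

def rspg_euclidean(stoch_grad, project, x0, num_iters, batch_size,
                   step_size, rng=None):
    """Sketch of a randomized stochastic projected gradient method
    (Euclidean special case, constant step size).

    stoch_grad(x, rng) -> one noisy gradient sample at x (assumed oracle)
    project(x)         -> Euclidean projection onto the feasible region
    """
    rng = rng if rng is not None else np.random.default_rng()
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(num_iters):
        x = xs[-1]
        # Mini-batch averaging reduces the variance of the gradient
        # estimate; the batch size would be tuned to the sample budget.
        g = np.mean([stoch_grad(x, rng) for _ in range(batch_size)], axis=0)
        xs.append(project(x - step_size * g))
    # Randomized termination: return one iterate chosen at random
    # (uniform here, since the step size is constant).
    return xs[int(rng.integers(1, num_iters + 1))]

def zo_grad(f_noisy, x, mu, rng):
    """One-sample Gaussian-smoothing gradient estimate built from noisy
    function values only (stochastic zeroth-order information);
    mu is the smoothing parameter."""
    u = rng.standard_normal(x.shape)
    return ((f_noisy(x + mu * u) - f_noisy(x)) / mu) * u

# Usage on a toy problem: minimize E[(a_i^T x - b_i)^2] over the unit
# ball, with hypothetical random data.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 5)), rng.standard_normal(100)

def stoch_grad(x, rng):
    i = rng.integers(len(b))            # sample one data point
    return 2.0 * (A[i] @ x - b[i]) * A[i]

def project(x):                         # projection onto the unit ball
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

x_out = rspg_euclidean(stoch_grad, project, np.zeros(5), num_iters=200,
                       batch_size=10, step_size=0.05)
```

The post-optimization phase mentioned above could be sketched in the same spirit: run `rspg_euclidean` several times independently and return the candidate whose projected-gradient norm, estimated on a fresh sample, is smallest.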

Additional information
This research was partially supported by NSF grants CMMI-1000347, CMMI-1254446, DMS-1319050, DMS-1016204 and ONR grant N00014-13-1-0036.
Rights and permissions
Cite this article
Ghadimi, S., Lan, G. & Zhang, H. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Program. 155, 267–305 (2016). https://doi.org/10.1007/s10107-014-0846-1
Keywords
- Constrained stochastic programming
- Mini-batch of samples
- Stochastic approximation
- Nonconvex optimization
- Stochastic programming
- First-order method
- Zeroth-order method