www-ai.cs.tu-dortmund.de/LEHRE/FACHPROJEKT/SS12/paper/classification/bottou2010.pdf
time in practice. A smart selection of the gains $\gamma_t$ helps achieve the promised performance (Xu, 2010).
8 Léon Bottou
Algorithm    Time        Test Error
Hinge loss SVM, $\lambda = 10^{-4}$.
SVMLight     23,642 s.   6.02 %
SVMPerf      [...]

[...] algorithms on a variety of linear systems. We use gains $\gamma_t = \gamma_0 (1 + \lambda \gamma_0 t)^{-1}$ for SGD and, following (Xu, 2010), $\gamma_t = \gamma_0 (1 + \lambda \gamma_0 t)^{-0.75}$ for ASGD. The initial gains $\gamma_0$ were set manually by observing the performance [...]

HOFF, M. E. (1960): Adaptive switching circuits. IRE WESCON Conv. Record, Part 4., 96-104.
XU, W. (2010): Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent. Journal …
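
As a rough illustration of the two gain schedules quoted in the text, the following Python sketch implements the gains with exponent 1 for SGD and exponent 0.75 for ASGD, and plugs them into a minimal stochastic gradient loop for an L2-regularized hinge loss. The function names (`gain`, `sgd_gain`, `asgd_gain`, `train_linear_svm`) are hypothetical, and the loop makes several simplifying assumptions that the text does not confirm: no bias term, dense feature vectors, and averaging that starts at the first iterate. It is not the implementation behind the reported timings.

```python
import random


def gain(t, gamma0, lam, exponent):
    """Gain gamma_t = gamma0 * (1 + lam * gamma0 * t) ** (-exponent)."""
    return gamma0 * (1.0 + lam * gamma0 * t) ** (-exponent)


def sgd_gain(t, gamma0, lam):
    # Schedule quoted for SGD: exponent 1.
    return gain(t, gamma0, lam, 1.0)


def asgd_gain(t, gamma0, lam):
    # Schedule quoted for ASGD, following (Xu, 2010): exponent 0.75.
    return gain(t, gamma0, lam, 0.75)


def train_linear_svm(data, lam, gamma0, epochs=5, averaged=False, seed=0):
    """Minimize lam/2 * ||w||^2 + max(0, 1 - y * <w, x>) one example at a time.

    Assumptions of this sketch: no bias term, and the averaged variant
    averages all iterates starting from the first update.
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    w_avg = [0.0] * dim
    gain_fn = asgd_gain if averaged else sgd_gain
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            g = gain_fn(t, gamma0, lam)
            # The margin is computed once, before any coordinate changes.
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            for j in range(dim):
                hinge_term = y * x[j] if margin < 1.0 else 0.0
                w[j] -= g * (lam * w[j] - hinge_term)
            t += 1
            # Running mean of the iterates w_1, ..., w_t.
            for j in range(dim):
                w_avg[j] += (w[j] - w_avg[j]) / t
    return w_avg if averaged else w
```

Calling `train_linear_svm(data, lam=1e-4, gamma0=0.1, averaged=True)` on a list of `(features, label)` pairs with labels in {+1, -1} returns the averaged weight vector; the value of `gamma0` here is a placeholder, since the text states only that the initial gains were set manually.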