Tag: softmax
-
The Machine Learning “Advent Calendar” Day 14: Softmax Regression in Excel
Softmax Regression is simply Logistic Regression extended to multiple classes. By computing one linear score per class and normalizing the scores with Softmax, we obtain multiclass probabilities without changing the core logic. The loss, the gradients, and the optimization remain the same. Only the number…
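As a rough illustration of the idea in this excerpt, the sketch below computes one linear score per class and normalizes the scores with softmax. It is a minimal plain-Python sketch, not the article's Excel workbook; the names `softmax` and `predict_proba`, and the weights and inputs, are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(x, weights, biases):
    """One linear score per class (w . x + b), then softmax over the scores."""
    scores = [
        sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)

# Hypothetical 3-class model on 2 features (illustrative values only).
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
probs = predict_proba([2.0, 1.0], weights, biases)
```

With two classes and the second score fixed at 0, `softmax` reduces to the logistic sigmoid, which is the sense in which this is Logistic Regression extended to multiple classes.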
-
Learning Triton One Kernel at a Time: Softmax
All you need to know about a fast, readable and PyTorch-ready softmax kernel. By Ryan Pégoud. First published on Towards Data Science.
-
Box-Constrained Softmax Function and Its Application for Post-Hoc Calibration
arXiv:2506.10572v1. Abstract: Controlling the output probabilities of softmax-based models is a common problem in modern machine learning. Although the $\mathrm{Softmax}$ function provides soft control via its temperature parameter, it lacks the ability to enforce hard constraints, such as box constraints, on output probabilities,…
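To make the "soft control via its temperature parameter" concrete, here is a minimal plain-Python sketch of softmax with a temperature. It illustrates only the standard temperature scaling mentioned in the abstract, not the paper's box-constrained function; the name `softmax_with_temperature` and the example logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; the temperature must be positive."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # illustrative values only
sharp = softmax_with_temperature(logits, temperature=0.5)
base = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=5.0)
```

Lower temperatures concentrate mass on the largest logit and higher ones flatten the distribution, but a single temperature shifts all probabilities together and cannot pin an individual probability inside a prescribed interval.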
-
Convergence Rates for Softmax Gating Mixture of Experts
arXiv:2503.03213v1. Abstract: Mixture of experts (MoE) has recently emerged as an effective framework to advance the efficiency and scalability of machine learning models by softly dividing complex tasks among multiple specialized sub-models termed experts. Central to the success of MoE is an adaptive softmax…
-
Linearizing Attention
Breaking the quadratic barrier: modern alternatives to softmax attention. Large Language Models are great, but they have a drawback: they use softmax attention, which can be computationally intensive. In this article we will explore whether the softmax can be replaced to achieve linear time complexity…