Why do classification CNNs have max-pooling?

As one might imagine, this question is for a computer vision role. Max-pooling in a CNN reduces computation, because the feature maps are smaller after pooling and every later layer has fewer activations to process. Because you keep the maximal activation in each window, you don’t lose too much semantic information. There’s also a theory that max-pooling contributes a bit to giving CNNs more translation invariance: a small shift of a feature within a pooling window leaves the pooled output unchanged. Check out this great video from Andrew Ng on the benefits of max-pooling.
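To make both points concrete, here is a minimal NumPy sketch, not taken from any particular framework. The helper `max_pool_2x2` and the toy 4x4 feature map are assumptions made up for illustration; real CNN layers such as a framework's max-pooling op also handle batches, channels, and padding.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 on a 2-D feature map (even height and width assumed)."""
    h, w = x.shape
    # Group the map into non-overlapping 2x2 blocks, then keep the max of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical 4x4 feature map with a single strong activation.
fmap = np.zeros((4, 4))
fmap[0, 0] = 9.0

# The same activation shifted one pixel to the right (still inside the same 2x2 window).
shifted = np.roll(fmap, shift=1, axis=1)

pooled = max_pool_2x2(fmap)
pooled_shifted = max_pool_2x2(shifted)

print(fmap.size, "->", pooled.size)            # 16 values shrink to 4
print(np.array_equal(pooled, pooled_shifted))  # the small shift does not change the output
```

Note that the invariance is only local: a shift that moves the activation across a window boundary does change the pooled map, which is why pooling contributes only "a bit" of translation invariance.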