Activation functions determine the output of a deep learning model, its accuracy, and the computational efficiency of training, which can make or break a large-scale neural network. Activation functions also affect a network's ability to converge and its convergence speed; in some cases, a poor choice can prevent the network from converging at all.
Many experiments have already been run to determine which activation functions work best for CNNs and LSTMs. Most CNNs, like feed-forward neural networks, now use ReLU, while LSTMs use sigmoid and hyperbolic tangent.
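For reference, all three of these functions are simple element-wise operations. Here is a minimal NumPy sketch (the function names are just for illustration):

```python
import numpy as np

def relu(x):
    # ReLU: passes positive values through, zeroes out negatives
    return np.maximum(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes inputs into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Hyperbolic tangent: squashes inputs into the range (-1, 1)
    return np.tanh(x)
```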
A good experiment to run on your own is to compare ReLU and sigmoid on a simple neural network trained on synthetic (simulated) data, as sketched below. You will see that with ReLU, backpropagation converges faster.
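Below is a minimal PyTorch sketch of such a comparison; the synthetic target, network sizes, and optimizer settings are arbitrary choices for illustration, not a prescribed setup. In most runs, the ReLU network reaches a low loss in noticeably fewer steps than the sigmoid one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data: target is the mean of the squared inputs plus noise
X = torch.randn(1024, 10)
y = (X ** 2).mean(dim=1, keepdim=True) + 0.05 * torch.randn(1024, 1)

def make_mlp(activation):
    # Two-hidden-layer MLP; only the activation function differs between runs
    return nn.Sequential(
        nn.Linear(10, 64), activation,
        nn.Linear(64, 64), activation,
        nn.Linear(64, 1),
    )

loss_fn = nn.MSELoss()
for name, act in [("relu", nn.ReLU()), ("sigmoid", nn.Sigmoid())]:
    model = make_mlp(act)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(2001):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
        if step % 500 == 0:
            # Print the training loss periodically so the two runs can be compared
            print(f"{name:8s} step {step:4d} loss {loss.item():.4f}")
```

The usual explanation is that ReLU's gradient is 1 for all positive inputs, while the sigmoid's derivative is at most 0.25, so gradients shrink as they flow back through stacked sigmoid layers and learning slows down.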