**Activation functions determine;**

the output of a deep learning model,

its accuracy

the computational efficiency of training a model—which can make or break a large scale neural network.

**Activation functions can affect:**

- the network's ability to converge,
- its convergence speed, and
- in some cases, whether the network converges at all: saturating activations such as the sigmoid can shrink gradients until learning stalls.

There has already been a good deal of experimentation on which activation functions work best for CNNs and LSTMs.

Virtually all modern CNNs use ReLU (the same holds for feed-forward neural networks).

LSTMs use the sigmoid and the hyperbolic tangent (tanh).
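To see where those two functions sit inside an LSTM cell, here is a minimal NumPy sketch of a single time step. The variable names and stacked-parameter layout are illustrative, not taken from any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked parameters for the
    input (i), forget (f), output (o) gates and the candidate (g)."""
    z = x @ W + h_prev @ U + b                    # shape: (4 * hidden,)
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates squash to (0, 1)
    g = np.tanh(g)                                # candidate in (-1, 1)
    c = f * c_prev + i * g                        # new cell state
    h = o * np.tanh(c)                            # new hidden state
    return h, c
```

The sigmoid's (0, 1) range makes the gates act as soft switches, while tanh keeps the cell-state updates and the hidden output centered around zero.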

A good experiment to run on your own is to compare ReLU and sigmoid on a simple neural network trained on synthetic (simulated) data. You will see that with ReLU, the backpropagation algorithm converges faster.
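One possible version of that experiment, sketched in plain NumPy: a one-hidden-layer regression network trained by full-batch gradient descent on a synthetic nonlinear target, once with ReLU and once with sigmoid. The architecture, data, and hyperparameters here are arbitrary choices for illustration:

```python
import numpy as np

def train(activation, activation_grad, epochs=300, lr=0.01, hidden=32):
    """Train a one-hidden-layer regression network with full-batch
    gradient descent and return the final mean-squared-error loss."""
    rng = np.random.default_rng(0)           # identical data/init for both runs
    X = rng.normal(size=(200, 5))
    y = (X ** 2).sum(axis=1, keepdims=True)  # nonlinear synthetic target
    y = (y - y.mean()) / y.std()             # standardize the target
    W1 = rng.normal(scale=0.5, size=(5, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        z = X @ W1 + b1                      # forward pass
        h = activation(z)
        pred = h @ W2 + b2
        err = pred - y
        loss = (err ** 2).mean()
        d_pred = 2 * err / len(X)            # backward pass (MSE)
        dW2, db2 = h.T @ d_pred, d_pred.sum(axis=0)
        dz = (d_pred @ W2.T) * activation_grad(z)
        dW1, db1 = X.T @ dz, dz.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return loss

relu = lambda z: np.maximum(z, 0.0)
relu_grad = lambda z: (z > 0).astype(float)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sigmoid_grad = lambda z: sigmoid(z) * (1.0 - sigmoid(z))

print("ReLU final loss:   ", round(train(relu, relu_grad), 4))
print("sigmoid final loss:", round(train(sigmoid, sigmoid_grad), 4))
```

With identical data and initialization, the ReLU run should end at a lower loss for the same number of steps: the sigmoid's derivative never exceeds 0.25, so its gradients through the hidden layer are scaled down on every step, while ReLU passes gradients through unchanged wherever the unit is active.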