A deep network means you add more layers: your data passes through more successive levels of features, each one computed from the level before it.
A convolutional network means you build your layers differently: the same weights are shared across all positions of your data, which is organized in a 1D (e.g. a time series), 2D (e.g. a picture), or 3D (e.g. a video) structure. For example, instead of computing something on a full 400x400 pixel image at once, you would run the same computation over every 3x3 patch of the image, transforming its 3 features (colors, encoded as e.g. YUV) into e.g. 12 features. The result is another “image” that may be your output but, more likely, is the input for the next layer in your deep convolutional network, because, more often than not, CNNs are also deep networks.
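To make the 3x3 example concrete, here is a minimal sketch in plain Python. The names conv2d and random_weights are made up for illustration, the weights are random rather than trained, the ReLU after each layer is my own assumption (the text above does not mention an activation), and the image is only 10x10 instead of 400x400 so that the pure-Python loops finish quickly.

```python
import random

def conv2d(image, weights, bias):
    """Run the same small computation over every k x k patch of a 2D image.

    image:   H x W x C_in nested lists
    weights: k x k x C_in x C_out nested lists, shared by all positions
    bias:    list of C_out numbers
    Returns an (H-k+1) x (W-k+1) x C_out "image" (no padding, stride 1).
    """
    k = len(weights)
    c_in = len(weights[0][0])
    c_out = len(bias)
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - k + 1):
        row = []
        for x in range(w - k + 1):
            pixel = list(bias)  # start each output pixel from the bias
            for dy in range(k):
                for dx in range(k):
                    for ci in range(c_in):
                        v = image[y + dy][x + dx][ci]
                        for co in range(c_out):
                            pixel[co] += v * weights[dy][dx][ci][co]
            # ReLU nonlinearity (an assumption, not part of the text above)
            row.append([max(0.0, p) for p in pixel])
        out.append(row)
    return out

def random_weights(k, c_in, c_out, rng):
    """Hypothetical helper: untrained k x k x C_in x C_out weights."""
    return [[[[rng.uniform(-1.0, 1.0) for _ in range(c_out)]
              for _ in range(c_in)]
             for _ in range(k)]
            for _ in range(k)]

rng = random.Random(0)

# A small 10x10 image with 3 features (color channels) per pixel.
image = [[[rng.random() for _ in range(3)] for _ in range(10)]
         for _ in range(10)]

# Layer 1: 3x3 patches, 3 features in, 12 features out.
w1, b1 = random_weights(3, 3, 12, rng), [0.0] * 12
# Layer 2: takes the 12-feature "image" as input, which makes the net deeper.
w2, b2 = random_weights(3, 12, 4, rng), [0.0] * 4

features = conv2d(image, w1, b1)   # 8 x 8 x 12
output = conv2d(features, w2, b2)  # 6 x 6 x 4

# Weights in layer 1 do not depend on the image size at all.
layer1_params = 3 * 3 * 3 * 12 + 12  # 336

print(len(features), len(features[0]), len(features[0][0]))
print(len(output), len(output[0]), len(output[0][0]))
```

The first layer has 3*3*3*12 weights plus 12 biases, 336 numbers in total, and that count would be the same for a 400x400 image, since the same weights are reused at every position. On a 400x400 input with no padding, the same layer would produce a 398x398 "image" with 12 features per pixel.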