What if a column has really a lot of categorical values. Say way above 100. Will you convert this column into features using one hot encoding? The model performance can drastically drop if there are too many inputs to it. [more on this in a different thread]
For this, try using tensorflow’s embedding layer.
Embedding layer converts sparse indexes into a vector.
The pre-built embedding_layer
instance can then be added to a Sequential
model (e.g. model.add(embedding_layer)
), called in a Functional model (e.g. x = embedding_layer(x)
), or used in a subclassed model.