Why is it necessary to avoid dummy variable trap

One hot encoding will create exact number of features as the number of categories, as yes or no responses to those categories. For e.g. if there are 4 categories, it’s clear that if some record doesn’t correspond to any of the 3 categories, then the 4th category is applicable to that record.

Hence for n categories, the last one feature will be 100, linearly dependent on the rest n-1 features.

Now why do we remove highly correlated features or redundant features, is not to improve the results, but to gain better model interpretability.