How different is data in Kaggle competitions from real data?

In the real world, you don’t download datasets; you are the one creating them.

Most of the data used to build models currently comes from relational databases.

So, when you are given a problem, the data is often in a database. You author the SQL necessary to extract that data and cleanse it for modeling.
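
As a rough illustration of that extract-and-cleanse step, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for whatever database you actually work with, and the `customers` table, its columns, and the cleaning rules (trim whitespace, drop rows with missing values) are all made up for the example.

```python
import sqlite3

# Hypothetical stand-in for a production database: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, name TEXT, age INTEGER, churned INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "  Ann ", 34, 0),   # name has stray whitespace
        (2, "Bob", None, 1),    # missing age
        (3, "Cy", 51, None),    # missing label
        (4, "Dee", 29, 1),
    ],
)

# Extract and cleanse in one query: trim whitespace from names and
# drop rows where the feature or the label is missing.
EXTRACT_SQL = """
    SELECT id, TRIM(name) AS name, age, churned
    FROM customers
    WHERE age IS NOT NULL AND churned IS NOT NULL
    ORDER BY id
"""
rows = conn.execute(EXTRACT_SQL).fetchall()
conn.close()

print(rows)
```

Against a real server you would swap the connection for the appropriate driver, and what counts as "clean" depends entirely on the problem you were given.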

Additionally, the data may be in several locations. For example, your data could be in a SQL Server DB, an Oracle DB and in some text files. You’ll be the one creating the solution to amalgamate that data.
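
To make that concrete, the sketch below combines three sources on a shared customer id. Everything in it is an assumption for illustration: two in-memory SQLite databases play the role of the separate database servers, an in-memory string plays the role of a text file, and the table names, columns, and the helper `make_db` are hypothetical.

```python
import csv
import io
import sqlite3


def make_db(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for a real server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical source 1 (think SQL Server): customer attributes.
crm = make_db(
    "CREATE TABLE customers (id INTEGER, age INTEGER)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, 34), (2, 45), (3, 29)],
)
# Hypothetical source 2 (think Oracle): individual orders.
erp = make_db(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (3, 12.5)],
)
# Hypothetical source 3: a text file, simulated with an in-memory string.
labels_file = io.StringIO("id,churned\n1,0\n2,1\n3,0\n")

# Pull each source into a dictionary keyed by customer id.
ages = dict(crm.execute("SELECT id, age FROM customers"))
spend = dict(
    erp.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    )
)
labels = {int(r["id"]): int(r["churned"]) for r in csv.DictReader(labels_file)}
crm.close()
erp.close()

# Combine on the shared id; customers with no orders get a spend of 0.0.
dataset = [
    {
        "id": cid,
        "age": ages[cid],
        "total_spend": spend.get(cid, 0.0),
        "churned": labels[cid],
    }
    for cid in sorted(ages)
    if cid in labels
]

for row in dataset:
    print(row)
```

In practice the hard part is usually agreeing on the join key and deciding what to do with records that appear in one source but not the others; here that decision is simply hard-coded.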