- Write, test, and run code and algorithms on the data to ensure they behave as planned.
- Create a data pipeline, API, or microservice.
- Check in with the team for 15-30 minutes to go over progress and problems.
- Infrastructure installation and maintenance.
- Examine work performed by others on the team to ensure it adheres to best practices and functions as planned.
- Bug fixes, code enhancements, and documentation
- Ad hoc meetings with individuals to work out any issues or roadblocks.
- Organize the team's work for the coming days and weeks, and consult with management on decisions.
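To make the "data pipeline" item above concrete, here is a minimal sketch of the extract-transform-load shape most pipelines follow. The function names and in-memory sample data are illustrative assumptions, not a specific project's code:

```python
# Minimal ETL sketch: extract records, transform them, load the result.
# The data and function names are hypothetical, for illustration only.

def extract():
    # In a real pipeline this would read from a source system (DB, API, files).
    return [
        {"user": "alice", "amount": "10.5"},
        {"user": "bob", "amount": "3.0"},
        {"user": "alice", "amount": "7.5"},
    ]

def transform(rows):
    # Cast types and aggregate amounts per user.
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0.0) + float(row["amount"])
    return totals

def load(totals):
    # In a real pipeline this would write to a warehouse or serve an API.
    return dict(sorted(totals.items()))

result = load(transform(extract()))
print(result)  # {'alice': 18.0, 'bob': 3.0}
```

Real pipelines swap the in-memory lists for connectors and an orchestrator, but the three-stage structure stays the same, which is why testing each stage in isolation (as in the first bullet) is straightforward.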
You will work on various code components depending on the project phase: new features, debugging, maintenance, and stability.
It’s also important to remember that coding is as much about “less” (eliminating code) as about “more” (adding lines of code). A good example is the list of top Apache Spark committers: the majority of them have a negative ratio, meaning they removed more lines than they added!
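If you want to check such a ratio yourself, you can tally net lines per author from `git log --numstat` output (each numstat line is `added<TAB>removed<TAB>path`). The sample lines below are made up for illustration, not real Spark history:

```python
# Hypothetical sketch: compute net lines (added - removed) per author
# from git numstat lines. In practice you would parse the output of
# `git log --numstat` and tag each line with its commit author first.
from collections import defaultdict

numstat = [
    ("alice", "12\t40\tcore/src/main/Foo.scala"),
    ("alice", "3\t25\tcore/src/main/Bar.scala"),
    ("bob",   "80\t10\tdocs/README.md"),
]

net = defaultdict(int)
for author, line in numstat:
    added, removed, _path = line.split("\t")
    net[author] += int(added) - int(removed)

print(dict(net))  # {'alice': -50, 'bob': 70}
```

A negative total, like alice's here, is the "removed more than they added" pattern the Spark committer stats show.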
Most of the time, data engineers sit between the hammer (data consumers, i.e., data analysts, data scientists, business users, and microservices) and the anvil (data producers). If something goes wrong for a data consumer, the data engineers are the first to be blamed.