Hadoop is an open-source processing framework that manages data processing and storage for big data applications running on pooled systems.
Apache Hadoop is a collection of open-source utility software that makes it easy to use a network of multiple computers to solve problems involving large amounts of data and computation. It provides a software framework for distributed storage and big data processing using the MapReduce programming model.
Hadoop splits files into large blocks and distributes them across nodes in a cluster. It then transfers packets of code to nodes to process the data in parallel. This allows the data set to be processed faster and more efficiently than if conventional supercomputing architecture were used.