Dask vs. Ray

chayan-kathuria · 11 October 2021 10:59

Dask (as a lower-level scheduler) and Ray overlap quite a bit in their goal of making it easier to execute Python code in parallel across clusters of machines. Dask focuses more on the data science world, providing higher-level APIs that in turn provide partial replacements for Pandas, NumPy, and scikit-learn, in addition to a low-level scheduling and cluster management framework.

The creators of Dask and Ray discuss how the libraries compare in this GitHub thread, and they conclude that the scheduling strategy is one of the key differentiators. Dask uses a centralized scheduler to share work across multiple cores, while Ray uses distributed bottom-up scheduling.