If you have 4 GB of RAM on your machine and you want to train a model on a 10 GB dataset, how would you go about this problem?

First of all, ask which ML model you want to train, because the answer differs by model type.

For neural networks: a memory-mapped NumPy array with a small batch size will work.

Steps:

  1. Open the whole dataset as a memory-mapped NumPy array. A memory map exposes the complete dataset without loading it all into RAM; data is read from disk only when you index it.
  2. Index into the array to pull just the slice you need into memory.
  3. Pass that slice to the neural network as a training batch.
  4. Keep the batch size small so each batch fits comfortably in RAM (see the sketch after this list).
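
Here is a minimal sketch of that loop using `np.memmap`, assuming the features and labels are already stored on disk as raw binary arrays (the file names, shapes, and dtypes below are placeholders for illustration):

```python
import numpy as np

# Placeholder file names and shapes; a real 10 GB dataset would already
# be serialized to disk as raw binary arrays.
X = np.memmap("features.dat", dtype=np.float32, mode="r", shape=(1_000_000, 100))
y = np.memmap("labels.dat", dtype=np.int64, mode="r", shape=(1_000_000,))

batch_size = 256  # small enough that one batch easily fits in 4 GB of RAM

for start in range(0, X.shape[0], batch_size):
    end = start + batch_size
    # Slicing a memmap reads only this window from disk into memory.
    X_batch = np.asarray(X[start:end])
    y_batch = np.asarray(y[start:end])
    # model.train_on_batch(X_batch, y_batch)  # e.g. with a Keras model
```

Because only one batch is materialized in RAM at a time, peak memory usage is governed by `batch_size`, not by the size of the dataset on disk.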

For SVM: incremental (partial) fitting will work.

Steps:

  1. Divide the one big dataset into smaller chunks, each small enough to fit in memory.
  2. Call the `partial_fit` method on one chunk; it only needs that subset of the complete dataset in memory.
  3. Repeat step 2 for the remaining chunks (see the sketch after this list).
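
A minimal sketch of this loop with scikit-learn: note that `SVC` itself has no `partial_fit`, so the usual out-of-core choice is `SGDClassifier` with hinge loss, which trains a linear SVM incrementally (the CSV file name and `label` column below are placeholders):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier

# SGDClassifier with hinge loss trains a linear SVM and supports partial_fit.
clf = SGDClassifier(loss="hinge")
classes = np.array([0, 1])  # all class labels must be declared on the first call

# "train.csv" and the "label" column are placeholder names.
# chunksize keeps only 10,000 rows in memory at a time.
for chunk in pd.read_csv("train.csv", chunksize=10_000):
    X_chunk = chunk.drop(columns=["label"]).to_numpy()
    y_chunk = chunk["label"].to_numpy()
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
```
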
For more interview questions on ML, read the article below:
https://towardsdatascience.com/top-30-data-science-interview-questions-7dd9a96d3f5c