Multinode Training with PyCaffe

Intel Caffe (release 1.1.1) now supports multinode training through the PyCaffe interface. To speed up training across multiple Intel CPU nodes, you only need to add two lines to your Python code. Taking LeNet as an example, the single-node Python code is:

    ...
    solver = caffe.SGDSolver('examples/mnist/lenet_auto_solver.prototxt')
    ...
    for it in range(niter):
        solver.step(1)  # SGD by Caffe
        ...

To enable multinode training, add two lines that create and initialize a MultiSync object:

    sync = caffe.MultiSync(solver)
    sync.init()

The complete multinode sample, with the two lines added:

    ...
    solver = caffe.SGDSolver('examples/mnist/lenet_auto_solver.prototxt')
    sync = caffe.MultiSync(solver)
    sync.init()
    ...
    for it in range(niter):
        solver.step(1)  # SGD by Caffe
        ...

For better performance, we recommend calling the update_and_forward, clear_param_diffs and backward functions instead of the step function alone, so that gradient synchronization and the weight update overlap with the forward pass:

    ...
    # call step once first, because the test net shares weights with the train net
    solver.step(1)
    for it in range(niter):
        sync.solver.update_and_forward()
        sync.solver.clear_param_diffs()
        sync.solver.backward()
        solver.increment_iter()
        ...
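To make "gradient synchronization" concrete, here is a small conceptual sketch in plain NumPy. It is not Intel Caffe code and does not reflect how MultiSync is implemented. It assumes the common synchronous data-parallel scheme in which each node computes a gradient on its own shard of the batch and the gradients are averaged before the update. The functions `local_gradient` and `synchronized_step` and the toy linear model are hypothetical, introduced only for illustration.

```python
import numpy as np


def local_gradient(w, x, y):
    """Gradient of 0.5 * mean((x @ w - y) ** 2) with respect to w."""
    residual = x @ w - y
    return x.T @ residual / len(y)


def synchronized_step(w, shards, lr):
    """One synchronous data-parallel SGD step over equally sized shards."""
    # Each simulated node computes a gradient on its own shard.
    grads = [local_gradient(w, x, y) for x, y in shards]
    # Averaging stands in for the cross-node synchronization (an assumption).
    avg_grad = np.mean(grads, axis=0)
    # Every node applies the same update, so the weights stay identical.
    return w - lr * avg_grad


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = np.zeros(3)

# Split one batch of 8 samples across two simulated nodes.
shards = [(x[:4], y[:4]), (x[4:], y[4:])]
w_multi = synchronized_step(w, shards, lr=0.1)

# Reference: a single node processing the whole batch.
w_single = w - 0.1 * local_gradient(w, x, y)
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so `w_multi` and `w_single` agree up to floating-point error. In a real run the averaging involves network communication, which is why overlapping it with the forward pass can save time.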

The corresponding sample code for a single node:

    ...
    solver = caffe.SGDSolver('examples/mnist/lenet_auto_solver.prototxt')
    ...
    # call step once first, because the test net shares weights with the train net
    solver.step(1)
    for it in range(niter):
        solver.net.clear_param_diffs()
        solver.net.forward()
        solver.net.backward()
        solver.apply_update()
        solver.increment_iter()
        ...
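The two manual loops above can be folded into one helper so that the same script runs in either mode. The following is a minimal sketch rather than part of Intel Caffe: `run_training` and the `Recorder` stand-in are hypothetical names, and the helper only calls methods that already appear in the samples above. The `Recorder` class exists solely so the call order can be checked without a Caffe installation.

```python
def run_training(solver, niter, sync=None):
    """Run niter iterations; use the multinode loop when sync is given."""
    # Call step once first, because the test net shares weights with the train net.
    solver.step(1)
    for _ in range(niter):
        if sync is not None:
            # Multinode loop from the sample above.
            sync.solver.update_and_forward()
            sync.solver.clear_param_diffs()
            sync.solver.backward()
        else:
            # Single-node loop from the sample above.
            solver.net.clear_param_diffs()
            solver.net.forward()
            solver.net.backward()
            solver.apply_update()
        solver.increment_iter()


class Recorder:
    """Hypothetical stand-in that logs method names; not a Caffe class."""

    def __init__(self, log, prefix=""):
        self._log = log
        self._prefix = prefix

    def __getattr__(self, name):
        # Attribute chains such as solver.net or sync.solver return a nested recorder.
        if name in ("net", "solver"):
            return Recorder(self._log, self._prefix + name + ".")

        def call(*args):
            self._log.append(self._prefix + name)

        return call


# Dry run of both modes without Caffe, recording the order of calls.
single_log = []
run_training(Recorder(single_log), niter=1)

multi_log = []
run_training(Recorder(multi_log), niter=1, sync=Recorder(multi_log, "sync."))
```

In a real script you would pass the `caffe.SGDSolver` object as `solver` and, for multinode runs, the initialized `caffe.MultiSync` object as `sync`.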
