Xavier Weight Initialization

The Xavier initialization method draws each weight as a random number from a uniform probability distribution (U) over the range -(1/sqrt(n)) to 1/sqrt(n), where n is the number of inputs to the node.

weight = U [-(1/sqrt(n)), 1/sqrt(n)]

We can implement this directly in Python.

The example below assumes 10 inputs to a node, calculates the lower and upper bounds of the range, and then generates 1,000 initial weight values that could be used for the nodes in a layer or a network that uses the sigmoid or tanh activation function.

After calculating the weights, the lower and upper bounds are printed as are the min, max, mean, and standard deviation of the generated weights.

The complete example is listed below.

# example of the xavier weight initialization
from math import sqrt
from numpy.random import rand
# number of nodes in the previous layer
n = 10
# calculate the range for the weights
lower, upper = -(1.0 / sqrt(n)), (1.0 / sqrt(n))
# generate random numbers
numbers = rand(1000)
# scale to the desired range
scaled = lower + numbers * (upper - lower)
# summarize
print(lower, upper)
print(scaled.min(), scaled.max())
print(scaled.mean(), scaled.std())

Running the example generates the weights and prints the summary statistics.

We can see that the bounds of the weight values are about -0.316 and 0.316. These bounds would become wider with fewer inputs and narrower with more inputs.

We can see that the generated weights respect these bounds and that the mean weight value is close to zero with the standard deviation close to 0.18.

-0.31622776601683794 0.31622776601683794
-0.3157663248679193 0.3160839282916222
0.006806069733149146 0.17777128902976705
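
The reported standard deviation can be checked against theory. A uniform distribution over the range -a to a has a standard deviation of a/sqrt(3), so with a = 1/sqrt(10) we would expect a value of about 0.183. The sketch below is one way to confirm this; the seeded generator, the seed value, and the larger sample size are assumptions made here so that the estimate is stable and repeatable, and they are not part of the example above.

```python
# compare the sample standard deviation with the theoretical value
from math import sqrt
from numpy.random import default_rng
# number of nodes in the previous layer
n = 10
# bound of the uniform range
bound = 1.0 / sqrt(n)
# theoretical standard deviation of U(-bound, bound)
expected_std = bound / sqrt(3.0)
# seeded generator so the run is repeatable (seed chosen arbitrarily)
rng = default_rng(1)
# draw a large sample so the estimate is stable
weights = rng.uniform(-bound, bound, size=100000)
# summarize
print('expected std: %.4f' % expected_std)
print('sample std: %.4f' % weights.std())
```

With only 1,000 samples, as in the example above, the sample value will wander a little further from the theoretical one from run to run.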

It can also help to see how the spread of the weights changes with the number of inputs.

For this, we can calculate the bounds on the weight initialization with different numbers of inputs from 1 to 100 and plot the result.

The complete example is listed below.

# plot of the bounds on xavier weight initialization for different numbers of inputs
from math import sqrt
from matplotlib import pyplot
# define the number of inputs from 1 to 100
values = [i for i in range(1, 101)]
# calculate the range for each number of inputs
results = [1.0 / sqrt(n) for n in values]
# create an error bar plot centered on 0 for each number of inputs
pyplot.errorbar(values, [0.0 for _ in values], yerr=results)
pyplot.show()

Running the example creates a plot that allows us to compare the range of weights with different numbers of input values.
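
It can also be useful to read off a few of these bounds as numbers rather than from the plot. The short sketch below prints the range for a handful of input counts; the xavier_bound() helper and the specific counts are hypothetical choices made here for illustration.

```python
# print the xavier bounds for a few numbers of inputs
from math import sqrt

# hypothetical helper, named here for illustration only
def xavier_bound(n):
	# upper bound of the range, the lower bound is its negative
	return 1.0 / sqrt(n)

# a few input counts chosen for illustration
for n in [1, 4, 10, 25, 100]:
	bound = xavier_bound(n)
	print('n=%d: [%.3f, %.3f]' % (n, -bound, bound))
```

The range shrinks from [-1.000, 1.000] with a single input to [-0.100, 0.100] with 100 inputs, which matches the narrowing seen in the plot.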