Initializers

※ The plot functions used below are defined in the Plot function section at the end of this page.

In [18]:
fig, axes = galleryplot(
    func=visualizeInitialKernel,
    argnames="kernel_name",
    iterator=list(KerasyInitializerFunctions.keys()),
    sharex="all",
)

Zeros

Generates values initialized to $0$. This is often used as the bias initializer so that the network can learn without being affected by the bias parameters at the start.

Warning

If you apply this initializer to the edge weights between layers, the loss gradient cannot propagate back through them, so the network will be unable to learn.

def kerasy.initializers.Zeros(shape, dtype=None)
In [29]:
plotMonoInitializer("zeros", xmin=-0.3, xmax=1.3)

Ones

Generates values initialized to $1$.

def kerasy.initializers.Ones(shape, dtype=None)
In [33]:
plotMonoInitializer("ones", xmin=-0.3, xmax=1.3)

Constant

Generates values initialized to a user-defined constant (value).

def kerasy.initializers.Constant(shape, value=0, dtype=None)
In [32]:
plotMonoInitializer("constant", xmin=-0.3, xmax=1.3)

Random Normal

Generates values from a Normal distribution.

def kerasy.initializers.RandomNormal(shape, mean=0, stddev=0.05, dtype=None, seed=None)
In [34]:
plotMonoInitializer("random_normal")

Random Uniform

Generates values from a Uniform distribution.

def kerasy.initializers.RandomUniform(shape, minval=-0.05, maxval=0.05, dtype=None, seed=None)
In [35]:
plotMonoInitializer("random_uniform")

Truncated Normal

Generates values from a Truncated Normal distribution, which is derived from a Normal distribution.

This distribution keeps only the values of the underlying Normal distribution that fall within the standard-deviation cutoff; anything outside is discarded.

Hint

It is recommended to use this distribution for initializing the edge weights between layers.

def kerasy.initializers.TruncatedNormal(shape, mean=0.0, stddev=0.05, dtype=None, seed=None)
In [36]:
plotMonoInitializer("truncated_normal")

Variance Scaling

Scales the distribution according to the shape of the weight tensor and generates values from either a Truncated Normal or a Uniform distribution.

def kerasy.initializers.VarianceScaling(shape, scale=1.0, mode='fan_in', distribution='normal', dtype=None, seed=None)
In [37]:
plotMonoInitializer("variance_scaling")

Orthogonal

Generates values that form a random orthogonal matrix.

def kerasy.initializers.Orthogonal(shape, gain=1.0, dtype=None, seed=None)
In [38]:
plotMonoInitializer("orthogonal")

Identity

Generates values as an Identity matrix.

def kerasy.initializers.Identity(shape, dtype=None, gain=1.0)
In [39]:
plotMonoInitializer("identity")

Warning

This initializer can only be used when the weight matrix is a 2-dimensional square matrix.
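
As a rough sketch (hypothetical helper, not kerasy's code), this amounts to a scaled np.eye:

import numpy as np

def identity(shape, gain=1.0):
    # Only meaningful for a 2-dimensional square weight matrix (see the warning above).
    rows, cols = shape
    assert rows == cols, "Identity initialization requires a square matrix."
    return gain * np.eye(rows)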

Glorot Normal

Generates values from the following distribution where

  • $n_{\text{in}}$: the number of input units
  • $n_{\text{out}}$: the number of output units
$$\text{Truncated Normal}\left(0, \frac{\sqrt{2}}{\sqrt{n_{\text{in}} + n_{\text{out}}}}\right)$$

Glorot's objective is to ensure that

  • The activation variances are the same across all layers. $$\forall(i,i^{\prime}), \mathrm{Var}\left[z^i\right] = \mathrm{Var}\left[z^{i^{\prime}}\right]\quad (8)$$
  • The back-propagated gradient variances are the same across all layers. $$\forall(i,i^{\prime}), \mathrm{Var}\left[\frac{\partial \text{Cost}}{\partial s^i}\right] = \mathrm{Var}\left[\frac{\partial \text{Cost}}{\partial s^{i^{\prime}}}\right]\quad (9)$$

These two conditions transform to:

$$ \begin{aligned} \forall i, n_i\mathrm{Var}\left[W^i\right] &= 1 & (10)\\ \forall i, n_{i+1}\mathrm{Var}\left[W^i\right] &= 1 & (11) \end{aligned} $$

As a compromise between these two constraints, Glorot and Bengio took

$$\forall i, \mathrm{Var}\left[W^i\right] = \frac{2}{n_i + n_{i+1}} \quad(12)$$
def kerasy.initializers.GlorotNormal(shape, dtype=None, seed=None)
In [40]:
plotMonoInitializer("glorot_normal")

Glorot Uniform

Generates values from the following distribution

$$\mathrm{Uni}\left[-\frac{\sqrt{6}}{\sqrt{n_{\text{in}} + n_{\text{out}}}}, \frac{\sqrt{6}}{\sqrt{n_{\text{in}} + n_{\text{out}}}}\right]$$

The idea is the same as for the Glorot Normal distribution.

def kerasy.initializers.GlorotUniform(shape, dtype=None, seed=None)
In [41]:
plotMonoInitializer("glorot_uniform")

He Normal

Generates values from the following distribution

$$\text{Truncated Normal}\left(0, \frac{\sqrt{2}}{\sqrt{n_{\text{in}}}}\right)$$

Since Glorot Normal is only valid when the activation function is "origin-symmetric" and "linear near the origin", this distribution was introduced to handle the exceptions:

  • ReLU Activation Function
  • Convolutional Layer
def kerasy.initializers.HeNormal(shape, dtype=None, seed=None)
In [42]:
plotMonoInitializer("he_normal")

LeCun Normal

Generates values from the He Normal distribution with the standard deviation scaled by $1/\sqrt{2}$.

def kerasy.initializers.LeCunNormal(shape, dtype=None, seed=None)
In [43]:
plotMonoInitializer("lecun_normal")

He Uniform

Generates values from the following distribution

$$\mathrm{Uni}\left[-\frac{\sqrt{6}}{\sqrt{n_{\text{in}}}}, \frac{\sqrt{6}}{\sqrt{n_{\text{in}}}}\right]$$

The idea is the same as for the He Normal distribution.

def kerasy.initializers.HeUniform(shape, dtype=None, seed=None)
In [44]:
plotMonoInitializer("he_uniform")

LeCun Uniform

Generates values from the He Uniform distribution with the limits scaled by $1/\sqrt{2}$.

def kerasy.initializers.LeCunUniform(shape, dtype=None, seed=None)
In [45]:
plotMonoInitializer("lecun_uniform")

Plot function

In [1]:
import numpy as np
import matplotlib.pyplot as plt

from kerasy.models import Sequential
from kerasy.layers import Input, Dense
from kerasy.utils import galleryplot
from kerasy.initializers import KerasyInitializerFunctions
In [2]:
def visualizeInitialKernel(kernel_name, input_units=1000, bins=30, ax=None, xmin=-0.3, xmax=0.3):
    if ax is None:
        fig, ax = plt.subplots()

    # Create the model.
    model = Sequential()
    model.add(Input(input_shape=(input_units,)))
    model.add(Dense(input_units, kernel_initializer=kernel_name))
    model.compile(optimizer='sgd', loss="mean_squared_error")

    # Get the layer's kernel(weights).
    freq = np.ravel(model.layers[-1].kernel)
    ax.hist(freq, bins=bins, density=True, color="#722f37")
    ax.set_xlim(xmin, xmax)
    ax.set_title(kernel_name, fontsize=16)

    return ax
In [3]:
def plotMonoInitializer(kernel_name, input_units=1000, bins=30, xmin=-0.3, xmax=0.3):
    ax = visualizeInitialKernel(kernel_name, input_units=input_units, bins=bins, xmin=xmin, xmax=xmax)
    ax.set_xlabel("Value", fontsize=14)
    ax.set_ylabel("Frequency", fontsize=14)
    ax.set_title(f"Initialize Method: {kernel_name}", fontsize=14)
    plt.show()