Initializers

※ The plot functions used below are defined in the Plot function section at the end of this page.

In [18]:
fig, axes = galleryplot(
    func=visualizeInitialKernel,
    argnames="kernel_name",
    iterator=list(KerasyInitializerFunctions.keys()),
    sharex="all",
)

Zeros

Generates values initialized to $0$. This is often used as the bias initializer so that the network can learn without being affected by the bias parameters at the start.

Warning

If you apply this initializer to the edge weights between layers, the loss gradient cannot propagate back through them, so the network will be unable to learn.

def kerasy.initializers.Zeros(shape, dtype=None)
In [29]:
plotMonoInitializer("zeros", xmin=-0.3, xmax=1.3)

Ones

Generates values initialized to $1$.

def kerasy.initializers.Ones(shape, dtype=None)
In [33]:
plotMonoInitializer("ones", xmin=-0.3, xmax=1.3)

Constant

Generates values initialized to a user-defined constant (value).

def kerasy.initializers.Constant(shape, value=0, dtype=None)
In [32]:
plotMonoInitializer("constant", xmin=-0.3, xmax=1.3)

Random Normal

Generates values from a Normal distribution.

def kerasy.initializers.RandomNormal(shape, mean=0, stddev=0.05, dtype=None, seed=None)
In [34]:
plotMonoInitializer("random_normal")

Random Uniform

Generates values from a Uniform distribution.

def kerasy.initializers.RandomUniform(shape, minval=-0.05, maxval=0.05, dtype=None, seed=None)
In [35]:
plotMonoInitializer("random_uniform")

Truncated Normal

Generates values from a Truncated Normal distribution, which is derived from a Normal distribution.

This distribution keeps only the values of the underlying Normal distribution that fall within the standard-deviation cutoff; anything outside is discarded.

Hint

It is recommended to use this distribution for initializing the edge weights between layers.

def kerasy.initializers.TruncatedNormal(shape, mean=0.0, stddev=0.05, dtype=None, seed=None)
In [36]:
plotMonoInitializer("truncated_normal")

Variance Scaling

Scales the distribution according to the shape of the weight tensor and generates values from either a Truncated Normal or a Uniform distribution.

def kerasy.initializers.VarianceScaling(shape, scale=1.0, mode='fan_in', distribution='normal', dtype=None, seed=None)
In [37]:
plotMonoInitializer("variance_scaling")

Orthogonal

Generates values that form a random orthogonal matrix.

def kerasy.initializers.Orthogonal(shape, gain=1.0, dtype=None, seed=None)
In [38]:
plotMonoInitializer("orthogonal")

Identity

Generates values as an Identity matrix.

def kerasy.initializers.Identity(shape, dtype=None, gain=1.0)
In [39]:
plotMonoInitializer("identity")

Warning

This initializer can only be used when the weight matrix is a 2-dimensional square matrix.
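
As a rough sketch (hypothetical helper, not kerasy's code), this amounts to a scaled np.eye:

import numpy as np

def identity(shape, gain=1.0):
    # Only meaningful for a 2-dimensional square weight matrix (see the warning above).
    rows, cols = shape
    assert rows == cols, "Identity initialization requires a square matrix."
    return gain * np.eye(rows)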

Glorot Normal

Generates values from the following distribution where

  • $n_{\text{in}}$: the number of input units
  • $n_{\text{out}}$: the number of output units
$$\text{Truncated Normal}\left(0, \frac{\sqrt{2}}{\sqrt{n_{\text{in}} + n_{\text{out}}}}\right)$$

Glorot's objective is to ensure that

  • The activation variances are the same across all layers. $$\forall(i,i^{\prime}), \mathrm{Var}\left[z^i\right] = \mathrm{Var}\left[z^{i^{\prime}}\right]\quad (8)$$
  • The back-propagated gradient variances are the same across all layers. $$\forall(i,i^{\prime}), \mathrm{Var}\left[\frac{\partial \text{Cost}}{\partial s^i}\right] = \mathrm{Var}\left[\frac{\partial \text{Cost}}{\partial s^{i^{\prime}}}\right]\quad (9)$$

These two conditions transform to:

$$ \begin{aligned} \forall i, n_i\mathrm{Var}\left[W^i\right] &= 1 & (10)\\ \forall i, n_{i+1}\mathrm{Var}\left[W^i\right] &= 1 & (11) \end{aligned} $$

As a compromise between these two constraints, Glorot and Bengio took

$$\forall i, \mathrm{Var}\left[W^i\right] = \frac{2}{n_i + n_{i+1}} \quad(12)$$
def kerasy.initializers.GlorotNormal(shape, dtype=None, seed=None)
In [40]:
plotMonoInitializer("glorot_normal")

Glorot Uniform

Generates values from the following distribution

$$\mathrm{Uni}\left[-\frac{\sqrt{6}}{\sqrt{n_{\text{in}} + n_{\text{out}}}}, \frac{\sqrt{6}}{\sqrt{n_{\text{in}} + n_{\text{out}}}}\right]$$

The idea is the same as for the Glorot Normal distribution.

def kerasy.initializers.GlorotUniform(shape, dtype=None, seed=None)
In [41]:
plotMonoInitializer("glorot_uniform")

He Normal

Generates values from the following distribution

$$\text{Truncated Normal}\left(0, \frac{\sqrt{2}}{\sqrt{n_{\text{in}}}}\right)$$

Since Glorot Normal is only valid when the activation function is "origin-symmetric" and "linear near the origin", this distribution was introduced to handle the exceptions:

  • ReLU Activation Function
  • Convolutional Layer
def kerasy.initializers.HeNormal(shape, dtype=None, seed=None)
In [42]:
plotMonoInitializer("he_normal")

LeCun Normal

Generates values from the He Normal distribution with the standard deviation scaled by $1/\sqrt{2}$.

def kerasy.initializers.LeCunNormal(shape, dtype=None, seed=None)
In [43]:
plotMonoInitializer("lecun_normal")

He Uniform

Generates values from the following distribution

$$\mathrm{Uni}\left[-\frac{\sqrt{6}}{\sqrt{n_{\text{in}}}}, \frac{\sqrt{6}}{\sqrt{n_{\text{in}}}}\right]$$

The idea is the same as for the He Normal distribution.

def kerasy.initializers.HeUniform(shape, dtype=None, seed=None)
In [44]:
plotMonoInitializer("he_uniform")

LeCun Uniform

Generates values from the He Uniform distribution with the limits scaled by $1/\sqrt{2}$.

def kerasy.initializers.LeCunUniform(shape, dtype=None, seed=None)
In [45]:
plotMonoInitializer("lecun_uniform")

Plot function

In [1]:
import numpy as np
import matplotlib.pyplot as plt

from kerasy.models import Sequential
from kerasy.layers import Input, Dense
from kerasy.utils import galleryplot
from kerasy.initializers import KerasyInitializerFunctions
In [2]:
def visualizeInitialKernel(kernel_name, input_units=1000, bins=30, ax=None, xmin=-0.3, xmax=0.3):
    if ax is None:
        fig, ax = plt.subplots()

    # Create the model.
    model = Sequential()
    model.add(Input(input_shape=(input_units,)))
    model.add(Dense(input_units, kernel_initializer=kernel_name))
    model.compile(optimizer='sgd', loss="mean_squared_error")

    # Get the layer's kernel(weights).
    freq = np.ravel(model.layers[-1].kernel)
    ax.hist(freq, bins=bins, density=True, color="#722f37")
    ax.set_xlim(xmin, xmax)
    ax.set_title(kernel_name, fontsize=16)

    return ax
In [3]:
def plotMonoInitializer(kernel_name, input_units=1000, bins=30, xmin=-0.3, xmax=0.3):
    ax = visualizeInitialKernel(kernel_name, input_units=input_units, bins=bins, xmin=xmin, xmax=xmax)
    ax.set_xlabel("Value", fontsize=14)
    ax.set_ylabel("Frequency", fontsize=14)
    ax.set_title(f"Initialize Method: {kernel_name}", fontsize=14)
    plt.show()