Initializers
※ The plot helper functions used below are defined in the Plot function section at the end of this page.
fig, axes = galleryplot(
    func=visualizeInitialKernel,
    argnames="kernel_name",
    iterator=list(KerasyInitializerFunctions.keys()),
    sharex="all",
)
Zeros¶
Generates values initialized to $0$. This is often used as the bias initializer so that the network can start learning without being affected by the initial bias parameters.
Warning
If you apply this initializer to the edge weights between layers, the loss gradient will not propagate backward, so the network cannot learn.
def kerasy.initializers.Zeros(shape, dtype=None)
plotMonoInitializer("zeros", xmin=-0.3, xmax=1.3)
Ones¶
Generates values initialized to $1$.
def kerasy.initializers.Ones(shape, dtype=None)
plotMonoInitializer("ones", xmin=-0.3, xmax=1.3)
Constant¶
Generates values initialized to a user-defined constant (=value).
def kerasy.initializers.Constant(shape, value=0, dtype=None)
plotMonoInitializer("constant", xmin=-0.3, xmax=1.3)
Random Normal¶
Generates values from a Normal distribution.
def kerasy.initializers.RandomNormal(shape, mean=0, stddev=0.05, dtype=None, seed=None)
plotMonoInitializer("random_normal")
Random Uniform¶
Generates values from a Uniform distribution between minval and maxval.
def kerasy.initializers.RandomUniform(shape, minval=-0.05, maxval=0.05, dtype=None, seed=None)
plotMonoInitializer("random_uniform")
Truncated Normal¶
Generates values from a Truncated Normal distribution, which is derived from a Normal distribution: samples are drawn from the Normal distribution, but only the values that fall within the standard deviation are kept.
Hint
It is recommended to use this distribution for initializing the edge weights between layers.
def kerasy.initializers.TruncatedNormal(shape, mean=0.0, stddev=0.05, dtype=None, seed=None)
plotMonoInitializer("truncated_normal")
Variance Scaling¶
Performs scaling according to the size of the weight tensor (selected by mode) and generates values from a Truncated Normal or a Uniform distribution (selected by distribution).
def kerasy.initializers.VarianceScaling(shape, scale=1.0, mode='fan_in', distribution='normal', dtype=None, seed=None)
plotMonoInitializer("variance_scaling")
Orthogonal¶
Generates values forming a random orthogonal matrix.
def kerasy.initializers.Orthogonal(shape, gain=1.0, dtype=None, seed=None)
plotMonoInitializer("orthogonal")
Identity¶
Generates values as an Identity matrix.
def kerasy.initializers.Identity(shape, dtype=None, gain=1.0)
plotMonoInitializer("identity")
Warning
This initializer can only be used when the weight is a 2-dimensional square matrix.
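Equivalently, in plain NumPy (a sketch):
import numpy as np

gain = 1.0
w = gain * np.eye(4)   # valid only for 2-dimensional square weight matrices, as warned above
print(w)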
Glorot Normal¶
Generates values from the following distribution
$$\mathcal{N}\left(0,\ \frac{\sqrt{2}}{\sqrt{n_{\text{in}} + n_{\text{out}}}}\right)$$
where
- $n_{\text{in}}$: the number of input units
- $n_{\text{out}}$: the number of output units
Glorot's objective is to satisfy the following two conditions:
- The activation variances are the same across all layers. $$\forall(i,i^{\prime}), \mathrm{Var}\left[z^i\right] = \mathrm{Var}\left[z^{i^{\prime}}\right]\quad (8)$$
- The back-propagated gradient variances are the same across all layers. $$\forall(i,i^{\prime}), \mathrm{Var}\left[\frac{\partial \text{Cost}}{\partial s^i}\right] = \mathrm{Var}\left[\frac{\partial \text{Cost}}{\partial s^{i^{\prime}}}\right]\quad (9)$$
These two conditions transform to:
$$ \begin{aligned} \forall i, n_i\mathrm{Var}\left[W^i\right] &= 1 & (10)\\ \forall i, n_{i+1}\mathrm{Var}\left[W^i\right] &= 1 & (11) \end{aligned} $$
As a compromise between these two constraints, Glorot took
$$\forall i, \mathrm{Var}\left[W^i\right] = \frac{2}{n_i + n_{i+1}} \quad(12)$$
def kerasy.initializers.GlorotNormal(shape, dtype=None, seed=None)
plotMonoInitializer("glorot_normal")
Glorot Uniform¶
Generates values from the following distribution
$$\mathrm{Uni}\left[-\frac{\sqrt{6}}{\sqrt{n_{\text{in}} + n_{\text{out}}}}, \frac{\sqrt{6}}{\sqrt{n_{\text{in}} + n_{\text{out}}}}\right]$$
The idea is the same as for the Glorot Normal distribution.
def kerasy.initializers.GlorotUniform(shape, dtype=None, seed=None)
plotMonoInitializer("glorot_uniform")
He Normal¶
Generates values from the following distribution
$$\mathcal{N}\left(0,\ \frac{\sqrt{2}}{\sqrt{n_{\text{in}}}}\right)$$
Since the Glorot Normal initialization is only valid when the activation function is "origin-symmetric" and "linear near the origin", this distribution was invented to handle the exceptions:
- ReLU Activation Function
- Convolutional Layer
def kerasy.initializers.HeNormal(shape, dtype=None, seed=None)
plotMonoInitializer("he_normal")
LeCun Normal¶
Generates values from the He Normal distribution scaled by $1/\sqrt{2}$, i.e. with standard deviation $\sqrt{1/n_{\text{in}}}$.
def kerasy.initializers.LeCunNormal(shape, dtype=None, seed=None)
plotMonoInitializer("lecun_normal")
He Uniform¶
Generates values from the following distribution
$$\mathrm{Uni}\left[-\frac{\sqrt{6}}{\sqrt{n_{\text{in}}}}, \frac{\sqrt{6}}{\sqrt{n_{\text{in}}}}\right]$$
The idea is the same as for the He Normal distribution.
def kerasy.initializers.HeUniform(shape, dtype=None, seed=None)
plotMonoInitializer("he_uniform")
LeCun Uniform¶
Generates values from the He Uniform distribution scaled by $1/\sqrt{2}$.
def kerasy.initializers.LeCunUniform(shape, dtype=None, seed=None)
plotMonoInitializer("lecun_uniform")
Plot function¶
import numpy as np
import matplotlib.pyplot as plt

from kerasy.models import Sequential
from kerasy.layers import Input, Dense
from kerasy.utils import galleryplot
from kerasy.initializers import KerasyInitializerFunctions

def visualizeInitialKernel(kernel_name, input_units=1000, bins=30, ax=None, xmin=-0.3, xmax=0.3):
    if ax is None:
        fig, ax = plt.subplots()
    # Create the model.
    model = Sequential()
    model.add(Input(input_shape=(input_units,)))
    model.add(Dense(input_units, kernel_initializer=kernel_name))
    model.compile(optimizer="sgd", loss="mean_squared_error")
    # Get the layer's kernel (weights) and plot its histogram.
    freq = np.ravel(model.layers[-1].kernel)
    ax.hist(freq, bins=bins, density=True, color="#722f37")
    ax.set_xlim(xmin, xmax)
    ax.set_title(kernel_name, fontsize=16)
    return ax

def plotMonoInitializer(kernel_name, input_units=1000, bins=30, xmin=-0.3, xmax=0.3):
    ax = visualizeInitialKernel(kernel_name, input_units=input_units, bins=bins, xmin=xmin, xmax=xmax)
    ax.set_xlabel("Value", fontsize=14)
    ax.set_ylabel("Frequency", fontsize=14)
    ax.set_title(f"Initialize Method: {kernel_name}", fontsize=14)
    plt.show()