How to decay the learning rate on a TPU

Posted On 2019-02-17


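If you want learning rate decay on a TPU, it gets very confusing whether you should use a TensorFlow optimizer, a tf.keras optimizer, or a Keras optimizer. I worked out and reproduced a way to decay the learning rate on a TPU that actually works.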

The conclusion first

To decay the learning rate in a TPU environment with tf.keras's LearningRateScheduler, the right choice is a tf.keras optimizer.

That's it. I remember this not working properly at one point, but the bug seems to have been fixed along the way. Keeping both sides in tf.keras is also easy to reason about.

Keras optimizers and TensorFlow optimizers name the learning rate differently

This is the first thing to keep in mind, and it happens regardless of TPU. A Keras optimizer calls its learning rate "lr", while a TensorFlow optimizer calls it "learning_rate".

Why does this naming mismatch cause trouble? If, for example, you apply a tf.keras learning rate decay API (LearningRateScheduler and the like) to a TensorFlow optimizer, it complains that there is no "lr" parameter.

Take, for example, fitting a sine curve with a linear model.

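import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import LearningRateScheduler
import numpy as np

# A single Dense unit: a linear fit to a sine curve
inputs = layers.Input((1,))
x = layers.Dense(1)(inputs)
model = Model(inputs, x)

X = np.random.uniform(0, 10, (100, 1))
y = np.sin(X)

# TensorFlow (tf.train) optimizer: its learning rate is not exposed as lr
model.compile(tf.train.GradientDescentOptimizer(0.01), "mean_squared_error")

def decay(epoch):
    # Check the larger threshold first so both branches are reachable;
    # epochs 0-4 keep the initial rate of 0.01
    if epoch >= 10: return 0.0001
    if epoch >= 5: return 0.001
    return 0.01

scheduler = LearningRateScheduler(decay, verbose=1)
# With this optimizer the callback raises ValueError at the start of the first epoch
model.fit(X, y, callbacks=[scheduler], epochs=15)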

The learning rate decay uses the ordinary LearningRateScheduler. Note that the optimizer here is a TensorFlow optimizer. Incidentally, without the learning rate decay (i.e. with callbacks=… removed) this runs correctly:

Epoch 1/15
100/100 [==============================] - 0s 1ms/sample - loss: 6.3527
Epoch 2/15
100/100 [==============================] - 0s 78us/sample - loss: 0.4083
Epoch 3/15
100/100 [==============================] - 0s 80us/sample - loss: 0.4108
Epoch 4/15
100/100 [==============================] - 0s 74us/sample - loss: 0.4022
Epoch 5/15
100/100 [==============================] - 0s 62us/sample - loss: 0.4019

Once the learning rate decay is added, however, it fails with an error:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, max_queue_size, workers, use_multiprocessing, **kwargs)
    878           initial_epoch=initial_epoch,
    879           steps_per_epoch=steps_per_epoch,
--> 880           validation_steps=validation_steps)
    881 
    882   def evaluate(self,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, mode, validation_in_fit, **kwargs)
    250     epoch_logs = {}
    251     model.reset_metrics()
--> 252     callbacks.on_epoch_begin(epoch, epoch_logs, mode=mode)
    253     progbar.on_epoch_begin(epoch, epoch_logs)
    254 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/callbacks.py in on_epoch_begin(self, epoch, logs, mode)
    235       logs = logs or {}
    236       for callback in self.callbacks:
--> 237         callback.on_epoch_begin(epoch, logs)
    238     self._reset_batch_timing()
    239 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/callbacks.py in on_epoch_begin(self, epoch, logs)
    810   def on_epoch_begin(self, epoch, logs=None):
    811     if not hasattr(self.model.optimizer, 'lr'):
--> 812       raise ValueError('Optimizer must have a "lr" attribute.')
    813     try:  # new API
    814       lr = float(K.get_value(self.model.optimizer.lr))

ValueError: Optimizer must have a "lr" attribute.

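This is because Keras names the learning rate lr, whereas TensorFlow names it learning_rate. LearningRateScheduler first checks whether the optimizer has an lr attribute (the hasattr check in the traceback above), and the TensorFlow optimizer has none. It is purely a naming issue.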
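The failing check can be reproduced without TensorFlow. The sketch below mimics only the hasattr test from the traceback, using two hypothetical stand-in classes (KerasStyleOptimizer, TFStyleOptimizer) and a made-up helper check_lr_attribute; the real optimizer classes carry far more state than this.

```python
class KerasStyleOptimizer:
    """Hypothetical stand-in for a Keras optimizer: the rate is stored as `lr`."""

    def __init__(self, lr):
        self.lr = lr


class TFStyleOptimizer:
    """Hypothetical stand-in for a tf.train optimizer: the constructor takes
    `learning_rate`, and no public `lr` attribute exists."""

    def __init__(self, learning_rate):
        self._learning_rate = learning_rate


def check_lr_attribute(optimizer):
    """Mimics the first lines of LearningRateScheduler.on_epoch_begin
    as shown in the traceback."""
    if not hasattr(optimizer, "lr"):
        raise ValueError('Optimizer must have a "lr" attribute.')
    return optimizer.lr


print(check_lr_attribute(KerasStyleOptimizer(lr=0.01)))  # 0.01
try:
    check_lr_attribute(TFStyleOptimizer(learning_rate=0.01))
except ValueError as err:
    print(err)  # Optimizer must have a "lr" attribute.
```

The only difference between the two stand-ins is the attribute name, which is exactly the mismatch described above.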

Keras has not been part of TensorFlow for very long, so I suppose this kind of thing can't be helped.

An example where learning rate decay works on a TPU

This is an official implementation published by Google Cloud, and it achieves learning rate decay with tf.keras:
https://colab.research.google.com/github/GoogleCloudPlatform/training-data-analyst/blob/master/courses/fast-and-lean-data-science/01_MNIST_TPU_Keras.ipynb#scrollTo=56y8UNFQIVwj

To pick out just the key points:

model.compile(optimizer='adam', # learning rate will be set by LearningRateScheduler
              loss='categorical_crossentropy',
              metrics=['accuracy'])

The optimizer is given as a string, and no learning rate is specified. LearningRateScheduler sets both the initial value and the decay at the same time.

import math

lr_decay = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 0.0001 + 0.02 * math.pow(0.5, 1+epoch), verbose=True)

This is an exponential decay; note that the initial learning rate is defined inside the LearningRateScheduler.
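As a quick sanity check, the sketch below evaluates the same expression for the first few epochs. The function name mnist_tpu_schedule is hypothetical; only the formula comes from the notebook line above. Epoch 0 gives 0.0001 + 0.02 * 0.5 = 0.0101, which matches the {0.0101} in the "CPU -> TPU lr" log line, and the rate then keeps halving toward the 0.0001 floor.

```python
import math


def mnist_tpu_schedule(epoch):
    # Same expression as the lambda given to LearningRateScheduler:
    # a 0.0001 floor plus 0.02 multiplied by 0.5 once more per epoch.
    return 0.0001 + 0.02 * math.pow(0.5, 1 + epoch)


for epoch in range(5):
    print(epoch, mnist_tpu_schedule(epoch))
```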

Reading through the messages this example prints, you notice that learning rate decay works through this KerasCrossShard and tensorflow.python.keras.optimizers.

INFO:tensorflow:CPU -> TPU lr: 0.010099999606609344 {0.0101}
INFO:tensorflow:CPU -> TPU beta_1: 0.8999999761581421 {0.9}
INFO:tensorflow:CPU -> TPU beta_2: 0.9990000128746033 {0.999}
INFO:tensorflow:CPU -> TPU decay: 0.0 {0.0}
WARNING:tensorflow:Cannot update non-variable config: epsilon
WARNING:tensorflow:Cannot update non-variable config: amsgrad
463/468 [============================>.] - ETA: 0s - loss: 0.0601 - acc: 0.9817INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(10000,), dtype=tf.int32, name=None), TensorSpec(shape=(10000, 784), dtype=tf.float32, name=None), TensorSpec(shape=(10000, 10), dtype=tf.float32, name=None)]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning Adam {'lr': 0.010099999606609344, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
INFO:tensorflow:Remapping placeholder for reshape_4_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f69b63db080> []

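What is this KerasCrossShard? This is only my guess: (1) at compile time, the optimizer given as a string is turned into a function or instance (tf.keras.optimizers.get does this kind of lookup). (2) That optimizer lives on the CPU, so there must be some mechanism that synchronizes it with TensorFlow's TPU-side optimizer, which would also explain why the learning rate decay takes effect on the TPU. Step (2) in particular may be what KerasCrossShard does.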
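To make guess (2) concrete, here is a toy model of what the "CPU -> TPU" and "Cannot update non-variable config" log lines suggest: hyperparameters backed by a variable on the TPU side get copied over, the rest are skipped with a warning. This is purely a hypothetical illustration of the guess, not the KerasCrossShard implementation; sync_cpu_to_tpu and both dictionaries are made-up names, and the Adam values are taken from the "Cloning Adam" log line.

```python
import warnings


def sync_cpu_to_tpu(cpu_config, tpu_variables):
    """Toy model (hypothetical): copy each CPU-side hyperparameter that has a
    matching TPU-side variable, and warn about the ones that do not."""
    for name, value in cpu_config.items():
        if name in tpu_variables:
            tpu_variables[name] = value
        else:
            warnings.warn("Cannot update non-variable config: %s" % name)
    return tpu_variables


# Values from the "Cloning Adam {...}" log line
cpu_adam = {"lr": 0.0101, "beta_1": 0.9, "beta_2": 0.999, "decay": 0.0,
            "epsilon": 1e-07, "amsgrad": False}
# Assumption: only these names are variables on the TPU side
tpu_adam = {"lr": None, "beta_1": None, "beta_2": None, "decay": None}

sync_cpu_to_tpu(cpu_adam, tpu_adam)  # warns for epsilon and amsgrad

# A scheduler changes the CPU-side rate; the next sync carries it to the TPU
cpu_adam["lr"] = 0.0051
print(sync_cpu_to_tpu(cpu_adam, tpu_adam)["lr"])  # 0.0051
```

Under this model, a schedule only needs to change the CPU-side lr; the next sync is what makes it visible on the TPU.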

Which leads to the thought: if so, any tf.keras optimizer object instead of a string should also work together with LearningRateScheduler. Let's try it.

tf.keras optimizer + LearningRateScheduler

Import the optimizer. This is the same as in plain Keras; just be careful to import it from tf.keras.

from tensorflow.keras.optimizers import SGD

Compile. This is also the same as in Keras. It is used as a momentum optimizer (learning rate 0.1, momentum 0.9).

model.compile(SGD(0.1, 0.9), "categorical_crossentropy", ["acc"])
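SGD(0.1, 0.9) relies on positional order: the first argument is the learning rate and the second the momentum, consistent with the "Cloning SGD {'lr': 0.1…, 'momentum': 0.9…}" line in the log further down. The sketch shows that binding with inspect; sgd_params is a hypothetical stand-in whose parameter order and defaults are assumed to follow the logged config, so it runs without TensorFlow.

```python
import inspect


def sgd_params(lr=0.01, momentum=0.0, decay=0.0, nesterov=False):
    """Hypothetical stand-in mirroring the parameter order of the logged
    SGD config; it only exists so inspect can show how arguments bind."""


bound = inspect.signature(sgd_params).bind(0.1, 0.9)
bound.apply_defaults()
print(dict(bound.arguments))
# {'lr': 0.1, 'momentum': 0.9, 'decay': 0.0, 'nesterov': False}
```

Writing SGD(lr=0.1, momentum=0.9) with keywords would make the same intent explicit.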

Next, the function for LearningRateScheduler.

def step_decay(epoch):
    x = 0.1
    if epoch >= 2: x /= 5.0
    if epoch >= 4: x /= 5.0
    if epoch >= 6: x /= 5.0
    return x

To make the effect easy to see, the rate decays after 2, 4 and 6 epochs. Then define a LearningRateScheduler as usual
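Before a long TPU run it can help to print the schedule. The sketch evaluates step_decay (same logic as above) for the first eight epochs. It assumes, consistent with the training log further down, that LearningRateScheduler passes a 0-based epoch to the function while its verbose message counts from 1, so "Epoch 00003" is the first one to show 0.02.

```python
def step_decay(epoch):
    # Start at 0.1 and divide by 5 once the 0-based epoch reaches 2, 4 and 6
    x = 0.1
    if epoch >= 2:
        x /= 5.0
    if epoch >= 4:
        x /= 5.0
    if epoch >= 6:
        x /= 5.0
    return x


for epoch in range(8):
    # epoch + 1 matches the 1-based "Epoch 0000N" numbering of the verbose output
    print("Epoch %05d: lr = %.4g" % (epoch + 1, step_decay(epoch)))
```

The printed rates should line up with the log: 0.1 for epochs 1-2, 0.02 for 3-4, 0.004 for 5-6 and 0.0008 from epoch 7 on.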

decay = LearningRateScheduler(step_decay, verbose=1)

and pass it to the callbacks of model.fit. From here on it is exactly the same as plain Keras.

model.fit(……, callbacks=[decay])

Trying it out

I ran it on CIFAR-10. The code itself is omitted.

Epoch 00001: LearningRateScheduler reducing learning rate to 0.1.
Epoch 1/250
INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(16,), dtype=tf.int32, name='core_id0'), TensorSpec(shape=(16, 32, 32, 3), dtype=tf.float32, name='input_1_10'), TensorSpec(shape=(16, 10), dtype=tf.float32, name='dense_target_30')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning SGD {'lr': 0.10000000149011612, 'momentum': 0.8999999761581421, 'decay': 0.0, 'nesterov': False}
INFO:tensorflow:Remapping placeholder for input_1
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/keras_support.py:302: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.SGD object at 0x7feff5c5d978> []
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 7.1205525398254395 secs
INFO:tensorflow:Setting weights on TPU model.
INFO:tensorflow:CPU -> TPU lr: 0.10000000149011612 {0.1}
INFO:tensorflow:CPU -> TPU momentum: 0.8999999761581421 {0.9}
INFO:tensorflow:CPU -> TPU decay: 0.0 {0.0}
WARNING:tensorflow:Cannot update non-variable config: nesterov
379/391 [============================>.] - ETA: 1s - loss: 1.6532 - acc: 0.3963INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(10,), dtype=tf.int32, name='core_id0'), TensorSpec(shape=(10, 32, 32, 3), dtype=tf.float32, name='input_1_10'), TensorSpec(shape=(10, 10), dtype=tf.float32, name='dense_target_30')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for input_1
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.SGD object at 0x7feff5c5d978> [<tf.Variable 'tpu_140668637141536/SGD/iterations:0' shape=() dtype=int64>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff378e6d8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff378e9b0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff378efd0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3721ef0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3707d68>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3670240>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3638470>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff35db390>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff35a8a58>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3534dd8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff34ffe80>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff349bda0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3464ef0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff33d39b0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3399dd8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff33bdf98>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3386dd8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff32938d0>, 
<tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff325c1d0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3278d30>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3245ac8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff31b3588>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3178e48>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff311c2e8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3109c88>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff3075c50>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff303c4a8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff2fdc8d0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff2fa7a58>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff2f39a20>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff2efef60>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff2e9ee48>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff2e67ba8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff2dd3a90>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff2d9dcc0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff2dc1898>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7feff2d88eb8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object 
at 0x7feff2c97b38>]
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 5.8280065059661865 secs
389/391 [============================>.] - ETA: 0s - loss: 1.6448 - acc: 0.3993INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(16,), dtype=tf.int32, name='core_id_10'), TensorSpec(shape=(16, 32, 32, 3), dtype=tf.float32, name='input_1_10'), TensorSpec(shape=(16, 10), dtype=tf.float32, name='dense_target_30')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning SGD {'lr': 0.10000000149011612, 'momentum': 0.8999999761581421, 'decay': 0.0, 'nesterov': False}
INFO:tensorflow:Remapping placeholder for input_1
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.SGD object at 0x7fefefc39cc0> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 4.235922336578369 secs
75/79 [===========================>..] - ETA: 0s - loss: 1.9349 - acc: 0.3799INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(2,), dtype=tf.int32, name='core_id_10'), TensorSpec(shape=(2, 32, 32, 3), dtype=tf.float32, name='input_1_10'), TensorSpec(shape=(2, 10), dtype=tf.float32, name='dense_target_30')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for input_1
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.SGD object at 0x7fefefc39cc0> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 2.825878858566284 secs
79/79 [==============================] - 12s 149ms/step - loss: 1.9321 - acc: 0.3798
391/391 [==============================] - 66s 170ms/step - loss: 1.6432 - acc: 0.4000 - val_loss: 1.9321 - val_acc: 0.3798

Epoch 00002: LearningRateScheduler reducing learning rate to 0.1.
Epoch 2/250
79/79 [==============================] - 2s 20ms/step - loss: 1.3690 - acc: 0.5634
391/391 [==============================] - 25s 63ms/step - loss: 1.1529 - acc: 0.5911 - val_loss: 1.3690 - val_acc: 0.5634

Epoch 00003: LearningRateScheduler reducing learning rate to 0.02.
Epoch 3/250
79/79 [==============================] - 2s 20ms/step - loss: 0.8036 - acc: 0.7183
391/391 [==============================] - 25s 63ms/step - loss: 0.8332 - acc: 0.7093 - val_loss: 0.8036 - val_acc: 0.7183

Epoch 00004: LearningRateScheduler reducing learning rate to 0.02.
Epoch 4/250
79/79 [==============================] - 2s 21ms/step - loss: 0.7545 - acc: 0.7434
391/391 [==============================] - 25s 63ms/step - loss: 0.7377 - acc: 0.7442 - val_loss: 0.7545 - val_acc: 0.7434

Epoch 00005: LearningRateScheduler reducing learning rate to 0.004.
Epoch 5/250
79/79 [==============================] - 2s 21ms/step - loss: 0.6223 - acc: 0.7877
391/391 [==============================] - 25s 63ms/step - loss: 0.6473 - acc: 0.7782 - val_loss: 0.6223 - val_acc: 0.7877

Epoch 00006: LearningRateScheduler reducing learning rate to 0.004.
Epoch 6/250
79/79 [==============================] - 2s 19ms/step - loss: 0.5958 - acc: 0.7986
391/391 [==============================] - 25s 63ms/step - loss: 0.6235 - acc: 0.7863 - val_loss: 0.5958 - val_acc: 0.7986

Epoch 00007: LearningRateScheduler reducing learning rate to 0.0008.
Epoch 7/250
79/79 [==============================] - 2s 20ms/step - loss: 0.5708 - acc: 0.8051
391/391 [==============================] - 25s 63ms/step - loss: 0.5994 - acc: 0.7948 - val_loss: 0.5708 - val_acc: 0.8051

Epoch 00008: LearningRateScheduler reducing learning rate to 0.0008.
Epoch 8/250
79/79 [==============================] - 2s 20ms/step - loss: 0.5586 - acc: 0.8109
391/391 [==============================] - 25s 63ms/step - loss: 0.5940 - acc: 0.7976 - val_loss: 0.5586 - val_acc: 0.8109

Epoch 00009: LearningRateScheduler reducing learning rate to 0.0008.
Epoch 9/250
79/79 [==============================] - 2s 20ms/step - loss: 0.5518 - acc: 0.8130
391/391 [==============================] - 25s 64ms/step - loss: 0.5943 - acc: 0.7945 - val_loss: 0.5518 - val_acc: 0.8130

Epoch 00010: LearningRateScheduler reducing learning rate to 0.0008.
Epoch 10/250
79/79 [==============================] - 2s 20ms/step - loss: 0.5594 - acc: 0.8088
391/391 [==============================] - 25s 63ms/step - loss: 0.5889 - acc: 0.7983 - val_loss: 0.5594 - val_acc: 0.8088

Epoch 00011: LearningRateScheduler reducing learning rate to 0.0008.
Epoch 11/250
79/79 [==============================] - 2s 20ms/step - loss: 0.5568 - acc: 0.8117
391/391 [==============================] - 26s 66ms/step - loss: 0.5834 - acc: 0.8001 - val_loss: 0.5568 - val_acc: 0.8117

It somehow worked. The learning rate is really decreasing, not just in the printed messages: you can tell because the gain in accuracy per epoch clearly shrinks.
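The shrinking gains can be checked directly from the log. The snippet only copies the val_acc values and scheduled rates for epochs 1 to 11 from the output above and prints the epoch-to-epoch change; it adds no new measurements.

```python
# val_acc and the scheduled learning rate for epochs 1-11, copied from the log above
val_acc = [0.3798, 0.5634, 0.7183, 0.7434, 0.7877, 0.7986,
           0.8051, 0.8109, 0.8130, 0.8088, 0.8117]
lrs = [0.1, 0.1, 0.02, 0.02, 0.004, 0.004,
       0.0008, 0.0008, 0.0008, 0.0008, 0.0008]


def epoch_gains(values):
    """Change from each epoch to the next (the first epoch has no predecessor)."""
    return [b - a for a, b in zip(values, values[1:])]


for epoch, (lr, gain) in enumerate(zip(lrs[1:], epoch_gains(val_acc)), start=2):
    print("epoch %2d  lr=%-7g  val_acc gain=%+.4f" % (epoch, lr, gain))
```

At lr 0.1 and 0.02 the gains are about +0.18, +0.15 and +0.03; from epoch 7 on, at lr 0.0008, they stay within roughly ±0.007.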

Incidentally, without learning rate decay you get an error (there is a workaround)

With this tf.keras-optimizer approach, leaving out LearningRateScheduler produces the following error.

INFO:tensorflow:Cloning SGD {'lr': 0.10000000149011612, 'momentum': 0.8999999761581421, 'decay': 0.0, 'nesterov': False}
INFO:tensorflow:Cloning SGD {'lr': 0.10000000149011612, 'momentum': 0.8999999761581421, 'decay': 0.0, 'nesterov': False}
[<tensorflow.python.keras.callbacks.History object at 0x7fd03e868e48>]
Epoch 1/100
INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(16,), dtype=tf.int32, name='core_id0'), TensorSpec(shape=(16, 32, 32, 3), dtype=tf.float32, name='input_1_10'), TensorSpec(shape=(16, 10), dtype=tf.float32, name='dense_target_30')]
INFO:tensorflow:Overriding default placeholder.
---------------------------------------------------------------------------

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _assert_fetchable(self, graph, op)
    495     if not graph.is_fetchable(op):
    496       raise ValueError(
--> 497           'Operation %r has been marked as not fetchable.' % op.name)
    498 
    499   def fetches(self):

ValueError: Operation 'tpu_140532515705128/VarIsInitializedOp' has been marked as not fetchable.

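It seems that this tf.keras optimizer, when used on a TPU, does not pick up the learning rate given when the instance is created, e.g. SGD(0.1). In fact, even when you don't want any decay, you can build a scheduler that keeps the learning rate constant: define the optimizer as SGD(0.1) and have the scheduler always return 0.05,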

same_lr = LearningRateScheduler(lambda epoch: 0.05, verbose=1)

INFO:tensorflow:Cloning SGD {'lr': 0.10000000149011612, 'momentum': 0.8999999761581421, 'decay': 0.0, 'nesterov': False}
INFO:tensorflow:Cloning SGD {'lr': 0.10000000149011612, 'momentum': 0.8999999761581421, 'decay': 0.0, 'nesterov': False}
[<tensorflow.python.keras.callbacks.History object at 0x7fd05c60a6a0>, <tensorflow.python.keras.callbacks.LearningRateScheduler object at 0x7fd04b867dd8>]

Epoch 00001: LearningRateScheduler reducing learning rate to 0.05.
Epoch 1/10

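and the scheduler's learning rate overwrites the optimizer's and is the one actually used. Since no error occurs this way, the thing that cannot be fetched is probably the tf.keras optimizer's own learning rate.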
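The constant-scheduler workaround can be modeled in a few lines: before every epoch the scheduler overwrites the optimizer's lr with its own value, so the 0.1 passed to SGD(0.1) is never the rate actually used. StandInOptimizer and run_scheduler are hypothetical stand-ins that mirror only this overwrite and the verbose message format from the log; they do not reproduce the TPU fetch error itself.

```python
class StandInOptimizer:
    """Hypothetical stand-in: starts with the rate given at construction."""

    def __init__(self, lr):
        self.lr = lr


def run_scheduler(optimizer, schedule, epochs):
    """Mimics LearningRateScheduler(verbose=1): overwrite optimizer.lr with
    schedule(epoch) at the start of each epoch and report it."""
    used = []
    for epoch in range(epochs):
        optimizer.lr = schedule(epoch)
        print("Epoch %05d: LearningRateScheduler reducing learning rate to %s."
              % (epoch + 1, optimizer.lr))
        used.append(optimizer.lr)
    return used


opt = StandInOptimizer(lr=0.1)  # like SGD(0.1)
rates = run_scheduler(opt, lambda epoch: 0.05, epochs=3)
print(rates)  # [0.05, 0.05, 0.05]
```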

Conclusion

Use LearningRateScheduler + a tf.keras optimizer. Use LearningRateScheduler even when you don't want to decay the learning rate.

That's all.

Books from 「じゅ～しぃ～すくりぷと」, the doujin circle run by the author of Shikoan's ML Blog


Tags:CNN, DeepLearning, Google Colaboratory, TensorFlow, TPU