I would like to compute the first and second derivatives (the diagonal part of the Hessian) of my specified Loss with respect to each feature map of a vgg16 conv4_3 layer's kernel, which is a 3x3x512x512-dimensional matrix. I know how to compute the derivatives with respect to a low-rank tensor, following How to compute all second derivatives (only the diagonal of the Hessian matrix) in Tensorflow? However, when it comes to higher ranks, I get completely lost.
```python
# Inspecting variables in an IPython notebook
In : Loss
Out: <tf.Tensor 'local/total_losses:0' shape=() dtype=float32>

In : conv4_3_kernel.get_shape()
Out: TensorShape([Dimension(3), Dimension(3), Dimension(512), Dimension(512)])

## Compute derivatives
Grad = tf.gradients(Loss, conv4_3_kernel)
Hessian = tf.gradients(Grad, conv4_3_kernel)

In : Grad
Out: [<tf.Tensor 'gradients/vgg/conv4_3/Conv2D_grad/Conv2DBackpropFilter:0' shape=(3, 3, 512, 512) dtype=float32>]

In : Hessian
Out: [<tf.Tensor 'gradients_2/vgg/conv4_3/Conv2D_grad/Conv2DBackpropFilter:0' shape=(3, 3, 512, 512) dtype=float32>]
```
Please help me check my understanding. For conv4_3_kernel, the dims stand for [Kx, Ky, in_channels, out_channels], so Grad should be the partial derivatives of Loss with respect to each element (pixel) of each feature map, and Hessian should be the second derivatives.
But Hessian computes all the derivatives; how can I compute only the diagonal part? Should I use tf.diag_part()? Many thanks in advance!
Answer:
tf.gradients computes the derivative of a scalar quantity. If the quantity provided isn't scalar, it turns it into a scalar by summing up the components, which is what's happening in your example.
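A minimal sketch (TensorFlow 1.x, toy tensors of my own choosing) of that implicit summation: differentiating a non-scalar target gives exactly the same result as differentiating its sum.

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])
y = x * x  # non-scalar target, shape (3,)

# Differentiating the vector y is the same as differentiating reduce_sum(y):
g_vec = tf.gradients(y, x)[0]
g_sum = tf.gradients(tf.reduce_sum(y), x)[0]

with tf.Session() as sess:
    print(sess.run(g_vec))  # [2. 4. 6.]
    print(sess.run(g_sum))  # [2. 4. 6.]
```

This is why the Hessian tensor in the question is not the diagonal: tf.gradients(Grad, conv4_3_kernel) first sums Grad, so each entry ends up being a row sum of the true Hessian.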
To compute the full Hessian you need n calls to tf.gradients; an example is here. If you want just the diagonal part, then modify the arguments of the i-th call to tf.gradients to differentiate with respect to the i-th variable, rather than all variables.
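A minimal sketch of that recipe in TensorFlow 1.x on a small 1-D toy variable; the names hessian_diagonal and w are mine, not from the question. Note that for the 3x3x512x512 kernel this means one tf.gradients call per element (about 2.4 million), so in practice you would apply it only to the slice of the kernel you actually care about.

```python
import tensorflow as tf

def hessian_diagonal(loss, var):
    """d^2(loss)/d(var_i)^2 for each element i of a 1-D variable."""
    grad = tf.gradients(loss, var)[0]  # first derivatives, shape (n,)
    n = var.get_shape()[0].value
    diag = []
    for i in range(n):
        # i-th call: differentiate only the i-th gradient component, then
        # keep its i-th entry; that is the (i, i) element of the Hessian.
        diag.append(tf.gradients(grad[i], var)[0][i])
    return tf.stack(diag)

w = tf.Variable([1.0, 2.0, 3.0])
loss = tf.reduce_sum(w ** 3)  # second derivative w.r.t. w_i is 6 * w_i

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(hessian_diagonal(loss, w)))  # [ 6. 12. 18.]
```

tf.diag_part() would not help here: it extracts the diagonal of an already-materialized square matrix, whereas the point of this recipe is to avoid ever building the full Hessian.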