matrixid: matrix and vector differentiation rules

matrix identities

sam roweis (revised June 1999)

note that a, b, c and A, B, C do not depend on X, Y, x, y or z

0.1 basic formulae

$$A(B + C) = AB + AC \qquad (1a)$$
$$(A + B)^T = A^T + B^T \qquad (1b)$$
$$(AB)^T = B^T A^T \qquad (1c)$$

if individual inverses exist

$$(AB)^{-1} = B^{-1} A^{-1} \qquad (1d)$$
$$(A^{-1})^T = (A^T)^{-1} \qquad (1e)$$
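the basic formulae can be spot-checked numerically. this is a minimal sketch that assumes NumPy is installed. the seed, the matrix size and the trick of adding n·I to keep A and B invertible are illustrative choices and are not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
inv = np.linalg.inv

# Random square matrices; the added multiple of I keeps A and B well away from singular.
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, n)) + n * np.eye(n)
C = rng.standard_normal((n, n))

checks = {
    "1a": np.allclose(A @ (B + C), A @ B + A @ C),
    "1b": np.allclose((A + B).T, A.T + B.T),
    "1c": np.allclose((A @ B).T, B.T @ A.T),
    "1d": np.allclose(inv(A @ B), inv(B) @ inv(A)),
    "1e": np.allclose(inv(A).T, inv(A.T)),
}
print(checks)
```

a random spot check like this catches a transposition slip, but it is not a proof.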

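0.2 trace, determinant and rank

$$|AB| = |A|\,|B| \qquad (2a)$$
$$|A^{-1}| = \frac{1}{|A|} \qquad (2b)$$
$$|A| = \prod \text{evals} \qquad (2c)$$
$$\mathrm{Tr}[A] = \sum \text{evals} \qquad (2d)$$

if the cyclic products are well defined,

$$\mathrm{Tr}[ABC\ldots] = \mathrm{Tr}[BC\ldots A] = \mathrm{Tr}[C\ldots AB] = \ldots \qquad (2e)$$
$$\mathrm{rank}[A] = \mathrm{rank}[A^T A] = \mathrm{rank}[A A^T] \qquad (2f)$$
$$\text{condition number} = \gamma = \frac{\text{biggest eval}}{\text{smallest eval}} \qquad (2g)$$

(2g) as written applies to a symmetric positive-definite A. for a general A, the ratio of its largest to its smallest singular value plays this role.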
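here is a numerical sketch of (2a) to (2g), again assuming NumPy. `np.linalg.cond` returns the 2-norm condition number, which is the ratio of the extreme singular values. for the symmetric positive-definite `S` used below, that equals the eigenvalue ratio in (2g). the rank-2 test matrix `M` and the rank tolerance `tol=1e-8` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
det, inv = np.linalg.det, np.linalg.inv

A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
evals = np.linalg.eigvals(A)  # complex in general, since A is not symmetric

ok_2a = np.isclose(det(A @ B), det(A) * det(B))
ok_2b = np.isclose(det(inv(A)), 1 / det(A))
ok_2c = np.isclose(det(A), np.prod(evals).real)
ok_2d = np.isclose(np.trace(A), np.sum(evals).real)
ok_2e = np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))

# (2f): a 5x3 matrix of rank 2, built as the product of 5x2 and 2x3 factors
M = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))
ranks = [np.linalg.matrix_rank(K, tol=1e-8) for K in (M, M.T @ M, M @ M.T)]

# (2g) for a symmetric positive-definite S: 2-norm condition number = largest / smallest eigenvalue
S = M.T @ M + np.eye(3)
lam = np.linalg.eigvalsh(S)  # eigenvalues in ascending order
ok_2g = np.isclose(np.linalg.cond(S), lam[-1] / lam[0])
```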

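derivatives of scalar forms with respect to scalars, vectors, or matrices are indexed in the obvious way: the derivative has the same shape as the variable, and entry (i, j) of $\partial f/\partial X$ is $\partial f/\partial X(i, j)$. similarly, the indexing for derivatives of vectors and matrices with respect to scalars is straightforward.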

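0.3 derivatives of traces

$$\frac{\partial\, \mathrm{Tr}[X]}{\partial X} = I \qquad (3a)$$
$$\frac{\partial\, \mathrm{Tr}[XA]}{\partial X} = \frac{\partial\, \mathrm{Tr}[AX]}{\partial X} = A^T \qquad (3b)$$
$$\frac{\partial\, \mathrm{Tr}[X^T A]}{\partial X} = \frac{\partial\, \mathrm{Tr}[A X^T]}{\partial X} = A \qquad (3c)$$
$$\frac{\partial\, \mathrm{Tr}[X^T A X]}{\partial X} = (A + A^T)\, X \qquad (3d)$$
$$\frac{\partial\, \mathrm{Tr}[X^{-1} A]}{\partial X} = -\left(X^{-1} A X^{-1}\right)^T = -(X^{-1})^T A^T (X^{-1})^T \qquad (3e)$$

(3e) holds for a general invertible X. when X is symmetric, it reduces to $-X^{-1} A^T X^{-1}$.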
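the gradients in (3a) to (3e) can be compared against central finite differences, where entry (i, j) of the gradient is ∂f/∂X(i, j) as described above. `num_grad` is a hypothetical helper written for this sketch, and NumPy is assumed. the test point X is deliberately non-symmetric, so the check for (3e) uses the general form −(X^{-1}AX^{-1})^T. the last line confirms that the symmetric-only shortcut −X^{-1}A^TX^{-1} does not match at such a point.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central-difference gradient of a scalar f at X; entry idx is df/dX[idx]."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(2)
n = 3
inv, tr = np.linalg.inv, np.trace
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + n * np.eye(n)  # invertible and deliberately NOT symmetric

g3e = num_grad(lambda Z: tr(inv(Z) @ A), X)
checks = {
    "3a": np.allclose(num_grad(tr, X), np.eye(n), atol=1e-5),
    "3b": np.allclose(num_grad(lambda Z: tr(Z @ A), X), A.T, atol=1e-5),
    "3c": np.allclose(num_grad(lambda Z: tr(Z.T @ A), X), A, atol=1e-5),
    "3d": np.allclose(num_grad(lambda Z: tr(Z.T @ A @ Z), X), (A + A.T) @ X, atol=1e-5),
    "3e": np.allclose(g3e, -(inv(X) @ A @ inv(X)).T, atol=1e-5),
}
# The symmetric-X shortcut -X^{-1} A^T X^{-1} should NOT match at this non-symmetric X.
naive_matches = np.allclose(g3e, -inv(X) @ A.T @ inv(X), atol=1e-5)
```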

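0.4 derivatives of determinants

$$\frac{\partial |AXB|}{\partial X} = |AXB|\,(X^{-1})^T = |AXB|\,(X^T)^{-1} \qquad (4a)$$
$$\frac{\partial \ln |X|}{\partial X} = (X^{-1})^T = (X^T)^{-1} \qquad (4b)$$
$$\frac{\partial \ln |X(z)|}{\partial z} = \mathrm{Tr}\left[X^{-1} \frac{\partial X}{\partial z}\right] \qquad (4c)$$
$$\frac{\partial |X^T A X|}{\partial X} = |X^T A X| \left( A X (X^T A X)^{-1} + A^T X (X^T A^T X)^{-1} \right) \qquad (4d)$$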
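a similar finite-difference sketch covers (4a) to (4d), assuming NumPy. `num_grad` and `logabsdet` are illustrative helpers. (4a) is exercised with square, invertible A, X and B. (4d) uses a tall X so that X^TAX is a small square matrix. the straight-line path X(z) = X + zD used for (4c) is an arbitrary choice. both determinant gradients are divided by the determinant before comparison, which keeps the tolerance independent of scale.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central-difference gradient of a scalar f at X; entry idx is df/dX[idx]."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

def logabsdet(Z):
    """ln|det Z| via slogdet, which avoids overflow and sign issues."""
    return np.linalg.slogdet(Z)[1]

rng = np.random.default_rng(3)
n = 3
det, inv = np.linalg.det, np.linalg.inv

# Square, comfortably invertible A, X, B for (4a)-(4c)
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, n)) + n * np.eye(n)
X = rng.standard_normal((n, n)) + n * np.eye(n)

# (4a), divided through by |AXB|
ok_4a = np.allclose(num_grad(lambda Z: det(A @ Z @ B), X) / det(A @ X @ B), inv(X).T, atol=1e-6)

# (4b)
ok_4b = np.allclose(num_grad(logabsdet, X), inv(X).T, atol=1e-6)

# (4c) along X(z) = X + z D, so dX/dz = D
D = rng.standard_normal((n, n))
h = 1e-6
slope = (logabsdet(X + h * D) - logabsdet(X - h * D)) / (2 * h)
ok_4c = np.isclose(slope, np.trace(inv(X) @ D), atol=1e-6)

# (4d) with a tall 4 x 2 matrix, so X^T A X is 2 x 2; divided through by |X^T A X|
A4 = rng.standard_normal((4, 4)) + 4 * np.eye(4)
X4 = rng.standard_normal((4, 2))
M = X4.T @ A4 @ X4
bracket = A4 @ X4 @ inv(M) + A4.T @ X4 @ inv(X4.T @ A4.T @ X4)
ok_4d = np.allclose(num_grad(lambda Z: det(Z.T @ A4 @ Z), X4) / det(M), bracket, atol=1e-6)
```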

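0.5 derivatives of scalar forms

$$\frac{\partial (a^T x)}{\partial x} = \frac{\partial (x^T a)}{\partial x} = a \qquad (5a)$$
$$\frac{\partial (x^T A x)}{\partial x} = (A + A^T)\, x \qquad (5b)$$
$$\frac{\partial (a^T X b)}{\partial X} = a b^T \qquad (5c)$$
$$\frac{\partial (a^T X^T b)}{\partial X} = b a^T \qquad (5d)$$
$$\frac{\partial (a^T X a)}{\partial X} = \frac{\partial (a^T X^T a)}{\partial X} = a a^T \qquad (5e)$$
$$\frac{\partial (a^T X^T C X b)}{\partial X} = C^T X a b^T + C X b a^T \qquad (5f)$$
$$\frac{\partial \left((Xa + b)^T C (Xa + b)\right)}{\partial X} = (C + C^T)(Xa + b)\, a^T \qquad (5g)$$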
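the scalar-form derivatives (5a) to (5g) can be spot-checked in the same way. the sketch below assumes NumPy and repeats a hypothetical central-difference helper `num_grad` so that it runs on its own. the expected gradients are built with `np.outer`, since ab^T is the outer product of a and b.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central-difference gradient of a scalar f at X (vector or matrix)."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(4)
n = 3
a, b, x = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
A, C, X = rng.standard_normal((n, n)), rng.standard_normal((n, n)), rng.standard_normal((n, n))

tol = dict(atol=1e-6)
checks = {
    "5a": np.allclose(num_grad(lambda v: a @ v, x), a, **tol),
    "5b": np.allclose(num_grad(lambda v: v @ A @ v, x), (A + A.T) @ x, **tol),
    "5c": np.allclose(num_grad(lambda Z: a @ Z @ b, X), np.outer(a, b), **tol),
    "5d": np.allclose(num_grad(lambda Z: a @ Z.T @ b, X), np.outer(b, a), **tol),
    "5e": np.allclose(num_grad(lambda Z: a @ Z @ a, X), np.outer(a, a), **tol),
    "5f": np.allclose(num_grad(lambda Z: a @ Z.T @ C @ Z @ b, X),
                      C.T @ X @ np.outer(a, b) + C @ X @ np.outer(b, a), **tol),
    "5g": np.allclose(num_grad(lambda Z: (Z @ a + b) @ C @ (Z @ a + b), X),
                      np.outer((C + C.T) @ (X @ a + b), a), **tol),
}
```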

the derivative of one vector y with respect to another vector x is a matrix whose (i, j)th element is $\partial y(j)/\partial x(i)$. such a derivative should be written as $\partial y^T/\partial x$, in which case it is the Jacobian matrix of y wrt x. its determinant (taken in absolute value) is the ratio of the hypervolume dy to that of dx, so that $\int f(y)\,dy = \int f(y(x))\,|\partial y^T/\partial x|\,dx$. however, the sloppy forms $\partial y/\partial x$, $\partial y^T/\partial x^T$ and $\partial y/\partial x^T$ are often used for this Jacobian matrix.
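to make the index convention concrete, the sketch below builds ∂y^T/∂x numerically, with rows indexing x and columns indexing y. it assumes NumPy, and `jacobian_T` and `polar_to_cartesian` are hypothetical names. for the polar map x = (r, θ) ↦ y = (r cos θ, r sin θ), the absolute determinant is r, the familiar area factor. for a linear map y = Ax, this convention gives A^T rather than A.

```python
import numpy as np

def jacobian_T(fn, x, eps=1e-6):
    """Central-difference dy^T/dx with entry (i, j) = d y(j) / d x(i)."""
    x = np.asarray(x, dtype=float)
    J = np.zeros((x.size, np.size(fn(x))))
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        J[i] = (fn(x + e) - fn(x - e)) / (2 * eps)
    return J

def polar_to_cartesian(p):
    """y = (r cos(theta), r sin(theta)) as a function of x = (r, theta)."""
    r, theta = p
    return np.array([r * np.cos(theta), r * np.sin(theta)])

J = jacobian_T(polar_to_cartesian, [2.0, 0.3])
area_ratio = abs(np.linalg.det(J))  # equals r: dy1 dy2 = r dr dtheta

# Linear map y = A x from R^3 to R^2: rows of J_lin index x, columns index y.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
J_lin = jacobian_T(lambda v: A @ v, np.ones(3))
```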

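0.6 derivatives of vector/matrix forms

$$\frac{\partial (X^{-1})}{\partial z} = -X^{-1} \frac{\partial X}{\partial z} X^{-1} \qquad (6a)$$
$$\frac{\partial (Ax)}{\partial z} = A \frac{\partial x}{\partial z} \qquad (6b)$$
$$\frac{\partial (XY)}{\partial z} = X \frac{\partial Y}{\partial z} + \frac{\partial X}{\partial z} Y \qquad (6c)$$
$$\frac{\partial (AXB)}{\partial z} = A \frac{\partial X}{\partial z} B \qquad (6d)$$
$$\frac{\partial (x^T A)}{\partial x} = A \qquad (6e)$$
$$\frac{\partial (x^T)}{\partial x} = I \qquad (6f)$$
$$\frac{\partial (x^T A x x^T)}{\partial x} = (A + A^T)\, x x^T + x^T A x\, I \qquad (6g)$$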
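(6a) to (6d) differentiate with respect to a scalar z. the sketch below emulates this with hypothetical straight-line paths such as X(z) = X0 + zX1, so that ∂X/∂z = X1. (6e) to (6g) use the ∂y^T/∂x convention, where entry (i, j) equals ∂y(j)/∂x(i). NumPy is assumed, and all helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, h = 3, 1e-6
inv = np.linalg.inv
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
X0, X1 = rng.standard_normal((n, n)) + n * np.eye(n), rng.standard_normal((n, n))
Y0, Y1 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
x0, x1 = rng.standard_normal(n), rng.standard_normal(n)

# Hypothetical straight-line paths: dX/dz = X1, dY/dz = Y1, dx/dz = x1.
def X(z):
    return X0 + z * X1

def Y(z):
    return Y0 + z * Y1

def xz(z):
    return x0 + z * x1

def d_dz(F):
    """Central difference of an array-valued F(z) at z = 0."""
    return (F(h) - F(-h)) / (2 * h)

def jacobian_T(fn, x, eps=1e-6):
    """Numerical dy^T/dx: entry (i, j) is d y(j) / d x(i)."""
    J = np.zeros((x.size, fn(x).size))
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        J[i] = (fn(x + e) - fn(x - e)) / (2 * eps)
    return J

checks = {
    "6a": np.allclose(d_dz(lambda z: inv(X(z))), -inv(X0) @ X1 @ inv(X0), atol=1e-6),
    "6b": np.allclose(d_dz(lambda z: A @ xz(z)), A @ x1, atol=1e-6),
    "6c": np.allclose(d_dz(lambda z: X(z) @ Y(z)), X0 @ Y1 + X1 @ Y0, atol=1e-6),
    "6d": np.allclose(d_dz(lambda z: A @ X(z) @ B), A @ X1 @ B, atol=1e-6),
    "6e": np.allclose(jacobian_T(lambda v: v @ A, x0), A, atol=1e-6),
    "6f": np.allclose(jacobian_T(lambda v: v, x0), np.eye(n), atol=1e-6),
    "6g": np.allclose(jacobian_T(lambda v: (v @ A @ v) * v, x0),
                      np.outer((A + A.T) @ x0, x0) + (x0 @ A @ x0) * np.eye(n), atol=1e-5),
}
```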

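0.7 constrained maximization

the maximum over x of the quadratic form

$$\mu^T x - \frac{1}{2} x^T A^{-1} x \qquad (7a)$$

subject to the J conditions $c_j(x) = 0$ is attained at

$$x^* = A\mu + AC\Lambda, \qquad \Lambda = -(C^T A C)^{-1} C^T A \mu \qquad (7b)$$

where the jth column of C is $\partial c_j(x)/\partial x$. this closed form assumes that A is symmetric positive definite, so that a maximum exists, and that the constraints are linear and homogeneous, $C^T x = 0$, so that C is constant.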
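here is a numerical check of (7b). it assumes NumPy, a symmetric positive-definite A and linear homogeneous constraints C^Tx = 0. the reference solution is computed independently: x is restricted to the null space of C^T, obtained from an SVD, and the resulting unconstrained quadratic is maximized in closed form.

```python
import numpy as np

rng = np.random.default_rng(6)
n, J = 5, 2
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)       # symmetric positive definite, so (7a) has a unique maximum
mu = rng.standard_normal(n)
C = rng.standard_normal((n, J))   # assumed linear constraints C^T x = 0; column j is dc_j/dx

# (7b): Lambda = -(C^T A C)^{-1} C^T A mu, then x* = A mu + A C Lambda
Lam = -np.linalg.solve(C.T @ A @ C, C.T @ A @ mu)
x_star = A @ mu + A @ C @ Lam

# Independent check: write feasible x as N t with C^T N = 0, then maximise
# mu^T N t - 0.5 t^T (N^T A^{-1} N) t over t in closed form.
_, _, Vt = np.linalg.svd(C.T)     # full_matrices=True by default, so Vt is n x n
N = Vt[J:].T                      # last n - J right singular vectors span null(C^T)
t = np.linalg.solve(N.T @ np.linalg.inv(A) @ N, N.T @ mu)
x_ref = N @ t
```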

0.8 symmetric matrices

real symmetric matrices have real eigenvalues, though perhaps not distinct, and can always be diagonalized to the form:

$$A = C \Lambda C^T \qquad (8)$$

where the columns of C are (orthonormal) eigenvectors (i.e. $CC^T = I$) and the diagonal of Λ holds the eigenvalues.
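in NumPy, `np.linalg.eigh` is the routine specialised for symmetric (Hermitian) matrices. the sketch below uses a random test matrix, which is an assumption, and checks that `eigh` returns real eigenvalues and an orthonormal C that reproduces A = CΛC^T as in (8).

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                          # an arbitrary real symmetric matrix

lam, C = np.linalg.eigh(A)                 # real eigenvalues (ascending), eigenvectors as columns
Lam = np.diag(lam)

ok_8 = np.allclose(A, C @ Lam @ C.T)       # A = C Lambda C^T
ok_orth = np.allclose(C @ C.T, np.eye(4))  # C C^T = I
```

`eigh` returns an orthonormal set of eigenvectors even when eigenvalues repeat. that is the "perhaps not distinct" case mentioned above.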

0.9 block matrices

for conformably partitioned block matrices, addition and multiplication are performed by adding and multiplying blocks in exactly the same way as scalar elements of regular matrices.

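however, determinants and inverses of block matrices are very tricky; for 2 blocks by 2 blocks the results are:

$$\begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix} = |A_{22}| \cdot |F_{11}| = |A_{11}| \cdot |F_{22}| \qquad (9a)$$

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} F_{11}^{-1} & -A_{11}^{-1} A_{12} F_{22}^{-1} \\ -F_{22}^{-1} A_{21} A_{11}^{-1} & F_{22}^{-1} \end{bmatrix} \qquad (9b)$$

$$= \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1} A_{12} F_{22}^{-1} A_{21} A_{11}^{-1} & -F_{11}^{-1} A_{12} A_{22}^{-1} \\ -A_{22}^{-1} A_{21} F_{11}^{-1} & A_{22}^{-1} + A_{22}^{-1} A_{21} F_{11}^{-1} A_{12} A_{22}^{-1} \end{bmatrix} \qquad (9c)$$

where

$$F_{11} = A_{11} - A_{12} A_{22}^{-1} A_{21}$$
$$F_{22} = A_{22} - A_{21} A_{11}^{-1} A_{12}$$

for block diagonal matrices things are much easier:

$$\begin{vmatrix} A_{11} & 0 \\ 0 & A_{22} \end{vmatrix} = |A_{11}|\,|A_{22}| \qquad (9d)$$

$$\begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{bmatrix} \qquad (9e)$$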
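the block formulas are easy to get wrong by a transpose or a subscript. here is a numerical check of (9a) to (9e) using `np.block`, assuming NumPy. the block sizes p = 3 and q = 2 are illustrative choices. so are the diagonal shift and the scaled-down off-diagonal blocks, which keep A11, A22, F11 and F22 invertible.

```python
import numpy as np

rng = np.random.default_rng(8)
p, q = 3, 2
inv, det = np.linalg.inv, np.linalg.det

# Diagonal blocks shifted by 5 I and small off-diagonal blocks keep every inverse well defined.
A11 = rng.standard_normal((p, p)) + 5 * np.eye(p)
A22 = rng.standard_normal((q, q)) + 5 * np.eye(q)
A12 = 0.5 * rng.standard_normal((p, q))
A21 = 0.5 * rng.standard_normal((q, p))
M = np.block([[A11, A12], [A21, A22]])

# Schur complements as defined under (9c)
F11 = A11 - A12 @ inv(A22) @ A21
F22 = A22 - A21 @ inv(A11) @ A12

ok_9a = np.isclose(det(M), det(A22) * det(F11)) and np.isclose(det(M), det(A11) * det(F22))
inv_9b = np.block([
    [inv(F11), -inv(A11) @ A12 @ inv(F22)],
    [-inv(F22) @ A21 @ inv(A11), inv(F22)],
])
inv_9c = np.block([
    [inv(A11) + inv(A11) @ A12 @ inv(F22) @ A21 @ inv(A11), -inv(F11) @ A12 @ inv(A22)],
    [-inv(A22) @ A21 @ inv(F11), inv(A22) + inv(A22) @ A21 @ inv(F11) @ A12 @ inv(A22)],
])
ok_9b = np.allclose(inv_9b, inv(M))
ok_9c = np.allclose(inv_9c, inv(M))

# Block diagonal case, (9d) and (9e)
Z12, Z21 = np.zeros((p, q)), np.zeros((q, p))
D = np.block([[A11, Z12], [Z21, A22]])
ok_9d = np.isclose(det(D), det(A11) * det(A22))
ok_9e = np.allclose(inv(D), np.block([[inv(A11), Z12], [Z21, inv(A22)]]))
```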

0.10 matrix inversion lemma (sherman-morrison-woodbury)

using the above results for block matrices we can make some substitutions and get the following important results:

$$(A + XBX^T)^{-1} = A^{-1} - A^{-1} X (B^{-1} + X^T A^{-1} X)^{-1} X^T A^{-1} \qquad (10)$$
$$|A + XBX^T| = |B|\,|A|\,|B^{-1} + X^T A^{-1} X| \qquad (11)$$

where A and B are square and invertible matrices but need not be of the same dimension. this lemma often allows a really hard inverse to be converted into an easy inverse. the most typical example of this is when A is large but diagonal, and X has many rows but few columns.
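here is a sketch of that typical use case, assuming NumPy. A is stored only as its diagonal `a`, so applying A^{-1} is an element-wise division, and the only linear system solved is k × k. `woodbury_inverse` is a hypothetical helper. the dense n × n matrix is formed only for comparison. (11) is checked in log form via `slogdet`, because determinants of large matrices easily overflow.

```python
import numpy as np

def woodbury_inverse(a, X, B):
    """(diag(a) + X B X^T)^{-1} via (10); the only system solved is k x k.

    a: length-n diagonal of A (all entries nonzero); X: n x k; B: k x k invertible.
    """
    AinvX = X / a[:, None]                  # A^{-1} X without forming the n x n matrix A
    core = np.linalg.inv(B) + X.T @ AinvX   # B^{-1} + X^T A^{-1} X, only k x k
    return np.diag(1.0 / a) - AinvX @ np.linalg.solve(core, AinvX.T)

rng = np.random.default_rng(9)
n, k = 500, 3                               # "many rows but few columns"
a = rng.uniform(1.0, 2.0, size=n)
X = rng.standard_normal((n, k))
B = np.diag(rng.uniform(1.0, 2.0, size=k))

full = np.diag(a) + X @ B @ X.T             # dense n x n, built only for comparison
ok_10 = np.allclose(woodbury_inverse(a, X, B), np.linalg.inv(full))

# (11) in log form
lhs = np.linalg.slogdet(full)[1]
rhs = (np.linalg.slogdet(B)[1] + np.sum(np.log(a))
       + np.linalg.slogdet(np.linalg.inv(B) + X.T @ (X / a[:, None]))[1])
ok_11 = np.isclose(lhs, rhs)
```

in practice the result is usually kept in factored form, or applied directly to a vector, rather than materialised as an n × n inverse.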
