matrix identities
sam roweis (revised June 1999)
note that a, b, c and A, B, C do not depend on X, Y, x, y or z.
0.1 basic formulae

A(B + C) = AB + AC                                (1a)
(A + B)^T = A^T + B^T                             (1b)
(AB)^T = B^T A^T                                  (1c)

if individual inverses exist,

(AB)^{-1} = B^{-1} A^{-1}                         (1d)
(A^{-1})^T = (A^T)^{-1}                           (1e)
0.2 trace, determinant and rank

|AB| = |A| |B|                                    (2a)
|A^{-1}| = 1/|A|                                  (2b)
|A| = ∏ evals                                     (2c)
Tr[A] = ∑ evals                                   (2d)

if the cyclic products are well defined,

Tr[ABC...] = Tr[BC...A] = Tr[C...AB] = ...        (2e)

rank[A] = rank[A^T A] = rank[A A^T]               (2f)

condition number = γ = (biggest eval)/(smallest eval)   (2g)
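the identities above are easy to spot-check numerically; a minimal sketch, assuming numpy is available, verifying (2a)-(2e) on random matrices (random inputs make an accidental pass very unlikely):

```python
# numerical spot-check of (2a)-(2e); random 4x4 matrices
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 4))

evals = np.linalg.eigvals(A)   # possibly complex, in conjugate pairs

assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))  # (2a)
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A))      # (2b)
assert np.isclose(np.linalg.det(A), np.prod(evals).real)                      # (2c)
assert np.isclose(np.trace(A), np.sum(evals).real)                            # (2d)
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))                   # (2e)
```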
derivatives of scalar forms with respect to scalars, vectors, or matrices are indexed in the obvious way. similarly, the indexing for derivatives of vectors and matrices with respect to scalars is straightforward.
0.3 derivatives of traces

∂Tr[X]/∂X = I                                     (3a)
∂Tr[XA]/∂X = ∂Tr[AX]/∂X = A^T                     (3b)
∂Tr[X^T A]/∂X = ∂Tr[A X^T]/∂X = A                 (3c)
∂Tr[X^T A X]/∂X = (A + A^T) X                     (3d)
∂Tr[X^{-1} A]/∂X = −X^{-1} A^T X^{-1}             (3e)
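trace derivatives like (3b) can be verified by central differences on each entry of X; a sketch assuming numpy:

```python
# finite-difference check of (3b): dTr[XA]/dX = A^T
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3))
A = rng.standard_normal((3, 3))
eps = 1e-6

grad = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        dX = np.zeros_like(X)
        dX[i, j] = eps
        # central difference in the (i, j) entry of X
        grad[i, j] = (np.trace((X + dX) @ A) - np.trace((X - dX) @ A)) / (2 * eps)

assert np.allclose(grad, A.T, atol=1e-4)
```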
0.4 derivatives of determinants

∂|AXB|/∂X = |AXB| (X^{-1})^T = |AXB| (X^T)^{-1}   (4a)
∂ln|X|/∂X = (X^{-1})^T = (X^T)^{-1}               (4b)
∂ln|X(z)|/∂z = Tr[X^{-1} ∂X/∂z]                   (4c)
∂|X^T A X|/∂X = |X^T A X| (A X (X^T A X)^{-1} + A^T X (X^T A^T X)^{-1})   (4d)
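(4b) can be checked the same way; a sketch assuming numpy, using `slogdet` so the log of the determinant is computed robustly:

```python
# finite-difference check of (4b): d ln|X| / dX = (X^{-1})^T
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # shifted so |X| stays positive
eps = 1e-6

grad = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        dX = np.zeros_like(X)
        dX[i, j] = eps
        # slogdet returns (sign, log|det|); index [1] is the log-determinant
        grad[i, j] = (np.linalg.slogdet(X + dX)[1] -
                      np.linalg.slogdet(X - dX)[1]) / (2 * eps)

assert np.allclose(grad, np.linalg.inv(X).T, atol=1e-4)
```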
0.5 derivatives of scalar forms

∂(a^T x)/∂x = ∂(x^T a)/∂x = a                     (5a)
∂(x^T A x)/∂x = (A + A^T) x                       (5b)
∂(a^T X b)/∂X = a b^T                             (5c)
∂(a^T X^T b)/∂X = b a^T                           (5d)
∂(a^T X a)/∂X = ∂(a^T X^T a)/∂X = a a^T           (5e)
∂(a^T X^T C X b)/∂X = C^T X a b^T + C X b a^T     (5f)
∂((Xa + b)^T C (Xa + b))/∂X = (C + C^T)(Xa + b) a^T   (5g)
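the quadratic-form gradient (5b) is worth a quick check, since the (A + A^T) factor is the part people forget when A is not symmetric; a numpy sketch:

```python
# finite-difference check of (5b): d(x^T A x)/dx = (A + A^T) x
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))   # deliberately not symmetric
x = rng.standard_normal(4)
eps = 1e-6

grad = np.zeros(4)
for i in range(4):
    dx = np.zeros(4)
    dx[i] = eps
    grad[i] = ((x + dx) @ A @ (x + dx) - (x - dx) @ A @ (x - dx)) / (2 * eps)

assert np.allclose(grad, (A + A.T) @ x, atol=1e-4)
```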
the derivative of one vector y with respect to another vector x is a matrix whose (i, j)th element is ∂y(j)/∂x(i). such a derivative should be written as ∂y^T/∂x, in which case it is the Jacobian matrix of y wrt x. its determinant represents the ratio of the hypervolume dy to that of dx, so that ∫ f(y) dy = ∫ f(y(x)) |∂y^T/∂x| dx. however, the sloppy forms ∂y/∂x, ∂y^T/∂x^T and ∂y/∂x^T are often used for this Jacobian matrix.
0.6 derivatives of vector/matrix forms

∂(X^{-1})/∂z = −X^{-1} (∂X/∂z) X^{-1}             (6a)
∂(Ax)/∂z = A ∂x/∂z                                (6b)
∂(XY)/∂z = X ∂Y/∂z + (∂X/∂z) Y                    (6c)
∂(AXB)/∂z = A (∂X/∂z) B                           (6d)
∂(x^T A)/∂x = A                                   (6e)
∂(x^T)/∂x = I                                     (6f)
∂(x^T A x x^T)/∂x = (A + A^T) x x^T + x^T A x I   (6g)
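the inverse derivative (6a) can be checked along a one-parameter family X(z) = X0 + z D, so that ∂X/∂z = D; a sketch assuming numpy:

```python
# check of (6a): d(X^{-1})/dz = -X^{-1} (dX/dz) X^{-1} along X(z) = X0 + z D
import numpy as np

rng = np.random.default_rng(4)
X0 = rng.standard_normal((3, 3)) + 3 * np.eye(3)  # shifted away from singularity
D = rng.standard_normal((3, 3))                   # plays the role of dX/dz
eps = 1e-6

inv = np.linalg.inv
fd = (inv(X0 + eps * D) - inv(X0 - eps * D)) / (2 * eps)  # central difference
analytic = -inv(X0) @ D @ inv(X0)

assert np.allclose(fd, analytic, atol=1e-4)
```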
0.7 constrained maximization

the maximum over x of the quadratic form:

μ^T x − (1/2) x^T A^{-1} x                        (7a)

subject to the J conditions c_j(x) = 0 is given by:

x = A μ + A C Λ,    Λ = −(C^T A C)^{-1} C^T A μ   (7b)

where the jth column of C is ∂c_j(x)/∂x.
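a sanity check of (7b), assuming numpy and, for concreteness, linear constraints c_j(x) = C[:, j]^T x and a symmetric positive definite A (so the quadratic form has a maximum): the maximizer should satisfy the constraints, and the gradient μ − A^{-1} x* should equal −C Λ, i.e. lie in the span of the constraint normals.

```python
# sanity check of (7b) for linear constraints C^T x = 0
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)          # symmetric positive definite
mu = rng.standard_normal(4)
C = rng.standard_normal((4, 2))      # J = 2 constraint normals as columns

Lam = -np.linalg.solve(C.T @ A @ C, C.T @ A @ mu)   # (C^T A C)^{-1} C^T A mu
x_star = A @ mu + A @ C @ Lam

assert np.allclose(C.T @ x_star, 0, atol=1e-8)      # constraints hold
grad = mu - np.linalg.solve(A, x_star)              # gradient of (7a) at x*
assert np.allclose(grad, -C @ Lam)                  # stationarity
```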
0.8 symmetric matrices

have real eigenvalues, though perhaps not distinct, and can always be diagonalized to the form:

A = C Λ C^T                                       (8)

where the columns of C are the (orthonormal) eigenvectors (i.e. C C^T = I) and the diagonal of Λ holds the eigenvalues.
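the decomposition (8) is exactly what numpy's `eigh` computes for a symmetric input; a minimal sketch:

```python
# check of (8): A = C Lambda C^T with orthonormal eigenvector columns
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 4))
A = M + M.T                      # symmetrize

evals, C = np.linalg.eigh(A)     # real eigenvalues, orthonormal eigenvectors
Lam = np.diag(evals)

assert np.allclose(C @ Lam @ C.T, A)        # (8)
assert np.allclose(C @ C.T, np.eye(4))      # orthonormality
```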
0.9 block matrices

for conformably partitioned block matrices, addition and multiplication are performed by adding and multiplying blocks in exactly the same way as the scalar elements of regular matrices.
however, determinants and inverses of block matrices are very tricky; for 2 blocks by 2 blocks the results are:

|[A11 A12; A21 A22]| = |A22| |F11| = |A11| |F22|                      (9a)

[A11 A12; A21 A22]^{-1}
  = [F11^{-1}, −A11^{-1} A12 F22^{-1};
     −F22^{-1} A21 A11^{-1}, F22^{-1}]                                (9b)
  = [A11^{-1} + A11^{-1} A12 F22^{-1} A21 A11^{-1}, −F11^{-1} A12 A22^{-1};
     −A22^{-1} A21 F11^{-1}, A22^{-1} + A22^{-1} A21 F11^{-1} A12 A22^{-1}]   (9c)

where

F11 = A11 − A12 A22^{-1} A21
F22 = A22 − A21 A11^{-1} A12

for block diagonal matrices things are much easier:

|[A11 0; 0 A22]| = |A11| |A22|                                        (9d)

[A11 0; 0 A22]^{-1} = [A11^{-1} 0; 0 A22^{-1}]                        (9e)
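the block determinant and inverse formulas can be spot-checked against numpy's dense routines on a random 2-by-2-block matrix; a sketch (the diagonal shifts just keep the blocks comfortably invertible):

```python
# spot-check of (9a) and (9b) on a random 4x4 matrix built from 2x2 blocks
import numpy as np

rng = np.random.default_rng(7)
A11 = rng.standard_normal((2, 2)) + 3 * np.eye(2)
A12 = rng.standard_normal((2, 2))
A21 = rng.standard_normal((2, 2))
A22 = rng.standard_normal((2, 2)) + 3 * np.eye(2)
M = np.block([[A11, A12], [A21, A22]])

inv, det = np.linalg.inv, np.linalg.det
F11 = A11 - A12 @ inv(A22) @ A21    # Schur complements
F22 = A22 - A21 @ inv(A11) @ A12

assert np.isclose(det(M), det(A22) * det(F11))    # (9a)
assert np.isclose(det(M), det(A11) * det(F22))    # (9a)

Minv = np.block([[inv(F11),                  -inv(A11) @ A12 @ inv(F22)],
                 [-inv(F22) @ A21 @ inv(A11), inv(F22)]])
assert np.allclose(Minv, inv(M))                  # (9b)
```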
0.10 matrix inversion lemma (sherman-morrison-woodbury)

using the above results for block matrices we can make some substitutions and get the following important results:

(A + X B X^T)^{-1} = A^{-1} − A^{-1} X (B^{-1} + X^T A^{-1} X)^{-1} X^T A^{-1}   (10)

|A + X B X^T| = |B| |A| |B^{-1} + X^T A^{-1} X|                       (11)

where A and B are square and invertible matrices but need not be of the same dimension. this lemma often allows a really hard inverse to be converted into an easy inverse. the most typical example of this is when A is large but diagonal, and X has many rows but few columns.
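a sketch of exactly that typical case, assuming numpy: A is 50 x 50 but diagonal (so A^{-1} is trivial), X is 50 x 3, and the only real inverse on the right-hand side of (10) is 3 x 3.

```python
# check of (10) and (11) with large diagonal A and tall thin X
import numpy as np

rng = np.random.default_rng(8)
n, k = 50, 3
A = np.diag(rng.uniform(1.0, 2.0, n))       # large but diagonal
X = rng.standard_normal((n, k))             # many rows, few columns
B = np.eye(k) + 0.1 * rng.standard_normal((k, k))

lhs = np.linalg.inv(A + X @ B @ X.T)        # the "really hard" n x n inverse
Ainv = np.diag(1.0 / np.diag(A))            # the easy inverse of diagonal A
small = np.linalg.inv(np.linalg.inv(B) + X.T @ Ainv @ X)   # only k x k
rhs = Ainv - Ainv @ X @ small @ X.T @ Ainv
assert np.allclose(lhs, rhs)                # (10)

assert np.isclose(np.linalg.det(A + X @ B @ X.T),
                  np.linalg.det(B) * np.linalg.det(A) *
                  np.linalg.det(np.linalg.inv(B) + X.T @ Ainv @ X))   # (11)
```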