Understanding Special Relativity
What is the difference between SR (special relativity) and GR (general relativity)?
SR is the theory of relativity in the absence of gravity; equivalently, it is the theory of relativity in flat spacetime. GR is really the theory of gravity: it is the theory of relativity in curved spacetime.
To understand SR, you can start with the postulate that the speed of light is the same for all observers (inertial reference frames). This can be stated equivalently as saying that the length of a spacetime displacement vector (the vector representing the motion of light from one point to another) is invariant between two inertial reference frames under the Minkowski metric, i.e.:

\[ \Delta \mathbf{x}'^{T} \eta \, \Delta \mathbf{x}' = \Delta \mathbf{x}^{T} \eta \, \Delta \mathbf{x} \tag{1} \]
The prime symbol ' is used to denote a co-ordinate in frame 2. The Lorentz transformation can then be derived from this assumption. See [1] for some discussion on whether this is right. In essence, the above constraint translates to the following equivalent constraint on the Lorentz transformation matrix \(\Lambda\):

\[ \Lambda^{T} \eta \, \Lambda = \eta \tag{2} \]
where \(\eta\) is the Minkowski metric (or matrix, depending on how you want to look at it), here in the \((-,+,+,+)\) sign convention:

\[ \eta = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
Remember (1) and (2) are equivalent. You can start from (1) and derive (2), or go in the reverse direction if you so desire.
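The equivalence of these two constraints is easy to check numerically. Here is a minimal Python sketch; the boost speed \(\beta = 0.6\) and the sample displacement are arbitrary illustrative choices, not from the text:

```python
# Numerical check that a Lorentz boost Lambda satisfies Lambda^T eta Lambda = eta,
# and hence preserves the Minkowski length of any displacement vector.
# Convention: eta = diag(-1, 1, 1, 1), units with c = 1; beta = 0.6 is an arbitrary choice.
import math

beta = 0.6
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Boost along x; rows/columns ordered (t, x, y, z).
Lam = [[gamma, -gamma * beta, 0, 0],
       [-gamma * beta, gamma, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]

eta = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transpose(A):
    return [list(row) for row in zip(*A)]

# Constraint on the matrix: Lambda^T eta Lambda should equal eta.
lhs = matmul(transpose(Lam), matmul(eta, Lam))
assert all(abs(lhs[i][j] - eta[i][j]) < 1e-12 for i in range(4) for j in range(4))

# Invariance of the interval of an arbitrary displacement (dt, dx, dy, dz).
dx = [1.0, 0.5, -2.0, 3.0]
dxp = [sum(Lam[i][j] * dx[j] for j in range(4)) for i in range(4)]
interval = lambda v: sum(eta[i][j] * v[i] * v[j] for i in range(4) for j in range(4))
assert abs(interval(dxp) - interval(dx)) < 1e-12
```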
As I was reading Weinberg p. 35 I was very confused. He says:
for any quantity that undergoes the transformation \( \mathbf{V'} = \mathbf{\Lambda V}\), such a \( \mathbf{V}\) should be called a contravariant 4 vector to distinguish it from a covariant 4 vector defined as a quantity \( \mathbf{U}\) whose transformation rule is \( \mathbf{U'} = \mathbf{\Lambda^{-T} U}\)
Gravitation and Cosmology p. 35
How can he decide at whim that some vectors he will transform as \( \mathbf{\Lambda V}\) and that others he will transform as \( \mathbf{U'} = \mathbf{\Lambda^{-T}U}\) ?
This bothered me to no end and kept me up for many days until I realized that the Lorentz transformation matrix \( \mathbf{\Lambda}\) tells how co-ordinates transform from one inertial reference frame to another. This is not to be confused with how a quantity such as a 4 vector transforms between the two reference frames. To his credit, the complete text from Weinberg’s book is:
for any quantity that undergoes the transformation \( \mathbf{V'} = \mathbf{\Lambda V}\) when the co-ordinate system is transformed by \( \mathbf{x'} = \mathbf{\Lambda x}\) …
Gravitation and Cosmology p. 35
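One way to see why both transformation rules are needed: if contravariant vectors transform as \( \mathbf{V'} = \mathbf{\Lambda V}\) and covariant ones as \( \mathbf{U'} = \mathbf{\Lambda^{-T} U}\), then the contraction \(\sum_i U_i V_i\) comes out the same in every frame. A minimal Python sketch; the velocity and sample vectors are made-up values:

```python
# If V' = Lam V (contravariant) and U' = (Lam^{-1})^T U (covariant), then
# the contraction sum_i U_i V_i is frame-invariant. Numbers are illustrative.
import math

def boost(b):
    # Lorentz boost along x with speed b (c = 1), ordered (t, x, y, z).
    g = 1.0 / math.sqrt(1.0 - b**2)
    return [[g, -g*b, 0, 0], [-g*b, g, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

Lam = boost(0.6)
Lam_inv = boost(-0.6)                        # inverse boost: just flip the velocity
Lam_inv_T = [list(r) for r in zip(*Lam_inv)]

V = [1.0, 2.0, 3.0, 4.0]                     # contravariant components in frame 1
U = [0.5, -1.0, 0.25, 2.0]                   # covariant components in frame 1

Vp = [sum(Lam[i][j] * V[j] for j in range(4)) for i in range(4)]
Up = [sum(Lam_inv_T[i][j] * U[j] for j in range(4)) for i in range(4)]

dot = lambda a, b: sum(x * y for x, y in zip(a, b))
assert abs(dot(Up, Vp) - dot(U, V)) < 1e-12  # the contraction is invariant
```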
More generally, a contravariant vector is a quantity whose i-th component in the other reference frame is given by:

\[ V'_i = \sum_{j} \frac{\partial x'_i}{\partial x_j} V_j \]
Here \(\frac{\partial x'_i}{\partial x_j}\) is the partial derivative of the i-th co-ordinate in frame 2 w.r.t. the j-th co-ordinate in frame 1, evaluated at some point. Remember these co-ordinates are like functions or variables of multivariate calculus. This can be expressed in matrix form using the Jacobian matrix as:

\[ \mathbf{V'} = J \mathbf{V}, \qquad J_{ij} = \frac{\partial x'_i}{\partial x_j} \]
Recall the Jacobian matrix is the matrix of first-order partial derivatives. When the transformation between the co-ordinates is linear, i.e., \(\mathbf{x' = M x}\), then the Jacobian is the same as \(\mathbf{M}\). This is a generalization of the fact that when \(y = k x\), \(dy / dx = k\). In our case, \(\mathbf{M = \Lambda}\).
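A quick numerical sanity check of this fact, using a finite-difference Jacobian; the matrix \(\mathbf{M}\) and base point are arbitrary example values:

```python
# For a linear map x' = M x, the Jacobian dx'_i/dx_j is just M_ij,
# generalizing dy/dx = k for y = k x. M and the base point are made-up values.
M = [[2.0, 1.0], [0.5, 3.0]]

def f(x):                                    # the linear map x' = M x
    return [sum(M[i][j] * x[j] for j in range(2)) for i in range(2)]

x0, h = [1.0, -2.0], 1e-6
base = f(x0)
jac = [[0.0, 0.0], [0.0, 0.0]]
for j in range(2):
    xp = list(x0)
    xp[j] += h
    col = f(xp)
    for i in range(2):
        jac[i][j] = (col[i] - base[i]) / h   # finite-difference estimate of dx'_i/dx_j

# The numerical Jacobian matches M entry by entry.
assert all(abs(jac[i][j] - M[i][j]) < 1e-4 for i in range(2) for j in range(2))
```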
Note very, very carefully that when you are dealing with multivariate calculus:

\[ \frac{\partial x_i}{\partial x'_j} \neq 1 \Big/ \frac{\partial x'_j}{\partial x_i} \]
This is a statement of the fact that given a matrix \(\mathbf{M}\) (in our case replace \(\mathbf{M}\) with the Jacobian \(\mathbf{J}\)):

\[ (M^{-1})_{ij} \neq 1 / M_{ij} \]
To get the partial derivatives in the other direction, we have to invert the Jacobian using the laws of matrix inversion. i.e., given:

\[ J_{ij} = \frac{\partial x'_i}{\partial x_j} \]
\(\frac{\partial x_{i}}{\partial x'_{j}}\) is given by \(J^{-1}\). e.g., consider Cartesian co-ordinates vs. spherical co-ordinates. Given:

\[ J = \frac{\partial (r, \theta, \phi)}{\partial (x, y, z)} \]
it then follows that \(J^{-1}\) will give us:

\[ J^{-1} = \frac{\partial (x, y, z)}{\partial (r, \theta, \phi)} \]
\(J\) will be multiplied with a \((\delta x, \delta y, \delta z)\) to give a \((\delta r, \delta \theta, \delta \phi)\), whereas \(J^{-1}\) will be multiplied with a \((\delta r, \delta \theta, \delta \phi)\) to give a \((\delta x, \delta y, \delta z)\).
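This inverse relationship can be sketched numerically: compute both Jacobians by finite differences and check that their product is the identity. The test point is an arbitrary choice away from the coordinate singularities at \(r = 0\) and \(\theta = 0\):

```python
# Numerical sketch: the Jacobian of (x,y,z) -> (r,theta,phi) and the Jacobian of
# (r,theta,phi) -> (x,y,z) are matrix inverses of each other. Test point is arbitrary.
import math

def to_cartesian(p):
    r, th, ph = p
    return [r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th)]

def to_spherical(p):
    x, y, z = p
    r = math.sqrt(x*x + y*y + z*z)
    return [r, math.acos(z / r), math.atan2(y, x)]

def jacobian(f, p, h=1e-7):
    # Forward-difference Jacobian of f at point p.
    base = f(p)
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        q = list(p)
        q[j] += h
        fq = f(q)
        for i in range(3):
            J[i][j] = (fq[i] - base[i]) / h
    return J

p_sph = [2.0, 0.7, 1.1]                      # (r, theta, phi), away from singularities
p_cart = to_cartesian(p_sph)
J = jacobian(to_spherical, p_cart)           # d(r,theta,phi)/d(x,y,z)
J_inv = jacobian(to_cartesian, p_sph)        # d(x,y,z)/d(r,theta,phi)

prod = [[sum(J[i][k] * J_inv[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
assert all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-5
           for i in range(3) for j in range(3))
```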
Raising and lowering indices
For this, realize that you require a metric tensor first of all. You cannot raise or lower indices without a metric tensor. Let's denote the (covariant) metric by \(g_{\mu \nu}\) and the (contravariant) metric by \(g^{\mu \nu}\).
Properties of metric tensor:
- \(g_{\mu \nu}\) and \(g^{\mu \nu}\) are inverses of each other. This is written mathematically as \(g_{i k}\, g^{k j} = \delta_{i}^{\,j}\) where \(\delta_{i}^{\,j} = 1\) for \(i=j\) and \(0\) otherwise. In matrix lingo this is nothing but \(g G = I\), where I have used \(g\) and \(G\) for the two matrices representing \(g_{\mu \nu}\) and \(g^{\mu \nu}\).
- The metric tensor is a tensor of rank 2, so it can be written as a \(d \times d\) matrix where \(d\) is the number of dimensions we are dealing with (\(d=4\) for us in physics: 3 for space, 1 for time).
- The metric tensor is symmetric, so \(g_{\mu \nu} = g_{\nu \mu}\).
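These properties are easy to verify in code for the Minkowski metric, using the \((-,+,+,+)\) sign convention:

```python
# Tiny check of the metric properties: g_{mu nu} is symmetric, and its
# contravariant partner g^{mu nu} is its matrix inverse (g G = I).
# For Minkowski, diag(-1, 1, 1, 1) happens to be its own inverse.
g = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # g_{mu nu}
G = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # g^{mu nu}

# Symmetry: g_{mu nu} = g_{nu mu}
assert all(g[m][n] == g[n][m] for m in range(4) for n in range(4))

# g_{ik} g^{kj} = delta_i^j, i.e. g G = I in matrix lingo
prod = [[sum(g[i][k] * G[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
assert prod == [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```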
Finally, we come to the algorithm to raise or lower an index on any (mixed) tensor. A mixed tensor is a tensor that has a mixture of contravariant and covariant indices. For computer programmers, it helps to think of a tensor like an N-D array in computer science. In pseudocode, here is a rank 4 tensor:
T = new float[d][d][d][d];
However, in addition to being an N-D array, a tensor (in physics) has to obey very specific rules when it's expressed in a different co-ordinate system. Only then can it be called a tensor. i.e., if \(T\) is the tensor written in frame 1 and \(T'\) is the same tensor written in frame 2, the numbers in the two N-D arrays have to be related to each other by very specific rules, which are the rules of contravariant and covariant transformations on the respective indices.
Anyway, here is the algorithm to lower the \(i\)-th index on a tensor \(T(...,i,...)\):
- First of all, we can do this operation only if \(i\) is upstairs in the tensor. If it's already downstairs then this operation is meaningless.
- Step 1: Write down the tensor and replace \(i\) with another symbol. I choose the next symbol in the alphabet after the last index of \(T(...,i,...)\). e.g., if we have \(T(i,j,k,l)\), I will use the symbol \(m\).
- Step 2: Multiply the above by \(g_{\mu \nu}\) and replace \(\nu\) with \(m\) and \(\mu\) with \(i\), so it becomes \(g_{im}\).
- Now you have a repeated index \(m\), and we must sum over it.
- The result is the tensor with \(i\) in the downstairs position.
Symbolically (making the summation explicit):

\[ T_{i} = \sum_{m} g_{i m} \, T^{m} \]
where I have hidden the other indices on \(T\) as they remain unchanged and carry over as-is to \(T\) on the LHS.
Since we are not doing matrix multiplication, it doesn’t matter if you write \(g_{im}\) to the left or right of \(T(...,m,...)\). Also since metric tensor is symmetric, it doesn’t matter if you use \(g_{im}\) or \(g_{mi}\).
The algorithm to raise an index is similar, except that we use \(g^{\mu \nu}\) in this case. The whole operation is also called contracting \(T\) with \(g_{\mu \nu}\) or \(g^{\mu \nu}\), as the case may be.
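The steps above translate directly into code. A minimal Python sketch lowering and then raising the first index of a rank-2 tensor with the Minkowski metric; the tensor's components are made-up numbers:

```python
# Lower the first index of a rank-2 tensor T^{ij} with the Minkowski metric,
# then raise it back. Tensor values are arbitrary illustrative numbers.
d = 4
g = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # g_{mu nu}
G = g                                                           # g^{mu nu}: diag(-1,1,1,1) is self-inverse

T = [[float(i * d + j) for j in range(d)] for i in range(d)]    # T^{ij}, made-up components

def lower_first(T_up):
    # T_i^j = sum_m g_{im} T^{mj}  (the repeated index m is summed over)
    return [[sum(g[i][m] * T_up[m][j] for m in range(d)) for j in range(d)] for i in range(d)]

def raise_first(T_low):
    # T^{ij} = sum_m g^{im} T_m^j
    return [[sum(G[i][m] * T_low[m][j] for m in range(d)) for j in range(d)] for i in range(d)]

T_low = lower_first(T)
assert T_low[0][2] == -T[0][2]   # with this metric, lowering a time index flips the sign
assert raise_first(T_low) == T   # raising undoes lowering, as expected from g G = I
```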
Also remember that in the usual notation used in the literature, primes (e.g., \(T\) vs. \(T'\)) are used to denote the same quantity measured in different co-ordinate systems (a.k.a. reference frames), whereas raising or lowering indices is an operation you do within the same frame, and it changes the quantity. It is no longer the same. This should be evident from the above equation.
I think this post ended up being more about mathematics and less about physics. The two are intertwined. If you understand the notation and tensor math, you are already halfway to understanding relativity or other advanced topics like QFT. If you don't understand this language, the equations will all look like gibberish and you won't be able to make any progress.
Btw, I disagree very strongly with statements like time is an illusion. Different observers will measure time differently (because they use different inertial frames) but time is very much real.
Further Reading
- Scott Hughes's lectures on GR. The best resource to learn GR.