LOG#105. Einstein’s equations.


In 1905, one of Einstein’s achievements was to establish the theory of Special Relativity from two simple postulates and to deduce their physical consequences correctly (some of them only confirmed later). The essence of Special Relativity, as we have seen, is that all inertial observers must agree on the speed of light “in vacuum”, and that the physical laws (those of Mechanics and Electromagnetism) are the same for all of them. Different observers will measure (and then see) different wavelengths and frequencies, but the product of wavelength and frequency is the same. Wavelength and frequency are thus Lorentz covariant, meaning that they change for different observers according to some fixed mathematical prescription depending on their tensorial character (scalar, vector, tensor, …) with respect to Lorentz transformations. The speed of light is Lorentz invariant.

On the other hand, Newton’s law of gravity describes the motion of planets and terrestrial bodies. It is all we need in contemporary rocket ships, unless those devices also carry atomic clocks or other tools of exceptional accuracy. Here is Newton’s law in potential form:

4\pi G\rho = \nabla ^2 \phi

In the special relativity framework, this equation has a terrible problem: if there is a change in the mass density \rho, then it must propagate everywhere instantaneously. If you believe in the Special Relativity rules and in the invariance of the speed of light, that is impossible. Therefore, “Houston, we have a problem”.
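
As a quick aside, the potential form of Newton’s law is easy to probe numerically. The following sketch (my own illustration, with the arbitrary choice of units G = M = 1 and an arbitrary sample point) verifies by finite differences that \phi=-GM/r satisfies \nabla^2\phi=0 away from the source, i.e., the vacuum case \rho=0 of the equation above:

```python
# Illustrative sketch: finite-difference check that phi = -G M / r solves
# nabla^2 phi = 4 pi G rho with rho = 0 away from the source.
# G = M = 1 is an arbitrary choice of units for the illustration.
import numpy as np

G = M = 1.0

def phi(p):
    """Newtonian point-mass potential."""
    return -G * M / np.linalg.norm(p)

def laplacian(f, p, h=1e-3):
    """Central-difference Laplacian of a scalar field f at point p."""
    total = 0.0
    for i in range(3):
        e = np.zeros(3)
        e[i] = h
        total += (f(p + e) - 2.0 * f(p) + f(p - e)) / h**2
    return total

point = np.array([1.0, 1.0, 1.0])   # any point away from the origin
residual = laplacian(phi, point)    # should be ~ 0 in vacuum
```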

Einstein was aware of this and tried to solve the inconsistency. The final solution took him ten years.

The apparently silly and easy problem is to develop and describe all of physics in the same way irrespective of whether one is accelerating or not. However, it is not easy or silly at all. It requires deep physical insight and a high-end mathematical language. Indeed, the most difficult part is the details of Riemannian geometry and tensor calculus on manifolds. Einstein got private aid from his friend Marcel Grossmann. In fact, Einstein knew that SR was not compatible with Newton’s law of gravity. He (re)discovered the equivalence principle, stated by Galileo himself long before him, but he interpreted it more deeply and sought the proper language to incorporate that principle in such a way that it was compatible (at least locally) with special relativity! His “journey” from 1907 to 1915 was a hard job and a continuous struggle with tensorial methods…

Today, we are going to derive the Einstein field equations for gravity, a set of equations for the “metric field” g_{\mu \nu}(x). Hilbert in fact arrived at Einstein’s field equations with the variational method we are going to use here, but Einstein’s methods were more physical and based on physical intuition. They are in fact “complementary” approaches. I urge you to read “The Meaning of Relativity” by A. Einstein for a summary of his discoveries.

We now proceed to derive Einstein’s Field Equations (EFE) for General Relativity (more properly, a relativistic theory of gravity):

Step 1. Let us begin with the so-called Einstein-Hilbert action (an ansatz).

S = \int d^4x \sqrt{-g} \left( \dfrac{c^4}{16 \pi G} R + \mathcal{L}_{\mathcal{M}} \right)

Be aware of the square root of the determinant of the metric as part of the volume element. It is important, since the volume element has to be invariant in curved spacetime (i.e., in the presence of a metric). It also plays a critical role in the derivation.

Step 2. We perform the variation with respect to the (inverse) metric field g^{\mu \nu}:

\delta S = \int d^4 x \left( \dfrac{c^4}{16 \pi G} \dfrac{\delta (\sqrt{-g}R)}{\delta g^{\mu \nu}} + \dfrac{\delta (\sqrt{-g}\mathcal{L}_{\mathcal{M}})}{\delta g^{\mu \nu}} \right) \delta g^{\mu \nu}

Step 3. Extract the square root of the metric determinant as a common factor and use the product rule on the term with the Ricci scalar R:

\delta S = \int d^4 x \sqrt{-g} \left( \dfrac{c^4}{16 \pi G} \left ( \dfrac{\delta R}{\delta g^{\mu \nu}} +\dfrac{R}{\sqrt{-g}}\dfrac{\delta \sqrt{-g}}{\delta g^{\mu \nu}} \right) +\dfrac{1}{\sqrt{-g}}\dfrac{\delta ( \sqrt{-g}\mathcal{L}_{\mathcal{M}})}{\delta g^{\mu\nu}}\right) \delta g^{\mu \nu}

Step 4.  Use the definition of a Ricci scalar as a contraction of the Ricci tensor to calculate the first term:

\dfrac{\delta R}{\delta g^{\mu \nu}} = \dfrac{\delta (g^{\mu \nu}R_{\mu \nu})}{\delta g^{\mu \nu} }= R_{\mu\nu} + g^{\mu \nu}\dfrac{\delta R_{\mu \nu}}{\delta g^{\mu \nu}} = R_{\mu \nu} + \mbox{total derivative}

A total derivative does not contribute to the variation of the action, so it can be neglected when finding the extremal point. Indeed, this is Stokes’ theorem in action. To show that the variation of the Ricci tensor is a total derivative, in case you don’t believe this fact, we can proceed as follows:

Check 1. Write  the Riemann curvature tensor:

R^{\rho}_{\, \sigma \mu \nu} = \partial _{\mu} \Gamma ^{\rho}_{\, \sigma \nu} - \partial_{\nu} \Gamma^{\rho}_{\, \sigma \mu}+ \Gamma^{\rho}_{\, \lambda \mu} \Gamma^{\lambda}_{\, \sigma \nu} - \Gamma^{\rho}_{\, \lambda \nu} \Gamma^{\lambda}_{\, \sigma \mu}

Note the striking resemblance to the non-abelian Yang-Mills (YM) field strength curvature two-form F=dA+A \wedge A, whose components read

F_{\mu \nu} = \partial _{\mu} A_{\nu} - \partial _{\nu} A_{\mu} + k \left[ A_\mu , A_{\nu} \right]

There are many terms with indices in the Riemann tensor calculation, but we can simplify stuff.

Check 2. We have to calculate the variation of the Riemann curvature tensor with respect to the metric tensor:

\delta R^{\rho}_{\, \sigma \mu \nu} = \partial _{\mu} \delta \Gamma^{\rho}_{\, \sigma \nu} - \partial_\nu \delta \Gamma^{\rho}_{\, \sigma \mu} + \delta \Gamma ^{\rho}_{\, \lambda \mu} \Gamma^{\lambda}_{\, \sigma \nu} - \delta \Gamma^{\rho}_{\lambda \nu}\Gamma^{\lambda}_{\, \sigma \mu} + \Gamma^{\rho}_{\, \lambda \mu}\delta \Gamma^{\lambda}_{\sigma \nu} - \Gamma^{\rho}_{\lambda \nu} \delta \Gamma^{\lambda}_{\, \sigma \mu}

One cannot calculate the covariant derivative of a connection since it does not transform like a tensor.  However, the difference of two connections does transform like a tensor.

Check 3. Calculate the covariant derivative of the variation of the connection:

\nabla_{\mu} ( \delta \Gamma^{\rho}_{\sigma \nu}) = \partial _{\mu} (\delta \Gamma^{\rho}_{\, \sigma \nu}) + \Gamma^{\rho}_{\, \lambda \mu} \delta \Gamma^{\lambda}_{\, \sigma \nu} - \delta \Gamma^{\rho}_{\, \lambda \sigma}\Gamma^{\lambda}_{\mu \nu} - \delta \Gamma^{\rho}_{\, \lambda \nu}\Gamma^{\lambda}_{\, \sigma \mu}

\nabla_{\nu} ( \delta \Gamma^{\rho}_{\sigma \mu}) = \partial _\nu (\delta \Gamma^{\rho}_{\, \sigma \mu}) + \Gamma^{\rho}_{\, \lambda \nu} \delta \Gamma^{\lambda}_{\, \sigma \mu} - \delta \Gamma^{\rho}_{\, \lambda \sigma}\Gamma^{\lambda}_{\mu \nu} - \delta \Gamma^{\rho}_{\, \lambda \mu}\Gamma^{\lambda}_{\, \sigma \nu}

Check 4. Rewrite the variation of the Riemann curvature tensor as the difference of two covariant derivatives of the variation of the connection written in Check 3, that is, subtract the two previous expressions from Check 3.

\delta R^{\rho}_{\, \sigma \mu \nu} = \nabla_{\mu} \left( \delta \Gamma^{\rho}_{\, \sigma \nu}\right) - \nabla _{\nu} \left(\delta \Gamma^{\rho}_{\, \sigma \mu}\right)

Check 5. Contract the result of Check 4.

\delta R^{\rho}_{\, \mu \rho \nu} = \delta R_{\mu \nu} = \nabla_{\rho} \left( \delta \Gamma^{\rho}_{\, \mu \nu}\right) - \nabla _{\nu} \left(\delta \Gamma^{\rho}_{\, \rho \mu}\right)

Check 6. Contract the result of Check 5:

g^{\mu \nu}\delta R_{\mu \nu} = \nabla_\rho (g^{\mu \nu} \delta \Gamma^{\rho}_{\mu\nu})-\nabla_\nu (g^{\mu \nu}\delta \Gamma^{\rho}_{\rho \mu}) = \nabla _\sigma (g^{\mu \nu}\delta \Gamma^{\sigma}_{\mu \nu}) - \nabla_\sigma (g^{\mu \sigma}\delta \Gamma ^{\rho}_{\rho \mu})

Therefore, we have

g^{\mu \nu}\delta R_{\mu \nu} = \nabla_\sigma (g^{\mu \nu}\delta \Gamma^{\sigma}_{\mu\nu}- g^{\mu \sigma}\delta \Gamma^{\rho}_{\rho\mu})=\nabla_\sigma K^\sigma


Step 5. The variation of the second term in the action is the next step.  Transform the coordinate system to one where the metric is diagonal and use the product rule:

\dfrac{R}{\sqrt{-g}} \dfrac{\delta \sqrt{-g}}{\delta g^{\mu \nu}}=\dfrac{R}{\sqrt{-g}} \dfrac{-1}{2 \sqrt{-g}}(-1) g g_{\mu \nu}\dfrac{\delta g^{\mu \nu}}{\delta g^{\mu \nu}} =- \dfrac{1}{2}g_{\mu \nu} R

The reason for the last equality is that g^{\alpha\mu}g_{\mu \beta}=\delta^{\alpha}_{\; \beta}, and then its variation is

\delta (g^{\alpha\mu}g_{\mu \nu}) = (\delta g^{\alpha\mu}) g_{\mu \nu} + g^{\alpha\mu}(\delta g_{\mu \nu}) = 0

Thus, multiplication by the inverse metric g^{\beta \nu} produces

\delta g^{\alpha \beta} = - g^{\alpha \mu}g^{\beta \nu}\delta g_{\mu \nu}

that is,

\dfrac{\delta g^{\alpha \beta}}{\delta g_{\mu \nu}}= -g^{\alpha \mu} g^{\beta \nu}
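
This first-order rule is easy to test numerically. The sketch below (my own illustration; the random positive-definite “metric” and the size of the perturbation are arbitrary assumptions) compares the exact change of the inverse matrix with the first-order prediction -g^{-1}\,\delta g\, g^{-1}:

```python
# Illustrative numerical check of  delta g^{ab} = -g^{am} g^{bn} delta g_{mn}:
# the exact change of the inverse matrix agrees with the first-order formula.
# The random positive-definite "metric" is an arbitrary assumption.
import numpy as np

rng = np.random.default_rng(0)

A = rng.normal(size=(4, 4))
g = A @ A.T + 4.0 * np.eye(4)         # symmetric, well-conditioned "metric"

B = rng.normal(size=(4, 4))
dg = 1e-6 * (B + B.T)                 # small symmetric variation delta g_{mn}

ginv = np.linalg.inv(g)
exact = np.linalg.inv(g + dg) - ginv  # true change of the inverse metric
first_order = -ginv @ dg @ ginv       # predicted first-order change

err = np.abs(exact - first_order).max()
```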

On the other hand, using Jacobi’s formula for the derivative of a determinant, we get that:

\delta g = g g^{\mu \nu} \delta g_{\mu \nu}

that is,

\dfrac{\delta g}{\delta g_{\alpha \beta}}= g g^{\alpha \beta}

because of the classical identity

g^{\alpha \beta}=\left( \det g \right)^{-1} Cof (g)^{\alpha \beta}

where the cofactor matrix satisfies

Cof (g)^{\alpha \beta} = \dfrac{\delta g}{\delta g_{\alpha \beta}}

and moreover

\delta \sqrt{-g}=-\dfrac{\delta g}{2 \sqrt{-g}}= -\dfrac{g\, g^{\mu \nu}\delta g_{\mu \nu}}{2 \sqrt{-g}}

so that

\delta \sqrt{-g}=\dfrac{1}{2}\sqrt{-g}\,g^{\mu \nu}\delta g_{\mu \nu}=-\dfrac{1}{2}\sqrt{-g}\,g_{\mu \nu}\delta g^{\mu \nu}
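
Jacobi’s formula can also be checked numerically. In the sketch below (my own illustration; the nearly-Minkowski metric and the perturbation size are arbitrary choices) the exact change of \sqrt{-g} is compared with \frac{1}{2}\sqrt{-g}\,g^{\mu\nu}\delta g_{\mu\nu}:

```python
# Illustrative check of Jacobi's formula:
# delta sqrt(-g) = (1/2) sqrt(-g) g^{mu nu} delta g_{mu nu}.
# The nearly-Minkowski metric and perturbation sizes are arbitrary choices.
import numpy as np

rng = np.random.default_rng(1)
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

P = rng.normal(size=(4, 4))
g = eta + 0.02 * (P + P.T)            # Lorentzian metric close to eta, det g < 0

Q = rng.normal(size=(4, 4))
dg = 1e-7 * (Q + Q.T)                 # small symmetric variation delta g_{mu nu}

def sqrt_minus_det(m):
    return np.sqrt(-np.linalg.det(m))

exact = sqrt_minus_det(g + dg) - sqrt_minus_det(g)
predicted = 0.5 * sqrt_minus_det(g) * np.trace(np.linalg.inv(g) @ dg)
err = abs(exact - predicted)
```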


Step 6. Define the stress-energy-momentum tensor from the third term in the action (the one coming from the matter Lagrangian):

T_{\mu \nu} = - \dfrac{2}{\sqrt{-g}}\dfrac{\delta (\sqrt{-g} \mathcal{L}_{\mathcal{M}})}{\delta g^{\mu \nu}}

or equivalently

-\dfrac{1}{2}T_{\mu \nu} = \dfrac{1}{\sqrt{-g}}\dfrac{\delta (\sqrt{-g} \mathcal{L}_{\mathcal{M}})}{\delta g^{\mu \nu}}

Step 7. The extremal principle. The variation of the Hilbert action will be an extremum when the coefficient of \delta g^{\mu \nu} in the integrand is equal to zero (since \delta g^{\mu \nu} is arbitrary):

\dfrac{c^4}{16\pi G}\left( R_{\mu \nu} - \dfrac{1}{2} g_{\mu \nu}R\right) - \dfrac{1}{2} T_{\mu \nu} = 0


\boxed{R_{\mu \nu} - \dfrac{1}{2}g_{\mu \nu} R = \dfrac{8\pi G}{c^4}T_{\mu\nu}}

Usually this is recast and simplified using the Einstein tensor

G_{\mu \nu}= R_{\mu \nu} - \dfrac{1}{2}g_{\mu \nu} R


\boxed{G_{\mu\nu}=\dfrac{8\pi G}{c^4}T_{\mu\nu}}
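
As a complementary remark (standard, though not part of the derivation above): taking the trace of the boxed equations with g^{\mu \nu} in four dimensions (where g^{\mu \nu}g_{\mu \nu}=4) gives R-2R=\dfrac{8\pi G}{c^4}T, i.e., R=-\dfrac{8\pi G}{c^4}T, so the field equations can also be written in the trace-reversed form

R_{\mu \nu}=\dfrac{8\pi G}{c^4}\left( T_{\mu \nu}-\dfrac{1}{2}g_{\mu \nu}T\right)

which shows at once that the vacuum equations (T_{\mu \nu}=0) reduce to R_{\mu \nu}=0.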

This deduction has been mathematical, but there is a deep physical picture behind it. Moreover, there are a huge number of physics issues one could go into. For instance, these equations couple gravity to particles with integral spin, which is good for bosons, but there are matter fermions that also participate in gravity, coupling to it. Gravity is universal. To include those fermion fields, one can consider the metric and the connection to be independent of each other. That is the so-called Palatini approach.

Final remark: you can add to the EFE above a “constant” times the metric tensor, since the “covariant derivative” of the metric vanishes. This constant is the cosmological constant (a.k.a. dark energy in contemporary physics). Then, the most general form of the EFE is:

\boxed{G_{\mu\nu}+\Lambda g_{\mu\nu}=\dfrac{8\pi G}{c^4}T_{\mu\nu}}

Einstein’s additional term was added in order to make the Universe “static”. After Hubble’s discovery of the expansion of the Universe, Einstein blamed himself for the introduction of such a term, since it prevented him from predicting the expanding Universe. However, perhaps ironically, in 1998 we discovered that the expansion of the Universe was accelerating instead of decelerating due to gravity, and the simplest way to understand that phenomenon is with a positive cosmological constant dominating the current era of the Universe. Fascinating, and more and more so with the WMAP/Planck data. The cosmological constant/dark energy and the dark matter we seem to “observe” cannot be explained with the fields of the Standard Model, and therefore… they hint at new physics. The character of this new physics is challenging, and much work is being done in order to find some particle or model in which dark matter and dark energy fit. However, it is not easy at all!

May Einstein’s Field Equations be with you!

LOG#032. Invariance and relativity.

Invariance, symmetry and invariant quantities are the essence, heart and core of Physmatics. Let me begin this post with classical physics. Newton’s fundamental law reads:

\mathbf{F}=m\mathbf{a}=\begin{cases}m\ddot{x}, \;\; m\ddot{x}=m\dfrac{d^2x}{dt^2}\\\;\\m\ddot{y}, \;\; m\ddot{y}=m\dfrac{d^2y}{dt^2}\\\;\\ m\ddot{z}, \;\; m\ddot{z}=m\dfrac{d^2z}{dt^2}\end{cases}

Suppose two different frames obtained by a pure translation in space,

\mathbf{r}'=\mathbf{r}+\mathbf{d}

where we select \mathbf{d}=\mbox{constant} to make things simpler. We can easily observe by direct differentiation that Newton’s fundamental law is invariant under translations in space, since mere substitution provides:

m\dfrac{d^2\mathbf{r}'}{dt^2}=m\dfrac{d^2(\mathbf{r}+\mathbf{d})}{dt^2}=m\dfrac{d^2\mathbf{r}}{dt^2} \leftrightarrow \mathbf{a}'=\mathbf{a}

On the other hand, rotations around a fixed axis, say the z-axis, are transformations given by:

\boxed{\mathbf{r}'=R\mathbf{r}\leftrightarrow \begin{pmatrix}x'\\y'\\z'\end{pmatrix}=R\begin{pmatrix}x\\y\\z\end{pmatrix}\rightarrow\begin{cases}x'=x\cos\theta+y\sin\theta\\y'=-x\sin\theta+y\cos\theta\\z'=z\end{cases}}

If we multiply these last equations by the mass and differentiate twice with respect to time, keeping \theta and m constant, we easily get

\mathbf{F}'=R\mathbf{F}\leftrightarrow \begin{cases}F_{x'}=F_x\cos\theta+F_y\sin\theta\\F_{y'}=-F_x\sin\theta+F_y\cos\theta\\F_{z'}=F_z\end{cases}


\boxed{\begin{pmatrix}F'_x\\F'_y\\F'_z\end{pmatrix}=\begin{pmatrix}\cos\theta & \sin\theta & 0\\ -\sin\theta & \cos\theta & 0\\ 0 & 0& 1 \end{pmatrix}\begin{pmatrix}F_x\\F_y\\F_z\end{pmatrix}\rightarrow \begin{cases}F'_x=F_x\cos\theta+F_y\sin\theta\\F'_y=-F_x\sin\theta+F_y\cos\theta\\F'_z=F_z\end{cases}}
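
This invariance can be verified with a minimal numerical sketch (an illustration of mine, with arbitrary values of the mass, acceleration and angle), checking that F'=ma' holds in the rotated frame and that R is orthogonal:

```python
# Illustrative sketch: a rotation about the z-axis preserves both the form
# of Newton's law and the lengths of vectors. Sample values are arbitrary.
import numpy as np

theta = 0.7                              # arbitrary rotation angle
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, s, 0.0],
              [-s, c, 0.0],
              [0.0, 0.0, 1.0]])          # the rotation matrix used in the text

m = 2.5                                  # arbitrary mass
a = np.array([1.0, -2.0, 0.5])           # acceleration in the original frame
F = m * a                                # Newton's law in the original frame

F_rot = R @ F                            # force seen in the rotated frame
a_rot = R @ a                            # acceleration in the rotated frame

form_invariant = np.allclose(F_rot, m * a_rot)   # F' = m a' holds
orthogonal = np.allclose(R.T @ R, np.eye(3))     # lengths are preserved
```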

Thus, we can say that Newton’s fundamental law is invariant under spatial translations and rotations. Its form is kept constant under those kinds of transformations. Generally speaking, we also say that Newton’s law is “covariant”, but nowadays that is an abuse of language, since the word covariant means something different in tensor analysis. So, be aware of the word “covariant” (especially in old texts). Today, we talk about “invariant laws”, or about the symmetry of certain equations under a certain set of (group) transformations.

Newton’s law uses the concept of acceleration:

a_x=\dfrac{dv_x}{dt}=\dfrac{d^2x}{dt^2},\;\; a_y=\dfrac{dv_y}{dt}=\dfrac{d^2y}{dt^2},\;\; a_z=\dfrac{dv_z}{dt}=\dfrac{d^2z}{dt^2}

or, in compact form

a_i=\dfrac{dv_i}{dt},\;\; i=x,y,z

And then, the following equations are invariant under translations in space and rotations:

\mathbf{F}=m\mathbf{a} or \mathbf{F}=\dfrac{d\mathbf{p}}{dt} with \mathbf{p}=m\mathbf{v}

Intrinsic components of the acceleration provide a decomposition

\mathbf{a}=\mathbf{a}_\parallel+\mathbf{a}_\perp

where we define

a_\parallel=\dfrac{dv}{dt}\leftrightarrow \mathbf{a}_\parallel=\dfrac{dv}{dt}\mathbf{u}_\parallel

where \mathbf{u}_\parallel is a unit vector in the direction of the velocity, and \mathbf{a}_\perp=a_\perp\mathbf{u}_\perp, with \mathbf{u}_\perp a unit vector normal to the trajectory.

In the case of motion along a general curve, we can approximate the motion at every point of the curve by a circle of radius R, and thus

s=R\Delta \theta\rightarrow \Delta \theta=\dfrac{s}{R}=\dfrac{v\Delta t}{R}\rightarrow \dfrac{\Delta \theta}{\Delta t}=\omega=\dfrac{v}{R}

On the other hand,

\Delta v_\perp=v\Delta \theta\rightarrow a_\perp=v\dfrac{\Delta \theta}{\Delta t}=\dfrac{v^2}{R}=\omega^2R

and we get the known expression for the centripetal acceleration:

\mathbf{a}_\perp=\dfrac{v^2}{R}\mathbf{u}_\perp=\omega^2R\,\mathbf{u}_\perp

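
For uniform circular motion this is immediate to check by differentiating \mathbf{r}(t)=R(\cos\omega t,\sin\omega t) analytically; the sketch below (my own, with arbitrary R and \omega) confirms a_\perp=v^2/R=\omega^2 R:

```python
# Illustrative sketch: for uniform circular motion r(t) = R(cos wt, sin wt),
# the acceleration is a = -w^2 r, pointing to the center, with magnitude
# v^2 / R = w^2 R. The values of R, omega and t are arbitrary.
import numpy as np

R, omega, t = 2.0, 1.5, 0.3

r = R * np.array([np.cos(omega * t), np.sin(omega * t)])
v = R * omega * np.array([-np.sin(omega * t), np.cos(omega * t)])
a = -omega**2 * r                          # second derivative of r(t)

speed = np.linalg.norm(v)                  # = omega * R
a_mag = np.linalg.norm(a)                  # = omega^2 * R
centripetal_ok = np.isclose(a_mag, speed**2 / R)
```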
More about invariant quantities in Classical Physics: the scalar (sometimes called dot) product of two vectors is invariant, since the length of every vector is constant in euclidean spaces under rotations and translations. For instance,

\boxed{r^2=x^2+y^2+z^2=\mathbf{r}\cdot\mathbf{r}=\mathbf{r'}\cdot\mathbf{r'}=\mbox{INVARIANT}=\mbox{SQUARED LENGTH}}

In matrix form,

r^2=X^TX=\delta _{ij}x^ix^j=\begin{pmatrix}x & y & z\end{pmatrix}\begin{pmatrix}1 &0& 0\\ 0 &1& 0\\ 0& 0& 1\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix}

where we have introduced the symbol \delta_{ij}, the so-called Kronecker delta, a “certain object” with the following components: “1” whenever i=j and “0” otherwise. Of course, the Kronecker delta symbol “is” the identity matrix when the symbol has two indices. However, let me remark that “generalized Kronecker deltas” with more indices do exist, and it is not always possible to express those “tensors” easily in a matrix way, except with some clever tricks.

The scalar (dot) product can be computed with any vector quantity:

\mathbf{a}\cdot\mathbf{a}=a^2=a_x^2+a_y^2+a_z^2\rightarrow \mathbf{a}\cdot\mathbf{b}=a_xb_x+a_yb_y+a_zb_z

Moreover, there is a coordinate free definition as well:

\mathbf{a}\cdot\mathbf{b}=ab\cos\theta,\;\; \theta=\mbox{angle formed by}\; \mathbf{a},\mathbf{b}

Note that the invariance of the dot product implies the invariance of the classical kinetic energy, since:

E_k=\dfrac{1}{2}m\mathbf{v}\cdot\mathbf{v}=\dfrac{1}{2}mv^2=\dfrac{1}{2}m\left(v_x^2+v_y^2+v_z^2\right)

We have also the important invariant quantities:

\mbox{WORK}=W=\int \mathbf{F}\cdot d\mathbf{r}=\mathbf{F}\cdot \Delta \mathbf{r}

where the second equality holds if the force is constant along the trajectory. Moreover, in relativistic electromagnetism, you also get the wave-number 4-vector:

\boxed{\mathbb{K}=(K^0,\mathbf{K})}\leftrightarrow \mbox{WAVE NUMBER SPACETIME VECTOR}

and the invariant \mathbb{K}\cdot\mathbb{K}=K^2, that you can get from the plane wave solution:

A=A_0\exp (i(K^\mu X_\mu))=A_0\exp (i(\mathbb{K}\cdot \mathbb{X}))

where the phase invariant reads

\phi =\mathbb{K}\cdot \mathbb{X}=\mathbf{K}\cdot\mathbf{X}-\omega t

Therefore, we deduce that

\mathbb{K}=\left(\dfrac{\omega}{c},\mathbf{K}\right)

and the wave number vector satisfies the following relation with the wavelength

\vert \mathbf{K}\vert =K=\dfrac{2\pi}{\lambda}

There is another important set of transformations or symmetry in classical physics. It is related to inertial frames. Galileo discovered that the laws of motion are the same for every inertial observer, i.e., the laws of Mechanics are invariant for inertial frames! A Galilean transformation is defined by:

\boxed{\mbox{GALILEAN TRANSFORMATIONS}\begin{cases}\mathbf{x}'=\mathbf{x}-\mathbf{V}t\\t'=t\end{cases}}

where \mathbf{V}=constant. Differentiating with respect to time, we get

\boxed{\mbox{GALILEAN TRANSFORMATIONS}\begin{cases}\dfrac{d\mathbf{x}'}{dt}=\dfrac{d\mathbf{x}}{dt}-\mathbf{V}\\ \;\\ \dfrac{dt'}{dt}=1\end{cases}}

and then

\boxed{\mbox{GALILEAN TRANSFORMATIONS}\begin{cases}\dfrac{d^2\mathbf{x}'}{dt^2}=\dfrac{d^2\mathbf{x}}{dt^2}\\ \;\\\dfrac{d^2t'}{dt^2}=0\end{cases}}
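
A short numerical sketch (my own, for an arbitrary sample trajectory) confirms that the second derivative is unchanged by the transformation x'=x-Vt:

```python
# Illustrative sketch: under the Galilean transformation x' = x - V t, t' = t,
# the acceleration computed by finite differences is the same in both frames.
# The trajectory and the relative velocity V are arbitrary sample choices.
import numpy as np

V = 3.0                                    # constant relative velocity
t = np.linspace(0.0, 1.0, 2001)
dt = t[1] - t[0]

x = 0.5 * 9.8 * t**2 + 2.0 * t + 1.0       # some trajectory in frame S
xp = x - V * t                             # the same motion seen from S'

a = np.gradient(np.gradient(x, dt), dt)    # numerical d^2x/dt^2 in S
ap = np.gradient(np.gradient(xp, dt), dt)  # numerical d^2x'/dt^2 in S'

accel_match = np.allclose(a[5:-5], ap[5:-5])
```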

And thus, the accelerations (and forces) that different inertial frames (i.e., reference frames moving with constant relative velocity) observe are the same:

\mathbf{a}'=\mathbf{a}\leftrightarrow \mathbf{F}'=\mathbf{F}

And now, about symmetry. What are the symmetries of Physics? There are many interesting transformations and space-time symmetries. A non-completely exhaustive list is this one:

1. Translations in space.

2. Translations in time.

3. Rotations around some axis ( and with fixed angle).

4. Uniform velocity in straight line, a.k.a., galilean transformations for inertial observers. This symmetry “becomes” Lorentz boosts in the spacetime analogue of special relativity.

5. Time reversal ( inversion of the direction of time), T.

6. Reflections in space (under “a mirror”). It is also called parity P.

7. Matter-antimatter interchange, or charge conjugation symmetry, C.

8. Interchange of identical atoms/particles.

9. Scale transformations \mathbb{X}'=\lambda\mathbb{X}.

10. Conformal transformations (in the complex plane or in complex spaces).

11. Arbitrary coordinate transformations (they are also called general coordinate transformations).

12. Quantum-mechanical (gauge) phase symmetry: \Psi\rightarrow \Psi'=\Psi \exp (i\theta).

Beyond ordinary (“polar”) vectors, in classical physics we also find “axial” vectors (also called “pseudovectors”). Pseudovectors or axial vectors are formed by the 3d “cross”/outer/vector product:

\mathbf{C}=\mathbf{A}\times \mathbf{B}=\begin{vmatrix}e_1 & e_2 & e_3\\ A_x & A_y & A_z\\ B_x & B_y & B_z \end{vmatrix}=e_1\begin{vmatrix}A_y & A_z\\ B_y & B_z\end{vmatrix}-e_2\begin{vmatrix}A_x & A_z\\ B_x & B_z\end{vmatrix}+e_3\begin{vmatrix}A_x & A_y\\ B_x & B_y\end{vmatrix}

Some examples are the angular momentum

\mathbf{L}=\mathbf{r}\times\mathbf{p}=\mathbf{r}\times m\mathbf{v}

or the magnetic force

\mathbf{F}_m=q\mathbf{v}\times \mathbf{B}

Interestingly, the main difference between axial and polar vectors, i.e., between pseudovectors and common vectors, is their behaviour under P (parity): polar vectors change their sign, i.e., they become the opposite vector under reflection, while pseudovectors remain invariant. It can be easily found by inspection of the definition of the angular momentum or the magnetic force (or even the general definition of the cross product given above).
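
Both behaviours can be seen in a couple of lines (a sketch of mine, with arbitrary sample vectors):

```python
# Illustrative sketch: under parity both polar vectors r and p flip sign,
# but the pseudovector L = r x p is unchanged; the magnetic force
# F = q v x B (with B a pseudovector, so B -> B) flips sign like a proper
# polar vector. All sample vectors are arbitrary.
import numpy as np

r = np.array([1.0, 2.0, 3.0])
p = np.array([-0.5, 0.25, 1.0])

L = np.cross(r, p)                  # angular momentum (axial vector)
L_P = np.cross(-r, -p)              # after parity: r -> -r, p -> -p
axial_invariant = np.allclose(L_P, L)

q = 2.0
v = np.array([0.3, -0.1, 0.7])
B = np.array([0.0, 0.0, 1.0])       # pseudovector: unchanged under parity
F = q * np.cross(v, B)
F_P = q * np.cross(-v, B)           # after parity: v -> -v, B -> B
polar_flips = np.allclose(F_P, -F)
```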

Now, we turn our attention to invariants in special relativity. I will introduce a very easy example to give a gross idea of how the generalization of “invariant theory” also works in special relativity. From the classical definition of power:

P=\dfrac{dE}{dt}=\mathbf{F}\cdot\mathbf{v}
Using the relativistic definition of 4-momentum:

\mathbb{P}=\left(\dfrac{E}{c},\mathbf{p}\right)=\left(Mc,M\mathbf{v}\right)
where M=m\gamma, we are going to derive a known result, since E=Mc^2. Note that, in agreement with classical physics, from this

\mathbf{F}=\dfrac{d\mathbf{p}}{dt}=\dfrac{d(M\mathbf{v})}{dt}
Therefore, inserting the relativistic expressions for energy and force into the power equation, we obtain:

c^2\dfrac{dM}{dt}=\mathbf{v}\cdot\dfrac{d(M\mathbf{v})}{dt}
Multiplying by 2M, and using the Leibniz rule for the differentiation of a product of two functions:

2Mc^2\dfrac{dM}{dt}=2M\mathbf{v}\cdot\dfrac{d(M\mathbf{v})}{dt}
or equivalently

\dfrac{d(M^2c^2)}{dt}=\dfrac{d(M^2v^2)}{dt}
and so

\dfrac{d}{dt}\left(M^2c^2-M^2v^2\right)=0
Integrating this, we deduce that

M^2c^2-M^2v^2=\mbox{constant}
If we plug in \mbox{constant}=m^2c^2=M^2c^2(v=0), we get

M^2c^2=M^2v^2+m^2c^2\rightarrow M^2=m^2\gamma^2,\;\; \gamma^2=\dfrac{1}{1-\frac{v^2}{c^2}}

and thus, we have rederived the notion of “relativistic mass”:

M=m\gamma=\dfrac{m}{\sqrt{1-\dfrac{v^2}{c^2}}}

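
As a sanity check of the algebra above (my own sketch, with arbitrary illustrative values and natural units c=1):

```python
# Illustrative check: M = m*gamma satisfies M^2 c^2 = M^2 v^2 + m^2 c^2,
# i.e. the invariant E^2 = (pc)^2 + (m c^2)^2. Values are arbitrary;
# natural units c = 1 are an assumption for the illustration.
import math

c = 1.0
m = 3.0
v = 0.6 * c
gamma = 1.0 / math.sqrt(1.0 - v**2 / c**2)   # = 1.25 here
M = m * gamma                                # "relativistic mass"

residual = M**2 * c**2 - (M**2 * v**2 + m**2 * c**2)
```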
Special relativity generalizes the notion of the dot (scalar) product to a non-euclidean (pseudo-euclidean, to be more precise) geometry. The dot product in special relativity is given by:

\boxed{\mathbb{A}\cdot\mathbb{B}=A^x B_x+A^y B_y+A^z B_z-A^t B_t}

The sign of the temporal fourth component is conventional, in the sense that some people use a minus sign for the purely spatial components and a positive sign for the temporal component. Using a more advanced notation, we can write the new scalar product as follows:

\boxed{A^\mu B_\mu=A_\mu B^\mu=A^x B_x+A^y B_y+A^z B_z-A^t B_t}

where a repeated dummy index implies summation over it. This convention of summing over repeated indices is called Einstein’s convention, and it is due to Einstein himself. Another point about notation: some people prefer the use of \mu=0,1,2,3 while other people use \mu=1,2,3,4. We will use the notation with \mu=0,1,2,3 unless we find some notational issue. Unlike the 3d world, the 4d world of special relativity forces us to use something different from the Kronecker delta in the above scalar product. This new object is generally called the pseudo-euclidean “metric”, or Minkowski metric:

\boxed{A^\mu B_\mu=\eta_{\mu\nu}A^\mu B^\nu=\mathbb{A}\cdot\mathbb{B}=A^xB_x+A^yB_y+A^zB_z-A^tB_t}

In matrix form,

\boxed{A^\mu B_\mu=A^\mu\eta_{\mu\nu}B^\nu=A^T\eta B=\begin{pmatrix}A^t & A^x & A^y & A^z\end{pmatrix}\begin{pmatrix}-1 & 0 & 0 & 0\\ 0& 1 & 0 & 0\\ 0 & 0 & 1& 0\\ 0 & 0& 0& 1\end{pmatrix}\begin{pmatrix}B^t \\ B^x \\ B^y \\ B^z\end{pmatrix}}

Important remarks:

1st. \eta=\eta_{\mu\nu}=diag(-1,1,1,1) in our convention. The opposite convention for the scalar product would give \eta=diag(1,-1,-1,-1).

2nd. The square of a “spacetime” vector is its “length” in spacetime. It is given by:

\boxed{A^2=\mathbb{A}\cdot\mathbb{A}=A^\mu A_\mu=A_x^2+A_y^2+A_z^2-A_t^2=-(\mbox{SQUARED SPACETIME LENGTH})}

In particular, for the position spacetime vector

S^2=x^\mu x_\mu=-c^2\tau ^2

3rd. Unlike the euclidean 3d space, the 4d non-euclidean spacetime introduces non-null “objects” whose “squared length” is equal to zero, and, even weirder, objects that provide a negative dot product!

4th. For spacetime events given by a spacetime vector \mathbb{X}=x^\mu e_\mu=(ct,\mathbf{r}), and generally for any arbitrary events A and B (or 4-vectors) we can distinguish:

i) Vectors with A^2=\mathbb{A}\cdot\mathbb{A}>0 are called space-like vectors.

ii) Vectors with A^2=\mathbb{A}\cdot\mathbb{A}=0 are called null-vectors, isotropic vectors or sometimes light-like vectors.

iii) Vectors with A^2=\mathbb{A}\cdot\mathbb{A}<0 are called time-like vectors.
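
These three cases can be illustrated numerically (sample vectors of my own, with the \eta=diag(-1,1,1,1) convention and components ordered (t,x,y,z) as in the text):

```python
# Illustrative sketch of the Minkowski product with eta = diag(-1, 1, 1, 1)
# and components ordered (t, x, y, z). The sample vectors are arbitrary.
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def mdot(A, B):
    """A^mu eta_{mu nu} B^nu."""
    return A @ eta @ B

lightlike = np.array([1.0, 1.0, 0.0, 0.0])   # A^t = A^x: null vector
timelike = np.array([2.0, 1.0, 0.0, 0.0])    # "more time than space"
spacelike = np.array([1.0, 2.0, 0.0, 0.0])   # "more space than time"

s_null = mdot(lightlike, lightlike)          # = -1 + 1 = 0
s_time = mdot(timelike, timelike)            # = -4 + 1 = -3 < 0
s_space = mdot(spacelike, spacelike)         # = -1 + 4 = 3 > 0
```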

Thus, in the case of the spacetime (position) vector, every event can be classified into space-like, light-like (null or isotropic) and time-like types, depending on the sign of s^2=\mathbb{X}\cdot\mathbb{X}=X^T\eta X. Moreover, the metric itself allows us to “raise or lower” indices, defining the following rules for components:

x^\mu=(ct,\mathbf{r})\rightarrow x_\mu =\eta _{\mu \nu}x^\nu=(X^T\eta)_\mu=(-ct,\mathbf{r})

The Minkowski metric has a very cool feature too. Its “square” is the identity matrix. That is,

\eta^2=\eta\,\eta=\mathbb{I}

Then, the metric is its own inverse:

\eta^{-1}=\eta

In components, it reads

\eta_{\mu\nu}\eta^{\nu\sigma}=\delta_{\mu}^{\;\;\sigma}

where we have introduced the Kronecker delta symbol in four dimensions in the same manner as we did in 3d space. Therefore, the Kronecker delta has only non-null components when \mu=\sigma, so that \delta^0_{\;\;0}=\delta^1_{\;\;1}=\ldots=1

Subindices are generally called “covariant” components, while superindices are called “contravariant” components. It is evident that euclidean spaces don’t distinguish between covariant and contravariant components. The metric is the gadget we use in non-euclidean metric spaces to raise and lower indices/components of tensor quantities. Tensors are multi-oriented objects. The metric itself is a second order tensor, more precisely a rank-2 covariant tensor. 4-vectors are contravariant objects with a single index. Upwards single-indexed tensors are contravariant vectors; downwards single-indexed tensors are covariant vectors. When a metric is introduced, there is no need to distinguish covariant and contravariant tensors, since the components can be converted into each other with the aid of the metric, so we simply speak about n-th rank tensors. Multi-indexed objects can have some symmetry features. The metric itself is symmetric, for instance, under the interchange of its subindices (columns and rows). So, then

\eta_{\mu\nu}=\eta_{\nu\mu}\rightarrow \eta =\eta^T

What kind of general objects can we use in Minkovski spacetime or even more general spaces? Firstly, we have scalar fields or functions, i.e., functions depending only on the spacetime coordinates:

\psi (x)=\psi' (x') \leftrightarrow \psi (x^\mu) =\psi ' (x'^\mu) \leftrightarrow \psi (x,y,z, ct)= \psi ' (x',y',z',ct')

Other objects we have found are “vectors” or “oriented segments”. In 3d space, they transform as \mathbf{x}'=R\mathbf{x}. In 4d spacetime, we found \mathbb{X}'=L\mathbb{X}.

In 3d space, we also found pseudovectors. They are defined via the cross product, which in components reads c^i=\epsilon ^{ijk}a_jb_k, where the new symbol \epsilon^{ijk}, a completely antisymmetric object under the interchange of any pair of indices (with \epsilon^{123}=+1), is generally called the Levi-Civita symbol. This symbol is the second constant object that we can use in any number of dimensions, like the Kronecker delta

\delta ^i_{\;\; j}=\begin{cases}+1,\mbox{if}\; i=j\\ 0,\mbox{otherwise}\end{cases}

The completely antisymmetric Levi-Civita symbol satisfies some interesting identities related to the Kronecker delta. Thus, for instance, in 2d and 3d respectively:

\epsilon^{ij}\epsilon_{kl}=\delta^{i}_{\;\; k}\delta^{j}_{\;\; l}-\delta^{i}_{\;\; l}\delta^{j}_{\;\; k}

\epsilon^{ijk}\epsilon_{ilm}=\delta^{j}_{\;\; l}\delta^{k}_{\;\; m}-\delta^{j}_{\;\; m}\delta^{k}_{\;\; l}



We have also the useful identity:

\epsilon^{i_1i_2\ldots i_n}\epsilon_{i_1i_2\ldots i_n}=n!
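
Both the 3d identity and the full contraction can be verified by brute force (an illustrative sketch of mine that builds the 3d symbol explicitly):

```python
# Illustrative sketch: build the 3d Levi-Civita symbol and verify
# eps^{ijk} eps_{ilm} = delta^j_l delta^k_m - delta^j_m delta^k_l,
# together with the full contraction eps^{ijk} eps_{ijk} = 3! = 6.
import numpy as np
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation, computed by sorting it with transpositions."""
    p = list(p)
    sign = 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            sign = -sign
    return sign

eps = np.zeros((3, 3, 3))
for perm in permutations(range(3)):
    eps[perm] = perm_sign(perm)

d = np.eye(3)
lhs = np.einsum('ijk,ilm->jklm', eps, eps)
rhs = np.einsum('jl,km->jklm', d, d) - np.einsum('jm,kl->jklm', d, d)
identity_holds = np.allclose(lhs, rhs)
full_contraction = np.einsum('ijk,ijk->', eps, eps)   # n! = 3! = 6
```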

The n-dimensional Levi-Civita symbol is defined as:

\epsilon^{i_1i_2\ldots i_n}a_{1i_1}a_{2i_2}\ldots a_{ni_n}=\det(\mathbf{a_{1}},\mathbf{a_2},\ldots,\mathbf{a_{n}})

and its product in n-dimensions

\varepsilon_{i_1 i_2 \dots i_n} \varepsilon_{j_1 j_2 \dots j_n} = \begin{vmatrix}\delta_{i_1 j_1} & \delta_{i_1 j_2} & \dots & \delta_{i_1 j_n} \\\delta_{i_2 j_1} & \delta_{i_2 j_2} & \dots & \delta_{i_2 j_n} \\\vdots & \vdots & \ddots & \vdots \\ \delta_{i_n j_1} & \delta_{i_n j_2} & \dots & \delta_{i_n j_n} \\ \end{vmatrix}

or equivalently, given an n\times n matrix A=(a_{ij})

\epsilon^{i_1i_2\ldots i_n}a_{1i_1}a_{2i_2}\ldots a_{ni_n}=\det( a_{ij})=\dfrac{1}{n!}\epsilon^{i_1i_2\ldots i_n}\epsilon^{j_1j_2\ldots j_n}a_{i_1 j_1}a_{i_2 j_2}\ldots a_{i_n j_n}

This last equation provides a new quantity called a pseudoscalar, different from a scalar function in the sense that it changes its sign under parity in 3d, while a common 3d scalar is invariant under parity! Generally speaking, determinants (pseudoscalars) in even dimensions are parity conserving, while determinants in odd dimensions don’t conserve parity.

Like the Kronecker delta, the epsilon or Levi-Civita can be generalized to 4 dimensions (or even to D-dimensions). In 4 dimensions:

\epsilon^{\mu\nu\sigma\tau}=\epsilon_{\mu\nu\sigma\tau}=\begin{cases}+1,\mbox{if} (\mu\nu\sigma\tau)\mbox{is an even permutation of 0,1,2,3}\\-1,\mbox{if} (\mu\nu\sigma\tau)\mbox{is an odd permutation of 0,1,2,3}\\ 0,\mbox{otherwise}\end{cases}

In general, unlike the Kronecker deltas, the Levi-Civita epsilon symbols are not ordinary “tensors” (quantities with subindices and superindices with definite properties under coordinate transformations) but more general entities called “weighted” tensors (sometimes also called tensor densities). Indeed, the generalized Kronecker delta of order 2p can be defined as the type (p,p) tensor that is completely antisymmetric in its p upper indices and also in its p lower indices. This characterization defines it up to a scalar multiplier.

\delta^{\mu_1 \dots \mu_p }_{\;\;\;\;\;\;\;\;\;\; \nu_1 \dots \nu_p} =\begin{cases}+1 & \quad \text{if } \nu_1 \dots \nu_p \text{ are an even permutation of } \mu_1 \dots \mu_p \\-1 & \quad \text{if } \nu_1 \dots \nu_p \text{ are an odd permutation of } \mu_1 \dots \mu_p \\ \;\;0 & \quad \text{in all other cases}.\end{cases}

Using an anti-symmetrization procedure:
\delta^{\mu_1 \dots \mu_p}_{\;\;\;\;\;\;\;\;\;\;\nu_1 \dots \nu_p} = p! \delta^{\mu_1}_{\lbrack \nu_1} \dots \delta^{\mu_p}_{\nu_p \rbrack}

In terms of a p\times p determinant:
\delta^{\mu_1 \dots \mu_p }_{\;\;\;\;\;\;\;\;\;\;\nu_1 \dots \nu_p} =\begin{vmatrix}\delta^{\mu_1}_{\nu_1} & \cdots & \delta^{\mu_1}_{\nu_p} \\ \vdots & \ddots & \vdots \\ \delta^{\mu_p}_{\nu_1} & \cdots & \delta^{\mu_p}_{\nu_p}\end{vmatrix}

Equivalently, it could be defined by induction in the following way:

\delta^{\mu \rho}_{\nu \sigma} = \delta^{\mu}_{\nu} \delta^{\rho}_{\sigma} - \delta^{\mu}_{\sigma} \delta^{\rho}_{\nu}

\delta^{\mu \rho_1 \rho_2}_{\nu \sigma_1 \sigma_2} = \delta^{\mu}_{\nu} \delta^{\rho_1 \rho_2}_{\sigma_1 \sigma_2} - \delta^{\mu}_{\sigma_1} \delta^{\rho_1 \rho_2}_{\nu \sigma_2} + \delta^{\mu}_{\sigma_2} \delta^{\rho_1 \rho_2}_{\nu \sigma_1}
\delta^{\mu \rho_1 \rho_2 \rho_3}_{\nu \sigma_1 \sigma_2 \sigma_3} = \delta^{\mu}_{\nu} \delta^{\rho_1 \rho_2 \rho_3}_{\sigma_1 \sigma_2 \sigma_3} - \delta^{\mu}_{\sigma_1} \delta^{\rho_1 \rho_2 \rho_3}_{\nu \sigma_2 \sigma_3} + \delta^{\mu}_{\sigma_2} \delta^{\rho_1 \rho_2 \rho_3}_{\nu \sigma_1 \sigma_3} - \delta^{\mu}_{\sigma_3} \delta^{\rho_1 \rho_2 \rho_3}_{\nu \sigma_1 \sigma_2}
and so on.

In the particular case where  p=n  (the dimension of the vector space), in terms of the Levi-Civita symbol:
\delta^{\mu_1 \dots \mu_n}_{\nu_1 \dots \nu_n} = \varepsilon^{\mu_1 \dots \mu_n}\varepsilon_{\nu_1 \dots \nu_n}

Under a Lorentz transformation, we have (using matrix notation) the following transformations:

A'=LA \leftrightarrow (A')^T=A^TL^T

\eta A'=\eta LA\rightarrow (A')^T\eta A'=A^T(L^T\eta L)A

A'^T\eta A'=A^T \eta A \;\;\mbox{iff}\;\; L^T \eta L=\eta

so the metric itself is “invariant” under a Lorentz transformation (boost). I would like to remark that the metric can be built from the basis vectors in the following way:

\eta_{\mu \nu}=e_\mu e_\nu=e_\mu \cdot e_\nu= g\left( e_\mu ,e_\nu\right)=\begin{cases}-1, \mu =\nu =0\\ +1, \mu =\nu =1,2,3\\ 0,\mu \neq \nu \end{cases}

For Lorentz transformations, we get that

x^\mu\rightarrow x'^\mu =\Lambda^\mu _{\;\;\; \nu} x^\nu


\Lambda^\mu _{\;\;\; \nu}=\begin{pmatrix}\Lambda^0_{\;\;\; 0}& \Lambda^0_{\;\;\; 1}& \Lambda^0_{\;\;\; 2}& \Lambda^0_{\;\;\; 3}\\ \Lambda^1_{\;\;\; 0}& \Lambda^1_{\;\;\; 1}& \Lambda^1_{\;\;\; 2}& \Lambda^1_{\;\;\; 3} \\ \Lambda^2_{\;\;\; 0}& \Lambda^2_{\;\;\; 1}&\Lambda^2_{\;\;\; 2}& \Lambda^2_{\;\;\; 3}\\ \Lambda^3_{\;\;\; 0}& \Lambda^3_{\;\;\; 1}& \Lambda^3_{\;\;\; 2}& \Lambda^3_{\;\;\; 3}\end{pmatrix}

Moreover, from the pseudo-orthogonality condition L^T\eta L=\eta, i.e., \Lambda^{-1}=\eta^{-1}\Lambda^{T} \eta for Lorentz transformations, taking the determinant we deduce that

\det (\Lambda)=\pm 1\leftrightarrow \det (\Lambda)^2=1

We can fully classify the Lorentz transformations according to the sign of the determinant and the sign of the element \Lambda^0_{\;\;\; 0} as follows:

\Lambda\begin{cases} \mbox{Proper Lorentz transf. (e.g., boosts, 3d rotations, Id)}: L^\uparrow_+,\; \det (\Lambda)=+1,\; \Lambda^0_{\;\;\; 0}\ge 1\\ \mbox{Improper Lorentz transf.:}\begin{cases}L^\downarrow_+,\; \det (\Lambda)=+1, \; \Lambda^0_{\;\;\; 0}\le -1\\ L^\uparrow_-,\; \det (\Lambda)=-1,\; \Lambda^0_{\;\;\; 0}\ge 1\\ L^\downarrow_-,\; \det (\Lambda)=-1,\; \Lambda^0_{\;\;\; 0}\le -1\end{cases} \end{cases}

For instance, let us write five kinds of Lorentz transformations:

1) Orthogonal rotations. They are continuous (proper) Lorentz transformations with an orthogonal 3×3 submatrix \Omega satisfying \Omega \Omega^T=\mathbb{I} (its infinitesimal generators are antisymmetric, \omega+\omega^T=0):

\Lambda=\begin{pmatrix}1 & \mathbf{0}\\ \mathbf{0} & \Omega\end{pmatrix}

2) Boosts. They are continuous (proper) Lorentz transformations mixing spacelike and timelike coordinates. The matrix has in this case the form:

\Lambda =\begin{pmatrix}\gamma & -\beta \gamma & \mathbf{0}\\ -\beta \gamma & \gamma & \mathbf{0}\\ \mathbf{0}&\mathbf{0}& \mathbb{I}\end{pmatrix}
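
It is easy to check numerically that such a boost is a proper orthochronous Lorentz transformation (a sketch of mine with the arbitrary value \beta=0.6):

```python
# Illustrative sketch: a boost along x with beta = 0.6 satisfies
# L^T eta L = eta, det L = +1 (proper) and Lambda^0_0 = gamma >= 1
# (orthochronous). The value of beta is arbitrary.
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
beta = 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)

L = np.array([[gamma, -beta * gamma, 0.0, 0.0],
              [-beta * gamma, gamma, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

preserves_eta = np.allclose(L.T @ eta @ L, eta)
det_L = np.linalg.det(L)                  # = +1 (proper)
L00 = L[0, 0]                             # = gamma >= 1 (orthochronous)
```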

3) PT symmetry. Discrete non-proper Lorentz transformation. It “inverts” space and time coordinates in the sense \mathbf{r}\rightarrow -\mathbf{r} and t\rightarrow -t. It belongs to L_+^\downarrow. The matrix of this transformation is:

\Lambda_{PT}=diag(-1,-1,-1,-1)=-\mathbb{I}

4) Parity. Discrete non-proper Lorentz transformation. It inverts only the spacelike components of true vectors (be aware of pseudovectors!) in the sense \mathbf{r}\rightarrow -\mathbf{r}. It is denoted by P, parity, and this transformation belongs to L_-^\uparrow. It is defined as follows:

\Lambda_{P}=diag(1,-1,-1,-1)

5) Time reversal, T. Discrete non-proper Lorentz transformation. It inverts the direction of time in the sense that t\rightarrow -t. The matrix \Lambda_T=diag(-1,1,1,1) belongs to the set L_-^\downarrow.

Remark: if X^2=\mathbb{X}\cdot\mathbb{X}<0 (a time-like vector), the transformations in L_+^\uparrow and L_-^\uparrow don’t change the sign of its time component, i.e., they don’t change the sense of time. This is why they are called orthochronous!