Friday, May 22, 2020

Say you test positive (or negative), what then?

Positive Test Then What?

If I test positive, how sure am I that it is a true positive? I write this post in the context of SARS-CoV-2, but the conclusions are just as relevant for pregnancy tests or any other detection problem.

I am writing this post because I keep hearing and reading wrong conclusions everywhere regarding SARS-CoV-2 tests. Interpreting test results correctly is crucial both when making decisions about one's health and when discussing public health policies. I am going to try to explain it as simply as I can.

As with any detection problem, the statistics are subtle and unintuitive, but not so difficult.
I have studied statistics and detection and estimation theory in depth, and that part I understand well.
I am not an expert in biology, so when I talk about that part, take it with a grain of salt.
In order to understand what the results of a test mean, it is important to understand what it measures. In this post I am concerned with the mathematics of certainty.

Terminology: Measuring How Good a Test Is

There are four numbers which are important for a test: PPV, NPV, sensitivity and specificity.
How sure I am that a positive is true is measured by the PPV, the positive predictive value (conversely, the NPV, or negative predictive value, for a negative).
These two numbers depend on circumstances and need to be estimated.

How good a test is, is measured by its sensitivity (and by its specificity for negatives): the sensitivity is the quotient between the true positives and everything that should have been a positive (true positives + false negatives). Analogously, the specificity is the quotient between the true negatives and everything that should have been a negative (true negatives + false positives).

How Do I Interpret My Test Results?

I run a test to check a hypothesis. For example, I take a test for Covid-19. It may be a qPCR to see if I am infectious, or an IgG ELISA to see if I already had the disease.

Let's say I am in the second situation: I am trying to see whether I may be immune (or at least whether I have developed some antibodies; there are still open questions about how immunity works for this disease). I look at the test and it has a specificity of 95.0% and a sensitivity of 80.0%. This means that if I test 100 negative people, I get 5 false positives on average (the specificity is the fraction of true negatives correctly detected: $\frac{95}{95+5}=95.0\%$). If I test 100 positive people, I get an average of 20 false negatives (sensitivity: $\frac{80}{80+20}=80.0\%$). This looks quite good compared with many of the available tests. False positives are worse than false negatives here, since we are talking about immunity: believing you are not immune is less dangerous than wrongly believing you are.

Assume I have not had any symptoms and the seroprevalence where I live, that is, the fraction of people who had the disease and would test positive with an ideal, perfect test, is 5%. If I test positive, how sure am I that I actually had the disease and so may be immune? Surely around 90%, right? No, this is completely wrong, and it is what this post is about.

Before drawing any conclusions about being immune, and before discussing changes in public policy (immunity passports, seroprevalence studies), be sure you understand the issues involved.

In the above example, I have a 50% chance of being a true positive. What? Why?

Step by Step - Testing the whole population

So, step by step: let's start with 100 people of whom 5% have been infected. That means 5 out of 100 have antibodies. Let's draw them (I have chosen a population of 100 to make the numbers easy, but the conclusions can be extrapolated to a population of any size).

1) Population: 5 with antibodies out of a 100

Then we test them. We obtain 1 false negative (on average we expect a false negative now and then; there could be none, and the results would be similar) and 4 false positives. We are bound to have some false positives: even if the test is very good, there are a lot of true negatives, so some of those tests are bound to fail.

2) Test results for the population (95% specificity, 80% sensitivity):

4 true positives, 91 true negatives,

4 false positives, 1 false negative


In our example, we took the test and got a positive, so we are either a false positive or a true positive. Now let's examine just the people who tested positive.

3) Subset who tested positive

Half of them are wrong!
The parameters of the test are just half of the story. When I am buying the test, I am interested in the specificity and sensitivity, which measure how accurate the test itself is. However, in order to interpret the result, I also need the prevalence in the measured population or, in other words, the probability that I had the disease in the first place. If that prior probability is too low, then no matter how good the test is (and no test is perfect), the results are going to be disappointing.
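The arithmetic behind this is just Bayes' rule, and it can be checked directly. A minimal sketch in Python (the helper name `ppv` is mine, not a standard library function):

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value: P(disease | positive test), by Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# The example from the text: 5% prevalence, 80% sensitivity, 95% specificity.
# With the whole-person rounding of the figures this shows up as 4/8 = 50%;
# the exact value is about 45.7%.
print(ppv(0.05, 0.80, 0.95))
```

With a much higher prior (say, 50% after accounting for symptoms and contacts), the same function returns about 94%, which is the whole point: the prior drives the answer.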

If I can decrease the green part above, I can be more certain of the result. If I had symptoms, for example, things are very different.
Let's see what happens.

Step by Step - Looking only at people who got symptoms

For this example, we are going to assume that 7% of the population got symptoms.

Of course, looking at people who presented symptoms is just one way to adjust the prior probability; we could also look at people who were in close contact with someone infected, among other things.

1) Population: 7 people who presented symptoms

2) Test results for the 7 people who presented symptoms

(95% specificity, 80% sensitivity)

4 true positives, 2 true negatives

0 false positives, 1 false negative

When we look at the subset of the tested population that presented some symptoms at some point in the last month, the story is completely different:

3) Subset who tested positive

Almost 100% of the positives are true positives (we lose some precision because the population is small and we cannot have half a person, but with a bigger population the result is still very good: the expected positive predictive value is $\frac{4}{4+0.1}\approx 97.6\%$, since among the 2 true negatives we expect only $2\times 0.05=0.1$ false positives).
We don't need to test only the people with symptoms; rather, our knowledge that someone had symptoms, was in contact with someone infected, or any other extra information can sharpen our conclusions.

In conclusion...

One of the reasons against an immunity passport (others being bad incentives around contagion, and discrimination) is that when the prevalence of the disease is low, a positive test does not carry much certainty by itself, and people are incentivized to lie.
Another problem is the false sense of security that a false positive conveys. People may believe they are immune (which may not be true even with a true positive, unless they are tested for neutralizing antibodies), feel protected, and put themselves and their community at risk.

And What If I Get a Second Test?

A second test may help, or not. It depends on the reason I got a false positive. Some tests measure antibodies, and there is cross-reactivity with antibodies against other viruses; for SARS-CoV-2, for example, there may be cross-reactivity with some common-cold viruses (which are also coronaviruses).
If I got the positive because of that, a second test will not help. Understanding how close a false positive is to a random event, and how close it is to a repeatable one, is important when thinking about taking a second test and interpreting its result. Taking a different test which looks for different antibodies may help, for example. If the two tests are statistically independent (meaning that a false positive on the first one does not predict the same mistake on the second), a second test adds information and can actually confirm the result. This is called orthogonal testing in medical parlance. In the case above, if both tests are equally good and independent, a second positive takes the certainty from about 50% to over 90%, because each positive result multiplies the odds of having had the disease by the likelihood ratio $\frac{0.8}{1-0.95}=16$.
If they are not independent, as in the cross-reactivity example, taking the second test may not add any information at all.
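The effect of a second, independent positive can be sketched as a sequential Bayesian update (the function name `update` is mine; the numbers match the example in the text):

```python
def update(prior, sensitivity, specificity):
    """P(disease) after one positive result, given P(disease) = prior beforehand."""
    num = sensitivity * prior
    return num / (num + (1 - specificity) * (1 - prior))

p = 0.05                   # prior: 5% seroprevalence, no symptoms
p = update(p, 0.80, 0.95)  # first positive: about 0.46
p = update(p, 0.80, 0.95)  # second, independent positive: about 0.93
print(p)
```

The second call is only valid if the tests are independent; if both fail for the same reason (cross-reactivity), the second result adds little or nothing.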

Probability: Bayes' Theorem

The theory behind all this is Bayes' Theorem, and the starting probabilities conditioning the results are called priors or prior probabilities. To learn more, you can study Bayesian inference.
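In symbols, writing $D$ for "had the disease" and $+$ for "tested positive", the quantity we have been computing throughout this post is $$P(D\mid +)=\frac{P(+\mid D)\,P(D)}{P(+\mid D)\,P(D)+P(+\mid \neg D)\,P(\neg D)}=\frac{\text{sensitivity}\cdot\text{prevalence}}{\text{sensitivity}\cdot\text{prevalence}+(1-\text{specificity})\cdot(1-\text{prevalence})},$$ where the prior $P(D)$ is the prevalence in the tested population.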

Thanks to Elisa for her collaboration in this post. The mistakes are all mine.


Saturday, July 14, 2018

Pi is not dead yet (never has been)

Not very long ago, I watched a video by Vi Hart about $\pi$ vs. $\tau$ radicalization. The video jests, but not really, about the problem of dehumanizing differing opinions on the internet. I am not sure this really happens with $\pi$ vs. $\tau$, which is a fake controversy (or is it?) with educational purposes. In any case, attacking differing opinions is a real phenomenon, so I will try to tread lightly. This post is about how the $\tau$ approach, substituting $\tau=2\pi$ for $\pi$ in mathematical formulas, is on the wrong track, even if it counts among its supporters fine people such as John Conway and Vi Hart. I will try to explain why and, while doing so, introduce the reader to some deep and beautiful mathematics.

In case you only want the gist of it and not the whole post: $\pi$ is at the center of a combinatorial/geometrical generalization involving Euler's Gamma function, and the new constant, $\tau$, does not fit well in this generalization. Absorbing some of the twos in the formulas can (and should) be done by using the diameter instead of the radius, as it is the more fundamental concept.


First of all, $\pi$ is a mathematical constant which has been known since ancient Greek times, although the Chinese, the Babylonians, the Egyptians, the Indians and every ancient civilization that studied geometry and mathematics rediscovered it in one way or another. Some civilizations discovered only rough approximations; others found algorithms or series which compute the irrational number to any precision.

The number $\pi$ itself is fascinating: it is irrational, it seems to appear out of nowhere in different formulas, it may be normal¹ (this is unknown at the moment), and the quest to calculate more of its digits has pushed the science and engineering of computation forward. An example of this is the discovery of the fantastic Spigot algorithm.

¹A normal number is a real number whose sequence of digits is distributed uniformly in every base.

Anyway, $\pi$ was originally defined as the ratio of the circumference of a circle to its diameter (twice the radius).

So, if we have a circle of radius $r$, the circumference is $C=2\pi r$.
So what is the polemic? The $\tau$-ists propose to be done with $\pi$ and use $\tau=2\pi$ as the constant; the $\pi$-ists oppose them. Bob Palais, an ardent proponent of $\tau$, has documented some of the history of the polemic; there is a very famous video by Vi Hart, a tau manifesto, an xkcd comic strip, and a special page dedicated to it by Robin Whitty (theorem of the day).

First let's recap the most popular and apparently unrelated places where $\pi$ appears. We will explain them in more detail and simpler terms later.

The same constant also appears in the area of the circle, $S_{circle}=\pi r^2$, in the volume of the sphere, $V_{sphere}=\frac{4\pi r^3}{3}$, and in general in the formulas for spheres in various dimensions (hyperspheres, of which the circle and the sphere are particular cases).
On an apparently unrelated issue, the same constant appears in Euler's identity,
$e^{i\pi} + 1 = 0$, where $e$ is the base of the natural logarithm, or Napier's constant, and $i$ is the imaginary unit (a square root of $-1$, so $i^2=-1$).
Of course, Euler's identity is very much related to the circle, because of Euler's formula, $$e^{ix}=\cos{(x)} + i \sin{(x)},$$ where $e$ is again Napier's constant, $i$ is the imaginary unit, and $\sin(x)$ and $\cos(x)$ are the trigonometric functions sine and cosine with the argument in radians.
A radian is a measure of angle: the length of the arc of the unit circle (of radius $1$) spanned by the angle being measured.
The appearance of $\pi$ in Euler's identity is related to the complex numbers being a two-dimensional space and a Clifford algebra. I will not go into detail here, but $\pi$ is there because of circles and hyperspheres and their relation with rotations and rotors. This also explains the appearance of $\pi$ in Cauchy's integral theorem, in Gauss' and Stokes' theorems, and in their various generalizations for differential forms and exterior derivatives. This means that if we understand circles, there is not much new in terms of $\pi$ in all these cases.

The same constant appears again when dealing with the Gamma function, a natural continuous generalization of the factorial, $n!=n(n-1)(n-2)\cdots2\cdot1$, which we will explain later. For real numbers, the Gamma function is given by the integral
$$\Gamma(z)=\int_0^\infty x^{z-1}e^{-x}\, dx.$$

It is easy to prove (by integration by parts and induction) that when $n>0$ is a positive integer, $\Gamma(n)=(n-1)!$.
Our constant appears in the formula $\Gamma(\frac{1}{2})=\sqrt{\pi}$. Because of the properties of the Gamma function, at every half integer $n>0$ its value is related to $\pi$ by the two formulas
$$\Gamma\Bigg(\frac{1}{2}+n\Bigg)=\frac{(2n)!}{4^n n!}\sqrt{\pi}$$ and
$$\Gamma\Bigg(\frac{1}{2}-n\Bigg)=\frac{(-4)^n n!}{(2n)!}\sqrt{\pi}.$$
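These identities can be checked numerically with Python's standard `math.gamma`; a quick sketch:

```python
from math import gamma, factorial, pi, sqrt, isclose

# Gamma extends the factorial: Gamma(n) = (n-1)! for positive integers.
for n in range(1, 10):
    assert isclose(gamma(n), factorial(n - 1))

# Gamma(1/2) = sqrt(pi).
assert isclose(gamma(0.5), sqrt(pi))

# Half-integer values: Gamma(1/2 + n) = (2n)! / (4^n n!) * sqrt(pi).
for n in range(6):
    assert isclose(gamma(0.5 + n), factorial(2 * n) / (4 ** n * factorial(n)) * sqrt(pi))
```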

Getting rid of those pesky twos

The proponents of $\tau$ have their hearts in the right place. Their idea is that a lot of the circle formulas have a $2$ which could be absorbed by the circle constant $\tau$. In this way, radians go from $0$ to $\tau$, Euler's identity becomes (by squaring) $e^{i\tau} = 1$, and most everyday uses of $\pi$ need one less constant. Setting aside the complaints that the letter $\tau$ is already taken in other fields, or that $\pi$ appears without a $2$ in a different family of formulas, the interesting part of the controversy as it stands can be summarized as: the side with more simplified formulas should win the argument. The Pi manifesto, for example, looks like a mirror reflection of the Tau manifesto in this regard. There is also the argument that the win is unclear at best, so a better strategy is to keep the status quo, which is the path of least effort: everyone already knows what $\pi$ stands for.

I have what I believe is a qualitatively different argument for why $\pi$ is better: naturality of the definition. I am not using naturality here as in category theory, but the concept is similar. To avoid confusion, I will call this property platonicity, a made-up word I use to describe the essence of platonic objects, the ideal objects of mathematics. We should try to maximize platonicity in the definition of the constant. When I say a definition of a constant has platonicity, I mean that the definition respects the internal structure of the formula and does not obstruct its generalization. The mathematical (platonic) object approximates a concept which is useful as a building block for other concepts, because it is closer to being discovered than invented. This is, of course, a view based on mathematical platonism, which is, in any case, at the heart of many of the arguments already present.

While $\tau$-ists (and fellow $\pi$-ists) argue in a sense from platonicity, they characterize it only as having fewer constants in the greatest number of formulas. I will argue instead that the important issue to consider is how formulas and concepts evolve as they are generalized.

So why is $\pi$ special? How do we generalize it?

There are two fundamental concepts from which $\pi$ emerges. Both concepts merge once we generalize enough, and both are important for understanding the matter at hand.

Geometrical $\pi$

As we saw, the ancient definition of $\pi$ comes from the circumference of a circle. It is a geometrical concept, and there are two ways I know to generalize it. First of all, as with all things geometrical, by adding dimensions. So, we start with a circle. A circle is the set of points at the same distance from a special point called the center. This generalizes easily to any number of dimensions. In 3D it is a sphere, and in more dimensions it is the harder-to-visualize hypersphere.

A circle is defined by the equation $$\sqrt{x^2+y^2}=r.$$
For a sphere $$\sqrt{x^2+y^2+z^2}=r.$$
The generalization to all dimensions is clear: $$\sqrt{x_1^2+x_2^2+x_3^2+\cdots+x_n^2}=r,$$ where $n$ is the number of dimensions. This can be written more succinctly as $$\sqrt{\sum_1^n x_i^2}=r.$$

The left side, the square root of the sum of squares, $$\sqrt{\sum_1^n x_i^2},$$ is used to define a distance, or a length, in the space we are working in. This distance is called the Euclidean distance.

With these definitions, we can calculate the generalization of the formula $C=2\pi r$.
The proof is a little tedious, and it is simpler to do with the generalizations we will see later,
but in any case it involves integrals and induction. Follow the links for all the gory details.
The important thing is that the volume of the $n$-dimensional hyperball is
$$V_n(r) = \frac{\pi ^\frac{n}{2}}{\Gamma{\Big(\frac{n}{2}+1\Big)}}r^n,$$
and the surface area of its boundary, the hypersphere, is its derivative with respect to the radius:
$$S_n(r) = \frac{d}{dr}V_n(r) = \frac{n\,\pi ^\frac{n}{2}}{\Gamma{\Big(\frac{n}{2}+1\Big)}}r^{n-1}.$$
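As a sanity check, the volume formula, together with the fact that the surface is the derivative of the volume with respect to the radius, reproduces the familiar low-dimensional cases (a Python sketch; `ball_volume` and `sphere_surface` are my names):

```python
from math import gamma, pi, isclose

def ball_volume(n, r=1.0):
    """Volume of the n-dimensional ball: pi^(n/2) / Gamma(n/2 + 1) * r^n."""
    return pi ** (n / 2) / gamma(n / 2 + 1) * r ** n

def sphere_surface(n, r=1.0):
    """Surface area of the boundary: the derivative of the volume in r."""
    return n * ball_volume(n, r) / r

assert isclose(ball_volume(2, 3.0), pi * 3.0 ** 2)          # disk area, pi r^2
assert isclose(sphere_surface(2, 3.0), 2 * pi * 3.0)        # circumference, 2 pi r
assert isclose(ball_volume(3, 2.0), 4 / 3 * pi * 2.0 ** 3)  # sphere volume
assert isclose(sphere_surface(3, 2.0), 4 * pi * 2.0 ** 2)   # sphere surface
```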

The Euler Gamma appears again. We can either rewrite the whole formula in terms of the Gamma function alone, or get rid of the Gamma using the half-integer formula we gave above. Let's start with the second path. The volume formula is the simpler and more fundamental of the two (the surface is its derivative), so we will work with that. Substituting the Gamma function by its expansion for half-integer arguments, $\Gamma\Big(\frac{1}{2}+n\Big)=\frac{(2n)!}{4^n n!}\sqrt{\pi},$ splits the volume into an even and an odd case:

$$V_n(r)=\begin{cases}\dfrac{\pi^{k}}{k!}\,r^{2k}, &: n=2k\\ \dfrac{2^{k+1}\pi^{k}}{(2k+1)!!}\,r^{2k+1}, &: n=2k+1\end{cases}.$$

It is very difficult to find a pattern in this form, but one thing is clear: changing $\pi$ for $\tau$ will not make it better. Following the other path we spoke about, replacing $\pi$ by its Euler Gamma representation, $\sqrt{\pi}=\Gamma\Big(\frac{1}{2}\Big)$, we get

$$V_n(r)=\frac{\Gamma\Big(\frac{1}{2}\Big)^{n}}{\Gamma\Big(\frac{n}{2}+1\Big)}\,r^n.$$

This makes the pattern much clearer and points to the twos we wanted to get rid of as belonging to the radius. So, there are two questions we should ask ourselves now: what is the Gamma function, fundamentally? And why is it so important for spheres? Let's explore the Gamma function a little more deeply.

Euler's Gamma

So, what is this Euler's Gamma function we keep bumping into? One interesting way to find new things in mathematics is to generalize something discrete (like the integers) into something continuous (like the real numbers). Continuous things are sometimes easier to work with, because of calculus (integrals and derivatives), and because limits exist.

Many years ago, Euler asked himself the following question: how do we find a continuous generalization of the factorial?
Later, other mathematicians asked how many such generalizations there are.

So, we want to "fill in the gaps", as it were, between the integers. This is the plot
for the factorial:

And we want to find its value in the middle (the red line):

It turns out that if we ask the function to have the minimum properties one could ask for,
only Euler's Gamma function can satisfy them. In this sense there is a unique generalization of the factorial, which is amazing! Euler's Gamma is the continuous version of the factorial, and it is precisely the function Euler discovered.

What are these properties we would like the generalized factorial to have?

Of course, we want the function to be positive for positive arguments. We also want the function to be continuous, so we can calculate limits and so on, which is the point of this whole exercise; there should be no gaps.

There is a convention that $0!=1$. There are combinatoric reasons for this, and it lets us set up the base case for an inductive definition of the factorial, from which we generalize to our function.
We can define the factorial by induction as
$$(n+1)! =\begin{cases}1, &: n=0\\
(n+1)\,n!, &: n>0\end{cases}.$$
We can use this as a property for our function. This means that our function should satisfy
$$f(x+1) =\begin{cases}1, &: x=0\\
(x+1)f(x), &: x>0\end{cases}.$$
For integers this is enough, but continuous real functions have more degrees of freedom, so we will need one extra property. What is the problem?
The problem is that, while the values are fixed at the integers, outside of them they may fluctuate. We have to set up conditions so that this does not happen.

So, intuitively we would like the function to not oscillate unnecessarily. This would make no sense:

The technical term for a function that curves belly-down, without any additional humps, is a convex function.

The factorial is multiplicative and grows very fast, so it is very common to take its logarithm. The logarithm converts multiplication into addition, so $\log(n!) = \log(n)+\log(n-1)+\cdots+\log(2)$. This way the values are smaller, and we can add them together to obtain the factorial; calculating and reasoning with logarithmic values is easier. Taking the logarithm of a function straightens its graph, but if we look at the discrete plot of the factorial, it still has a curved belly: it is convex. Consequently, we ask that the plot of the log of our function still be convex (we say the function is logarithmically convex). And that is it. There are no other continuous, real, positive functions which behave like a generalized factorial (the functional equation above) and which are logarithmically convex. This is the Bohr-Mollerup theorem. A simple proof can be found here.
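Both defining properties can be probed numerically with the standard library's `gamma` and `lgamma` (the log of the Gamma function); a sketch:

```python
from math import gamma, lgamma, isclose

# Functional equation: Gamma(x + 1) = x * Gamma(x), also at non-integer points.
for x in [0.5, 1.3, 2.0, 4.7]:
    assert isclose(gamma(x + 1), x * gamma(x))

# Midpoint log-convexity: log Gamma((x+y)/2) <= (log Gamma(x) + log Gamma(y)) / 2.
points = [0.3, 0.7, 1.0, 1.6, 2.5, 4.0, 6.2]
for x in points:
    for y in points:
        assert lgamma((x + y) / 2) <= (lgamma(x) + lgamma(y)) / 2 + 1e-12
```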

This is why the Gamma function is so important. Once defined for positive real numbers (Euler's original definition as a limit), it can be generalized to negative numbers using an integral (Euler again) and then, by analytic continuation, extended to all complex numbers. All these generalizations are unique; only one function fits the bill.

I am glossing over the details, but there are proofs in the links.

All this is to say that Euler's Gamma function is not some random integral but the continuous version of the factorial, and that its relation with $\pi$ deserves further scrutiny.

Other distances

We have already seen that the hyperspheres are defined as the set of points at a constant distance of another one we call center. We used the euclidean distance, which comes fundamentally from the Pythagoras theorem (generalized). This distance is the square root of the squares of the coordinates of the point (if the center is the origin). We wrote this as  $$d(\{0, 0\}, \{x,y\})=\sqrt{\sum_1^n x_i^2}$$ or in two dimensions as $$d(\{0, 0\}, \{x,y\})=\sqrt{x^2+y^2}.$$ Lets play in two dimensions and we can generalize easily later.
The square root is really having $\frac{1}{2}$ as exponent, so we can also write the distance as
$$d(\{0, 0\}, \{x,y\})=(x^2+y^2)^{\frac{1}{2}}.$$ If we where to change the definition of distance, we would change the value of $\pi$, even if we keep everything else the same. To define a distance, (the technical term in mathematics is a metric), and keep ourselves sane, we would have to make it behave properly by making it satisfy some conditions. Any sane metric will be greater than zero, only be different from zero if the ends of the segment we are measuring are not the same and the distance of $a$ to $b$ has to be the same as the distance from $b$ to $a$ (symmetry) and going through a third point must take at least as long as going directly (triangle inequality). These properties define a metric, but unlike with the Gamma function, there are many metrics to choose from. Each will define a value of pi. There is a family of metrics which is a very simple generalization of the Euclidean metric, the metric which comes from the p-norm. A metric is the distance between two points and a norm is the length of a vector, so they are technically related. From any norm there is an induced metric where you get the vector from the origin to the destination by subtracting both vectors and then measure it.
If one of the points is the origin, our metric (distance) will be defined as $$d(\{0, 0\}, \{x,y\})=(|x|^p+|y|^p)^{\frac{1}{p}}.$$ This is normally written for any two vectors as $$||\overline{v}-\overline{w}||_p=(|x_v-x_w|^p+|y_v-y_w|^p)^{\frac{1}{p}}$$ where the double bars $||\cdot||$ denote the norm.
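To make the definition concrete, here is a small sketch (my own illustration, not from the post) of the $p$-norm distance in two dimensions, showing the special cases discussed next:

```python
def p_dist(v, w, p):
    """Distance induced by the p-norm between 2-D points v and w."""
    return (abs(v[0] - w[0]) ** p + abs(v[1] - w[1]) ** p) ** (1.0 / p)

v, w = (0.0, 0.0), (3.0, 4.0)
print(p_dist(v, w, 2))    # Euclidean distance: 5.0
print(p_dist(v, w, 1))    # taxicab distance: 7.0
print(p_dist(v, w, 64))   # large p approaches max(|dx|, |dy|) = 4
```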

This family of metrics (one for each value of $p$, with $p=2$ being the Euclidean or regular everyday distance) is of particular interest because it generalizes the Euclidean metric while getting rid of the twos which come from that particular distance, so it illustrates which constants come from where. The cases $p=1$ and $p\rightarrow \infty$ are also of particular interest (when $0<p<1$ the triangle inequality fails, so the definition does not give a metric).

In two dimensions, what do these distances look like? Say $p=1$; then $$d(v, w)=||v-w||_1=|x_v-x_w|+|y_v-y_w|.$$ This is called the taxicab distance: if you divide the plane into little square blocks (like the streets of a Roman grid, Manhattan for example), the number of blocks it takes to get to a place is the taxicab distance.

Examples of three paths of different taxicab length

In our case, where the space is continuous, you can think of these squares as being infinitesimally small. What is a circle for this distance? A square with its vertices on the axes.

Taxicab circle of $r=3$ with one path marked in red

Taxicab circle in continuous space (when the block size tends to 0)

What happens at the other edge case, in the limit where $p\rightarrow\infty$?
After taking the limit, we obtain $$||\overline{v}-\overline{w}||_\infty=max(|x_v-x_w|,|y_v-y_w|).$$ The circle is again a square, but now with its sides parallel to the axes. Intermediate values of $p$ give shapes in between, with the middle case $p=2$ being the round circle.

What is the value of $\pi$ in these cases? Is our $\pi$ somehow special? When defining $\pi$ for these spaces, one has to be careful to use the distance consistently to measure both the circumference and the radius. These concepts are explained very well at an introductory level in this video. $\pi_2=3.14\ldots$ is definitely special, because it is the minimum value of the generalized $\pi$ for this family of distances, which is called $\pi_p$. The maximum is attained at $p=1$ and $p=\infty$, where $\pi_1=\pi_\infty=4$. Things like radians and trigonometric functions can be generalized successfully to this setting.
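The consistency requirement (measure the circumference with the same $p$-distance used for the radius) can be illustrated numerically. This is a sketch of mine, not from the post: it approximates the unit $p$-circle by an inscribed polygon and sums the $p$-lengths of the chords.

```python
import math

def p_norm(x, y, p):
    return (abs(x) ** p + abs(y) ** p) ** (1.0 / p)

def pi_p(p, steps=20000):
    """Generalized pi: half the perimeter of the unit p-circle,
    with both the radius and the perimeter measured in the p-norm."""
    total, prev = 0.0, None
    for i in range(steps + 1):
        t = math.pi * i / steps          # sweep the upper half-plane
        x, y = math.cos(t), math.sin(t)
        r = p_norm(x, y, p)
        pt = (x / r, y / r)              # project onto the unit p-circle
        if prev is not None:
            total += p_norm(pt[0] - prev[0], pt[1] - prev[1], p)
        prev = pt
    return total

print(pi_p(1))   # ≈ 4 (the maximum, shared with p = infinity)
print(pi_p(2))   # ≈ 3.14159... (the minimum)
```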

These formulas are easily generalizable to more dimensions. What are the hyperspheres in that case?

Let's call them by their name: $n$-balls of norm $p$, where $n$ is the dimension and $p$ is the $p$ in the formula for the norm. For three dimensions, $n=3$, when $p=1$ the ball is an octahedron and when $p\rightarrow\infty$ one gets a cube. The three balls can be seen (cut open) in this picture:
Hyperballs in 3 dimensions

In other words, if we are in three dimensions, $||\overline{v}||_1=R$ is an octahedron. The generalization of the cube to more dimensions is called a hypercube or polytope, which is precisely the family of $n$-balls of norm $p=\infty.$ The generalization of the octahedron is the hyperoctahedron or crosspolytope, which is the family we get for $p=1$.

Hypercubes (polytopes) for different dimensions

Hyperoctahedra (crosspolytopes) for different dimensions

The volume inside an $n$-ball of radius $R$ of norm $p$, that is, the volume of the set of points satisfying $||\overline{v}||_p\le R$, is $$V_n^p(R)=\frac{\Big(2R\,\Gamma{\Big(\frac{1}{p}+1\Big)}\Big)^n}{\Gamma{\Big(\frac{n}{p}+1\Big)}}.$$
We see now that $\pi$ comes clearly from the Gamma function, and $\Gamma(\frac{1}{2})=\sqrt{\pi}$ is quite fundamental when thinking about $\pi$; there is no $2$ in that part of the formula. The only remaining twos in the formula come from the radius not being fundamental: if we used the diameter, there would not be any extraneous constants and everything would be cleaner.
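As a sanity check (a sketch of mine), the volume formula $V_n^p(r)=\big(2r\,\Gamma(\frac{1}{p}+1)\big)^n/\Gamma(\frac{n}{p}+1)$, which also appears later in the post, recovers the familiar volumes:

```python
from math import gamma, pi, inf

def ball_volume(n, p, r=1.0):
    """Volume of the n-ball of p-norm radius r:
    (2r * Gamma(1/p + 1))^n / Gamma(n/p + 1)."""
    return (2 * r * gamma(1.0 / p + 1)) ** n / gamma(n / p + 1)

print(ball_volume(2, 2))    # disc: pi * r^2 ≈ 3.14159
print(ball_volume(3, 2))    # Euclidean ball: 4/3 * pi ≈ 4.18879
print(ball_volume(3, 1))    # octahedron: (2r)^3 / 3! ≈ 1.33333
print(ball_volume(3, inf))  # p -> infinity: cube of side 2r, volume 8
```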

Note that one can also generalize volume (let's call it $p$-volume) and calculate the generalized $p$-volumes of the $n$-balls of $p$-norm, which we will talk about later. This requires some sophisticated mathematics. The volume used above is the standard regular volume, not derived in any way from the norm.

As an aside, the fact that polytopes and crosspolytopes are dual to each other comes from the duality of the norms whose hyperballs they are: two norms $p$ and $q$ are dual when $1/p + 1/q = 1$, and a theorem by Minkowski states that if the norms are dual, the balls are polar sets. See this paper for more details.

Combinatorial generalization

We have seen that the limits of the $p$-norm $n$-balls are polyhedra. There are many connections between combinatorics and polyhedra (or, more generally, polytopes). Combinatorics is the branch of mathematics which counts and obtains properties of configurations of finite or discrete sets. Polyhedra are discrete sets of vertices and edges, so combinatorics is deeply entwined with their properties. Combinatorics is also related to finite groups, probabilities, etc. In many of the formulas used in combinatorics, the Gamma function and $\pi$ appear magically, but the ideas above give a clear path of generalization from simpler concepts, which enables understanding. To find everything about generalized polyhedra and their properties, find a copy of "Regular Polytopes" by Coxeter.

Both hypercubes (polytopes) and hyperoctahedra (cross-polytopes) can be defined by adding one dimension at a time and specifying how the vertices in the new dimension are connected to the existing ones.
There are actually only three families of regular polytopes that exist for any number of dimensions, the third one being the regular simplices (hypertetrahedra).

For example, we start with a segment (one dimension, two vertices) and add a new dimension. If we are growing a hypercube, we duplicate the vertices at each step: the square has $4$ vertices, the cube $8$ and so on, so the number of vertices for dimension $n$ is $2^n$. For a hyperoctahedron, we always add two opposite, unconnected vertices, so the number of vertices for dimension $n$ is $2n$. In the hyperoctahedron, all pairs of non-opposite vertices are connected to one another or, in other words, each vertex has $2n-2$ edges connected to it. In the hypercube, each vertex is connected to $n$ other vertices, so the number of edges at each vertex is $n$.
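These vertex and edge counts are easy to verify by brute force; a sketch of mine (not from the post), building both families explicitly:

```python
from itertools import combinations, product

def hypercube(n):
    """n-cube: vertices are 0/1 tuples; edges join tuples differing in one coordinate."""
    vs = list(product([0, 1], repeat=n))
    es = [(a, b) for a, b in combinations(vs, 2)
          if sum(x != y for x, y in zip(a, b)) == 1]
    return vs, es

def crosspolytope(n):
    """Hyperoctahedron: vertices are +-e_i; edges join every non-opposite pair."""
    vs = [tuple((s if j == i else 0) for j in range(n))
          for i in range(n) for s in (1, -1)]
    es = [(a, b) for a, b in combinations(vs, 2)
          if b != tuple(-x for x in a)]
    return vs, es

for n in range(1, 6):
    cv, ce = hypercube(n)
    ov, oe = crosspolytope(n)
    assert len(cv) == 2 ** n and 2 * len(ce) == n * len(cv)           # degree n
    assert len(ov) == 2 * n and 2 * len(oe) == (2 * n - 2) * len(ov)  # degree 2n-2
```

In three dimensions both counts give 12 edges, as they should for the cube and the octahedron.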

Related to all this is the concept of duality for polyhedra. The dual of a polyhedron has one vertex per face, with edges connecting two vertices if the corresponding faces share an edge. Hypercubes and hyperoctahedra are dual to each other, so the number of faces of a hyperoctahedron is the number of vertices of the hypercube and vice versa.

So, going back to volumes, how many $n$-hyperoctahedra of the same radius can we fit inside an $n$-hypercube? This is where we can see that combinatorics comes into play. Finding the number is easy using the formula we obtained before, $\frac{V_n^\infty(r)}{V_n^1(r)}=n!$, so we can fit $n!$ hyperoctahedra in a hypercube. Let's fix the orientation of a cube in place and count the ways to number its vertices, considering that two numberings which can be rotated to coincide are still different (the cube's orientation is fixed in place). If the cube's orientation is not fixed in place (so two numberings are the same even if they are rotated), then the number of ways to number a cube is $\frac{(n-1)!}{2^n}$, which is the number of hyperoctahedra that can fit inside a hypercube of one less dimension and half the diameter.
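The volume ratio driving this count is worth checking directly. A quick numeric sketch of mine, using the volume formula quoted later in the post, confirms that the hypercube ($p=\infty$) holds exactly $n!$ times the volume of the hyperoctahedron ($p=1$) of the same radius:

```python
from math import gamma, factorial, inf

def ball_volume(n, p, r=1.0):
    """V_n^p(r) = (2r * Gamma(1/p + 1))^n / Gamma(n/p + 1)."""
    return (2 * r * gamma(1.0 / p + 1)) ** n / gamma(n / p + 1)

# hypercube (p = inf) over hyperoctahedron (p = 1): n! for every n
for n in range(1, 10):
    ratio = ball_volume(n, inf) / ball_volume(n, 1)
    assert abs(ratio - factorial(n)) < 1e-6 * factorial(n)
```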

In order to fill space without leaving gaps, for many numbers of dimensions $n$ we have to cut the hyperoctahedra in pieces to pack them together. How much space we can fill by piling up polytopes is called the packing density; when its maximum is $1$, as happens trivially with hypercubes in general, and with octahedra for $n=2$ and $n=4$ dimensions, there exists a tiling with no gaps. If there are gaps, we average over infinite space to obtain the density (or average over a hypercube or other region which captures the periodic pattern of the tiling or tessellation and which we know can be packed). These properties are studied by combinatorics and probability, as they are related to finite and discrete sets (vertices, tilings) and densities.

We can make the game more interesting by thinking about how many numberings of a fixed hypercube exist with some family of numbers repeated (for example, primes can be repeated), or numberings with random numbers drawn from some distribution. In these cases we will get (on average, or asymptotically as the number of dimensions grows, or...) something between $n!$ and $1$, which we will be able to approximate with the Gamma function or the volume of a hyperball of $p$-norm. Many if not all of the $\pi$s and Gamma functions which appear in probability densities and combinatorics can benefit from these ideas or be interpreted in terms of them.

What is Pi, fundamentally?

So, my conclusion is that the value of $\pi$ is fundamentally tied to the Gamma function and the Euclidean distance. The resulting formulas written generally, like $\Gamma\Big(\frac{1}{2}\Big)=\sqrt{\pi}$, would not benefit conceptually or in terms of simplicity from changing $\pi$ to $\tau$; quite the opposite.

The radius should probably give way to the diameter

In the spirit of getting rid of unnecessary constants and maximizing platonicity, there are still some lingering twos in formulas like $$V_n^p(r)=\frac{\Big(2r\,\Gamma{\Big(\frac{1}{p}+1\Big)}\Big)^n}{\Gamma{\Big(\frac{n}{p}+1\Big)}}.$$ I think the reason is that the radius is not a fundamental concept of mathematical reality. Again, we should look at how one generalizes this concept.

I have come across this same problem several times in different places. The branch of mathematics which studies how volume measurement is generalized is called measure theory. Essentially, measure theory studies how to cut space into little pieces ($\sigma$-algebras) and assign a number to each of them, which is their generalized volume. In measure theory, when generalizing the Lebesgue measure to the Hausdorff measure, sets are measured by their diameter (raised to a power depending on the dimension). In convexity analysis, where one has convex bodies and needs to measure their width, the diameter is the way to do it. Even with something as simple as generalizing a hypersphere, it is better to get rid of the radius and talk about the diameter. The diameter of any set is the distance between the two points of the set for which the distance is maximum, so it is defined generally, without any symmetry restriction such as the set having a center.
Using the diameter $D=2r$, rather than the radius or the side, is what gives us the most beautiful formula for the hyperball of norm $p$: $$V_n^p(D)=\frac{\Big(D\,\Gamma{\Big(\frac{1}{p}+1\Big)}\Big)^n}{\Gamma{\Big(\frac{n}{p}+1\Big)}}.$$

Interesting ideas

From all this, we have many interesting generalizations to investigate. For example, can we define a $\pi$ for constant-width objects? Can we find a family of distances for the third family of regular $n$-dimensional polytopes, the tetrahedra, which are also self-dual (using both at the same time so that the norm is symmetric, or considering asymmetric seminorms)? Can we generalize the volume for other $p$ metrics, that is, find an appropriately defined measure derived from the $p$-norm? What are the formulas for volumes of hyperballs in that case? How about $\pi$ for norms outside the family we looked at, for example norms whose balls are other polyhedral shapes? Can we look at $\pi$ and the Gamma function in different formulas of physics and mathematics and find the ties with these concepts? What are the packing densities of different families of balls?

These questions are beautiful and I am sure they will take us to beautiful and interesting mathematics, some already discovered, some new, some in between.

Have fun and let me know what you find!

Friday, June 9, 2017

Reimplementing Plan 9 dump, relaxing backups

Some time has passed now since I last used Plan 9 and there are many things I miss: the simplicity, understanding a whole system from the ground up... One of the things I used to miss most was the backup system. For the user it was amazingly simple: in /dump one could find a directory tree with a directory per year, one per month inside, and so on, each holding a snapshot of the filesystem at that point in time.
This happened automatically, without user intervention. It enabled me to relax and focus on the job (whatever that was).
While I have used similar features in other systems, they were either too complex (and thus not trustworthy), or I didn't fully understand them, or they required user intervention, or they failed too often, or some combination of all of the above.

The first answer I get when I tell some developer this is "you should be using Git".
Of course, I know of the existence of Git (and other VCSs) and use it, but there are three main problems I find with them as backup systems. One is that they are not trivial to navigate, so when I am doing something urgent, they take up cycles of whatever I am doing. Another is that when you are working with things which are not code, like images for presentations or video, there are problems with binary diffs. Finally, they require the intervention of the user: you have to do a push every time you edit a file. Of course, you could do it periodically, but there are still other factors to consider, like simplicity. Adding a layer over Git to make it into a dump filesystem is too complex, which, again, makes it less trustworthy. Git is nice when you have a distributed team of developers and the complexity pays off. Backups are an entirely different problem.

The most important lesson I learnt from Plan 9's dump is that backups should happen without you doing anything, and you should be able to access and navigate them instantly and without fuss. This makes you check, without noticing, that the backup is online and working, because you are using it as part of your daily work, for example by checking the history of a file's edits or modifications. All of that without giving it any second thought.

Another requirement I learnt the hard way, and which is also important, is for the backup system to be as simple as possible, especially in the part that takes the snapshots, as my data and part of my sanity are going to depend on it. If I can't completely understand the format in which the backups are stored, I end up not trusting the backup system. If parts of the backups are lost, I should be able to understand and piece together whatever is left.

While I was thinking about this, I found this post, which shows that rsync is the ideal candidate to make the snapshots. It is a battle-tested command, in which it is going to be quite difficult to find essential bugs, and it has support for making the snapshots efficiently, by making hard links to the files that have not changed. I make my snapshots hourly, and at that frequency, not making the dumps too redundant is quite important. Also, there are quite a lot of corner cases in filesystems these days (ACLs, metadata, etc.) which I don't want to deal with or think about if possible. I let rsync take care of all that.

The script to make the snapshot is very simple and, of course, running it hourly in Unix is no problem (before, I would have used cron; now, systemd).
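The original script is linked rather than shown, but the core idea fits in a few lines. Here is a minimal sketch of mine (not the actual script; the flags and the /dump/YYYY/MMDD/HHMM layout are assumptions based on the paths shown in this post): rsync's --link-dest turns unchanged files into hard links against the previous snapshot, so each hourly snapshot only costs the changed bytes.

```python
import datetime
import subprocess

def snapshot_cmd(src, dump_root, prev, now=None):
    """Build the rsync invocation for one snapshot. Unchanged files
    become hard links into the previous snapshot via --link-dest."""
    now = now or datetime.datetime.now()
    dest = "%s/%s" % (dump_root, now.strftime("%Y/%m%d/%H%M"))
    return ["rsync", "-a", "--delete",
            "--link-dest=" + prev,   # hard-link files identical to prev
            src + "/", dest]

def take_snapshot(src, dump_root, prev):
    subprocess.run(snapshot_cmd(src, dump_root, prev), check=True)

# the command that would produce the 2017-05-13 14:10 snapshot
print(snapshot_cmd("/main/MAIN", "/dump", "/dump/last",
                   datetime.datetime(2017, 5, 13, 14, 10)))
```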

Now I have a dump filesystem which I can bind or soft-link (if you want to install the whole thing in Linux, it is explained here) to /dump, or mount remotely with sshfs. This enables me to do things like

cd /dump/2017/0513/1410/MAIN/paurea/doc/howto

to see that directory as it was on that date, like in Plan 9.

Not having almost any redundancy is a little troubling, because if a file is overwritten in the dump by someone, we may lose it. The clients mount it read-only, so this shouldn't happen, but neither should most of the other mishaps the backup protects against. To prevent it, I checkpoint the whole filesystem monthly with this script. It creates a fresh copy without any backwards dependencies and increases the use of space by the size of one working copy.
To do that I use the cp(1) command which, by default, copies hard links as separate files.
I think of dumpme as compression and chkptme as controlled redundancy, two steps in channel coding (hopefully without catastrophic error propagation).

To navigate the dump comfortably, I reimplemented the yesterday(1) and history(1) commands from Plan 9 in Go. They are complete reimplementations from scratch, with the options I find useful.

I called these commands yest and hist, so now you can run

$yest -y=1 /main/MAIN/paurea/doc/howto

to get the path of the file or directory in the dump a year ago. Some of the features of yesterday(1), like -c to copy yesterday's file, are not there; I may add them later if I find them necessary.

The other command is hist, which gives you the history of a file or directory like history(1) does, including its diffs if it is a text file.

$ cd  /main/MAIN/paurea/doc/charlas/

$ hist -c slides.tex

#create    /dump/2017/0510/1605/MAIN/paurea/doc/charlas/ 19718 0777 2017-05-09 17:52:09 +0200 CEST  f
#wstat    /dump/2017/0510/1728/MAIN/paurea/doc/charlas/ 19718 0777 2017-05-10 16:55:52 +0200 CEST  f
#write    /dump/2017/0512/1851/MAIN/paurea/doc/charlas/ 34710 0777 2017-05-12 18:14:37 +0200 CEST  f
#write    /dump/2017/0518/1130/MAIN/paurea/doc/charlas/ 34844 0777 2017-05-17 16:08:08 +0200 CEST  f

Without the -c flag, it returns the incremental diffs of the file. The diff of two files is implemented using diffmatchpatch, which returns a list of edits (insertions, deletions and matches), which I had to convert into line differences like the ones given by the command diff(1).

One feature I would like to implement in the future is to somehow mark the checkpoints in the filesystem and use some form of scrubbing or error detection with the different (supposedly equal) copies of the files.

Wednesday, June 29, 2016

Supersum? Subproduct?

When I was in school, like all the other kids, I was taught addition, multiplication and, a little later, exponentiation. There are several ways to think about all of them. Some have been very fruitful in mathematics at large (like groups and geometry) and some less so, like iterated composition (unless you think of arrow composition in category theory as abstracted composition), so many mathematicians don't think much about them.

My plan is to try to write this post with as few prerequisites as possible and to tell a story about some of my explorations in this area. This may trump rigour, precision and formalism, but everything I will write about here can be proven and made completely formal. I will also write the story as if it had been presented to me coherently and clearly, which was unfortunately not the case. There were many holes, which I filled in on my own later. Of course, my memory of it is also diffuse after so much time.

The way addition was presented to me in school was by counting.

$$ a + b = \underbrace{1+1+\cdots}_{a\ times}+ \underbrace{1+1+\cdots}_{b\ times}$$.

Later followed by tables to make calculations easier, of course.

Then they defined multiplication, which was more interesting,

$$ a * b = \underbrace{a+a+a\cdots}_{b\ times}=\underbrace{b+b+b\cdots}_{a\ times}$$.

again, followed by tables.

By looking at the definition, it is not evident that the two ways to write the same multiplication are equal, i.e. that $a*b = b*a$.
We also used another, equivalent definition which was geometric, appealing to the intuitive idea of area. This was done both discretely, by arranging dots in a rectangle and counting them,

  and continuously

 with the idea of area (or, for three elements and three dimensions, volume).

The area inside the rectangle is the multiplication of both sides, so the operation has to be commutative. If I remember correctly, this was both the definition of area and a kind of proof of commutativity (this was at school; as I said, the exposition was not very precise).

We learnt about all the properties of both operations. Even if at the time we didn't know the name, we learnt that addition, and multiplication without the $0$, are Abelian groups:
  1. Associative, $(a+b)+c = a+(b+c)$ and $(a*b)*c = a*(b*c)$.
  2. Commutative, because the groups are Abelian, $a+b = b+a$ and $a*b = b*a$.
  3. Identity element, $a+0 = a$ (for multiplication, $a*1 = a$).
  4. For each element there is an inverse: $a-a = 0$ and $a/a = 1$. We write the inverse of $a$ for addition as $-a$ and the inverse of $a$ for multiplication as $1/a$. $0$ has no inverse for multiplication, which is why it is left out; it is also called the absorbing element, because $a*0 = 0$ for all $a$.
  5. A group also needs to be closed: applying the operation to any two numbers cannot take us out of the set.
We have one more property, which relates addition and multiplication, the distributive property
$$a*(b+c) = a*b + a*c.$$
 This property is quite intuitive if we draw the operation.

We can use the distributive property to define negative products,
$$a*(b-b) = 0 = a*b + a*(-b),$$
$$a*(-b)  = -(a*b)$$
and so on.
A little later, exponentiation was defined, which also has a definition as iterated composition
$$ a ^ b = \underbrace{a*a*a\cdots}_{b\ times},$$
where $a$ is called base and $b$ is the exponent.
The first surprise is that exponentiation is not commutative, so $a^b \neq b^a$ in general.
We learnt about many properties of exponentiation, but I will focus on two,
$$a^{b+c} = a^b*a^c $$
$$e^{b\ ln(a)} = a^b$$
where $e$ is a special number, the Euler number, $2.7182818284 \ldots$ and
$ln(a)$ is the inverse of the exponential operation: the natural logarithm, the logarithm with base $e$, i.e. the solution $x$ to $e^x = a$.
Note that I have jumped from the definition of exponentiation by iterated composition (which only works for integer numbers) to something which can be defined for real numbers. Real numbers include fractions, like $0.5$, and numbers with infinitely many non-repeating digits after the decimal point, like $e$ or $\pi$. We spent a year going from one to the other, and this was the first time I was exposed to an interesting proof and a generalization.
With the iterated composition definition, it is easy to find the result with a fractional base, for example
$$0.5^3 = 0.5*0.5*0.5=0.125$$.
The problem appears when we want to use a fractional exponent,
$$3^{0.5} = ??? $$
How do we apply the operation $0.5$ times? The problem here is that the definition is inherently asymmetric (which is why it is surprising that multiplication, which is defined similarly, is commutative).
The approach used to fix the problem is very interesting. We can use the property above,
$a^{b+c} = a^b * a^c$ recursively to find
$(a^b)^c = a^{b*c}$.
We need one more ingredient, negative exponents, but
$$a^{0} = a^{1-1} = a^{1}*a^{-1}$$
and $a^{0} = 1$ so $a^{-1} = mult\_inverse(a)$.
 The inverse element can be used to define the division, $a*mult\_inverse(b) = a/b$ which is the inverse operation of the multiplication.

Why is $a^0 = 1$? This is an intriguing question: it is not evident that applying the operation $0$ times should yield $1$. To see why, we can again use $a^{b+c} = a^b * a^c$:
when $c = 0$, $a^{b+0} = a^b = a^b * 1$. So for everything to work, the identity of multiplication
has to be the result of exponentiating by the identity of addition.

We can finally find the value when the exponents are fractions by solving the equation $a = (a^{1/b})^b$. For example, if we are looking for $a^{0.5}$, we only need to find a number that multiplied by itself is $a$. For more general fractions, we can do it by parts, $a^{b/c} =(a^b)^{1/c}$.
We already know how to do it with negative exponents, and they can be combined. For more general exponents, i.e. real numbers, we can approximate them as much as we want by greater and greater $b$ and $c$ in the fractions $\frac{b}{c}$ (more generally using a limit). What we are doing here is finding the completion of the rational numbers (fractions) which are the real numbers. This is a theme that appears a lot in mathematics.
All this gives us a definition, but it is not very satisfactory for calculation. For that, we normally use more powerful tools: infinite sums, which in math are called series.
I am not going to derive them here, but I will write them down anyway, because they will appear later.

First, let's define the exponential function as $f(x) = e^x$. To obtain the value of the function at an integer $x=n$, you can use the definition of $e$, which is the number above (technically, we have to play with derivatives, series and other more advanced tools to define it), and apply the iterated composition definition. For example
$f(3) = e*e*e$. For fractional or real exponents, you use the approach described above.
We can then define the inverse function of $e^x$, which we have already seen: the natural logarithm, defined by $ln(e^x) = x$ and conversely
$e^{ln(x)} = x$.

Euler discovered that the exponential can be written as an infinite series (infinite sum)
$$e^x = 1 + x+ \frac{x^2}{1*2} +\frac{x^3}{1*2*3} + \frac{x^4}{1*2*3*4}+\ldots$$.
We can write $1*2*3*4*5*\cdots n = n! $, which is called the factorial.
There is a similar series for the logarithm,
$$ln(x) = (x-1) - \frac{(x-1)^2}{2} + \frac{(x-1)^3}{3} - \frac{(x-1)^4}{4}\ldots$$

Even though the series are infinite, the terms keep getting smaller so, by picking a reasonable
number of terms, we get a good enough approximation of the value we are looking for.

We can use the fact that the logarithm is the inverse of the exponential,
$x^y = (e^{ln(x)})^y= e^{ln(x)*y}$,
and the series to calculate exponentials for any base values. We can start with the series and this is an alternative but equivalent approach to define exponentials for any values.
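As an illustration (a sketch of mine, not the post's code), the two series really do let us compute $x^y$ for arbitrary real exponents, with no iterated composition anywhere:

```python
import math

def exp_series(x, terms=40):
    """e^x = sum of x^n / n!, truncated."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)   # next term: multiply by x, divide by (n+1)
    return total

def ln_series(x, terms=500):
    """ln(x) = sum of (-1)^(n+1) (x-1)^n / n; converges for 0 < x <= 2."""
    return sum((-1) ** (n + 1) * (x - 1) ** n / n for n in range(1, terms + 1))

def power(x, y):
    """x^y = e^{y ln(x)}, computed from the two series alone."""
    return exp_series(y * ln_series(x))

print(power(1.5, 0.5))   # ≈ 1.224745, i.e. sqrt(1.5)
print(power(1.9, 2.0))   # ≈ 3.61
```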

So,  after this rather long introduction, we are in the position to understand what my frame of mind was when I asked my highschool teachers the following two questions:
  1. Does it make sense to define operations by iterated composition after the exponentiation? What are their properties?
  2. We have defined a kind of fractional composition for the exponentiation and multiplication (multiplication by a fractional number and a fractional exponent respectively), can we do the same in general and find an operation between addition and multiplication?
The first question was the simplest and the first that drew my attention. Of course, in the intellectual wasteland that is (or at least used to be) secondary math education in Spain, my teachers didn't help. When I asked them questions about material not in the syllabus, their eyes would glaze over or I would get told off for being weird. Also, there was no way of knowing if what I was doing had been done before, because the internet didn't exist yet. More concretely, I continued iterating the exponentiation to get a new operation,
$$\underbrace{a^{a^{a^{a\ldots}}}}_{b\ times}.$$
The operation can be interpreted in two ways, one of which is actually a new operation and one which is not.

The first way to parenthesize,
$$\underbrace{{(({a^a})^a})^a\ldots}_{b\ times}=a^{a^{b-1}},$$
is not a new operation, whereas
$$\underbrace{a^{(a^{(a^{(a\ldots)})})}}_{b\ times}$$
is actually a new operation. We can continue defining an infinite sequence of operations by continuing to iterate further each of them.
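The two parenthesizations are easy to compare numerically. A small sketch of mine, with the left-associated tower collapsing to $a^{a^{b-1}}$ as claimed:

```python
def tetration(a, b):
    """Right-associated tower a^(a^(...^a)) with b copies of a (the new operation)."""
    result = 1
    for _ in range(b):
        result = a ** result
    return result

def left_tower(a, b):
    """Left-associated ((a^a)^a)^...: collapses to a^(a^(b-1)), nothing new."""
    result = a
    for _ in range(b - 1):
        result = result ** a
    return result

print(tetration(2, 4))   # 2^(2^(2^2)) = 65536
print(left_tower(2, 4))  # ((2^2)^2)^2 = 2^(2^3) = 256
```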
I wrote down all the properties I found. Later, the internet appeared and I discovered that
my operation was actually called tetration and the following ones hyperoperations.
So I was late for these discoveries but, nevertheless, I got the pleasure of finding them myself.
There is a lot of work to do in this area still, but this post is about the second question.

The second question was much more difficult to answer. To make it more precise,
if we index the operations using a variant of Knuth notation, with an arrow to indicate the operation and a superindex to indicate which hyperoperation we are talking about:
$$a*b = a\uparrow^0 b$$
$$a^b = a\uparrow^1 b$$
Then, what would $a\uparrow^{0.5}b$ be?

I tried different strategies without success.
My first strategy was to try to generalize iterated function composition. To go from addition to multiplication, we are actually composing a function many times. We start with
$$f(x, y) = x + y,$$
and then
$$x*y = f(x, f(x, f(x, \ldots))\ldots).$$
Then we iterate for exponentiation and so on.

If we could iterate, say, $0.5$ times, then we could define an operation between the sum and the multiplication. I will call it the supersum for now, but I don't know what the right name for it is, hence the title of the post.
There are various places in mathematics where a concept of fractional composition appears in some sense, including multiplication $a*0.5$ and exponentiation $a^{0.5}$. The most notable is in linear transformations: one can define the square root of a matrix, either by means of series or by diagonalization, and multiplying by this matrix represents a fractional application of the linear transformation.
Could we turn our problem into one of fractional compositon?
Trying to apply this approach to our problem in an elegant way is very difficult, at least for me. I got stuck for a long time.
I tried various other strategies, also without success, which I don't even remember.
Then, at some point, while reading about Lie groups, I had my first break.
I had the idea of using the exponential map of Lie groups. The approach is simple. We can use an exponential function to map addition to multiplication
$$a^{(b+c)} = a^b * a^c.$$
I only had to find an exponential for the multiplication $r(x)$ for which the analogous formula would need to hold,
$$r(a*b) = r(a) *r(b).$$ Then I could interpolate somehow between both exponentials.
This does not uniquely define the function we are looking for; the identity, for example, would fit the bill.
I had to get the inspiration elsewhere: the Mellin transform. I won't delve into depth here, but the Mellin transform is invariant under dilation (multiplication), whereas the Fourier transform is invariant under translation (addition). We can use the functions found in the Mellin transform and they will be the exponentials we seek. The new exponential for multiplication is then $r(x) = 1/x$. Now we need a map which varies from one to the other. It is not going to be continuous, because one of them is not defined at $0$, so we have to "break" the exponential to convert it into $r(x)$.
After a lot of time staring at these functions, I finally got my second break. Enter the Mittag-Leffler function, which plays the role of the exponential for fractional derivatives.

$$E_\alpha(x) = 1 + \frac{x}{\Gamma(\alpha +1)} + \frac{x^2}{\Gamma(2*\alpha +1)}\ldots$$
The gamma function, $\Gamma(\alpha)$, also discovered by Euler, is a continuous generalization of the factorial, indeed if $n$ is a positive integer number,
$\Gamma(n) = (n-1)!$.
For Mittag-Leffler functions, when $\alpha = 1$, we have $E_1(x) = e^x$. When $\alpha = 0$, the function is $E_0(x) = 1/(1-x)$.
$E_0$ is not the $r(x)$ we were looking for, but it is close enough. We can use the function to define the supersum, for $0\leq \alpha \leq 1$,
$$a\circledast_{\alpha} b = E_{\alpha}^{-1}(E_{\alpha}(a)*E_{\alpha}(b))$$
where $E_{\alpha}^{-1}(x)$ is the inverse of $E_{\alpha}(x)$. We leave the proof that it meets all the properties to the reader (it is in the paper linked at the end of the post).
 It has the particular cases
$$a\circledast_1b = a+b = ln(e^a*e^b) $$
$$a\circledast_0b = a+b-a*b = c$$
which can be rewritten as
$$c-1 = -(a-1)*(b-1),$$
which is a form of multiplication (reversed and shifted, which can be fixed, but it is essentially a multiplication). It gets better if you use $E_{\alpha}(-x)$, and you can shift the neutral element,
but I will leave that for another post. You may want to see an interesting (introductory) video related to these ideas.
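The two particular cases above are easy to check numerically. A minimal sketch of mine (not the paper's code), using the closed forms $E_1(x)=e^x$ and $E_0(x)=1/(1-x)$:

```python
import math

def E0(x):       # E_0(x) = 1/(1-x), valid for x != 1
    return 1.0 / (1.0 - x)

def E0_inv(y):   # inverse of E_0
    return 1.0 - 1.0 / y

def supersum(alpha, a, b):
    """a (*)_alpha b = E_alpha^{-1}(E_alpha(a) * E_alpha(b)), for alpha in {0, 1}."""
    if alpha == 1:
        return math.log(math.exp(a) * math.exp(b))  # recovers a + b
    if alpha == 0:
        return E0_inv(E0(a) * E0(b))                # recovers a + b - a*b
    raise NotImplementedError("intermediate alpha needs E_alpha and its inverse")

print(supersum(1, 2.0, 3.0))   # ≈ 5.0
print(supersum(0, 0.2, 0.5))   # ≈ 0.6 = 0.2 + 0.5 - 0.2*0.5
```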
There are many ways to extend and apply this new group and I am still sifting through the results. For example, we can use it to define a transform (this is what abstract harmonic analysis is about, read more here), but again, this is a story for another post.
I have informally called this group (technically, the Lie group determined by the exponential map defined by the Mittag-Leffler functions applied to the multiplication) the supersum.

What would you name it?

For more details on all this, see the paper I wrote. It is an Accepted Manuscript of an article published by Taylor & Francis in Integral Transforms and Special Functions, available online: