Wednesday, June 29, 2016

Supersum? Subproduct?

When I was in school, like all the other kids, I was taught addition, multiplication and a little bit later exponentiation. There are several ways to think about all of them, some of them have been very fruitful in mathematics at large (like groups and geometry) and some others, less, like iterated composition (unless you think of arrow composition in category theory as abstracted composition), so many mathematicians don't think about them much.

My plan is to try to write this post with as few prerequisites as posible and to tell a story about some of my explorations in this area. This may trump rigour, precision and formalism. Everything I will write about here can be proven and made completely formal. I will also write the story as if it was presented to me coherently and clearly, which was, unfortunately not the case. There were many holes, which I filled in on my own later. Of course, also, my memory of it is diffuse after so much time.

The way addition was presented to me in school, was by counting.

$$a + b = \underbrace{1+1+\cdots}_{a\ times}+ \underbrace{1+1+\cdots}_{b\ times}$$.

Later followed by tables to make calculations easier, of course.

Then they defined multiplication, which was more interesting,

$$a * b = \underbrace{a+a+a\cdots}_{b\ times}=\underbrace{b+b+b\cdots}_{a\ times}$$.

again, followed by tables.

By looking at the definition, is not evident that the two ways to write the same multiplication, i.e. that $a*b = b*a$, are the same.
We also used another equivalent definition which was geometric, appealing to the intuitive idea of area. This was done both discretely by arranging dots in a rectangle and counting them

and continuously

with the idea of area (or, for three elements and three dimensions, volume).

The area inside the rectangle is the multiplication of both sides, so the operation has to be commutative. If I remember correctly, this was both the definition of area and a kind of proof of commutativity (this was at school, as I said, the exposition was not very precise).

We learnt about all the properties of both operations. Even if at the time we didn't know the name, we learnt that addition and multiplication without the 0 are Abelian groups:
1. Associative, $(a+b)+c = a+(b+c)$ and $(a*b)*c = a*(b*c)$.
2. Commutative, because the groups are Abelian, $a+b = b+a$ and $a*b = b*a$.
3. Identity element a+0 = a (for multiplication, $a*1 = a$).
4. For each element there is an inverse $a-a = 0$, $a/a = 1$. We write the inverse of $a$ for addition $-a$ and the inverse of $a$ for multiplication $1/a$. $0$ would have no inverse for multiplication, which is why it is left out. It is also called the absorbing element because $a*0 = 0$ for all $a$.
5. A group also needs to be closed, the operation with any two numbers cannot go out of the set.
We have one more property, which relates addition and multiplication, the distributive property
$$a*(b+c) = a*b + a*c.$$
This property is quite intuitive if we draw the operation.

We can use the distributive property to define negative products,
$$a*(b-b) = 0 = a*b + a*(-b),$$
so
$$a*(-b) = -(a*b)$$
and so on.
A little later, exponentiation was defined, which also has a definition as an iterative composition
$$a ^ b = \underbrace{a*a*a\cdots}_{b\ times},$$
where $a$ is called base and $b$ is the exponent.
The first surprise is that the exponentiation is not commutative, so $a^b \neq b^a$ in general.
We learnt about many properties of exponentiation, but I will center myself in two,
$$a^{b+c} = a^b*a^c$$
and
$$e^{b\ ln(a)} = a^b$$
where $e$ is a special number, the Euler number, $2.7182818284 \ldots$ and
$ln(a)$ is the inverse of the exponential operation, the natural logarithm, the logarithm with base $e$, i.e. the solution to $e^x = a$.
Note that I have jumped from the definition of exponentiation by iterated composition (which only works for integer numbers) to something which can be defined for real numbers. Real numbers include fractions, like $0.5$ and numbers with infinite non-repeating digits after the decimal point like $e$ or $\pi$. We spent a year going from one to the other, and this was the first time I was exposed to an interesting proof and a generalization.
With the iterated composition definition, it is easy to find the result with a fractional base, for example
$$0.5^3 = 0.5*0.5*0.5=0.125$$.
The problem appears when we want to use a fractional exponent,
$$3^{0.5} = ???$$
How do we apply the operation $0.5$ times? The problem here is that the definition is inherently asymmetric (which is why it is surprising that the multiplication, which is defined similarly is commutative).
The approach used to fix the problem is very interesting. We can use the property above,
$a^{b+c} = a^b * a^c$ recursively to find
$(a^b)^c = a^{b*c}$.
We need one more ingredient, negative exponents, but
$$a^{0} = a^{1-1} = a^{1}*a^{-1}$$
and $a^{0} = 1$ so $a^{-1} = mult\_inverse(a)$.
The inverse element can be used to define the division, $a*mult\_inverse(b) = a/b$ which is the inverse operation of the multiplication.

Why is $a^0 = 1$? This is an intriguing question. It is not evident what applying the operation $0$ times should yield $1$. To see why, we can use again $a^{b+c} = a^b * a^c$,
when $c = 0$, $a^{b+0} = a^b = a^b * 1$. So for everything to work, the identity of multiplication
has to be the result of exponentiating by the identity of addition.

We can finally find the value when the exponents are fractions by solving the equation $a = (a^{1/b})^b$. For example, if we are looking for $a^{0.5}$, we only need to find a number that multiplied by itself is $a$. For more general fractions, we can do it by parts, $a^{b/c} =(a^b)^{1/c}$.
We already know how to do it with negative exponents, and they can be combined. For more general exponents, i.e. real numbers, we can approximate them as much as we want by greater and greater $b$ and $c$ in the fractions $\frac{b}{c}$ (more generally using a limit). What we are doing here is finding the completion of the rational numbers (fractions) which are the real numbers. This is a theme that appears a lot in mathematics.
All this, gives us a definition, but it is not very satisfactory for calculation. To do that, we normally use more powerful tools; infinite sums, which in math are called series.
I am not going to derive them here, but I will write them down anyway, because they will appear later.

First lets define the exponential function as $f(x) = e^x$. This means that to obtain the value of the function, you plug an integer value $x=n$ in the exponent and you get a value. To do that, you can use the definition of both $e$, which is the number above (technically, we have to play with derivatives, series and other more advanced tools to define it) and apply the iterative composition definition. For example
$f(3) = e*e*e$. For fractional or real exponents, you use the approach described above.
We can then define the inverse function of $e^x$, which we have already seen, the natural logarithm which by definition by $ln(e^x) = x$ and conversely
$e^{ln(x)} = x$.

Euler discovered that the exponential can be written as an infinite series (infinite sum)
$$e^x = 1 + x+ \frac{x^2}{1*2} +\frac{x^3}{1*2*3} + \frac{x^4}{1*2*3*4}+\ldots$$.
We can write $1*2*3*4*5*\cdots n = n!$, which is called the factorial.
There is a similar series for the logarithm,
$$ln(x) = (x-1) - \frac{(x-1)^2}{2} + \frac{(x-1)^3}{3} - \frac{(x-1)^4}{4}\ldots$$

Even though the series are infinite, the terms keep getting smaller so, by picking a reasonable
number of terms, we get a good enough approximation of the value we are looking for.

We can use the fact that the logarithm is the inverse of the exponential,
$x^y = (e^{ln(x)})^y= e^{ln(x)*y}$,
and the series to calculate exponentials for any base values. We can start with the series and this is an alternative but equivalent approach to define exponentials for any values.

So,  after this rather long introduction, we are in the position to understand what my frame of mind was when I asked my highschool teachers the following two questions:
1. Does it make sense to define operations by iterated composition after the exponentiation? What are their properties?
2. We have defined a kind of fractional composition for the exponentiation and multiplication (multiplication by a fractional number and a fractional exponent respectively), can we do the same in general and find an operation between addition and multiplication?
The first question was the simplest and was the first one that drew my attention. Of course, in the intellectual wasteland that is (or at least used to be) secondary math education in Spain, my teachers didn't help. When I asked them any questions about material not in the syllabus their eyes would glaze over or I would get told off for being weird. Also, there was no internet because it didn't exist yet, so I had no way of knowing if what I was doing had been previously done. More concretely, I continued iterating the exponentiation to get a new operation,
$$\underbrace{a^{a^{a^{a\ldots}}}}_{b\ times}.$$
The operation can be interpreted in two ways, one which is actually a new operation the other which is not.

The first way to parenthesize,
$$\underbrace{{(({a^a})^a})^a\ldots}_{b\ times}=a^{a^{b-1}},$$
is not a new operation, whereas
$$\underbrace{a^{(a^{(a^{(a\ldots)})})}}_{b\ times}$$
is actually a new operation. We can continue defining an infinite sequence of operations by continuing to iterate further each of them.
I wrote down all the properties I found. Later, the internet appeared and I discovered that
my operation was actually called tetration and the following ones are hyperoperations.
So I was late for these discoveries, but nevertheless, I got the pleasure of finding them myself.
There is a lot of work to do in this area still, but this post is about the second question.

The second question was much more difficult to answer. To make it more precise,
if we index the operations using a variant of Knuth notation, where there is an arrow to indicate the operation and a superindex to indicate which hyperoperation we are talking about:
$$a*b = a\uparrow^0 b$$
$$a^b = a\uparrow^1 b$$
Then, what would $a\uparrow^{0.5}b$ be?

I tried different strategies without success.
My first strategy, was to try to generalize iterated function composition. To go from addition to multiplication, we are actually compositing a function many times. We start with
$$f(x, y) = x + y,$$
and then
$$x*y = f(x, f(x, f(x, \ldots))\ldots).$$
Then we iterate for exponentiation and so on.

If we could iterate, say, $0.5$ times, then, we could define an operation between the sum and the multiplication. I will be calling it supersum for now, but I don't know what the right name for it is, hence the title of the post.
There are various places in mathematics where the concept of fractional composition in some sense appears including multiplication $a*0.5$ and exponentiation $a^{0.5}$. The most notable of it is in linear tranformations. One can define the root of a matrix, either by means of series or by diagonalization and multiplying by this matrix represents fractional application of the linear transformation.
Could we turn our problem into one of fractional compositon?
Trying to apply this approach to our problem in an elegant way is very difficult, at least for me. I got stuck for a long time.
I tried various other strategies, also without success, which I don't even remember.
I had the idea of using the exponential map of Lie groups. The approach is simple. We can use an exponential function to map addition to multiplication
$$a^{(b+c)} = a^b * a^c.$$
I only had to find an exponential for the multiplication $r(x)$ for which the analogous formula would need to hold,
$$r(a*b) = r(a) *r(b).$$ Then I could interpolate somehow between both exponentials.
This does not define uniquely the function we are looking for, for example, the identity would fit the bill.
I had to get the inspiration elsewhere; the Mellin transform. I won't delve into depth here, but the Mellin transform is invariant under dilation (multiplication), whereas the Fourier transform is invariant under translation (addition). We can use the functions found in the Mellin transform and they will be the the exponentials we seek for. The new exponential for multiplication is then $r(x) = 1/x$. Now we need a map which varies from one to the other. It is not going to be continuous because one is not defined in $0$, so we have to "break" the exponential to convert it into $r(x)$.
After a lot of time staring into these functions, I finally got my second break. Enter the Mittag-Leffler function, which plays the role of the exponentials for fractional derivatives.

$$E_\alpha(x) = 1 + \frac{x}{\Gamma(\alpha +1)} + \frac{x^2}{\Gamma(2*\alpha +1)}\ldots$$
The gamma function, $\Gamma(\alpha)$, also discovered by Euler, is a continuous generalization of the factorial, indeed if $n$ is a positive integer number,
$\Gamma(n) = (n-1)!$.
For Mittag-Leffler functions, when $\alpha = 1$, we have $E_1(x) = e^x$. When $\alpha = 0$, the function is $E_0(x) = 1/(1-x)$.
$E_0$ is not the $r(x)$ we where looking for, but it is close enough. We can use the function to define the supersum, when $0\leq \alpha \leq 1$,
$$a\circledast_{\alpha} b = E_{\alpha}^{-1}(E_{\alpha}(a)*E_{\alpha}(b))$$
where $E_{\alpha}^{-1}(x)$ is the inverse of $E_{\alpha}(x)$. We leave the proof that it meets all the properties to the reader (it is in the paper linked at the end of the post).
It has the particular cases
$$a\circledast_1b = a+b = ln(e^a*e^b)$$
and
$$a\circledast_0b = a+b-a*b = c$$
which can be rewritten
$$c-1 = -(x-1)*(y-1),$$
which is a form of multiplication (reversed and shifted, which can be fixed, but it is essentially a multiplication). It gets better if you use $E_{\alpha}(-x)$ and you can shift the neutral element,
but I will leave that for another post. You may want to see an interesting (introductory) video related to this ideas.
There are many ways to extend and apply this new group and I am still sieving through the results. For example, we can use it to define a transform (this is what abstract harmonic analysis is about, read more here), but again, this is a story for another post.
I have informally called this group (technicaly the Lie group determined by the exponential map defined by the Mittag-Leffler functions applied on the multiplication) supersum.

What would you name it?

For more details on all this, see the paper I wrote. It is an Accepted Manuscript of an article published by Taylor & Francis in Integral Transforms and Special Functions and will be available available online: http://www.tandfonline.com/doi/full/10.1080/10652469.2016.1267730.

1. This is great! Conway once told me that he and a friend were working on fractional tetration (i.e., a ^^ 0.5 and the like, for ^^ double-arrow) at some point back in the early '90s, and they found two workable ideas: one from imagining "smooshing all the numbers way down, calculating it, and blowing them back up," and one from doing the opposite (to paraphrase him).

They wanted to check whether the two operations yielded the same result, so they had a computer do a calculation: and the answer they got was the same to the twenty-fifth digit and then deviated from there.

Shortly thereafter, they found out that some other team of two had already done all this back in the late '70s or something, but hadn't published; so they figured their work was unoriginal, and didn't publish either.

The wikipedia page for tetration has some good ideas on it these days, but proves pretty clearly that there's no unique qualitatively "best" idea for what the definition of fractional tetration should be: it depends which properties you value more.

–––––

Recently I was helping a friend through Herstein, and we came across this unsung hero of a group: the rationals delete -1, under the operation a * b = a + b + ab.

Who knew? I'm puzzling over the best geometric way to account for this group. Your circle-star-zero looks so similar.

–––––

Possibly related, possibly unrelated: what's the best way to arrive at the Lorentz transform, or the velocity addition formula (a+b)/(1+ab/c^2) from first principles? Any account of it I've seen goes like this: assume the Lorentz transform. Then we find out all these neat things hold. Can we do it the other way around? Bonus points if you don't use wavefunctions ψ(x-ct).

1. The most principled way is to say: a Lorentz transformation is a linear transformation of four-dimensional space that leaves invariant the Minkowski norm x^2+y^2+z^2-t^2. This is the natural generalization of orthogonal transformations (rotations and reflections) to norms that are not positive-definite. (In fact, spatial rotations and reflections are special cases of Lorentz transformations by this definition.) You can also come up with more physically-motivated definitions by requiring the invariance of the speed of light (and linearity).

2. The operation
$$a\circledast_\alpha b=a+b+ab$$

is what you get if you use $E_\alpha (−x)$ to define the group.
I should have defined this for the paper because it is cleaner, but I discovered that $E_\alpha(−x)$
also works too late.
It is just $a\circledast \alpha b+1=(a+1)(b+1)$. It is a shifted multiplication with -1 being the absorbing element.
In geometrical terms I think of this group as defining dilations/contractions and a dilation has a fixed point, which in terms of the group is the absorbing element of the group, which is why -1 is singled out. The absorbing element is self contained, so you can take it out and the group is still a group.

–––––
The way I have always seen the Lorentz transformation obtained (I think it was Poincaré who did it first) is as the group of transformations which keep the Maxwell equations invariant.
Another, more geometrical way is just to write a coordinate transformation in Minkowski space (rotations, boosts, etc...). Minkowski space is hyperbolic, so there is an exponential map (in Riemman geometry terms) relating the tangent bundle and the manifold.
I have come across some subgroups of the Lorentz group (boosts, rotations) with remarkable similarities to some of the groups I found in the paper, so it is quite possibly related :-).

1. Sorry for reposting the response many times, but apparently preview does not interact well with the math plugin, so I had to remove and repost the comment three times (there is no edit button for comments either...).

3. With respect to tetration, I feel there is something missing in the wikipedia ideas (which are forms of interpolation). I think there has to be a canonical way to do tetration for the reals. I have been playing with the Polygamma, but it is too early to see if I would get anything out of it. The smooshing and blowing up the numbers you refer to, reminds me of non-newtonian calculus, which is also related to the approach I took.

Do you have any link to whatever Conway did/referred to?