The expected value of a Mega Millions ticket

As I type this post, the Mega Millions jackpot projects to be $500,000,000 (annuity value – after-tax cash value is more like $220,000,000). A single ticket costs $1. Is it worth it?

By “worth it,” I mean that the expected value of a ticket is greater than zero. The expected value of a random variable \(W\) which represents the event of winning the Mega Millions lottery with the ticket is equal to, roughly, (Probability of winning)*(Value of winning) + (Probability of losing)*(Value of losing). In shorter math notation, this looks like:

\(E[W] = P[W] \, V + \left ( 1 - P[W] \right ) C\).

Here, \(E[W]\) is the expected value of \(W\), \(P[W]\) is the probability of \(W\), and therefore \(1-P[W]\) is the probability of NOT \(W\). Also, \(V\) is the value of winning – in this case, $500,000,000 or $220,000,000, and \(C\) is the value of losing: -$1.

What is this probability? The players are asked to pick 5 balls from a first set of 56, then 1 ball from a second set of 46, the first set and second sets being independent. The number of possibilities is \(46 \binom {56}{5} = 175,711,536\). The probability \(P[W]\) is then the reciprocal of this number.

Although this probability is indeed small, it would seem that a sufficiently large jackpot would make buying a ticket worth it. And, indeed, even with the after-tax cash value, the expected value is greater than zero:

\(E[W]=\displaystyle \frac{220,000,000}{175,711,536} - \displaystyle \frac{175,711,535}{175,711,536} \approx 0.25\).
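
As a quick check on the arithmetic, here is a minimal Python sketch of the naive calculation. The names `combinations`, `p_win`, and `naive_expected_value` are mine, and the function assumes (as the estimate above does) that a lone winner keeps the whole jackpot.

```python
import math

# Number of equally likely tickets: 5 balls from 56, then 1 ball from 46.
combinations = 46 * math.comb(56, 5)
p_win = 1.0 / combinations


def naive_expected_value(jackpot, ticket_cost=1.0):
    """Expected value of one ticket, assuming the jackpot is never split."""
    return p_win * jackpot - (1.0 - p_win) * ticket_cost


print(combinations)                        # number of possible tickets
print(naive_expected_value(220_000_000))   # after-tax cash value
print(naive_expected_value(500_000_000))   # annuity value
```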

This could be taken as an encouragement to purchase Mega Millions tickets; after all, you can expect about $0.25 of value for each ticket you buy. What a bargain! With a return on investment like that, why don’t investment houses just put their clients’ money into Mega Millions tickets?

The reason is that this return is wrong: it rests on the false assumption that the payout is the same no matter how many tickets match the winning numbers. In fact, the payout is split evenly among the winners. This will of course lower the expected value of a ticket. But by how much?

Consider a population of \(N\) tickets, where yours is among the \(k\) winners. (Mazeltov.) That means there are \(k-1\) winners besides yourself among the rest of the remaining \(N-1\) tickets. What is the probability that there are \(k-1\) winners among \(N-1\) tickets?

We consider each event of a ticket being a winning ticket to be independent of any other ticket being a winning or losing ticket. (This is clearly the case.) The probability of a particular set of \(k-1\) winners among the \(N-1\) tickets is then \(p^{k-1} (1-p)^{(N-1)-(k-1)}\). That is, when there are \(k-1\) winners, there are also \((N-1) - (k-1) = N - k\) losers.

That said, there are also many different ways to arrange the other \(k-1\) winners among the remaining \(N-1\) tickets. To be precise, there are \(\binom {N-1}{k-1}\) different arrangements of winners and losers, on top of you as a winner. (Not literally!) I imagine that my friends literate in statistics recognize the random variable \(W\) as having a binomial distribution.

Note that, when there are \(k\) winners, the jackpot is split evenly among the winners; that is, each winner gets \(V/k\), and not \(V\). The expected value of a ticket is then a sum over all possible values of \(k\):

\(E[W]= \displaystyle V \sum_{k=1}^{N} \frac{1}{k} \binom{N-1}{k-1} p^{k} (1-p)^{N-k} + C (1-p)\),

where \(p = P[W]\) as defined above. Note that there is an extra factor of \(p\) coming from your own ticket.

The above sum is very difficult to evaluate numerically for \(N \approx 100,000,000\), and approximations to normal or Poisson distributions do not apply. However, we can observe that we are comparing the value of the above sum to \(p\). We can see that the value of the sum is less than \(p\) because of the fact, well-known from binomial distributions, that

\(\displaystyle \sum_{k=1}^{N} \binom{N-1}{k-1} p^{k} (1-p)^{N-k} = p\).


It follows that

\(\displaystyle \sum_{k=1}^{N} \frac{1}{k} \binom{N-1}{k-1} p^{k} (1-p)^{N-k} < p\),

because the factor \(\frac{1}{k} \le 1\), with strict inequality for every \(k > 1\). In fact, a rough calculation (it turns out you can neglect anything beyond the 6th term) gives a value of about 0.374 for the ratio of the above sum to \(p\). Again, compare this to 1 using the crude (and incorrect) estimate. So, in multiplying the first term in the first expected value equation by about 0.374, we get an expected value of about -$0.53. That is, rather than gaining 25 cents, you should expect to lose about 53 cents for every dollar you spend on a Mega Millions ticket.

Then again, we’re all kind of suckers for that galactically small chance we could win, and I don’t blame any of you for throwing away a few pennies chasing the dream.

Update (3/31/12): I need to correct an assertion I made, and some numbers. The conclusion stands - in fact, in light of what has happened over the past 24 hours, the conclusion is even more stark.

First of all, I posted some incorrect numbers that I have since corrected. For the above values of the number of tickets in circulation and probability of winning:

\(\displaystyle \sum_{k=1}^{N} \frac{1}{k} \binom{N-1}{k-1} p^{k-1} (1-p)^{N-k} \approx 0.763\).

This is the reduction factor on the jackpot, not 0.374 as I published before. The expected value of a ticket is then about -$0.05. Again, still a loser. That said, it turns out that 100,000,000 tickets was a gross underestimate; rather, 1,500,000,000 tickets were sold! For this number, the reduction factor is about 0.117, which gives an expected value of a ticket as -$0.85, an even worse value than my incorrect previous numbers show.

Second, although I was right in asserting that the multiple winners are not governed by a Poisson distribution (a great explanation is here), I was incorrect in ignoring the Law of Rare Events, which states that, under a certain limiting behavior, the binomial sum approaches the Poisson sum that results from assuming a Poisson distribution. Further, the limiting behavior need not be rigorously enforced: a large enough sample and a small enough probability do the trick.

The mathematical statement of the Law of Rare Events in this context is

\(\displaystyle \sum_{k=1}^{N} \frac{1}{k} \binom{N-1}{k-1} p^{k-1} (1-p)^{N-k} \approx \sum_{k=1}^{\infty} \frac{(N p)^{k-1}}{k!} e^{-N p} = \frac{1 - e^{-N p}}{N p}\).

It turns out that this is a very good approximation out to many decimal places. So, a very simple formula for the expected value of a Mega Millions ticket is

\(\displaystyle E[W] = V \frac{1 - e^{-N p}}{N} + C (1-p)\).
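
The numbers quoted in the update can be reproduced with a short Python sketch. It leans on the identity \(\frac{1}{k} \binom{N-1}{k-1} = \frac{1}{N} \binom{N}{k}\), which collapses the binomial sum to \(\frac{1-(1-p)^N}{N p}\); that shortcut is my addition, not part of the original post. All function names are hypothetical, and the ticket cost is taken to be $1.

```python
import math

P_WIN = 1.0 / (46 * math.comb(56, 5))


def reduction_brute(n_tickets, p):
    """Term-by-term binomial sum; only practical for small n_tickets."""
    return sum(
        math.comb(n_tickets - 1, k - 1) * p ** (k - 1) * (1 - p) ** (n_tickets - k) / k
        for k in range(1, n_tickets + 1)
    )


def reduction_exact(n_tickets, p):
    """Same sum in closed form: (1 - (1-p)^N) / (N p), evaluated stably."""
    return -math.expm1(n_tickets * math.log1p(-p)) / (n_tickets * p)


def reduction_poisson(n_tickets, p):
    """Law of Rare Events approximation: (1 - exp(-N p)) / (N p)."""
    lam = n_tickets * p
    return -math.expm1(-lam) / lam


def expected_value(jackpot, n_tickets, p=P_WIN, ticket_cost=1.0):
    """Expected value of one ticket when the jackpot is split among winners."""
    return jackpot * p * reduction_poisson(n_tickets, p) - ticket_cost * (1 - p)


for n in (100_000_000, 1_500_000_000):
    print(n, reduction_exact(n, P_WIN), reduction_poisson(n, P_WIN),
          expected_value(220_000_000, n))
```

The brute-force version is there only to confirm the closed form on a small case, e.g. `reduction_brute(30, 0.2)` against `reduction_exact(30, 0.2)`.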

Computing square roots

Let’s say you want to take the square root of a real number \(a\) without a computer. How would you do it? How do you think a computer does it?

The only way I know any computer performs square roots practically is via the following recurrence:

\(s_1=1; s_n=\displaystyle \frac{1}{2} \left ( s_{n-1} + \frac{a}{s_{n-1}} \right )\),

so that

\(\sqrt{a}=\displaystyle \lim_{n\to\infty}s_n\).

The recurrence derives from Newton’s Method of finding roots, as applied to the function \(f(x)=x^2-a\). But that is not the point; the point is the recurrence and how fast it converges to its goal. Typically, roots found via Newton’s Method exhibit quadratic convergence; that is, the error in an iteration is proportional to the square of the error of the previous iteration. It turns out that the above recurrence has an exact solution, and from this solution we can closely examine the convergence toward the above limit.
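
To watch the convergence happen, here is a minimal Python sketch of the recurrence. The function name `sqrt_iterates` is mine, and `math.sqrt` is used only as a reference value for measuring the error.

```python
import math


def sqrt_iterates(a, steps):
    """Return [s_1, ..., s_steps] with s_1 = 1 and s_n = (s_{n-1} + a / s_{n-1}) / 2."""
    s = 1.0
    iterates = [s]
    for _ in range(steps - 1):
        s = 0.5 * (s + a / s)
        iterates.append(s)
    return iterates


a = 2.0
for n, s in enumerate(sqrt_iterates(a, 6), start=1):
    # The relative error shrinks roughly quadratically from one row to the next.
    print(n, s, abs(s / math.sqrt(a) - 1.0))
```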

The way to see this solution is to set \(s_n=\sqrt{a} \coth{\theta_n}\), where the hyperbolic cotangent is

\(\coth{x}=\displaystyle \frac{e^{x}+e^{-x}}{e^{x}-e^{-x}}\).

The hyperbolic cotangent satisfies a doubling formula:

\(\coth{2 x}=\displaystyle \frac{1}{2} \left ( \coth{x} + \frac{1}{\coth{x}} \right )\).

The above recurrence then takes the simplified form

\(\theta_{1}=\tanh^{-1}{\sqrt{a}}; \theta_{n}=2 \theta_{n-1}\).

The solution to the original recurrence then easily follows:

\(s_n=\sqrt{a} \coth{ \left ( 2^{n-1} \tanh^{-1}{\sqrt{a}} \right ) }\).

One slight complication: for \(a>1\), \(\tanh^{-1}{\sqrt{a}}\) is a complex number with imaginary part = \(\frac{\pi}{2}\). Because the recurrence involves a doubling of the argument and \(\coth\) has period \(i \pi\), the imaginary part has no effect on the result for \(n \ge 2\). That said, it is more direct to write the result as

\(s_n=\sqrt{a} \coth{ \left ( 2^{n-1} \Re \left [ \tanh^{-1}{\sqrt{a}} \right ] \right ) } \quad (n \ge 2)\),

where \(\Re \left [ z \right ]\) denotes the real part of \(z\).

On the surface, it seems silly to express this solution to the recurrence in terms of the limit that it approximates. That said, the goal in deriving this solution was to examine how it converges to the limit. Along these lines, consider the following approximation, valid for large arguments:

\(\coth{x} \approx 1+2 e^{-2 x}\).

The error at later stages of the recurrence is then about

\(\displaystyle \left | \frac{s_n}{\sqrt{a}} - 1 \right | \approx 2 \times 10^{- \left (\log_{10}{e} \right ) \left ( \Re \left [ \tanh^{-1}{\sqrt{a}} \right ] \right ) 2^{n}} \).

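For each increment in \(n\), the error is proportional to the square of the previous error, as I mentioned above being a characteristic of root finding via Newton’s Method. The solution allows us to be even more specific. Because \(\log_{10}{e} \approx 0.4343\), the \(n\)th iterate is good to roughly \(0.4343 \, \Re \left [ \tanh^{-1}{\sqrt{a}} \right ] 2^{n}\) decimal places, so the number of correct digits doubles with each iteration. For \(a=2\), \(\Re \left [ \tanh^{-1}{\sqrt{a}} \right ] \approx 0.88\), or about \(0.38 \times 2^n\) digits; the real part shrinks as \(a\) grows, so large values of \(a\) take more iterations to reach a given accuracy.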
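
The closed form and the error estimate can be compared against the recurrence directly. This is a sketch under two assumptions of mine: it treats only \(a>1\) and \(n \ge 2\), and it uses the fact that \(\Re \left [ \tanh^{-1}{\sqrt{a}} \right ] = \tanh^{-1}{(1/\sqrt{a})}\) for \(a>1\) so that everything stays in real arithmetic. The function names are hypothetical.

```python
import math


def recurrence(a, n):
    """s_n from s_1 = 1, s_n = (s_{n-1} + a / s_{n-1}) / 2."""
    s = 1.0
    for _ in range(n - 1):
        s = 0.5 * (s + a / s)
    return s


def closed_form(a, n):
    """sqrt(a) * coth(2^(n-1) * Re[atanh(sqrt(a))]), for a > 1 and n >= 2."""
    root = math.sqrt(a)
    theta = math.atanh(1.0 / root)  # real part of atanh(sqrt(a)) when a > 1
    return root / math.tanh(2 ** (n - 1) * theta)


def predicted_error(a, n):
    """Large-argument estimate of |s_n / sqrt(a) - 1|: 2 * exp(-theta * 2^n)."""
    theta = math.atanh(1.0 / math.sqrt(a))
    return 2.0 * math.exp(-theta * 2 ** n)


a = 2.0
for n in range(2, 6):
    actual = abs(recurrence(a, n) / math.sqrt(a) - 1.0)
    print(n, recurrence(a, n), closed_form(a, n), actual, predicted_error(a, n))
```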

Note that this analysis only applies to square roots of real numbers. For complex square roots, the initial guess in Newton’s Method must be complex, and the solution of the recurrence is more complicated.

Hungry Goats

Problem: A goat is tied to the edge of a circular plot of grass by a length of rope. How long should the rope be so that the goat eats exactly half of the grass?

Solution: Let the radius of the plot of grass be \(R\) and the length of the rope be \(L\). The center of the circular plot is \(O\) and the goat is tied to the edge of the plot boundary at \(C\).

Note that the area representing the grass that the goat eats is the intersection of two offset circles: one being the plot of grass (green), the other defined by the area the goat can move given that it’s tied where it is (red). Let the points of intersection of the circles be \(A\) and \(B\).

This area of intersection looks difficult at first, but it is really two circular segments: one for the green circle, and one for the red. A circular segment is the area between a chord of a circle and the arc it bounds. The green segment lies between arc \(\widehat{ACB}\) and line \(\overline{AB}\). The area of this segment is the difference between the area of the sector bounded by the arc \(\widehat{ACB}\) and the lines \(\overline{OA}\) and \(\overline{OB}\), and the triangle \(\bigtriangleup{AOB}\).

Let the angle subtended by the arc \(\widehat{ACB}\) be \(2 \phi\). The area of the sector \(A_{GS}\) is given by \(A_{GS}=\frac{1}{2} R^2 (2 \phi)\), and the area \(A_{GT}\) of triangle \(\bigtriangleup{AOB}\) is given by \(A_{GT}=\frac{1}{2} R^2 \sin{2 \phi}\). The area of the green segment \(A_G\) is then \(A_G=\frac{1}{2} R^2 \left (2 \phi - \sin{2 \phi} \right )\).

Results are similar for the red segment. If the angle subtended by the arc \(\widehat{AOB}\) is \(2 \theta\), then the area of the red segment \(A_R\) is \(A_R=\frac{1}{2} L^2 \left (2 \theta- \sin{2 \theta} \right )\). The area of the grass eaten by the goat is the sum of the areas of the red segment and the green segment, \(A_R + A_G\).

This area depends on four parameters, \(R\), \(L\), \(\phi\), and \(\theta\). Of these, \(R\) is given, and \(L\) is what we are tasked to find in terms of \(R\). This leaves us to find the other two parameters in terms of \(R\) and \(L\).

We do this by noting that triangle \(\bigtriangleup{AOC}\) is isosceles. From this triangle, we see that \(L=2 R \sin{\frac{\phi}{2}}\) and \(2 \theta + \phi = \pi\). These two relations will allow us to get a single equation relating \(L\) to \(R\).

We begin by expressing the condition that the area the goat eats, \(A_R + A_G\), is one half of the area of the circular plot, \(\pi R^2\):

\(A_R + A_G = \frac{1}{2} R^2 \left (2 \phi - \sin{2 \phi} \right ) + \frac{1}{2} L^2 \left (2 \theta- \sin{2 \theta} \right ) = \frac{\pi}{2} R^2\).

Let \(\beta = \frac{L}{R}\). Then \(\phi = 2 \arcsin{\frac{\beta}{2}}\) and \(\theta = \frac{\pi}{2} - \arcsin{\frac{\beta}{2}}\). Dividing both sides of the area equation above by \(R^2\), and plugging in the above relations, we get the following equation for \(\beta\):

\(4 \left ( 1 - \frac{\beta^2}{2} \right ) \arcsin{\frac{\beta}{2}} - 2 \beta \sqrt{1 - \frac{\beta^2}{4}} + \pi \left ( \beta^2 - 1 \right ) = 0\).

(Yes, I combined a lot of steps here, including some non-trivial trig simplifications. This stuff is really better off left to the reader.)

This equation cannot be solved in analytical closed form. I think this is what is unexpected from a cursory inspection of the problem. So, you need some tool to solve it. Fortunately, Wolfram Alpha will solve it nicely for free. Go to the site and type the following string into the box: “findroot[ 4 (1 - b^2/2) ArcSin[b/2] - 2 b Sqrt[1 - b^2/4] + Pi (b^2 - 1) , {b,1}]”. The result is that, to six significant figures, \(\beta \approx 1.15873\). That is, the length of the rope is about 15.9% larger than the radius of the circular plot.
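
If you would rather not rely on a web tool, the same root can be found with a few lines of Python. This is a plain bisection sketch, not what Wolfram Alpha does internally. The bracket \([1, 1.3]\) is my choice, picked because the left-hand side of the equation changes sign across it.

```python
import math


def goat_equation(beta):
    """Left-hand side of the equation for beta = L / R."""
    half_angle = math.asin(beta / 2.0)
    return (4.0 * (1.0 - beta ** 2 / 2.0) * half_angle
            - 2.0 * beta * math.sqrt(1.0 - beta ** 2 / 4.0)
            + math.pi * (beta ** 2 - 1.0))


def bisect(func, lo, hi, tol=1e-12):
    """Shrink a sign-changing bracket [lo, hi] until it is narrower than tol."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)


beta = bisect(goat_equation, 1.0, 1.3)
print(beta)
```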

NB Some of you may scoff at my use of a web tool to solve my equation, and wonder what happened to rigorous analysis. The truth is that the solution of such equations has become so routine and cheap that, unless you are demonstrating something special about the solution process, there is little value in going through all of the low-level details about solutions. I know this contradicts something I posted earlier today, but the important detail was the derivation of the above equation, from which we obtained the solution.

IYI (If you’re interested): This problem is structurally very similar to those found in certain optics applications. In particular, folks who model image formation in microscopes and similar systems deal with geometry just as in this problem.  Actually, a little more general.  Imagine the following problem: there is a circular plot of grass, and a goat is tied up somewhere with a rope of a given length.  How much of the grass can the goat eat?  Now imagine two goats, each tied with rope of the same length, but in different places.  Again, how much of the grass can the goats eat?

I published a complete solution to this problem, but in the context of imaging a transilluminated object with a microscope using an extended source.  The problem, of course, can get far more complicated in the optics context, and all analogies with goats and grass get lost when we ask about aberrations and defocus.  If you have the stomach for this, I published another paper which dealt with this.


An interesting sum

Problem: compute, in closed form, the following sum:
\(S = \displaystyle\sum_{n=1}^{\infty} b_n 2^{n}\),
\(b_n = 2 + \sqrt{b_{n-1}} - 2 \sqrt{1 + \sqrt{b_{n-1}}}\),
and \(b_0 = 1\).

Solution: Make the following substitution:

\(b_n = (p_n-1)^2\).

Then we find that \(p_n^2 = p_{n-1}\) and \(p_0 = 2\). Therefore:
\(p_n = 2^{2^{-n}}\).

The sum desired then takes the form

\(S = \displaystyle\sum_{n=1}^{\infty} \left (2^{2^{-n}} - 1 \right )^2 2^n\).

It is not obvious how to go from here. It’s not even obvious from a cursory inspection that the sum converges. We can verify convergence by observing that

\(\displaystyle\left (2^{2^{-n}} - 1 \right )^2 2^n \sim (\log 2)^2 2^{-n} \; (n \to \infty )\).

Now that we know that the sum converges, we can continue. We could expand the sum to evaluate, but there is a diverging piece which would give us fits (the \(2^n\) piece). Knowing that the sum converges, and therefore the divergences cancel, we write

\(S = \displaystyle\lim_{m\to\infty} S_m\)
\(S_m = \displaystyle\sum_{n=1}^{m} \left (2^{2^{-n}} - 1 \right )^2 2^n\).

The trick to see here is that
\(\displaystyle\left (2^{2^{-n}} \right)^2 = 2^{2^{-(n-1)}}\).

Expanding the square then gives
\(S_m = \displaystyle\sum_{n=1}^{m}2^{2^{-(n-1)}} 2^{n} – 2 \displaystyle\sum_{n=1}^{m}2^{2^{-n}} 2^n + \displaystyle\sum_{n=1}^{m} 2^n\), or

\(S_m = 2 \displaystyle\sum_{n=0}^{m-1}2^{2^{-n}} 2^{n} – 2 \displaystyle\sum_{n=1}^{m}2^{2^{-n}} 2^n + 2^{m+1}-2\).

The terms in the first two sums all cancel except for the first term in the first sum and the last term in the second sum. We can then write \(S_m\) in closed form:

\(S_m = 2 - 2 \displaystyle\left (2^{2^{-m}} - 1 \right ) 2^m\).

Using the fact, recited above, that

\(\displaystyle\lim_{m\to\infty}\left (2^{2^{-m}} - 1 \right ) 2^m = \log 2\),

we can then write the solution as

\(S = 2 (1 - \log 2)\).
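
A numerical check of both the telescoped partial sum and the limit, as a minimal Python sketch. The function names are mine. I compute \(2^{2^{-n}} - 1\) with `math.expm1` because the direct subtraction loses precision once \(n\) gets large.

```python
import math

LOG2 = math.log(2.0)


def term(n):
    """(2^(2^-n) - 1)^2 * 2^n, with the small difference computed via expm1."""
    return math.expm1(LOG2 * 2.0 ** -n) ** 2 * 2.0 ** n


def partial_sum(m):
    """S_m summed term by term."""
    return sum(term(n) for n in range(1, m + 1))


def partial_sum_closed(m):
    """S_m = 2 - 2 * (2^(2^-m) - 1) * 2^m from the telescoping argument."""
    return 2.0 - 2.0 * math.expm1(LOG2 * 2.0 ** -m) * 2.0 ** m


for m in (1, 5, 10, 60):
    print(m, partial_sum(m), partial_sum_closed(m))
print(2.0 * (1.0 - LOG2))
```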

Tough equation, easy solution

Problem: Solve the equation

\(x^{2}+2 a x+\displaystyle\frac{1}{16}=-a+\sqrt{a^2+x-\displaystyle\frac{1}{16}}\)

for \(x\) real and \(0<a<\frac{1}{4}\).

[Figure: plots]

There is an easy way to attack the solution of this equation and a hard way. The hard way is to add \(a\) to both sides and square, generating a fourth degree equation in \(x\). No thanks. The easy way, however, is to note that the right-hand side is a solution to the equation

\(x=y^{2}+2 a y+\displaystyle\frac{1}{16}\)

Furthermore, since the original equation sets this solution equal to its left-hand side, we also have

\(y=x^{2}+2 a x+\displaystyle\frac{1}{16}\).

That is, we have an equation for points where a function equals its inverse function. The points in such a case lie along the line \(y=x\) (as can be verified in the plot), and we simply have a quadratic to solve:

\(x=x^{2}+2 a x+\displaystyle\frac{1}{16}\),

the solution to which is

\(x=\displaystyle\frac{1}{2}-a \pm \frac{1}{2} \sqrt{\displaystyle\frac{3}{4}-4 a (1-a)}\).
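
It is worth confirming that the roots of the quadratic \(x=x^{2}+2 a x+\frac{1}{16}\) really do satisfy the original equation with the square root. A small Python sketch follows; the function names and sample values of \(a\) are mine. It computes the roots with the quadratic formula and then evaluates the difference between the two sides of the original equation.

```python
import math


def quadratic_roots(a):
    """Roots of x^2 + (2a - 1) x + 1/16 = 0, which needs (1 - 2a)^2 >= 1/4."""
    disc = (1.0 - 2.0 * a) ** 2 - 0.25
    half_width = 0.5 * math.sqrt(disc)
    return 0.5 - a - half_width, 0.5 - a + half_width


def residual(x, a):
    """Left-hand side minus right-hand side of the original equation."""
    lhs = x * x + 2.0 * a * x + 1.0 / 16.0
    rhs = -a + math.sqrt(a * a + x - 1.0 / 16.0)
    return lhs - rhs


for a in (0.05, 0.1, 0.2):
    for x in quadratic_roots(a):
        print(a, x, residual(x, a))
```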

Find the function


Given real functions \(f\), \(g\), and \(h\) which satisfy the following:

\(f'=2 f^{2} g h +\displaystyle\frac{1}{g h}\),

\(g'=f g^{2} h +\displaystyle\frac{4}{f h}\),

\(h'=3 f g h^{2} +\displaystyle\frac{1}{f g}\),

with \(f(0)=1\), \(g(0)=1\), and \(h(0)=1\), find the function \(f(x)\).


Clearly, one can divide each equation above by the functions \(f\), \(g\), and \(h\), respectively.  Adding the equations together produces a single equation for the function \(u=f g h\):

\(\displaystyle\frac{u'}{u}=6 \left (u+\displaystyle\frac{1}{u}\right )\),

from which a solution for \(u\) is produced:

\(6 x+C=\displaystyle\int^{u}\displaystyle\frac{du'}{1+u'^{2}}=\tan^{-1}u\)

Because \(u(0)=1\), \(u(x)=\tan \left (6 x+\displaystyle\frac{\pi}{4}\right )\).

This result is plugged back into the first equation, which may be written as follows:

\(\displaystyle\frac{f'}{f}=2 u+\displaystyle\frac{1}{u}=2 \tan \left (6 x+\displaystyle\frac{\pi}{4}\right )+\cot \left (6 x+\displaystyle\frac{\pi}{4}\right )\).

Integrating both sides, we can solve for the unknown function \(f\):

\(\log f(x)=-\displaystyle\frac{2}{6}\log \left [\cos \left (6 x+\displaystyle\frac{\pi}{4}\right) \right ] + \frac{1}{6}\log \left [\sin\left (6 x+\displaystyle\frac{\pi}{4}\right) \right ] + C'\).

Using \(f(0)=1\), the solution takes the form:

\(f(x)=2^{-\frac{1}{12}} \left [ \displaystyle\frac{\sin\left (6 x+\displaystyle\frac{\pi}{4}\right)}{\cos^{2}\left (6 x+\displaystyle\frac{\pi}{4}\right)} \right ]^{\frac{1}{6}}\)
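
One way to gain confidence in the closed form is to integrate the original system numerically and compare. The sketch below uses a hand-written classical fourth-order Runge-Kutta stepper. The step count and the test point \(x=0.05\) are my choices; the point sits well short of \(x=\pi/24\), where \(\tan \left (6 x+\frac{\pi}{4}\right )\) blows up.

```python
import math


def rhs(state):
    """Right-hand sides f', g', h' of the system."""
    f, g, h = state
    return (2 * f * f * g * h + 1 / (g * h),
            f * g * g * h + 4 / (f * h),
            3 * f * g * h * h + 1 / (f * g))


def rk4(x_end, steps=2000):
    """Integrate from x = 0 with f = g = h = 1 using classical RK4."""
    state = (1.0, 1.0, 1.0)
    dx = x_end / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * dx * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * dx * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + dx * k for s, k in zip(state, k3)))
        state = tuple(s + dx * (a + 2 * b + 2 * c + d) / 6
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


def f_closed(x):
    """Closed-form solution for f derived above."""
    t = 6 * x + math.pi / 4
    return 2 ** (-1 / 12) * (math.sin(t) / math.cos(t) ** 2) ** (1 / 6)


f_num, g_num, h_num = rk4(0.05)
print(f_num, f_closed(0.05))
print(f_num * g_num * h_num, math.tan(6 * 0.05 + math.pi / 4))
```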

Here is a plot of the solution.

An unintuitive limit

Problem: Let \(f(n)\) = the number of zeros in the decimal representation of \(n\).  For example, \(f(1009)=2\).  For \(a>0\), define

\(S(N)=\displaystyle\sum_{n=1}^{N} a^{f(n)}\).

Evaluate

\(L=\displaystyle\lim_{N\to\infty}\frac{\log S(N)}{\log N}\).

Solution: The key is to recognize that the sum \(S(N)\) is best evaluated according to how many digits are in \(N\).  As an example, suppose \(N=9\): none of the single-digit numbers between 1 and 9 have zeros, so the sum \(S(N)\) is equal to 9.  Within the two-digit numbers, note that there are 9 numbers with 1 zero [e.g., 10, 20,…,90] and the rest [90-9=81] with no zeros; in this case \(S(99)=9 a+81+S(9)=9 a+9^2+S(9)\).  For three-digit numbers, there are 3 possibilities: the number can have 2, 1, or no zeros.  There are 9 such numbers with two zeros.  Numbers with 1 zero include 101 and 110; note that there are 2 different numbers resulting from two non-zero digits and one zero digit.  Given that there are \(9^2\) combinations of digits, it is clear that there are \(2\times 9^2\) three-digit numbers having exactly one zero digit.  Finally, there are \(900-162-9=729=9^3\) three-digit numbers with no zeros.  Therefore,

\(S(999)=S(10^3-1)=9 a^2+2\times 9^2 a+9^3+S(10^2-1)\).

The pattern here is clear to the [mathematically inclined] observer:

\(S(10^k-1)=9 a^{k-1}+\binom{k-1}{1}\times 9^2 a^{k-2}+\ldots+\binom{k-1}{k-2}\times 9^{k-1} a+9^{k}+S(10^{k-1}-1)\),

or

\(S(10^k-1)=9 (a+9)^{k-1}+S(10^{k-1}-1)=9 \displaystyle\sum_{j=0}^{k-1} (a+9)^{j}\).

Summing the series, we arrive at the following:

\(S(10^k-1)=9 \frac{(a+9)^{k}-1}{a+8}\).

In evaluating \(L\), observe that the limit \(N\to\infty\) is equivalent to \(k\to\infty\). In this limit, \(\log N \approx k \log 10\) and \(\log S(N) \approx k \log (a+9)\). We therefore get the final result:

\(L=\displaystyle\frac{\log (a+9)}{\log 10}=\log_{10} (a+9)\).

Hoisting the world by a string

Problem: I have a rope that fits around the equator exactly once. I add 10 cm to the rope, attach the ends, and pull up. How high off the ground can I pull the rope?

Solution: The length \(L\) of the rope before adding the 10 cm is \(L=2 \pi R\), where \(R\) is the radius of the earth and is about \(6.4 \times 10^6\) meters.

After adding the 10 cm, you give the rope a stretch in the center.  When the rope is stretched to full tautness, the result will be most of the rope hugged against the equator, with 2 straight pieces, each tangent to the equator.  Thus, we have the right triangle as pictured above.  We are interested in solving for the height \(h\).

We proceed by considering the rope after the addition of the 10 cm which will now be denoted as \(\Delta L\).  When pulled up, the rope hugs the earth outside the points of tangency.  Denote the points of tangency on the earth as corresponding to an angle \(\theta\) from the vertical, and the length of one of the two sections of rope not attached to the earth as \(y\).  The following relation holds:

\(L+\Delta L=2 (\pi - \theta) R+ 2 y\),

where

\(y^2=(R+h)^2-R^2=2 R h+h^2\).

Use the fact that \(L=2 \pi R\) and \(\theta=\tan^{-1} \frac{y}{R}\), and define \(w=\frac{\Delta L}{2 R}\) and \(z=\frac{y}{R}\).  The above equation is then rewritten as

\(z-\tan^{-1}z=w\),

where we must solve for \(z\).  Of course, this is a transcendental equation that cannot be solved exactly, but it is clear that, since \(w\) is very small, then \(z\) must also be small, and we can get places by expanding the transcendental function in a series:

\(z-\tan^{-1}z=\frac{1}{3} z^3-\frac{1}{5} z^5+\frac{1}{7} z^7-\ldots\)

The best way to use this series is to consider the first term, with all higher-order terms being some error:

\(z-\tan^{-1}z=\frac{1}{3} z^3+O(z^5)=\frac{1}{3} z^3 \left [ 1+O(z^2) \right ]\).

Then, to lowest order, we obtain the following:

\(z=\left ( 3 w \right )^{\frac{1}{3}} \left [ 1+O(w^{\frac{2}{3}}) \right ]\).

Now, to lowest order,

\(z=\sqrt{2 \frac{h}{R}} \left [ 1+O \left ( \frac{h}{R} \right ) \right ]\),

so that we now have an approximate solution and an error estimate:

\(h=\left ( \frac{9}{32} R \Delta L^2 \right )^{\frac{1}{3}}+O\left [ \left ( \frac{\Delta L^4}{R} \right )^{\frac{1}{3}} \right ]\).

Note that the solution involves the radius of the earth, which is a very large number compared with the rope extension of 10 cm.  The result will then be surprisingly large; the first-order term is, using the values given above, about 26.2 meters.  The error term, on the other hand, is on the order of 0.001 meters, or 0.1 cm, and can safely be called negligible.  However, for larger extensions, we can simply expand the series solution further until the error estimate is within acceptable bounds.
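
The size of the error term can be checked by solving \(z-\tan^{-1}z=w\) numerically and comparing with the leading-order formula. The Python sketch below uses bisection, which is my choice and works because \(z-\tan^{-1}z\) is increasing for \(z>0\). The height follows from \(y^2=2 R h+h^2\), i.e. \(h=R \left (\sqrt{1+z^2}-1 \right )\), written in a form that avoids subtracting nearly equal numbers.

```python
import math

R = 6.4e6        # radius of the earth in meters, as in the text
delta_L = 0.10   # added rope length in meters


def solve_z(w):
    """Solve z - atan(z) = w for z in [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - math.atan(mid) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


w = delta_L / (2.0 * R)
z = solve_z(w)

# h = R * (sqrt(1 + z^2) - 1), rewritten to avoid cancellation for small z.
h_numeric = R * z * z / (1.0 + math.sqrt(1.0 + z * z))
h_leading = (9.0 / 32.0 * R * delta_L ** 2) ** (1.0 / 3.0)

print(h_numeric, h_leading, h_numeric - h_leading)
```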

An improper double integral

Problem: Evaluate the following double integral.

\(I=\displaystyle\int\limits_{-\infty}^{\infty}dx\int\limits_{-\infty}^{\infty}dy\; e^{-x^2-y^2-(x-y)^2}\)

Solution: This problem was taken from the collection “Berkeley Problems in Mathematics”, Problem 2.3.3.  Two solutions are given, neither of which is [stylistically] close to mine, which I give below.

First, change to polar coordinates; that is, \(x=r \cos \theta\), \(y=r \sin\theta\).   Using the Jacobian \(dx\,dy=r\,dr\,d\theta\), we get

\(I=\displaystyle\int\limits_{0}^{2\,\pi}d\theta\int\limits_{0}^{\infty}dr\,r\;e^{-2\,(1-\sin \theta\, \cos \theta)\,r^{2}}\).

Evaluating the integral over \(r\) gives

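\(I=\displaystyle\frac{1}{4}\,\int\limits_{0}^{2\,\pi}\frac{d\theta}{1-\sin \theta\,\cos \theta}=\frac{1}{4}\,\int\limits_{0}^{2\,\pi}\frac{d\theta}{1-\frac{1}{2}\sin 2 \theta}=\frac{1}{4}\,\int\limits_{0}^{2\,\pi}\frac{d\theta}{1-\frac{1}{2}\sin \theta}\).

The last step holds because \(\sin 2 \theta\) runs through two full periods on \([0, 2 \pi]\), so replacing \(2 \theta\) by \(\theta\) leaves the integral unchanged.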

Now, there are two ways I can think of to go about evaluating this latter integral.  First, we can Taylor expand and hope for the best; it turns out that the resulting series converges to something well-known, but you have to be an expert at recognizing such things.  [I so hate solutions that require a deus ex machina like that.]  The other way is to convert to complex variables and use the Residue Theorem. [I’m afraid, however, if you are not familiar with the Residue Theorem, then we are back to a deus ex machina. But one has to draw the line somewhere, I guess…]

So, consider the following integral:

\(J(a)=\displaystyle\int\limits_{0}^{2\,\pi}\frac{d\theta}{1-a\,\sin \theta}\;:|a|<1\).

Observe that

\(\displaystyle \sin \theta=\frac{1}{2 i}\left (e^{i \theta}-e^{-i \theta} \right )\).

The trick is to recognize that we are integrating over the unit circle \(C\).  If we let \(z=e^{i \theta}\), and transform to an integral over \(z\), then the result is the following:

\(J(a)=-\displaystyle\frac{2}{a}\,\displaystyle\oint\limits_C \frac{dz}{z^2-i \frac{2}{a}\,z-1}\).

Recall that the residue of a function \(f\) at a simple pole \(z=z_0\) is equal to

\(\mbox{Res}(f;z_0)=\displaystyle\lim_{z \to z_0} (z-z_0)\,f(z)\),

with the Residue Theorem stating that, for a function \(f\) having simple poles \(\displaystyle\{z_n\}_{n=1}^{N}\) within the simple closed curve \(C\), then

\(\displaystyle\oint\limits_C dz\,f(z)=i\,2\,\pi\,\sum\limits_{n=1}^{N} \mbox{Res}(f;z_n)\).

To compute \(J\) using the Residue Theorem, we must compute the roots of the quadratic in the denominator of the integrand.  These roots are at

\(z=z_{\pm}=\displaystyle\frac{i}{a}\left (1 \pm \sqrt{1-a^2} \right )\).

Note that \(\displaystyle |z_{+}|>1\), so that we need only consider the root \(z_{-}\).  Hence,

\(J(a)=-\displaystyle\frac{2}{a}\,i\,2\,\pi\,\frac{1}{z_{-}-z_{+}}=\frac{2 \pi}{\sqrt{1-a^2}}\).

Finally, the result is

\(I=\displaystyle\frac{1}{4}\,J \left (\frac{1}{2} \right )=\frac{\pi}{\sqrt{3}}\).
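
For anyone wary of the Residue Theorem, the result for \(J(a)\) is easy to confirm numerically. The Python sketch below uses an equally spaced sum, my choice because it converges very quickly for smooth periodic integrands like this one. The function names and the sample count are assumptions.

```python
import math


def J_numeric(a, samples=200):
    """Equally spaced (trapezoid) sum for the integral of 1 / (1 - a sin t) over [0, 2 pi]."""
    step = 2.0 * math.pi / samples
    return step * sum(1.0 / (1.0 - a * math.sin(k * step)) for k in range(samples))


def J_exact(a):
    """Closed form from the residue calculation, valid for |a| < 1."""
    return 2.0 * math.pi / math.sqrt(1.0 - a * a)


I_numeric = 0.25 * J_numeric(0.5)
print(J_numeric(0.5), J_exact(0.5))
print(I_numeric, math.pi / math.sqrt(3.0))
```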

Is it cheating to use a symbolic math computer to do your homework?

Fascinating demonstration given by Conrad Wolfram of Wolfram Research at TEDx, concerning the question of whether or not one cheats by using Wolfram Alpha to do your integrals for you.

The short answer to the question is that there is cheating going on, but not in the way someone who asks this question would think. The gist is that, as Wolfram claims, about 80% of math education consists of hand computations: computing integrals, derivatives, limits, roots, matrix inverses, etc. Not only is this all incredibly boring, but it also ill-prepares students for the real mathematical challenges out there. Really, the challenge is to teach students how to translate real-world problems in business, engineering, etc., into a mathematical language. Once the pure computation problem is set up, then a machine like Wolfram Alpha can turn the crank and generate data. The remaining challenge is to figure out how to interpret the data, and such an interpretation does not lend itself to a black/white solution.

Another point that Wolfram makes is that calculus should be taught a lot earlier than it is now. When, he does not say, but he makes the case that there are concepts in calculus, namely the limit, that a “3 or 4 year-old” could grasp. He points to a terrific visual example of using inscribed polygons to approximate \(\pi\).  The greater point is that math education in the US needs a radical reshaping, and that computers are crucial in this reshaping.  The cheating done, in the meantime, is not by the students, but to the students, because they are being told that the computational tools they will use in the real world to solve problems are viewed as verboten in school.

In my opinion, Wolfram has a number of terrific points and his demonstration is valuable and should be viewed by anyone with an interest in math education.  But ultimately, Wolfram’s proposals would create a generation of students with too much trust in the computer, and by extension the people who program the computer.  One must remember that Wolfram is in the business of providing computational engines, and the stock of his company rises if the people behind his company are seen as the gatekeepers to a mysterious technology.  It is not unlike the trend of making automobile engines more computerized and less able to be worked on by average people.  By saying that the messiness of computation is boring and turns off students, we increase the reliance of math professionals on the computer and leave out the crucial skill of checking the computer for errors.

I have had quite a bit of experience with this issue in my work at IBM.  In semiconductor lithography, one of the main challenges is to simulate the physical processes involved in imaging circuit patterns on a wafer.  The calculations involved in this simulation are extremely complex and very heavy-duty.  We did rely on software packages to do a lot of this, but most of the time, the models we built with these software packages led us astray.  Was it a problem with the data, or was the computer lying to us, or, even more subtly, was the computer telling us the truth but we were making false assumptions about that truth?  The problems in trying to answer these questions were severe: taking data on the few running machines we had was expensive and getting time was difficult.  The software vendors were always too busy to answer our difficult questions about the integrity of their computational models.  The only practical way to deal with this was for IBM to have someone who could devise simple tests that would reverse-engineer the engine’s algorithm and assess from where mistakes were coming.

That someone was invariably myself, as I had all the necessary background, both from my schooling and my work experience.  I knew how to look under the hood.  More importantly, I knew how to derive the equations that went under the hood.  And many of these equations weren’t simple expressions that could be typed into Wolfram Alpha.  Rather, such equations required careful geometrical reasoning and pattern matching that would be difficult, if not impossible, to entrust to a tool such as Wolfram Alpha.  In fact, I found it best to be completely distrustful of the computer as I was building my test cases.  These test cases would be designed so as to be hand computable, yet nontrivial.  Once these test cases were designed and computed, then the diagnosing of problems could commence.

Furthermore, without someone to understand how to plumb the depths of how computations are done, we would not get users who can diagnose incorrect results at the chip level.  That’s right, recall Intel’s Pentium FDIV error.  Finding this error took a forensic approach to computation – an approach that none of us would have in Wolfram’s world, as none of us would deign to even think about so lowly an operation as division.  And, irony of all ironies, Wolfram’s flagship product, Mathematica, has not been without its own problems over the years – not just standard-issue software bugs, but incorrect algorithms.

As to the point Wolfram makes that calculus can be taught a lot earlier, making allusions to 3 or 4 year-olds: I’m not so sure.  Yes, the basic calculus concept of the limit is easy to grasp, but beyond the most superficial level it is essentially a deus ex machina.  Further, applying those limits to sequences and series involves the culmination of everything a typical calculus student has learned.  Sloppy analytical techniques lead to an inability to solve problems, even if the calculus concepts are well understood.  I have a terrific example of this from my days as an undergraduate tutor in the Math Dept at UMass.  I used to sit in the calculus drop-in centers for students taking the business calc [Math 127/128 for those of you who know of which I speak].  Now, I admit, this was not the calculus that one with serious mathematical curiosity took, but still.  Anyway, at some point in time, the students were required to perform double integrations of polynomials over 2 variables, and come up with a number as an answer.  The drop-in center got real busy with folks who were simply perplexed.  A typical conversation would go like this:

  • Me: So, tell me, what’s troubling you?
  • Student: I can’t do these integrals!
  • Me: Well, why don’t you do this one in front of me, and let’s see what’s wrong.
  • Student: OK.  So first I do the integral over y…is that right?
  • Me: Yes.
  • Student: Now I do it over x.  Is that right?
  • Me: Looks good.
  • Student: Now I plug in the limits and…it gives me a different answer than what the answer key tells me.
  • Me: That’s because you added wrong.  1/2 – 1/3 = 1/6, not what you wrote.
  • Student: Huh?  I don’t understand?
  • Me: Do you know how I got 1/6?
  • Student: No.

So, what we learn here is that the student understood the mechanics of integration, but couldn’t add fractions.  How is such a student supposed to comprehend a result from Wolfram Alpha?

So, I disagree that the mechanics of computation are best left to the experts.  I do think that there is a place for learning the mechanics of a root solve, or an integration – in fact, many, many such operations – as a part of math education.  I do agree that computers should play a greater role in math education, and perhaps elements of calculus could be taught earlier.  But hand computation is essential if we are going to educate a class of people ready to question authority.