Friday, October 28, 2016

Overrated Things

Christopher Hitchens famously said that the four most overrated things in life were champagne, lobster, anal sex and picnics. I'm with him on three of those. Nobody will ever improve on Hitch's list, but I'll nominate four things as next-most-overrated: falafel, the Bay Area, brownstones, and diners.

Falafel. People get so happy when they see a falafel place, or when there's falafel in the lunch buffet. I wouldn't feed falafel to a farm animal. It's dry and mealy on the inside, hard and greasy on the outside. I was a vegetarian for 11 years, and during that time I fooled myself into thinking that a lot of vegetarian things were good, but I was never deluded enough to like falafel. Most things that are overrated are pretty good in absolute terms, but falafel isn't just overrated—it's terrible.

The Bay Area. Having lived in the Bay Area, having visited many times, my opinion is that living in the Bay Area would be very nice as long as you could walk to the Cheeseboard, walk to your job every day, and never be forced to listen to Bay Area residents talking about how great it is to live in the Bay Area. In reality, living there means you will be listening to a lot of that, and you will be spending a lot of time in your car. Everything in the Bay Area shuts down too early at night. There is too little world-class high culture. Also, many Californians are just too laid-back to have the manners that my Southern and Midwestern heritages have conditioned me to expect from well raised people. You don't need to tell me about the ocean, the mountains, and the rest—I agree that stuff is great. I also think that my friends who live in California have seen no more of it in the past couple of years than I have.

Brownstones. Not enough window light, no view, and no building amenities. Frequently devoid of their original charm thanks to "renovation." Yet brownstones cost millions of dollars! I wouldn't buy a brownstone if I could. You are literally buying a facade.

Diners. I know diners. I was raised in my parents' diner. I'm never more comfortable than when I'm lolling in the booth of a diner. But there is way too much romance about diners. New York diners in particular are an expensive way to eat terrible food. Being on a quest for the great American diner—which, to be clear, I often am—just goes to show how worthless most diners are. You don't go on a quest for something unless it's elusive.

Bonus overrated thing:

Your favorite TV show. I agree that we're in "the golden age of TV." However, that's basically like saying we're in the golden age of burritos. Which we are! But burritos, like your favorite TV show, are overrated.

Tuesday, October 25, 2016

Comparing Election Models: FiveThirtyEight, The Upshot, and Princeton Election Consortium

Today on The Upshot I found a nice table that compares all of the major election models on a state-by-state basis. The results look a lot like what I'm seeing in my own model.

Using three of the major models—FiveThirtyEight, The Upshot, and Princeton Election Consortium (Sam Wang)—I created my final list of battleground states. Here it is:

Battleground (EV) D Participation D Win Prob. R Participation R Win Prob.
Florida (29) 82%–86% 0.82–0.86 99%–100% 0.14–0.18
Pennsylvania (20) 95%–97% 0.95–0.97 39%–55% 0.03
Michigan (16) 96%–97% 0.96–0.97 23%–44% 0.04
Wisconsin (10) 94%–96% 0.94–0.96 12%–31% 0.04–0.06
Minnesota (10) 95%–99% 0.95–0.99 4%–12% 0.01–0.05
Colorado (9) 92%–94% 0.92–0.94 18%–26% 0.06–0.08
Nevada (6)89% 0.89 20%–26% 0.11
New Hampshire (4) 98% 0.98 4%–8% 0.02
Tue 25 Oct 2016

In many of these states, there are competitive Senate races going on:

To make the list of battleground states, I simulated the three models separately and identified battleground states according to each model. The states above are those that emerged as battleground states in at least two out of the three models.

(To repeat the "battleground" notion that operates here: a state counts as a priority for candidate X if the state goes for candidate X in a relatively high fraction of the candidate's wins, even as the probability of candidate X winning the state is relatively low. Intuitively, priorities are states you need but don't yet have. Battleground states are those that appear on both candidates' priority lists. Note that choosing priorities this way involves an element of arbitrariness, or at any rate, judgment: what is a "high" fraction of wins? what is a "low" probability?)

The fact is, the battleground analysis paints a terrible picture for the R candidate. From what I remember, it looks nothing like the race in 2008. One way to see the D-R asymmetry at a glance is to look at a scatter plot of the win probability against the participation rate. Here is the D plot:

In this plot, each dot is a state. (Or an electoral-vote granting entity such as the District of Columbia or Maine Congressional District 2.) The horizontal axis is the probability of D winning the state. The vertical axis is the percent of D victories in the simulation for which D wins the state. The plot above represents a reasonable situation to be in: the more you need a state, the more you have it.

Compare to the R plot:

There are many states you need (up high on the plot) that you don't really have (left half of the plot).

So on paper, this is looking like a slaughter. But as the sportscasters like to say after every upset victory, That's why they don't play these games on paper. Get out and vote! This is especially important because of down-ballot elections.

In fact, if neither of the presidential candidates turns you on (I think I threw up a little in my mouth just now), leave the top spot blank. Don't vote for President. Just go the polling place and vote for whatever else looks like it makes sense.

I have some friends and relatives who never vote. The main reason they give is that they don't know enough about the candidates or the issues to make an informed choice. I can sympathize with that, because I often feel that way myself. Back when I lived in small-town Vermont, I remember one election when I only voted on a single race from the entire ballot, because I just didn't know anything. (I knew one of the people running and generally admired her judgment, so I voted for her.)

And in the most recent primary election in New York state, I had no idea which of the people running for Congress I should vote for. You know what saved me? Ballotpedia. I spent 15 minutes skimming Ballotpedia and I basically had an idea of which candidate seemed best. If you haven't voted recently, give it a try.

Another reason I've heard for not voting is that "The candidates are all the same." This—forgive me—is madness. Had John McCain been elected President in 2008, we wouldn't have Obamacare today, and we wouldn't have joined a range of new climate treaties. You may hate these things, you may cheer these things, but either way, they are big decisions for a country to make, and they directly reflect the choices that Americans collectively made at the ballot box in 2008 and 2012.

A related reason people don't vote is that "My vote doesn't make a difference." This feeling is understandable; even economists struggle to explain voting within a framework of rational self-interest. But when all else fails—when the demands of work and parenting and everyday life make voting seem decidedly optional—I do it anyway. My parents taught me that I have a duty to vote.

Sunday, October 23, 2016

It's Not A Difficult Series To Sum

As the story goes, somebody once posed a problem to Johnny von Neumann at a cocktail party:
A gnat flies back and forth between the front wheels of two bicycles that are approaching one another at respective speeds of 10 mph. The gnat flies at 20 mph and turns around instantly when it reaches either wheel. If the bicycles are initially 1 mile apart, then how far has the gnat traveled by the time it gets crushed?
If you visualize the problem, or anyhow an idealized version of it, then you will see that the gnat makes infinitely many trips back and forth, each trip shorter than the last. You could figure out the distance of each trip and sum an infinite series to find the answer. For the numbers given in the problem, this turns out to look like
23(1  + 13 + 19 + 127 + ... ) miles. 
The sum of an infinite series of the form 1 + r + r2 + ... (for r < 1) is 1(1 − r). Putting r = 13, we find the gnat's total distance as
2/3(1 − 1/3) = 1 mile.
The fast way to solve the problem is to reason that since the initial 1 mile separation between the bicycles is closing at a rate of 20 miles per hour, it will take 120th of an hour for the bicycles to collide; and the gnat is flying 20 mph for that whole time, so it covers a total distance given by speed × time = 20 mihr × 120 hr = 1 mile.

Now the story goes that von Neumann, presented with the problem, quickly answered "one mile." "Ah ha," said his friend, "you discovered the trick." "Trick?" said von Neumann. "It's not a very difficult series to sum."

As brilliant as von Neumann was, I have a hard time believing the story. But it's part of the lore—I heard the story back when I was a graduate student, and there are dozens of variations floating around. If the events actually did happen, then my interpretation of them isn't the usual one. Rather than concluding, as I think most people do, that von Neumann had instantly solved the problem the long way in his head—I just don't think that's possible, even for the person Los Alamos used as a human supercomputer—and rather than concluding that he was trying to suggest that he had (let's assume he wouldn't do something that lame), I think von Neumann was making a joke. The joke is that he had instantly solved the problem the long way in his head. (If this joke seems too meta, consider that von Neumann was the co-inventor of game theory.)

Once I was faced with a similar problem. I wanted to calculate the cooling effect in an expanding ideal gas directly from the motion of the gas particles. Instead of a single gnat, I was analyzing the motion of an enormous number of them, not all of them moving at the same speed, as they bounced back and forth between a moving piston and a container wall. Another important difference from the gnat problem was that the particle speeds weren't constant: the distribution of particle speeds changes as particles rebound from a receding piston. My results are published in the American Journal of Physics. The answer takes a very complicated form, otherwise I'd suspect you could find it somewhere in an old notebook of Boltzmann, Maxwell, or Gibbs.

I was reminded of all this the other day when I was looking at a triangle shaped like this blue one:

The area can be found by simply calculating one-half the base times the height. The result is 10.5. Now, to prove that "one-half base times height" works for a triangle like this, we usually show a picture like so:

Suppose we've previously established the principle that the area of a right triangle is one-half base times height. By this principle, the large right triangle has area 12 × 5 × 7 = 17.5. Meanwhile, the small right triangle has area 12 × 2 × 7 = 7. So the area of the triangle we want, the blue triangle, is 17.5 − 7 = 10.5. 

This is how it's usually done; here is Salman Khan, for example (skip to the 7-minute mark). But the other day, I was suddenly dissatisfied with a method of calculating the area of the blue triangle that forces me to contemplate the area of shapes that lie outside the blue triangle. 

If all we want is the area inside the blue triangle, then can't we find it by calculating only areas of shapes that are inside the blue triangle?

In the Borges short story "Funes the Memorious," the character of Ireneo Funes invents an absurd number system in order to resolve his dissatisfaction with having to use more than one number symbol to represent the number of gauchos in an episode from Uruguayan history. The method I used to calculate the area of the blue triangle is no less absurd—but perhaps similarly amusing. It goes like this.

Draw infinitely many horizontal and vertical segments to carve the blue triangle into infinitely many right triangles.

Use similarity to determine the lengths of the horizontal and vertical segments, then calculate the areas of the right triangles using "one-half base times height." I'll let you verify that this gives an infinite series,
area = 44150 (1 + 425 + 16625 + 25615625 + ...).
As the saying goes, it's not a very difficult series to sum, and the result is
area = 441/50(1 − 4/25) = 10.5
as expected.

One can do the problem symbolically, replacing 3 by b, 2 by x, and 7 by h. Then we get
area = \(\frac{1}{2}b^2h\frac{b+2x}{(b+x)^2}\sum_{n=0}^\infty{\left(\frac{x}{b+x}\right)^{2n}}\).
It is straightforward to verify that this gives 12bh, as expected. Thus, the area is one-half base times height.

Part of the "fun" of this method is to conceive of the area of a triangle as an infinite series. But just as one can derive 1 + r + r2 + ... = 1(1 − r) recursively by noticing that 1 + r + r2 + ... = 1 + r(1 + r + r2 + ...), one can find the triangle area recursively as well, thus avoiding the "dot-dot-dot." To do this, denote by A the area of the blue triangle. Then A is the sum of the largest right-triangle area (found by similarity as 6310), the next-largest right-triangle area (found by similarity as 12650), and the area of a triangle that is similar to the blue triangle by a scale factor 25. So we have
A6310 + 12650 425A.
Solve this linear equation to find A = 10.5.

Borges himself had occasion to write down an infinite series in his essay "Avatars of the Tortoise." The number of thinkers who appear in this essay is vast (Zeno, Aristotle, Hui Tzu, Thomas Aquinas, Lewis Carroll, and Williams James, among others), and I don't entirely follow the reasoning. But the series in the essay makes a fitting end to today's post:
10 + 1 + 110 + 1100 + 11000 + 110,000 + ....  
Needless to say, the sum of this series is 11.111\(\cdots\).

Saturday, October 22, 2016

Points In A Circle Problem: Solution for N = 2

The problem we've been considering is,
Place N points in the unit disk in such a way as to minimize the greatest possible area of a triangle in the disk that doesn't contain any of the points.
Note that the sense of 'containing a point' is strict: if a point lies on the boundary of a triangle, the triangle is not considered to contain the point.

Click here for my conjectured solution for N = 3 and an overview of my solution for N = 1.

The result for N = 1 is just what you'd expect: placing the obstacle point in the center of the disk minimizes the greatest possible area of a triangle in the disk that doesn't contain the point. The center of the disk is the unique point that minimizes the greatest area.

The N = 2 problem is
Place two points in the unit disk in such a way as to minimize the greatest possible area of a triangle in the unit disk that doesn't contain either point.
Unless I'm mistaken, the solution to N = 2 follows easily from the solution for N = 1, as follows.

To economize on words, let's introduce a bit of notation. Given two obstacle points X and Y in the closed unit disk,

  • let a(X, Y) denote the greatest possible area of a triangle in the disk that doesn't contain X or Y.
  • let a* be the minimum possible value of a(X, Y).

Our problem is to calculate a* and find a pair of points X* and Y* for which a(X*, Y*) = a*.

(I take the indicated extreme values to exist; this seems safe given the closed and bounded features of the problem.)

Now to begin with, if O is the center of the disk, then from the N = 1 case we know that a(O, O) = 1. Therefore a* cannot be greater than 1.

Furthermore, a* cannot be less than 1, because there are no obstacles X and Y for which a(X, Y) is less than 1. This we show as follows:

1. If X and Y are coincident, then from the N = 1 case, we know that a(X, X) is at least 1.

2. Next suppose X and Y are distinct. Now, either X and Y are collinear with the center of the disk, or else X and Y are not collinear with the center of the disk.

2(a). If X and Y are collinear with the center of the disk, then some diameter of the disk passes through both X and Y. Thus, either of the two 45-45-90 triangles with that diameter as the hypotenuse are feasible. This shows again that a(X, Y) is at least 1.

2(b). If X and Y are not collinear with the center of the disk, draw diameter AB parallel to XY. Both X and Y are on the same side of AB. Therefore, one of the two 45-45-90 triangles with hypotenuse AB will be feasible, so that again a(X, Y) is at least 1.

Thus, for all possible obstacle points X and Y, a(X, Y) is at least 1. Hence, a* cannot be less than 1.

Since a* cannot be greater than 1, and a* cannot be less than 1, it follows that a* = 1. Therefore 1 is the smallest possible area of a greatest area triangle in the disk not containing two obstacle points.

To attain this minimum, one may place the first obstacle point at the center of the disk and the second obstacle point anywhere in the disk.

(Because triangles cannot contain the center point, we know from the N = 1 case that their areas can't exceed 1; moreover, there does exist a feasible triangle, as shown in 2(a) above. Thus a(O, Y) = 1 = a* for all Y.)

Here is a picture of one solution:

The way the N = 2 problem is stated, the problem is solved as long as we exhibit a single pair of obstacle points that minimizes the greatest possible area of a feasible triangle. It is interesting however to extend the problem by asking for all obstacle pairs (X*, Y*) satisfying a(X*, Y*) = a*. For example, if the obstacle points have Cartesian coordinates M(−0.001, 0) and N(+0.001, 0), then it would seem that a(M, N) = 1. What mathematical properties of continuity or connectedness does the set of optimal obstacles have? There are probably general theorems that bear on this question.

Sunday, October 16, 2016

Points In A Circle Problem: Solution For N = 1

The problem we've been considering is,
Place N points in the unit disk in such a way as to minimize the greatest possible area of a triangle in the disk that doesn't contain any of the points.
Note that the sense of 'containing a point' is strict: if a point lies on the boundary of a triangle, the triangle is not considered to contain the point.

My conjectured solution for N = 3 is shown below. The three obstacle points are equally spaced around a circle, the radius of which is chosen so that the two triangles shown below have the same area.

How about the N = 1 version of the problem?
Place a point in the unit disk in such a way as to minimize the greatest possible area of a triangle in the unit disk that doesn't contain the point.
It is intuitively compelling that the best location for the obstacle is the center of the disk, and this in fact is the case. Putting the obstacle point at the center limits the greatest possible triangle area to 1, as shown below.

For any other location of the obstacle point, there exists a triangle of area greater than 1 that doesn't contain the point. Here's what that situation looks like in a particular case.

I won't post a detailed version of my solution to the N = 1 case, because my approach was pretty weedy—in part because I was retrofitting ideas that I'd generated while thinking about the N = 3 case. Maybe there's a fast way to get straight to the answer for N = 1?

At any rate, here was my approach in outline:

1. Establish some facts that reduce the number of possibilities.

     (i) For any placement of the obstacle point, a greatest-area triangle not containing the obstacle point must have all three vertices on the boundary of the disk.

The approach here is to show that if a triangle has a vertex in the interior of the disk, then the area of the triangle can be increased, without losing feasibility, by nudging vertices a sufficiently small distance. This looks a little different depending on whether the obstacle point is initially outside the triangle, or at a vertex of the triangle, or on a side of the triangle.

(In the first of these cases, you have to show that a point initially outside a triangle remains outside when a vertex is nudged a sufficiently small distance. This probably isn't hard to show using elementary methods, but it's a snap if you happen to know about barycentric coordinates.)

     (ii) Given two vertices in a greatest-area triangle not containing the obstacle point, the remaining vertex must be one of a finite number of possibilities. 

The feasible region for the third vertex turns out to be an arc on the boundary of the disk. Sliding the third vertex along this arc by a small amount increases the area of the triangle—unless the third vertex is one of the endpoints of the arc, or else one of the two places on the boundary of the disk where sliding the third vertex doesn't gain you any area (because sliding moves you parallel to the base of the triangle).

     (iii) If a greatest-area triangle not containing the obstacle point doesn't have the obstacle point on its boundary, then the triangle is an inscribed equilateral triangle.

This is basically a special case of (ii), in which the only possibilities for the third vertex are the two special places where sliding the third vertex doesn't gain you any area. The upshot is that the triangle must be isosceles on each of its bases, and therefore equilateral.

2. Use the above facts to determine an explicit function a(r) giving the area of a greatest-area triangle not containing an obstacle point located a distance r from the center of the disk.

Here is a sketch of the case \(0 < r < \frac{1}{2}\). Let the obstacle point be given, and consider a greatest-area triangle that doesn't contain the obstacle point. By (i), all three triangle vertices lie on the boundary of the disk. By (iii), some edge of the triangle touches the obstacle (because an equilateral triangle isn't feasible for \(r < \frac{1}{2}\); the incircle of an equilateral triangle has radius \(\frac{1}{2}\), so no matter how the triangle is rotated, it will contain an obstacle with \(r<\frac{1}{2}\)). Denoting the angle of that edge (as measured from the horizontal) by θ, a little trigonometry gives the length of the side as \(2\sqrt{1-r^2\sin^2\theta}\). The altitude is found with the help of (ii) as \(1 + r\sin\theta\). So the triangle area is \((1+r\sin\theta)\sqrt{1-r^2\sin^2\theta}\). Differentiating with respect to θ, we find that the area attains a maximum value \((1+r)\sqrt{1-r^2}\) when \(\theta = \frac{\pi}{2}\).

The result of analyzing the full range of possible r values is \(a(r) = (1 + r)\sqrt{1 - r^2}\) for \(0 \leq r \leq \frac{1}{2}\) and \(a(r) = \frac{3\sqrt{3}}{4}\) for \(\frac{1}{2} \leq r \leq 1\).

3. Show that the unique minimum of a(r) occurs when r = 0. Thus, the unique best location of the obstacle point is the center of the disk.

Straightforward, in view of the fact that we have the explicit function a(r).

So much for N = 1. Maybe I'll start devoting my subway rides to the N = 2 problem now!

P.S. I doubt that principle 1(i) holds for arbitrarily large N.

Thursday, October 13, 2016

Donald Trump's Closing Argument

Faced with criticism about his sexual behaviors, Trump is defending himself by saying that Bill Clinton's behaviors were worse. I'm not sure what the point of this is, but maybe Trump's argument is that the case of Bill Clinton shows that we Americans have decided that a certain amount of sexual aggression and even the odd sex crime aren't disqualifying for a President—hence, these things shouldn't disqualify Trump either. However, even if that were true as of 20 years ago, I don't think it's true anymore.

Even if it were true today, it's hardly a reason to vote for Trump. After all, nobody liked Bill Clinton's sexual behavior. People voted for Clinton because he wasn't only a womanizer: he was also a talented politician and a successful president. Trump offers none of that upside. He wants us to buy only the shitty part of Bill Clinton. 'Bill Clinton did worse, John McCain is a potty mouth, the Republicans denouncing me are hypocrites, lotsa guys talk like that,' says Trump. In other words, he's the worst part of a lot of people, rolled into one loathsome person.

(Illustration from the wonderful children's book That Terrible Halloween Night, by James Stevenson. Buy it on Amazon.)

Thursday, September 15, 2016

Points In A Circle Problem: Update on N = 3

For the configuration of obstacle points shown below, nobody has yet found a valid triangle with area greater than about 0.829. Recall that a triangle is valid as long as it doesn't strictly contain any of the obstacle points.

The obstacle points shown above are equally spaced around a circle of radius 0.32096.... Given these obstacle points, the greatest area I can find for a valid triangle is 0.82863..., attained by the two triangles shown below.

(Reader Matt found these as well.) 

I call the first triangle "the edge triangle," and I call the second triangle "the spotlight triangle."

The reason I placed the obstacle points a distance 0.32096... from the center is that doing so gives the edge triangle and the spotlight triangle equal areas. If the obstacle points were to move closer to the center, the area of the edge triangle would increase and the area of the spotlight triangle would decrease; if the obstacle points were to move further from the center, the area of the spotlight triangle would increase and the area of the edge triangle would decrease. The greater of the two areas is minimized when the obstacle points are as shown. It is on the strength of that circumstance that I guessed these obstacles as minimizing the maximum area over valid triangles.

Anyhow, recently I had a bit of downtime, so I wrote a quick program to look randomly for valid triangles with area greater than 0.82863.... To do that, I first generated 1,000 points distributed randomly over the disk:

These thousand points determine 166,167,000 triangles. I went through those triangles, discarding any of them with area less than 0.815. What remained was a list of about 1.6 million large triangles. I checked this list to see if any of the large triangles were valid. However, none of them were. 

So, this particular attempt at beating the record 0.82863... failed. 

Here are some of the large triangles that were closest to being valid:


It looks as if these triangles are "trying" to be like the spotlight triangle, with one case of an edge triangle. (I think edge triangles are less well represented because they have fewer ways to avoid the obstacles without losing area rapidly by sacrificing altitude.)

Anyway, this exercise increases my confidence somewhat that the edge triangle and the spotlight triangle are indeed maximizers for the obstacle points shown, and that up to symmetry these two triangles are the only maximizers.

Of course, none of this is a proof of anything.

Some notes on the program: 

  • Calculating area is much faster than checking validity. That's why I calculated 166,167,000 triangle areas first, and then checked only the largest triangles for validity. It would have been impractically slow to work in the reverse order (checking 166,167,000 triangles for validity and then calculating the areas for the subset found to be valid). Calculating 166,167,000 triangle areas took about 12 hours. (If I do this again, I'll see if I can optimize the area calculation.)  
  • In the validity test, I'm using barycentric coordinates to determine whether a given point is contained by a given triangle. I'm not sure the barycentric test is faster than one based on same-side-of-line considerations, but so far I haven't needed to worry about speed in this phase of things. (Also, barycentric coordinates make it easy to be quantitative in various ways, for example identifying triangles that are close to valid, as shown in the animation above.)
  • 166,167,000 triangles is a lot of triangles, so the program never holds all of them in memory. Instead of generating all those triangles ahead of time, I put the area test inside a threefold-nested loop over the thousand randomly selected points. Triples of points passing the area test were added to a steadily growing list of large triangles.