Thursday, September 15, 2016

Points In A Circle Problem: Update on N = 3

For the configuration of obstacle points shown below, nobody has yet found a valid triangle with area greater than about 0.829. Recall that a triangle is valid as long as it doesn't strictly contain any of the obstacle points.



The obstacle points shown above are equally spaced around a circle of radius 0.32096.... Given these obstacle points, the greatest area I can find for a valid triangle is 0.82863..., attained by the two triangles shown below.




(Reader Matt found these as well.) 

I call the first triangle "the edge triangle," and I call the second triangle "the spotlight triangle."

The reason I placed the obstacle points a distance 0.32096... from the center is that doing so gives the edge triangle and the spotlight triangle equal areas. If the obstacle points were to move closer to the center, the area of the edge triangle would increase and the area of the spotlight triangle would decrease; if the obstacle points were to move further from the center, the area of the spotlight triangle would increase and the area of the edge triangle would decrease. The greater of the two areas is minimized when the obstacle points are as shown. It is on the strength of that circumstance that I guessed these obstacles as minimizing the maximum area over valid triangles.

Anyhow, recently I had a bit of downtime, so I wrote a quick program to look randomly for valid triangles with area greater than 0.82863.... To do that, I first generated 1,000 points distributed randomly over the disk:


These thousand points determine 166,167,000 triangles. I went through those triangles, discarding any of them with area less than 0.815. What remained was a list of about 1.6 million large triangles. I checked this list to see if any of the large triangles were valid. However, none of them were. 
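As a rough illustration of this step, here is a minimal sketch in Python of how such a search might be set up. It is a reconstruction rather than the program itself; the only specifics taken from the text are the 1,000 points, the uniform distribution over the disk, and the 0.815 area cutoff.

```python
import math
import random

def random_point_in_unit_disk():
    # Taking the square root of a uniform variate for the radius makes the
    # points uniform over the disk's area, not merely uniform in radius.
    r = math.sqrt(random.random())
    theta = 2 * math.pi * random.random()
    return (r * math.cos(theta), r * math.sin(theta))

def triangle_area(p, q, s):
    # Half the absolute cross product of two edge vectors.
    return 0.5 * abs((q[0] - p[0]) * (s[1] - p[1]) - (s[0] - p[0]) * (q[1] - p[1]))

points = [random_point_in_unit_disk() for _ in range(1000)]

# Loop over all C(1000, 3) = 166,167,000 triples, keeping only the large triangles.
large_triangles = []
for i in range(len(points)):
    for j in range(i + 1, len(points)):
        for k in range(j + 1, len(points)):
            if triangle_area(points[i], points[j], points[k]) >= 0.815:
                large_triangles.append((points[i], points[j], points[k]))
```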

So, this particular attempt at beating the record 0.82863... failed. 

Here are some of the large triangles that were closest to being valid:


video


It looks as if these triangles are "trying" to be like the spotlight triangle, with one case of an edge triangle. (I think edge triangles are less well represented because they have fewer ways to avoid the obstacles without losing area rapidly by sacrificing altitude.)

Anyway, this exercise increases my confidence somewhat that the edge triangle and the spotlight triangle are indeed maximizers for the obstacle points shown, and that up to symmetry these two triangles are the only maximizers.

Of course, none of this is a proof of anything.

Some notes on the program: 

  • Calculating area is much faster than checking validity. That's why I calculated 166,167,000 triangle areas first, and then checked only the largest triangles for validity. It would have been impractically slow to work in the reverse order (checking 166,167,000 triangles for validity and then calculating the areas for the subset found to be valid). Calculating 166,167,000 triangle areas took about 12 hours. (If I do this again, I'll see if I can optimize the area calculation.)  
  • In the validity test, I'm using barycentric coordinates to determine whether a given point is contained by a given triangle. I'm not sure the barycentric test is faster than one based on same-side-of-line considerations, but so far I haven't needed to worry about speed in this phase of things. (Also, barycentric coordinates make it easy to be quantitative in various ways, for example identifying triangles that are close to valid, as shown in the animation above.) A sketch of this kind of test appears after these notes.
  • 166,167,000 triangles is a lot of triangles, so the program never holds all of them in memory. Instead of generating all those triangles ahead of time, I put the area test inside a threefold-nested loop over the thousand randomly selected points. Triples of points passing the area test were added to a steadily growing list of large triangles. 
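To make the second note concrete, here is a sketch of a barycentric point-in-triangle test. The details (including the tolerance constant) are illustrative, so treat this as one plausible way to implement the idea rather than as the program described above.

```python
EPS = 1e-12  # points within this margin of an edge count as touching, not inside

def barycentric_coords(p, a, b, c):
    # Barycentric coordinates of point p with respect to triangle abc.
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    l1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    l2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return l1, l2, 1.0 - l1 - l2

def strictly_contains(triangle, p):
    # True only if p lies in the open interior: all three coordinates positive.
    l1, l2, l3 = barycentric_coords(p, *triangle)
    return l1 > EPS and l2 > EPS and l3 > EPS

def is_valid(triangle, obstacles):
    # A triangle is valid if it strictly contains none of the obstacle points.
    return not any(strictly_contains(triangle, obs) for obs in obstacles)
```

The minimum of the three coordinates also gives a crude measure of how far an obstacle point sits inside a triangle, which is one way to rank triangles that are "close to valid."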

Sunday, September 11, 2016

Woohoo! Saturday School Is Back!

We took a break from Saturday School over the summer, doing math from time to time but not on a schedule. Now that regular school has started, we're getting back into the swing of things. This weekend my younger began learning the standard addition algorithm, and my elder started working toward multi-digit multiplication.

I thought people might like to have the latter set of worksheets, so I'm posting them here (PDF).

  • The first page is a skills brush-up.
  • She then filled out the second page, while I co-piloted as necessary.
  • Finally she completed page 3, which summarizes the conceptual work of page 2.


The last question on page 3 is an unrelated question about perfect numbers. At dinner the night before, I had remarked that 6 is called "perfect," because if you add up the factors of 6 not counting 6 itself, you get 1 + 2 + 3 = 6. Perfect numbers are fun (here is some cool background on them) but my underlying purpose in asking about them was actually to provide a bit of brush-up with multiplication facts.

Getting back to the worksheets, you'll see on page 2 that I use a visual model to make the distributive property concrete. However, there are some differences between page 2 and what one often sees in math textbooks. One difference is that I'm not referring anywhere to area. This isn't facially an area model. Instead, we are counting things. It's true that the things being counted are squares, so that if we conceive of the squares as area units, then we are also finding the total area of the strip. And as we get closer to the time of multiplying fractions, I'll be casting the squares as area units. But right now, I'm not introducing that language or activating those concepts in the student's mind. I'm staying close to the elementary meaning of multiplication, which is to count things arrayed in groups. (I did choose squares with an eye toward area models and fraction operations later on; thus, the diagram isn't baskets of flowers or the like.)

The second difference is that the diagrams on page 2 are numerically accurate. I didn't merely gesture at the ideas by drawing rectangular fields of color with arbitrary dimensions. In these diagrams, you can actually count all of the squares, and if you do then you will get the right answer.

I think that in the early stages of this kind of learning, textbooks often assume too much about what students do and don't understand about the visual representations the textbook uses. The author understands the elasticity of the representation scheme, but I doubt that all of the students do. To motivate multiplication as a better alternative to counting, let's be sure that multiplying and counting give the same answer! And in a case like this worksheet, where I'm using the distributive property to structure the multiplication into easy steps, I want each of those steps to give the right answer too. I'm suspicious of representations that ask beginners to think metaphorically.

Finally, in page 3 I'm making sure to summarize the visual/concrete work symbolically. Symbolic calculations like those on page 3 are going to generalize not only to fractions, but also to variables, expressions, area integrals, quantum-mechanical expectation values, you name it.

What's next? For the younger, we'll start practicing easy right-to-left vertical additions (no regrouping, or regrouping in one place). And I want to make sure there are no cobwebs about place value, by using the excellent place-value mini-assessment on achievethecore.org. For the elder, we'll begin building fluency with multiplying a multi-digit whole number by a one-digit whole number. As that takes hold, we can begin stacking those calculations per the standard multiplication algorithm.

Wednesday, August 10, 2016

The Rectangle Area Formula

In the rectangle area formula A = L × W,
  • A is the number of unit squares needed to tile the rectangle
  • L is the number of length units when you measure one side of the rectangle
  • W is the number of length units when you measure an adjacent side of the rectangle.

The reason the formula works has to do with what the × symbol means.

Today let's confine our analysis of the formula to whole numbers, because I'm thinking about the formula today from the perspective of a young student who hasn't yet absorbed fractional quantities or fraction operations. For whole numbers, the simple interpretation of m × n is that it stands for the number of things in m groups of n things each.

In the context of a rectangle with, say, length 7 units and width 5 units, what are the groups, and what are the things?

If one side of the rectangle consists of 7 length units, then we can draw lines to divide the rectangle equally into 7 strips. There are as many strips in the rectangle as there are length units in the side.

We might have done the same thing on either of the two adjacent sides, which would have resulted in the number of strips being 5.

If we draw both sets of lines, then each of the strips gets divided into squares (question: why perfect squares?). For example, the 7 strips would turn into 7 groups of 5 squares each. Remembering what × means, the total number of squares must then equal 7 × 5.
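To make the count explicit: going strip by strip gives 5 + 5 + 5 + 5 + 5 + 5 + 5 = 35 squares, and 7 × 5 = 35 as well, so multiplying really does give the same answer as counting.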

This argument is pretty obviously sound for any rectangle whatsoever with whole-number side lengths, so we can say that for any such rectangle,
The number of unit squares needed to tile a rectangle equals the number of length units that fit along one side multiplied by the number of length units that fit along an adjacent side.
This is what is meant by "Area equals length times width."

Some drawbacks I have seen in the way some math textbooks teach A = L × W:

  • Doing a poor job establishing what m × n means altogether. Then the book has no chance of doing a proper job with area, or with anything else that relies on multiplication.
  • Doing a poor job with basic measurement concepts. Then it isn't clear what a length measurement means, or what an area measurement means. You can't relate two ideas you don't grasp individually.
  • Being unclear about the difference between area and length. An example of this would be blurring the distinction between iterations of length units along the boundary and iterations of area units just inside the boundary. One book says, "The number of squares in one row is the same as the length in unit squares of the rectangle." What is an object's "length in unit squares"? Unit squares are area units, not length units. A correct way to say it would be, "The number of squares in one row is the same as the number of length units in the edge of the rectangle." Maybe it sounds convoluted, but in mathematics, words mean things; you have to combine the words into statements that are true, or at least not hostile to the distinction at hand.
  • Taking too long to orchestrate the reasoning involved.

Sunday, August 7, 2016

Points In A Circle Problem: A Wild Guess for N = 3

With reference to the optimization problems in my last post, I found myself most interested by the N = 3 case:
Place 3 points in the interior of a unit disk so that the largest triangle not containing any of the points has least possible area.
I should actually clean up my language here, because the maximum area could be attained by two or more different triangles. It ought to say, "a largest triangle," not "the largest triangle." Or perhaps I could say:
Place 3 points in the interior of a unit disk so that the largest area among all triangles not containing any of the points is as small as possible.

In today's post, I thought I'd share my first wild guess at a solution to the N = 3 problem:



The dots are each located about 0.321 units from the center of the disk, and the configuration has 120-degree rotational symmetry.

Given my opening move, what is player two's best response? What is the largest area of any triangle that doesn't contain any of my points within its interior? Feel free to email me a sketch at zimblogzimblog@gmail.com.

Remember, it's OK if the boundary of your triangle touches one or more of the points, and it's OK if one or more of your triangle's vertices lies on the boundary of the disk.

So far, the greatest area I've been able to enclose with a triangle that doesn't contain the above three points is about 0.829 square units. But I haven't searched systematically or at length, so I'm far from certain that this is the greatest area possible. If any reader sketch appears to exceed that value, I'll let you know.

One more reflection on the problem: is it well posed? There are infinitely many triangles not containing the three given points, so one must consider the possibility that among them no greatest area exists. I guess I doubt that happens here—basically, because we allow touching of boundaries—but I haven't proven it.

Saturday, August 6, 2016

An Optimization Problem

Place a point in the interior of a unit disk so that the largest triangle not containing that point has least possible area. 
To describe the problem another way, suppose you and a friend are playing a game. You begin by drawing a dot somewhere inside the disk. Your friend then tries to draw the largest possible triangle (in terms of area) that doesn't contain the dot you drew. No point of your friend's triangle may lie outside the disk, but it's fine if one or more vertices of the triangle touch the boundary of the disk. It's also OK if the boundary of the triangle touches the dot—the dot just can't lie within the triangle's interior. Your objective is to place your dot in such a way that your friend's best triangle will be as small as possible.

Looked at from a distance, the goal here is to minimize a maximum quantity. Problems like that are common in many fields, including game theory (where the goal is to give your opponent only bad options from which to choose), investments, engineering, and quantum mechanics (in my research, I once used a min-max technique to shed a little light on the Heisenberg uncertainty principle).

Getting back to the problem at hand, I think I have the answer, but I haven't made sure of it.

You might also try some generalizations—these I haven't even begun to think about:

  • Place 2 points in the interior of a unit disk so that the largest triangle not containing either of the points has least possible area.
  • Place 3 points in the interior of a unit disk so that the largest triangle not containing any of the points has least possible area.
  • Place N points in the interior of a unit disk so that the largest triangle not containing any of the points has least possible area.
  • Place N points in the interior of a [choose shape] so that the largest [choose shape] not containing any of the points has least possible [length, area, volume].
  • Place N points randomly in the interior of a unit disk. What is the expected value of the area of the largest triangle not containing any of the points?

Feel free to share your results and partial progress on any of the problems. If the problems are standard among researchers (as they may well be), feel free to post some citations as well.

Enjoy!

Friday, August 5, 2016

Current Simulation Results

UPDATE September 24, 2016

This page will always contain my most recent simulation results. A shortcut to this page is available at www.tinyurl.com/ZimblogElection2016. Methodology for the simulation is here.


Most recent distribution of outcomes

Most recent battleground list

TBD


Most recent detailed results

State (EV) | D Participation | D Win Prob. | R Participation | R Win Prob.
Alabama (9) 5.6% 0.05 96.1% 0.95
Alaska (3) 5% 0.05 95.3% 0.95
Arizona (11) 54% 0.5 56.8% 0.5
Arkansas (6) 5.4% 0.05 95.6% 0.95
California (55) 98.6% 0.95 11.3% 0.05
Colorado (9) 53.1% 0.5 56% 0.5
Connecticut (7) 62.3% 0.6 43.4% 0.4
District Of Columbia (3) 95.1% 0.95 5.4% 0.05
Delaware (3) 75.9% 0.75 26.5% 0.25
Florida (29) 60.4% 0.5 68.8% 0.5
Georgia (16) 55.5% 0.5 60.2% 0.5
Hawaii (4) 95.3% 0.95 5.7% 0.05
Idaho (4) 5.2% 0.05 95.6% 0.95
Illinois (20) 67.2% 0.6 52.6% 0.4
Indiana (11) 44.3% 0.4 66.5% 0.6
Iowa (6) 52.1% 0.5 53.8% 0.5
Kansas (6) 26.7% 0.25 77.8% 0.75
Kentucky (8) 5.6% 0.05 95.9% 0.95
Louisiana (8) 26.9% 0.25 78.3% 0.75
Maine (2) 60.5% 0.6 41.3% 0.4
MaineCD1 (1) 75.7% 0.75 25.8% 0.25
MaineCD2 (1) 50.3% 0.5 50.9% 0.5
Maryland (10) 95.7% 0.95 6.1% 0.05
Massachusetts (11) 95.6% 0.95 6.3% 0.05
Michigan (16) 56% 0.5 59.9% 0.5
Minnesota (10) 63.3% 0.6 46% 0.4
Mississippi (6) 26.9% 0.25 77.6% 0.75
Missouri (10) 43.5% 0.4 66% 0.6
Montana (3) 25.4% 0.25 76.3% 0.75
Nebraska (2) 5.2% 0.05 95% 0.95
NebraskaCD1 (1) 5.2% 0.05 95.1% 0.95
NebraskaCD2 (1) 40.7% 0.4 60.6% 0.6
NebraskaCD3 (1) 5.1% 0.05 94.9% 0.95
Nevada (6) 52.4% 0.5 53.6% 0.5
New Hampshire (4) 51.4% 0.5 53% 0.5
New Jersey (14) 64.6% 0.6 48.2% 0.4
New Mexico (5) 61.7% 0.6 42.8% 0.4
New York (29) 83.1% 0.75 39.5% 0.25
North Carolina (15) 55.4% 0.5 60.2% 0.5
North Dakota (3) 5.2% 0.05 95.5% 0.95
Ohio (18) 56.5% 0.5 61.1% 0.5
Oklahoma (7) 5.4% 0.05 95.8% 0.95
Oregon (7) 62.4% 0.6 44.1% 0.4
Pennsylvania (20) 57.2% 0.5 62.6% 0.5
Rhode Island (4) 75.8% 0.75 26.9% 0.25
South Carolina (9) 43.1% 0.4 65% 0.6
South Dakota (3) 25.6% 0.25 76.6% 0.75
Tennessee (11) 5.5% 0.05 96.2% 0.95
Texas (38) 53% 0.4 82.5% 0.6
Utah (6) 26.8% 0.25 77.9% 0.75
Vermont (3) 95.3% 0.95 5.6% 0.05
Virginia (13) 54.2% 0.5 58.3% 0.5
Washington (12) 78.1% 0.75 30.5% 0.25
West Virginia (5) 5.3% 0.05 95.6% 0.95
Wisconsin (10) 53.2% 0.5 56.1% 0.5
Wyoming (3) 5% 0.05 95.3% 0.95
Sat 24 Sep 2016



Cumulative Animation of Simulation Results




video



Wednesday, August 3, 2016

My Election Simulation for 2016

In 2008, I used a simple election simulation to identify battleground states and decide where to spend my time getting out the vote. Eight years later, there exist plenty of authoritative simulations (Sam Wang, FiveThirtyEight, The Upshot). All of them will eventually agree on where the battlegrounds are. Still, I like computer programming! I also like being able to run custom analyses that I can't get from the big players. So I'll be simulating the 2016 election in my own amateur way.

How the Simulation Works

1) For each state, I produce a probability distribution (d, r, l, g) telling how likely it is that the state's electoral votes will go Democratic, Republican, Libertarian, or Green.

2) I then simulate the election 100,000 times, using the probabilities in (1) to assign electoral votes for each state.

3) Using the ensemble of simulation outcomes, I can determine a couple of key quantities:

   a) The fraction of simulations in which each candidate wins. This is the headline win probability estimate. 

   b) The fraction of simulations in which state X goes for candidate Y when candidate Y wins the presidency.

I can use (b) to determine candidate priorities, which I define intuitively as "states that a candidate needs but doesn't yet have." In other words, if the simulations have state X going for candidate Y in a pretty high percentage of candidate Y's wins, yet currently the probability of candidate Y winning state X is pretty low, then I consider state X a priority for candidate Y. The battleground states are those that appear on both candidates' list of priorities. 

So far no states seem likely to go for third-party candidates, so quantity (b) and the ensuing priorities analysis are restricted to the D and R candidates. 
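Here is a minimal sketch of steps (2) and (3) in code, with an illustrative data layout (a dict from state to electoral votes and win probabilities); it's meant to convey the idea rather than reproduce the actual program.

```python
import random

# Illustrative input: state -> (electoral votes, {party: win probability}),
# with each state's probabilities summing to 1.
state_data = {
    "Ohio":         (18, {"D": 0.50, "R": 0.50}),
    "Pennsylvania": (20, {"D": 0.50, "R": 0.50}),
    # ... and so on for all 56 electoral-vote-granting entities ...
}

NUM_SIMULATIONS = 100_000

def simulate_once(state_data):
    # Assign each state's electoral votes by a weighted random draw.
    ev = {"D": 0, "R": 0, "L": 0, "G": 0}
    outcome = {}
    for state, (votes, probs) in state_data.items():
        parties = list(probs)
        winner = random.choices(parties, weights=[probs[p] for p in parties])[0]
        ev[winner] += votes
        outcome[state] = winner
    # 270 electoral votes wins outright; otherwise the House decides.
    leader = max(ev, key=ev.get)
    return (leader if ev[leader] >= 270 else "House"), outcome

wins = {"D": 0, "R": 0, "L": 0, "G": 0, "House": 0}
participation = {}  # (state, party) -> number of that party's wins the state joined
for _ in range(NUM_SIMULATIONS):
    national_winner, outcome = simulate_once(state_data)
    wins[national_winner] += 1
    for state, state_winner in outcome.items():
        if state_winner == national_winner:
            key = (state, state_winner)
            participation[key] = participation.get(key, 0) + 1

# Quantity (a): headline win probabilities.
win_prob = {party: count / NUM_SIMULATIONS for party, count in wins.items()}
# Quantity (b): participation rates, as fractions of each party's wins.
participation_rate = {key: count / wins[key[1]] for key, count in participation.items()}
```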

Results So Far

I've run the simulation four times since July 23. Here is today's output:



For what it's worth, the D candidate wins in 72% of today's simulations, while the R candidate wins in 27% of today's simulations. The election goes to the House of Representatives in about 1% of today's simulations.

What I'm really after, however, is the battleground analysis, which will inform my own decision making about how to participate in the election come fall. Right now, the battlegrounds according to my simulation are:

Battleground (EV) | D Participation | D Win Prob. | R Participation | R Win Prob.
New York (29) 82% 0.75 44% 0.25
Illinois (20) 79% 0.75 37% 0.25
Washington (12) 78% 0.75 32% 0.25
2016-08-03

Currently I'm arbitrarily defining a priority to be a state for which the participation rate exceeds the party's probability of winning the election as a whole, but for which the win probability is less than 1. The three states above show up on both parties' priority lists. (For example, New York is a battleground because 82% > 72%, 0.75 < 1, 44% > 27%, and 0.25 < 1.)
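In code, that arbitrary rule is just a filter, along these lines (the names are illustrative):

```python
def priority_states(participation_rate, state_win_prob, national_win_prob):
    # participation_rate[state]: fraction of the party's simulated wins in which
    #   the state went for the party
    # state_win_prob[state]: the party's probability of winning that state
    # national_win_prob: the party's overall probability of winning the election
    return {
        state
        for state, rate in participation_rate.items()
        if rate > national_win_prob and state_win_prob[state] < 1
    }

# Battlegrounds are the states on both parties' priority lists, e.g.:
# battlegrounds = priority_states(d_rates, d_probs, 0.72) & priority_states(r_rates, r_probs, 0.27)
```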

We've been hearing a lot about Pennsylvania and Ohio—why aren't they on my list? In my simulation, Ohio and Pennsylvania both participate in only 56% of D victories, so they aren't urgent enough to make it onto the D priority list according to the thresholds I set. My list is intentionally short, because I can't be everywhere; if, on the other hand, you're a well-funded political campaign with thousands of volunteers to deploy, then you probably do include Ohio and Pennsylvania on your list of must-haves.

Every time I run the simulation, I get a histogram like the one above. To show the history of the simulation, I can turn all of my extant histograms into frames for a movie. The problem is that because I don't run the simulation every day, or on any kind of regular schedule, the histograms aren't evenly spaced in time. To fix this, I use linear interpolation on pixel values to morph one histogram into the next, in such a way that the resulting movie has one frame per day. Here is what the movie looks like now, at the beginning of the season:

video
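For the curious, the interpolation step can be sketched roughly as follows, assuming the histogram snapshots are saved as same-sized image files (the file handling here is illustrative, not a transcript of the actual script):

```python
import numpy as np
from PIL import Image

def morph_frames(path_a, path_b, days_between):
    # Linearly interpolate pixel values so that the gap between two histogram
    # snapshots becomes one frame per day (days_between should be at least 1).
    a = np.asarray(Image.open(path_a), dtype=float)
    b = np.asarray(Image.open(path_b), dtype=float)
    frames = []
    for day in range(days_between + 1):
        t = day / days_between
        blended = (1 - t) * a + t * b
        frames.append(Image.fromarray(blended.astype(np.uint8)))
    return frames

# Example: fill a four-day gap between two snapshots with daily frames.
# for i, frame in enumerate(morph_frames("hist_0801.png", "hist_0805.png", 4)):
#     frame.save(f"frame_{i:03d}.png")
```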


Eventually I will write a blog post specifically to hold simulation results, and I'll probably update that post from time to time. So as the movie gets longer, that post is where you can go to watch it. 


Details (Important)

Turning state polling data (D%, R%, L%, G%) into state win probabilities (d, r, l, g)

Suppose you had a trustworthy poll telling you the percentage of likely voters planning to vote for candidates D, R, L, and G. How would you turn that information into win probabilities for the candidates? 

There are principled ways to do this, but using them would have required me to learn something before getting started, and in my fever to begin writing code I chose a less principled path. Here's what it looks like. Suppose for example that we have polling data (46.3%, 44.8%, 4%, 0.5%). What I do first is normalize the percentages so they sum to 100%. That gives (48.43%, 46.86%, 4.18%, 0.52%). The idea behind this is, first, that we ought indeed to allocate undecided voters in some way—the polls are supposed to be restricted to likely voters—and, second, it seems reasonable to allocate undecided voters proportionally according to what is known about preferences among those who have made up their minds. (You could hold out hope that undecideds will break for your candidate, but what is the polling evidence that it will happen?)

Now, this poll we are talking about is based on a sample, typically around 1,000 voters. So it is an imperfect predictor of what will happen when the real "poll" happens at the polling place on election day. In the simulation, I allow that multiple outcomes on election day are possible. Specifically, I allow that the vote share for each candidate on election day might be anywhere from 6 percentage points less than the poll result to 6 percentage points greater. (Of course, I don't allow percentages to be less than zero or greater than 100!) I chose the number 6 to be two standard deviations, where one standard deviation is 3 percentage points, based on a polling sample of around 1,000. 

So in our example, the D candidate might end up with as little as 42.43% of the vote or as much as 54.43% of the vote. Of course, given the poll data we have, neither of these outcomes should be considered likely. If an outcome differs from the poll result by z%, then I assign it a probability proportional to e^(−z²/18), again corresponding to a standard deviation of 3 percentage points. 

Working to a mesh resolution of 0.1, I generate a list of all possible four-tuples (D%, R%, L%, G%) consistent with the ±6 point realm of possibility. The probability of any one of these four-tuples is proportional to the product of the four single-candidate probabilities calculated using the exponential factor above. In this way, I produce a probability distribution over election-day outcomes for the state in question.

Now I choose one of the four-tuples to be the election day result (choosing randomly but according to the probability distribution), and simply look at the four-tuple to see which candidate got the highest vote share that day. I do that 10,000 times. The win probability for each candidate is defined as the fraction of the simulations in which that candidate wins.
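Here is a sketch of that procedure in code. One shortcut: because the probability of a four-tuple is a product of per-candidate factors, drawing each candidate's election-day deviation independently from the mesh samples the same distribution as enumerating every four-tuple, so the sketch does that instead of building the full list. The names and constants are illustrative.

```python
import math
import random

STEP = 0.1        # mesh resolution, in percentage points
SPREAD = 6.0      # allowed deviation from the normalized poll number
SIGMA = 3.0       # standard deviation, in percentage points
N_DRAWS = 10_000  # simulated election days per state

# The mesh of possible deviations z and their weights, proportional to e^(-z^2/18).
DEVIATIONS = [-SPREAD + STEP * i for i in range(int(round(2 * SPREAD / STEP)) + 1)]
WEIGHTS = [math.exp(-z * z / (2 * SIGMA ** 2)) for z in DEVIATIONS]

def state_win_probs(poll):
    # poll: raw shares (D%, R%, L%, G%); first normalize so they sum to 100.
    total = sum(poll)
    shares = [100.0 * p / total for p in poll]
    wins = [0, 0, 0, 0]
    for _ in range(N_DRAWS):
        # Perturb each candidate's share, clipped to [0, 100], and see who wins.
        result = [
            min(100.0, max(0.0, s + random.choices(DEVIATIONS, weights=WEIGHTS)[0]))
            for s in shares
        ]
        wins[result.index(max(result))] += 1
    return [w / N_DRAWS for w in wins]

# Example with the poll from the text:
# print(state_win_probs([46.3, 44.8, 4.0, 0.5]))
```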

Examples:
  • Given hypothetical polling data (25%, 25%, 25%, 25%), a recent debugging run produced candidate win probabilities (0.2512, 0.252, 0.2451, 0.2517). In several test runs using (25%, 25%, 25%, 25%) as an input, the output probabilities were always 0.25 when rounded to the nearest hundredth.
  • Given hypothetical polling data (0, 49, 51, 0), a recent debugging run produced candidate win probabilities (0, 0.31, 0.69, 0).
As you can see, having a 51-49 lead in a trustworthy poll is a good position to be in. It gives you roughly 2-to-1 odds of winning.

(By now I probably could have learned enough to produce these numbers in a principled fashion, but oh well.)

It takes 1–2 minutes for my laptop to turn a single state poll (D%, R%, L%, G%) into a set of win probabilities (d, r, l, g). I have to do it for all 56 electoral-vote-granting entities, which takes about 2 hours. In case you are wondering, the 56 electoral-vote-granting entities are the 48 states other than Maine and Nebraska, the District of Columbia, Maine-At-Large, Maine Congressional District 1, Maine Congressional District 2, Nebraska-At-Large, Nebraska Congressional District 1, Nebraska Congressional District 2, and Nebraska Congressional District 3.

Once I have the state-by-state probabilities, simulating the national election 100,000 times is fast. It wouldn't be hard to avoid the simulation approach by using the polynomial method, but the very elegance of that method means that it suppresses individual state scenarios, which I'm interested in. (The histograms are also fun to look at.)
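For reference, the polynomial method amounts to a convolution over states: carry along the exact probability of every possible Democratic electoral-vote total. A two-party sketch (a simplification, not part of the simulator) looks like this:

```python
def dem_ev_distribution(states):
    # states: iterable of (electoral_votes, probability_D_wins_state), two-party only.
    # Returns dist where dist[k] is the probability that D ends up with exactly k
    # electoral votes -- the coefficients of the product of (1 - p + p*x^ev) over states.
    total_ev = sum(ev for ev, _ in states)
    dist = [0.0] * (total_ev + 1)
    dist[0] = 1.0
    for ev, p in states:
        new = [0.0] * (total_ev + 1)
        for k, prob in enumerate(dist):
            if prob:
                new[k] += prob * (1 - p)   # state goes R: D total unchanged
                new[k + ev] += prob * p    # state goes D: D total rises by ev
        dist = new
    return dist

# The D candidate's outright win probability is then sum(dist[270:]), with no sampling noise.
```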

I should say that I'm accounting for the L and G candidates not because they have an appreciable chance of becoming president, but because they might conceivably affect state-level results. I think there are some states where the L candidate could attract a significant share of the vote. As for G, well, if there's a Nadering on the way it would be helpful to see it coming. In other words, I am including L and G not to estimate their win probabilities on either the state or national level, but in order to be able to model their indirect effect on the D and R probabilities at the state and hence national level. One of the reasons I built my simulator (apart from liking to code) is that I'm not sure I have faith yet in the big players' approach to modeling the effects of third-party candidates this cycle. 


What if you don't have state polling data?

Right now, polling data for each state isn't conveniently available. I don't know where to look for a trustworthy polling breakdown (D%, R%, L%, G%) for each state. Many of the polls you see reported in the news don't include L or G. And there's a whole set of decisions to make about which polls to pay attention to, based on how recent they are, who conducted them, how those pollsters have performed in predicting past elections, and so on. 

If I can get better at consuming the information available online, I will start using poll data. In the meantime, I have a workaround. It's similar to the approach I used in my Senate race simulation. Here I look up the RealClearPolitics rating for each state and simply replace the rating with a probability for that state. Specifically,
  • "Solid" for a candidate means I assign win probability 0.95 for that candidate.
  • "Likely" for a candidate means I assign win probability 0.75 for that candidate.
  • "Leans" for a candidate means I assign win probability 0.60 for that candidate.
  • "Toss Up" means 50/50 chance for D and for R.
(There are no third parties in the workaround.) I chose the numbers 0.60, 0.75, and 0.95 more or less arbitrarily. I didn't want there to be any certain outcomes, so I made "solid" 0.95 instead of 1. I didn't want "lean" to lean too far, so I made it 0.6 instead of something like 0.67. (There didn't seem much point to making the "lean" probability less than 0.6, since it seemed that if the probability was in the fifties, RCP would just make it a toss-up.) As for the 0.75 assignment in the case of "Likely," one could argue this number should be as low as 0.67 or as high as around 0.85. I went sort of middle of the road. I haven't done a sensitivity analysis on these numbers.
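In code, the workaround's mapping is nothing more than a lookup table, along these lines (the rating labels are paraphrased):

```python
# RCP-style rating -> (D win probability, R win probability) for a state.
RATING_TO_PROBS = {
    "Solid D": (0.95, 0.05), "Likely D": (0.75, 0.25), "Leans D": (0.60, 0.40),
    "Toss Up": (0.50, 0.50),
    "Leans R": (0.40, 0.60), "Likely R": (0.25, 0.75), "Solid R": (0.05, 0.95),
}
```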

At any rate, with state-by-state probabilities in hand, we proceed as described in the original method, simulating the national election 100,000 times to determine priorities and battleground states. This is the method I'm currently using, and it is the method that led to the results shown above (note for example the probabilities 0.75 and 1 − 0.75 = 0.25 in the table of battleground states, which are fiat values as noted in the bullets immediately above).