Fractal Data
Stuart Reges, Principal Lecturer

This page has a collection of resources from a talk given at the 2011 CS4HS workshop at the University of Washington.

The theme of the talk was using real-world data to make our courses more relevant and interesting to our students. I mentioned two examples from my intro programming class:

• I use data from the Social Security Administration about popular baby names as a programming assignment in which students show how the popularity of particular names has changed over time (1900 to 2000).

• I use zip code data to find zip codes within a certain search radius. The data comes from a large file that gives the latitude and longitude of each zip code, and I have a Java program that does the search.
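For readers who want to try the zip code idea, here is a minimal Python sketch of a radius search. It is not the Java program mentioned above. It assumes the data has already been loaded into a dictionary mapping each zip code to a (latitude, longitude) pair in degrees, and it uses the haversine formula with an approximate Earth radius. The names `distance_miles` and `zips_within` and the sample records are made up for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean radius of the Earth


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in degrees (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def zips_within(records, target_zip, radius_miles):
    """Return (zip, distance) pairs within radius_miles of target_zip, nearest first.

    records maps a zip code string to a (latitude, longitude) tuple.
    """
    lat0, lon0 = records[target_zip]
    hits = []
    for code, (lat, lon) in records.items():
        d = distance_miles(lat0, lon0, lat, lon)
        if code != target_zip and d <= radius_miles:
            hits.append((code, d))
    return sorted(hits, key=lambda pair: pair[1])


# Made-up records for illustration only; these are not real zip code data.
sample = {
    "00001": (47.60, -122.30),
    "00002": (47.65, -122.35),
    "00003": (40.70, -74.00),
}
for code, miles in zips_within(sample, "00001", 10.0):
    print(f"{code} is {miles:.1f} miles away")
```

A linear scan over every record is the simplest approach and is usually fast enough for a file of this size.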

I mentioned three kinds of distributions:

• Uniform: In this kind of distribution, you don't expect to see any special patterns for things like the first digit of a number. This is the boring case that you get with random values and with sequences that increase in a linear manner. They don't have much of a pattern to them.

• Gaussian: This is the normal distribution that we all studied in statistics class with the classic bell curve.

• Exponential: I focused most on this because we don't have very good intuitions about exponential phenomena.

When we think of fractals, we normally think of those pretty pictures you can produce with a fractal shape. Fractals have a property known as self-similarity. One way to think of it is that if you zoom in and out, you see the same kind of pattern. Many natural phenomena have this same property. Think of looking at a mountain range and zooming in and out. You tend to see the same kinds of patterns at every scale.

Exponential sequences have this same property of self-similarity, and that gives them some curious properties. For example, if you have numbers that come from an exponential process, then you'll find that more of them start with a 1 than with any other digit (about 30%). This odd distribution of leading digits is known as Benford's Law. We see this property in all sorts of real-world data.

We explored why this is so using an Excel spreadsheet.
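The same exploration can be sketched in Python instead of a spreadsheet. The idea is that a number's leading digit depends only on where its base-10 logarithm falls within a decade, and an exponential sequence takes equal steps along that log scale. Digit 1 owns the slice from log10(1) to log10(2), which is about 30% of each decade. The growth rate of 1.05 and the 1000 steps below are arbitrary choices for illustration, and the function names are my own.

```python
import math
from collections import Counter


def leading_digit_from_log(log10_value):
    """Leading digit of a positive number, given only its base-10 logarithm."""
    frac = log10_value - math.floor(log10_value)  # position within one decade, in [0, 1)
    return min(9, int(10 ** frac))                # mantissa in [1, 10) gives the first digit


def exponential_digit_shares(growth=1.05, steps=1000):
    """Share of each leading digit among growth**1 .. growth**steps."""
    step = math.log10(growth)  # every term moves the same distance on the log scale
    counts = Counter(leading_digit_from_log(n * step) for n in range(1, steps + 1))
    return {d: counts[d] / steps for d in range(1, 10)}


shares = exponential_digit_shares()
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)  # width of digit d's slice of the log scale
    print(f"{d}: observed {shares[d]:.3f}   log10(1 + 1/{d}) = {benford:.3f}")
```

The observed shares should land close to the log10(1 + 1/d) column, with small differences because the sequence is finite.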

I used a program for counting the distribution of leading digits that is available either as a Java program or a Python program.
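As a rough idea of what such a counter involves, here is a minimal Python sketch. It is not the program referred to above. It assumes the input is plain text, treats any run of digits (with optional commas and a decimal part) as a number, and tallies the first nonzero digit of each. The regular expression, the function names, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# A digit run with optional commas and decimal part, or a bare decimal like ".5".
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?|\.\d+")


def first_significant_digit(token):
    """First nonzero digit in a numeric token, or None if the value is zero."""
    for ch in token:
        if ch in "123456789":
            return int(ch)
    return None


def count_leading_digits(text):
    """Tally the first significant digit of every number found in text."""
    counts = Counter()
    for token in NUMBER.findall(text):
        digit = first_significant_digit(token)
        if digit is not None:
            counts[digit] += 1
    return counts


def report(counts):
    """Format the tallies as one line per digit with count and percentage."""
    total = sum(counts.values())
    lines = []
    for d in range(1, 10):
        share = counts[d] / total if total else 0.0
        lines.append(f"{d}: {counts[d]:6d}  {share:6.1%}")
    return "\n".join(lines)


# Made-up sample values for illustration only.
sample_text = "Populations: 1,204  87  0.0031  19500  230  1.7  0  964"
print(report(count_leading_digits(sample_text)))
```

To run it on a data file, pass the file's contents to `count_leading_digits`, for example the result of reading a text file into a string.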

I mentioned three data sets as examples:
