Stacoscimus Blagoblog

Due Respect

Cite your sources openly, for you stand on their shoulders.

So, of course, I reached out, over email. If anybody has been influential in your life or your work, even in the slightest way, it couldn't possibly hurt to let them know it. And in fact, it may be exactly the encouraging sign they need — this very moment — to continue.

I'd only contacted Bernie once before, for similar reasons, in 2011. His book, Statistical Theory, was influential for me for two reasons. First, he communicated clearly, with an empathy for the reader. Second, and more importantly, he placed the idea in my head that a historical narrative is pedagogically more effective than a deductive one. In his words:

... I have been gradually coming to the conclusion that in mathematics generally, the best pedagogical order for the average student—and we have lots of them—is the historical one. We who have "been through it all" appreciate elegance and generality, but we tend to forget that we ourselves did not begin with the ultimate, unified, and general approach in our own learning process. (Lindgren, 1976)

I realized that I had made a similar statement—albeit with a different spin—in a recent post, when I tautologically said that our minds are usually quite typical. It occurred to me how much of an influence he has had on my attitudes toward learning and teaching.

Cite your sources openly, for you stand on their shoulders.

Hi Bernie,

I thought about you just the other day, when I wrote a blog post about approaching mathematics obliquely.

I paraphrased your introduction to Statistical Theory, where you acknowledged that most students are typical, or something very similar.

Anyway, I thought of you, so I wanted to write and let you know. Here's the post:

http://www.stacoscimus.com/spatial-intuition-in-mathematics/

I trust you're doing well.

Very best,

Yuri

I didn't imagine today that I'd be dealing with death so suddenly. Nobody ever does, and I can't help but imagine that Bernie wasn't thinking too hard about it either, based on the repose and authority with which he taught me — through authorship — how to approach the arts of statistics and education intuitively.

Reading remembrances, I've learned that Bernie was not only an educator but also a musician, organist, and singer who held soirées that spread the musical love. And quite importantly, he was a figure skater, too.

So here's a belated farewell salute to Bernie Lindgren, from somebody who knew him only barely, but implicitly. It's no surprise that he was a wonderful man, for an educator's heart is built on benevolence.

Educational Philosophy, and Stuff

I'm dedicated to learning, and to helping others learn, more about music and one another. The more you know, the more you grow, because learning brings knowledge and knowledge is power.

Right.

Or perhaps try this: See one, do one, teach one. This has been a slogan in the medical profession for some time (maybe going back to Halsted). It's shown up in other disciplines as well. Medicine and music have a lot in common: lots of humanity, lots of technical skill, lots of intuition, and lots of practice. As an apprenticeship slogan, see one, do one, teach one applies very, very well to musical skill.

It's oh-so-apocryphally quipped by American pragmatist John Dewey:

"One never truly learns something until he has confronted the necessary thinking required to teach that something to another learner."

Regardless of where the misquote arose, technology has brought us far since then, and the Internet is the most profound information-sharing medium yet. It's the perfect vehicle for see-do-teaching, but unfortunately the first of the three gets all the mic time.

Let's embrace the exploratory spirit of this three-pronged skill-based philosophy wholeheartedly, and promote all three opportunities. And the more we can shape the Internet in its spirit, the better!

On Not Getting Slashdotted

I can't laud A Small Orange highly enough.

This wonderful hosting company provides a fantastic product, unparalleled customer service, a great user experience (with the cpanelish tooling they provide), and an all-around feeling of good juju, even if you're not a fan of African shamanic arts.

Nonetheless, I'm leaving A Small Orange because of new interests, especially the triple intersection of content creation, content delivery, and content experience.

Three years ago, I'd have despised anybody who referred to creative expression as (the noun) "content", rather than giving creation its due respect as the core of all human endeavors. I continue to feel a deep, resounding revulsion toward the moniker "creatives", akin as it is to the epithets "working class" and "homeless". People who use these terms are not necessarily evil, but they do -- to the extent that they have power to hire, fire, or donate -- consign others to a station of subservience.

Charitably, what we're really seeing is the intersection of two different value systems, arising from two distinct roles. The content producer can't thrive without a content deliverer (without whom there is no monetary reward). And a content deliverer can't exist without, well, content. Likewise, a creator needs an audience, a critical community, a sounding board, a social ballpit, whatever -- just to begin to create. And there must be a way to connect two humans, one who produces, well, stuff, and another who responds to said stuff.

Uncharitably, in my experience, people who are creative tend to be wielded as weapons of artistic mass destruction, producing the propaganda of capitalistic enterprise. Nonetheless, I've come to understand that it's not incumbent on content deliverers to deliver good content: it's incumbent on "content creators" to self-publish.

So I will.

Self-promotion is a concomitant of self-publishing: not a moral failing, but just part of the gig.

But it's not always easy. Artists often have fear-induced hang-ups about success, exposure, disdain, disapproval, influence, ridicule, and the like. In other cases, artists are aggressively prolific, spewing their product into the world with abandon, and saturating anybody nearby with their ectoplasm.

Myself, I'd like to try to push into the world some solutions to problems other people may be having, or touch on ideas others may be mulling over. In the end, the best anybody can do is to establish a real community of like minds. And if the community starts growing quickly, it's best not to have a lid.

A Torch-Installing Ansible Playbook

It's surprisingly difficult to wrap Ansible provisioning around Andrej Karpathy's very instructive and useful char-rnn package. Ansible has its quirks, which don't make everything straightforward. What's more, Torch has made a point of making it easy for researchers to get started with Lua (an amazing scriptable companion to C), and their installation scripts consequently assume an interactive environment rather than a production-ready, system-wide one. Though there are Docker solutions available, I wanted to see how things would play out in the Vagrant/Ansible universe.

Happily, it's not too difficult to guess-test-revise your way into a fragile but workable Ansible playbook. So with a quick cat !$ | pbcopy, here's what I came up with:

---
# Provision a box to do the computation.
#
# This is meant to be useful for local Vagrant virtual machines as well as for
# spot instances in the AWS cloud, for instance.

- name: Provision the box
  hosts: all
  become: yes
  tasks:
    - name: Upgrade aptitude packages
      apt:
        upgrade: "full"
        update_cache: yes

    - name: Install aptitude packages
      apt:
        name: "{{ item }}"
        state: present
        update_cache: yes
        cache_valid_time: 3600
      with_items:
        - git # Needed to download Torch in the first place.
        - libssl-dev # For luacrypto dependencies.

# Torch automates installation in a straightforward way; might as well use it.
# This is straight from http://torch.ch/docs/getting-started.html
- name: Install torch within the user account
  hosts: all
  tasks:
    - name: Grab Torch Repository
      git:
        repo: "https://github.com/torch/distro.git"
        dest: "~/torch"
        recursive: yes
        force: yes
        accept_hostkey: yes

    - name: Install Torch Dependencies
      shell: "cd ~/torch; bash install-deps"

    - name: Install Torch with default options
      shell: "cd ~/torch; yes | ./install.sh"

# Install the LuaRocks dependencies for char-rnn. This command's path is
# a bit fragile, depending on changes of torch's installer.
- name: Install project dependencies
  hosts: all
  tasks:
    - name: Install luarocks dependencies
      shell: "torch/install/bin/luarocks install {{ item }}"
      with_items:
        # Listed in https://github.com/karpathy/char-rnn README document.
        - nngraph
        - optim
        - nn

While I'm working toward a series of musical experiments using this as a launching point, hopefully others will find it helpful as well.

Spatial Intuition in Mathematics

Approaching the world of statistics from the perspective of a musical empiricist, I was actually a bit surprised to learn that the core curriculum of OSU's Statistics PhD program was so mathematical. After all, it would seem that the study of uncertainty itself -- and statistical methods for quantifying it -- ought to be motivated by either philosophical or practical concerns. It therefore seemed a non sequitur that Statistical Theory I (620) would begin with mathematical first principles and proceed deductively.

I was appropriately humbled, then, when I became aware of how imprecise natural language truly is compared to mathematical language. Statistical theory can't be defined with sufficient precision using the language of a philosopher or practitioner (hat tips on this topic to both Bertrand Russell and Richard Feynman). Regardless, as a non-mathematician, I would have found more suggestions for how to think like one very, very useful.

In honesty, I still can't claim to have the same mature intuition for mathematical language and relationships that I have for musical ones, notwithstanding the considerable overlap (and arguable essential equivalence) of the fields. Nonetheless, I have begun to enjoy a certain amused elan when I stumble upon unexpected fluency with certain mathematical ideas, however basic they may be.

It seems to me that the most universally applied aspects of statistics -- namely, mathematical models that help estimate unobservables -- cannot be intuitively understood without geometric imagination. They probably can't be effectively taught unless distance is introduced as a core concept. The typical human mind (and most of ours are typical) has capabilities reflecting adaptive solutions to evolutionary imperatives, and, at least for me, I can think of no better crutch than visual and spatial reasoning with which to approach statistical modeling. This idea is obviously not a new one.

What I've only now begun to perceive, however, is that mathematicians must be thinking and reasoning spatially, even when they're actually representing this reasoning by manipulating equations, functions, and other symbols of notation. Even if the last several centuries of mathematical work have been dedicated to formalizing ideas into algebra for more precise (or even automated) manipulation, I'd be willing to bet that geometrical thinking still drives our intuition.

Perhaps my more mathematically-minded friends would be able to confirm or deny!

How to Get Started with R and Statistics

I'm often asked for advice about how to get involved with statistics and the R statistical programming language. Without a doubt, R is the premier open source tool for data analysis, owing to its enormous library of visualization and model-fitting packages, its relatively straightforward data munging abilities, and its in-depth online documentation. What's more, R is largely built by a community of educators, and quite a large number of datasets are included in the standard library. For all its quirks, it remains a tremendous pedagogical and practical tool.

But without a solid grasp of the underlying statistical theory, it can be difficult to suss out just how to interpret the outputs of, say, glm(). Both statistical knowledge and R knowledge provide a basic foundation to work with. If I were designing a basic curriculum, it would include four texts -- one each on practical statistics, R, statistical theory, and statistical learning.

Practical Statistics

For a first introduction to practical statistics, the OpenIntro Statistics, Second Edition really cannot be beat, owing both to its intuitive presentation of almost every foundational concept in statistics, and to its price -- it's a freely available PDF download written by David Diez, Christopher Barr, and Mine Çetinkaya-Rundel. More than merely a book on statistics, the OpenIntro Statistics text is a primer on experimental design and the scientific method. The book uses practical motivation (and common views that R would produce) to develop ideas from the basics of probability distributions through point estimation and hypothesis testing, arriving finally at both normal-theory and logistic linear models. If I were to teach a course on statistics for beginners again, this would be my chosen text.

Among the strongest features of the book are its clear diagrams, most of which seem to be set using R, and the authors' focus on visualizing data in terms of distributions. Whereas many texts will display distributions and histograms sparingly, the OpenIntro makes them a centerpiece of theoretical explanations. After all, they form the foundation of almost all practical data explorations as well. For example, the authors offer an early caveat against understanding distributions only in terms of mean and variance.

Also available are the datasets required for the exercises, indexed by a chapter guide in a bulk dataset download. However, the book itself does not offer a solid introduction to R.

R

There are several different introductions to R available for people of different skill levels. Most folks who ask me for a way into R are software engineers who already have some understanding of other languages, and are interested in picking up some basic ideas about data manipulation at the same time. So far, my favorite text to recommend is Peter Dalgaard's Introductory Statistics with R, Second Edition. Though it's costly, and DRM'ed by Springer (ick), I've yet to find a more comprehensive and useful overview of R as a programming language and practical tool.

All of the major data structures are clearly explicated, as are the core plotting, modeling, and hypothesis testing functions. However, explanations of the underlying theory behind these topics are not comprehensive -- the book is best used after statistical theory has been learned.

Statistical Theory

For the more mathematically oriented and those interested in a calculus-driven understanding of statistical theory, I really cannot recommend Bernard Lindgren's 1976 Statistical Theory (Third Edition) enough. It's practically free these days, with used copies typically selling for under $5.00. The text is lucidly written, covers sets, combinatorics, probability, distribution functions, point estimation, and more -- all with refreshing clarity. The presentation is logical and mathematical, and really contributes toward developing geometric intuition about statistical constructs.

This calculus-based thinking is incredibly helpful to understand the basic idea of what a probability distribution really is, which distributions are helpful for which purposes, and how to construct completely novel models. For those interested in building real-time processing algorithms that leverage mathematical facts based on approximations -- rather than relying on brute force in offline precomputation -- this introduction to the toolkit is invaluable.

Statistical Learning

Outside of simple linear models, there exist entire worlds of classification and prediction problems that suggest nonlinear models or sophisticated projections, as well as problems that occupy high-dimensional spaces. For more experienced folks, Trevor Hastie, Robert Tibshirani, and Jerome Friedman have provided an instant classic in The Elements of Statistical Learning, available for free in PDF form. The cohesiveness of the book's chapter organization reflects a strong theoretical framework that unites almost all concepts in the state of the art of statistical learning.

However, it is the introductory text by two of the same authors -- An Introduction to Statistical Learning with Applications in R, written with Gareth James and Daniela Witten -- that will really become the best friend of a newcomer to statistical learning techniques. The theoretical and the practical are well married in this text, with inline code examples accompanying beautiful diagrams.

While linear algebra is part of the offering, a solid foundation in statistical theory is presumed. After the basics of statistics and R are already learned, this book will take the reader to the next level of model-fitting, and I recommend it highly.

Python Thought of the Day

Even experienced Python developers get bitten by the old mutable-default trick. Since function defaults are evaluated at the time the def statement is executed, you might run into some surprising behavior:

>>> def blop(blat=[]):
...     blat.append('derp')
...     return blat
...
>>> blop()
['derp']
>>> blop()
['derp', 'derp']
>>> blop()
['derp', 'derp', 'derp']

For developers unaware of how Python code is evaluated, this kind of bug can seem like a real "gotcha." The usual solution, if a new list is to be the default, is:

>>> def blop(blat=None):
...     if blat is None:
...         blat = []
...     blat.append('derp')
...     return blat
...
>>> blop()
['derp']
>>> blop()
['derp']
>>> blop()
['derp']

It's amazing how often this kind of thing shows up; it certainly took me a few wild-goose-chase bug hunts before spotting it became a reflex.

Communicating Numerical Precision

At Nomi, we've built an inference engine that accepts real-time data from several different sources to provide a single best estimate of the traffic in a store. This estimate is produced as a floating point value in the world of real numbers, not as an integer value in the world of natural numbers, like counts really ought to be. Our inference engine operates on a continuous number line using probability distributions, so it's perfectly normal for the expected value of a visit count to be something like 43.2476784927658.

But, of course, in no world is it possible to have, say, 43.2476784927658 visitors enter a store in a single day.

It's been an ever-present challenge to effectively communicate our level of confidence in the estimates. The goal, of course, is to be completely transparent and earn trust. However, different audiences are likely to have completely different ideas of what appears trustworthy and what appears suspicious.

Really, there are two goals to work toward:

  1. Providing maximal benefit for customers.
  2. Instilling maximal confidence in the data.

These two goals are difficult to meet simultaneously because the consumers of the numbers bring different kinds of sophistication. I'll describe four perspectives.

Statistical Perspective

Statisticians treat measured or calculated numbers as point estimates, and often describe precision in terms of statistical variance (or equivalently, a standard deviation). Both the point estimate and its calculated variance can be carried around with arbitrarily high precision, but inference is only made when combining the two in a formal statistical test. A good statistical way to present estimated visit counts might be as a 95% confidence interval, wherein we expect 95% of intervals so constructed in experimental replications to contain the true value. That is, perhaps we would want to say the visits fall within (249, 334) with 95% confidence.
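
To make that concrete, here's a minimal sketch of how such an interval can be constructed under a normal approximation. The numbers are made up for illustration, chosen only so that they land on the interval above:

import math

# Hypothetical values for illustration only: a point estimate of daily
# visits and its estimated variance, as an inference engine might report.
estimate = 291.5
variance = 470.0  # a standard deviation of about 21.7 visits

# A 95% confidence interval under a normal approximation is the point
# estimate plus or minus 1.96 standard deviations.
sd = math.sqrt(variance)
lower = estimate - 1.96 * sd
upper = estimate + 1.96 * sd

print("({:.0f}, {:.0f})".format(lower, upper))  # (249, 334)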

Statisticians distrust people who present obviously estimated numbers without standard deviations or confidence bounds.

Scientific Perspective

Scientists developed the notion of significant figures to deal with measurement precision. They know that when weighing a chemical, small air currents cause a scale's last few digits to fluctuate. A scientist will record the mass by writing down all the stable "trusted" numbers, as well as the first unstable "untrustworthy" number. This is a heuristic, base-ten way of thinking about precision in measurements.

A chemist would distrust people who present estimated visits as 24,543 since it doesn't communicate measurement precision. Instead, a chemist might trust a number like 24,500 -- with three significant figures.
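
That heuristic is easy to mimic in code. Here's a small sketch (round_sig is just a throwaway helper for illustration, not anything from the standard library) that rounds a count to a chosen number of significant figures:

import math

def round_sig(x, sig=3):
    """Round a nonzero number to the given number of significant figures."""
    return round(x, sig - 1 - int(math.floor(math.log10(abs(x)))))

print(round_sig(24543))     # 24500
print(round_sig(24543, 2))  # 25000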

Savvy Intuitive Perspective

An analytics-savvy person without statistical or scientific training is still aware of the idea of measurement error, and probably knows about sampling error for polls. They might trust somebody who provides numbers with confidence intervals disguised as "+/- 5%", and may or may not care whether the point estimates are rounded, or just what the confidence level of the interval was.

They may or may not get suspicious if numbers are presented without these sampling errors.

Intuitive Perspective

An intuitive person with little experience interpreting measurements might be unfamiliar with concepts like precision, estimation, or measurement error. Instead, they might expect all numbers to be exact if they are counts and rounded if they're percentages. In this case, trust comes from producing a plausible value on first principles --- a natural number for a count, and nothing with decimal places at all.

A naive intuitive consumer of data might get suspicious if they see visit counts like 24,500 because having an exact multiple of 100 is unlikely. They might also distrust a set of percentages that do not add to 100% merely because of rounding error.
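
That last suspicion is easy to trigger honestly. A toy example with made-up counts shows how correctly rounded percentages can fail to add up to 100%:

# Three nearly equal segments of a hypothetical total of 1,000 visits.
counts = [333, 333, 334]
total = sum(counts)

# Round each share to a whole percent, the way a report typically would.
percentages = [int(round(100.0 * c / total)) for c in counts]

print(percentages)       # [33, 33, 33]
print(sum(percentages))  # 99, not 100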

Javascript Thought of the Day

Now, Javascript surely doesn't have the best reputation around for being a well-designed language, but I often feel like being an apologist for it, much in the same way that I truly enjoy the ethos and aesthetics of Bash as a programming language in its own right. They might have warts, but they're not arbitrary. Although Gary Bernhardt's now-classic WAT highlights some ridiculous consequences of Javascript's type conversion system, they are nonetheless explainable in a consistent way. In the JSC interpreter:

> [] + []

> [] + {}
[object Object]
> {} + []
0
> {} + {}
NaN

So while this is certainly WAT, it also can be explained relatively easily. The + operator can add numbers, concatenate strings, or act as a unary operator that coerces its operand to a number. So, [] + [] is really the empty string, and [] + {} is really the empty string plus the string representation of {}, which is '[object Object]'. In fact, the Node interpreter makes this clear:

> [] + []
''
> [] + {}
'[object Object]'

Then, the final two are interpreted by JSC as empty code blocks followed by something that must be coerced into a numeric value via its string representation. Roughly:

> + Number([].toString())
0
> + Number({}.toString())
NaN

So these at least have some straightforward logical explanations, based on understanding Javascript's parsing and type coercion. But how on earth could we allow this to happen?

> Boolean([0])
true
> [0] == true
false

That the path to a boolean using Boolean is separate from the path to a boolean using == might seem downright nuts, except when one realizes that the == operator is used for far more than truth comparisons, and the two use entirely different algorithms for determining the final evaluation. Boolean([0]) simply asks whether the value is falsy, and every object is truthy. The abstract equality in [0] == true, on the other hand, first coerces true to the number 1, then coerces [0] to its primitive string '0' and on to the number 0, and 0 is not equal to 1.

A Solitaire Investigation

At a recent family gathering, my father showed me a solitaire game that his father used to play. Its rules are simple:

  • The goal is to discard all cards.
  • Play one card at a time from the top of the deck, forming a single row of cards. Always play the next card to the right of the one preceding it.
  • You can remove three cards at a time in any combination from the ends of the row (3 from one side or 2 from one and one from the other).
  • The removed cards must add up to either 10, 20, or 30, with face cards counting as 10 and aces as 1.
  • The final discard may be four cards, which are guaranteed to add up to a multiple of 10.

I wondered whether or not there could be a strategy to this game, or if the outcome is strictly determined by the lay of the cards after dealing them. That is, are there some unwinnable shuffles? And can one's strategy affect the odds of winning given a certain shuffle?

Probability of Winning

A first question might be to ask what the probability of winning is given a simple greedy strategy of always removing cards when possible. Note that after each deal, multiple different removals might be possible.

def remove_cards(in_play):
    """Remove cards from an ordered list of denominations
       currently in play, and return the remaining cards
       after one removal.
    """
    remaining = in_play[:]

    # Cannot remove if length is less than three.
    if len(in_play) >= 3:

        # Three from the end.
        if sum(in_play[-3:]) % 10 == 0:
            remaining = in_play[:-3]

        # 2 from the end; 1 from the start.
        elif (in_play[0] + sum(in_play[-2:])) % 10 == 0:
            remaining = in_play[1:-2]

        # 1 from the end; 2 from the start.
        elif (sum(in_play[:2]) + in_play[-1]) % 10 == 0:
            remaining = in_play[2:-1]

        # All three from the start.
        elif sum(in_play[:3]) % 10 == 0:
            remaining = in_play[3:]

    return remaining

This will handle a single removal from the row of in-play cards, using a simple algorithm that arbitrarily favors removing the most recently played cards first. Now wrap this function in a game simulator that removes cards after each deal until the result does not change.

from random import shuffle

def simulate_game():
    """Simulate a new solitaire game."""
    # Four suits with numbers 1 through 10, plus face cards.
    cards = (list(range(1, 11)) + [10] * 3) * 4
    shuffle(cards)

    in_play = []
    for card in cards:
        # Add a new card.
        in_play.append(card)

        while True:
            # Remove cards until it is impossible.
            previous = in_play
            in_play = remove_cards(in_play)

            if previous == in_play:
                break

    return in_play

If any cards remain in play after this game, we have lost. This makes it straightforward to find the approximate success probability.

def find_success_prob(n=100):
    wins = 0.0
    for _ in range(n):
        remaining = simulate_game()

        # A victory if we have either one or four cards adding
        # to a multiple of ten.
        if len(remaining) <= 4 and sum(remaining) % 10 == 0:
            wins += 1

    return wins / n
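
Reproducing the estimate is then just a matter of calling this with a reasonably large n. Results will vary a bit from run to run, and a full million simulations takes a while:

print(find_success_prob(n=100000))  # typically lands near 0.22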

A simple greedy strategy that prioritizes removing the most recently played cards yields somewhere around a 22% success rate (21.8605% with a million simulations). However, this does not answer the question of whether choosing a different strategy might make one more or less likely to succeed.

A Different Strategy?

In fact, there is some reason to believe that the opposite strategy might be more effective — removing cards from the beginning of the row FIRST. After all, if a row starts with [9, 9, 10, ...], then these cards are in a sense "stranded" until either a ten-and-ace combination or a two turns up at the far end of the row. This is a fixed, known requirement, whereas it seems, intuitively, that there might be more leeway at the end of the list for new possibilities to open up.

To test this, I reversed the prioritization order in the remove_cards function and repeated the one million simulations. To make a long story short, reversing the strategy and favoring removals from the left side of the row yields a 21.8360% success probability, again calculated with a million simulations.
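
Concretely, the reversal just means checking the left-hand removals first. A sketch of such a variant looks like this, with only the branch order changed relative to remove_cards above:

def remove_cards_left_first(in_play):
    """Like remove_cards, but favor removals from the start of the row."""
    remaining = in_play[:]

    # Cannot remove if length is less than three.
    if len(in_play) >= 3:

        # All three from the start.
        if sum(in_play[:3]) % 10 == 0:
            remaining = in_play[3:]

        # 2 from the start; 1 from the end.
        elif (sum(in_play[:2]) + in_play[-1]) % 10 == 0:
            remaining = in_play[2:-1]

        # 1 from the start; 2 from the end.
        elif (in_play[0] + sum(in_play[-2:])) % 10 == 0:
            remaining = in_play[1:-2]

        # Three from the end.
        elif sum(in_play[-3:]) % 10 == 0:
            remaining = in_play[:-3]

    return remaining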

Using R's prop.test function to perform a chi-squared test reveals the difference to be statistically nonsignificant, p = .676. And of course, just looking at the two values makes it clear that the difference would be practically insignificant in any case — both values are essentially 22%.
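
For anybody without R handy, roughly the same comparison can be run in Python with scipy, using the win counts implied by the two success rates above (218,605 and 218,360 wins out of a million games each):

from scipy.stats import chi2_contingency

# Wins and losses for the two strategies, out of one million games each.
right_first = [218605, 1000000 - 218605]
left_first = [218360, 1000000 - 218360]

chi2, p_value, dof, expected = chi2_contingency([right_first, left_first])
print(p_value)  # about 0.68: no significant difference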

The Solitaire Gamble

At least in this version of solitaire, it appears to be a simple gamble whether or not a shuffle will reveal a working combination. Of course, with imperfect shuffling and repeated plays, it will become more and more likely that clusters of cards summing to 10 or multiples thereof will reappear, making the 22% figure a pessimistic estimate of the real-world chance of success.

But regardless of whether or not strategy makes any difference, solitaire that depends solely on the shuffle apparently provides enough reward to sustain excitement and suspense. Of course, my father's father never had access to the Internet, either.
