Stacoscimus Blog

Python Thought of the Day

Often new Python developers get bitten by the old mutable-default trick. Since function defaults are evaluated at the time the def statement is executed, you might run into some surprising behavior:

>>> def blop(blat=[]):
...     blat.append('derp')
...     return blat
>>> blop()
['derp']
>>> blop()
['derp', 'derp']
>>> blop()
['derp', 'derp', 'derp']

For developers unaware of how Python code is evaluated, this kind of bug can seem like a real "gotcha." The ever-common solution, if a new list is to be the default, is:

>>> def blop(blat=None):
...     if blat is None:
...         blat = []
...     blat.append('derp')
...     return blat
>>> blop()
['derp']
>>> blop()
['derp']
>>> blop()
['derp']

It's amazing how often this kind of thing shows up; it certainly took me a few wild-goose chases of bug hunts before this became a reflex.
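One way to see the mechanism for yourself is to inspect the function object directly. This is only a small sketch: it re-declares the buggy blop from above and peeks at its __defaults__ tuple, where Python keeps the default values it built when the def statement ran.

```python
def blop(blat=[]):
    blat.append('derp')
    return blat

# The default list is created once and stored on the function object.
first = blop()
same_object = first is blop.__defaults__[0]

# A second call mutates that very same list.
blop()
accumulated = list(blop.__defaults__[0])

print(same_object)   # True: the returned list *is* the stored default
print(accumulated)   # ['derp', 'derp']
```

Because every call without an argument hands back the same stored list, the appends pile up. The None sentinel sidesteps this by building a fresh list inside the function body on each call.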

On Not Getting Slashdotted

I can't laud A Small Orange highly enough.

This wonderful hosting company provides a fantastic product, unparalleled customer service, a great user experience (with the cpanelish tooling they provide), and an all-around feeling of good juju, even if you're not a fan of African shamanic arts.

Nonetheless, I'm leaving A Small Orange because of new interests --- especially the triple intersection of content creation, content delivery, and content experience.

Three years ago, I'd have despised anybody who referred to creative expression as (the noun) "content", rather than giving creation its due respect as the core of all human endeavors. I continue to feel a deep, resounding revulsion toward the moniker "creatives", akin as it is to the epithets "working class" and "homeless". People who use these terms are not necessarily evil, but they do -- to the extent that they have power to hire, fire, or donate -- consign others to a station of subservience.

Charitably, what we're really seeing is the intersection of two different value systems, arising from two distinct roles. The content producer can't thrive without a content deliverer (without whom there is no monetary reward). And a content deliverer can't exist without, well, content. Likewise, creativity needs an audience, a critical community, a sounding board, a social ballpit, whatever -- to even begin to create. And there must be a way to connect two humans, one who produces, well, stuff, and another who responds to said stuff.

Uncharitably, in my experience, people who are creative tend to be wielded as weapons of artistic mass destruction, producing the propaganda of capitalistic enterprise. Nonetheless, I've understood that it's not incumbent on content deliverers to deliver good content: it's incumbent on "content creators" to self-publish.

So I will.

Self-promotion is a concomitant of self-publishing: not a moral failing, but just part of the gig.

But it's not always easy. Artists often have fear-induced hang-ups: success, exposure, disdain, disapproval, influence, ridicule, and the like. In other cases, artists are aggressively prolific, spewing their product into the world with abandon, and saturating anybody nearby with their ectoplasm.

Myself, I'd like to try to push into the world some solutions to problems other people may be having, or touch on ideas others may be mulling over. In the end, the best anybody can do is to establish a real community of like minds. And if the community starts growing quickly, it's best not to have a lid.

CSS from the ground up

CSS is a wild and wooly world to dive into --- it seems that the lion's share of CSS floating around the web is not particularly well-written. After all, the visual paradigm it addresses can really encourage a "tweak-till-it's-right" kind of approach.

Starting from the ground up, on the other hand, is really the only way to fully understand what it is your site is doing. Luckily, the Pelican static site generator does a fantastic job providing a straightforward set of html templates known as the 'simple' theme. Over the course of two days it was straightforward to assemble a dynamic one-or-two-column layout, inspired greatly by Giulio Fidente's work.

One important thing I've learned about CSS is to avoid w3schools as a resource! It has so many inaccuracies and so much poor advice. Better to use the Mozilla CSS docs, or better yet, go to the W3C horse's mouth directly. CSS-Tricks is also a fantastic resource.

Other lessons learned:

  • Avoid 'absolute' positioning and favor blocks and floats when possible.
  • Relative (i.e., percentage-based) and absolute measures for layout elements don't really play nicely together. I used percentage-based widths at the body-level <nav>, <main>, and <footer> CSS stylings, and then drew things such as one-pixel borders and set pixel-based padding on <div> elements nested immediately within.
  • Use the @media queries provided by CSS to construct dynamic layouts!


There are plenty of posts around describing things that can be done at the command line with Bash history, but the first step into a smörgåsbord of functionality is almost always through a single convenient doorway. This doorway is !$, which Bash expands to the last argument of the previous issued command.

How often do you edit a file and commit changes?

$ vi data/science/
# Make some changes
$ git add !$
$ git commit -m 'What a nice change that was!'

Or how about moving a file before editing it?

$ mv some/long/path/to/a/file.mdown some/long/path/to/a/
$ vi !$
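For anyone who wants to see what the expansion amounts to, here is a rough emulation in Python. It is only a sketch: the helper names are made up for illustration, and shlex tokenizes approximately (not exactly) the way Bash splits a history entry into words. Real Bash, for instance, keeps the quoting on the word it substitutes.

```python
import shlex


def last_argument(previous_command):
    """Return the final word of a command line, roughly what !$ refers to."""
    words = shlex.split(previous_command)
    return words[-1] if words else ""


def expand_bang_dollar(command, previous_command):
    """Substitute every !$ in `command` with the last word of the previous one."""
    return command.replace("!$", last_argument(previous_command))


expanded = expand_bang_dollar("git add !$", "vi data/science/")
print(expanded)  # git add data/science/
```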

Though it looks weird onscreen, !$ is actually kinda enjoyable to type as well; it just feels good to fork the '1' and '4' keys while holding shift with the right pinkie. It's a straightforward and memorable gesture that the strange '!$' combination belies.

TL;DR: 10/10, would use again.

A Solitaire Investigation

At a recent family gathering, my father showed me a solitaire game that his father used to play. Its rules are simple:

  • The goal is to discard all cards.
  • Play one card at a time from the top of the deck, forming a single row of cards. Always play the next card to the right of the one preceding it.
  • You can remove three cards at a time in any combination from the ends of the row (3 from one side or 2 from one and one from the other).
  • The removed cards must add up to either 10, 20, or 30, with face cards counting as 10 and aces as 1.
  • The final discard may be four cards, which are guaranteed to add up to a multiple of 10.
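As a sketch of the removal rule above, the following helper lists every legal way to take three cards off the ends of a row. The function name and the (from_left, from_right) representation are my own choices for illustration; the row is assumed to hold card values already converted to numbers (aces as 1, face cards as 10). It does not model the special four-card final discard.

```python
def legal_removals(row):
    """List each (from_left, from_right) split of three end cards summing to 10, 20, or 30."""
    if len(row) < 3:
        return []

    # With exactly three cards, every split takes the same cards.
    splits = [(0, 3)] if len(row) == 3 else [(0, 3), (1, 2), (2, 1), (3, 0)]

    options = []
    for from_left, from_right in splits:
        taken = row[:from_left] + row[len(row) - from_right:]
        if sum(taken) in (10, 20, 30):
            options.append((from_left, from_right))
    return options


options = legal_removals([9, 9, 2, 5, 5, 10])
print(options)  # [(0, 3), (3, 0)]

# The whole deck sums to a multiple of ten, which is why the last
# four cards must as well once every earlier removal has.
deck_total = sum((list(range(1, 11)) + [10] * 3) * 4)
print(deck_total)  # 340
```

The example row has two legal removals at once (the three rightmost cards or the three leftmost), which is exactly the situation where a choice of strategy could matter.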

I wondered whether or not there could be a strategy to this game, or if the outcome is strictly determined by the lay of the cards after dealing them. That is, are there some unwinnable shuffles? And can one's strategy affect the odds of winning given a certain shuffle?

Probability of Winning

A first question might be to ask what the probability of winning is given a simple greedy strategy of always removing cards when possible. Note that after each deal, there's the potential that multiple different removals might be possible.

def remove_cards(in_play):
    """Remove cards from an ordered list of denominations
       currently in play, and return the remaining cards
       after one removal.
    """
    remaining = in_play[:]

    # Cannot remove if length is less than three.
    if len(in_play) >= 3:

        # Three from the end.
        if sum(in_play[-3:]) % 10 == 0:
            remaining = in_play[:-3]

        # 2 from the end; 1 from the start.
        elif (in_play[0] + sum(in_play[-2:])) % 10 == 0:
            remaining = in_play[1:-2]

        # 1 from the end; 2 from the start.
        elif (sum(in_play[:2]) + in_play[-1]) % 10 == 0:
            remaining = in_play[2:-1]

        # All three from the start.
        elif sum(in_play[:3]) % 10 == 0:
            remaining = in_play[3:]

    return remaining

This will handle a single removal from the row of in-play cards, using a simple algorithm that arbitrarily favors removing the most recently played cards first. Now wrap this function in a game simulator that removes cards after each deal until the result does not change.

from random import shuffle

def simulate_game():
    """Simulate a new solitaire game."""
    # Four suits with numbers 1 through 10, plus face cards.
    cards = (list(range(1, 11)) + [10] * 3) * 4
    shuffle(cards)

    in_play = []
    for card in cards:
        # Add a new card.
        in_play.append(card)

        while True:
            # Remove cards until it is impossible.
            previous = in_play
            in_play = remove_cards(in_play)

            if previous == in_play:
                break

    return in_play

If any cards remain in play after this game, we have lost. This makes it straightforward to find the approximate success probability.

def find_success_prob(n=100):
    wins = 0.0
    for _ in range(n):
        remaining = simulate_game()

        # A victory if we have either one or four cards adding
        # to a multiple of ten.
        if len(remaining) <= 4 and sum(remaining) % 10 == 0:
            wins += 1

    return wins / n

A simple greedy strategy that prioritizes removing the most recently played cards yields somewhere around a 22% success rate (21.8605% with a million simulations). However, this does not answer the question of whether choosing a different strategy might make one more or less likely to succeed.

A Different Strategy?

In fact, there is some reason to believe that the opposite strategy might be more effective -- removing cards from the beginning of the row FIRST. After all, if a row starts with [9, 9, 10, ...], then these cards are in a sense "stranded" until either a 10-ace combination or a two turns up. This is a fixed, known requirement, whereas it seems intuitively that there might be more leeway at the end of the list for new possibilities to open up.

To test this, I reversed the prioritization order in the remove_cards function and repeated the one million simulations. To make a long story short, reversing the strategy and favoring removing things from the left side of the row yields a 21.8360% success probability calculated with a million simulations.
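For reference, a reversed-priority removal function might look something like the sketch below. The name remove_cards_left_first is hypothetical, and this is a reconstruction from the description rather than the exact code used for the simulations: it checks the same four options as before, only in the opposite order.

```python
def remove_cards_left_first(in_play):
    """Remove one group of three cards, preferring the start of the row."""
    remaining = in_play[:]

    # Cannot remove if length is less than three.
    if len(in_play) >= 3:

        # All three from the start.
        if sum(in_play[:3]) % 10 == 0:
            remaining = in_play[3:]

        # 2 from the start; 1 from the end.
        elif (sum(in_play[:2]) + in_play[-1]) % 10 == 0:
            remaining = in_play[2:-1]

        # 1 from the start; 2 from the end.
        elif (in_play[0] + sum(in_play[-2:])) % 10 == 0:
            remaining = in_play[1:-2]

        # Three from the end.
        elif sum(in_play[-3:]) % 10 == 0:
            remaining = in_play[:-3]

    return remaining


left_first = remove_cards_left_first([9, 9, 2, 5, 5, 10])
print(left_first)  # [5, 5, 10]
```

On this row the right-first version would have taken the 5, 5, 10 instead, leaving [9, 9, 2]; the two strategies only diverge when more than one removal is legal.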

Using R's prop.test function to perform a chi-squared test reveals the difference to be statistically nonsignificant, p = .676. And of course, just looking at the two values makes it clear that the difference would be practically insignificant in any case — both values are essentially 22%.
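The same comparison can be approximated without R. The sketch below assumes the two percentages correspond to 218,605 and 218,360 wins out of a million games each, and computes a two-proportion chi-squared statistic with a continuity correction of 0.5 (the default behavior I'd expect from prop.test). The function name is my own, and the p-value uses the fact that the upper tail of a chi-squared distribution with one degree of freedom can be written with the complementary error function.

```python
import math


def two_proportion_chi2(wins_a, wins_b, n_a, n_b):
    """Continuity-corrected chi-squared test for equality of two proportions."""
    pooled = (wins_a + wins_b) / float(n_a + n_b)

    statistic = 0.0
    for wins, n in ((wins_a, n_a), (wins_b, n_b)):
        cells = ((wins, n * pooled), (n - wins, n * (1 - pooled)))
        for observed, expected in cells:
            # Shrink each deviation by 0.5, but never below zero.
            deviation = max(abs(observed - expected) - 0.5, 0.0)
            statistic += deviation ** 2 / expected

    # Upper tail of a chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


statistic, p_value = two_proportion_chi2(218605, 218360, 1000000, 1000000)
print(round(p_value, 3))  # 0.676
```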

The Solitaire Gamble

At least in this version of solitaire, it appears to be a simple gamble whether or not a shuffle will reveal a working combination. Of course, with imperfect shuffling and repeated plays, it will become more and more likely that clusters of cards summing to 10 or multiples thereof will reappear, making the 22% chance of success pessimistic.

But regardless of whether or not strategy makes any difference, solitaire that depends solely on the shuffle apparently provides enough reward to sustain excitement and suspense. Of course, my father's father never had access to the Internet, either.

Due Respect

Cite your sources openly, for you stand on their shoulders.

So, of course, I reached out, over email. If anybody has been influential in one's life or one's work, even in the slightest way, it couldn't possibly hurt to let them know it. And in fact, it may be exactly the encouraging sign they need — this very moment — to continue.

I'd only contacted Bernie once before, for similar reasons, in 2011. His book, Statistical Theory, was influential for me for two reasons. First, he communicated clearly and lucidly, with an empathy for the reader. Second, and more importantly, he placed the idea in my head that a historical narrative is pedagogically more effective than a deductive one. In his words:

... I have been gradually coming to the conclusion that in mathematics generally, the best pedagogical order for the average student—and we have lots of them—is the historical one. We who have "been through it all" appreciate elegance and generality, but we tend to forget that we ourselves did not begin with the ultimate, unified, and general approach in our own learning process. (Lindgren, 1976)

I realized that I had made a similar statement, albeit with a different spin, in a recent post, when I tautologically said that our minds are usually quite typical. It occurred to me how much of an influence he has had on my attitudes toward learning and teaching.

Cite your sources openly, for you stand on their shoulders.

Hi Bernie,

I thought about you just the other day, when I wrote a blog post about approaching mathematics obliquely.

I paraphrased your introduction to Statistical Theory, where you acknowledged that most students are typical, or something very similar.

Anyway, I thought of you, so I wanted to write and let you know. Here's the post:

I trust you're doing well.

Very best,


I didn't imagine today that I'd be dealing with death so suddenly. Nobody ever does, and I can't help but imagine that Bernie wasn't thinking too hard about it either, based on the repose and authority with which he taught me — through authorship — how to approach the arts of statistics and education intuitively.

Reading remembrances, I've learned that Bernie was not only an educator, he was a musician, organist, and singer who held soirées that spread the musical love. And quite importantly, he was a figure skater, too.

So here's a belated farewell salute to Bernie Lindgren, from somebody who knew him only barely, but implicitly. There's no surprise that he was a wonderful man, for an educator's heart is built on benevolence.

Educational Philosophy, and Stuff

I'm dedicated to learning and helping learn more about music and one another. The more you know, the more you grow, because learning brings knowledge and knowledge is power.


Or perhaps try this: See one, do one, teach one. This has been a slogan among the medical profession for some time (maybe back to Halsted). It's shown up in other disciplines as well. Medicine and music have a lot in common: lots of humanity, lots of technical skill, lots of intuition, and lots of practice. See one, do one, teach one as an apprenticeship slogan applies very, very well to musical skill.

It's oh-so-apocryphally quipped by American pragmatist John Dewey:

"One never truly learns something until he has confronted the necessary thinking required to teach that something to another learner."

Regardless of where the misquote arose, technology has brought us far since then, and the Internet is the most profound information-sharing medium yet. It's the perfect vehicle for see-do-teaching, but unfortunately the first gets all the mic time.

Let's embrace the exploratory spirit of this three-pronged skill-based philosophy wholeheartedly, and promote all three opportunities. And the more we can shape the Internet in its spirit, the better!

A Torch-Installing Ansible Playbook

It's surprisingly difficult to wrap Ansible provisioning around Andrej Karpathy's very instructive and useful char-rnn package. Ansible has its quirks, which don't make everything straightforward. What's more, Torch has made a point of making it easy for researchers to get started with Lua (an amazing scriptable wrapper of C), and their installation scripts consequently assume an interactive environment rather than a production-ready system-wide one. Though there are Docker solutions available, I wanted to see how things would play in the Vagrant/Ansible universe.

Happily, it's not too difficult to guess-test-revise your way into a fragile but workable Ansible playbook. So with a quick cat !$ | pbcopy, here's what I came up with:

# Provision a box to do the computation.
# This is meant to be useful for local Vagrant virtual machines as well as for
# spot instances in the AWS cloud, for instance.

- name: Provision the box
  hosts: all
  become: yes
  tasks:
    - name: Upgrade aptitude packages
      apt:
        upgrade: "full"
        update_cache: yes

    - name: Install aptitude packages
      apt:
        name: "{{ item }}"
        state: present
        update_cache: yes
        cache_valid_time: 3600
      with_items:
        - git # Needed to download Torch in the first place.
        - libssl-dev # For luacrypto dependencies.

# Torch automates installation in a straightforward way; might as well use it.
# This is straight from
- name: Install torch within the user account
  hosts: all
  tasks:
    - name: Grab Torch Repository
      git:
        repo: ""
        dest: "~/torch"
        recursive: yes
        force: yes
        accept_hostkey: yes

    - name: Install Torch Dependencies
      shell: "cd ~/torch; bash install-deps"

    - name: Install Torch with default options
      shell: "cd ~/torch; yes | ./"

# Install the LuaRocks dependencies for char-rnn. This command's path is
# a bit fragile, depending on changes of torch's installer.
- name: Install project dependencies
  hosts: all
  tasks:
    - name: Install luarocks dependencies
      shell: "torch/install/bin/luarocks install {{ item }}"
      with_items:
        # Listed in README document.
        - nngraph
        - optim
        - nn

While I'm working toward a series of musical experiments using this as a launching-point, hopefully others will find this to be helpful.

Spatial Intuition in Mathematics

Approaching the world of statistics from the perspective of a musical empiricist, I was actually a bit surprised to learn that the core curriculum of OSU's Statistics PhD program was so mathematical. After all, it would seem that the study of uncertainty itself -- and statistical methods for quantifying it -- ought to be motivated by either philosophical or practical concerns. It therefore seemed a non sequitur that Statistical Theory I (620) would begin with mathematical first principles and proceed deductively.

I was appropriately humbled, then, when I became aware of how imprecise natural language truly is compared to mathematical language. Statistical theory can't be defined with sufficient precision using the language of a philosopher or practician (hat tips on this topic to both Bertrand Russell and Richard Feynman). Regardless, as a non-mathematician, I would have found more suggestions for how to think like one very, very useful.

In honesty, I still can't claim to have the same mature intuition for mathematical language and relationships that I have for musical ones, notwithstanding the considerable overlap (and arguable essential equivalence) of the fields. Nonetheless, I have begun to enjoy a certain amused elan when I stumble upon unexpected fluency with certain mathematical ideas, despite their basicness.

It seems to me that the most universally applied aspects of statistics -- namely mathematical models to help estimate unobservables -- cannot be intuitively understood without geometric imagination. They probably can't be effectively taught unless distance is introduced as a core concept. The typical human mind (and most of ours are typical) has capabilities reflecting adaptive solutions to evolutionary imperatives, and, at least for me, I can think of no better crutch than visual and spatial reasoning with which to approach statistical modeling. This idea is obviously not a new one.

What I've only now begun to perceive, however, is that mathematicians must be thinking and reasoning spatially, even when they're actually representing this reasoning by manipulating equations, functions, and other symbols of notation. Even if the last several centuries of mathematical work have been dedicated to formalizing ideas into algebra for more precise (or even automated) manipulation, I'd be willing to bet that geometrical thinking still drives our intuition.

Perhaps my more mathematically-minded friends would be able to confirm or deny!

How to get Started with R and Statistics

I'm often asked for advice about how to get involved with statistics and the R statistical programming language. Without a doubt, R is the premier open source tool for data analysis, owing to its enormous library of visualization and model-fitting packages, its relatively straightforward data munging abilities, and its in-depth online documentation. What's more, R is largely built by a community of educators, and quite a large number of datasets are included in the standard library. For all its quirks, it remains a tremendous pedagogical and practical tool.

But without a solid grasp of the underlying statistical theory, it can be difficult to suss out just how to interpret the outputs of, say, glm(). Both statistical knowledge and R knowledge provide a basic foundation to work with. If I were designing a basic curriculum, it would include four texts -- one each on practical statistics, R, statistical theory, and statistical learning.

Practical Statistics

For a first introduction to practical statistics, the OpenIntro Statistics, Second Edition really cannot be beat, owing both to its intuitive presentation of almost every foundational concept in statistics, and to its price -- it's a freely available pdf download written by David Diez, Christopher Barr, and Mine Çetinkaya-Rundel. More than merely a book on statistics, the OpenIntro Statistics text is a primer on experimental design and the scientific method. The book uses practical motivation (and common views that R would produce) to develop ideas from the basics of probability distributions through point estimation and hypothesis testing, arriving finally at both normal-theory and logistic linear models. If I were to teach a course on statistics for beginners again, this would be my chosen text.

Among the strongest features of the book are its clear diagrams, most of which seem to be set using R, and the authors' focus on visualizing data in terms of distributions. Whereas many texts will display distributions and histograms sparingly, the OpenIntro makes them a centerpiece of theoretical explanations. After all, they form the foundation of almost all practical data explorations as well. For example, here's an early caveat to understanding distributions only in terms of mean and variance:

Also available are the datasets required for the exercises, indexed by a chapter guide in a bulk dataset download. However, the book itself does not offer a solid introduction to R.


There are several different introductions to R available for people of different skill levels. Most folks who ask me for a way into R are software engineers who already have some understanding of other languages, and are interested in picking up some basic ideas about data manipulation at the same time. So far, my favorite text to recommend is Peter Dalgaard's Introductory Statistics with R, Second Edition. Though it's costly, and DRM'ed by Springer (ick), I'm yet to find a more comprehensive and useful overview of R as a programming language and practical tool. For example:

All of the major data structures are clearly explicated, as are the core plotting, modeling, and hypothesis testing functions. However, explanations of the underlying theory behind these topics are not comprehensive -- the book is best used after statistical theory has been learned.

Statistical Theory

For the more mathematically oriented and those interested in a calculus-driven understanding of statistical theory, I really cannot recommend Bernard Lindgren's 1976 Statistical Theory (Third Edition) enough. It's practically free these days, with used copies typically selling for under $5.00. The text is lucidly written and covers sets, combinatorics, probability, distribution functions, point estimation, and more -- all with refreshing clarity. The presentation is logical and mathematical, and really contributes toward developing geometric intuition about statistical constructs:

This calculus-based thinking is incredibly helpful to understand the basic idea of what a probability distribution really is, which distributions are helpful for which purposes, and how to construct completely novel models. For those interested in building real-time processing algorithms that leverage mathematical facts based on approximations -- rather than relying on brute force in offline precomputation -- this introduction to the toolkit is invaluable.

Statistical Learning

Outside of simple linear models, there exist entire worlds of classification and prediction problems that suggest nonlinear models or sophisticated projections, as well as problems that occupy high-dimensional spaces. For more experienced folks, Trevor Hastie, Robert Tibshirani, and Jerome Friedman have provided an instant classic in The Elements of Statistical Learning, available for free in PDF form. The cohesiveness of the book's chapter organization reflects a strong theoretical framework that unites nearly all concepts in the state of the art of statistical learning.

However, it is the same authors' introductory text -- An Introduction to Statistical Learning with Applications in R written with Gareth James and Daniela Witten -- that will really become the best friend of a newcomer to statistical learning techniques. The theoretic and the practical are well-married in this text, with inline code examples accompanying beautiful diagrams:

While linear algebra is part of the offering, a solid foundation in statistical theory is presumed. After the basics of statistics and R are already learned, this book will take the reader to the next level of model-fitting, and I recommend it highly.
