The webcomics blog about webcomics

I Get To Use The Word Kurtosis Again? Happy Day!

There are days I see something out on the web and the only rational response is, “Yep, that’s my lead story.” In this case, it’s information that actually makes good on something I halfway-attempted pert-near four years back.

A little history: In response to an article at Comixpedia, I suggested we try to figure out if there was a magic “break even” number on unique readers that would render a webcomic economically sustaining for the creator. Because I’m a bit of a math nerd, I put out an open call for confidential data, with the caveat that I would only do the numbers if there were at least 100 respondents (and even that wouldn’t get us very close to statistical significance). At the time, 48 creators were willing to share data (including, it must be said in retrospect, a pretty goodly proportion of those that do make their living from webcomics), but as that fell way short of my threshold, no math.

Enter George Rohac, general fixer for Oni Press, publisher of anthologies, possessor of the world’s most nervousness-inducing grin, perpetual con-scene fixture, and Master’s degree holder. It’s that last one that’s important today, as Rohac has released both his thesis, Copyright and the Economy of Webcomics [PDF], and, more importantly, his data set [Microsoft Excel].

There are nearly 300 survey responses covering unique visitors, comic creation time, business management time, comic longevity, prior projects, copyright/copyright equivalent asserted, merch offered, income derived, and self-assessment of whether or not that income provides a living wage, and it’s all Creative Commonsed, so you can squash numbers to your heart’s content.

Most interesting numbers to me: more than 80% of Rohac’s respondents reported making less than US$8000 per year on the comic, but approximately 7% reported more than US$45,000, and more than half of that number reported more than US$65,000. On the “do you earn a living wage” question (and this one is highly subjective), a few respondents down as far as the US$8000 – 14,999 range answered “yes” (on the other hand, a few respondents in the US$65,000+ range answered “no”, so take that as an example of differing costs of living).

Also, the clearest correlation that I noted on casual inspection? Higher incomes pretty much go hand-in-hand with higher numbers of weekly unique readers. Yeah, I know — no surprise there, but even the most obvious intuitive assumptions work better with numbers backing ’em up.

Now that we have a first reasonably complete sample of hard numbers (although there are a number of missing responses across the surveys, by accident or deliberate omission, and of course more responses would make any conclusions drawn more valid), it’s time to move onto the “lies” and “damned lies” part of the game. Feel free to draw your own conclusions and remember — statistics has its own set of rules, and if you’re going to argue that “the numbers say x”, you have to follow them.

It’s Scientific!

  • Randall Munroe is unleashing upon the world his latest Bigger and Better Webcomics Thing; in the past these have been webcomics that were physically huge, or of extreme duration, or sometimes both. Sometimes they were just deep holes where it’s not possible to stop digging.

    Today’s strip is pretty modest, though. At least until he releases the data set:

    The xkcd survey
    This is an anonymous survey. After it’s done, a database of everyone’s responses will be posted.

    There’s no specific reason for any of the questions. The goal is to create an interesting and unusual data set for people to play with. (This is obviously not going to be a real random sample of people, but in the interest of getting cooler data, if you’re sharing this with friends, try sending it to some people who wouldn’t normally see this kind of thing!)

    WARNING: This survey is anonymous, but your answers WILL BE MADE PUBLIC. Depending what you write, it’s possible that someone may be able to identify you by looking at your responses. None of these questions should ask about anything too private, but don’t write anything that you don’t want people to see. If you’re not comfortable answering a question, just skip it.

    I’m taking bets on what the over/under on the number of responses will be … given Munroe’s audience size (couple million), audience engagement levels (high), and the likelihood of his audience to promote the survey on his behalf (like hack webcomics pseudojournalists), I’ma start at 2.73 million responses. Which means for once in my time of doing mathematical calculations on this site, I don’t have to bitch about the sample size being too small¹; it may even be large enough to engage in higher moments of analysis like skew and kurtosis, hooray!

  • Speaking of webcomics and statistics, a comic to teach the idea of data analysis (the result of a grant received by Dante Shepherd to use comics to teach STEM concepts) is up today at Surviving the World. Here’s hoping for more of the science comics to get shared, and for more on data crunching specifically. My favorite part is how I’m pretending that that narrator character is Shepherd as a Muppet. Now when I see him next month at TopatoCon, I’m going to insist that he flail his arms around like Mister The Frog.
  • News from the Erf front today: Erfworld creator Rob Balder announced that artist David Hahn will be leaving after Friday’s update. Balder’s getting to be like Frank Zappa, a relentless creator trying to find collaborators that can execute the thoughts coming out of his brainmeats for the world to experience the way he intended them to. It’s a tough gig, given that he’s got one of the most relentlessly pedantic audiences around:

    Consider the page a little while ago where David missed the fill on Ansom’s decrypted dwagon, and nobody else on the team (there are four people who look at the art) caught the error. Instead of a red eyeball, the page posted with the dwagon having a white eyeball. This led to a discussion in Reactions about whether that was an art mistake or an important clue about the dwagon’s Signamancy.

    Not to mention lacking in certain senses of boundaries:

    I must admit I have greater frustration with your closemouthed management style than I do with the loss of an artist. You have a tendency to keep problems close to the chest and decline to tell your (by all accounts of this thread) very loyal fanbase any negative information until it has escalated to a point where a crisis is happening and you have literally no choice but to divulge information. And even when you do this, it is in the most circumspect fashion, using vague apparently details intended to conceal the breadth of the problems going on, perhaps from some heightened sense of privacy conservation?

    . . .

    These ‘creative differences’ between you and David have clearly been growing over time, to the point where something happened on Monday that was ‘the last straw’. At yet, it’s only now, once you have officially ended your creative relationship, that you inform us as to what is going on. And your plan before this mystery event was evidently to spring a new artist on us after you had found one, and David has moved on.

    . . .

    All of this gives a certain vibe that you mistrust your fanbase. When problems arise, you don’t let them know about it, until you no longer have the option to keep it concealed. Whether this is because you worry that they might leave reflexively if another problem starts showing up, or because you feel that the affairs of your creative activities and interactions with your artist are not our business, you need to open up a bit more if you want this project to succeed. Because it now is, quite literally, our collective business now. ($882 [community support donations] per update?) [emphasis added]

    Apropos of quite a lot, I met Neil Gaiman once. Had dinner with him (in the sense that we sat next to each other at an event, and he was charming). I read everything of his I can lay my hands on and pay good money to do so. And you know what? Neil Gaiman and I are not friends. I am not entitled to any more of him than he is willing to give. If I disapprove of his work or his business affairs or his personal life, my entire remedy — provided I don’t want to be a sociopath about it — is to choose to not read his stuff any longer. That’s it. He is, to paraphrase the man himself, not my bitch.

    And because Rob Balder was too polite to say it to the person quoted above, allow me: Entitled Commenter At Erfworld, Rob Balder is not your bitch. Your reading of Erfworld, even your financial support (if in fact you do support it) does not entitle you to the details of Balder’s business relationships, much less obligate him to violate the privacy of others. Get over yourself.


Spam of the day:

Gtyrrell OrdernMedicaments

What a coincidence! I’m in the market for medicaments!

_______________
¹ Although the population of said sample will probably skew heavily towards representative of the sort of people that read xkcd.

I Was GONNA Write A Full Post Today

For why not, see here and here. While Phillip has done his usual magic and got things back, I now lack the time to do more than point you at Intervention (launching today) and the Webcomics Longevity/Frequency chart.

My only comment here is that it seems to take the “frequency” part a bit loosely, as lengthy hiatuses and interruptions seem not to have dislodged comics like Achewood and Megatokyo from their original frequencies (approximately five and three days a week, respectively; neither updates that often now), nor has it credited those comics that have upped their frequencies (Girls With Slingshots, say).

It does, however, show how useless averages are, as occasional behemoths (50+ panels in a Diesel Sweeties, say, or a thousands-of-panels Dr McNinja page) really have very little effect on the overall averages. I will pay a dollar to anybody that adds in standard deviation and variance to this chart, and another dollar for skew and kurtosis¹.
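For anybody angling for those two dollars, here is a minimal sketch of the four numbers in question. The panel counts below are entirely made up for illustration (they are not taken from the chart), and the skew and kurtosis use the plain population-moment definitions; a stats package may use a slightly different bias-corrected form.

```python
import statistics

# HYPOTHETICAL panel counts for 100 strips: mostly 3 to 6 panels,
# plus a single 50-panel behemoth. Not real data from the chart.
panel_counts = [3] * 30 + [4] * 60 + [6] * 9 + [50]


def central_moment(data, k):
    """k-th central moment, population form (divide by n)."""
    mu = statistics.mean(data)
    return sum((x - mu) ** k for x in data) / len(data)


def skewness(data):
    """Third standardized moment: positive means a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Fourth standardized moment minus 3 (a normal curve scores 0)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


mean_panels = statistics.mean(panel_counts)
median_panels = statistics.median(panel_counts)
variance_panels = statistics.pvariance(panel_counts)
stdev_panels = statistics.pstdev(panel_counts)
skew_panels = skewness(panel_counts)
kurt_panels = excess_kurtosis(panel_counts)

print(f"mean     = {mean_panels:.2f}")
print(f"median   = {median_panels}")
print(f"variance = {variance_panels:.2f}")
print(f"std dev  = {stdev_panels:.2f}")
print(f"skew     = {skew_panels:.2f}")
print(f"kurtosis = {kurt_panels:.2f} (excess)")
```

With a sample shaped like this one, the lone behemoth barely nudges the mean but shows up loudly in the skew and kurtosis, which is the argument for reporting more than the average.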

PS: Phillip, thanks; the back end works now, but it’s still wonky about displaying the editing page and Dashboard. Welp, gotta run!

_______________
¹ Any day I get to use the word kurtosis is a good day.

Years Later, My Prob/Stats Professor Continues To Haunt Me

Okay, so Ben Gordon has written a critique of the Halfpixel Business Model (as described in How To Make Webcomics) and come to the conclusion it doesn’t work. I wish I had time to dig into this the way it deserves, but there’s no way I’m going to be able to in the near future.

So let’s be clear that this is not a formal analysis of Gordon’s entire thesis, but specifically a response based on his numbers. I’m going to talk about this using casual terminology so as to make my thoughts as accessible as possible to everybody that doesn’t know (and, rightfully, doesn’t care) about the difference between skew and kurtosis. Onwards.

Gordon looked at a sample of webcomics, and sought to estimate how much money could be made from his reading of HTMW‘s “10% Rule” (5 – 10% of your readership will open their wallets and buy things). His calculations led him to conclude that the rule is fundamentally flawed, but pointed out:

I hope someone will find fault with my analysis, because if it is sound, it is a setback for webcomics.

I’m not sure if his conclusion can be proved or disproved (we are, after all, talking about applying mathematical rules to a creative endeavour), but if his conclusion’s based solely on the numbers, I think that I’ve found the fault he was looking for, from a purely statistical standpoint. Consider the following statements from his posting:

  • [the business model] cannot be verified by the majority of case studies
  • I’ve chosen comics in a range of sizes from a list in Wikipedia which reports comics that support their creator(s). … I removed the ones that don’t belong and analyzed the rest.
  • The formula for estimating each comic’s profit is: … We assume the average profit per sale is $5 — typical for a t-shirt
  • [five calculations of estimated webcomic profits ranging from $975 to $24,000]

First off, we need to agree on some terminology — Gordon doesn’t have “a majority of case studies”, he’s got one study with five data points. Semantics? Nope — because the number of data points is a critical element of how much we can draw reliable conclusions from the numbers. We’ll come back to that in a moment.

Secondly, Gordon’s eliminated data that “don’t belong” (for example, Achewood was eliminated because Time magazine declared it the best graphic novel of 2007 — which may have artificially inflated its numbers, I guess), meaning that we’re not looking at a random sample. We’ll come back to that, too.

Thirdly, the assumption of profit per sale is entirely arbitrary — $5, which is described as the average profit on a t-shirt (I don’t sell shirts so I can’t say, but having ordered custom shirts from the same guy many webcomickers use, I think it’s probably a bit low). But the profit per shirt doesn’t matter anyway, because it assumes that any item the creator makes will produce the same profit. Unfortunately, this doesn’t hold up.

Case in point: I have purchased a number of originals from a number of webcomickers (some of whom describe themselves as entirely self-employed by their strips and others that do not); prices have ranged from $20 to $175. Profit on even the lowest priced of them is several times Gordon’s assumption, and on the high end it utterly destroys his model. Okay, many webcomickers sell shirts, and okay, the profit on a shirt probably occupies a fairly narrow range of values, but what do we do with all the other items? You’ve got books, prints, hoodies, skateboard decks, hot sauce, and an upsell (of $5 to $10, generally) to get the item signed/sketched. That’s an incredible variation.

That price range actually points to the real problem in Gordon’s analysis: the distribution curve of those “price per original” data would form a flat line. It’s not a set of consensus values with outliers, because there are too few points, and that does not allow for meaningful statistical analyses. The same situation exists with the estimated profit figures he gives: 975, 2012, 8000, 17270, 24000 … that’s only five data points. The confidence that we can derive from any analysis over such a wide range, with a distribution curve that looks like a flat line, is vanishingly small.
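To put a rough number on how spread out those five figures are, here is a small sketch using nothing but the values quoted above. The variable names are mine, and the sample standard deviation is just one reasonable way to measure the spread.

```python
import statistics

# The five estimated profit figures quoted from Gordon's post.
estimated_profits = [975, 2012, 8000, 17270, 24000]

mean_profit = statistics.mean(estimated_profits)
# Sample standard deviation (divides by n - 1), since these five
# comics stand in for a larger population.
sample_sd = statistics.stdev(estimated_profits)
# How many times larger the biggest estimate is than the smallest.
spread_ratio = max(estimated_profits) / min(estimated_profits)

print(f"mean            = {mean_profit:,.1f}")
print(f"sample std dev  = {sample_sd:,.1f}")
print(f"largest/smallest = {spread_ratio:.1f}x")
```

The standard deviation comes out nearly as large as the mean itself, and the largest estimate is more than twenty times the smallest, so there is no cluster of consensus values here to speak of.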

Statistical analysis only works if any random datum that you select to calculate can be assumed to represent many, many, similar (to the point of being essentially identical) other data that you don’t bother to include in the analysis. The key thought here is Margin of Error. You know MoE — it’s what tells you that a political race between, say, the Harbinger of the New Golden Age and the Evil Throwback to All That’s Unholy is presently split 52% to 48%, plus or minus 4.3% (and since the MoE is greater than the difference between HNGA and ETATU, we essentially don’t know who’s ahead).

Also bear in mind that the MoE is probably only to the standard level of “95% confidence”, which means that there’s a 5% chance that the real numbers are off by even more than 4.3%. I’m going to run one simple equation to drive this home. It’s a rule of thumb that if you want to calculate the margin of error to a 95% confidence level you can do so approximately with:

0.98/√n

where n is the number of samples. In this case, n equals 5, which gives us

0.98/2.236 = 0.438 = plus or minus 43.8%

So at a 95% confidence level, any number we come up with from the five data points we have could conceivably be off by as much as 43.8% from the true value. That’s not a number that we can be very confident in. Add to that the fact that statistics in general is predicated on random samples (but Gordon selected his population), and we have numbers that can’t be relied upon to any degree, even if we take the problematic $5 assumption off the table.
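For anyone who wants to play with the rule of thumb, here is a short sketch. The function names are made up for this example. The 0.98 is the usual 95% multiplier of 1.96 times 0.5, the worst-case spread for a yes/no proportion, so strictly speaking it is a polling formula being borrowed for illustration rather than a rigorous treatment of dollar figures.

```python
import math


def margin_of_error(n):
    """Approximate 95% margin of error for n samples: 0.98 / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample count")
    return 0.98 / math.sqrt(n)


def sample_size_for(target_moe):
    """Smallest n whose rule-of-thumb margin of error is at most target_moe."""
    if target_moe <= 0:
        raise ValueError("target_moe must be positive")
    return math.ceil((0.98 / target_moe) ** 2)


# Five data points, as in Gordon's sample, versus larger samples.
for n in (5, 100, 500):
    print(f"n = {n:>3}: plus or minus {margin_of_error(n):.1%}")

# Samples needed to match the 4.3% margin from the polling example.
print(f"n needed for 4.3%: {sample_size_for(0.043)}")
```

Because the margin shrinks with the square root of the sample size, getting from five data points down to a poll-style margin of a few percent takes hundreds of samples, not dozens.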

Heck, even recalculating for every self-reported self-supporting webcomicker isn’t going to help, because the number is still too low to provide statistical significance (honestly, we’d want a population of several thousand and a sample of at least 500 to have much confidence in the numbers). It’s still an anomaly to make a living this way, and there simply are not enough data to allow for any analysis beyond the anecdotal, which is precisely what HTMW affords. This is not to say that Gordon’s question shouldn’t be asked or that his conclusions are wrong, but it is pointless to try to draw any statistical meaning from these numbers.

Speaking of “pointless”, I strongly urge that you avoid the related thread at The Daily Cartoonist, as it quickly devolved (despite Alan Gardner’s specific request to stay on the damn topic) into truly astonishing levels of dickery re: webcomickers do not have careers/incomes/lives/redeeming qualities.

It never ceases to astonish me that individuals that I have met — and who are perfectly polite and rational in person — turn into such raging exemplars of John Gabriel’s best known theorem (minus the anonymity … weird) when discussing this particular topic. I stopped reading in disgust after about 20 comments and won’t go back there. Proceed at your own risk.

The discussion at the original post is, by contrast, civil, productive, and based on logic. Gordon has been polite in responding to questions and everybody is doing their best to treat the question as an intellectual exercise designed to figure out the truth. Bravo.