a glob of nerdishness

October 8, 2007

The Purpose Driven CAPTCHA

written by natevw @ 8:22 pm

So I was reading the BBC the other day and spotted a confusing title about book preserve spam weapons or something. Turns out there’s a respectable CAPTCHA system making the rounds, and it’s even cool enough for twitter. (You can see if I’m cool enough for twitter via this very link.)

June 19, 2007

Reality Gameshow

written by natevw @ 12:08 pm

“When the people gathered together on one of the great trial days, they never knew whether they were to witness a bloody slaughter or a hilarious wedding. This element of uncertainty lent an interest to the occasion which it could not otherwise have attained.”

Sounds like television ratings explained, but “The Lady, or the Tiger?” predates the first televised game show by almost sixty years. I wonder if there was advertising around the gladiators’ Colosseum in ages past?

May 8, 2007

A better plan for spam? [academic version]

written by natevw @ 11:36 am

I was going to also publish an even lengthier version of the “A better plan for spam?” article I wrote up. It tried harder to defend all its points, and to make sense to a wider audience.

However, after reading the nerd version, my wife said it made a whole lot more sense than did the windier “academic version”. I guess I’d make a better Dijkstra than Dooyeweerd. Fair enough! Since publishing the ‘nerd’ version, I’ve gotten some insightful comments — thanks and keep them coming as long as you’d like! Meanwhile, this is not the blog of schoolishness and the end of this post is right here ->.

April 28, 2007

A better plan for spam? [nerd version]

written by natevw @ 12:45 pm

I think that small tariffs on “published” messages could stop spam, and also reduce the need for advertising online. Users don’t pay to receive e-mail, but they do pay a little each time they send. Users don’t pay to read web pages, but they do pay a little to comment on them. This wouldn’t change the model of the Internet or Web greatly, but it could significantly change the result.

The need for a better solution

Stopping spam is a difficult problem. It could be tackled at the source and/or the destination. Politicians could criminalize spam, or at least tax it — ideally, stop spam at the source. Why anyone would want more regulation and more taxes is unfathomable to most computer programmers. We prefer to invent clever algorithms instead — ideally, stop spam at its destination.

There’s a problem with our approach. Spam isn’t caused by buggy machines, it’s caused by greedy people. If humans have trouble deciding whether an e-mail is in their best interest, how much more a disinterested computer! Even if you believe that greedy people are actually just buggy machines, are you willing to bet the spam battle on your ability to create a superior being? If you have an algorithm that takes text as input and gives a motive as output, there are 33 sets of grieving Hokies still groping for answers. (I’m not sure whether Cringely’s recent “search engine for hate” post is meant as some kind of sick joke to that effect, or what.) There’s no algorithm that can stop sin, except for a sacrificial love which overcame death.

The solution applied to e-mail

Don’t keep up the AI arms race, spammers are clever too. Don’t let the government start skimming more profits, spammers are expert evaders. Instead, let the ISPs charge to accept email on your behalf. This might require modifications to the inter-net email relaying protocols, but with things like DomainKeys and server-based spamboxes we’re doing that anyway.

The solution on the Web

There’s more to gain on today’s Web. Imagine leveraging an OpenID-like (or even OpenID-based) system in a micropayments context. Please don’t check out just yet if you think you’ve heard this all before.

Five years and who knows how many market cycles later, I still agree with most of Jakob Nielsen’s User Payments synopsis. Basically, paid content stinks, but ads and spam stink worse. “Information wants to be free” may remind me of “Arbeit macht frei”, but I still don’t want to pay to read your blog. There’s a lot of pressure keeping the Web from going back to a royalty model, perhaps rightly so. But we can flip micropayments around.

A practical scenario

The Web is currently infatuated with user-generated content, so there are plenty of other extrapolations, but weblogs are an obvious use. A successful blog currently tries to exploit its fixed content for money and protect its user feedback from abuse. This is naïve and ugly: income via ads, protection via captchas/logins/AI plugins. Instead, why not install a micropayments plugin? Basically:

  1. The webmaster chooses a username/signature service from any number of providers
  2. These service providers act as brokers, deducting/depositing money into the user’s account
  3. The user’s info/balance is not stored with the service providers, but in an account with a larger entity

Basically, it’s PayPal for Web 2.0 — less centralized, more user friendly. The user doesn’t want to keep his credit card on file with every blog he uses, but neither can one company be expected to provide plugins for every blog/forum/link-swamp-service out there. Hence the middle-man service providers. They provide an API specific to a problem-domain, and use a standard protocol to link up with any main account provider the user chooses.
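For the programmers in the audience, here is a minimal sketch of how the three roles above might fit together. Everything in it is hypothetical: the class names, the 2-cent tariff, and the single in-memory account provider are assumptions made for illustration, not a description of any existing service or protocol. A real system would let each user pick their own account provider and would need actual authentication.

```python
from dataclasses import dataclass, field


class InsufficientFunds(Exception):
    """Raised when a user's main account cannot cover a tariff."""


@dataclass
class AccountProvider:
    """Hypothetical 'larger entity' that holds user balances, in cents."""
    balances: dict = field(default_factory=dict)

    def transfer(self, payer: str, payee: str, cents: int) -> None:
        # Refuse the transfer outright if the payer cannot afford it.
        if self.balances.get(payer, 0) < cents:
            raise InsufficientFunds(payer)
        self.balances[payer] -= cents
        self.balances[payee] = self.balances.get(payee, 0) + cents


@dataclass
class CommentBroker:
    """Hypothetical middle-man service; it stores no balances itself."""
    provider: AccountProvider
    tariff_cents: int = 2  # assumed price per comment

    def post_comment(self, commenter: str, blog_owner: str, text: str) -> bool:
        # Charge the commenter and credit the blog owner, or reject the comment.
        try:
            self.provider.transfer(commenter, blog_owner, self.tariff_cents)
        except InsufficientFunds:
            return False
        return True


provider = AccountProvider({"reader": 5, "spammer": 1, "blogger": 0})
broker = CommentBroker(provider)
results = [
    broker.post_comment("reader", "blogger", "Nice post!"),
    broker.post_comment("spammer", "blogger", "Buy pills"),
]
print(results, provider.balances)
```

The point of the split is that the blog plugin only ever talks to the broker, and the broker only ever asks the account provider to move money. The blog never sees a credit card.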

Result: Blogs with active discussion have a reasonable, and practical, alternative to advertisements. Users think just a little bit before they respond. Most importantly, spammers get sick of making blog owners rich.

Potential drawbacks

Users could refuse to accept such a plan, preferring to leave the financial and spam-filtering burden completely on the content providers’ tab. But trading chaotic captchas and commercials for a painless just-charge-it is not really a bad deal. I’m paying about a tenner a month for the privilege of writing on my own site, and I’d gladly support your site with my literal 2¢. This does mean speech for the end-user would no longer be free-as-in-beer, but if adequate attention were paid to privacy concerns there would be no harm done to free speech. If a user doesn’t have a credit card, they can sign up with an account provider that foots the tab in exchange for completing Mechanical Turk-style tasks.

Spammers may not give up so easily, though. If they can’t afford to pay microtariffs on everybody’s blog, the most likely alternative is to instead target only relevant blogs. How this differs from AdSense, I don’t know. The only serious concern left then is security. Spammers must not be able to phish or botnet their way into anyone else’s main account. But this is a concern for countless other scenarios as well, and is by no means an excuse not to try. And if time is money, the time we save by tackling the problem of spam as capitalists rather than technologists will more than make up for any spurious charges encountered as the system is gaining experience.

The plea

Spam is a people problem, as is making a microtariff system acceptable. I am a programmer. If spam was a big interest of mine, I might be immersing myself in Bayesian Filtering and Neural Networks in a technologic attempt to save the world. But this series has been immersing enough, and I have no interest in becoming the PayPal of the New and Amazing Web. Currently this site is low bandwidth and SpamKarma hasn’t swatted too many of my dad’s comments. However, when this site’s readership and “spammer”ship grow beyond what my budget can bear, I would much rather join a privatized microtariff economy than Google’s ginormous AdSense scheme. If you know someone who’s looking for a get-rich-and-save-the-world,like,yesterday scheme, I’d love to hear what they think of this plan. The comments are open and, for as long as practical, still free!

April 21, 2007

Refinance our survey for free smilies

written by natevw @ 6:24 pm

It is unfair to discuss the problem of spam without bringing up its archetype. Spam is the barbarian offspring of advertising, itself an uncouth tradition. Marketing, and often with it some advertising, is important to good business. That said, it’s disgusting how prevalent and tasteless advertising has become.

The relevance of marketing

The best advertisements still serve their victim: Hey, maybe you haven’t heard of us, but we do X which might help you Y. But even the “spend time with your kids, don’t start forest fires” advertisements are still distractions from the main course.(1) As far as living goes, is spam really much less relevant to me than the hundreds of other ads I see online every day?

In a sense, web advertising is highly relevant to my online life. Without it, countless online destinations would only be hurt by my tourism (the previous post explains). Google gets about 90% of its incredible income (note those figures are in thousands of dollars!) from advertising. It’s hard to imagine the “google.com” domain name becoming a home for squatters. Given the fortune Google has amassed, I don’t doubt they could keep paying their top-notch engineers long past finding alternate revenue sources, but many other domains would be in dire straits if it weren’t for advertisements.

The need for advertising

Web sites do need to pay their expenses. There’s not really a good reason for CLICK THE SCROLLING FLAMING MONKEY!!!, or the weird dancing business guy(2). In case you want your memory refreshed, Jakob Nielsen has listed some other sketchy advertising techniques that are still pretty common nowadays. As he points out, nothing good comes out of making users annoyed. There is, however, certainly a place for paying the bills through corporate sponsorships, intelligently targeted graphical links, and even computer-placed text ads with all their mechanised “Sense”itivity. I just don’t see an enduring economy for so many sites getting into the ad-supported model, to say nothing of the aesthetics in that situation.

Getting away from advertisements

To be honest, it was primarily my aggravation with advertising that led me to start this series. I think the Web, perhaps even the whole Internet, is ripe for an idea that can reduce the motivation for all forms of advertising: everything from the spam kings’ unfeeling e-mail blitzes to Google’s sentient AdWords revenue scheme. In my next post, I hope to finally elucidate such an idea. We’ll be back…


  1. Although Wikipedia’s eye-opening article on television ads notes that commercials have all but become an essential part of the television experience. I can hardly stand television, but I understand a great many people do. Hopefully this diatribe doesn’t come across as overly cantankerous to my more experienced readers.
  2. Do you get those? I keep seeing job site ads with this white-collar worker who was photographed rolling up his sleeve and in other quasi-professional poses, yet is animated to slowly move his legs as if he were proud that they’re put out of joint.

April 17, 2007

Who pays?

written by natevw @ 7:41 pm

You are paying to read this post. In some way, quite direct if you’re on a cell phone and rather indirect if you’re in a public library, you paid money so that you could view this Web page on your screen. This seems fair enough, especially if you are finding valuable info or entertainment [infotainment?] here. But did you know that I am paying for you to read this post, too? To make matters worse, as I continue to make this website more valuable to more people, the cost for me to give them this content will go up. On the Internet, we pay proportional to how much we’re being served, but also how much we are serving others. It wasn’t always this way.

Last fall, I was watching a Cringely interview with Brewster Kahle, the founder of the Internet Archive and other endeavors. He revealed that just before the Web really started taking off, “in the ’80’s and [early] ’90’s…, AOL would charge people about $6.00 an hour to be on their service, and 10% to 15% of that gross revenue went to the publishers that were making the experiences that the people wanted.”(1) He called this a “royalty model” — just like a book, the more popular a site was, the more money it earned its creators. But eventually, those in power realized that they controlled access for enough “eyeballs” that they could start charging the content providers for the attention they attracted. And so today’s Web model was born: guests pay for access, hosts pay for guests and we need to somehow raise enough money to host over 65 million websites.

In addition to the cost of bandwidth, spam has further raised the price of hosting a website. Why should, say, a game company have to take time from answering user questions to fix the tool they use to do so, just because spammers have moved into the boards? Multiply that by countless others, who each put up a website allowing user-feedback and a month or two later are frantically searching for a solution that will keep the crap at bay. The only reason it is affordable is because every single community-oriented site has a share in the snowballing(2) costs.

There are two problems that face community-oriented endeavors on the Web and on the Internet: how to make money, and how to keep the service from abuse. Typically the former is solved by advertising and the latter by advancing technology inspired by artificial intelligence research. I’ve discussed the trouble with the AI-based technology approach in my last two posts, and I intend to discuss the problems with advertising in my next. After that, we should be all set to look at a solution.


  1. A written transcript is available of the entire interview, which is a good view if you’ve got the time.
  2. Someone thinks of a new way to have the computer decide whether a chunk of data is spam or not, hundreds have to figure out how to implement the idea in their respective contexts, and thousands, sometimes millions, in each context have to figure out what’s the best blog comment plugin, what login/captcha system will best serve their guests, and what e-mail program or filtering service won’t junk too many of their friends’ random emails. Add some old-fashioned hand deletion of the ones that still get through, and we’re finally back where we wanted to be, only with an added annoyance or two. There aren’t many people whose time is best spent on this particular problem, but there are plenty who must deal with spam anyway.

April 13, 2007

The wages of sin

written by natevw @ 12:41 am

Spam

A recurring scenario in science fiction involves humans making their machines more and more powerful until they are overthrown by them. Well, we are busy filling our online world with better and better Artificial Intelligence — designed to decide what is meaningful and what is not, what is good and what is evil. It seems that not a single open port or a single submittable form on the Internet these days can get away without some sort of AI to determine whether it is being greeted by a friend or foe. As a Christian engineering professor points out, spam is an expected consequence of sin. It should not surprise us that we must struggle with something like rampant spam.

Two approaches

The Internet revolves around two important nouns: bytes and addresses. All around us fly packets of data going from one point to another. The reasons spam is profitable are cheap data and rogue points. It’s efficient to send bytes across the wire and it’s simple to get an IP address. So if we plan to take on spam, then those are the obvious places to focus.

Bytes

Most bytes aren’t paid for directly. One buys bandwidth — a maximum rate at which bytes can be sent — typically on a subscription basis. How many bytes you get for your buck depends on how close to the limit you feed the pipes(1). A professional spammer buys industrial-strength bandwidth and milks it for all it’s worth. To make spamming less profitable, we could start charging more for bandwidth and the price of each junk e-mail would go up correspondingly.
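To put rough shape on that, here is a back-of-the-envelope sketch of the cost per message on a flat-rate subscription. The price, line speed, utilization figures, and message size are all made-up assumptions chosen only to show the arithmetic. They are not measurements of any real ISP or spammer.

```python
def cost_per_message(monthly_price_usd, bandwidth_bits_per_sec, utilization, message_bytes):
    """Dollars per message when a flat-rate pipe is kept `utilization` full."""
    seconds_per_month = 30 * 24 * 60 * 60
    # Bytes actually pushed through the pipe over the month.
    bytes_sent = bandwidth_bits_per_sec / 8 * utilization * seconds_per_month
    messages_sent = bytes_sent / message_bytes
    return monthly_price_usd / messages_sent


# Hypothetical numbers: a $50/month, 1 Mbit/s line and 5 KB messages.
casual_user = cost_per_message(50.0, 1_000_000, 0.01, 5_000)   # pipe mostly idle
spammer = cost_per_message(50.0, 1_000_000, 0.90, 5_000)       # pipe nearly saturated
pricier_pipe = cost_per_message(100.0, 1_000_000, 0.90, 5_000)  # same pipe at double the price

print(casual_user, spammer, pricier_pipe)
```

Under these assumptions the per-message cost scales linearly with the subscription price and inversely with how full the pipe is kept, which is why the sender who saturates the line pays the least per message.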

However, that suggestion has a serious flaw. Spam is outgoing data(2). I think charging for outgoing data is abhorrent. The Internet’s current business model is already terribly skewed *against* the content providers(3). Byte-wise, spam is insignificant compared to what businesses like Download.com, YouTube, Google Image Search and the iTunes Music Store demand. If we raise the prices for spammers, we also raise the prices for non-profits like the Internet Archive, Mozilla, Sourceforge and sponsoring universities, Wikipedia, &c &c. Spammers are getting paid, not hoping for donations!

Addresses

The other obvious way to discourage spam is to tie an identity to each address. If you can trace the source, you can hold it responsible. This is some people’s worst nightmare, some citizens’ bad dream and some lawyers’ bread and butter. Needless to say, that method has privacy concerns that are beyond the scope of this essay. (Read: it might be a good solution but I’m not going there.)

Further drawbacks

Both of these solutions would only provide more incentive for another rearing of sin’s ugly head. While some spammers spend their budget on big pipes, others use it to break into other people’s computers and send spam from there. This can be one organization with a fast connection, or a bajillion Internet Explorer users with normal connections. Increasing the cost of the pipes would only encourage more botnet-building research and development against vulnerable computers(4). I’d rather be stuck next to a shady neighbor with a mega-decibel stereo system, than one who has access to my, and all the neighbors’, volume controls!

Both of the obvious solutions have serious drawbacks. Those into politics are busy debating privacy, power and pricing. Those into programming are engaged in a battle of wits; whether to the death, the pain, or the world getting taken over by robots I can’t say just yet. I eagerly wait for all things to be made new. But in the meantime, I think there is a way we can discourage spam, and I believe my professor is close to the right idea.

Further exploration

The problem is a double case of wrong perspective. As humans, we think of spammers shipping us barges full of toxic waste. In response, we do our best to implement port security. Humans are discerning creatures, so this might work in real life. But for a computer, telling the difference between toxic waste and the sacks of coffee that get us to work every morning is a hard problem.(5) The second perspective issue is much more subtle. When a barge full of dirty bomb material makes it through our port, we fume and feel victimized. We might even feel hate. We’re mad at the barge, we’re mad at the port it came from. We’re mad at our computer because it’s not competent enough to keep our inbox safe. But here is where the analogy breaks down. Spam is not motivated by hatred posing as zeal. Spam is motivated by greed. And capitalism is all about squeezing something good out of greed. I hope to explain in detail how I think we can exploit the tariff model, as well as exploring a number of side-effects, good and bad.


  1. …and whether said bandwidth is actually available or just some imaginary number that a marketing department made up.
  2. from the spammer’s perspective
  3. The better your content, the more bandwidth you will need to buy. This is just as true for non-profit organizations, and one reason even-over content hosting sites like Flickr, YouTube and Blogger are such good deals for the end-user.
  4. I.e., all of them. Vulnerability is a rank, not a switch that can be turned off.
  5. I suspect Bruce Schneier of having a reductionist view of humanness, thus, his paranoia about our nation’s recent security attempts stems from his incredible knowledge of computer security. Of course, there may be reason for concern regarding Motherland Security due to experience with things such as history and human nature!