The Four Hour Web App: Venn’d

I am a huge proponent of building quick web prototypes. Launching a quick and dirty tool helps gauge interest in a new product or service, exposes you to challenges and obstacles you may not have foreseen, and provides real data on usage patterns and user feedback, which are invaluable during the conception and design process.

With that background: Wednesday night I saw this tweet from Jason Fried, founder and partner at 37 Signals:

For a not-yet-launched project, I have been working with some new functions in the Twitter API, only days old, that simplify the process of polling for a user’s friends and followers. I thought it would be fun and educational to launch a super-quick app using the same toolset, and so Venn’d was born. Venn’d is a dead-simple tool: given two Twitter users, it calculates the overlap between each of their collections of friends and followers, and presents the results using a Venn diagram. Here’s a behind-the-scenes look at the services and tools I used to build it.

The core functionality of Venn’d is driven by Twitter’s API, specifically the calls to request a user’s friends and followers.
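Once the two lists of user IDs are in hand, the overlap itself is plain set arithmetic. What follows is a minimal sketch rather than the actual Venn’d code; the function name and the sample ID lists are made up for illustration, and fetching the lists from Twitter is left out entirely.

```python
def venn_overlap(ids_a, ids_b):
    """Return the sizes of the three regions of a two-circle Venn diagram."""
    a, b = set(ids_a), set(ids_b)
    return {
        "only_a": len(a - b),   # followed only by the first user
        "only_b": len(b - a),   # followed only by the second user
        "shared": len(a & b),   # followed by both
    }

# Hypothetical follower ID lists standing in for the API responses.
followers_a = [1, 2, 3, 4, 5]
followers_b = [4, 5, 6]
counts = venn_overlap(followers_a, followers_b)
```

Converting to sets first also quietly drops any duplicate IDs, so each account is counted once per user.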

For a quick, “free”, scalable hosting platform I chose Google’s AppEngine. I put free in quotes because eventually you’ll be able to purchase more AppEngine computing resources if you go over the free quotas, but that functionality hasn’t been rolled out yet. Google says that the platform will remain free for applications requiring less than 500 megabytes of storage and generating fewer than 5 million monthly pageviews. Beyond those levels, the expected pricing is:

  • $0.10 – $0.12 per CPU core-hour
  • $0.15 – $0.18 per GB-month of storage
  • $0.11 – $0.13 per GB outgoing bandwidth
  • $0.09 – $0.11 per GB incoming bandwidth

The current functionality of Venn’d only required a few of AppEngine’s service APIs: DataStore to record the details of each request (except for Twitter username and password information) and URL Fetch to access the Twitter API. I used my own modified version of a Python REST framework to streamline the code under the hood (can’t find the original source I used at the moment, I’ll update this post if I track it down later). My knowledge of Python is only just passable; the nuts and bolts of Python and AppEngine would’ve been faster for someone who really knew their stuff.

I chose jQuery as the JavaScript/AJAX framework. I typically use Prototype, because of its tight integration into Rails, and thought this would be a good opportunity to learn more about an alternate tool. The jQuery form plugin and validation plugin made it easy to add AJAX validation and submission to the main form. Instead of hosting jQuery locally within the application, I referenced the copy of the library that Google makes accessible through their AJAX Libraries API. Because a significant number of other sites around the web do the same, the odds that a user will already have a cached version in their browser go up, improving page load times and performance.

To generate the Venn diagrams, I used the Google Charts API. This made it quick and easy to visualize the overlap between each user’s friends and followers. The designer in me wanted more control over the look of the charts, so I spent a little time looking into generating custom charts from scratch. Turns out it would be a huge pain, and not worth the effort. One of AppEngine’s restrictions is that you have no access to the filesystem of the server running your code. Essentially all of the image generation libraries available for Python presume such access, so you can’t use them. Google recently released an Image API for AppEngine, but it can only handle basic rotate/resize/crop type manipulations, not generating an image from scratch. There are a lot of forum threads clamoring for better image support, so this may be functionality Google provides in the future. I briefly considered using a browser-based solution like SVG or CSS with transparent images, but thought both would be too much effort and have browser compatibility issues. Using an API is always a compromise between the speed and convenience of using an off-the-shelf solution and the power and customization of writing your own code. For a quick and dirty tool like Venn’d, using an existing API was the way to go.
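For a sense of how little code the chart takes, here is a rough sketch of building a Venn chart URL. It is not the Venn’d source. The parameter names (cht=v for a Venn chart, chs for size, chd for text-encoded data scaled to 0-100, with seven values covering circles A, B, C and their intersections) are as I recall them from the image charts documentation, and the endpoint is an assumption; the image charts service has since been deprecated, so treat this as illustrative only.

```python
from urllib.parse import urlencode

# Assumed endpoint of the (since deprecated) image chart service.
CHART_ENDPOINT = "https://chart.googleapis.com/chart"

def venn_chart_url(size_a, size_b, overlap, width=400, height=300):
    """Build a two-circle Venn chart URL, scaling counts to the 0-100 range."""
    largest = max(size_a, size_b, 1)  # avoid dividing by zero
    # Order: A, B, C, A&B, A&C, B&C, A&B&C; the third circle is unused here.
    raw = (size_a, size_b, 0, overlap, 0, 0, 0)
    scaled = [round(100 * value / largest) for value in raw]
    params = {
        "cht": "v",
        "chs": f"{width}x{height}",
        "chd": "t:" + ",".join(str(value) for value in scaled),
    }
    # Keep ':' and ',' readable instead of percent-encoding them.
    return CHART_ENDPOINT + "?" + urlencode(params, safe=":,")

url = venn_chart_url(size_a=200, size_b=100, overlap=50)
```

Scaling against the larger circle keeps the relative sizes intact while staying inside the range the text encoding accepts.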

As a small nod towards making the app more viral, I added a “tweet this” link to each generated diagram to make it easy for a user to share the results on Twitter if they wanted to. To deal with Twitter’s 140 character cap, I used the bit.ly API to generate a shortened version of the permalink to each new page.
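Fitting a message plus a link under the cap is mostly bookkeeping. Here is a minimal sketch, assuming the shortened URL has already been obtained; the function name and the bit.ly-style link in the usage line are placeholders, and the call to the shortening service is deliberately left out.

```python
TWEET_LIMIT = 140

def compose_tweet(message, short_url, limit=TWEET_LIMIT):
    """Join message and link, trimming the message so the link always fits."""
    room = limit - len(short_url) - 1  # one character for the separating space
    if room < 0:
        raise ValueError("the link alone exceeds the character limit")
    if len(message) > room:
        ellipsis = "..." if room >= 3 else ""
        message = message[: room - len(ellipsis)].rstrip() + ellipsis
    return f"{message} {short_url}"

# Placeholder link; a real one would come from the URL shortener.
tweet = compose_tweet("x" * 200, "http://bit.ly/abc123")
```

Trimming the message rather than the link means the permalink always survives, which is the part that matters for sharing.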

I make it a point to add Google Analytics from the get go to every site I build. It might not be necessary at first, but having good data on how your application is being used is invaluable as your traffic grows. And check it out! 40 whole pageviews! :)

Finally, a few notes on design. I didn’t want to invest much time or energy into the visual look of the app. I grabbed a royalty-free pattern image to use as the background. For a color scheme, I sampled a few of the primary colors from twitter.com itself using the ColorZilla Firefox plugin’s eyedropper tool. I test-drove a few CSS typography schemes using Typechart and chose one I liked.

That’s it! The “finished” product: Venn’d

Crowdsourcing, Attention and Productivity

The reigning currency of the internet is attention. Bands posting free songs on MySpace, bloggers churning out a steady stream of screeds, and teenagers pouring their hearts out to their webcams on YouTube all crave attention from their peers or the wider internet community. Yet as critical as this dynamic is to today’s web, I’m aware of almost no research which quantifies its inner workings. So I found this recent paper from HP’s Social Computing Lab very interesting: Crowdsourcing, Attention and Productivity.

In the paper, Bernardo Huberman and his co-authors set out to measure how the popularity of someone’s web content influences their decision to produce further content. The researchers used a dataset of users who had contributed videos to YouTube. Their first insight is that user contributions are “bursty”: on average, users posted nothing in 66% of the two-week time periods the calendar was sliced into. More importantly, there was strong evidence that attention, in the form of video views, encourages users to produce more videos. The more attention a user’s videos received in one period, the more likely they were to upload more videos in the next (see the paper for a full description of the linear regression model used). On the flip side, users who stopped uploading content tended to do so after a steady decline in the views they received.

How does this relate to strategy? Let’s say that YouTube could choose between two scenarios: 500 user-contributed videos with 100,000 views each, or 5,000 videos with 10,000 views each. Which would they prefer? Both scenarios deliver 50 million pageviews, roughly what the site currently gets per month. In the first scenario, only 500 users are receiving the attention that will encourage them to continue creating content, while in the latter that number is ten times higher. The latter, larger pool of users will be the engine of growth, creating more content and adding value to the site.

This isn’t just a thought experiment. Websites have powerful user experience levers with which to influence the shape of their communities. YouTube’s home page could be a simple list of the top 10 most-watched videos this month, reinforcing those videos already popular and creating a site with a few “blockbusters” and a great many more videos languishing in obscurity. Alternatively, the home page could list 10 videos being watched right now, a few videos selected at random, and maybe a selection of videos whose popularity is rising quickly or already strong among a given subset of the community. This would be a more egalitarian YouTube, where niche content had a greater chance to find its audience and grow in popularity.

This flatter, more egalitarian YouTube strategy finds support in the economics of web audiences as well. Niche audiences, though small, are worth more per capita to advertisers than the monolithic audiences of mass media because their interests are more specific and can be matched more precisely to relevant advertising. (Online advertising wonks: feel free to push back on this argument.) The ad buying world may not have fully adapted to this view yet, but a YouTube made up of thousands of micro-communities has significantly more value than one with a single undifferentiated audience. The Social Computing Lab’s research provides the quantitative support necessary to cultivate these communities.

Microapps for Fun and Profit

The future of the web is tiny. We are moving from an era of big, monolithic web sites to small web services being stitched together endlessly in new, innovative ways: small pieces, loosely joined, to use David Weinberger’s phrase. These tiny pieces of the web can be thought of as “microapps”, and MicroApps.org describes the philosophy in greater detail:

MicroApps are small REST applications that are designed from the ground up to be integrated with other applications. Usually, they are not directly useful on their own, but must be integrated into other applications (this is what differentiates a MicroApp from a regular REST application).

And further…

The core idea of Microapps is basically using the web (and REST specifically) as a component architecture to build applications. A microapp is a small application with a very tight focus that can be integrated with other microapps or other web applications via HTTP and a common data format (usually XML, JSON, or RDF).

Don’t worry about the technical details of REST architectures and data formats; the key point is that microapps are small pieces of functionality designed to be put together into larger applications. But if you browse the list of existing microapps, something strange becomes apparent. The vast majority of these microapps are not hosted services, but rather Plain Old Software, which you download, install, configure and manage yourself. These tools would be far more valuable if made available as public, hosted web services. For one, the potential audience would be vastly larger, expanding from the relatively modest set of developers with access to their own web servers running Python, to, well, just about any web developer at all. Secondly, a lot of collective effort could be avoided if each new user of a microapp didn’t have to install and manage the software separately. So what’s the problem here?

Erik Kastner gave a presentation at RailsConf 2008 called Microapps for Fun and Profit. He defines three categories of motivation for someone to create a microapp: “fun” (play, learning), “profit” (AdSense, donations, sponsorship), and “better than money” (reputation, your personal brand, experience). Erik also makes reference to “microprofit” (seen in the snip of one of his slides above), and this hints at the barriers to turning microapps into fully public, hosted services.

Coming up next: why there aren’t more microapps, and what we can do about it.

Sneak Peek 3: TwipJar

Twitter-powered social micropayments platform.

Amazon Start-Up Tour

I took a break this afternoon from the Web 2.0 Expo for the Amazon Web Services Start-Up Tour. It was a cool event highlighting startups which are building their businesses on top of AWS, as well as some good talks by VCs about what they’re looking for in today’s startup space. Highlights included Adaptive Blue founder Alex Iskold describing some unique ways his company is using the services and a talk by Nick Beim of Matrix Partners about doing deep analyses of your customer microeconomics.

Amazon’s Web Services are a hugely important addition to the web 2.0 space, and one whose possibilities are only starting to be fully capitalized upon. AWS Evangelist Mike Culver described how using Amazon Web Services can eliminate the “undifferentiated heavy lifting” that stands between idea and execution for a startup. I’ve made use of S3, SimpleDB, and SQS for my recent web projects, and am working on incorporating the Flexible Payment Service (FPS) into another prototype now. Taking advantage of these tools helps startups focus on the areas where they really add value, while leveraging Amazon’s infrastructure for the rest.

5 Reasons APIs Suck

I have spent much of the last few months working with APIs, and the experience has inspired this list of what’s wrong with APIs:

  1. Exposing an API makes scaling headaches worse

    Preparing for traffic spikes has always been a challenge on the web, when a single link from Slashdot or Digg can send hordes of traffic your way and bring your site to its knees. Yet the situation is even worse if you offer an API, because the code on the other end accessing your API may be unrelenting in its pinging or simply badly written, hammering your infrastructure harder than human visitors ever could.

  2. APIs have no “eyeballs” to “monetize”

    Advertising has long been the lazy man’s substitute for coming up with a real, viable online revenue model. Attract visitors, then just wait for the barrels of money, right? Google’s AdSense and a host of other online advertising tools made it dead-simple to turn visitors into money, if not always much of it. But the “visitors” to your API are other web sites or other services, and even Google hasn’t figured out a way to make a computer interested in seeing an advertisement (yet).

  3. The incentives to expose an API are often weak

    Points 1 and 2, taken together, imply that publishing an API can have real costs in terms of bandwidth, servers, support, and administration. At the same time, there is no obvious revenue model through which to recoup these expenses. This reality can discourage people from making useful APIs available, a point which I’ll post more about soon.

  4. There is no standard way to access an API

    The battle to standardize access to web services has raged for over a decade, producing no clear victor and leaving a trail of abandoned and aborted proposals in its wake. Supporters of WS-* and REST have an almost reflexive distaste for one another. Your web service architecture might be RESTful, but is it “high REST”, “low REST”, or simply “REST-ish”?

  5. APIs are unreliable

    What if an API you rely on is down temporarily, or, god forbid, permanently? Isn’t it dangerous to build your business on top of a resource controlled by someone else? Users of conventional software packages have contracts or Service Level Agreements to help protect them against these risks. APIs rarely have the types of guarantees offered by other types of hosted software.

Twitter's "fail whale"... a full year ago, Twitter's API already had 10x the traffic of its website, causing the site to go down repeatedly.

Now, I have an admission: I don’t actually think that APIs suck. APIs have made the internet a much more interesting place, promoting the type of remix culture which can really accelerate innovation. APIs also make economic sense, because they encourage specialization and the associated gains from comparative advantage. In non-economic terms, the availability of APIs encourages creators on the internet to focus on what they’re best at, and leverage APIs for the rest of the functionality they need. Yet in order to realize the full value of APIs and services on the web, we’ll have to find solutions to these five problems. Stay tuned for some of my ideas about a path forward.

I Know What You Did Last Summer

Yesterday I was at a conference about location-based services put on by the Columbia Institute for Tele-Information at Columbia Business School (and organized by my friend and classmate Alison Albeck Lindland of American Express Interactive). The event was called The Focus on Locus, and fostered interesting discussions about the business, social, and privacy aspects of the latest generation of services which are location-aware. Ironically, the conference coincided with Apple’s launch of the iPhone 3G, causing AT&T’s Dorothy Attwood to joke that she should donate her speaking time to give people a chance to run over to the 5th Avenue Apple Store to get in line.

NOT a movie about location-based services

For me, some of the most interesting issues raised centered around privacy. A recent Northeastern University study (publicized by Nature here and here) of 100,000 anonymous cellphone users showed that most of us are creatures of habit, travelling between the same two or three locations most days. It has been suggested — though a quick search didn’t turn up any references on the web — that many “anonymous” records of a person’s location can in fact reveal a person’s identity once cross-referenced against other databases, such as home and work locations. On a less paranoid note, datasets of human movements would certainly yield important insights for economists, epidemiologists, urban planners, sociologists, and others. There is a whole new science of the mobile human environment waiting to be unleashed if we can design a location monitoring and disclosure framework which has appropriate privacy safeguards.

“Where are you right now?” — this is the particularly narrow view that many in the location-based services space have of the foundation they are building services on top of. The reality is that aggregate or longitudinal location data will likely turn out to be more valuable than the single data point of someone’s current location. In many cases the fact that you are at a certain restaurant right now is less useful than the knowledge that you’ve been there 8 times this month, usually for a weekday lunch. Hedge fund-backed startup Sense Networks is one example of a company working on more sophisticated methods with which to analyze this type of richer locational dataset.
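To make the restaurant example concrete, here is a small sketch of turning a log of timestamped visits into that kind of summary. Everything in it is assumed for illustration: the record format, the place names, and the crude 11:00 to 15:00 definition of "lunch". It says nothing about how Sense Networks or anyone else actually does this.

```python
from collections import Counter
from datetime import datetime

def summarize_visits(visits, place):
    """Count visits to one place and find its most common day/meal slot."""
    slots = Counter()
    for where, when in visits:
        if where != place:
            continue
        day_type = "weekday" if when.weekday() < 5 else "weekend"
        meal = "lunch" if 11 <= when.hour < 15 else "other"
        slots[(day_type, meal)] += 1
    total = sum(slots.values())
    usual = slots.most_common(1)[0][0] if slots else None
    return total, usual

# Hypothetical location log: (place, timestamp) pairs.
visits = [
    ("Cafe Uno", datetime(2008, 7, 7, 12, 30)),   # Monday lunch
    ("Cafe Uno", datetime(2008, 7, 9, 13, 0)),    # Wednesday lunch
    ("Cafe Uno", datetime(2008, 7, 12, 19, 0)),   # Saturday evening
    ("Bookshop", datetime(2008, 7, 10, 17, 0)),   # Thursday afternoon
]
total, usual = summarize_visits(visits, "Cafe Uno")
```

Even this toy version shows why the history is worth more than the latest point: no single record carries the "usually a weekday lunch" pattern.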

John Verdi of the Electronic Privacy Information Center warned companies against the huge liability of retaining user location data unnecessarily. The personal nature of this data makes it a potential goldmine for civil litigators, e.g. a divorce attorney very interested to know where his client’s husband was when out at night. Retained personal data can create a temptation for a company’s own employees to snoop as well, so the best way to avoid legal headaches related to privacy invasion is to store only the minimal amount of location data necessary to drive one’s services. This is, of course, easier said than done. Today we rarely understand the full value of the data we collect until well after it has been collected. Pair this fact with the rapidly falling cost of computer storage, and you have powerful incentives to store as much user data as is available, and worry about what to do with it later. Perhaps in the future we’ll have a personal data ownership framework where individuals will control complete datasets of their own behavior, and can choose to expose it to another company or organization if the incentives are right.

Perhaps the biggest unanswered question of the day was how to handle what might be called “peer-to-peer privacy”. What do you do when someone else posts a photo with you in it on Facebook against your will? This is an area where even the most well considered, easily comprehensible, and user friendly online privacy frameworks can still fail to preserve user privacy. It remains to be seen what mix of standards, technology, legal regulations, and social norms will help address this latest generation of privacy concerns.

Sneak Peek 2: Chattrbox

A fresh take on online community chat: real-time, lightweight, object-oriented sociality.

Google AppEngine Hack-a-thon

AppEngine swag

Google held a hack-a-thon at their New York office today, showing off their new AppEngine platform and giving developers a chance to take it for a test drive. My Python programming experience is limited to some cramming with the O’Reilly book over the last few days, but it was still fairly easy to get a basic web application up and running. AppEngine looks to be an interesting new addition to the cloud computing marketplace.

Unfortunately, for my current needs, AppEngine has some critical limitations. There is no filesystem access, meaning I can’t use any Python libraries which require the filesystem for processing. In addition, the only way to pull data into your application is through the URL Fetch API, which is limited to files one megabyte or smaller. This restriction dashed my plans to build an mp3 metadata indexing service on the platform, since a typical mp3 file is at least 5 megabytes today.

While preventing filesystem access may be a fundamental piece of the AppEngine security architecture, it seems likely that other limitations, such as that on URL Fetch, will be removed as the platform matures. I’m hopeful Google will move quickly to remove these barriers and make AppEngine a more suitable platform for a broad spectrum of web applications.

Sneak Peek 1: Street Cred

Pop culture is one of the primary drivers of the ebbs and flows in popularity of major consumer brands. While explicit celebrity endorsements have a long history as a marketing tool, the market exposure that brands gain from being mentioned in musical lyrics is perhaps a more pervasive form of influence. Street Cred is an experiment in brand analytics which tracks and analyzes these brand mentions in the lyrics of hip hop tracks. Each week, the most played hip hop tracks across the country are logged. Street Cred searches the internet for the transcribed lyrics of each track, then runs these lyrics through a regular expression engine tuned to capture mentions of the top consumer brands. A measure of consumer exposure is calculated for each brand: its Street Cred Score. This continuously-updated metric blends the current airplay numbers for the most-popular hip hop tracks with the brands mentioned in the lyrics to determine which brands have the most Street Cred. Here’s a preview.

Top 10 tracks for the week:

The Brand Cloud, where the size of each brand is proportional to its current Street Cred score:

A sample brand intelligence page, this one for Victoria’s Secret, showing its history of brand mentions:
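Separately from the screenshots, the lyric-matching step described above can be sketched in a few lines. This is a guess at the general shape, not the real Street Cred code: the brand list and sample lyrics are invented, and the scoring rule (each mention adds the track's weekly play count) is just one simple way to blend airplay with mentions, since the actual formula isn't spelled out here.

```python
import re
from collections import defaultdict

# Hypothetical brand list; the real set of tracked brands is not given.
BRANDS = ["Gucci", "Nike", "Victoria's Secret"]
BRAND_RE = re.compile(
    r"\b(" + "|".join(re.escape(brand) for brand in BRANDS) + r")\b",
    re.IGNORECASE,
)
# Map any capitalization found in lyrics back to the canonical brand name.
CANONICAL = {brand.lower(): brand for brand in BRANDS}

def street_cred_scores(tracks):
    """tracks: iterable of (lyrics, weekly_plays). Returns {brand: score}."""
    scores = defaultdict(int)
    for lyrics, plays in tracks:
        for match in BRAND_RE.finditer(lyrics):
            # Assumed weighting: every mention counts once per play.
            scores[CANONICAL[match.group(1).lower()]] += plays
    return dict(scores)

# Made-up lyrics and play counts.
tracks = [
    ("Gucci on my feet, gucci on my back", 1000),
    ("New Nike kicks", 500),
]
scores = street_cred_scores(tracks)
```

Escaping each brand with re.escape keeps names containing punctuation from being misread as regex syntax, and the word boundaries avoid matching a brand name buried inside a longer word.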