Posted: October 9th, 2011 | Author: owocki | Filed under: startups | Tags: devops, startupsatscale | No Comments »
This post is pretty technical. If you don’t cross your 1s or circle your 0s, then it’s probably best to move on to something more fruitful for you, business monkey.
I’ve always thought that a major challenge in building a dev team is continuously improving how effectively you respond to day-to-day changes in your metrics. One task I face daily is a sweep of our error logs. If you, like us, run a website and log everything properly, then your error logs probably look something like this:
(These error logs have been scrubbed of any actual usage or error data, and their use has been approved by my employer)
Oct 7 01:34:14 10.182.41.217 httpd: app14 JSError http://www.DOMAIN.com/Inbox/ 31732521Script error.0 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23 [version1.xxx] [OnErr:] [_GET:uerystring:EL/,E:JSError,EI:http://www.DOMAIN,EI2:31723 Script erro,type:error]
Oct 7 01:34:22 10.182.36.33 httpd: app1 404 http://ww1.DOMAIN.com/http://ww1.DOMAIN.com/Referred from UID Email: None IP:66.249.171.179 Location – Los Altos, CA [version1.xxx]
… etc., ad nauseam …
The problem: a lack of actionable information
Confusing? Probably. Boring? Absolutely. I’d be willing to bet that your eyes just skipped right past those error logs to the beginning of this paragraph. Getting one of these in my inbox every morning is pretty much a recipe for a wild goose chase (and not a fun one at that). It’s a real gumption trap for my team.

Providing Value
This post is all about turning that wild goose chase into an effective process. To that end, I recently delivered a project to improve Ignighter’s error log maintenance process by automatically answering these questions for my team:
- What’s the root cause of each error?
- What is its urgency?
- How often is it happening?
- What does the timeline for these errors look like?
- Who’s in charge of figuring them out?
If you ever work for an Ignighter development team, here is an example of an error log email you might get from me.
Subject: Error Action Items for Thursday 10/06/2011 (109629 items)
LEADERBOARD:
kevin : 941 (801 phpErr's,140 SQL errs )
steve : 56427 (56408 unserializableClsoure errs, 2 Redis Errs, ) - you're way past your ballmer peak bro
mike : 59529 (56468 EmailWithoutMessageException errs, 3061 404 errs) - every day this happens, a fairy dies
john : 353 (353 phpOutOfMem errs ) - less than 500. keep it up!
joe : 20975 (20975 JSError's ) - you MUST watch this video today ==>
The error log aggregator is the arbiter of cleanup responsibility. The first thing in the email is a ‘leaderboard’, which heckles a developer if a service they maintain has many errors! That means I no longer need to send nagging emails asking developers to clean things up. It’s worth noting that heckling your team only works if you have the kind of culture that supports lighthearted fun and constructive criticism. (We do.)
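A minimal sketch of the leaderboard step in Python, assuming you already have per-type error counts for the day. The `OWNERS` mapping from error type to developer and the heckling thresholds are hypothetical placeholders, not Ignighter's actual configuration.

```python
from collections import Counter, defaultdict

# Hypothetical mapping of error type -> developer who maintains that service.
OWNERS = {
    "phpErr": "kevin",
    "SQL": "kevin",
    "unserializableClsoure": "steve",
    "RedisConnectInstance": "steve",
    "EmailWithoutMessageException": "mike",
    "404": "mike",
    "phpOutOfMem": "john",
    "JSError": "joe",
}


def heckle(total):
    # Hypothetical thresholds; tune to your team's sense of humour.
    if total < 500:
        return "less than 500. keep it up!"
    if total > 50000:
        return "every day this happens, a fairy dies"
    return ""


def leaderboard(counts, owners=OWNERS):
    """counts: Counter of error type -> occurrences for the day."""
    per_dev = defaultdict(Counter)
    for err_type, n in counts.items():
        # Unmapped error types land in an "unassigned" bucket so nothing is lost.
        per_dev[owners.get(err_type, "unassigned")][err_type] += n

    lines = ["LEADERBOARD:"]
    for dev, errs in sorted(per_dev.items()):
        total = sum(errs.values())
        detail = ", ".join(f"{n} {t} errs" for t, n in errs.most_common())
        line = f"{dev} : {total} ({detail})"
        comment = heckle(total)
        if comment:
            line += f" - {comment}"
        lines.append(line)
    return "\n".join(lines)
```

Routing unmapped types to "unassigned" makes new, unowned errors show up on the board instead of silently disappearing.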
Next, we display the most prevalent items in the error log. Nothing fancy here, but it shows me at a glance what the most pressing issues are.
(Again, these error logs have been scrubbed of any actual usage or error data.)
LOG SUMMARY (20975 items):
56468 EmailWithoutMessageException
56408 unserializableClsoure
20975 JSError
2466 404
800 phpErr
353 phpOutOfMem
140 SQL
2 RedisConnectInstance
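The summary above boils down to counting lines per error type. Here is a rough Python sketch that does that and also buckets each type by hour, which gives a crude timeline. It assumes the line layout of the sample logs earlier in this post (timestamp, IP, `httpd:`, app name, then the error type); the regex and function names are illustrative, not part of any real tool.

```python
import re
from collections import Counter, defaultdict

# Assumed layout, based on the sample lines above:
# "<Mon> <day> <HH:MM:SS> <ip> httpd: <app> <ErrorType> <details...>"
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"
    r"(?P<ip>\S+)\s+httpd:\s+(?P<app>\S+)\s+(?P<type>\S+)"
)


def summarize(lines):
    """Return (counts per error type, hourly counts per error type)."""
    counts = Counter()
    timeline = defaultdict(Counter)  # error type -> Counter of hour -> count
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that don't fit the assumed layout
        err_type = m.group("type")
        counts[err_type] += 1
        timeline[err_type][int(m.group("hour"))] += 1
    return counts, timeline


def render_summary(counts):
    # Most frequent error types first, like the LOG SUMMARY section.
    total = sum(counts.values())
    rows = [f"LOG SUMMARY ({total} items):"]
    rows += [f"{n} {t}" for t, n in counts.most_common()]
    return "\n".join(rows)
```

Feeding it a day's worth of lines (for example, `with open(path) as f: summarize(f)`) answers both "how often" and "when" in one pass.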
From there, I aim to provide as much actionable detail as possible about each critical error type. See the example of MySQL errors below (from which much Ignighter-specific information has been scrubbed).

So what?
Now that we’re all aware of the volume, priority, and assignee of each type of issue our system is encountering, we’re all much more efficient.
Want to build your own? If you’re any good with awk or Graphite, you could probably do the same for your team with an investment of an hour or two.
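If you go the Graphite route, one option is to push the daily counts to Carbon using its plaintext protocol, which takes one `<metric path> <value> <timestamp>` line per data point (port 2003 by default). A sketch, where the `errors.` prefix and the host are assumptions:

```python
import socket
import time


def graphite_lines(counts, prefix="errors", timestamp=None):
    """Format counts as Carbon plaintext-protocol lines: '<path> <value> <timestamp>'."""
    ts = int(time.time() if timestamp is None else timestamp)
    return "".join(
        f"{prefix}.{name} {value} {ts}\n" for name, value in sorted(counts.items())
    )


def send_to_graphite(payload, host="localhost", port=2003):
    # 2003 is Carbon's default plaintext port; the host is a placeholder.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))
```

Something like `send_to_graphite(graphite_lines(counts))` from a nightly job would give you per-error-type graphs over time for free.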
How does your team keep on top of application metrics and logs? Leave a comment below.
If you’re a developer looking to work in an efficient, fun environment that empowers you, check out Ignighter’s open positions.
Note: Any information of proprietary value to my employer has been removed, and this post has been approved by my employer.
Posted: June 22nd, 2011 | Author: owocki | Filed under: startups | Tags: business intelligence, knowing your shit, logging, startupsatscale | 1 Comment »
One thing that hasn’t changed during my time at Ignighter is the importance of our in-house analytics. Ever since our first lecture at Techstars 2008, when we were prodded to “obsess over core metrics”, we’ve been obsessed with our usage data. Having the right information on demand is essential to nimble decision making as a management team.
“If you aren’t measuring it, you can’t manage it” – Greg Tisch
As CTO, the responsibility of maintaining our business intelligence infrastructure has fallen to me. So I’ve been logging anything that’s remotely significant to our decision making process. We’ve been doing this for years and it’s not my first rodeo, but shit, so many things have changed as we’ve scaled.
Many of these things may evolve for your project too.
- The business model
- The volume of data
- The usage patterns
- The systems
- The technical architecture
- The reporting system
- The market
When we first started the company, I was logging all of our usage data to our MySQL database. A DB write (maybe several) on every pageload!? Boy, was that dumb! It didn’t take long before the site was buckling under the load of our usage. One of the first rules of disk-bound databases is that writes are the most expensive operation you can perform. Even with a master-slave MySQL setup, you can’t scale writes by much, since every write must go to the master and then be replayed on each slave to keep it up to date!
Let me give you a snapshot of how things look nowadays, when we’re on the order of many, many millions (maybe more; I’m not at liberty to say) of loggable operations daily.
- We’ve built a Logger class that lets us send error, debug, user usage, user statistics, and general info messages either to the filesystem or to the database.
- For flat filesystem logs, we use the open-source syslog daemon syslog-ng to log operations as they happen. Since syslog-ng can support up to 150k loggable operations per second, it’s an ideal tool.
- For database logs, we use memcached as a buffer. Basically, what you want is an ‘Aggregated Stat’ class with an interface that updates a counter in memcache every time an action happens, then periodically flushes the results to the database.
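A minimal sketch of the ‘Aggregated Stat’ idea, assuming the `AggregatedStats` table layout used in the query below (`Segment1`, `Segment2`, `Value`, `AddDate`). A plain in-process dict stands in for memcached and SQLite stands in for MySQL, so this only illustrates the buffering pattern; a real deployment would use memcached’s atomic increments so that every web server shares the same counters.

```python
import sqlite3
import time
from collections import defaultdict


class AggregatedStat:
    """Buffer counter increments in memory and flush them to the DB in batches.

    A plain dict stands in for memcached here (single process only); in
    production the increments would be atomic cache increments shared
    across web servers.
    """

    def __init__(self, db, flush_interval=60):
        self.db = db
        self.flush_interval = flush_interval  # seconds between flushes (assumed)
        self.buffer = defaultdict(int)  # (segment1, segment2) -> pending count
        self.last_flush = time.time()

    def increment(self, segment1, segment2, amount=1):
        # Cheap in-memory update on every action; no DB write here.
        self.buffer[(segment1, segment2)] += amount
        if time.time() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        # One batched insert replaces many per-pageload writes.
        now = int(time.time())
        rows = [(s1, s2, v, now) for (s1, s2), v in self.buffer.items()]
        if rows:
            self.db.executemany(
                "INSERT INTO AggregatedStats (Segment1, Segment2, Value, AddDate) "
                "VALUES (?, ?, ?, ?)",
                rows,
            )
            self.db.commit()
        self.buffer.clear()
        self.last_flush = time.time()
```

Usage would be `stats.increment("LoginsByUser", uid)` on each login. The trade-off is that counts still sitting in the buffer are lost if the process (or the memcached node) dies before a flush, which is usually acceptable for statistics.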
After this, you just need to decide what’s relevant and loggable, and whether it belongs in the filesystem or the database. Filesystem logging is more scalable, but there are advantages to having data in the database too: there’s no practical way to query your flat logs from your application. For example, in the application, I like being able to know how many times user x has logged in over the past month. That’s as easy as:
mysql> SELECT SUM(`Value`) as `NumLogins` FROM `AggregatedStats` WHERE `Segment1` = 'LoginsByUser' and `Segment2` = '[uid]' AND `AddDate` > (UNIX_TIMESTAMP() - 60 * 60 * 24 * 30) LIMIT 1
Whereas, with the flat logs, it looks much more like:
$ cat LOG_STATS.log | awk -F'\t' '{print $3$4$5}' | grep LoginsByUser | grep [uid] | wc -l
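For comparison, here is the same flat-log count in Python, assuming (as the awk command does) that columns 3 to 5 of a tab-separated line hold the stat name and user id. It still rescans the whole file, has no date filter, and matches the uid as a plain substring (so `42` also matches `420`), which is exactly why the database query is nicer for this kind of question. Also note that in the shell version an unquoted `[uid]` is read by grep as a bracket expression matching any one of `u`, `i`, or `d`, so substitute a real, quoted id there.

```python
def count_flat_log_matches(lines, stat_name, uid):
    """Python equivalent of the awk | grep | grep | wc -l pipeline above.

    Assumes tab-separated lines where columns 3-5 (1-indexed, as in awk)
    contain the stat name and the user id.
    """
    count = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        key = "".join(fields[2:5])  # like awk's print $3$4$5
        if stat_name in key and uid in key:  # like the two greps
            count += 1
    return count
```

Pass it an open file, e.g. `with open("LOG_STATS.log") as f: count_flat_log_matches(f, "LoginsByUser", "42")`.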
Now I can access data about anything at any time. This system scales, and it’s nimble enough to handle queries you did not foresee. Of course, having the ability to view this information does not mean anyone’s actually going to do it.
As a matter of practicality, I’ve found it useful to provide the following tools (and to make sure they are blazing fast). All of them are plain-vanilla open source and 100% FREE too!

Informed decision making made easy! Watch out, Zoltar: now even we mere mortals can tell you anything about anything.
The usual warnings apply. These are all just ideas and your mileage may vary based upon your technical ability, execution, and your gumption. I’d love to hear what your team uses and how it compares to what I’ve outlined in this post! Leave me a tweet or a comment below.
Did you know? Ignighter is hiring. We’re based in NYC, work hard, have a lot of fun, build cool shit, and we’re backed by some of the best investors in the business. Check out our open development positions.
Note: Any information of proprietary value to my employer has been removed or approved, and this post has been approved by my employer.
Posted: June 18th, 2011 | Author: owocki | Filed under: Uncategorized | Tags: startupsatscale | 2 Comments »
I’ve been thinking about how much things have changed lately at Ignighter. We’re starting to get some press, a bunch of daily registrations, and a bunch more messages piping through our once-rinky-dink dating website. It’s beginning to feel like we’re not such a small startup anymore.
When milestones like those pass, they really change the way your team builds software. These days, I’m obsessed with building at scale. When we founded Ignighter back in 2008, we built a system that was about as efficient and scalable as an old rusty tricycle. For the past 3 years, we’ve re-engineered (and re-engineered, and re-engineered) the system, and these days it feels like we’re building a jetliner just as it’s taking off.
It’s a lot of fun.
Anyway, I wanted to share a new project I’m working on to help me keep an eye on everything as we fire up the afterburners. This week, we launched the Ignighter Early Warning System. It’s a private Twitter account that watches our logs and lets me (and my team) know when there is significant movement, up or down, in them. I’ve configured it to tweet out changes in our error, stats, info, or debug logs. And since it’s entirely homebrew, it’s completely customizable.
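The core of a system like this is comparing log volumes between two periods and flagging big swings. A hedged Python sketch: the 50% threshold, the minimum-volume cutoff, and the message format are assumptions, and posting to Twitter is left out (the setup described in this post used TTYTwitter driven by a cron-scheduled bash script).

```python
def significant_changes(previous, current, threshold=0.5, min_count=10):
    """Compare per-category line counts between two periods and describe big swings.

    previous/current: dicts like {"error": 120, "stats": 5000, ...}.
    threshold: relative change (0.5 = 50%) treated as significant (assumed).
    min_count: skip categories that are tiny in both periods (assumed).
    """
    alerts = []
    for category in sorted(set(previous) | set(current)):
        before = previous.get(category, 0)
        after = current.get(category, 0)
        if max(before, after) < min_count:
            continue  # too little volume to be worth a tweet
        if before == 0:
            alerts.append(f"{category}: new activity ({after} lines)")
            continue
        change = (after - before) / before
        if abs(change) >= threshold:
            direction = "up" if change > 0 else "down"
            alerts.append(
                f"{category} logs {direction} {abs(change):.0%} ({before} -> {after})"
            )
    # Keep each alert within the 140-character tweet limit.
    return [a[:140] for a in alerts]
```

Run from cron, the script would count lines per log category for the latest interval, compare them to the counts saved on the previous run, and tweet whatever this returns. Flagging drops as well as spikes matters: a sudden fall in stats or info lines can mean a feature quietly stopped working.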


These metrics are for illustration purposes only; they do not represent actual Ignighter.com usage.
Since our developers are all already on Twitter, it makes sense to share this information there. This system supplements, rather than replaces, traditional regression and unit testing, but it lets us act on production issues before they snowball into a full-blown catastrophe.
Time from hallucination to first iteration: 3 hours. If you’re interested in building one for your project, here are the tools I used:
- Syslog-NG to aggregate our logs
- Plain vanilla bash scripting, scheduled via a crontab
- TTYTwitter to tweet updates
- And, of course, Twitter
Next up, I’m looking to extend the project to text us when a particularly egregious swing in our statistics occurs.
If you build your own, I’d love to hear about it! Leave me a tweet or a comment below.
Note: Any information of proprietary value to my employer has been removed or approved, and this post has been approved by my employer.