
Startups @ Scale: Make the abstract actionable

Posted: October 9th, 2011 | Author: owocki | Filed under: startups | No Comments

This post is pretty technical. If you don’t cross your 1s or circle your 0s, then it’s probably best to move on to something more fruitful for you, business monkey.

I’ve always thought that a major challenge in building a dev team is continuously improving how effectively you can respond to day-to-day changes in your metrics. One of the tasks I face is a daily sweep of our error logs. If you, like us, run a website, and you’re properly logging everything, then your error logs probably look something like this:

(These error logs have been scrubbed of any actual usage or error data, and their use has been approved by my employer)

Oct 7 01:34:14 10.182.41.217 httpd: app14 JSError http://www.DOMAIN.com/Inbox/ 31732521Script error.0 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23 [version1.xxx] [OnErr:] [_GET:uerystring:EL/,E:JSError,EI:http://www.DOMAIN,EI2:31723 Script erro,type:error]

Oct 7 01:34:22 10.182.36.33 httpd: app1 404 http://ww1.DOMAIN.com/http://ww1.DOMAIN.com/Referred from UID Email: None IP:66.249.171.179 Location – Los Altos, CA [version1.xxx]

… etc. ad nauseam …

The problem: a lack of actionable information

Confusing? Probably. Boring? Absolutely. I’d be willing to bet that your eyes just skipped right past those error logs to the beginning of this paragraph. Getting one of these in my inbox every morning is pretty much a recipe for a wild goose chase (and a not-so-fun one at that). It’s a real gumption trap for my team.

Providing Value

This post is all about turning that wild goose chase into an effective process. With that in mind, I recently delivered a project to increase the effectiveness of Ignighter’s error log maintenance process. Here are the questions I set out to answer automatically for my team:

  • What’s the root cause of each error?
  • What is its urgency?
  • How often is it happening?
  • What does the timeline for these errors look like?
  • Who’s in charge of figuring them out?

If you ever work for an Ignighter development team, here is an example of an error log email you might get from me.

Subject: Error Action Items for Thursday 10/06/2011 (109629 items)

LEADERBOARD:

	kevin : 941 	(801 phpErr's,140 SQL errs )
	steve : 56427 	(56408  unserializableClsoure errs, 2 Redis Errs, )		- you're way past your ballmer peak bro
	mike : 59529 	(56468 EmailWithoutMessageException errs, 3061 404 errs)		- every day this happens, a fairy dies
	john : 353 	(353 phpOutOfMem errs )		- less than 500.  keep it up!
	joe : 20975 	(20975 JSError's )		- you MUST watch this video today ==>

The error log aggregator is the arbiter of cleanup responsibility. The first thing in the email is a ‘leaderboard’, which heckles a developer if a service they maintain has many errors! That means I no longer need to send needy emails asking a developer to clean things up. It’s worth noting that heckling your team will only work if you have the kind of culture that supports lighthearted fun and constructive criticism. (We do.)
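To make the leaderboard idea concrete, here is a minimal Python sketch, not our actual script. It assumes you already have a count per error type; the OWNERS table is hypothetical and simply mirrors the names in the sample email above. Sorting worst offender first is my choice for the sketch.

```python
from collections import defaultdict

# Hypothetical ownership table. The names and error types mirror the sample
# email in this post; where such a mapping would really live is an assumption.
OWNERS = {
    "phpErr": "kevin",
    "SQL": "kevin",
    "unserializableClsoure": "steve",
    "RedisConnectInstance": "steve",
    "EmailWithoutMessageException": "mike",
    "404": "mike",
    "phpOutOfMem": "john",
    "JSError": "joe",
}


def leaderboard(counts, owners=OWNERS, default_owner="unassigned"):
    """Group per-type error counts by owner, largest total first."""
    per_owner = defaultdict(dict)
    for error_type, n in counts.items():
        per_owner[owners.get(error_type, default_owner)][error_type] = n
    ranked = sorted(
        per_owner.items(), key=lambda kv: sum(kv[1].values()), reverse=True
    )
    return [(owner, sum(errs.values()), errs) for owner, errs in ranked]


def format_leaderboard(board):
    """Render the leaderboard roughly in the layout of the sample email."""
    lines = ["LEADERBOARD:", ""]
    for owner, total, errs in board:
        by_size = sorted(errs.items(), key=lambda kv: -kv[1])
        detail = ", ".join(f"{n} {name} errs" for name, n in by_size)
        lines.append(f"\t{owner} : {total} \t({detail})")
    return "\n".join(lines)
```

Anything without an owner lands under "unassigned", which is a handy nudge to go and assign it.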

Next, we display the most prevalent items in the error log. Nothing fancy here, but I now see what the most pressing issues are.

(Again, these error logs have been scrubbed of any actual usage or error data)

LOG SUMMARY (20975 items):

	56468	EmailWithoutMessageException
	56408	unserializableClsoure
	20975	JSError
	2466	404
	800	phpErr
	353	phpOutOfMem
	140	SQL
	2	RedisConnectInstance 
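A summary like this boils down to counting lines per error type. Here is a rough Python sketch of that step. The line layout in the regex is an assumption inferred from the two sample log lines earlier in this post (timestamp, IP, "httpd:", app name, then the error type); your logs will differ.

```python
import re
from collections import Counter

# Assumed layout, inferred from the two sample lines in this post:
#   <month> <day> <time> <ip> httpd: <app> <ErrorType> <rest...>
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+httpd:\s+"
    r"(?P<app>\S+)\s+(?P<type>\S+)"
)


def summarize(lines):
    """Count log lines per error type, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group("type")] += 1
    return counts


def format_summary(counts):
    """Render counts most-frequent-first, one tab-indented row per type."""
    rows = [f"\t{n}\t{name}" for name, n in counts.most_common()]
    total = sum(counts.values())
    return "\n".join([f"LOG SUMMARY ({total} items):", ""] + rows)
```

Calling summarize(open(path)) on a day's log file and passing the result to format_summary gives you the body of the email section above.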

From there, I aim to provide as much actionable detail as possible about each of the critical error types. See the example of MySQL errors below (from which much Ignighter-specific information has been scrubbed).

So, what?

Now that we’re all aware of the volume, priority, and assignee of each type of issue our system is encountering, we’re all much more efficient.

Want to build your own? If you’re any good with awk or graphite, you could probably do the same for your team with a modest investment of an hour or two.

How does your team keep on top of application metrics and logs? Leave a comment below.

If you’re a developer who is looking to work in an efficient, fun environment that empowers you, check out Ignighter’s open positions.

Note: The usage of any information of proprietary value to my employer has been removed, and this post has been approved by my employer.


Startups @ Scale: Log Everything, then you can Manage Anything.

Posted: June 22nd, 2011 | Author: owocki | Filed under: startups | 1 Comment

One thing that hasn’t changed during the span of my time at Ignighter is the importance of our in-house analytics.  Ever since our first lecture at Techstars 2008, when we were prodded to “obsess over core metrics”, we’ve been obsessed with our usage data.  Having the right information on demand is essential to being nimble in your decision making as a management team.

“If you aren’t measuring it, you can’t manage it” – Greg Tisch

As CTO, the responsibility of maintaining our business intelligence infrastructure has fallen to me. So I’ve been logging anything that’s remotely significant to our decision making process.  We’ve been doing this for years and it’s not my first rodeo, but shit, so many things have changed as we’ve scaled.

Many of these things may evolve for your project too.

  1. The business model
  2. The volume of data
  3. The usage patterns
  4. The systems
  5. The technical architecture
  6. The reporting system
  7. The market

When we first started the company, I was logging all of our usage data to our MySQL database.   A DB write (maybe several) on every pageload!? Boy, was that dumb!  It didn’t take long before the site was crippled under the load of our usage. One of the first rules of disk-bound databases is that writes are the most expensive operation you can perform.  Even when you set up master-slave MySQL replication, you cannot scale writes by much, since every write query must also be performed on the slave in order to keep it up to date!

Let me give you a snapshot of how things look nowadays, when we’re on the order of many many millions (maybe more – I’m not at liberty to say) of loggable operations daily.

  1. We’ve built a Logger class which allows us to pass in error, debug, user usage, user statistics, and general info to either the filesystem or into the database.
  2. For flat filesystem logs, we use syslog-ng, an open source syslog implementation, to log operations as they happen.  Since syslog-ng can support up to 150k loggable operations per second, it’s an ideal tool.
  3. For database logs, we use memcached as a buffer.  Basically, what you want is an ‘Aggregated Stat’ class, which has an interface that updates a counter in memcache every time an action happens, then periodically flushes the results to the database.
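The buffering idea in step 3 can be sketched in a few lines of Python. This is not our production class: DictCache is a stand-in for a memcached client so the sketch runs without a server, and sqlite3 stands in for MySQL. The table and column names are borrowed from the query later in this post; everything else (method names, the drain step) is made up for illustration. Note that real memcached cannot list its keys, so a real version would have to track which counters exist.

```python
import sqlite3
import time


class DictCache:
    """Stand-in for a memcached client (assumption: real code would use one)."""

    def __init__(self):
        self._data = {}

    def incr(self, key, delta=1):
        self._data[key] = self._data.get(key, 0) + delta
        return self._data[key]

    def drain(self):
        """Return all counters and reset them."""
        data, self._data = self._data, {}
        return data


class AggregatedStat:
    def __init__(self, cache, db):
        self.cache = cache
        self.db = db
        db.execute(
            "CREATE TABLE IF NOT EXISTS AggregatedStats "
            "(Segment1 TEXT, Segment2 TEXT, Value INTEGER, AddDate INTEGER)"
        )

    def record(self, segment1, segment2, amount=1):
        """Cheap path: bump a counter in the cache, no database write."""
        self.cache.incr((segment1, segment2), amount)

    def flush(self, now=None):
        """Write one row per counter, then reset. Run this periodically."""
        now = int(time.time()) if now is None else now
        rows = [
            (s1, s2, value, now)
            for (s1, s2), value in self.cache.drain().items()
        ]
        with self.db:  # commits on success
            self.db.executemany(
                "INSERT INTO AggregatedStats VALUES (?, ?, ?, ?)", rows
            )
        return len(rows)
```

The point of the design is that a thousand logins between flushes cost a thousand cache increments but only a handful of database writes.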

After this you just need to decide what’s relevant and loggable, and whether to put it in the filesystem or database.  Filesystem logging is more scalable, but there are advantages to having data in our database too: there’s no way to query your flat logs from your application.   For example, in the application, I like being able to know how many times user x has logged in during the past month.  That’s as easy as:

mysql> SELECT SUM(`Value`) as `NumLogins` FROM `AggregatedStats` WHERE `Segment1` = 'LoginsByUser' and `Segment2` = '[uid]' AND `AddDate` > (UNIX_TIMESTAMP() - 60 * 60 * 24 * 30) LIMIT 1

Whereas, with the flat logs, it looks much more like:

$ cat LOG_STATS.log | awk -F'\t' '{print $3$4$5}' | grep LoginsByUser | grep [uid] | wc -l
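If you would rather do the flat-log count from a script, the pipeline above translates to Python almost line for line. This is a sketch under the same assumption as the awk command (tab-separated fields, with the interesting ones in columns 3 to 5); the function name is made up. Like grep, it matches substrings, so a uid of 42 would also match 142.

```python
def count_matches(lines, segment, uid):
    """Count lines whose tab-separated fields 3-5, joined, contain both strings."""
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        joined = "".join(fields[2:5])  # same as awk's $3$4$5
        if segment in joined and uid in joined:
            total += 1
    return total
```

You would call it as count_matches(open("LOG_STATS.log"), "LoginsByUser", uid) with the uid as a string.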

Now I can access data about anything at any time.  This system scales, and it’s nimble enough to handle queries you did not foresee.  Of course, having the ability to view this information does not mean anyone’s actually going to do it.

As a matter of practicality, I’ve found it useful to provide the following tools (and make sure they are blazing fast).  All of them are plain-vanilla open source and 100% FREE too!

  • Nightly email script that rolls up the ERROR and STAT logs, and sends the most interesting tidbits to the team on a nightly basis.
  • Make the data available in our open source graphing system, Graphite.  I’m a huge huge fan of graphite.  Importing data into it is as easy as writing a script that scrapes the flat logs periodically and passes the results into an included python script.  Big ups to Etsy for letting me know about graphite. Check out these sample graphs from their implementation of graphite:
     

    Did I mention that I’m a super-fan of graphite yet?  It’s super nimble, fast, and it scales.  If you choose one tool from this post to implement, choose graphite.

  • Plug the data into your team’s private twitter-bot.
    Here’s a sample tweet.  Note this data is not actual usage data.
  • Make the data available in the admin section of your application.  I’ve found it useful to write queries that I frequently run, give them a name that even the business monkeys can understand, and make them available to everyone via a ‘reports’ section.  (Just kidding Adam and Dan, you’re not monkeys)
  • Make the data available via a board-level reporting system that ONLY includes key metrics.   The exclusivity of this reporting system is what makes it special.  Only the KEY metrics make it into here!


  • I have a data porn (get it, cause data is fun to look at? ;) ) box with several monitors in the office to show me how everything’s been going for the past 24 hours.    I especially like chartbeat for this.
  • Nagios is a great tool for informing your team of Systems issues. Munin allows you to see system-level information (CPU usage, load average, network transfer, swap i/o) over time.
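On the Graphite bullet above: besides the bundled scripts, Graphite's Carbon daemon accepts a plaintext protocol of one "path value timestamp" line per metric, on port 2003 by default. Here is a minimal Python sketch of feeding it that way. The metric names and host are placeholders, and this is not necessarily how our own import script works.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as '<path> <value> <timestamp>' plus a newline."""
    timestamp = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {timestamp}\n"


def send_metrics(metrics, host="localhost", port=2003):
    """Send (path, value, timestamp) tuples to a Carbon plaintext listener.

    host and port are assumptions; 2003 is Carbon's default plaintext port.
    """
    payload = "".join(graphite_line(*m) for m in metrics).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

A cron job that counts error types in the flat logs and calls send_metrics with names like errors.JSError (a made-up naming scheme) is enough to get a graph per error type.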
     

 

Informed decision making made easy!  Watch out, Zoltar: now even we mere mortals can tell you anything about anything.

 

The usual warnings apply.  These are all just ideas and your mileage may vary based upon your technical ability, execution, and your gumption. I’d love to hear what your team uses and how it compares to what I’ve outlined in this post!  Leave me a tweet or a comment below.

Did you know? Ignighter is hiring.  We’re based in NYC, work hard, have a lot of fun, build cool shit, and we’re backed by some of the best investors in the business. Check out our open development positions.


Note: Any information of proprietary value to my employer has been removed or approved, and this post has been approved by my employer.


Know a kick-ass PHP developer in NYC? Ignighter is hiring!

Posted: February 17th, 2011 | Author: owocki | Filed under: startups, Uncategorized | No Comments

Ignighter is fresh off our Series A and is hiring part-time PHP developers in NYC.

We’re a venture-funded team of 6. We’re young, fun, we’ve got some rapid growth in our target market, but we’ve got a chip on our shoulders and a lot of work to do. We’ve been in the game for a few years, and we earned our wings during Techstars Boulder 2008.

Do I fit the profile? If you’re young, hungry, do fantastic work, are looking to make an impact and make some new friends, then you just might be the hacker-ninja-badass we’re looking for. For a list of technical skills, check the deets here: http://newyork.craigslist.org/mnh/eng/2218416800.html

What’s in it for me? Great learning experience and the chance to get hooked into the NYC startup scene. A chance to sharpen your skillz, make a few new friends, and be a part of something big. Oh, and we’ll pay you.

Sounds sweet, what’s next? Send us a blurb about you, your resume, and a link to some of your work. If we like you, we’ll reach out and invite you to our sweet Union Square Office for coffee.

Know someone who might qualify? Bonus points / drinks on me if you pass this post around.

Note: Any information of proprietary value to my employer has been removed or approved, and this post has been approved by my employer.