Browsed by
Category: data

Why Open Source a High Frequency Trader?

I built a cryptocurrency trading framework in January and open sourced it as 'pytrader' two weeks ago. Amid the wave of interest, there were a handful of cynics. npx writes on HN: I can't imagine why anyone would actually want to release a profitable trader. Isn't this kind of a tautologically dumb thing to do? .. I just can't see any motivation to make it profitable and then issue a pull Read the article >>
Data Warehousing #5: Dimensional Design Process

This is the 5th in a series of posts on data warehousing. To see the entire post list, click here. The Dimensional Design Process: The four key decisions made when designing a Kimball-style dimensional star schema are as follows: (1) Identify the Business Process, (2) Identify the Grain, (3) Identify the Dimensions, (4) Identify the Facts. When embarking upon the design process, your final deliverable Read the article >>
Data Warehousing #4: Star Schemas

This is the 4th in a series of posts on data warehousing. To see the entire post list, click here. Star Schemas: A star schema is the simplest type of data mart in dimensional modeling. It consists of one or more fact tables with foreign keys referencing any number of dimension tables and, when viewed through a visual schema planning tool, looks like a star: [star schema diagram, beginning with dim_user] Read the article >>
Data Warehousing #3: Dimensions vs. Facts

This is the 3rd in a series of posts on data warehousing. To see the entire post list, click here. On to the schema design portion of the series! There are two main types of tables used to store information in a data warehouse: Table Type #1: Dimension Table. A dimension table contains the attributes by which users will query your data warehouse. These attributes are the content of the WHERE clause in Read the article >>
HNTrends: Google Trends For Hacker News

I've been brushing up on technologies since putting in my notice at Simple Energy, and I'm proud to announce the launch of my latest product, built in Ruby on Rails with gratuitous reliance on Amazon CloudSearch as my persistence layer: HNTrends.com: Google Trends for Hacker News. TL;DR: check it out here, or click here to download the source data as a Postgres database. The thesis behind this Read the article >>
Data Warehousing #2: Data Flow

This is the 2nd in a series of posts on data warehousing. To see the entire post list, click here. Independent Data Marts: Like many others, I've worked in organizations which have evolved data marts like the one below: These systems are characterized by the need to query multiple independent sources in order to meet a business objective. Not only is this an inefficient way of processing Read the article >>
Data Warehousing For Fun & Profit

Coming from a world of B2C web applications, I am more than familiar with the reporting challenges associated with pulling analytics from transactional databases: (1) profile information is often overwritten when UPDATEs are performed, (2) information is segmented across different systems and, worse, exposed through different data access methods (if you want a headache, try writing an R script to mash up information Read the article >>
The data will set you free. What does Twitter sentiment analysis say about major Airlines?

This Twitter data was scraped in February 2015, and contributors were asked to first classify tweets as positive, negative, or neutral, and then to categorize the reasons for negative tweets (such as "late flight" or "rude service"). Thanks to CrowdFlower for the data. Raw number of tweets, segmented by sentiment. Comments: United had the highest raw volume of negative tweets. US Airways had the highest Read the article >>
Data Scientists Slack Chat Team

datascientists.slack.com is a global community of startup data scientists, data warehousers, and BI types. We currently have 90 members from all over the world: The top skills of the contributors are as follows: If you munge data for a living (or you want to), click here to request an invite. Read the article >>