Working at DataSift

I have worked at DataSift since the beginning of 2011 and a lot has changed since then. Back then we were around 15 employees, all based in a small office in Reading. Since then we have expanded like crazy and now have 120 employees spread over offices in Reading, San Francisco and New York.

So what do we do? We are the only independent reseller of social data in the world. We ingest data from tons of sources, and that list is expanding all the time. For instance, we are one of a handful of companies in the world that have access to firehoses from Twitter, Tumblr, WordPress, Disqus and many more. These streams give us every single public tweet/post/photo/reblog in real-time, equating to tens of thousands of pieces of data per second! Most of these sources are real-time, and we help brands, agencies, researchers, analysts and many other types of companies access all this data to get more information on what their customers, competitors and audiences are doing. This all happens in real-time with sub-second latency from end to end. Some great use cases:

  • Using social media data to help make casting decisions for major movie productions
  • Customer service systems
  • Monitoring performance of TV adverts/programmes

Where are we located?

Engineering remains solely in our Reading office in the UK. There are around 30 of us that write code; we are a pretty close team and have a great time solving problems you won’t see at most other companies. Problems like:

  • How do you make petabytes of data searchable quickly?
  • How do we deliver gigabytes of data into a customer’s data store?
  • How do we filter tens of thousands of tweets, Tumblr posts, Bitly clicks, news articles, WordPress posts and comments in real-time with a latency of under a second from end to end?
  • How do we also resolve every link in those pieces of data?
  • How do we detect the language of every piece of content flowing through the platform?

What do our engineers do?

We solve problems like these with a wide variety of languages. Most of the engineers in the office know at least two of these. Personally I know PHP, JavaScript and Java:

  • PHP – Powers a chunk of our real-time filtering engine, 90% of our Push Delivery system and 90% of our services for managing authentication, etc.
  • JavaScript (Node.js) – Powers our real-time streaming API
  • Java/Scala – Powers our historics cluster (petabytes of tasty datas!) and managed input sources
  • C++ – Powers the core of our filtering engine that processes thousands of pieces of data a second
  • Ruby – Powers lots of devops management tools
  • Python – Gradually sneaking into our devops tools and some other services
  • Go – This too is gradually sneaking onto the platform

What does the DataSift platform look like?

This is what the platform looks like. It has increased in size quite drastically over the time I’ve been here. This should give you an idea of the scale of the platform and how we’re able to handle so much data so quickly and efficiently.

DataSift Architecture

What does my team do on the DataSift platform?

I’m the Engineering Team Lead for Delivery, so we look after all the possible ways we can get data to our customers. This includes our Push data destinations, which are third-party databases, servers and services that we push data to, as well as our Streaming API. On the architecture diagram we are the big red box on the lower right side (and the small blue box below it). Say you want to see a stream of all tweets and Tumblr posts related to the World Cup; we can deliver them to you in many ways:

  • Live real-time HTTP or WebSocket stream
  • Fetch from our API
  • POST to your HTTP server
  • Amazon S3 bucket
  • MySQL database
  • Redis queue or set
  • ElasticSearch index
  • MongoDB collection
  • FTP server for old-school file delivery
  • SFTP for the security-conscious old-school file delivery
  • Splunk Storm/Enterprise
  • Google BigQuery
  • Stream to our HDFS cluster for further analysis
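
To give a flavour of what "many destinations, one stream" can look like in code, here is a minimal sketch in Python. It is not our actual implementation (most of Push Delivery is PHP, as mentioned above); the names `Destination`, `InMemoryDestination`, `JsonLinesDestination` and `push` are all made up for illustration, and the batching behaviour is an assumption. The idea is simply that every destination exposes the same `deliver` method, so the same batches of interactions can be fanned out to any of them.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class Destination(ABC):
    """Hypothetical interface: anything we can hand a batch of interactions to."""

    @abstractmethod
    def deliver(self, interactions: List[Dict]) -> None:
        ...


class InMemoryDestination(Destination):
    """Stand-in for a database-like destination; just remembers what it received."""

    def __init__(self) -> None:
        self.received: List[Dict] = []

    def deliver(self, interactions: List[Dict]) -> None:
        self.received.extend(interactions)


class JsonLinesDestination(Destination):
    """Stand-in for a file-like destination; stores each batch as newline-delimited JSON."""

    def __init__(self) -> None:
        self.payloads: List[str] = []

    def deliver(self, interactions: List[Dict]) -> None:
        lines = (json.dumps(item, sort_keys=True) for item in interactions)
        self.payloads.append("\n".join(lines))


def push(stream: Iterable[Dict], destinations: List[Destination], batch_size: int = 2) -> int:
    """Group the stream into batches and hand each batch to every destination.

    Returns the number of interactions delivered.
    """
    delivered = 0
    batch: List[Dict] = []
    for interaction in stream:
        batch.append(interaction)
        if len(batch) == batch_size:
            for destination in destinations:
                destination.deliver(batch)
            delivered += len(batch)
            batch = []
    if batch:  # flush whatever is left over at the end of the stream
        for destination in destinations:
            destination.deliver(batch)
        delivered += len(batch)
    return delivered
```

Calling `push(stream, [InMemoryDestination(), JsonLinesDestination()])` sends the same batches to both stand-ins. A real system also has to deal with retries, back-pressure and destinations that go offline, none of which this sketch attempts.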

In addition to that, most of the PHP engineers work on the smaller services of the platform that handle tasks such as rate limiting, licensing, billing, authentication, QA tools and monitoring.

How do we have fun?

All of the above is great fun and very satisfying to work on, but we also know how to have fun. Firstly, we are very well taken care of! We get lunch every single day from a local catering company, and we have a huge fridge for a cold drink should you fancy one, plus snacks to keep you going. We also have a massive stack of board games that we play on a dedicated games night (and during lunch!), where we get pizza or takeaway ordered for us. There is a pool table and a table tennis table to blow off steam too!

A few times a year we will have a trip out somewhere. In the last year we’ve been go-karting several times, been to theme parks and had meals out.

What is our office like?

At my desk alone I have a laptop and a desktop (you have a choice of any Linux distro, Mac or Windows), all paid for by DataSift. Our bank of 6 desks has its own monitoring screen with a Raspberry Pi, Apple TV and Chromecast! We either have metrics or a live stream of the International Space Station on ours! In the canteen area we have a 5m wide projector screen, which is perfect for watching the Formula 1 at a weekend with extra monitors rolled in for live timings. We do like to geek out on that sort of stuff.

Can you spot the hidden motive??

If you haven’t guessed by now, we are expanding quickly and need software, test and devops engineers! We have quite a few open positions across various teams, and not just in engineering. If you are interested at all, you can use the links below to see full job descriptions or get in touch with me via Twitter @OllieParsley or ollie[at]datasift[dot]com.
