Working at DataSift


I have worked at DataSift since the beginning of 2011 and a lot has changed since then. Back then we were around 15 employees, all based in a small office in Reading. Since then we have expanded like crazy and now have 120 employees spread over offices in Reading, San Francisco and New York.

So what do we do? We are the only independent reseller of social data in the world. We ingest data from tons of sources, and that list is expanding all the time. For instance, we are one of a handful of companies in the world with access to firehoses from Twitter, Tumblr, WordPress, Disqus and many more. These streams give us every single public tweet/post/photo/reblog in real-time, equating to tens of thousands of pieces of data per second! Most of these sources are real-time, and we help brands, agencies, researchers, analysts and many other types of companies access all this data to get more information on what their customers, competitors and audiences are doing. This all happens in real-time, with sub-second latency from end to end. Some great use cases:

  • Using social media data to help make casting decisions for major movie productions
  • Customer service systems
  • Monitoring performance of TV adverts/programmes


Building a PHP RPM with the OCI PDO Driver on CentOS

If you are reading this post then you are either curious about what it’s about, you are investigating using PHP + Oracle on CentOS and building your own PHP RPM, or you’re familiar with your RPM install complaining about libclntsh.so.11.1. This post should hopefully help you out.

Oracle Instant Client RPM

The first thing to note is that the RPMs that Oracle supply will work fine when building PHP with the --with-pdo-oci=instantclient,/usr/,11.2 flag set. But as soon as you try to install PHP and it brings in the Instant Client dependency (or finds it already installed), you will see the following error:

Requires: libclntsh.so.11.1()(64bit)

You may well, like me, spend hours investigating why this occurs, especially when:

  • You can see the file is installed
  • You’ve symlinked it everywhere you could think of
  • You’ve specified a custom ld.so.conf.d config with the location of the .so
  • You’ve installed the basic, devel and sdk versions of Instant Client

So you’ll be pleased to know that it isn’t your fault at all! The Oracle-built RPMs don’t support the “provides” argument. This means that the file may well be on the filesystem, but if the Instant Client RPM doesn’t explicitly declare that it provides this file then you will have dependency issues. Thanks to this CentOS forum for helping me! So the solution is to rebuild the Oracle Instant Client libraries yourself or find another reputable source. I used this RPM: http://rpm.pbone.net/index.php3/stat/4/idpl/23447083/dir/centos_6/com/oracle-instantclient-11.2.0.2.0-11.2.x86_64.rpm.html. You should notice that the “Provides” section correctly lists libclntsh.so.
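
If you want to check a package before installing it, rpm can print the capabilities it declares. The sketch below is only an illustration: the helper name provides_capability and the package file name in the usage comment are made up for this example. The helper reads a “Provides” list on stdin and succeeds if the capability you need is in it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the "Provides" list on stdin
# mentions the given capability (fixed-string match, not a regex).
provides_capability() {
    grep -Fq -- "$1"
}

# Example usage against a downloaded package (the file name is an assumption):
#   rpm -qp --provides oracle-instantclient-basic.rpm \
#       | provides_capability 'libclntsh.so.11.1()(64bit)' \
#       && echo "package declares libclntsh"
#
# For packages that are already installed, you can ask which one
# (if any) declares the capability:
#   rpm -q --whatprovides 'libclntsh.so.11.1()(64bit)'
```

If the helper fails for the Oracle-supplied RPM but succeeds for the rebuilt one, you have found the cause of the dependency error.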

Building PHP

Now that you have an RPM that works, you can move on to building PHP! All you need to do is set the --with-pdo-oci flag.

./configure --with-pdo-oci=instantclient,/usr/,11.2

Bish bash bosh!

That should be you all sorted :). Hopefully this was useful to you. I pulled my hair out for several days trying to get to the bottom of it. Thankfully it’s all grown back now!

Linux + PHP + PDO -> SQL Server

From the title of this post you might think, why are you doing this? What self-respecting software engineer wants to connect from a Linux distro to a Microsoft SQL Server instance? Well, never you mind!! Quit being nosey!! I thought I’d share my experience of trying to do this, partly to share the knowledge and partly to remind myself in a few years when I need to upgrade or revisit my code. It was an “interesting” and not at all frustrating experience…

Word of warning: you’ll never have as good an experience connecting from Linux to SQL Server as you would with PHP running on Windows. This is the best solution that I could find without resorting to using Windows, ew!

FreeTDS

FreeTDS is a library and a few binaries that allow you to connect to SQL Server and Sybase instances. It is available in most package manager repos. For instance I tried both Ubuntu (apt) and CentOS (yum). All you need to do is install the devel package as we’ll be building PHP from source.

Ubuntu

sudo apt-get install freetds-dev libxml2-dev

CentOS

sudo yum install freetds-devel libxml2-devel

PHP

At the time of writing this I am migrating from PHP 5.3 to PHP 5.5, so I’ll give you instructions for both versions. The first thing to note is that you need to build PHP yourself, as there are several configure flags that need to be set. So let’s get on and install it.

Get the source

wget http://uk3.php.net/distributions/php-5.3.28.tar.bz2
tar -vxf php-5.3.28.tar.bz2
cd php-5.3.28

Configure

Now that we have the source and we’re in the correct directory we need to configure our PHP install. We need to enable both PDO support and the “dblib” PDO driver, which will be using our FreeTDS library under the hood. Please note this will not enable the mssql_* functions; it will just enable us to connect to a SQL Server instance using PDO classes/methods.

./configure --with-pdo-dblib

If you build PHP yourself normally you’ll probably have other configure flags to set. This shouldn’t conflict with any of them and can safely be added to your existing list of arguments.

Make

Well, this step isn’t too difficult:

make
sudo make install
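
Before writing any PHP, it can be worth confirming that the new build actually contains the driver. php -m prints the compiled-in modules, one per line. The helper below, has_pdo_dblib (a name made up for this sketch), just looks for the pdo_dblib line in that output.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the module list on stdin
# (as printed by `php -m`) contains a line reading pdo_dblib.
has_pdo_dblib() {
    grep -qix 'pdo_dblib'
}

# Example usage with the freshly installed binary:
#   php -m | has_pdo_dblib && echo "pdo_dblib is available"
```

If the check fails, the usual suspects are a missing --with-pdo-dblib flag or a different php binary earlier in your PATH.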

Done!

Now we can use PHP to connect to a SQL Server instance:

<?php
try {
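    // The dblib DSN needs an explicit host= key; on Linux the port follows the host after a colon
    $pdo = new PDO("dblib:host=my.database.com:1449;dbname=mydb", "myusername", "mypassword");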
    $result = $pdo->query("select * from mytable");
} catch (PDOException $e) {
    echo "Error: " . $e->getMessage() . "\n";
}
?>

I hope this is useful to you. Please let me know if it was, or if you had any issues.

Using MQTT as a GCM replacement for Android push notifications


The first thing you might think while reading the title is, why would you not want to use Google Cloud Messaging for Android push notifications? Well, let me justify it.

I need to support the Kindle Fire and as many other Android forks as possible, and I’d like to support Android 2.2 and below. Amazon don’t have push notifications yet. At the time of writing they have a beta push notification service, but I can’t wait for an indeterminate amount of time for that to come out of beta before being able to release an app. I also wanted to support one version of the app and not have different versions for the different app marketplaces and Android forks.

To allow push notifications on any Android fork I need a transport mechanism that will work across them all. To do this I researched quite a few alternatives. I happened to come across MQTT and discovered that Facebook use an MQTT background service to pass messages to and from the Facebook and Facebook Messenger apps.

What benefits does MQTT have over GCM?

  • Pub/Sub: A client can subscribe to multiple topics
  • Quality of Service (QoS): GCM is pretty much “fire and forget”. You receive acknowledgement that the message has been received by the server, but not that it has been delivered to the client, so you will never know if the client got the message unless they communicate back to the server (which my code used to do). MQTT has 3 levels of QoS, which vary from “fire and forget” to full acknowledgement that the client received the message.

But there are a few things you might miss if you move away from GCM:

  • The massive Google infrastructure
  • Not having to manage your own long-lived TCP connections in a service

What I ended up doing for my app was using a similar technique to Facebook. I created a new Service that has a permanent connection open to an MQTT broker on a server I host with Linode (which I highly recommend). It receives a message on a topic specific to the device that is connected to it, then it broadcasts a message in a similar way to how GCM does. This meant that the rest of my code needed minimal changes in order to support this new transport. The code was based on an awesome blog post from Dale Lane. I recommend giving that a good read through, as it goes into a lot of detail and includes a full implementation of the service.

If you are interested, the app I am developing that uses this technique is Lockate, Android mobile security software.
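
To get a feel for the per-device topic idea without writing any Android code, you can try it from a shell. This is a rough sketch under a few assumptions that are not from my app: the topic layout devices/<id>/notifications, the device ID abc123, the broker hostname broker.example.com, and the use of the Mosquitto command-line clients (mosquitto_sub and mosquitto_pub).

```shell
#!/bin/sh
# Hypothetical topic layout: one topic per device ID.
device_topic() {
    printf 'devices/%s/notifications' "$1"
}

# Play the part of the device: subscribe to its own topic at QoS 1
# ("at least once"), assuming the Mosquitto clients are installed and
# a broker is reachable at broker.example.com:
#   mosquitto_sub -h broker.example.com -t "$(device_topic abc123)" -q 1
#
# Play the part of the server: publish a notification to that device:
#   mosquitto_pub -h broker.example.com -t "$(device_topic abc123)" -q 1 -m 'hello'
```

The -q option selects the QoS level (0, 1 or 2), so you can try the three levels and see how delivery behaves when the subscriber drops off.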

The blog is back

For several years now I’ve been redirecting my ollieparsley.com domain to my about.me page. I’ve been working on several small projects and wanted a place to talk about them and blog about things I care about, and the easiest way was to sort my site out again. This was partly due to the fact that about.me haven’t implemented the most requested feature of supporting CNAMEs.

Anyway enough of a mini rant, enjoy my blog! :p