Working at DataSift


I have worked at DataSift since the beginning of 2011 and a lot has changed since then. Back then we were around 15 employees, all based in a small office in Reading. We have since expanded like crazy and now have 120 employees spread over offices in Reading, San Francisco and New York.

So what do we do? We are the only independent reseller of social data in the world. We ingest data from tons of sources and that list is expanding all the time. For instance, we are one of a handful of companies in the world with access to firehoses from Twitter, Tumblr, WordPress, Disqus and many more. These streams give us every single public tweet/post/photo/reblog in real-time, equating to tens of thousands of pieces of data per second! Most of these sources are real-time, and we help brands, agencies, researchers, analysts and many other types of companies access all this data to learn more about what their customers, competitors and audiences are doing. This all happens with sub-second latency from end to end. Some great use cases:

  • Using social media data to help make casting decisions for major movie productions
  • Customer service systems
  • Monitoring performance of TV adverts/programmes


Building a PHP RPM with the OCI PDO Driver on CentOS

If you are reading this post then you are either curious about what it’s about, you are investigating using PHP + Oracle on CentOS and building your own PHP RPM, or you’re familiar with your RPM install complaining about libclntsh.so.11.1. This post should hopefully help you out.

Oracle Instant Client RPM

The first thing to note is that the RPMs that Oracle supply will work fine when building PHP with the --with-pdo-oci=instantclient,/usr/,11.2 flag. But as soon as you try to install PHP, whether it brings in the Instant Client as a dependency or the Instant Client is already installed, you will see the following error:

Requires: libclntsh.so.11.1()(64bit)

You may well, like me, spend hours investigating why this occurs, especially when:

  • I can see the file is installed
  • I’ve symlinked it everywhere I could think of
  • I’ve specified a custom ld.so.conf.d config with the location of the .so
  • I’ve installed the basic, devel and sdk versions of Instant Client

So you’ll be pleased to know that it isn’t your fault at all! The Oracle-built RPMs don’t support the “provides” argument. This means that the file may well be on the filesystem, but if the Instant Client RPM doesn’t explicitly declare that it provides this file then you will have dependency issues. Thanks to this CentOS forum for helping me! So the solution is to rebuild the Oracle Instant Client libraries yourself or find another reputable source. I used this RPM: http://rpm.pbone.net/index.php3/stat/4/idpl/23447083/dir/centos_6/com/oracle-instantclient-11.2.0.2.0-11.2.x86_64.rpm.html. You should notice that its “Provides” section correctly lists libclntsh.so.
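If you want to check a package before installing it, you can list the capabilities it declares and look for the library. This is a minimal sketch: `rpm -qp --provides` is the standard rpm query for this, but `provides_lib` is a hypothetical helper name and the package file name in the usage comment is only an example.

```shell
# Report whether a capability list (as printed by `rpm -qp --provides`)
# mentions a given shared library.
# $1 = library name to look for; the capability list is read from stdin.
provides_lib() {
    if grep -qF -- "$1"; then
        echo "provided"
    else
        echo "missing"
    fi
}

# Usage against a downloaded package (example file name, adjust to yours):
#   rpm -qp --provides oracle-instantclient.rpm | provides_lib libclntsh.so.11.1
```

If this reports “missing” for a package that does ship the file, installing PHP on top of it will most likely hit the dependency error described above.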

Building PHP

Now that you have an RPM that works, you can move on to building PHP! All you need to do is set the --with-pdo-oci flag.

./configure --with-pdo-oci=instantclient,/usr/,11.2

Bish bash bosh!

That should be you all sorted :). Hopefully this was useful to you. I pulled my hair out for several days trying to get to the bottom of it. Thankfully it’s all grown back now!

Linux + PHP + PDO -> SQL Server

From the title of this post you might think, why are you doing this? What self-respecting software engineer wants to connect from a Linux distro to a Microsoft SQL Server instance? Well, never you mind!! Quit being nosey!! I thought I’d share my experience of trying to do this, partly to share the knowledge and partly to remind myself in a few years when I need to upgrade or revisit my code. It was an “interesting” and not at all frustrating experience…

Word of warning: you’ll never have as good an experience connecting from Linux to SQL Server as you would with PHP running on Windows. This is the best solution that I could find without resorting to using Windows (ew!).

FreeTDS

FreeTDS is a library and a few binaries that allow you to connect to SQL Server and Sybase instances. It is available in most package manager repos. For instance I tried both Ubuntu (apt) and CentOS (yum). All you need to do is install the devel package as we’ll be building PHP from source.

Ubuntu

sudo apt-get install freetds-dev libxml2-dev

CentOS

sudo yum install freetds-devel libxml2-devel

PHP

At the time of writing this I am migrating from PHP 5.3 to PHP 5.5, so I’ll give you instructions for both versions. The first thing to note is that you need to build PHP yourself, as there are several configure flags that need to be set. So let’s get on and install it.

Get the source

wget http://uk3.php.net/distributions/php-5.3.28.tar.bz2
tar -vxf php-5.3.28.tar.bz2
cd php-5.3.28

Configure

Now that we have the source and we’re in the correct directory, we need to configure our PHP install. We need to enable both PDO support and the “dblib” PDO driver, which will use our FreeTDS library under the hood. Please note this will not enable the mssql_* functions; it will just enable us to connect to a SQL Server instance using PDO classes/methods.

./configure --with-pdo-dblib

If you build PHP yourself normally you’ll probably have other configure flags to set. This shouldn’t conflict with any of them and can safely be added to your existing list of arguments.

Make

Well, this step isn’t too difficult:

make
sudo make install
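Before writing any PHP, it can be worth confirming that the new build picked up the driver. The sketch below assumes the module list comes from `php -m`, which prints one loaded module per line; `driver_loaded` is a hypothetical helper name, not part of PHP.

```shell
# Report whether a PDO driver appears in a module list such as `php -m` output.
# $1 = driver name, e.g. pdo_dblib; the module list is read from stdin.
driver_loaded() {
    # -x matches whole lines only, -i ignores case, -F treats the name literally
    if grep -qixF -- "$1"; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

# Usage with the freshly installed binary:
#   php -m | driver_loaded pdo_dblib
```

If the driver shows as “not loaded”, the first things to check are that the right `php` binary is on your PATH and that the FreeTDS devel package was installed before running configure.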

Done!

Now we can use PHP to connect to a SQL Server instance:

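<?php
try {
    // The dblib DSN needs the host= key; on Linux the port follows the host after a colon
    $pdo = new PDO("dblib:host=my.database.com:1449;dbname=mydb", "myusername", "mypassword");
    // Throw exceptions on query errors too, so the catch block below sees them
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $result = $pdo->query("select * from mytable");
} catch (PDOException $e) {
    echo "Error: " . $e->getMessage() . "\n";
}
?>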

I hope this is useful to you. Please let me know if it was, or if you had any issues.