Building a time series database: 10^12 rows and counting

AHL is a systematic hedge fund where data is central to the business. Challenged by performance and scalability problems when storing and retrieving time series data using traditional data stores, we built our own.

Arctic is the result of that work. It’s a high performance time series column store built with Python on MongoDB. With compression and chunking, Arctic gives query performance orders of magnitude better than commercial (and open source) dedicated time series databases. We ingest 800M ticks per day, and read data at millions of rows per second (in pure Python). Our aim is to efficiently ship data to cheap compute, rather than run all computation on expensive (in software/hardware terms) dedicated database servers.

The talk explores the solution space of existing time series data stores, and the route we’ve taken to build a simple library with a beautiful API for numeric data storage.

What will I learn?

How simple ideas can be used to build a high throughput time series processing system using off-the-shelf open source components, and how important the data model is when scaling a system.

About James Blackburn

James is the lead technologist on the market data platform at Man AHL. Recently he’s been involved in open sourcing AHL’s time series database: Arctic.

James Blackburn

Lead Technologist at Man AHL

From antiquated to engineer: Today's DBA

The entire world of IT is shifting, and the job of database administration is rapidly losing relevancy. In this talk, Laine discusses the job and role of the Database Engineer, their place in the worlds of reliability engineering and devops, and the skills needed to stay ahead of the curve. By the end of this, you should have a feel for what is required of today’s database engineer, and a path to get there.

What will I learn?

The career path for today’s DBA: which skills need to be developed, focused and broadened, and which paradigm shifts in IT are changing the job of the database engineer.

About Laine Campbell

Laine is currently the CTO of OrderWithMe, formerly AVP of Pythian’s open-source database practice, CEO and co-founder of Blackbird, and a founder of PalominoDB. Laine has been an Oracle, MySQL and Cassandra DBA architect and designer for 16 years with such organizations as Obama for America, Travelocity, Zappos, Chegg, LiveJournal, Disney Mobile, and Adobe. Laine is also an open-source proponent, and advocate for bringing technology, job opportunities, and privileges to underserved populations.

Laine is co-author of O’Reilly Media’s Databases at Scale.

Laine Campbell

CTO at OrderWithMe

PostgreSQL is YeSQL!

The database landscape has changed a lot in recent years. The NoSQL movement has taken the world by storm and you may wonder if there is still room for relational databases. In this talk we will learn about the strengths that make PostgreSQL more relevant than ever. We’ll survey its architecture, availability tradeoffs, durability with one or more servers, and yes, even SQL!

What will I learn?

From this talk you will learn to appreciate how versatile SQL really is and the many use-cases it can be applied to and solve elegantly. You will also be introduced to the basics of PostgreSQL High Availability and Scaling, a must-know in this time of high availability.

About Dimitri Fontaine

Dimitri is a PostgreSQL Major Contributor (Extensions, Event Triggers, Bi Directional Replication). Dimitri also develops pgloader and other PostgreSQL related software.

Dimitri Fontaine

CEO at 2ndQuadrant France

Getting data out of databases: a surprisingly tricky problem

Writing to a database is easy, but getting the data out again is surprisingly hard.

Of course, if you just want to query the database and get some results, that’s fine. But what if you want a copy of your database contents in some other system — for example, to make it searchable in Elasticsearch, or to pre-fill caches so that they’re nice and fast, or to load it into a data warehouse for analytics, or if you want to migrate to a different database technology?

As the data is constantly changing, a one-off snapshot of the database is not enough: you need to tap into the ongoing stream of writes to the database. This technique is called Change Data Capture (CDC). At companies like LinkedIn and Facebook, this is how caches and indexes are kept up-to-date.

This talk explains why change data capture is so useful, and how it prevents race conditions and other ugly problems. Martin will explore the practical details of implementing CDC with PostgreSQL and Apache Kafka, and discuss the approaches you can use to do the same with various other databases.

What will I learn?

How you can use change data capture to reliably keep several databases, indexes and caches in sync.

About Martin Kleppmann

Martin co-founded Rapportive, worked on scalable data systems at LinkedIn, and is writing an O’Reilly book on Data-Intensive Applications.

Martin Kleppmann

Author of @intensivedata. Committer at @samzastream.

Upgrade your database: without losing your data, your perf or your mind

Upgrading databases can be terrifying and perilous, and for good reason: you can totally screw yourself! Every workload is unique and standardised test suites will never give you enough information to evaluate how an upgrade will perform for your query set. We will talk about how paranoid you should be about various types of workloads and upgrades, how to balance risk vs engineering effort, and how to safely execute the most challenging upgrades by capturing and replaying real production workloads. The principles apply to any db, but we’ll go particularly deep into war stories and tooling options for MongoDB and MySQL.

What will I learn?

You’ll learn how to evaluate the riskiness of any db upgrade and migration, as well as how to realistically assess your organisational appetite for risk. You will also learn how to gain confidence in really scary, high-risk upgrades using strategies like shadowing production traffic or capturing and replaying workloads offline.

About Charity Majors

Charity is an Engineering Manager at Parse/Facebook, the best way to build great mobile apps. She loves whiskey and taming chaos.

Charity Majors

Production Engineering Manager at Parse / Facebook

Break your database before it breaks you

As Uber scales up fast, we’ve run into problems keeping all of our databases working reliably. Our solution has been to adopt Chaos Monkey-style failure testing for all production systems, even databases. This talk will cover our experience with production database failures, failure testing, and the fault-tolerant systems we are building to resist these failures.

What will I learn?

After this talk, you’ll be able to better assess the risk from the different failure modes of databases in your system’s architecture.

About Matt Ranney

Matt works on architecture, performance, and distributed systems at Uber. Before Uber, he was co-founder of Voxer.

Matt Ranney

Senior Staff Engineer at Uber

Do what matters, not what's shiny and new

When considering the next improvement in your infrastructure, it’s easy to forget that significant business value can be created by making unglamorous changes that introduce no new technologies or involve potentially hazardous upgrades. This talk is the story of a database migration of Intercom’s customer data stored in MongoDB databases from a third party provider to self-managed infrastructure hosted inside Amazon EC2, and shows how a relatively drab data migration significantly improved both our bottom line and security posture – both of which definitely matter.

What will I learn?

Along with some low-level MongoDB details, you’ll gain insight into prioritising and executing work to maximise business impact.

About Brian Scanlan

Brian Scanlan is an engineer with Intercom, based out of Dublin, Ireland. He works on their platform team, processing user data and building and operating Intercom’s APIs, SDKs and integrations. He tends to work somewhere in the overlap of systems engineering, software development and fixing large scale outages.

Brian Scanlan

Engineer at Intercom

Adventures in building your own database

Bloomberg LP began development 11 years ago on an internally used database system called Comdb2. In this talk, Alex Scotti, the original author and head of this project will give insight into decisions that led us down this path at the time, and lessons we have learned going through this effort. We will discuss the changes in the landscape of database products that occurred in the background during this time and see if the problems we initially set out to solve are no longer such hard problems.

What will I learn?

From this talk, you will gain insight into the high level architecture of Bloomberg’s database system, and the specific types of problems Bloomberg faced (and still does) which make custom implementation a viable strategy.

About Alex Scotti

Alex Scotti is the original architect and programmer of Bloomberg’s proprietary Comdb2 database system.

Scaling MySQL at Facebook

In this talk Rongrong Zhong will give a brief introduction to the MySQL deployment at Facebook: what problems they solved in order to run their database at large scale, how they achieved this, and some improvements they contributed to the MySQL project. Rongrong will also cover case studies about things that seem simple at small scale but bring interesting problems as the scale grows.

What will I learn?

How to scale up and manage your database deployments, as well as potential problems when running a system at large scale that you should watch out for.

About Rongrong Zhong

Rongrong works on the WebScaleSQL team to improve storage efficiency, and solve other problems to make MySQL more manageable and performant at Facebook.

Rongrong Zhong

Software Engineer at Facebook

Organising committee

Ines Sombra

Engineer at Fastly and board member of Ruby Together.
Simon Metson

Engineer at IBM Cloudant and a project committer on Apache CouchDB.
Ruth Yarnit

Managing Director of White October Events, organisers of developer conferences and training.

Sponsors

If you’d like to get involved in supporting All Your Base, please request a sponsor pack.