
A venture in continuous integration

Green lines are code deploys and red ones are requests going to a part of the cluster. I had a big deploy today, and I'm glad to come out of it relatively unscathed.
  • Frameworks now use Composer packages, enabling more code reuse
  • Updated structural framework, base classes
  • Relatively smart and simple dependency injection
  • Proper exception handling and notification

Obviously there was some breakage. I had to fix about five incorrectly implemented interfaces, only one of which affected live work (a 3-minute fix in production!). There were some conflicts with legacy systems like WAP, and a few other breaks due to structural inconsistencies created over the years. Most of it was a bunch of quick fixes. Oddly, most of the breaks traced back to one quick fix made a long, long time ago. In the past we tried to be clever and catch a few programmer errors instead of enforcing a bit of strictness, and now that we enforce this strictness there was a bit of code to clean up. Some bad code was caught before the deploy and, unfortunately due to fragmentation, some was caught after. Thanks to the changes implemented with the deploy, we'll be able to run more unassisted deployment CI tasks, which will mainly save us time. And who doesn't like to use their time creatively? - Tit Petric

Advice

If I could give advice to any software developer it would be this: plan your software wisely. If the client needs new features, think about them. Don't just think about whether they scale, think about where the project is going. Think about the work you'll do now, and the work you or someone else might have to do down the line because of your decisions. Having a procedure for scaling the work the client sees is better than having to scale the part of your software the client never gets to see. Enable your client, don't let your client disable you. Sometimes rebuilding is the only option - and if it comes to that, you most certainly didn't follow my advice up to this point. Learn from your mistakes and rebuild your ruins well. - Tit Petric

API Backpressure

It's days like this when I love my job. I'm implementing backpressure for API calls that communicate with an external service. Depending on how overloaded the external service is, my API interface adapts to use caching more extensively, or to use a service-friendly request retry strategy, minimizing the impact on infrastructure and possibly even resolving problems as they occur. This is done by keeping track of a lot of data - timeouts, error responses, request durations, ratios between failed and successful requests,... It's a nice day when I have problems like this. - Tit Petric
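A minimal sketch of how such adaptive behaviour could be tracked; the class name, thresholds and strategy labels below are illustrative assumptions, not the actual implementation:

    <?php
    // Hypothetical sketch: track failures and durations for an upstream service
    // and derive a strategy (normal, longer cache TTLs, serve from cache).
    class BackpressureMonitor
    {
        private $failures = 0;
        private $requests = 0;
        private $totalDuration = 0.0;

        public function record($durationSeconds, $success)
        {
            $this->requests++;
            $this->totalDuration += $durationSeconds;
            if (!$success) {
                $this->failures++;
            }
        }

        public function failureRatio()
        {
            return $this->requests > 0 ? $this->failures / $this->requests : 0.0;
        }

        public function averageDuration()
        {
            return $this->requests > 0 ? $this->totalDuration / $this->requests : 0.0;
        }

        // The thresholds are made up; they would be tuned against real traffic.
        public function strategy()
        {
            if ($this->failureRatio() > 0.5) {
                return 'serve-from-cache'; // upstream is struggling, stop hitting it
            }
            if ($this->averageDuration() > 0.5) {
                return 'extend-cache-ttl'; // slow responses, cache more aggressively
            }
            return 'normal';               // healthy upstream, business as usual
        }
    }

In practice counters like these would live somewhere shared, such as Redis or APC, so every PHP worker sees the same picture, and a retry strategy would add jitter and an upper bound on attempts.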

API development methodology

Let's say you're writing an API service. You need this API to be highly available, distributed, fast... the requirements are several pages long. The problem with API calls is that one call might not use the same cache objects as another, or might use different data sources due to partitioning or other technical reasons. A typical PHP programmer might just create an instance of every cache class, database class and anything else they might need. This way, typical PHP applications end up creating objects which are never used during the course of execution. You can make some assumptions that optimize a few of these cases away, but you usually have overhead. Your API service needs to be fast. I'm trying to keep everything below 10ms, with access to data from SQL databases and cache servers. Connecting to them takes time. I usually don't hit the 10ms goal without bypassing PHP altogether. My (old but current) API layer currently pushes everything out at 22ms and 30ms, at the 50th and 95th percentile. You live and learn: instantiating database objects, cache objects and other things I don't need takes valuable time away from creating and using only what I do need. So:
  • I needed a way to create and connect only to those services I actually need to run. If I only need to connect to Redis, I don't need to connect to MySQL as well (a sketch of this follows after the list).
  • Leverage PHP language constructs to this end. Injecting dependencies should be language driven, not programmer driven.
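A minimal sketch of the "connect only when used" idea, with made-up names; services are registered as closures and nothing connects until the first time it is asked for:

    <?php
    // Hypothetical sketch: lazily instantiated services. The connection to a
    // backend is only opened the first time the service is requested.
    class ServiceContainer
    {
        private $factories = array();
        private $instances = array();

        public function register($name, $factory)
        {
            $this->factories[$name] = $factory;
        }

        public function get($name)
        {
            if (!isset($this->instances[$name])) {
                $this->instances[$name] = call_user_func($this->factories[$name]);
            }
            return $this->instances[$name];
        }
    }

    $container = new ServiceContainer();
    $container->register('redis', function () {
        $redis = new Redis();
        $redis->connect('127.0.0.1'); // happens only on the first get('redis')
        return $redis;
    });
    $container->register('db', function () {
        return new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'pass');
    });

    // An API call that only touches Redis never opens the MySQL connection.
    $count = $container->get('redis')->get('comments:123');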

I want you to think about this. Language driven vs. programmer driven - this is the main point. With programmer driven dependency injection, your programmers will end up spending their time writing methods that get, set and report missing dependencies. It is a flawed concept from the beginning, as it introduces human error into the equation. The programmer can and will forget to pass a dependency into an object, causing a fatal error which doesn't trigger an exception that could be caught. The problems mount - the programmer needs attention to detail and discipline, and spends time writing boilerplate code instead of developing new features. You can reduce this risk by implementing your dependencies as language driven constructs. PHP 5.4 introduced traits, which can be leveraged to infer the dependencies to be injected into objects after they are instantiated. A typical execution flow would then be:
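One way this flow could look, sketched with made-up trait names and a class_uses() based injector (reusing the hypothetical ServiceContainer from the sketch above): instantiate the object, inspect which traits it uses, and inject the matching dependencies.

    <?php
    // Hypothetical sketch: traits declare which dependencies an object needs,
    // and the injector inspects the used traits after instantiation.
    trait NeedsRedis
    {
        protected $redis;
        public function setRedis(Redis $redis) { $this->redis = $redis; }
    }

    trait NeedsDatabase
    {
        protected $db;
        public function setDatabase(PDO $db) { $this->db = $db; }
    }

    class NewsApi
    {
        use NeedsRedis; // never declares MySQL, so MySQL is never connected

        public function commentCount($newsId)
        {
            return (int)$this->redis->get('comments:' . $newsId);
        }
    }

    class Injector
    {
        private $container;

        public function __construct(ServiceContainer $container)
        {
            $this->container = $container;
        }

        // Language driven: class_uses() reports the traits, and therefore
        // the dependencies, that the object itself declares.
        public function resolve($object)
        {
            $traits = class_uses($object);
            if (isset($traits['NeedsRedis'])) {
                $object->setRedis($this->container->get('redis'));
            }
            if (isset($traits['NeedsDatabase'])) {
                $object->setDatabase($this->container->get('db'));
            }
            return $object;
        }
    }

    $injector = new Injector($container);
    $api = $injector->resolve(new NewsApi());

Forgetting a dependency becomes much harder this way: the class states what it needs by using a trait, and the injector fills it in, instead of every call site having to pass the right objects along.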

On PSR-0 standards, namespaces and code (re-)use

When it comes to working on several projects with different people, having a set of standards to dictate code use and code re-use is a good thing. PSR-0 is one such accepted standard. It took me a while to realize, as these things often do, that many useful improvements to PHP mean a few steps forward and a few steps back. I'm going to try to list a few patterns which cause some conflict when implementing PSR-0.
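For reference, the heart of PSR-0 is a deterministic mapping from a fully qualified class name to a file path; a minimal autoloader in the spirit of the example that accompanied the standard looks roughly like this:

    <?php
    // PSR-0 mapping: \Vendor\Package\Class_Name => Vendor/Package/Class/Name.php
    spl_autoload_register(function ($className) {
        $className = ltrim($className, '\\');
        $fileName  = '';
        if ($lastNsPos = strrpos($className, '\\')) {
            $namespace = substr($className, 0, $lastNsPos);
            $className = substr($className, $lastNsPos + 1);
            $fileName  = str_replace('\\', DIRECTORY_SEPARATOR, $namespace) . DIRECTORY_SEPARATOR;
        }
        // Underscores in the class name (but not in the namespace) also map to directories.
        $fileName .= str_replace('_', DIRECTORY_SEPARATOR, $className) . '.php';
        require $fileName;
    });

Conflicts tend to show up around exactly this mapping, for example when legacy underscore-named classes don't live where the autoloader expects to find them.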

Batch resolving of promises

I tend to have a lot of development ideas stemming from repetitive workloads or from an optimization standpoint, and I tend to obsess over inefficient code structures in both. I've literally had dreams that provided me with answers which I implemented during the day. If only we could code at night during sleep. In retrospect, the Pareto principle applied to that subconsciously-influenced code base: 80% of its usage was fine, and 20% was outside the scope I was trying to solve and introduced other problems. More about that some other time.

We have a fairly complex setup over at RTV Slovenia. The landing page, which takes the majority of all requests, is constructed from a variety of data sources. There are news items, comment counts, menu items, static content, recent items in the social section of the web site, video news and a lot of relationships that make the whole mess the most complex part of the web site by a wide margin. There is little chance of rewriting it, but it's an interesting logical problem: how to optimize it without throwing it away.

One of the common optimization techniques we use is to group our data together when possible. Given the traditional programming flow, this is sometimes quite tedious to optimize globally; instead the optimizations happen on lower levels - display items for one section, fetch all items, fetch all related comment counts, fetch all related video news,... But we display about 10 sections of news items. We could fetch all news items in one bulked request, but it would take some significant refactoring, and there are still the other data sources we would need to worry about.

The concept of futures and promises seems to be a good solution to this model. A Promise is defined as a deferred value - in practice, an object whose value can be resolved at a later time. Seems perfect; all we need to add to extend this model is:
  1) Promise relationships
  2) Batch resolving of multiple promises

When I say "relationships" I'm trying to approach this from a data driven standpoint, and not program flow per se. I don't want to use the then() keyword to trigger resolving of promises, and I don't want to keep track of the promises in a sea of closures when they resolve. A Promise containing more Promises, containing more Promises, is a good way to specify relationships between promises. It's not that simple - you still need some program flow when creating promises - but this is nothing that can't be solved with a getPromises() method and some recursion. A Promise defines a set of Promises to be resolved; a news item would define a comment count promise and return it here.

We stray from the traditional use of Promises here. Using Promise objects in this way gives us the ability to batch-resolve promises, which you don't get from the common Promise programming patterns, while still maintaining the data relationships between them. All that is left to do is resolve the promises in the final data tree. My approach was to traverse the tree and reference each promise by class name in a final list. This way it was possible to resolve a list of promises using a resolveAll($promises) method defined in a specific promise class. This is the batching function which takes all the promises of the same type and resolves them using one function call. This function takes care of fetching the data and resolving the promises. You would do this in MySQL by using a query with the SET type, or you could use memcache::get or redis::mget. You can check out my attempt at a solution here: https://github.com/titpetric/research-projects/tree/master/php-resolve

So, while the landing page would still need significant refactoring, this is a step in the right direction. The resulting data tree is perfect because it is resolved with no data duplication and the maximum amount of batching. Whatever data source you use, chances are it would only add one SQL query to get all the results. Optimizing one SQL query is much easier than having to optimize 20 of them over your complete application stack. It is also nice to reduce the number of SQL queries you're working with in case you need to implement sharding, move the database or make other data management changes.

Additional thoughts: the approach is sequential, and you're given your data tree directly after execution of the resolve / resolveAll calls. There is an opportunity to fetch data asynchronously, depending on the source of your data. If you're consuming API responses over HTTP, SQL queries over MySQL or any kind of data over a non-blocking connection, the resolving could be adapted to take advantage of this. Fetching the data in such a way is a nice optimization, but it needs to be implemented over your complete MVC solution to really take advantage of the benefits. The goal is to come as close as possible to complete coverage, so none of your data calls get duplicated. Some thought needs to be put into how your MVC framework can live with this data model, and where it should be avoided. The thing to keep in mind is that this is basically an efficient model for fetching data while keeping the relationships between data. It is somewhat a superset of DAO / DAL logic, since it approaches the data from a global viewpoint, and not a specific data structure viewpoint.

P.S. A significant pitfall here is also the PHP engine. I'm sure the performance could/would increase dramatically if this was running in a JVM. While the benchmark is not bad, the 95th percentile shows significant overhead in the initial runs, before PHP does some of its pre-allocation magic to speed things up.
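A rough sketch of the idea, not the actual php-resolve code; the getPromises() and resolveAll() names mirror the description above, everything else is an assumption:

    <?php
    // Hypothetical sketch of batch-resolvable promises. Each promise type knows
    // how to resolve a whole group of its instances in one round trip.
    abstract class Promise
    {
        public $value;

        // Child promises this value depends on, e.g. a news item returning
        // its comment count promise here.
        public function getPromises()
        {
            return array();
        }
    }

    class CommentCountPromise extends Promise
    {
        public $newsId;

        public function __construct($newsId)
        {
            $this->newsId = $newsId;
        }

        // Batching function: one MGET for every promise of this type.
        public static function resolveAll(Redis $redis, array $promises)
        {
            $keys = array();
            foreach ($promises as $promise) {
                $keys[] = 'comments:' . $promise->newsId;
            }
            $counts = $redis->mGet($keys);
            foreach ($promises as $i => $promise) {
                $promise->value = (int)$counts[$i];
            }
        }
    }

    // Walk the data tree and group promises by class, so each group can be
    // resolved with a single resolveAll() call.
    function collectPromises(Promise $promise, array &$grouped)
    {
        $grouped[get_class($promise)][] = $promise;
        foreach ($promise->getPromises() as $child) {
            collectPromises($child, $grouped);
        }
    }

Resolving a group then costs one call per promise type - a single MGET here, or for example a single SQL query with an IN() list - instead of one lookup per item on the page.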