Welcome!

Machine Learning Authors: William Schmarzo, Kevin Jackson, Stackify Blog, Elizabeth White, Pat Romanski

Related Topics: Ruby-On-Rails, Java IoT, Industrial IoT, Microservices Expo, Open Source Cloud, Machine Learning , PHP

Ruby-On-Rails: Article

Monitoring Background Jobs in Ruby’s Resque

How to get visibility into an important component of any complex system: the messaging queue

Here at AppNeta, we get to see a lot about how people build their web applications. From simple PHP scripts to heavily service-oriented Java clouds to monolithic Django apps, everybody’s product is architected a little differently. We’re still out to trace everything, and today I want to talk how to get visibility into an important component of any complex system: the messaging queue. Specifically, let’s look at how to trace a job from Rails using Resque.

Messaging Queues
If you haven’t used a messaging queue in your app, the idea is simple. Instead of forcing all the work to happen during the request, while the user is waiting, you can delay some of the more time-consuming tasks. You can do anything in these tasks, ranging from a simple insert to kicking off a series of user analytics that touch all parts of your infrastructure. The advantage is that you can return a speedy response to the user, or, if they are actually waiting on the task results, give them a better loading interface than a white screen and browser loading bar.

A Quick Resque Tutorial
In Ruby, Resque is a task runner, which by default stores the task descriptions in Redis (though other options are available). Resque jobs are just Ruby classes, with a single mandatory method perform. Resque will call perform with the arguments given in the task description. Let’s look at a minimal task, that takes a single argument and prints it. (Useless, I know.)

LogInfo

The @queue variable defines a name that a worker can bind to, in case you want to spread different types of jobs across different machines. To create a task that this worker could run, we just call it from our request:

HoorayTrace

And that’s our job! Maybe not the most interesting job, and probably not prone to performance issues, but we don’t know that yet. So let’s measure it!

Tracing a Resque Task
Now that we’ve added this to our system, we should have monitoring around it. The easiest way to do this would be to just measure the time each task takes, and log that information:

MeasureLog

Unfortunately, the data presentation here leaves a bit to be desired, so I’m going to use TraceView to log this information instead. This also has the benefit of logging any SQL queries, cache accesses, or service calls that we might do in a more complex task, as well as reporting errors. To start a trace fresh, we can wrap this call in the start_trace block:

That’s a start! We’ve now got some visibility into our Resque jobs, and we can rest easy knowing that this is running smoothly in production.

Tracing a Resque Job (with multiple tasks!)
For cron-style jobs, the approach of tracing each task individually works fine. For reference, let’s look at the events we’re generating with that code:

Tracing Resque

Pretty straightforward. Now let’s consider a more complicated set of tasks: a document-processing pipeline. That code might look like this:

ProcessPipeline

In this case, our first task takes a document, and the second one archives it. If we have multiple tasks, each one gets logged separately, and we can figure out same statistics for each — average, std. dev., percentiles, and the like. But what if you have a job that spans multiple tasks? We can further aggregate the stats, but we might be starting to miss things, like large inputs that cause the entire pipeline to slow down.

What we’d really like is to correlate the related tasks, so instead of timing the each task, we’re timing the entire job. Under the hood, TraceView generates a token for each request. If we pass this ID (generally stored in xtrace, after the X-Trace header it’s passed around in) to each task, we can correlate those timings before storing them, and retrieve them all together. To do this, we can modify each task to take this token, and trace using that ID. ProcessDoc then becomes:

ProcessDoc

Now we need to start the trace somewhere, but we’re not doing it in the job. We could start it in the first task, or we could link this one step further up the chain and tie it back to the web request that started it in the first place. In a default rails stack, that request generates the following events:

Tracing Resque

To add in the task queue call to the logged request, we can call the following function:

TaskQueue-LoggedRequest

We have to force a fork in the execution path to indicate that we’re running an asynchronous task, possibly in parallel, with the rest of the web request, which is done with the call to fromString. Aside from that, this is the same underlying call as is done by start_trace above — log that we’re entering a named block of code, and start timing it.

When we put it all together, we get a secondary execution path attached to the web request, and the logged events look like this:

Tracing Resque

Now we’ve got everything: the original request, all tasks, individual timing information, and a global view of how the process performed. Not that we now have an additional timing measurement here: the delay before starting the task at all. In this case, we waited a full 500ms between queuing the job actually executing it! Once we were in the pipeline, the tasks happened much faster (only 25ms between processing and archiving).

Caveats
Lest you think that everything was easy, there’s a couple things to keep in mind when you use this in your own application.

  • Because we’re starting the timing in the web request and ending it in a task queue, we’re relying on those two processes to have an identical clock. If they’re on the same machine, it won’t be a problem, but on different machines, any clock skew will effect the timing.
  • I’ve quietly assumed everything in this system is reliable, which is almost certainly wrong. Whatever your error handling is, make sure you always log the exit event for ‘job’, or you may never know that you have errors!

As long as I haven’t totally dissuaded you from trying this out, all the code is available in one place in this gist, and you can try in out in your application today with our free version of TraceView!

Related Articles

Ruby 2.0 Released: Let The Tracing Begin!

AppNeta Rubygems Verified

Relieve Event Binding Aches in Backbone.js

More Stories By TR Jordan

A veteran of MIT’s Lincoln Labs, TR is a reformed physicist and full-stack hacker – for some limited definition of full stack. After a few years as Software Development Lead with Thermopylae Science and Techology, he left to join Tracelytics as its first engineer. Following Tracelytics merger with AppNeta, TR was tapped to run all of its developer and market evangelism efforts. TR still harbors a not-so-secret love for Matlab-esque graphs and half-baked statistics, as well as elegant and highly-performant code. Read more of his articles at www.appneta.com/blog or visit www.appneta.com.

@CloudExpo Stories
"ZeroStack is a startup in Silicon Valley. We're solving a very interesting problem around bringing public cloud convenience with private cloud control for enterprises and mid-size companies," explained Kamesh Pemmaraju, VP of Product Management at ZeroStack, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
In his session at 21st Cloud Expo, Carl J. Levine, Senior Technical Evangelist for NS1, will objectively discuss how DNS is used to solve Digital Transformation challenges in large SaaS applications, CDNs, AdTech platforms, and other demanding use cases. Carl J. Levine is the Senior Technical Evangelist for NS1. A veteran of the Internet Infrastructure space, he has over a decade of experience with startups, networking protocols and Internet infrastructure, combined with the unique ability to it...
"Codigm is based on the cloud and we are here to explore marketing opportunities in America. Our mission is to make an ecosystem of the SW environment that anyone can understand, learn, teach, and develop the SW on the cloud," explained Sung Tae Ryu, CEO of Codigm, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
High-velocity engineering teams are applying not only continuous delivery processes, but also lessons in experimentation from established leaders like Amazon, Netflix, and Facebook. These companies have made experimentation a foundation for their release processes, allowing them to try out major feature releases and redesigns within smaller groups before making them broadly available. In his session at 21st Cloud Expo, Brian Lucas, Senior Staff Engineer at Optimizely, discussed how by using ne...
"There's plenty of bandwidth out there but it's never in the right place. So what Cedexis does is uses data to work out the best pathways to get data from the origin to the person who wants to get it," explained Simon Jones, Evangelist and Head of Marketing at Cedexis, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"Cloud Academy is an enterprise training platform for the cloud, specifically public clouds. We offer guided learning experiences on AWS, Azure, Google Cloud and all the surrounding methodologies and technologies that you need to know and your teams need to know in order to leverage the full benefits of the cloud," explained Alex Brower, VP of Marketing at Cloud Academy, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clar...
Large industrial manufacturing organizations are adopting the agile principles of cloud software companies. The industrial manufacturing development process has not scaled over time. Now that design CAD teams are geographically distributed, centralizing their work is key. With large multi-gigabyte projects, outdated tools have stifled industrial team agility, time-to-market milestones, and impacted P&L stakeholders.
Gemini is Yahoo’s native and search advertising platform. To ensure the quality of a complex distributed system that spans multiple products and components and across various desktop websites and mobile app and web experiences – both Yahoo owned and operated and third-party syndication (supply), with complex interaction with more than a billion users and numerous advertisers globally (demand) – it becomes imperative to automate a set of end-to-end tests 24x7 to detect bugs and regression. In th...
Enterprises are moving to the cloud faster than most of us in security expected. CIOs are going from 0 to 100 in cloud adoption and leaving security teams in the dust. Once cloud is part of an enterprise stack, it’s unclear who has responsibility for the protection of applications, services, and data. When cloud breaches occur, whether active compromise or a publicly accessible database, the blame must fall on both service providers and users. In his session at 21st Cloud Expo, Ben Johnson, C...
"Infoblox does DNS, DHCP and IP address management for not only enterprise networks but cloud networks as well. Customers are looking for a single platform that can extend not only in their private enterprise environment but private cloud, public cloud, tracking all the IP space and everything that is going on in that environment," explained Steve Salo, Principal Systems Engineer at Infoblox, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventio...
Data scientists must access high-performance computing resources across a wide-area network. To achieve cloud-based HPC visualization, researchers must transfer datasets and visualization results efficiently. HPC clusters now compute GPU-accelerated visualization in the cloud cluster. To efficiently display results remotely, a high-performance, low-latency protocol transfers the display from the cluster to a remote desktop. Further, tools to easily mount remote datasets and efficiently transfer...
"MobiDev is a software development company and we do complex, custom software development for everybody from entrepreneurs to large enterprises," explained Alan Winters, U.S. Head of Business Development at MobiDev, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"We're developing a software that is based on the cloud environment and we are providing those services to corporations and the general public," explained Seungmin Kim, CEO/CTO of SM Systems Inc., in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Agile has finally jumped the technology shark, expanding outside the software world. Enterprises are now increasingly adopting Agile practices across their organizations in order to successfully navigate the disruptive waters that threaten to drown them. In our quest for establishing change as a core competency in our organizations, this business-centric notion of Agile is an essential component of Agile Digital Transformation. In the years since the publication of the Agile Manifesto, the conn...
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5–7, 2018, at the Javits Center in New York City, NY. CrowdReviews.com is a transparent online platform for determining which products and services are the best based on the opinion of the crowd. The crowd consists of Internet users that have experienced products and services first-hand and have an interest in letting other potential buye...
The question before companies today is not whether to become intelligent, it’s a question of how and how fast. The key is to adopt and deploy an intelligent application strategy while simultaneously preparing to scale that intelligence. In her session at 21st Cloud Expo, Sangeeta Chakraborty, Chief Customer Officer at Ayasdi, provided a tactical framework to become a truly intelligent enterprise, including how to identify the right applications for AI, how to build a Center of Excellence to oper...
"IBM is really all in on blockchain. We take a look at sort of the history of blockchain ledger technologies. It started out with bitcoin, Ethereum, and IBM evaluated these particular blockchain technologies and found they were anonymous and permissionless and that many companies were looking for permissioned blockchain," stated René Bostic, Technical VP of the IBM Cloud Unit in North America, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventi...
SYS-CON Events announced today that Telecom Reseller has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
In his session at 21st Cloud Expo, James Henry, Co-CEO/CTO of Calgary Scientific Inc., introduced you to the challenges, solutions and benefits of training AI systems to solve visual problems with an emphasis on improving AIs with continuous training in the field. He explored applications in several industries and discussed technologies that allow the deployment of advanced visualization solutions to the cloud.
While some developers care passionately about how data centers and clouds are architected, for most, it is only the end result that matters. To the majority of companies, technology exists to solve a business problem, and only delivers value when it is solving that problem. 2017 brings the mainstream adoption of containers for production workloads. In his session at 21st Cloud Expo, Ben McCormack, VP of Operations at Evernote, discussed how data centers of the future will be managed, how the p...