Performing Under Pressure | Part 1

Load-Testing with Multi-Mechanize

Many types of performance problems can result from the load created by concurrent users of web applications, and all too often these scalability bottlenecks go undetected until the application has been deployed in production. Load testing, the generation of simulated user requests, is a great way to catch these issues before they get out of hand. I recently presented on load testing with Canonical's Corey Goldberg at the Boston Python Meetup and thought the topic deserved a blog discussion as well.


In this two-part series, I'll walk through generating load using the Python multi-mechanize load-testing framework, then collect and analyze data about app performance using Tracelytics.

Also, a request: there's some mechanize documentation available, but I unfortunately haven't found any full documentation of the Python mechanize API online. Post a comment if you know where to find it!

Meet the app: Reddit
The web app that I'll be using for all the examples in this post is an open source reddit running on a single node in EC2. You don't need to understand how it works in order to enjoy this post, but if you do want to play along, there is a super-easy install script that sets up a whole stack.

Generating the data
Performance testing can start off simple: hit pages in your app, and monitor how long they take to load. You can automate this using something like the mechanize library in Python, or even something more low level like httplib/urllib2.  This is a good start, but today we're looking for concurrency as well.
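As a rough illustration of that starting point, here is a minimal sketch that times a single page load using only the standard library. The function name timed_fetch and the injectable opener argument are my own choices for illustration, not part of any library discussed here, and the URL you pass in is whatever test instance you are running.

```python
import time
import urllib.request


def timed_fetch(url, opener=urllib.request.urlopen):
    """Fetch url once and return (status_code, seconds_elapsed).

    `opener` is injectable so the timing logic can be exercised
    without touching the network.
    """
    start = time.time()
    response = opener(url)
    response.read()  # include the body download in the measurement
    elapsed = time.time() - start
    return response.getcode(), elapsed
```

Calling timed_fetch on your own test instance's front page in a loop gives a crude single-user baseline; it says nothing yet about behavior under concurrent load.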

Enter multi-mechanize. Multi-mechanize takes simple request or transaction simulation scripts you've written and fires them repeatedly from many threads simultaneously in configurable patterns. It's as easy as writing a few scripts that simulate users doing different actions on your website (login, browse, submit comment, etc.) and then writing a short config that tells multi-mechanize how to play them back.

Since our site is reddit, I'm going to write a few scripts that read discussion threads, one that posts comments, and one that votes on comments. I'll walk through writing two of the scripts: a simple read-only request, and a more complicated one that logs in and submits a comment. The rest of them are available in the full source of the load tests on GitHub.

A simple mechanize script

First, the test is wrapped in a Transaction class: this is how multi-mechanize will run each of your scripts. __init__() is called once per worker thread, then run() is invoked repeatedly to generate the requests of your transaction. For development and debugging, it's easier to just run the scripts individually, so the __main__ block at the end provides that functionality.

All of the work here is happening in the run() method. A mechanize browser is instantiated, and our simple request for the front page of Reddit is performed. Finally, we make sure that a valid page was returned.

There's one more thing: custom timers.  Multi-mechanize can collect timing information about the requests it performs. If you store that information in a correctly-named dict, it will be able to generate charts of the data later.
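The script being described could look roughly like the sketch below. It is a reconstruction under assumptions, not the original listing: the timer name Front_Page, the default URL, and the browser_factory argument (added so the class can be exercised with a stand-in browser) are all hypothetical. The mechanize import is deferred so that the class can be loaded even where mechanize is not installed.

```python
import time


def make_browser():
    """Build a real mechanize browser (third-party package, imported lazily)."""
    import mechanize
    return mechanize.Browser()


class Transaction(object):
    def __init__(self, browser_factory=make_browser, url="http://localhost:8080/"):
        # Called once per worker thread.
        self.browser_factory = browser_factory
        self.url = url  # hypothetical address of the test instance
        self.custom_timers = {}  # the dict multi-mechanize reads timings from

    def run(self):
        # Called repeatedly; each call is one simulated transaction.
        br = self.browser_factory()
        br.set_handle_robots(False)
        start = time.time()
        resp = br.open(self.url)
        resp.read()
        self.custom_timers["Front_Page"] = time.time() - start
        # Make sure a valid page came back.
        assert resp.code == 200, "Bad HTTP response: %s" % resp.code
```

To run a script like this on its own while debugging, instantiate Transaction(), call run() once, and print custom_timers from a __main__ block.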

A more complicated script
Now, let's take a look at a slightly more complicated script. This one posts a comment on a particular story, so it'll have to take the following actions:

  • Log in as a user
  • Open thread page on Reddit
  • Post comment

It's a bit longer, so I've broken it up according to the bullets above with accompanying explanation.  The full version of the example can be found here:  https://gist.github.com/1529242

The first thing that's happening here is familiar: pulling up a page in our mechanize browser. In order to log in, though, we need to start interacting with forms on the page, and this means tweaking some of the default mechanize browser settings. We need to change three settings on our browser: follow 30x redirects (lets the login redirect back), send the Referer header (validation for comment post), and ignore robots.txt rules (Reddit doesn't like robots playing human).
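Those three settings map onto three setter methods of the mechanize browser. The helper below is a small sketch; the function name configure_browser is mine, and it accepts any object exposing the same setters.

```python
def configure_browser(br):
    """Apply the three settings described above to a mechanize.Browser-like object."""
    br.set_handle_redirect(True)  # follow 30x redirects, e.g. after login
    br.set_handle_referer(True)   # send the Referer header on each request
    br.set_handle_robots(False)   # do not fetch or obey robots.txt
    return br
```

Returning the browser lets the call be chained, for example configure_browser(mechanize.Browser()).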

After that, it's on to forms. The mechanize browser interface with forms is pretty simple: you can list all the forms on the page with browser.forms(), select a form to interact with using select_form, and then manipulate the fields of the form using the browser.form object.

select_form can take a variety of selection predicates, most of which rely on attributes such as the form's name or id. Our example, Reddit, doesn't have much identifying information associated with forms, so I've used numeric selection to grab them. The login form happens to be form 1.
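Put together, logging in by numeric form selection might look like the following sketch. The field names user and passwd are assumptions about the login form's inputs, and log_in is a hypothetical helper name; check the form's controls on your own instance before relying on them.

```python
def log_in(br, username, password, form_index=1):
    """Fill in and submit the login form on the currently loaded page.

    form_index=1 follows the article's observation that the login
    form is form 1; the field names below are assumed, not confirmed.
    """
    br.select_form(nr=form_index)  # numeric selection, counted from 0
    br.form["user"] = username
    br.form["passwd"] = password
    return br.submit()
```

If a form did carry a name, br.select_form(name=...) would be the more robust choice, since numeric positions shift whenever the page layout changes.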

Pretty straightforward: head to the thread page now that we're logged in.  Now we want to actually submit the comment.  Here's the heavy lifting:

Comment submission is a little bit different because it works via AJAX. The mechanize browser doesn't process JavaScript, meaning that we'll have to take things into our own hands here. So, we inspect two forms to grab the state information that JavaScript on the page would use to submit the form, and we construct our own request manually. (Form 0 provides the 'uh' value in a hidden field; form 12 is the top-level comment submit form.)
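A sketch of that manual request is below. Several details are assumptions on my part rather than facts from the original listing: the thing_id field name on the comment form, the text parameter, and the /api/comment endpoint path. The helper names are hypothetical too.

```python
from urllib.parse import urlencode


def read_comment_state(br):
    """Pull the values the page's JavaScript would have sent."""
    br.select_form(nr=0)
    uh = br.form["uh"]  # hidden field on form 0
    br.select_form(nr=12)
    thing_id = br.form["thing_id"]  # assumed field on the comment form
    return uh, thing_id


def build_comment_payload(uh, thing_id, text):
    """URL-encode the POST body for the comment request."""
    return urlencode({"uh": uh, "thing_id": thing_id, "text": text})


def post_comment(br, base_url, text):
    """POST the comment directly, bypassing the JavaScript submit handler."""
    uh, thing_id = read_comment_state(br)
    payload = build_comment_payload(uh, thing_id, text)
    # Passing a data argument to open() makes the request a POST.
    return br.open(base_url + "/api/comment", payload)
```

Keeping the payload construction separate from the network call makes it easy to inspect exactly what would be sent before pointing the script at a live instance.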

In this simple example, user credentials are provided in __init__. However, a more realistic example might involve many different users logging in. In the code on GitHub, I've written a user pool implementation that takes care of this problem by instantiating a pool of logged-in users for each script (then, each invocation of run can check out a different user).
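The pool idea can be sketched with a thread-safe queue. This is not the implementation from the repository, only a minimal illustration of the check-out pattern; UserPool and checkout are hypothetical names, and the pooled objects can be anything (credentials, or browsers that have already logged in).

```python
import queue
from contextlib import contextmanager


class UserPool(object):
    """Hand out pre-built users so concurrent runs never share one."""

    def __init__(self, users):
        self._users = queue.Queue()
        for user in users:
            self._users.put(user)

    @contextmanager
    def checkout(self, timeout=None):
        # Blocks until a user is free, then returns it to the pool afterwards,
        # even if the body raises.
        user = self._users.get(timeout=timeout)
        try:
            yield user
        finally:
            self._users.put(user)
```

Inside run(), a script would wrap its requests in a with pool.checkout() as user: block, so each worker thread borrows a different user for the duration of one transaction.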

(Debugging note for those playing along: if comments are not showing up in the thread but are showing up in the user's profile, that means that some of the background jobs may not be running correctly. The site re-caches the comment tree asynchronously after posts.)

Running the full load test
After writing a few individual mechanize scripts, the final step is putting them all together with a multi-mechanize config. Multi-mechanize organizes load tests in terms of "projects" which are represented by subdirectories of a directory called projects. Each project contains a config file and a directory called test_scripts which contains your individual load test scripts. It should look like this:
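The layout just described can be spelled out with a few lines of Python that print the expected paths. The project name reddit_load and the script file names are hypothetical, and config.cfg is the config file name as I understand multi-mechanize's convention.

```python
import posixpath


def project_layout(project, scripts):
    """Return the relative paths that make up one multi-mechanize project."""
    base = posixpath.join("projects", project)
    paths = [posixpath.join(base, "config.cfg")]
    paths += [posixpath.join(base, "test_scripts", name) for name in scripts]
    return paths


for path in project_layout("reddit_load", ["read_front_page.py", "post_comment.py"]):
    print(path)
```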

The config file specifies how long the load test should run for, whether it should ramp up the amount of pain or keep it constant,  a few output and statistics settings, and of course the number of threads and scripts you want to run. Here's an example config:
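For illustration, the sketch below embeds a config of the kind described and reads it back with the standard library's configparser. The option names follow multi-mechanize's config format as I understand it; every value, and both script names, are hypothetical rather than the settings used for the tests in this article.

```python
import configparser

EXAMPLE_CONFIG = """
[global]
run_time: 1800
rampup: 1800
results_ts_interval: 60
progress_bar: on
console_logging: off
xml_report: off

[user_group-1]
threads: 10
script: read_front_page.py

[user_group-2]
threads: 2
script: post_comment.py
"""


def summarize(config_text):
    """Report the test duration, ramp-up, and total thread count."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    groups = [s for s in parser.sections() if s.startswith("user_group")]
    return {
        "run_time": parser.getint("global", "run_time"),
        "rampup": parser.getint("global", "rampup"),
        "threads": sum(parser.getint(g, "threads") for g in groups),
    }


print(summarize(EXAMPLE_CONFIG))
```

Each user_group section pairs one script with a thread count, so the mix of read and write traffic is controlled by how many threads each group gets.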

Runtime sets the duration of the test, in seconds. Ramp-up, if nonzero, tells multi-mechanize to linearly increase the number of threads up to the specified numbers during the ramp-up period.
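One simple way to picture a linear ramp-up is to space thread start times evenly across the ramp-up window. The sketch below illustrates that idea only; I am not claiming it mirrors multi-mechanize's internals, and thread_start_offsets is a hypothetical name.

```python
def thread_start_offsets(num_threads, rampup_seconds):
    """Return the delay, in seconds, before each thread starts."""
    if num_threads <= 0:
        return []
    spacing = float(rampup_seconds) / num_threads
    return [index * spacing for index in range(num_threads)]


print(thread_start_offsets(4, 60))
```

With a ramp-up of zero every offset is zero, which corresponds to starting all threads at once and holding the load constant.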

And here's how to invoke them, finally:

Learning from our load tests
Multi-mechanize collects statistics about timing information that you provide in your tests (custom_timers) and dumps the output in a results subdirectory of your project. This can easily be plotted in your favorite graphics package.  Here's an example of the average times from a read load increasing over 30 minutes:
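Before plotting, it usually helps to average the raw timings into fixed intervals. The sketch below assumes you have already parsed the results file into (elapsed_seconds, latency_seconds) pairs; that input shape is my assumption, not a description of the file multi-mechanize writes, and average_by_interval is a hypothetical helper.

```python
from collections import defaultdict


def average_by_interval(samples, interval_seconds):
    """Group (elapsed, latency) samples into buckets and average each bucket.

    Returns a list of (bucket_start_seconds, mean_latency) sorted by time.
    """
    buckets = defaultdict(list)
    for elapsed, latency in samples:
        buckets[int(elapsed // interval_seconds)].append(latency)
    return [
        (index * interval_seconds, sum(values) / len(values))
        for index, values in sorted(buckets.items())
    ]
```

The resulting pairs can be handed straight to a plotting package as x and y values.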

OK, so it's getting slower, but why? These timers treat the application like a black box: they'll show you that it can be slow, but you won't know why or which layers of the stack are slow. In the next article, we'll talk about how to gather actionable data from your load tests.

Related Articles

Performing under pressure, pt. 2: Collecting and visualizing load-test performance data

Python and Gevent

Tracing Python - An API

More Stories By Dan Kuebrich

Dan Kuebrich is a web performance geek, currently working on Application Performance Management at AppNeta. He was previously a founder of Tracelytics (acquired by AppNeta), and before that worked on AmieStreet/Songza.com.
