Understanding Application Performance on the Network | Part 4

Packet Loss

We know that losing packets is not a good thing; retransmissions cause delays. We also know that TCP ensures reliable data delivery, masking the impact of packet loss. So why are some applications seemingly unaffected by the same packet loss rate that seems to cripple others? From a performance analysis perspective, how do you understand the relevance of packet loss and avoid chasing red herrings?

In Part II, we examined two closely related constraints: bandwidth and congestion. In Part III, we discussed TCP slow-start and introduced the Congestion Window (CWD). In Part IV, we focus on packet loss, building on the concepts from those two entries.

TCP Reliability
TCP ensures reliable delivery of data through its sliding window approach to managing byte sequences and acknowledgements; among other things, this sequencing allows a receiver to inform the sender of missing data caused by packet loss in multi-packet flows. Independently, a sender may detect packet loss through the expiration of its retransmission timer. We will look at the behavior and performance penalty associated with each of these cases; generally, the impact of packet loss will depend on both the characteristics of the flow and the position of the dropped packet within the flow.

The Retransmission Timer
Each packet a node sends is associated with a retransmission timer; if the timer expires before the sent data has been acknowledged, the data is considered lost and is retransmitted. There are two important characteristics of the retransmission timer that relate to performance. First, the default value for the initial retransmission timeout (RTO) has traditionally been 3000 milliseconds (RFC 6298 lowered the recommended initial value to 1 second, so newer stacks may start lower); this is adjusted to a more reasonable value as TCP observes actual path round-trip times. Second, the timeout value is doubled for each subsequent retransmission of a packet.
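The doubling behavior is easy to underestimate, so here is a minimal sketch of how the waits accumulate for a single packet that keeps getting dropped. It assumes the 3000-millisecond initial RTO discussed above and ignores the upper bound that real TCP stacks place on the timeout; the function name and parameters are illustrative only.

```python
def retransmission_schedule(initial_rto_ms=3000, retries=4):
    """Return (attempt, timeout_ms, elapsed_ms) for successive retransmissions.

    Assumption: no RTT samples have adjusted the RTO yet, and no maximum
    RTO cap is applied (real stacks do apply one).
    """
    schedule = []
    timeout = initial_rto_ms
    elapsed = 0
    for attempt in range(1, retries + 1):
        elapsed += timeout   # the timer must expire before this retransmission
        schedule.append((attempt, timeout, elapsed))
        timeout *= 2         # timeout doubles for the next retransmission
    return schedule


if __name__ == "__main__":
    for attempt, timeout, elapsed in retransmission_schedule():
        print(f"retransmission {attempt}: waited {timeout} ms, "
              f"{elapsed} ms since the original send")
```

Under these assumptions, a packet dropped three times in a row is not sent for the fourth time until 21 seconds after the original transmission.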

In small flows (a common characteristic of chatty operations - like web pages), the retransmission timer is the method used to detect packet loss. Consider a request or reply message of just 1000 bytes, sent in a single packet; if this packet is dropped, there will of course be no acknowledgement; the receiver has no idea the packet was sent. If the packet is dropped early in the life of a TCP connection - perhaps one of the SYN packets during the TCP 3-way handshake, or an initial GET request or a 304 Not Modified response - the dropped packet will be retransmitted only after 3 seconds have elapsed.

Triple Duplicate ACK
Within larger flows, a dropped packet may be detected before the retransmission timer expires if the sender receives three duplicate ACKs; this is generally more efficient (faster) than waiting for the retransmission timer to expire. As the receiving node receives packets that are out of sequence (i.e., after the missing packet data should have been seen), it sends duplicate ACKs, the acknowledgement number repeatedly referencing the expected (missing) packet data. When the sending node receives the third duplicate ACK, it assumes the packet was in fact lost (not just delayed) and retransmits it. This event causes the sender to assume network congestion, reducing its congestion window by 50% to allow congestion to subside. The sender then begins to increase the CWD again from that new value, using the relatively conservative congestion avoidance ramp rather than the exponential growth of slow-start.
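The duplicate ACK mechanism can be sketched in a few lines. This is a simplified model, not a TCP implementation: packets are numbered 1, 2, 3 and so on instead of using byte sequence numbers, every arriving packet triggers an ACK, and the function names are made up for illustration.

```python
def receiver_acks(arrivals):
    """Return the cumulative ACK sent for each arriving packet.

    The ACK number is the next packet the receiver expects, so packets
    arriving after a gap keep producing the same ACK number.
    """
    received = set()
    expected = 1
    acks = []
    for seq in arrivals:
        received.add(seq)
        while expected in received:
            expected += 1
        acks.append(expected)
    return acks


def detect_fast_retransmit(acks, threshold=3):
    """Return the packet number retransmitted on the third duplicate ACK, or None."""
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                return ack
        else:
            last_ack = ack
            duplicates = 0
    return None


if __name__ == "__main__":
    # Larger flow: packets 1-7 sent, packet 3 dropped.
    large = receiver_acks([1, 2, 4, 5, 6, 7])
    print(large, detect_fast_retransmit(large))
    # Small flow: packets 1-3 sent, packet 2 dropped.
    small = receiver_acks([1, 3])
    print(small, detect_fast_retransmit(small))
```

In the small flow only one duplicate ACK is ever generated, so the sender has to fall back on the retransmission timer; this is why the position of the dropped packet within the flow matters.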

As an example, consider a server sending a large file to a client; the sending node is ramping up through slow-start. As the CWD reaches 24, earlier packet loss is detected via a triple duplicate ACK; the lost data is retransmitted, and the CWD is reduced to 12. The CWD then grows from this point in congestion avoidance mode, which is much more gradual than slow-start.
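The example above can be traced round trip by round trip. The sketch below is a rough model under stated assumptions: the CWD is counted in packets, slow-start doubles it each round trip, congestion avoidance adds one packet per round trip, and the starting value of 3 is chosen only so that doubling lands on 24. None of these specifics come from a particular TCP stack.

```python
def cwd_trace(initial_cwd=3, loss_at_cwd=24, rounds_after_loss=4):
    """Return a list of (phase, cwd) pairs, one per round trip."""
    trace = []
    cwd = initial_cwd
    # Slow-start: the CWD doubles each round trip until loss is detected.
    while cwd < loss_at_cwd:
        trace.append(("slow-start", cwd))
        cwd = min(cwd * 2, loss_at_cwd)
    trace.append(("slow-start", cwd))
    # Triple duplicate ACK: the sender halves the CWD.
    cwd //= 2
    trace.append(("loss detected", cwd))
    # Congestion avoidance: grow by one packet per round trip.
    for _ in range(rounds_after_loss):
        cwd += 1
        trace.append(("congestion avoidance", cwd))
    return trace


if __name__ == "__main__":
    for phase, cwd in cwd_trace():
        print(f"{phase:>20}: CWD = {cwd}")
```

In this model it takes three round trips to climb from 3 to 24, but twelve round trips of congestion avoidance to climb from 12 back to 24, which illustrates why a single loss can depress throughput on a long transfer.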

While arguments abound about the inefficiency of existing congestion avoidance approaches, especially on high-speed networks, you can expect to see this behavior in today's networks.

Transaction Trace Illustration
Identifying retransmission timeouts using merged trace files is generally quite straightforward; we have proof the packet has been lost (because we see it on the sending side and not on the receiving side), and we know the delay between the dropped and retransmitted packets at the sending node. The Delta column in the Error Table indicates the retransmission delay.

Error Table entry showing a 3-second retransmission delay caused by a retransmission timeout (RTO)
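For readers without a merged-trace tool, the same reasoning can be approximated by hand. The sketch below assumes two hypothetical capture lists, one per side, each holding (timestamp in seconds, sequence number) pairs on a common clock; it is not the algorithm behind the Error Table, only an illustration of the logic described above.

```python
from collections import defaultdict


def find_retransmissions(sender_packets, receiver_packets):
    """Report sequence numbers sent more than once, with the retransmission delay.

    Both arguments are lists of (timestamp_seconds, sequence_number);
    this record layout is an assumption made for illustration.
    """
    send_times = defaultdict(list)
    for ts, seq in sender_packets:
        send_times[seq].append(ts)

    arrivals = defaultdict(int)
    for _, seq in receiver_packets:
        arrivals[seq] += 1

    rows = []
    for seq, times in sorted(send_times.items()):
        if len(times) < 2:
            continue  # sent only once, so not a retransmission
        times.sort()
        delta = round(times[1] - times[0], 3)
        # Fewer arrivals than transmissions: a copy was dropped on the path.
        dropped = arrivals[seq] < len(times)
        rows.append({"seq": seq, "delta": delta, "dropped": dropped})
    return rows


if __name__ == "__main__":
    sender = [(0.000, 1), (0.050, 1001), (3.050, 1001)]
    receiver = [(0.025, 1), (3.075, 1001)]
    for row in find_retransmissions(sender, receiver):
        print(row)
```

A delta close to the initial RTO, combined with a copy that never reached the receiver, points to a retransmission timeout and not to a fast retransmit.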

For larger flows, you can illustrate the effect of dropped packets on the sender's Congestion Window by using the Time Plot view. For Series 1, graph the sender's Frames in Transit; this is essentially the CWD. For Series 2, graph the Cumulative Error Count in both directions. As errors (retransmitted packets or out-of-sequence packets) occur, the CWD will be reduced by about 50%.

Time Plot view showing the impact of packet loss (blue plot) on the Congestion Window (brown plot)


More Stories By Gary Kaiser

Gary Kaiser is a Subject Matter Expert in Network Performance Analytics at Dynatrace, responsible for DC RUM’s technical marketing programs. He is a co-inventor of multiple performance analysis features, and continues to champion the value of network performance analytics. He is the author of Network Application Performance Analysis (WalrusInk, 2014).
