February 25, 2021
By Kean Kuiper, Saju Mathew and Rei Odaira

In the third part of this five-part series, we will explain how we examined data structures in the Linux kernel to diagnose the packet loss issue described in Part 1.

We will show how we used SystemTap to probe the status of the queues, and then how we experimented with different queue configurations to observe their effect on the packet loss. This series of blogs is intended for network administrators and developers who are interested in diagnosing packet loss in the Linux network virtualization layers.

Revisiting the source code

The following is a simplified version of the tap_handle_frame() function in Linux 4.15.0 (see Part 2 for details). This is the function where packets were dropped in our case. Execution can jump to the drop label at line 16 from lines 5, 9, 12, and 14. In Part 2, we used SystemTap to confirm that execution never jumped from line 9 or line 12:

 1: rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 2: {
    ...
 3:         q = tap_get_queue(tap, skb);
    ...
 4:         if (__skb_array_full(&q->skb_array))
 5:                 goto drop;
    ...
 6:         if (netif_needs_gso(skb, features)) {
 7:                 struct sk_buff *segs = __skb_gso_segment(skb, features, false);
 8:                 if (IS_ERR(segs))
 9:                         goto drop;
    ...
10:         } else {
    ...
11:                 if (skb_checksum_help(skb))
12:                         goto drop;
13:         if (skb_array_produce(&q->skb_array, skb))
14:                         goto drop;
15:         }
    ...
16: drop:
17:         if (tap->count_rx_dropped)
18:                 tap->count_rx_dropped(tap);
    ...
19: }

Checking the status of the macvtap queues

The only remaining possibilities were the jumps from lines 5 and 14, and both are taken when the selected macvtap queue is full. As described in Part 2, there are three macvtap queues in our system. In line 3, tap_handle_frame() selects one of them based on the hash value of the received packet. Line 4 performs an early check on whether that queue is full. Later, in line 13, the function inserts the packet into the queue. In both cases, if the queue is found to be full, the packet is dropped.
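To make the two drop conditions concrete, here is a minimal C sketch of a bounded per-queue ring, modeled loosely on the ptr_ring/skb_array idea that a NULL slot is free and the ring is full when the slot at the producer index is still occupied. The names ring, ring_full(), ring_produce(), ring_consume() and handle_frame(), the ring size, and the modulo-based queue selection are all hypothetical simplifications for illustration, not the kernel's code, and the sketch deliberately contains no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one macvtap queue (not kernel code):
 * a NULL slot is free, and the ring is full when the slot at the
 * producer index is still occupied. No locking in this sketch. */
#define RING_SIZE 4
#define NUM_QUEUES 3

struct ring {
    void *slot[RING_SIZE];
    size_t producer;   /* next slot to fill */
    size_t consumer;   /* next slot to drain */
};

static int ring_full(const struct ring *r)
{
    return r->slot[r->producer] != NULL;
}

/* Returns 0 on success, -1 if the ring is full. */
static int ring_produce(struct ring *r, void *pkt)
{
    assert(pkt != NULL);   /* NULL is reserved to mark a free slot */
    if (ring_full(r))
        return -1;
    r->slot[r->producer] = pkt;
    r->producer = (r->producer + 1) % RING_SIZE;
    return 0;
}

/* Returns the oldest packet, or NULL if the ring is empty. */
static void *ring_consume(struct ring *r)
{
    void *pkt = r->slot[r->consumer];
    if (pkt != NULL) {
        r->slot[r->consumer] = NULL;
        r->consumer = (r->consumer + 1) % RING_SIZE;
    }
    return pkt;
}

static unsigned long rx_dropped;

/* Same shape as lines 3-5 and 13-14 of the excerpt: pick a queue from
 * the packet hash, check early whether it is full, then try to enqueue. */
static void handle_frame(struct ring queues[NUM_QUEUES], void *pkt, uint32_t hash)
{
    struct ring *q = &queues[hash % NUM_QUEUES];   /* hypothetical selection */
    if (ring_full(q)) {
        rx_dropped++;                               /* corresponds to line 5 */
        return;
    }
    if (ring_produce(q, pkt) != 0)
        rx_dropped++;                               /* corresponds to line 14 */
}
```

In this single-threaded model, a drop implies that the selected ring was full at the moment it was checked.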

SystemTap lets users check not only whether a kernel function is called, but also the state of kernel data structures. We wrote a script that dumped the status of all three macvtap queues every time macvtap_count_rx_dropped() was called, that is, every time a packet was dropped. We omit the script's output here, but it showed that even when a packet was dropped, none of the queues was full; each contained at most one packet. This observation contradicted our analysis. Why were packets being dropped at queues that were not full?

A caveat is that our SystemTap script did not acquire the locks that protect the queues before inspecting them. Because multiple host ksoftirqd threads can access the macvtap queues, the correct protocol for accessing a queue is to acquire its lock first. However, writing such code in SystemTap would have been prohibitively tedious, so we did not go down that path. As a result, there was always a chance of a race condition in which the dumped queue status was not a consistent snapshot.
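The following hypothetical C sketch illustrates the difference between an unlocked inspection, like the one our script performed, and a locked snapshot. It assumes a ptr_ring-style queue with separate producer and consumer locks, modeled here with pthread mutexes. The struct, the function names, and the dump_queues_on_drop() hook (which only loosely mirrors the idea of dumping every queue when a drop is counted) are illustrative assumptions, not kernel or SystemTap APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

#define RING_SIZE 4

/* Hypothetical queue model: NULL slot == free, one lock per side. */
struct locked_ring {
    void *slot[RING_SIZE];
    size_t producer;                /* next slot to fill */
    size_t consumer;                /* next slot to drain */
    pthread_mutex_t producer_lock;  /* serializes producers */
    pthread_mutex_t consumer_lock;  /* serializes consumers */
};

/* Enqueue under the producer lock; returns -1 if the ring is full. */
static int ring_produce_locked(struct locked_ring *r, void *pkt)
{
    int ret = -1;
    assert(pkt != NULL);
    pthread_mutex_lock(&r->producer_lock);
    if (r->slot[r->producer] == NULL) {
        r->slot[r->producer] = pkt;
        r->producer = (r->producer + 1) % RING_SIZE;
        ret = 0;
    }
    pthread_mutex_unlock(&r->producer_lock);
    return ret;
}

/* Count occupied slots without locks. If other threads are producing or
 * consuming concurrently, the scan can mix old and new state (and is
 * formally a data race), so the result is not a consistent snapshot. */
static size_t ring_occupancy_unlocked(const struct locked_ring *r)
{
    size_t n = 0;
    for (size_t i = 0; i < RING_SIZE; i++)
        if (r->slot[i] != NULL)
            n++;
    return n;
}

/* Consistent snapshot: hold both locks, always in the same order, so no
 * producer or consumer can change the ring while it is being scanned. */
static size_t ring_occupancy_locked(struct locked_ring *r)
{
    pthread_mutex_lock(&r->consumer_lock);
    pthread_mutex_lock(&r->producer_lock);
    size_t n = ring_occupancy_unlocked(r);
    pthread_mutex_unlock(&r->producer_lock);
    pthread_mutex_unlock(&r->consumer_lock);
    return n;
}

/* Hypothetical drop hook: print the occupancy of every queue. */
static void dump_queues_on_drop(struct locked_ring *queues, size_t nqueues)
{
    for (size_t i = 0; i < nqueues; i++)
        printf("queue %zu: %zu packet(s)\n", i, ring_occupancy_locked(&queues[i]));
}
```

In this model, producers and consumers each take only their own lock, so taking both in a fixed order for the snapshot cannot deadlock with them.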

To fully understand what was going on, we needed to approach the problem from another angle.

Changing the number of queues

In Figure 1, we present the Linux network virtualization layers that we explained in Part 2. Because packets could take many different paths through these layers, we figured that simplifying the setup might help us understand the problem further:

Figure 1

One way to simplify the environment was to reduce the number of VF queues feeding a macvtap device. Reducing from four to a single VF queue simplified the topology significantly, as shown in Figure 2.

Some quick benchmarks in this configuration produced an interesting result: the observed macvtap loss disappeared. Reducing the four VF queues to one eliminated the packet loss, which made this setup worth further experimentation. So, what about other scenarios?

We then modified the environment, setting the number of producer VF queues to 1, 2, or 4. For each setting, we varied the number of consumer virtqueues (and, hence, the number of macvtap queues) in a further series of experiments, as shown in Figure 3. We discovered that no packet loss occurred when the number of virtqueues was a multiple of the number of VF queues:

Figure 2

Figure 3

We can surmise from the data that loss occurs when multiple VF queues feed into the same macvtap queue. An example is indicated by the red arrows leading into the first macvtap queue in Figure 1. Combined with the earlier SystemTap observation, this strongly indicated a multithreading problem in macvtap, specifically in how the producers write into the queues.
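One way to see why multiples avoid this sharing is a simple model, sketched below in C under an explicit assumption: both the choice of VF queue and macvtap's choice of queue are taken to be the same flow hash h reduced modulo the respective queue count. Real RSS typically goes through an indirection table, so this is only an approximation, and the function names are hypothetical. Under the assumption, if n_tap is a multiple of n_vf, then (h mod n_tap) mod n_vf = h mod n_vf, so each macvtap queue has exactly one VF queue as its producer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (an assumption, not the NIC's or kernel's exact
 * logic): both layers map the same flow hash h with a plain modulo. */
static unsigned vf_queue(unsigned h, unsigned n_vf)   { return h % n_vf; }
static unsigned tap_queue(unsigned h, unsigned n_tap) { return h % n_tap; }

/* Returns true if some macvtap queue receives packets from more than one
 * VF queue, i.e., if it has more than one concurrent producer. */
static bool tap_queue_shared(unsigned n_vf, unsigned n_tap)
{
    assert(n_vf > 0 && n_tap > 0 && n_vf <= 64 && n_tap <= 64);
    int owner[64];                      /* VF queue seen for each macvtap queue */
    for (unsigned i = 0; i < n_tap; i++)
        owner[i] = -1;
    /* The pair (h % n_vf, h % n_tap) repeats with period lcm(n_vf, n_tap),
     * which divides n_vf * n_tap, so this range covers every possible pair. */
    for (unsigned h = 0; h < n_vf * n_tap; h++) {
        unsigned t = tap_queue(h, n_tap);
        int v = (int)vf_queue(h, n_vf);
        if (owner[t] == -1)
            owner[t] = v;
        else if (owner[t] != v)
            return true;
    }
    return false;
}
```

Under this model, tap_queue_shared(4, 3) is true for the four VF queues and three macvtap queues of Figure 1, while tap_queue_shared(n, k * n) is false for any k. That matches the pattern in Figure 3 without, of course, proving the cause.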

Summary

In this post, we examined the packet loss, first by instrumenting the macvtap queues with SystemTap and then by changing the number of VF queues, macvtap queues, and virtqueues. Based on our observations, we suspected a concurrency bug in the Linux macvtap driver. In the next post, we will explain the root cause of the packet loss and show how we confirmed our hypothesis using SystemTap.
