Diagnosing Packet Loss in Linux Network Virtualization Layers: Part 3

In the third part of this five-part series, we will explain how we examined data structures in the Linux kernel to diagnose the packet loss issue described in Part 1.

We will show how we used SystemTap to probe the status of the queues, and then how we experimented with different queue configurations to observe their effect on the packet loss. This series is intended for network administrators and developers who want to learn how to diagnose packet loss in the Linux network virtualization layers.

Revisiting the source code

The following is a simplified version of the tap_handle_frame() function in Linux version 4.15.0. Please refer to Part 2 for the details. This is the function where packets were dropped in our case. Execution can jump to the drop label in line 16 from lines 5, 9, 12, and 14. In Part 2, by using SystemTap, we confirmed that execution never jumped from line 9 or line 12:

 1: rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 2: {
    ...
 3:         q = tap_get_queue(tap, skb);
    ...
 4:         if (__skb_array_full(&q->skb_array))
 5:                 goto drop;
    ...
 6:         if (netif_needs_gso(skb, features)) {
 7:                 struct sk_buff *segs = __skb_gso_segment(skb, features, false);
 8:                 if (IS_ERR(segs))
 9:                         goto drop;
    ...
10:         } else {
    ...
11:                 if (skb_checksum_help(skb))
12:                         goto drop;
13:                 if (skb_array_produce(&q->skb_array, skb))
14:                         goto drop;
15:         }
    ...
16: drop:
17:         if (tap->count_rx_dropped)
18:                 tap->count_rx_dropped(tap);
    ...
19: }

Checking the status of the macvtap queues

The only remaining possibilities were lines 5 and 14. Both conditions become true when the macvtap queue is full. As described in Part 2, there are three macvtap queues in our system. In line 3, tap_handle_frame() selects one of the macvtap queues based on the hashed value of the received packet. Line 4 performs an early check of whether that queue is full. Later, in line 13, the function inserts the packet into the queue. In both cases, if it finds the queue full, it drops the packet.
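To make the two drop paths concrete, here is a minimal user-space sketch of a bounded queue with an early fullness check followed by an insert that can itself fail. It is not the kernel's skb_array implementation: the names toy_ring, toy_handle_frame, and RING_SIZE are hypothetical, and the convention that a NULL slot means "free" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4 /* hypothetical capacity, chosen only for illustration */

/* Toy single-threaded ring; a NULL slot means the slot is free (assumption). */
struct toy_ring {
    void *slot[RING_SIZE];
    size_t producer;       /* index of the next slot to fill */
    size_t consumer;       /* index of the next slot to drain */
    unsigned long dropped; /* number of packets dropped so far */
};

/* The ring is full when the slot the producer would write is still occupied. */
static bool toy_ring_full(const struct toy_ring *r)
{
    return r->slot[r->producer] != NULL;
}

/* Insert one packet; returns 0 on success, -1 if the ring is full. */
static int toy_ring_produce(struct toy_ring *r, void *pkt)
{
    assert(pkt != NULL);
    if (toy_ring_full(r))
        return -1;
    r->slot[r->producer] = pkt;
    r->producer = (r->producer + 1) % RING_SIZE;
    return 0;
}

/* Remove the oldest packet, or return NULL if the ring is empty. */
static void *toy_ring_consume(struct toy_ring *r)
{
    void *pkt = r->slot[r->consumer];
    if (pkt != NULL) {
        r->slot[r->consumer] = NULL;
        r->consumer = (r->consumer + 1) % RING_SIZE;
    }
    return pkt;
}

/* Same shape as lines 4-5 and 13-14 of the listing: check, insert, or drop. */
static bool toy_handle_frame(struct toy_ring *r, void *pkt)
{
    if (toy_ring_full(r))
        goto drop;
    if (toy_ring_produce(r, pkt))
        goto drop;
    return true;
drop:
    r->dropped++;
    return false;
}
```

In this single-threaded model, a packet can be dropped only when every slot is occupied, which is exactly the expectation that the measurements described next called into question.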

SystemTap allows users to check not only whether a kernel function is called, but also the state of a kernel data structure. We wrote a script to dump the status of all three macvtap queues every time macvtap_count_rx_dropped() was called, that is, every time a packet was dropped. We omit the execution results of the script, but they indicated that even when a packet was dropped, none of the queues were full. They contained, at most, one packet. This observation contradicted our analysis. Why were packets dropped at non-full queues?

A caveat is that our SystemTap script did not acquire the locks needed to inspect the queue status safely. Because multiple host ksoftirqd threads can access the macvtap queues, the correct protocol for accessing one of the queues is to first acquire its lock. However, it would be prohibitively tedious to write such code in SystemTap, so we did not go down that path. As a result, there was always a chance of a race condition in which the dumped queue status was not a consistent snapshot.
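For illustration only, the following user-space C sketch shows what a consistent read looks like: the reader takes the same lock as the writers before it derives the queue length from two fields. The guarded_queue type, its fields, and the simple spinlock are hypothetical stand-ins, not the kernel's data structures or locking primitives.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical queue bookkeeping protected by a tiny spinlock. */
struct guarded_queue {
    atomic_flag lock;
    size_t enqueued; /* total packets ever inserted */
    size_t dequeued; /* total packets ever removed */
};

static void gq_init(struct guarded_queue *q)
{
    atomic_flag_clear(&q->lock);
    q->enqueued = 0;
    q->dequeued = 0;
}

static void gq_lock(struct guarded_queue *q)
{
    /* Spin until the flag was previously clear, i.e. we now own the lock. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
}

static void gq_unlock(struct guarded_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static void gq_enqueue(struct guarded_queue *q)
{
    gq_lock(q);
    q->enqueued++;
    gq_unlock(q);
}

static void gq_dequeue(struct guarded_queue *q)
{
    gq_lock(q);
    if (q->dequeued < q->enqueued)
        q->dequeued++;
    gq_unlock(q);
}

/* Consistent snapshot: both counters are read while holding the lock. */
static size_t gq_len_locked(struct guarded_queue *q)
{
    gq_lock(q);
    size_t len = q->enqueued - q->dequeued;
    gq_unlock(q);
    return len;
}
```

A reader that skips gq_lock() could observe one counter before a concurrent update and the other after it, and so report a length the queue never actually had. Our SystemTap script was exposed to the same kind of inconsistency.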

To completely understand what was going on, we required an approach from another angle.

Changing the number of queues

In Figure 1, we present the Linux network virtualization layers that we explained in Part 2. Since we had many different paths in the layers, we figured it might help to simplify the setup to understand the problem further:

Figure 1

One way to simplify the environment was to reduce the number of VF queues feeding a macvtap device. Reducing from four to a single VF queue simplified the topology significantly, as shown in Figure 2.

Quick benchmarks in this configuration gave an interesting result: the macvtap packet loss disappeared. Reducing the four VF queues to one led to no observed packet loss, which made further experimentation worthwhile. So, what about some other scenarios?

We varied the number of producer VF queues among 1, 2, and 4. We then varied the number of consumer virtqueues (and, hence, the number of macvtap queues) to conduct a series of further experiments, as shown in Figure 3. We discovered that no packet loss occurred when the number of virtqueues was a multiple of the number of VF queues:

Figure 2

Figure 3

We can surmise from the data that loss can be observed when multiple VF queues distribute into the same macvtap queue. An example is indicated with the red arrows leading into the first macvtap queue in Figure 1. This, combined with the previous observation using SystemTap, strongly indicated a multi-threading problem in macvtap, specifically in how the producers write into the queues.
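One way to see why a multiple might matter is a toy model, sketched below in C. It assumes, purely for illustration, that both the VF queue and the macvtap queue are chosen as the packet hash modulo the respective queue count. Real NICs and drivers may distribute packets differently, so this is not a description of our hardware or of the kernel code. The function names and MAX_QUEUES are hypothetical.

```c
#include <assert.h>

#define MAX_QUEUES 64 /* hypothetical upper bound for this sketch */

/*
 * Count the distinct VF queues (producers) that can feed one macvtap queue,
 * ASSUMING vf = hash % n_vf and tap = hash % n_tap. Hash values
 * 0 .. n_vf * n_tap - 1 cover every (vf, tap) pair this model can produce.
 */
static int producers_feeding(int n_vf, int n_tap, int tap_queue)
{
    assert(n_vf > 0 && n_vf <= MAX_QUEUES);
    assert(n_tap > 0 && n_tap <= MAX_QUEUES);
    assert(tap_queue >= 0 && tap_queue < n_tap);

    int seen[MAX_QUEUES] = {0};
    int count = 0;

    for (int hash = 0; hash < n_vf * n_tap; hash++) {
        if (hash % n_tap != tap_queue)
            continue;
        int vf = hash % n_vf;
        if (!seen[vf]) {
            seen[vf] = 1;
            count++;
        }
    }
    return count;
}

/* Worst case over all macvtap queues for a given configuration. */
static int max_producers_per_tap_queue(int n_vf, int n_tap)
{
    int worst = 0;

    for (int t = 0; t < n_tap; t++) {
        int n = producers_feeding(n_vf, n_tap, t);
        if (n > worst)
            worst = n;
    }
    return worst;
}
```

Under this assumed mapping, max_producers_per_tap_queue(4, 3) is 4, while max_producers_per_tap_queue(2, 4) and max_producers_per_tap_queue(4, 4) are both 1. Whenever the macvtap queue count is a multiple of the VF queue count, each macvtap queue has a single producer in the model, which matches the pattern we observed.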

Summary

In this post, we examined the packet loss, first by instrumenting the macvtap queues using SystemTap and then by changing the number of VF queues, macvtap queues, and virtqueues. Based on our observations, we suspected that there was a concurrency bug in the macvtap driver of Linux. In the next post, we will explain the root cause of the packet loss and show how we confirmed our hypothesis using SystemTap.


Kean Kuiper

Senior Engineer

Saju Mathew

Senior Engineer

Rei Odaira

Research Staff Member
