Using recursive common table expressions and recursive views

A recursive common table expression can be used to query data that is recursive in nature.

Suppose you want to find out what cities you can fly to if you start in Chicago, and how many separate flights it will take to get there. The following query shows you that information.
WITH destinations (origin, departure, arrival, flight_count) AS    
    (SELECT a.departure, a.departure, a.arrival, 1 
            FROM flights a
            WHERE a.departure = 'Chicago'
     UNION ALL
     SELECT r.origin, b.departure, b.arrival, r.flight_count + 1 
            FROM destinations r, flights b
            WHERE r.arrival = b.departure)
SELECT origin, departure, arrival, flight_count 
    FROM destinations;

This query returns the following information.

Table 1. Results of the previous query
ORIGIN    DEPARTURE   ARRIVAL     FLIGHT_COUNT
Chicago   Chicago     Miami       1
Chicago   Chicago     Frankfurt   1
Chicago   Miami       Lima        2
Chicago   Frankfurt   Moscow      2
Chicago   Frankfurt   Beijing     2
Chicago   Frankfurt   Vienna      2
Chicago   Moscow      Tokyo       3
Chicago   Tokyo       Hawaii      4

This recursive query is written in two parts. The first part of the common table expression is called the initialization fullselect. It selects the first rows for the result set of the common table expression. In this example, it selects the two rows in the flights table that get you directly to another location from Chicago. It also initializes the number of flight legs to one for each row it selects.

The second part of the recursive query is called the iterative fullselect. It joins the rows in the current result set of the common table expression with rows from the original table, and this is where the recursion is introduced. Notice that the rows already selected for the result set are referenced by using the name of the common table expression as the table name and its result column names as the column names.

In this recursive part of the query, the iterative fullselect selects every row in the original table whose departure city is the arrival city of a previously selected row. That arrival city becomes the new departure city, and each row produced this way increments the flight count by one. As these new rows are added to the common table expression result set, they are also fed back into the iterative fullselect to generate more result set rows. In the final result, the flight count for each row is the number of recursive joins it took to reach that arrival city, plus 1 for the initial flight.
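To experiment with this recursion outside Db2, the following minimal sketch runs the same query against SQLite through Python's standard `sqlite3` module. It assumes a simplified FLIGHTS table that has only DEPARTURE, ARRIVAL, and PRICE columns. The sample rows are hypothetical: they are reconstructed from the result tables in this section, and the carrier and flight number columns are left out. The sketch also writes the clause as `WITH RECURSIVE`, whereas the Db2 for i query above omits the RECURSIVE keyword.

```python
import sqlite3

# Hypothetical sample rows (departure, arrival, price) reconstructed from the
# result tables in this section; carrier and flight number are omitted.
FLIGHTS = [
    ("Chicago", "Miami", 300), ("Chicago", "Frankfurt", 480),
    ("Miami", "Lima", 530), ("Frankfurt", "Moscow", 580),
    ("Frankfurt", "Beijing", 480), ("Frankfurt", "Vienna", 200),
    ("Moscow", "Tokyo", 680), ("Tokyo", "Hawaii", 330),
]

QUERY = """
WITH RECURSIVE destinations (origin, departure, arrival, flight_count) AS (
    -- initialization fullselect: direct flights out of Chicago
    SELECT a.departure, a.departure, a.arrival, 1
      FROM flights a
     WHERE a.departure = 'Chicago'
    UNION ALL
    -- iterative fullselect: an earlier arrival city becomes the new departure city
    SELECT r.origin, b.departure, b.arrival, r.flight_count + 1
      FROM destinations r, flights b
     WHERE r.arrival = b.departure
)
SELECT origin, departure, arrival, flight_count FROM destinations
"""

def destinations_from_chicago():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)")
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", FLIGHTS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

for row in destinations_from_chicago():
    print(row)
```

With these sample rows, the sketch returns the same eight rows as Table 1. Their order may differ, because the query has no ORDER BY clause.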

A recursive view looks very similar to a recursive common table expression. You can write the previous recursive common table expression as a recursive view like this:
CREATE VIEW destinations (origin, departure, arrival, flight_count) AS    
     SELECT departure, departure, arrival, 1 
            FROM flights 
            WHERE departure = 'Chicago'
     UNION ALL
     SELECT r.origin, b.departure, b.arrival, r.flight_count + 1 
            FROM destinations r, flights b
            WHERE r.arrival = b.departure;

The iterative fullselect part of this view definition refers to the view itself. Selecting from this view returns the same rows as the previous recursive common table expression. For comparison, note that CONNECT BY recursion is allowed anywhere a SELECT is allowed, so it can easily be included in a view definition.
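Not every database accepts a view that names itself in its own definition. This sketch assumes SQLite is one that does not. In that case you can use the alternative that the notes later in this section describe: a view whose body is a recursive common table expression. The CTE name `dest` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows (departure, arrival, price) for the Chicago routes.
FLIGHTS = [
    ("Chicago", "Miami", 300), ("Chicago", "Frankfurt", 480),
    ("Miami", "Lima", 530), ("Frankfurt", "Moscow", 580),
    ("Frankfurt", "Beijing", 480), ("Frankfurt", "Vienna", 200),
    ("Moscow", "Tokyo", 680), ("Tokyo", "Hawaii", 330),
]

# Assumption: the view cannot refer to itself, so the recursion lives in a
# common table expression (dest) inside the view body instead.
CREATE_VIEW = """
CREATE VIEW destinations AS
WITH RECURSIVE dest (origin, departure, arrival, flight_count) AS (
    SELECT departure, departure, arrival, 1
      FROM flights
     WHERE departure = 'Chicago'
    UNION ALL
    SELECT r.origin, b.departure, b.arrival, r.flight_count + 1
      FROM dest r, flights b
     WHERE r.arrival = b.departure
)
SELECT origin, departure, arrival, flight_count FROM dest
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", FLIGHTS)
conn.execute(CREATE_VIEW)

# Selecting from the view behaves like selecting from the CTE directly.
long_trips = conn.execute(
    "SELECT arrival, flight_count FROM destinations "
    "WHERE flight_count >= 3 ORDER BY flight_count"
).fetchall()
print(long_trips)  # [('Tokyo', 3), ('Hawaii', 4)]
```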

Example: Two starting cities using recursive common table expressions

Suppose you are willing to fly from either Chicago or New York, and you want to know where you could go and how much it would cost.

WITH destinations (departure, arrival, connections, cost) AS
    (SELECT a.departure, a.arrival, 0, price
            FROM flights a 
            WHERE a.departure = 'Chicago' OR
                  a.departure = 'New York'
     UNION ALL 
     SELECT r.departure, b.arrival, r.connections + 1,
                  r.cost + b.price 
            FROM destinations r, flights b 
            WHERE r.arrival = b.departure) 
SELECT departure, arrival, connections, cost 
       FROM destinations;

This query returns the following information.

Table 2. Results of the previous query
DEPARTURE  ARRIVAL      CONNECTIONS  COST
Chicago    Miami        0            300
Chicago    Frankfurt    0            480
New York   Paris        0            400
New York   London       0            350
New York   Los Angeles  0            330
Chicago    Lima         1            830
Chicago    Moscow       1            1,060
Chicago    Beijing      1            960
Chicago    Vienna       1            680
New York   Madrid       1            780
New York   Cairo        1            880
New York   Rome         1            740
New York   Athens       1            690
New York   Tokyo        1            860
Chicago    Tokyo        2            1,740
New York   Nicosia      2            970
New York   Hawaii       2            1,190
Chicago    Hawaii       3            2,070

Each returned row shows the starting departure city and the final destination city. Instead of the total number of flights, the query counts the number of connections needed, and it adds up the total cost of all the flights.
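The following is a minimal SQLite sketch of the two-city query. It shows how the initialization fullselect seeds the connection count at 0 and the cost at the first fare, and how each iteration carries the running cost forward. The per-leg prices are hypothetical: they are inferred from the differences between the cost totals in Table 2. The IN list is equivalent to the two OR predicates in the original query.

```python
import sqlite3

# Hypothetical per-leg prices inferred from the cost totals in Table 2.
FLIGHTS = [
    ("Chicago", "Miami", 300), ("Chicago", "Frankfurt", 480),
    ("Miami", "Lima", 530), ("Frankfurt", "Moscow", 580),
    ("Frankfurt", "Beijing", 480), ("Frankfurt", "Vienna", 200),
    ("Moscow", "Tokyo", 680), ("Tokyo", "Hawaii", 330),
    ("New York", "Paris", 400), ("New York", "London", 350),
    ("New York", "Los Angeles", 330), ("Paris", "Madrid", 380),
    ("Paris", "Cairo", 480), ("Paris", "Rome", 340),
    ("London", "Athens", 340), ("Athens", "Nicosia", 280),
    ("Los Angeles", "Tokyo", 530),
]

QUERY = """
WITH RECURSIVE destinations (departure, arrival, connections, cost) AS (
    -- start from either city; a direct flight needs 0 connections
    SELECT a.departure, a.arrival, 0, a.price
      FROM flights a
     WHERE a.departure IN ('Chicago', 'New York')
    UNION ALL
    -- keep the original departure city, count one more connection,
    -- and add the price of the next leg to the running cost
    SELECT r.departure, b.arrival, r.connections + 1, r.cost + b.price
      FROM destinations r, flights b
     WHERE r.arrival = b.departure
)
SELECT departure, arrival, connections, cost FROM destinations
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", FLIGHTS)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```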

Example: Two tables used for recursion using recursive common table expressions

Now, suppose you start in Chicago but can travel by train as well as by air, and you want to know which cities you can reach.

The following query returns that information:

WITH destinations (departure, arrival, connections, flights, trains, cost) AS
  (SELECT f.departure, f.arrival, 0, 1, 0, price 
          FROM  flights f 
          WHERE f.departure = 'Chicago' 
   UNION ALL
   SELECT t.departure, t.arrival, 0, 0, 1, price 
          FROM trains t 
          WHERE t.departure = 'Chicago'
   UNION ALL
   SELECT r.departure, b.arrival, r.connections + 1 , r.flights + 1, r.trains, 
             r.cost + b.price
          FROM  destinations r, flights b  
          WHERE r.arrival = b.departure
   UNION ALL
   SELECT r.departure, c.arrival, r.connections + 1 ,
             r.flights, r.trains + 1, r.cost + c.price  
          FROM destinations r, trains c 
          WHERE r.arrival = c.departure) 
SELECT departure, arrival, connections, flights, trains, cost 
       FROM destinations;

This query returns the following information.

Table 3. Results of the previous query
DEPARTURE  ARRIVAL     CONNECTIONS  FLIGHTS  TRAINS  COST
Chicago    Miami       0            1        0       300
Chicago    Frankfurt   0            1        0       480
Chicago    Washington  0            0        1       90
Chicago    Lima        1            2        0       830
Chicago    Moscow      1            2        0       1,060
Chicago    Beijing     1            2        0       960
Chicago    Vienna      1            2        0       680
Chicago    Toronto     1            1        1       340
Chicago    Boston      1            0        2       140
Chicago    Tokyo       2            3        0       1,740
Chicago    Hawaii      3            4        0       2,070

In this example, two parts of the common table expression provide initialization values to the query: one for flights and one for trains. Likewise, two recursive references get from the previous arrival location to the next possible destination: one for continuing by air, the other for continuing by train. The final results show how many connections each destination needs, and how many of the trips are flights and how many are train rides.
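The Db2 for i query above uses two initialization fullselects and two iterative fullselects. Some engines are stricter about how many recursive references a common table expression may contain, so this SQLite sketch uses a different technique. It folds both tables into one hypothetical LEGS expression that carries 0/1 flag columns, which leaves a single recursive reference. The train fares and the Washington-to-Toronto flight are hypothetical, inferred from the Washington, Toronto, and Boston rows of Table 3.

```python
import sqlite3

# Hypothetical sample data inferred from Table 3.
FLIGHTS = [
    ("Chicago", "Miami", 300), ("Chicago", "Frankfurt", 480),
    ("Miami", "Lima", 530), ("Frankfurt", "Moscow", 580),
    ("Frankfurt", "Beijing", 480), ("Frankfurt", "Vienna", 200),
    ("Moscow", "Tokyo", 680), ("Tokyo", "Hawaii", 330),
    ("Washington", "Toronto", 250),
]
TRAINS = [("Chicago", "Washington", 90), ("Washington", "Boston", 50)]

# Assumption: LEGS (hypothetical) merges both tables and flags each row as a
# flight or a train, so the recursive CTE joins to one source per iteration.
QUERY = """
WITH RECURSIVE
legs (departure, arrival, price, is_flight, is_train) AS (
    SELECT departure, arrival, price, 1, 0 FROM flights
    UNION ALL
    SELECT departure, arrival, price, 0, 1 FROM trains
),
destinations (departure, arrival, connections, flights, trains, cost) AS (
    SELECT departure, arrival, 0, is_flight, is_train, price
      FROM legs
     WHERE departure = 'Chicago'
    UNION ALL
    SELECT r.departure, l.arrival, r.connections + 1,
           r.flights + l.is_flight, r.trains + l.is_train, r.cost + l.price
      FROM destinations r, legs l
     WHERE r.arrival = l.departure
)
SELECT departure, arrival, connections, flights, trains, cost FROM destinations
"""

conn = sqlite3.connect(":memory:")
for table, data in (("flights", FLIGHTS), ("trains", TRAINS)):
    conn.execute(f"CREATE TABLE {table} (departure TEXT, arrival TEXT, price INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", data)
results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```

The flag columns keep the flight and train counters independent, so a mixed route such as Chicago to Toronto is counted as one train ride plus one flight.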

Example: DEPTH FIRST and BREADTH FIRST options for recursive common table expressions

The two examples here show the difference in the result set row order based on whether the recursion is processed depth first or breadth first.

Note: The search clause is not supported directly for recursive views. You can define a view that contains a recursive common table expression to get this function.

Choosing breadth-first or depth-first processing sorts the result by recursive relationship, based on the recursive join column specified in the SEARCH BY clause. When the recursion is handled breadth first, all children are processed first, then all grandchildren, then all great-grandchildren. When the recursion is handled depth first, the full chain of descendants of one child is processed before going on to the next child.
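The difference between the two orders can be seen without any SQL. This plain-Python sketch walks a hypothetical adjacency list built from the Chicago routes in Table 1, with children listed in the same order as the rows of that table. Breadth-first uses a FIFO queue and visits every child before any grandchild. Depth-first recurses and finishes one child's whole chain of descendants before moving on. The sketch assumes the routes contain no cycles, because it keeps no visited set.

```python
from collections import deque

# Hypothetical adjacency list from the Chicago routes in Table 1.
ROUTES = {
    "Chicago": ["Miami", "Frankfurt"],
    "Miami": ["Lima"],
    "Frankfurt": ["Moscow", "Beijing", "Vienna"],
    "Moscow": ["Tokyo"],
    "Tokyo": ["Hawaii"],
}

def breadth_first(start):
    """Visit all children, then all grandchildren, and so on (FIFO queue)."""
    order, queue = [], deque(ROUTES.get(start, []))
    while queue:
        city = queue.popleft()
        order.append(city)
        queue.extend(ROUTES.get(city, []))
    return order

def depth_first(start):
    """Finish each child's full chain of descendants before the next child."""
    order = []

    def visit(city):
        order.append(city)
        for nxt in ROUTES.get(city, []):
            visit(nxt)

    for child in ROUTES.get(start, []):
        visit(child)
    return order

print(breadth_first("Chicago"))
print(depth_first("Chicago"))
```

With this input, the breadth-first list has the same arrival order as Table 5 and the depth-first list has the same order as Table 4. Siblings keep their listed order and are not sorted.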

In both cases, you name an extra column (ORDCOL in the following examples) in the SET clause. The recursive process uses this column to keep track of the depth-first or breadth-first ordering. The column must be used in the ORDER BY clause of the outer query to get the rows back in the specified order. If it is not used in the ORDER BY clause, the DEPTH FIRST or BREADTH FIRST processing option is ignored.

The choice of SEARCH BY column is important. To have any meaning in the result, it must be the column that the iterative fullselect uses to join to the rows produced by the initialization fullselect. In these examples that column is ARRIVAL, because the iterative fullselect joins on r.arrival = b.departure.

The following query requests depth-first ordering:

WITH destinations (departure, arrival, connections, cost) AS
    (SELECT f.departure, f.arrival, 0, price       
            FROM flights f 
            WHERE f.departure = 'Chicago'
     UNION ALL 
     SELECT r.departure, b.arrival, r.connections + 1,
                r.cost + b.price 
            FROM destinations r, flights b 
            WHERE r.arrival = b.departure) 
    SEARCH DEPTH FIRST BY arrival SET ordcol 
SELECT * 
   FROM destinations 
   ORDER BY ordcol;

This query returns the following information.

Table 4. Results of the previous query
DEPARTURE  ARRIVAL    CONNECTIONS  COST
Chicago    Miami      0            300
Chicago    Lima       1            830
Chicago    Frankfurt  0            480
Chicago    Moscow     1            1,060
Chicago    Tokyo      2            1,740
Chicago    Hawaii     3            2,070
Chicago    Beijing    1            960
Chicago    Vienna     1            680

In this result data, you can see that all destinations generated from the Chicago-to-Miami row are listed before the destinations generated from the Chicago-to-Frankfurt row.
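If a database has no SEARCH DEPTH FIRST clause (this sketch assumes SQLite does not), you can approximate depth-first order. Carry each row's full route in a hypothetical PATH column and sort on it. Because a prefix sorts before any longer string that starts with it, every row lands directly before the rows that extend it. Unlike Table 4, siblings come out in alphabetical order, so Frankfurt precedes Miami. The approach also assumes that no city name is a prefix of another name followed by a character that sorts below the '>' separator, such as a space.

```python
import sqlite3

# Hypothetical sample rows (departure, arrival, price) for the Chicago routes.
FLIGHTS = [
    ("Chicago", "Miami", 300), ("Chicago", "Frankfurt", 480),
    ("Miami", "Lima", 530), ("Frankfurt", "Moscow", 580),
    ("Frankfurt", "Beijing", 480), ("Frankfurt", "Vienna", 200),
    ("Moscow", "Tokyo", 680), ("Tokyo", "Hawaii", 330),
]

# PATH is a hypothetical helper column: the route so far, joined by '>'.
QUERY = """
WITH RECURSIVE destinations (departure, arrival, connections, cost, path) AS (
    SELECT f.departure, f.arrival, 0, f.price, f.departure || '>' || f.arrival
      FROM flights f
     WHERE f.departure = 'Chicago'
    UNION ALL
    SELECT r.departure, b.arrival, r.connections + 1, r.cost + b.price,
           r.path || '>' || b.arrival
      FROM destinations r, flights b
     WHERE r.arrival = b.departure
)
SELECT departure, arrival, connections, cost
  FROM destinations
 ORDER BY path
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", FLIGHTS)
rows = conn.execute(QUERY).fetchall()
order = [row[1] for row in rows]
print(order)
```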

Next, you can run the same query but request the result to be ordered breadth first.

WITH destinations (departure, arrival, connections, cost) AS 
    (SELECT f.departure, f.arrival, 0, price
            FROM flights f 
            WHERE f.departure = 'Chicago' 
     UNION ALL 
     SELECT r.departure, b.arrival, r.connections + 1,
                r.cost + b.price 
            FROM destinations r, flights b 
            WHERE r.arrival = b.departure) 
    SEARCH BREADTH FIRST BY arrival SET ordcol 
SELECT * 
  FROM destinations 
  ORDER BY ordcol;

This query returns the following information.

Table 5. Results of the previous query
DEPARTURE  ARRIVAL    CONNECTIONS  COST
Chicago    Miami      0            300
Chicago    Frankfurt  0            480
Chicago    Lima       1            830
Chicago    Moscow     1            1,060
Chicago    Beijing    1            960
Chicago    Vienna     1            680
Chicago    Tokyo      2            1,740
Chicago    Hawaii     3            2,070

In this result data, you can see that all the direct connections from Chicago are listed before the connecting flights. The rows are identical to the results from the previous query, but in breadth-first order. Note that neither option sorts rows by the values of the column used for depth-first or breadth-first processing. For example, the destinations that are one connection away (Lima, Moscow, Beijing, Vienna) are not in alphabetical order. To order sibling rows by value, you can use the ORDER SIBLINGS BY construct that is available with the CONNECT BY form of recursion.
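Without a SEARCH BREADTH FIRST clause (again assumed absent in SQLite), a similar effect comes from ordering on the connection count. It already records how many recursive steps produced each row. This sketch adds ARRIVAL as a tie-breaker only to make the order within a level repeatable. As noted above, the SEARCH clause itself does not sort by value.

```python
import sqlite3

# Hypothetical sample rows (departure, arrival, price) for the Chicago routes.
FLIGHTS = [
    ("Chicago", "Miami", 300), ("Chicago", "Frankfurt", 480),
    ("Miami", "Lima", 530), ("Frankfurt", "Moscow", 580),
    ("Frankfurt", "Beijing", 480), ("Frankfurt", "Vienna", 200),
    ("Moscow", "Tokyo", 680), ("Tokyo", "Hawaii", 330),
]

QUERY = """
WITH RECURSIVE destinations (departure, arrival, connections, cost) AS (
    SELECT f.departure, f.arrival, 0, f.price
      FROM flights f
     WHERE f.departure = 'Chicago'
    UNION ALL
    SELECT r.departure, b.arrival, r.connections + 1, r.cost + b.price
      FROM destinations r, flights b
     WHERE r.arrival = b.departure
)
SELECT departure, arrival, connections, cost
  FROM destinations
 ORDER BY connections, arrival  -- level first; arrival only breaks ties
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", FLIGHTS)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```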

Example: Cyclic data using recursive common table expressions

The key to any recursive process, whether it is a recursive programming algorithm or a query of recursive data, is that the recursion must be finite. If it is not, you get an endless loop. The CYCLE option safeguards against cyclic data. Not only does it stop repeating cycles, it can also set a cycle mark column that helps you find the cyclic data.

Note: The cycle clause is not supported directly for recursive views. You can define a view that contains a recursive common table expression to get this function.

For a final example, suppose you have a cycle in the data. After one more row is added to the FLIGHTS table, there is a flight from Cairo to Paris as well as one from Paris to Cairo. If you do not account for cyclic data like this, it is easy to write a query that goes into an infinite loop.

The following statement adds the row, and the query after it lists the places you can reach from New York while marking any cycle:

INSERT INTO FLIGHTS VALUES('Cairo', 'Paris', 'Euro Air', '1134', 440);

WITH destinations (departure, arrival, connections, cost, itinerary) AS
    (SELECT f.departure, f.arrival, 1, price, 
                CAST(f.departure CONCAT f.arrival AS VARCHAR(2000))
            FROM flights f  
            WHERE f.departure = 'New York'
     UNION ALL 
     SELECT r.departure, b.arrival, r.connections + 1 ,
                r.cost + b.price, CAST(r.itinerary CONCAT b.arrival AS VARCHAR(2000))
            FROM destinations r, flights b 
            WHERE r.arrival = b.departure) 
    CYCLE arrival SET cyclic_data TO '1' DEFAULT '0' 
SELECT departure, arrival, itinerary, cyclic_data  
      FROM destinations;  

This query returns the following information.

Table 6. Results of the previous query
DEPARTURE  ARRIVAL      ITINERARY                                      CYCLIC_DATA
New York   Paris        New York     Paris                             0
New York   London       New York     London                            0
New York   Los Angeles  New York     Los Angeles                       0
New York   Madrid       New York     Paris     Madrid                  0
New York   Cairo        New York     Paris     Cairo                   0
New York   Rome         New York     Paris     Rome                    0
New York   Athens       New York     London     Athens                 0
New York   Tokyo        New York     Los Angeles     Tokyo             0
New York   Paris        New York     Paris     Cairo     Paris         1
New York   Nicosia      New York     London     Athens     Nicosia     0
New York   Hawaii       New York     Los Angeles     Tokyo     Hawaii  0

In this example, the CYCLE clause names ARRIVAL as the column to use for detecting a cycle in the data. When a cycle is found, a special column, CYCLIC_DATA in this case, is set to the character value '1' for the cycling row in the result set. All other rows contain the default value '0'. When a cycle on the ARRIVAL column is found, processing does not continue along that path, so the infinite loop does not happen. To see whether your data actually has a cyclic reference, reference the CYCLIC_DATA column in the outer query. To exclude cyclic rows, add a predicate such as WHERE CYCLIC_DATA = '0'. The cycle mark is a character value, so compare it with '0' rather than the number 0.
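If a database has no CYCLE clause (this sketch assumes SQLite does not), you can emulate `CYCLE arrival SET cyclic_data TO '1' DEFAULT '0'` by hand. Each row carries a delimited list of the arrival cities already visited on its path, held in a hypothetical SEEN column. A new row is marked '1' when its arrival city is already in that list, and marked rows are never extended, which keeps the recursion finite. The sample rows are hypothetical New York routes, plus the Cairo-to-Paris row from the INSERT statement above. The itinerary is joined with ", " rather than the spacing shown in Table 6.

```python
import sqlite3

# Hypothetical sample rows (departure, arrival, price), including the
# Cairo-to-Paris row that creates the Paris/Cairo cycle.
FLIGHTS = [
    ("New York", "Paris", 400), ("New York", "London", 350),
    ("New York", "Los Angeles", 330), ("Paris", "Madrid", 380),
    ("Paris", "Cairo", 480), ("Paris", "Rome", 340),
    ("London", "Athens", 340), ("Athens", "Nicosia", 280),
    ("Los Angeles", "Tokyo", 530), ("Tokyo", "Hawaii", 330),
    ("Cairo", "Paris", 440),
]

QUERY = """
WITH RECURSIVE destinations (departure, arrival, itinerary, seen, cyclic_data) AS (
    SELECT f.departure, f.arrival,
           f.departure || ', ' || f.arrival,
           '>' || f.arrival || '>',
           '0'
      FROM flights f
     WHERE f.departure = 'New York'
    UNION ALL
    SELECT r.departure, b.arrival,
           r.itinerary || ', ' || b.arrival,
           r.seen || b.arrival || '>',
           -- mark the row if this arrival city already occurs on the path
           CASE WHEN instr(r.seen, '>' || b.arrival || '>') > 0
                THEN '1' ELSE '0' END
      FROM destinations r, flights b
     WHERE r.arrival = b.departure
       AND r.cyclic_data = '0'  -- never extend a row that closed a cycle
)
SELECT departure, arrival, itinerary, cyclic_data FROM destinations
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", FLIGHTS)
rows = conn.execute(QUERY).fetchall()
cyclic = [row for row in rows if row[3] == "1"]
print(len(rows), cyclic)
```

As with the CYCLE clause, you can drop the flagged rows by filtering on `cyclic_data = '0'` in the outer query.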