               Prescriptive Topology Module (PTM)
             =====================================

1. Introduction

Detour: http://en.wikipedia.org/wiki/Linguistic_prescription

- In the intra-DC context, the topology follows a specific definition
   (e.g. a fat tree, Clos, BCube, ...)
- There is, thus, a set of rules and patterns governing the interconnections.
- This motivates a software abstraction that ensures the rules are
  followed:
  + Are the cables connected properly?
  + Is the Path MTU consistent?
  + Does the topology have the desired property wrt non-blocking/non-
    interfering?
  + Point out mistakes and suggest corrections.
  + Etc.

The goal of PTM is to build such a layer. For the first phase, we choose
to address only the "cabling check". For each link on a system, PTM sets
a property, "topology status", that indicates whether the link is connected
to the correct system at the remote end.

Note the important distinction between prescriptive topology layer and
the routing layer:

- The prescriptive layer ensures the desired (logical) topology.
- The routing layer ensures prefix reachability and (multipath) forwarding.

That said, these two layers do communicate. The prescriptive topology layer
notifies the routing layer about link topology status. The routing layer
evaluates the link topology status in addition to the physical link status 
(e.g. RTM_NEWLINK/RTM_DELLINK) before bringing routing sessions up. The
prescriptive topology layer MAY even bootstrap the configuration of the
routing layer to eliminate some of the operational "grunt work". More on
that later.

In summary, the network definition has three distinct substrates:

- interface layer, associated with addressing and attributes such as MTU.
  Configuration is specified in /etc/network/interfaces.
- prescriptive topology layer, which specifies and enforces the correct
  physical connectivity. Configuration is specified in /etc/cumulus/ptm.dot.
- routing layer, which enables packet forwarding, advertising the interface
  addresses over an enforced prescribed topology. Configuration is specified in
  routing config files such as zebra.conf, bgpd.conf, ospfd.conf, etc.

2. Implementation

PTM is implemented as a daemon. The following figure gives a high-level
description of PTMD:

                                    ___
                                       ) ptm.dot
                                    <-`
                     [CTL_MODULE] +------+
     +--------------------------->| PTMD | 
     |                            +------+
     |               [NBR_MODULE] ^  ^ [LLDP_MODULE]
     |                            |  | 
     |                    +-------+  |
     |         neighbor   |          | lldpctl_client
     V       notifications|          |
 +-------+                |       +------+
 |Clients|            Netlink     | LLDP |
 +-------+                        +------+

It is designed as a set of (easily changeable) modules:

(a) LLDP_MODULE:
    - Open a UNIX domain socket with LLDPD (using lldpctl_client interface)
      to receive link-level notifications. At connection init time, it
      receives a dump of the current link topology and then registers for
      further notifications.
(b) NBR_MODULE:
    - Open a netlink socket with the kernel to get a list of V4 and V6
      neighbor addresses. This is required for routing auto-configuration
      (described later).
(c) CTL_MODULE:
    - Create a UNIX domain server socket (abstract namespace) and listen
      on it so clients can connect to PTMD for either notifications or for
      a CLI-like interface.
    - Handle each new client connection through a registration-style
      mechanism.
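
A client reaches the CTL_MODULE over a UNIX domain socket in the Linux
abstract namespace. A minimal client sketch follows (Python used purely
for illustration; PTMD itself is a C daemon, and the socket name "ptmd"
is an assumption, not PTMD's actual name):

```python
import socket

def connect_to_ptmd(name="ptmd"):
    # A leading NUL byte places the address in the Linux abstract
    # namespace, so no filesystem entry is created for the socket.
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect("\0" + name)
    return s
```

Once connected, a client would register for notifications or issue
CLI-style queries over this stream.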

A generic set of function pointers is defined that each module implements:
- init_cb
- process_cb
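
The shape of that interface can be sketched as follows (Python for
illustration only; the module bodies here are placeholders, not PTMD's
actual callbacks):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PtmModule:
    name: str
    init_cb: Callable[[], bool]           # open sockets, register for events
    process_cb: Callable[[bytes], None]   # handle one event for this module

def noop_process(event: bytes) -> None:
    pass

MODULES = [
    PtmModule("lldp", init_cb=lambda: True, process_cb=noop_process),
    PtmModule("nbr",  init_cb=lambda: True, process_cb=noop_process),
    PtmModule("ctl",  init_cb=lambda: True, process_cb=noop_process),
]

def ptm_init():
    # Initialize each module in turn; a module whose init_cb fails is
    # left out of the event loop.
    return [m for m in MODULES if m.init_cb()]
```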

As an example client, Zebra uses PTM notifications to determine the
"topology status" of each link on which routing is enabled before bringing
up routing sessions. That is, the following two conditions need to be
met:
(a) The interface is UP and layer 3 status is UP (RTM_NEWLINK and 
    RTM_NEWADDR).
(b) The topology status of the interface is UP (from PTM).
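
The combined check can be sketched as a single predicate (illustrative
Python; the names are ours, not Zebra's):

```python
def routing_session_allowed(link_up: bool, l3_up: bool, topo_up: bool) -> bool:
    # (a) interface is UP and layer-3 status is UP (RTM_NEWLINK/RTM_NEWADDR)
    # (b) PTM reports the link's topology status as UP
    return link_up and l3_up and topo_up
```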

As an optimization, the updated "topology status" of each link should be
cached locally so that system reloads are faster. Indeed, when the system
is booting up, the PTM layer can honor the previously cached link topology
status.
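
A sketch of such a cache (Python for illustration; the file location and
JSON format are assumptions, not what PTMD actually does):

```python
import json
import os

def save_topo_cache(status_by_link, path):
    # status_by_link maps interface name to topology status,
    # e.g. {"swp1": "up", "swp2": "down"}.
    with open(path, "w") as f:
        json.dump(status_by_link, f)

def load_topo_cache(path):
    # At boot, honor the previously cached status; start empty if the
    # cache does not exist yet.
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)
```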

3. ptm.dot

The idea is to define a file that enumerates the network graph of a DC.
It follows the syntax of the DOT language:

  <router1>:<port1> -> <router2>:<port2>

A sample definition file is as follows:

+------------------------------------------------------------------------+
| # This file describes the network graph in a DOT format. The statements|
| # are of the form: <router1>:<port1> -> <router2>:<port2>              |
|                                                                        |
| digraph G {                                                            |
|      graph [hostidtype="hostname", version="1:0", date="04/12/2013"];  |
|      edge [dir=none, notify="log"];                                    |
|      rut:swp1 -> r1:swp1;                                              |
|      rut:swp2 -> r3:swp1;                                              |
|      r1:swp2 -> r2:swp1;                                               |
| }                                                                      |
|                                                                        |
+------------------------------------------------------------------------+

Note that the network graph needs to be complete. If certain ports are not
specified, the corresponding routing sessions will not come up.
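
For files that stick to the simple <router1>:<port1> -> <router2>:<port2>
form above, the edge list can be extracted without a full DOT parser. A
minimal sketch (Python, for illustration only; real ptm.dot files may
warrant a proper DOT parser):

```python
import re

# Matches edges of the form:  r1:swp1 -> r2:swp2;
EDGE_RE = re.compile(r'(\w+):(\w+)\s*->\s*(\w+):(\w+)\s*;')

def parse_edges(dot_text):
    # Returns a list of (router1, port1, router2, port2) tuples.
    return [m.groups() for m in EDGE_RE.finditer(dot_text)]
```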

4. Autogenerate Routing Config

It may be desirable, in some cases, to use the 'ptm.dot' information to
autogenerate the routing configuration. For example, if the admin has
decided to use IBGP as the routing protocol, the PTM layer can write the
appropriate BGP configuration into bgpd.conf (quagga) for each valid
neighbor. This is optional. Further details will be specified later.
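
As an illustration only (the ASN, the peer-address mapping, and the output
format below are assumptions; PTM's actual autogeneration scheme is
unspecified in this document), generating IBGP neighbor statements from
the ptm.dot edge list might look like:

```python
def ibgp_config(local, edges, peer_addrs, asn=65000):
    # local:      this router's name as it appears in ptm.dot
    # edges:      (router1, port1, router2, port2) tuples from ptm.dot
    # peer_addrs: mapping of router name -> peering address (assumed known)
    lines = ["router bgp %d" % asn]
    for r1, _p1, r2, _p2 in edges:
        peer = r2 if r1 == local else r1 if r2 == local else None
        if peer is not None and peer in peer_addrs:
            # IBGP: every neighbor shares the local ASN.
            lines.append(" neighbor %s remote-as %d" % (peer_addrs[peer], asn))
    return "\n".join(lines)
```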

Appendix A: Weak checks

- Enhance LLDP with some new TLVs:
  = Link spin TLV: Up/Down
  = Link IPv4 address TLV. Note this is only required for IPv4; for v6,
    link-local addressing / autoconfiguration can be used.

The idea: with "link spin" TLV information for each link, PTM can check
whether links with the "correct spin" are connected together. For example,
it becomes easy to detect that one leaf node's 'Up' link has been
mistakenly connected to another leaf node's 'Up' link.
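
Under the (assumed) convention that a valid cable joins a device's 'Up'
link to the remote device's 'Down' link, the check reduces to a spin
comparison (illustrative Python):

```python
def spin_ok(local_spin: str, remote_spin: str) -> bool:
    # Assumed convention: Up must pair with Down; Up-to-Up or
    # Down-to-Down indicates a miscabling.
    return {local_spin, remote_spin} == {"Up", "Down"}
```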

