NREN Conversion to a Point-to-Point
Full Mesh Core for Multicasting Services
Mark Foster, Hugh LaMaster
January 1999
1. Summary
Currently, the NREN network is a two-level hierarchy, connecting routers via a full mesh of ATM switch PVCs. Although the PVCs are configured as a full mesh, the router ATM interfaces use Cisco's point-to-multipoint feature. This feature permits a simplified router configuration, since it effectively treats the ATM infrastructure as a pseudo-broadcast medium: outbound traffic is replicated on each PVC of the interface. This note describes the conversion of the router ATM interfaces to use point-to-point connections, and the motivation for doing so.
2. Discussion
Why consider wrecking something that works? What we have doesn't fully "work".
For distribution of routing updates, the replication of traffic at the ATM layer hasn't posed a significant problem. Currently, routing updates are multicast throughout the network. This process is implemented by Cisco's multicast extensions to IP over ATM as "broadcast to all VCs in a subinterface", including non-router workstation hosts that are logically part of the subnet. Although this mechanism sounds "wasteful", the link-state routing protocols generate a trivial amount of traffic relative to high-speed link bandwidth, so the wasted bandwidth is very small if that is the only use of multicast.
However, as NREN has deployed native multicast as a facility for distributing real-time data, including "Mbone"-style videoconferences, the amount of "gratuitous" traffic has grown substantially. Typically only a small number of other routers (1-3) actually need to receive a given multicast data stream, yet the data are replicated and sent out every PVC in the subinterface regardless of where the receivers are. As a consequence, the aggregate multicast traffic can consume a significant portion of the available capacity on the ARC-Sprint connection. The ongoing Mbone multicast traffic alone (2-4 Mbps) consumes 15-30 Mbps when replicated to each of the NREN backbone routers.
Traffic from NREN-ARC to other NREN routers via Sprint is currently limited by the available bandwidth into the Sprint ATM cloud: a single OC-3 port. Already, during periods of high multicast traffic, the ATM interface has sustained 100 Mbps rates for extended periods, saturating both the link and the router CPU. The CPU saturation is a result of another feature of multipoint subinterfaces: multicast traffic forwarding is 100% process-switched, and does not take advantage of Cisco's hardware-accelerated fast/express switching. It has been observed on a 7513 that about 1 Mbps of IP traffic over ATM translates to about 1% CPU utilization. Thus, even if NREN had OC-12 service from Sprint, total multicast traffic would still be limited to about 100 Mbps using the existing routers.
Upcoming applications that will use NREN multicasting infrastructure have significant bandwidth requirements (sourcing 15-40 Mbps; perhaps more). If we attempt to push this much multicast traffic into the NREN backbone with the current configuration, we will quickly exhaust the OC-3 link capacity (even without consideration for bandwidth sharing with NISN on these OC-3 links). In fact, 40 Mbps would require almost three times the bandwidth we have available.
To contemplate supporting large-bandwidth multicasting applications, we need to deploy a routing architecture that replicates multicast traffic only where it is required, that is, only to routers with downstream receivers of the traffic. In a LAN environment, Cisco provides several options that make use of ATM signalling to establish point-to-multipoint VCs precisely at the point of traffic divergence. In a WAN environment, the effectiveness of these options would depend on the ability to signal Sprint's core ATM switches; this capability is not currently available from Sprint. Consequently, the core needs to be treated as a non-broadcast medium. In other words, the links between the routers must be configured as a mesh of explicit point-to-point connections.
In addition, Cisco has created the Multicast Source Discovery Protocol (MSDP), which will allow the backbone to be operated in Protocol Independent Multicast (PIM) Sparse-Mode instead of PIM Dense-Mode. Using PIM-SM eliminates flood-and-prune traffic and router state required by PIM-DM.
By converting the core to a point-to-point mesh, and by deploying PIM-SM with MSDP, the bandwidth requirements for multicasting on any particular backbone link will be dramatically reduced. We expect to see multicast capacity scale according to the scope of the receivers of the traffic, rather than according to the extent of the entire NREN network.
The following sections describe a suggested approach for implementing such changes. The process assumes a complete shutdown of all core traffic during the transition, with reliable out-of-band access for reconfiguration. An alternative approach would involve building a second mesh of PVCs throughout the network, to simultaneously support both the point-to-point links and the point-to-multipoint links.
3. Implementation Overview
Convert to point-to-point full mesh:
3.1. Define /30 address assignment mappings for each link
3.2. Document PVC mappings
3.3. Snapshot existing configurations, upload to tftpboot server
3.4. Modify uploaded snapshots to reflect new addressing, p2p links
3.5. Download modified configs & reboot with new configs
3.6. Verify OSPF functionality, connectivity for all p2p links
3.7. Verify BGP/MBGP and PIM Sparse-Dense functionality, external connectivity
Deploy MSDP, PIM-SM:
3.8. Designate Rendezvous Point (RP) for multicast group membership coordination
3.9. Enable MSDP at all MBGP border routers; run either MSDP peering or proxy for dense-mode neighbors (primarily at Chicago).
3.10. Verify PIM-SM multicast connectivity
4. Implementation Details
Assuming we break 198.10.64/24 into a set of /30s for the point-to-point mesh, the following general and specific examples show how the routers should be configured.
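For illustration, each /30 provides two usable host addresses. Only the ARC-ANTL pair below is taken from the specific example later in this section; the remaining rows are placeholders to be filled in during step 3.1:

198.10.64.12/30   ARC<->ANTL link      ARC = 198.10.64.14, ANTL = 198.10.64.13
198.10.64.16/30   next link            hosts .17 and .18
198.10.64.20/30   next link            hosts .21 and .22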
ospf general configuration
!
interface Loopback0
ip address 198.10.80.xxx 255.255.255.255
!
interface ATM X/Y/Z.S point-to-point
description myname->remotename vc JJJ
ip address 198.10.64.xxx 255.255.255.252
atm pvc VCD 0 JJJ aal5snap
!
router ospf 24
redistribute connected subnets
redistribute rip subnets
network 198.10.64.0 0.0.15.255 area 0
!
Notes:
OSPF link updates traverse loopback0
OSPF routing process id = 24 for all routers
OSPF network type is left at the default, since the links are a point-to-point full mesh
Connected subnets are redistributed into OSPF
Subnets learned through RIP are redistributed into OSPF
The collection of NREN core subnets (198.10.64.0/20) comprises area 0
ospf specific configuration - ARC
!
interface Loopback0
ip address 198.10.80.1 255.255.255.255
!
interface ATM4/0/0.1 point-to-point
description ARC->ANTL vc 101
ip address 198.10.64.14 255.255.255.252
no ip directed-broadcast
ip ospf priority 5
atm pvc 18 0 101 aal5snap
!
interface ATM4/0/0.30 point-to-point
description ARC->ARCswitch management vc 100
ip address 198.10.66.17 255.255.255.252
atm pvc 11 0 100 aal5snap
!
router ospf 24
redistribute connected subnets
redistribute rip subnets
passive-interface fddi1/0/0
network 198.10.64.0 0.0.15.255 area 0
network 198.10.80.1 0.0.0.0 area 0
default-information originate
!
Notes:
OSPF priority = 5 sets ARC as OSPF Designated Router (priority matters only on multi-access segments; it has no effect on the point-to-point links)
Passive-interface prevents OSPF updates from being sent on those interfaces
Default-information originate causes ARC to advertise the default route into the backbone
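Note that default-information originate only generates the default LSA if ARC itself has a default route in its routing table (or if the always keyword is added). A minimal sketch, assuming a static default toward the external next hop (the next-hop address is a placeholder, not part of the actual configuration):
!
ip route 0.0.0.0 0.0.0.0 xxx.xxx.xxx.xxx
!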
bgp general configuration
!
interface Loopback0
ip address 198.10.80.xxx 255.255.255.0
!
router bgp 24
no synchronization
network 198.10.64.0 mask 255.255.240.0
network 198.10.80.0
neighbor 198.10.80.x remote-as 24 nlri unicast multicast
neighbor 198.10.80.x update-source Loopback0
neighbor 198.10.80.x send-community
neighbor 192.203.230.5 remote-as 297
neighbor 192.203.230.5 update-source Loopback0
neighbor 192.203.230.5 send-community
neighbor 192.203.230.5 unsuppress-map ibgpall
!
access-list 189 permit ip host 198.10.1.0 host 255.255.255.0
access-list 189 permit ip host 198.10.80.0 host 255.255.255.0
access-list 189 permit ip host 198.10.64.0 host 255.255.240.0
!
route-map ibgpall permit 10
match ip address 189
!
Notes:
Network 198.10.64.0 mask 255.255.240.0 provides injection of all /30s (as a supernet).
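Note that for the network statement with mask 255.255.240.0 to be announced, a matching route for 198.10.64.0/20 must exist in the routing table; the connected /30s alone do not provide one. A minimal sketch of one common way to anchor the aggregate, assuming a static route to Null0 is acceptable (this line does not appear in the original configurations):
!
ip route 198.10.64.0 255.255.240.0 Null0
!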
pim specific configuration - ARC
!
interface ATM4/0/0.1 point-to-point
ip pim sparse-dense-mode
!
ip pim send-rp-announce Loopback0 scope 4
ip pim send-rp-discovery scope 4
Notes:
Every point-to-point interface between routers should use ip pim sparse-dense-mode. With sparse-dense mode, groups that have a known RP operate in sparse mode while the remainder fall back to dense mode, and auto-rp is supported. The designated RP (ARC) sends RP announce and discovery messages to the core.
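Step 3.9 calls for enabling MSDP at the MBGP border routers. A minimal sketch of what that might look like on such a router, assuming the external MBGP neighbor from the bgp configuration above (192.203.230.5) is also the MSDP peer; the peer selection and any proxy arrangement for dense-mode neighbors at Chicago still need to be confirmed:
!
ip multicast-routing
! peer MSDP with the external MBGP neighbor, sourcing the session from Loopback0
ip msdp peer 192.203.230.5 connect-source Loopback0
!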
special cases
The Sioux Falls and DC routers will not be configured as part of the full-mesh backbone. The Chicago router will be a route reflector for Sioux Falls, and the Goddard router will be the route reflector for DC. In the bgp neighbor statement, the IP address of the remote (client) router is used to cause transmission of the IBGP routes to the client. Both client routers should have bgp neighbor statements only for their respective route reflectors; a client-side sketch follows the reflector examples below.
! nren-chi-rtr
!
router bgp 24
neighbor 198.10.64.193 route-reflector-client
!
! nren-gsfc-rtr
!
router bgp 24
neighbor 198.10.64.198 route-reflector-client
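A minimal client-side sketch for one of the client routers, reusing the nlri syntax from the general bgp configuration; the neighbor address is a placeholder for the route reflector's address facing the client:
! Sioux Falls client router (hostname and addresses are placeholders)
!
router bgp 24
neighbor 198.10.64.x remote-as 24 nlri unicast multicast
neighbor 198.10.64.x send-community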
4.1 Schedule
The primary goal of the schedule is to compress the total reconfiguration time into two or three days, with a very high probability of success within that period. Test procedures will be used to alert us to key difficulties that may arise during the transition.
Jan 19   | Define /30 address assignment mappings for each link, preliminary implementation plan
Jan 21   | Finalize address assignments, verify isolan and dialup accessibility
Jan 22   | Testbed setup plan
Jan 25   | Presentation and agreement on final implementation plan
Jan 26   | Evaluate implementation plan on testbed; use results to refine implementation plan and this schedule
Jan 29   | Report on results from implementation test
Feb 1-2  | Retest refinements, if needed. Complete text editing of config snapshots with new addresses, parameters
Feb 3    | Download updated configs to routers & switches, commence reconfiguration
Feb 4    | Continue reconfiguration, re-establish router-router connectivity. Verify unicast functionality and OSPF routing
Feb 5    | Verify BGP/MBGP routing, PIM Sparse-Dense multicast functionality
4.2 Operational Considerations
During testing, a handful of problems were identified. These problems and their solutions or workarounds are highlighted here. Some of the problems relate specifically to the renumbering work, and some are more generic. Where possible, the renumbering-specific difficulties have been marked with a (*).
4.2.1 IOS
Before first starting the BGP process, shut down any active ATM interfaces; subsequent reconfigurations of BGP don't require such action.
A subinterface can be deleted with "no interface", but it cannot be converted from p2mp to p2p without first doing a reload. The cleanest approach is to "erase start" then "copy p2p.cfg start" followed by a reload, where p2p.cfg is the downloaded config with the renumbering changes applied.
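One way to realize that sequence from the exec prompt, assuming the modified configuration is fetched from the tftpboot server of step 3.3 (copy tftp startup-config prompts for the server address and the p2p.cfg filename):
erase startup-config
copy tftp startup-config
reload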
Distributed CEF and distributed route caching misbehave (or fail altogether) when there are point-to-multipoint subinterfaces on an interface with the distributed service enabled. This is true even if route caching is specifically disabled for the p2mp subinterface. All p2mp links should be converted to point-to-point links. If this approach is insufficient, then a second ATM port adapter could be used to handle the point-to-multipoint traffic.
4.2.2 ForeThought
The ASCII version of the cdb cannot be used to completely reconfigure a switch. Some items cannot be set non-interactively, and other steps may conflict with the current state of the switch (even a just-rebooted switch). In particular, adding or changing some Classical IP PVCs may require first deleting the underlying VCCs. Once the Classical IP PVC is established, the VCCs can be recreated.
To successfully load a (modified) ASCII cdb, the current cdb should be initialized and the switch rebooted. At that point the switch will have no network configuration, and will require out-of-band access.
Always use "restore -ignore_errors" to process the full ASCII AMI batch file.
When transitioning from multipoint interfaces, all switches must have a default route that points to a router. If a static default is not used, a precise IP address would have to be configured for each switch, depending on where you are connecting from. It is suggested that the default route refer to the nearest router.
4.2.3 Out-of-Band Access
(*) We found maintaining out-of-band access critical to performing the readdressing while reusing the existing PVC mesh. The only option for part of the work on the Fore switches is via the console port (dialin). Suitable access should be maintainable via ethernet or FDDI ISOLAN interfaces on the routers, provided that suitable routing tables are in place (the most obvious being a static route back to a specific management workstation).
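A minimal sketch of such a static route on a router, with placeholder addresses for the management workstation and the ISOLAN next hop:
!
! host route back to the management workstation via the ISOLAN segment
ip route 198.10.xx.xx 255.255.255.255 198.10.yy.yy
!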