NREN Conversion to a Point-to-Point
Full Mesh Core for Multicasting Services
Mark Foster, Hugh LaMaster
January 1999
1. Summary
Currently, the NREN network is a two-level hierarchy, connecting routers via a full mesh of ATM switch PVCs. Although the PVCs are configured as a full mesh, the router ATM interfaces use Cisco's point-to-multipoint feature. This feature permits a simplified router configuration, since it effectively treats the ATM infrastructure as a pseudo-broadcast medium: outbound traffic is replicated on each PVC of the interface. This note describes the conversion of the router ATM interfaces to use point-to-point connections, and the motivation for doing so.
2. Discussion
Why consider wrecking something that works? What we have doesn't fully "work".
For distribution of routing updates, the replication of traffic at the ATM layer hasn't posed a significant problem. Currently, routing updates are multicast throughout the network. This process is implemented by Cisco's multicast extensions to IP over ATM as "broadcast to all VCs in a subinterface", including non-router workstation hosts that are logically part of the subnet. Although this mechanism sounds "wasteful", the link-state routing protocols generate a trivial amount of traffic relative to high-speed link bandwidth, so the wasted bandwidth is very small if that is the only use of multicast.
However, as NREN has deployed native multicast as a facility for distributing real-time data, including "Mbone"-style videoconferences, the amount of "gratuitous" traffic has grown substantially. Typically only a small number of other routers (1-3) actually need to receive a given multicast data stream, yet the data are replicated and sent out every PVC in the subinterface regardless of where the receivers are. As a consequence, the aggregate multicast traffic can consume a significant portion of the available capacity on the ARC-Sprint connection. The ongoing Mbone multicast traffic alone (2-4 Mbps) consumes 15-30 Mbps when replicated to each of the NREN backbone routers.
Traffic from NREN-ARC to other NREN routers via Sprint is currently limited by the available bandwidth into the Sprint ATM cloud: a single OC-3 port. Already, during periods of high multicast traffic, the ATM interface has sustained 100 Mbps rates for extended periods, saturating both the link and the router CPU. The CPU saturation is a result of another feature of multipoint subinterfaces: multicast traffic forwarding is 100% process-switched, and does not take advantage of Cisco's hardware-accelerated fast/express switching. It has been observed on a 7513 that about 1 Mbps of IP traffic over ATM translates to about 1% CPU utilization. Thus, even if NREN had OC-12 service from Sprint, total multicast traffic would still be limited to about 100 Mbps using the existing routers.
Upcoming applications that will use NREN multicasting infrastructure have significant bandwidth requirements (sourcing 15-40 Mbps; perhaps more). If we attempt to push this much multicast traffic into the NREN backbone with the current configuration, we will quickly exhaust the OC-3 link capacity (even without consideration for bandwidth sharing with NISN on these OC-3 links). In fact, 40 Mbps would require almost three times the bandwidth we have available.
To contemplate supporting large-bandwidth multicasting applications, we need to deploy a routing architecture that replicates multicast traffic only where it is required, that is, only to routers with downstream receivers of the traffic. In a LAN environment, Cisco provides several options that make use of ATM signalling to establish point-to-multipoint VCs precisely at the point of traffic divergence. In a WAN environment, the effectiveness of these options would depend on the ability to signal Sprint's core ATM switches; this capability is not currently available from Sprint. Consequently, the core needs to be treated as a non-broadcast medium. In other words, the links between the routers must be configured as a mesh of explicit point-to-point connections.
In addition, Cisco has created the Multicast Source Discovery Protocol (MSDP), which will allow the backbone to be operated in Protocol Independent Multicast (PIM) Sparse-Mode instead of PIM Dense-Mode. Using PIM-SM eliminates flood-and-prune traffic and router state required by PIM-DM.
By converting the core to a point-to-point mesh, and by deploying PIM-SM with MSDP, the bandwidth requirements for multicasting on any particular backbone link will be dramatically reduced. We expect to see multicast capacity scale according to the scope of the receivers of the traffic, rather than according to the extent of the entire NREN network.
The following sections describe a suggested approach for implementing such changes. The process assumes a complete shutdown of all core traffic during the transition, with reliable out-of-band access for reconfiguration. An alternative approach would involve building a second mesh of PVCs throughout the network, to simultaneously support both the point-to-point links and the point-to-multipoint links.
3. Implementation Overview
Convert to point-to-point full mesh:
3.1. Define /30 address assignment mappings for each link
3.2. Document PVC mappings
3.3. Snapshot existing configurations, upload to tftpboot server
3.4. Modify uploaded snapshots to reflect new addressing, p2p links
3.5. Download modified configs & reboot with new configs
3.6. Verify OSPF functionality, connectivity for all p2p links
3.7. Verify BGP/MBGP and PIM Sparse-Dense functionality, external connectivity
Deploy MSDP, PIM-SM:
3.8. Designate Rendezvous Point (RP) for multicast group membership coordination
3.9. Enable MSDP at all MBGP border routers; run either MSDP peering or proxy for dense-mode neighbors (primarily at Chicago).
3.10. Verify PIM-SM multicast connectivity
4. Implementation Details
Assuming we break 198.10.64/24 into a set of /30s for the point-to-point mesh, the following general and specific examples show how the routers should be configured.
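For illustration, each /30 provides two usable host addresses. Only the ARC-ANTL pair below is taken from the specific example later in this section; the remaining rows are placeholders to be filled in during step 3.1:

198.10.64.12/30   ARC<->ANTL link      ARC = 198.10.64.14, ANTL = 198.10.64.13
198.10.64.16/30   next link            hosts .17 and .18
198.10.64.20/30   next link            hosts .21 and .22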
ospf general configuration
!
interface Loopback0
ip address 198.10.80.xxx 255.255.255.255
!
interface ATM X/Y/Z.S point-to-point
description myname->remotename vc JJJ
ip address 198.10.64.xxx 255.255.255.252
atm pvc VCD 0 JJJ aal5snap
!
router ospf 24
redistribute connected subnets
redistribute rip subnets
network 198.10.64.0 0.0.15.255 area 0
!
Notes:
OSPF link updates traverse loopback0
OSPF routing process id = 24 for all routers
OSPF network type is left at the default, since the links are a point-to-point full mesh
Connected subnets are redistributed into OSPF
Subnets learned through RIP are redistributed into OSPF
The collection of NREN core subnets (198.10.64.0/20) comprises area 0
ospf specific configuration - ARC
!
interface Loopback0
ip address 198.10.80.1 255.255.255.255
!
interface ATM4/0/0.1 point-to-point
description ARC->ANTL vc 101
ip address 198.10.64.14 255.255.255.252
no ip directed-broadcast
ip ospf priority 5
atm pvc 18 0 101 aal5snap
!
interface ATM4/0/0.30 point-to-point
description ARC->ARCswitch management vc 100
ip address 198.10.66.17 255.255.255.252
atm pvc 11 0 100 aal5snap
!
router ospf 24
redistribute connected subnets
redistribute rip subnets
passive-interface fddi1/0/0
network 198.10.64.0 0.0.15.255 area 0
network 198.10.80.1 0.0.0.0 area 0
default-information originate
!
Notes:
OSPF priority = 5 sets ARC as OSPF Designated Router (priority matters only on multi-access segments; it has no effect on the point-to-point links)
Passive-interface prevents OSPF updates from being sent on those interfaces
Default-information originate causes ARC to advertise the default route into the backbone
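Note that default-information originate only generates the default LSA if ARC itself has a default route in its routing table (or if the always keyword is added). A minimal sketch, assuming a static default toward the external next hop (the next-hop address is a placeholder, not part of the actual configuration):
!
ip route 0.0.0.0 0.0.0.0 xxx.xxx.xxx.xxx
!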
bgp general configuration
!
interface Loopback0
ip address 198.10.80.xxx 255.255.255.0
!
router bgp 24
no synchronization
network 198.10.64.0 mask 255.255.240.0
network 198.10.80.0
neighbor 198.10.80.x remote-as 24 nlri unicast multicast
neighbor 198.10.80.x update-source Loopback0
neighbor 198.10.80.x send-community
neighbor 192.203.230.5 remote-as 297
neighbor 192.203.230.5 update-source Loopback0
neighbor 192.203.230.5 send-community
neighbor 192.203.230.5 unsuppress-map ibgpall
!
access-list 189 permit ip host 198.10.1.0 host 255.255.255.0
access-list 189 permit ip host 198.10.80.0 host 255.255.255.0
access-list 189 permit ip host 198.10.64.0 host 255.255.240.0
!
route-map ibgpall permit 10
match ip address 189
!
Notes:
Network 198.10.64.0 mask 255.255.240.0 provides injection of all /30s (as a supernet).
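Note that for the network statement with mask 255.255.240.0 to be announced, a matching route for 198.10.64.0/20 must exist in the routing table; the connected /30s alone do not provide one. A minimal sketch of one common way to anchor the aggregate, assuming a static route to Null0 is acceptable (this line does not appear in the original configurations):
!
ip route 198.10.64.0 255.255.240.0 Null0
!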
pim specific configuration - ARC
!
interface ATM4/0/0.1 point-to-point
ip pim sparse-dense-mode
!
ip pim send-rp-announce Loopback0 scope 4
ip pim send-rp-discovery scope 4
Notes:
Every point-to-point interface between routers should use ip pim sparse-dense-mode. With sparse-dense mode, groups that have a known RP operate in sparse mode while the remainder fall back to dense mode, and auto-rp is supported. The designated RP (ARC) sends RP announce and discovery messages to the core.
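Step 3.9 calls for enabling MSDP at the MBGP border routers. A minimal sketch of what that might look like on such a router, assuming the external MBGP neighbor from the bgp configuration above (192.203.230.5) is also the MSDP peer; the peer selection and any proxy arrangement for dense-mode neighbors at Chicago still need to be confirmed:
!
ip multicast-routing
! peer MSDP with the external MBGP neighbor, sourcing the session from Loopback0
ip msdp peer 192.203.230.5 connect-source Loopback0
!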
special cases
The Sioux Falls and DC routers will not be configured as part of the full-mesh backbone. The Chicago router will be a route reflector for Sioux Falls, and the Goddard router will be the route reflector for DC. In the bgp neighbor statement, the IP address of the remote (client) router is used to cause transmission of the IBGP routes to the client. Both client routers should have bgp neighbor statements only for their respective route reflectors; a client-side sketch follows the reflector examples below.
! nren-chi-rtr
!
router bgp 24
neighbor 198.10.64.193 route-reflector-client
!
! nren-gsfc-rtr
!
router bgp 24
neighbor 198.10.64.198 route-reflector-client
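A minimal client-side sketch for one of the client routers, reusing the nlri syntax from the general bgp configuration; the neighbor address is a placeholder for the route reflector's address facing the client:
! Sioux Falls client router (hostname and addresses are placeholders)
!
router bgp 24
neighbor 198.10.64.x remote-as 24 nlri unicast multicast
neighbor 198.10.64.x send-community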
4.1 Schedule
The primary goal of the schedule is to compress the total reconfiguration time into two or three days, with a very high probability of success within that period. Test procedures will be used to alert us to key difficulties that may arise during the transition.
Jan 19   | Define /30 address assignment mappings for each link, preliminary implementation plan
Jan 21   | Finalize address assignments, verify isolan and dialup accessibility
Jan 22   | Testbed setup plan
Jan 25   | Presentation and agreement on final implementation plan
Jan 26   | Evaluate implementation plan on testbed; use results to refine implementation plan and this schedule
Jan 29   | Report on results from implementation test
Feb 1-2  | Retest refinements, if needed. Complete text editing of config snapshots with new addresses, parameters
Feb 3    | Download updated configs to routers & switches, commence reconfiguration
Feb 4    | Continue reconfiguration, re-establish router-router connectivity. Verify unicast functionality and OSPF routing
Feb 5    | Verify BGP/MBGP routing, PIM Sparse-Dense multicast functionality
4.2 Operational Considerations
During testing, a handful of problems were identified. These problems and their solutions or workarounds are highlighted here. Some of the problems relate specifically to the renumbering work, and some are more generic. Where possible, the renumbering-specific difficulties have been marked with a (*).
4.2.1 IOS
Before first starting the BGP process, shut down any active ATM interfaces; subsequent reconfigurations of BGP don't require such action.
A subinterface can be deleted with "no interface", but it cannot be converted from p2mp to p2p without first doing a reload. The cleanest approach is to "erase start" then "copy p2p.cfg start" followed by a reload, where p2p.cfg is the downloaded config with the renumbering changes applied.
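One way to realize that sequence from the exec prompt, assuming the modified configuration is fetched from the tftpboot server of step 3.3 (copy tftp startup-config prompts for the server address and the p2p.cfg filename):
erase startup-config
copy tftp startup-config
reload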
Distributed CEF and distributed route caching misbehave (or fail altogether) when there are point-to-multipoint subinterfaces on an interface with the distributed service enabled. This is true even if route caching is specifically disabled for the p2mp subinterface. All p2mp links should be converted to point-to-point links. If this approach is insufficient, then a second ATM port adapter could be used to handle the point-to-multipoint traffic.
4.2.2 ForeThought
The ASCII version of the cdb cannot be used to completely reconfigure a switch. Some items cannot be set non-interactively, and other steps may conflict with the current state of the switch (even a just-rebooted switch). In particular, adding or changing some Classical IP PVCs may require first deleting the underlying VCCs. Once the Classical IP PVC is established, the VCCs can be recreated.
To successfully load a (modified) ASCII cdb, the current cdb should be initialized and the switch rebooted. At that point the switch will have no network configuration, and will require out-of-band access.
Always use "restore -ignore_errors" to process the full ASCII AMI batch file.
When transitioning from multipoint interfaces, all switches must have a default route that points to a router. If a static default is not used, a precise IP address would have to be configured for each switch, depending on where you are connecting from. It is suggested that the default route refer to the nearest router.
4.2.3 Out-of-Band Access
(*) We found maintaining out-of-band access critical to performing the readdressing while reusing the existing PVC mesh. The only option for part of the work on the Fore switches is via the console port (dialin). Suitable access should be maintainable via ethernet or FDDI ISOLAN interfaces on the routers, provided that suitable routing tables are in place (the most obvious being a static route back to a specific management workstation).
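A minimal sketch of such a static route on a router, with placeholder addresses for the management workstation and the ISOLAN next hop:
!
! host route back to the management workstation via the ISOLAN segment
ip route 198.10.xx.xx 255.255.255.255 198.10.yy.yy
!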