DRAFT: The impact of brushes on Q3 engine runtime performance.

note: this is a rough draft.

Introduction

This article describes some of the effects of brushes on the runtime performance of maps for q3 engine games. Henceforth, q3 refers to Quake III, Return to Castle Wolfenstein, Wolfenstein: Enemy Territory and similar games. Some other Quake III engine licensees may have diverged significantly from what is described here. Testing was done using Wolfenstein: Enemy Territory, GtkRadiant and q3map2 (based on version 2.5.16 with some custom modifications).

Background

Q3 performance in general

On modern hardware, the performance of q3 engine games tends to be limited by CPU speed (note: OC testing), along with memory, cache and bus bandwidth. The most common way to measure the cost of a particular scene is 'r_speeds' (number of triangles) (note: r_speeds); however, experience shows that many other factors affect frame rate. A significant portion of CPU and memory bandwidth is used by non-rendering tasks, but information on how to control these costs is not widely available.

The IBSP format

The .bsp file contains more than just the raw BSP tree. It defines drawing surfaces, solid and interactive geometry and game related entities. Kekoa Proudfoot's Unofficial Quake 3 Map Specs gives a good overview of the format. A general understanding of the format and the terminology used to describe it is required to fully understand the rest of this article.
(note: q3map -info can be used to list the sizes and number of entries in various BSP lumps)
(note: http://www.planetquake.com/spog/stuff/technical.html)
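
As a rough illustration of the lump layout, the sketch below reads the lump directory from the raw bytes of a .bsp file. It assumes the header layout given in the unofficial specs (a 4-byte "IBSP" magic, a 4-byte version, then 17 offset/length pairs) and the lump order listed there. The function name read_lump_directory is made up for this example and is not part of q3map2 or the engine.

```python
import struct

# Lump order as listed in the unofficial specs (assumed here, 17 lumps).
LUMP_NAMES = [
    "entities", "textures", "planes", "nodes", "leafs", "leaffaces",
    "leafbrushes", "models", "brushes", "brushsides", "vertexes",
    "meshverts", "effects", "faces", "lightmaps", "lightvols", "visdata",
]


def read_lump_directory(data):
    """Return (version, {lump name: (offset, length)}) for raw .bsp bytes."""
    magic, version = struct.unpack_from("<4si", data, 0)
    if magic != b"IBSP":
        raise ValueError("not an IBSP file")
    directory = {}
    for i, name in enumerate(LUMP_NAMES):
        # Each directory entry is two little-endian ints: offset, length.
        offset, length = struct.unpack_from("<ii", data, 8 + 8 * i)
        directory[name] = (offset, length)
    return version, directory
```

To use it, read a compiled map with open(path, "rb").read() and pass the bytes in. The version is returned unchecked because it can differ between games. Dividing a lump's length by its entry size gives the entry count, much like the figures q3map -info reports.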

The key characteristics relevant to this article are:

Collision geometry (note: collision geometry refers to all game interactions with the world, not just what players can collide with) and drawing geometry are independent, defined by brushes and drawsurfaces respectively (note: drawsurfaces are referred to as faces in Kekoa's document) (note: except for patch meshes, which are a special case this article doesn't cover).
The characteristics of surfaces in-game are defined by the brushes.
The BSP leafs are used to organize drawing and collision geometry in space, using the leafbrushes and leaffaces lumps.
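
To make the leaf-to-brush relationship concrete, here is a minimal sketch that lists which brushes each leaf references. It assumes the record layouts from the unofficial specs: each leaf is 12 little-endian ints, with the first leafbrush index and the leafbrush count as the last two fields, and the leafbrushes lump is a flat array of brush indices. The function name brushes_per_leaf is hypothetical.

```python
import struct

LEAF_SIZE = 48  # 12 little-endian ints per leaf (assumed from the specs)


def brushes_per_leaf(leafs_lump, leafbrushes_lump):
    """Return, for each leaf, the list of brush indices it references."""
    n_refs = len(leafbrushes_lump) // 4
    refs = struct.unpack_from("<%di" % n_refs, leafbrushes_lump, 0)
    result = []
    for pos in range(0, len(leafs_lump) - LEAF_SIZE + 1, LEAF_SIZE):
        fields = struct.unpack_from("<12i", leafs_lump, pos)
        first, count = fields[10], fields[11]
        # A brush index may show up in several leaves' slices.
        result.append(list(refs[first:first + count]))
    return result
```

The two arguments are the raw bytes of the leafs and leafbrushes lumps, sliced out of the file. Taking len() of each returned list gives the brushes-per-leaf count, and a brush index that appears in more than one list is a brush that spans several leafs.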

The compilation process

Most map geometry is created using brushes in the editor to describe both the collision properties and drawing textures. The compilation process turns the drawing geometry of the brushes into the relevant entries in the surfaces (and related lumps) and stores the brushes in leafbrushes. With a couple of minor exceptions (note: origin), every brush in the map is stored as a leafbrush in the .bsp. Static models (misc_model) are converted directly to drawsurfaces. Each leaf of the BSP file contains a list of the brushes and drawsurfaces inside it (note: a brush or surface can be present in more than one leaf).

Runtime use of leafbrushes

Interaction with the geometry of the world in Q3 is done using the trace function. This traces a solid (or point) through the world, and finds out what it would run into. (note: q3engine code/qcommon/cm_trace.c) Trace first uses the BSP to find out which leaves are potentially of interest, and then tests against every leafbrush in each of those leaves. (note: CM_TraceThroughLeaf) There are some early outs, but in general, every brush in a leaf will be seen every time a trace crosses that leaf. This is true even if the surface or contents of the brush are not of interest (e.g. a nonsolid brush is still checked in collision detection).
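
The cost model described above can be sketched as a toy counter. This is not the engine's code: the function name, the data layout and the two early outs shown (a brush already handled earlier in the same trace, and a brush whose contents do not match the trace mask) are assumptions chosen to illustrate why every brush in a crossed leaf costs something, even when it is skipped.

```python
def count_brush_tests(leaf_brushes, leaves_crossed, brush_contents, trace_mask):
    """Count the work one trace does in this simplified model.

    leaf_brushes:   list of brush-index lists, one per leaf
    leaves_crossed: leaf indices the trace passes through
    brush_contents: contents flags, one int per brush
    trace_mask:     contents flags the trace is interested in
    Returns (brushes_visited, brushes_fully_tested).
    """
    seen = set()   # stands in for a per-trace "already checked" marker
    visited = 0
    tested = 0
    for leaf in leaves_crossed:
        for brush in leaf_brushes[leaf]:
            visited += 1      # the loop touches every brush in the leaf
            if brush in seen:
                continue      # early out: handled in an earlier leaf
            seen.add(brush)
            if not (brush_contents[brush] & trace_mask):
                continue      # early out: contents not of interest
            tested += 1       # the expensive plane tests would go here
    return visited, tested
```

In this model, the visited count grows with the number of brushes in each crossed leaf regardless of their contents, while the tested count grows only with the brushes that match the mask. That is one way to picture a nonsolid brush being cheaper than a solid one without being free.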

Server and client performance

In general, mappers do their testing on a 'listen server', where the server and client are run in the same process. For games with bot-play, this is essential, as it reflects the end user environment. For online play, it is important to realize that both server and client performance are of interest. A poorly constructed map can cause 'laggy' online performance, even if all the clients get good frame rates. Obviously, rendering performance can only be a factor on the client. Physics performance may be a problem on either. (note: b_optimizeprediction) (note: client FPS effect on server performance)

Efficient maps and "caulk-hull" construction

So-called "caulk hull" construction is a method of building maps which divorces structural (in the BSP sense) brushwork from the drawing details. This gives the mapper greater control over the BSP process, at the expense of more brushes.

Objective

The objective of this article is to describe the performance impact of leaf brushes, and use this to develop best practices for creating efficient maps.

The following specific questions are investigated:

What is the performance impact of non-drawing brushes ?
How is this impact split between client and server ?
Does splitting up a navigable area into multiple leafs reduce the performance impact of non-drawing brushes ?
Do non-solid brushes have less impact than solid ones ?
What is the relative performance and file size of a misc_model compared to brushwork ?
Is it possible to achieve misc_model performance from brushwork ?

Test cases

The test maps consist of a single structural room with a spawn point and 16K brushes of various types, or a model generated from those brushes. The large number of brushes is used to ensure the results are clearly seen on a high end machine.
(note: test client system: Athlon 64 @2.1 GHz, 1GB Dual channel DDR @426, GeForce 6600GT @550/1100)
(note: test server system: Athlon XP @1.4 GHz, 768MB DDR @266)
(note: compiling the checkerboard drawing test case maps uses over 700MB of memory, and requires ~20 minutes on the test system)
(note: the test cases with nodraw will have no texture at all on the floor. This will cause HOM if you don't have r_fastsky on)
All maps were compiled with bsp -meta (required for ET).

Name	Description	Comments
nodraw-solid-flush	nondrawing, solid brushes, flush with (completely embedded in) the floor	Basic test of the cost of non-drawing brushwork
nodraw-solid-flush-b256	nondrawing, solid brushes, flush with (completely embedded in) the floor, blocksize 256	Test the impact of putting the above brushwork into multiple leafs
nodraw-solid-split	nondrawing, solid brushes, partly protruding from the floor	What happens if the brushes occupy more than one leaf ?
nodraw-solid-nosplit	nondrawing, solid brushes, completely above the floor (inside the structural box)	What happens if they are completely inside, not overlapping anything else.
nodraw-nonsolid-flush	nondrawing, nonsolid brushes, flush with the floor	as nodraw-solid-flush, but nonsolid.
draw-solid-flush	solid, drawing brushes, flush with the floor	Baseline drawing brushes.
draw-nonsolid-flush	nonsolid, drawing brushes, flush with the floor	Drawing with nonsolid brushes. Note: uses custom q3map2 feature and shader to ensure brushes are nonsolid.
model	ASE misc_model (generated from drawing brushwork above), flush with the floor	Triangle model generated from above brushwork. Same geometry but no brushes.
discard-flush	drawing brushwork with brushes discarded, flush with the floor	Same as draw-solid-flush, but using a modified q3map2 which discards the brushes (and associated resources) after compile.

bsp command


	q3map2 -v -game et -fs_basepath "<basepath>" -meta -mv 1024 -mi 6144 "<mapname>"

vis command


	q3map2 -game et -fs_basepath "<basepath>" -vis -saveprt "<mapname>"

(note: all but nodraw-solid-flush-b256 are unaffected by vis, since they only have 1 cluster.)
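
For batch-compiling several test maps, the two command lines above can be built as argument lists. This is only a convenience sketch: the flags are copied from the commands shown, while the function names and the idea of driving q3map2 from a script are assumptions, not part of the original test setup.

```python
def bsp_command(basepath, mapname):
    """Argument list for the bsp stage, using the flags shown above."""
    return ["q3map2", "-v", "-game", "et", "-fs_basepath", basepath,
            "-meta", "-mv", "1024", "-mi", "6144", mapname]


def vis_command(basepath, mapname):
    """Argument list for the vis stage, using the flags shown above."""
    return ["q3map2", "-game", "et", "-fs_basepath", basepath,
            "-vis", "-saveprt", mapname]


def compile_commands(basepath, mapnames):
    """Return the bsp and vis commands for each map, in run order."""
    commands = []
    for mapname in mapnames:
        commands.append(bsp_command(basepath, mapname))
        commands.append(vis_command(basepath, mapname))
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True), so that a failed bsp stage stops the batch before vis runs on a stale file.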

Measuring client and server performance

Client performance is measured in frames per second. This obviously depends on the client system, but provides a relative indicator.
Server performance is measured as %CPU usage, as reported by top. (note: the server CPU load required to run a client depends directly on the client's frame rate. Thus, for comparable measurements on different maps, the client must be capped at the same frame rate.)

Results

test case	FPS	opt	Server CPU @43	@max	.BSP size
caulk-flush	116-118	640+*	9%-11%	100%*	1MB
clip-flush	71-73	600+*	26%-30%	100%*	1MB
clip-split	48-49	230	60%-63%	100%*	1.1MB
clip-flush-b256	291-294	715*	1%-2%	8%-11%	1MB
nonsolid-flush	79-80	700+*	16%-17%	100%*	1MB
draw-solid-flush	88-89	157-159	9%-11%	34%-35%	3.6MB
model	200-202	206-207	0.7%-2%	1%-3%	2.8MB
discard-flush	194-195	198-199	0.7%-2%	1%-3%	2.7MB

(opt = client fps w/b_optimizeprediction)
(@max = server load with client at max FPS)
(note: Server CPU load goes up significantly while moving. Client FPS goes down with opt, less without.)
(opt* FPS = connection interrupt due to high frame rate)
(@max* Server load = server overloaded, connection interrupt on client)
(note: draw-solid-flush is odd. Numbers double checked. Verify maps)
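
FPS figures are easier to compare as time per frame, since a fixed cost per frame shows up as a fixed number of milliseconds. The sketch below converts the midpoint of each range in the FPS column of the results table to milliseconds per frame. Using the midpoint is an assumption made here for simplicity, and the helper names are made up for the example.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def midpoint(low, high):
    """Middle of a measured range."""
    return (low + high) / 2.0


# Client FPS ranges copied from the FPS column of the results table.
fps_ranges = {
    "caulk-flush": (116, 118),
    "clip-flush": (71, 73),
    "clip-split": (48, 49),
    "clip-flush-b256": (291, 294),
    "nonsolid-flush": (79, 80),
    "draw-solid-flush": (88, 89),
    "model": (200, 202),
    "discard-flush": (194, 195),
}

frame_times = {name: frame_time_ms(midpoint(low, high))
               for name, (low, high) in fps_ranges.items()}

# Print the cheapest test case first.
for name, ms in sorted(frame_times.items(), key=lambda item: item[1]):
    print("%-18s %6.2f ms/frame" % (name, ms))
```

Read this way, draw-solid-flush costs roughly 11.3 ms per frame against roughly 5.0 ms for model, so on the test client the brushes account for about 6 ms of each frame for the same drawn geometry.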

Conclusions

Even non-drawing brushes have a noticeable, and in some cases significant, impact on runtime performance.
The number of brushes per leaf is a significant factor. Thus, splitting up your BSP even where it does not contribute to visibility calculations may be desirable, if you have lots of complex brushwork. (note that this isn't free either, of course)
Non-solid brushes have less impact than solid ones, but still far more than no brushes at all.
Brushes which exist in multiple leafs cost more than brushes which exist in only one leaf, both in terms of file size and performance. Note however that they are cheaper than having unique brushes in each leaf.
In situations where complex drawing geometry is required, but complex collision geometry is not, significant performance gains and size reduction can be achieved by omitting the collision geometry.
A method for directly creating drawing geometry in the editor, without producing collision geometry, would be useful, especially for 'caulk hull' style construction.