DRAFT: The impact of brushes on Q3 engine runtime performance.

note: this is a rough draft.

Introduction

This article describes some of the effects of brushes on the runtime performance of maps for q3 engine games. Henceforth, q3 refers to Quake III, Return to Castle Wolfenstein, Wolfenstein: Enemy Territory and similar games. Some other Quake III engine licensees may have diverged significantly from what is described here. Testing was done using Wolfenstein: Enemy Territory, GtkRadiant and q3map2 (based on version 2.5.16 with some custom modifications).

Background

Q3 performance in general

On modern hardware, the performance of q3 engine games tends to be limited by CPU speed (note: OC testing), along with memory, cache and bus bandwidth. The most common way to measure the cost of a particular scene is 'r_speeds' (number of triangles) (note: r_speeds); however, experience shows that many other factors affect frame rate. A significant portion of CPU time and memory bandwidth is used by non-rendering tasks, but information on how to control these costs is not widely available.

The IBSP format

The .bsp file contains more than just the raw BSP tree. It defines drawing surfaces, solid and interactive geometry, and game-related entities. Kekoa Proudfoot's Unofficial Quake 3 Map Specs gives a good overview of the format. A general understanding of the format, and of the terminology used to describe it, is required to fully understand the rest of this article.
(note: q3map -info can be used to list the sizes and number of entries in various BSP lumps)
(note: http://www.planetquake.com/spog/stuff/technical.html)

The key characteristics relevant to this article are:

The compilation process

Most map geometry is created using brushes in the editor, which describe both the collision properties and the drawing textures. The compilation process turns the drawing geometry of the brushes into the relevant entries in the surfaces lump (and related lumps), and stores the brushes in leafbrushes. With a couple of minor exceptions (note: origin), every brush in the map is stored as a leafbrush in the .bsp. Static models (misc_model) are converted directly to drawsurfaces. Each leafnode of the BSP file contains a list of the brushes and drawsurfaces that it contains (note: a brush or surface can be present in more than one leafnode).
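The leaf-to-brush indirection described above can be pictured with a small sketch. This is only an illustration, not the on-disk layout: the names Leaf, first_leafbrush, num_leafbrushes, brushes_in_leaf and leaves_containing are made up for the example, and the only assumption carried over is that each leaf refers to a contiguous run of entries in a shared leafbrushes list, each entry being a brush index.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    first_leafbrush: int  # index of the leaf's first entry in the leafbrushes list
    num_leafbrushes: int  # number of consecutive entries belonging to the leaf


def brushes_in_leaf(leaf, leafbrushes):
    """Return the brush indices referenced by one leaf."""
    start = leaf.first_leafbrush
    return leafbrushes[start:start + leaf.num_leafbrushes]


def leaves_containing(brush, leaves, leafbrushes):
    """Return the indices of all leaves that reference the given brush."""
    return [i for i, leaf in enumerate(leaves)
            if brush in brushes_in_leaf(leaf, leafbrushes)]


# Three brushes (0, 1, 2). Brush 1 straddles both leaves, so its index
# appears twice in leafbrushes, once per leaf.
leafbrushes = [0, 1, 1, 2]
leaves = [Leaf(0, 2), Leaf(2, 2)]

print(brushes_in_leaf(leaves[0], leafbrushes))
print(leaves_containing(1, leaves, leafbrushes))
```

A brush that spans several leaves is stored once but referenced once per leaf, which is why it adds to both file size and the per-leaf work at runtime.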

Runtime use of leafbrushes

Interaction with the geometry of the world in Q3 is done using the trace function. This traces a solid (or point) through the world and finds out what it would run into. (note: q3engine code/qcommon/cm_trace.c) Trace first uses the BSP to find out which leaves are potentially of interest, and then tests against every leafbrush in each of those leaves. (note: CM_TraceThroughLeaf) There are some early outs, but in general, every brush in a leaf will be seen every time a trace crosses that leaf. This is true even if the surface or contents of the brush are not of interest (e.g. a nonsolid brush is still checked during collision detection).
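A rough model of the per-leaf loop may help show why the brush count per leaf matters. The sketch below is not the engine code: brushes are reduced to axis-aligned boxes (real brushes are convex plane sets), the contents-mask check stands in for the "early outs" mentioned above and is an assumption about their form, and all names (Brush, segment_hits_box, trace_through_leaf, the CONTENTS_SOLID value) are illustrative.

```python
from dataclasses import dataclass

CONTENTS_SOLID = 1  # illustrative bit flag for this sketch


@dataclass
class Brush:
    contents: int  # bit mask describing what the brush is made of
    mins: tuple    # (x, y, z) lower corner of the simplified box
    maxs: tuple    # (x, y, z) upper corner of the simplified box


def segment_hits_box(start, end, mins, maxs):
    """Slab test: does the segment start->end intersect the box?"""
    t_enter, t_exit = 0.0, 1.0
    for s, e, lo, hi in zip(start, end, mins, maxs):
        d = e - s
        if d == 0:
            # Segment is parallel to this pair of planes.
            if s < lo or s > hi:
                return False
            continue
        t0, t1 = (lo - s) / d, (hi - s) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return False
    return True


def trace_through_leaf(leaf_brushes, start, end, mask):
    """Return (hit, visited, tested) for one trace crossing one leaf."""
    visited = tested = 0
    hit = False
    for brush in leaf_brushes:
        visited += 1  # every brush in the leaf is at least looked at
        if not (brush.contents & mask):
            continue  # early out: contents not of interest to this trace
        tested += 1
        if segment_hits_box(start, end, brush.mins, brush.maxs):
            hit = True
    return hit, visited, tested


# One leaf holding 1000 nonsolid brushes and a single solid one.
box = ((4, -1, -1), (6, 1, 1))
leaf_brushes = [Brush(0, *box) for _ in range(1000)]
leaf_brushes.append(Brush(CONTENTS_SOLID, *box))

hit, visited, tested = trace_through_leaf(
    leaf_brushes, (0, 0, 0), (10, 0, 0), CONTENTS_SOLID)
print(hit, visited, tested)
```

In this model only one brush gets the full intersection test, but the loop still touches all 1001 entries on every trace through the leaf. The cost of a trace therefore scales with the number of brushes in the leaf, not just with the number of brushes that matter.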

Server and client performance

In general, mappers do their testing on a 'listen server', where the server and client are run in the same process. For games with bot-play, this is essential, as it reflects the end user environment. For online play, it is important to realize that both server and client performance are of interest. A poorly constructed map can cause 'laggy' online performance, even if all the clients get good frame rates. Obviously, rendering performance can only be a factor on the client. Physics performance may be a problem on either. (note: b_optimizeprediction) (note: client FPS effect on server performance)

Efficient maps and "caulk-hull" construction

So-called "caulk hull" construction is a method of building maps which divorces structural (in the BSP sense) brushwork from the drawing details. This gives the mapper greater control over the BSP process, at the expense of more brushes.

Objective

The objective of this article is to describe the performance impact of leaf brushes, and use this to develop best practices for creating efficient maps.

The following specific questions are investigated:

  1. What is the performance impact of non-drawing brushes?
  2. How is this impact split between client and server?
  3. Does splitting up a navigable area into multiple leafs reduce the performance impact of non-drawing brushes?
  4. Do non-solid brushes have less impact than solid ones?
  5. What is the relative performance and file size of a misc_model compared to brushwork?
  6. Is it possible to achieve misc_model performance from brushwork?

Test cases

The test maps consist of a single structural room with a spawn point and 16K brushes of various types, or a model generated from those brushes. The large number of brushes is used to ensure that the results are clearly seen on a high-end machine.
(note: test client system: Athlon 64 @2.1 GHz, 1GB Dual channel DDR @426, Geforce 6600GT @550/1100)
(note: test server system: Athlon XP @1.4 GHz, 768MB DDR @266)
(note: compiling the checkerboard drawing test case maps uses over 700MB of memory, and requires ~20 minutes on the test system)
(note: the test cases with nodraw will have no texture at all on the floor. This will cause HOM if you don't have r_fastsky on)
(note: all maps were compiled with bsp -meta, which is required for ET.)
Name | Description | Comments
nodraw-solid-flush | nondrawing, solid brushes, flush with (completely embedded in) the floor | Basic test of the cost of non-drawing brushwork.
nodraw-solid-flush-b256 | nondrawing, solid brushes, flush with (completely embedded in) the floor, blocksize 256 | Test the impact of putting the above brushwork into multiple leafs.
nodraw-solid-split | nondrawing, solid brushes, partly protruding from the floor | What happens if the brushes occupy more than one leaf?
nodraw-solid-nosplit | nondrawing, solid brushes, completely above the floor (inside the structural box) | What happens if they are completely inside, not overlapping anything else?
nodraw-nonsolid-flush | nondrawing, nonsolid brushes, flush with the floor | As nodraw-solid-flush, but nonsolid.
draw-solid-flush | solid, drawing brushes, flush with the floor | Baseline drawing brushes.
draw-nonsolid-flush | nonsolid, drawing brushes, flush with the floor | Drawing with nonsolid brushes. Note: uses custom q3map2 feature and shader to ensure brushes are nonsolid.
model | ASE misc_model (generated from drawing brushwork above), flush with the floor | Triangle model generated from above brushwork. Same geometry but no brushes.
discard-flush | drawing brushwork with brushes discarded, flush with the floor | Same as draw-solid-flush, but using a modified q3map2 which discards the brushes (and associated resources) after compile.
bsp command: q3map2 -v -game et -fs_basepath "<basepath>" -meta -mv 1024 -mi 6144 "<mapname>"
vis command: q3map2 -game et -fs_basepath "<basepath>" -vis -saveprt "<mapname>"
(note: all but nodraw-solid-flush-b256 are unaffected by vis, since they only have 1 cluster.)

Measuring client and server performance

Results

test case | FPS | opt | Server CPU @43 | Server CPU @max | .BSP size
caulk-flush | 116-118 | 640+* | 9%-11% | 100%* | 1MB
clip-flush | 71-73 | 600+* | 26%-30% | 100%* | 1MB
clip-split | 48-49 | 230 | 60%-63% | 100%* | 1.1MB
clip-flush-b256 | 291-294 | 715* | 1%-2% | 8%-11% | 1MB
nonsolid-flush | 79-80 | 700+* | 16%-17% | 100%* | 1MB
draw-solid-flush | 88-89 | 157-159 | 9%-11% | 34%-35% | 3.6MB
model | 200-202 | 206-207 | 0.7%-2% | 1%-3% | 2.8MB
discard-flush | 194-195 | 198-199 | 0.7%-2% | 1%-3% | 2.7MB
(opt = client fps w/b_optimizeprediction)
(@max = server load with client at max FPS)
(note: Server CPU load goes up significantly while moving. Client FPS goes down with opt, less without.)
(opt* FPS = connection interrupt due to high frame rate)
(@max* Server load = server overloaded, connection interrupt on client)
(note: draw-solid-flush is odd. Numbers double checked. Verify maps)
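FPS figures are easier to compare as time per frame. The sketch below converts the FPS column for three of the drawing test cases into milliseconds per frame, using the midpoint of each reported range. The helper name frame_ms and the choice of midpoint are assumptions made for illustration, and the draw-solid-flush numbers carry the caveat noted above.

```python
def frame_ms(fps_low, fps_high):
    """Frame time in milliseconds at the midpoint of an FPS range."""
    return 1000.0 / ((fps_low + fps_high) / 2.0)


# FPS ranges taken from the results table (client FPS without opt).
fps_ranges = {
    "draw-solid-flush": (88, 89),
    "model": (200, 202),
    "discard-flush": (194, 195),
}

frame_times = {name: frame_ms(*rng) for name, rng in fps_ranges.items()}

# Extra client time per frame when the brushes are kept in the .bsp,
# compared with the same drawing geometry with the brushes discarded.
extra_ms = frame_times["draw-solid-flush"] - frame_times["discard-flush"]

for name, ms in frame_times.items():
    print(f"{name}: {ms:.2f} ms/frame")
print(f"extra cost of keeping the brushes: {extra_ms:.2f} ms/frame")
```

On these figures, keeping the brushes costs the client roughly 6 ms per frame over the same geometry with the brushes discarded, which is more than the entire frame time of the model and discard-flush cases.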

Conclusions

  1. Even non-drawing brushes have a noticeable, and in some cases significant, impact on runtime performance.
  2. The number of brushes per leaf is a significant factor. Thus, splitting up your BSP even where it does not contribute to visibility calculations may be desirable if you have lots of complex brushwork. (note that this isn't free either, of course)
  3. Non-solid brushes have less impact, but still have far more than no brushes at all.
  4. Brushes which exist in multiple leafs cost more than brushes which exist in only one leaf, both in terms of file size and performance. Note however that they are cheaper than having unique brushes in each leaf.
  5. In situations where complex drawing geometry is required, but complex collision geometry is not, significant performance gains and size reduction can be achieved by omitting the collision geometry.
  6. A method for directly creating drawing geometry in the editor, without producing collision geometry, would be useful, especially for 'caulk hull' style construction.