DRAFT: The impact of brushes on Q3 engine runtime performance.
note: this is a rough draft.
This article describes some of the effects of brushes on the
runtime performance of maps for q3 engine games. Henceforth, q3 refers
to Quake III, Return to Castle Wolfenstein, Wolfenstein: Enemy Territory
and similar games. Some other Quake III engine licensees may have diverged
significantly from what is described here. Testing was done using
Wolfenstein: Enemy Territory, GtkRadiant and q3map2
(based on version 2.5.16 with some custom modifications).
Q3 performance in general
On modern hardware, the performance of q3 engine games tends to be limited
by CPU speed (note: OC testing), along with memory, cache and bus bandwidth.
The most common way to measure the cost of a particular scene is 'r_speeds'
(number of triangles) (note: r_speeds). However, experience shows that
many other factors affect frame rate. A significant portion of
CPU and memory bandwidth is used by non-rendering tasks, but information on how
to control these costs is not widely available.
The IBSP format
The .bsp file contains more than just the raw BSP tree. It defines
drawing surfaces, solid and interactive geometry and game related entities.
Kekoa Proudfoot's Unofficial Quake 3 Map Specs
gives a good overview of the format. A general understanding of the format
and terminology used to describe it is required to fully understand the
rest of this article.
(note: q3map -info can be used to list the sizes and number of entries in various BSP lumps)
The key characteristics relevant to this article are:
- Collision geometry (note: collision geometry refers to all
game interactions with the world, not just what players can collide
with) and drawing geometry are independent, defined by brushes and drawsurfaces
respectively (note: drawsurfaces are referred to as faces in Kekoa's document) (note: except for patch meshes, which
are a special case this article doesn't cover).
- The characteristics of surfaces in game are defined by the brushes.
- The BSP leafs are used to organize drawing and collision geometry in space, using the leafbrushes
and leaffaces lumps.
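The lump counts mentioned above can also be read straight from the file header. The following is a minimal sketch, assuming the layout given in Kekoa's document for Quake III (a 4-byte "IBSP" magic, a 4-byte version, then 17 offset/length pairs); the function names are made up for this example, and other q3 engine games may use a different version number or entry sizes.

```python
import struct

# Lump order as listed in Kekoa Proudfoot's unofficial Quake 3 map specs.
LUMP_NAMES = [
    "entities", "textures", "planes", "nodes", "leafs", "leaffaces",
    "leafbrushes", "models", "brushes", "brushsides", "vertexes",
    "meshverts", "effects", "faces", "lightmaps", "lightvols", "visdata",
]

# Bytes per entry for the lumps discussed in this article.
# Assumption: Quake III layout; other engine licensees may differ.
ENTRY_SIZE = {"leafs": 48, "leaffaces": 4, "leafbrushes": 4,
              "brushes": 12, "brushsides": 8}


def read_lump_directory(data):
    """Return (version, {lump name: (offset, length)}) from raw .bsp bytes."""
    magic, version = struct.unpack_from("<4si", data, 0)
    if magic != b"IBSP":
        raise ValueError("not an IBSP file")
    directory = {}
    for i, name in enumerate(LUMP_NAMES):
        # Each directory entry is two little-endian ints: offset, length.
        offset, length = struct.unpack_from("<ii", data, 8 + 8 * i)
        directory[name] = (offset, length)
    return version, directory


def entry_counts(directory):
    """Number of entries in each lump listed in ENTRY_SIZE."""
    return {name: directory[name][1] // size
            for name, size in ENTRY_SIZE.items()}
```

To use it, read the whole .bsp in binary mode and pass the bytes to read_lump_directory; the leafbrushes count is the figure most relevant to the rest of this article.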
The compilation process
Most map geometry is created using brushes in the editor to describe
both the collision properties and drawing textures. The compilation process
turns the drawing geometry of the brushes into the relevant entries in the
surfaces lump (and related lumps) and stores the brushes in leafbrushes.
With a couple of minor exceptions (note: origin), every brush in the map
is stored as a leafbrush in the .bsp.
Static models (misc_model) are converted directly to drawsurfaces.
Each leaf node of the BSP file contains a list of the brushes and drawsurfaces
that it contains (note: a brush or surface can be present in more than one leaf).
Runtime use of leafbrushes
Interaction with the geometry of the world in Q3 is done using the trace
function. This traces a solid (or point) through the world, and finds
out what it would run into. (note: q3engine code/qcommon/cm_trace.c)
Trace first uses the BSP to find out which leaves are potentially of
interest, and then tests against every leafbrush in each of those leaves.
There are some early outs, but in general, every brush in a leaf will be
seen every time a trace crosses that leaf. This is true even if the surface
or contents of the brush are not of interest (e.g. a nonsolid brush is still
checked in collision detection).
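The cost pattern described above can be illustrated with a toy model. This is only a sketch, not a transcription of cm_trace.c: the data layout, the names trace_cost and SOLID, and the two early outs shown (skip a brush already handled in this trace, skip a brush whose contents do not match the trace mask) are simplifying assumptions.

```python
SOLID = 1  # illustrative contents bit, standing in for the engine's flags


def trace_cost(leaf_brushes, brush_contents, leaves_crossed, mask):
    """Return (brushes visited, brushes fully tested) for one trace.

    leaf_brushes maps a leaf id to the list of brush ids in that leaf.
    brush_contents is indexed by brush id and holds its contents bits.
    """
    visited = 0
    tested = 0
    seen = set()
    for leaf in leaves_crossed:
        for brush in leaf_brushes[leaf]:
            visited += 1              # every brush in the leaf is looked at
            if brush in seen:         # early out: handled earlier in this trace
                continue
            seen.add(brush)
            if not (brush_contents[brush] & mask):
                continue              # early out: contents not of interest
            tested += 1               # the full brush test would run here
    return visited, tested


if __name__ == "__main__":
    n = 16384
    contents = [SOLID] * n
    one_leaf = {0: list(range(n))}
    split = {i: list(range(i * 256, (i + 1) * 256)) for i in range(64)}
    print(trace_cost(one_leaf, contents, [0], SOLID))
    print(trace_cost(split, contents, [0, 1], SOLID))
```

In this model a short trace through one huge leaf visits all 16384 brushes, while the same trace through two of 64 smaller leaves visits only 512. Nonsolid brushes skip the full test but are still visited.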
Server and client performance
In general, mappers do their testing on a 'listen server', where the server
and client are run in the same process. For games with bot-play, this is
essential, as it reflects the end user environment. For online play, it is
important to realize that both server and client performance are of interest. A
poorly constructed map can cause 'laggy' online performance, even if all the
clients get good frame rates. Obviously, rendering performance can only be a
factor on the client. Physics performance may be a problem on either.
(note: client FPS effect on server performance)
Efficient maps and "caulk-hull" construction
So-called "caulk hull" construction is a method of building maps which
divorces structural (in the BSP sense) brushwork from the drawing details.
This allows the mapper greater control over the BSP process, at the
expense of more brushes.
The objective of this article is to describe the performance impact of leaf
brushes, and use this to develop best practices for creating efficient maps.
The following specific questions are investigated:
- What is the performance impact of non-drawing brushes?
- How is this impact split between client and server?
- Does splitting up a navigable area into multiple leafs reduce the performance impact of non-drawing brushes?
- Do non-solid brushes have less impact than solid ones?
- What is the relative performance and file size of a misc_model compared to brushwork?
- Is it possible to achieve misc_model performance from brushwork?
The test maps consist of a single structural room with a spawn point and 16K brushes of various types,
or a model generated from those brushes. The large number of brushes is used to ensure the
results are clearly seen on a high-end machine.
(note: test client system: Athlon 64 @2.1 GHz, 1GB Dual channel DDR @426, GeForce 6600GT @550/1100)
(note: test server system: Athlon XP @1.4 GHz, 768MB DDR @266)
(note: compiling the checkerboard drawing test case maps uses over 700MB of memory, and requires ~20 minutes on the test system)
(note: the test cases with nodraw will have no texture at all on the floor. This will cause HOM if you don't have r_fastsky on)
All maps were compiled with bsp -meta (required for ET).
|description||purpose
|nondrawing, solid brushes, flush with (completely embedded in) the floor||Basic test of the cost of non-drawing brushwork
|nondrawing, solid brushes, flush with (completely embedded in) the floor, blocksize 256||Test the impact of putting the above brushwork into multiple leafs
|nondrawing, solid brushes, partly protruding from the floor||What happens if the brushes occupy more than one leaf?
|nondrawing, solid brushes, completely above the floor (inside the structural box)||What happens if they are completely inside, not overlapping anything else?
|nondrawing, nonsolid brushes, flush with the floor||As nodraw-solid-flush, but nonsolid.
|solid, drawing brushes, flush with the floor||Baseline drawing brushes.
|nonsolid, drawing brushes, flush with the floor||Drawing with nonsolid brushes. Note: uses custom q3map2 feature and shader to ensure brushes are nonsolid.
|ASE misc_model (generated from drawing brushwork above), flush with the floor||Triangle model generated from above brushwork. Same geometry but no brushes.
|drawing brushwork with brushes discarded, flush with the floor||Same as draw-solid-flush, but using a modified q3map2 which discards the brushes (and associated resources) after compile.
q3map2 -v -game et -fs_basepath "<basepath>" -meta -mv 1024 -mi 6144 "<mapname>"
q3map2 -game et -fs_basepath "<basepath>" -vis -saveprt "<mapname>"
(note: all but nodraw-solid-flush-b256 are unaffected by vis, since they only have 1 cluster.)
Measuring client and server performance
Client performance is measured in frames per second. This obviously depends on the client system, but provides a relative indicator.
Server performance is measured as %CPU usage, as reported by top. (note: the server CPU load required to run a client
depends directly on the client's frame rate. Thus, for comparable measurements on different maps, the client must be
capped at the same frame rate.)
(opt = client fps w/b_optimizeprediction)
|test case||FPS||opt||Server CPU @43||@max||.BSP size
|caulk-flush ||116-118||640+* ||9%-11%||100%* ||1MB
|clip-flush ||71-73||600+* ||26%-30%||100%* ||1MB
|clip-split ||48-49||230 ||60%-63%||100%* ||1.1MB
|clip-flush-b256 ||291-294||715* ||1%-2%||8%-11% ||1MB
|nonsolid-flush ||79-80||700+* ||16%-17%||100%* ||1MB
|draw-solid-flush ||88-89||157-159 ||9%-11%||34%-35% ||3.6MB
|model ||200-202||206-207 ||0.7%-2%||1%-3% ||2.8MB
|discard-flush ||194-195||198-199 ||0.7%-2%||1%-3% ||2.7MB
(@max = server load with client at max FPS)
(note: Server CPU load goes up significantly while moving. Client FPS goes down with opt, less without.)
(opt* FPS = connection interrupt due to high frame rate)
(@max* Server load = server overloaded, connection interrupt on client)
(note: draw-solid-flush is odd. Numbers double checked. Verify maps)
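Frame rates are easier to compare once converted to time per frame. The sketch below does that conversion for the FPS column of the table above, using the midpoint of each reported range; the helper names are made up for this example, and the midpoint is a simplification rather than a measured average.

```python
# FPS ranges (low, high) copied from the "FPS" column of the results table.
FPS = {
    "caulk-flush": (116, 118),
    "clip-flush": (71, 73),
    "clip-split": (48, 49),
    "clip-flush-b256": (291, 294),
    "nonsolid-flush": (79, 80),
    "draw-solid-flush": (88, 89),
    "model": (200, 202),
    "discard-flush": (194, 195),
}


def frame_ms(case):
    """Milliseconds per frame at the midpoint of the reported FPS range."""
    lo, hi = FPS[case]
    return 1000.0 / ((lo + hi) / 2.0)


def extra_ms(case, baseline):
    """Additional time per frame of `case` compared with `baseline`."""
    return frame_ms(case) - frame_ms(baseline)


if __name__ == "__main__":
    for case in FPS:
        print(f"{case:18s} {frame_ms(case):6.2f} ms/frame")
    print(f"brush cost: {extra_ms('draw-solid-flush', 'discard-flush'):.2f} ms")
```

For example, draw-solid-flush and discard-flush have the same drawing geometry, so the difference of about 6 ms per frame between them is a rough estimate of what the brushes alone cost on the test client.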
Even non-drawing brushes have a noticeable, and in some cases significant, impact on runtime performance.
The number of brushes per leaf is a significant factor. Thus, splitting up your BSP even
where it does not contribute to visibility calculations may be desirable, if you have lots
of complex brushwork. (note that this isn't free either, of course)
Non-solid brushes have less impact, but still have far more than no brushes at all.
Brushes which exist in multiple leafs cost more than brushes which exist in
only one leaf, both in terms of file size and performance. Note however that
they are cheaper than having unique brushes in each leaf.
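The file size side of this can be estimated with simple arithmetic. The sketch below assumes the Quake III entry sizes from Kekoa's document (12 bytes per brush, 8 bytes per brushside, 4 bytes per leafbrush reference) and ignores planes and shader entries, which may be shared between brushes; the function names are hypothetical and the sizes may differ in other q3 engine games.

```python
# Entry sizes in bytes, assumed from the Quake III lump layout.
BRUSH_BYTES = 12
BRUSHSIDE_BYTES = 8
LEAFBRUSH_BYTES = 4


def shared_brush_bytes(sides, leaves):
    """One brush stored once and referenced from `leaves` leafs."""
    return BRUSH_BYTES + sides * BRUSHSIDE_BYTES + leaves * LEAFBRUSH_BYTES


def unique_brushes_bytes(sides, leaves):
    """A separate copy of the brush in each of `leaves` leafs."""
    return leaves * shared_brush_bytes(sides, 1)


if __name__ == "__main__":
    # A six-sided box brush touching four leafs.
    print(shared_brush_bytes(6, 4), unique_brushes_bytes(6, 4))
```

Under these assumptions a brush spanning several leafs only adds one small reference per extra leaf, whereas separate brushes repeat the brush and brushside entries each time.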
In situations where complex drawing geometry is required, but complex
collision geometry is not, significant performance gains and size
reduction can be achieved by omitting the collision geometry.
A method for directly creating drawing geometry in the editor, without
producing collision geometry, is useful, especially for 'caulk hull'