Using those transistors ...
Wider datapaths and circuit elements (from 4 to 8 to 16 to 32 to 64 bits)
Caches
Additional functionality on the die (MMU, floating point, L2 cache controller)
Deeper pipelining (multiple instructions in progress at a time)
Superscalar architectures (instruction-level parallelism -- multiple instruction streams)
[Some Intel examples]
The result: increased performance, decreased cost
Back ...