Random notes during a Thomas Wenisch talk at UIUC

GPU world underutilization problem

  • Unpopular models sit there idle in GPU memory
  • Lack of security for multi-tenancy
  • Lack of fast core-provisioning?
  • Lack of performance isolation in per-core allocation
  • Scaling reacts slowly compared to CPUs
  • Multi-tenancy is not well understood
  • Model loading is much slower (rough arithmetic after this list)
  • Network devices are much cheaper than accelerators, so just throw money at the network-contention problem
  • Interference in the network is not a big issue in training
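
Rough arithmetic for the model-loading point above. All numbers are assumptions for illustration (a 70B-parameter fp16 model and typical storage/network bandwidths), not figures from the talk.

```python
# Why GPU cold starts are slow: just moving the weights takes tens of seconds.
# All numbers are assumptions for illustration, not from the talk.
model_params = 70e9                  # assume a 70B-parameter model
bytes_per_param = 2                  # fp16 weights
model_bytes = model_params * bytes_per_param   # ~140 GB

bandwidths_gbps = {                  # assumed sustained bandwidths, GB/s
    "local NVMe SSD": 7,
    "100 Gb/s network": 12.5,
    "PCIe 4.0 x16 to the GPU": 25,
}

for source, gbps in bandwidths_gbps.items():
    seconds = model_bytes / (gbps * 1e9)
    print(f"{source}: ~{seconds:.0f} s just to move the weights")
```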

Tiered model checkpoint? (toy placement sketch after the list)

  • SSD
  • Flash
  • Memory
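
A minimal sketch of what a tiered checkpoint store could look like, assuming the tiers listed above. The capacities, bandwidths, and the `place_checkpoint` policy are made-up for illustration, not anything described in the talk.

```python
from dataclasses import dataclass

# Minimal sketch of a tiered checkpoint store: try the fastest tier first,
# spill to slower tiers when full. Capacities/bandwidths are assumed numbers.

@dataclass
class Tier:
    name: str
    capacity_gb: float
    write_gbps: float        # assumed sustained write bandwidth, GB/s
    used_gb: float = 0.0

TIERS = [                    # fastest first, per the Memory / Flash / SSD list
    Tier("host memory", capacity_gb=512, write_gbps=50),
    Tier("local flash", capacity_gb=2_048, write_gbps=7),
    Tier("remote SSD pool", capacity_gb=100_000, write_gbps=2),
]

def place_checkpoint(size_gb: float) -> str:
    """Place a checkpoint in the fastest tier with room; return the tier name."""
    for tier in TIERS:
        if tier.used_gb + size_gb <= tier.capacity_gb:
            tier.used_gb += size_gb
            eta = size_gb / tier.write_gbps
            print(f"{size_gb} GB checkpoint -> {tier.name} (~{eta:.0f} s to write)")
            return tier.name
    raise RuntimeError("no tier has room; evict or garbage-collect old checkpoints")

place_checkpoint(140)   # e.g. one fp16 snapshot of a 70B-parameter model
```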

Training

  • Everybody wants to talk to everybody together
    • One port, not all ports, maybe
  • Everybody yells at once
  • Bursty traffic (toy burst arithmetic after this list)
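
A toy illustration of the "everybody yells at once" point: in a synchronized all-to-all step every rank injects traffic to every other rank in the same instant, so the fabric sees a burst rather than a smooth average. The rank count, chunk size, and port bandwidth below are assumptions, not from the talk.

```python
# Toy model of one synchronized all-to-all step (assumed sizes).
# Each of N ranks sends `chunk_gb` to every other rank at the same moment,
# so the instantaneous per-port demand is (N - 1) * chunk_gb even though the
# time-averaged load may look modest.
N = 64                 # ranks (e.g. GPUs) participating in the collective
chunk_gb = 0.25        # assumed payload each rank sends to each peer, GB
port_gbps = 50         # assumed per-port bandwidth, GB/s (~400 Gb/s NIC)

burst_per_rank_gb = (N - 1) * chunk_gb
burst_seconds = burst_per_rank_gb / port_gbps
print(f"Each rank injects {burst_per_rank_gb:.1f} GB in one synchronized burst")
print(f"~{burst_seconds:.2f} s of fully saturated port, then quiet until the next step")
```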

In-switch computing

  • Encryption is the challenge
  • Key distribution
  • Switch needs to participate in encryption / key distribution if it is to operate on the traffic (toy sketch below)
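
Toy sketch of why the switch has to be in the key-distribution loop if it is to aggregate encrypted traffic: it must decrypt each worker's contribution, combine, and re-encrypt. The XOR "cipher", the hash-derived pad, and the out-of-band key manager are stand-ins for illustration only, not anything described in the talk.

```python
import hashlib

# Toy illustration (NOT a real cipher): in-switch aggregation only works if the
# switch holds the key, because it must decrypt each worker's contribution
# before it can sum them. A hypothetical key manager is assumed to have handed
# GROUP_KEY to the workers, the switch, and the parameter server.

GROUP_KEY = b"group-key-distributed-out-of-band"

def pad(key: bytes, nonce: str, nbytes: int = 8) -> int:
    """Derive an XOR pad from the shared key and a per-message nonce."""
    digest = hashlib.sha256(key + nonce.encode()).digest()[:nbytes]
    return int.from_bytes(digest, "big")

def encrypt(value: int, nonce: str) -> int:
    return value ^ pad(GROUP_KEY, nonce)

def decrypt(ciphertext: int, nonce: str) -> int:
    return ciphertext ^ pad(GROUP_KEY, nonce)

# Workers encrypt their (integer) gradient shards.
gradients = {"worker0": 17, "worker1": 4, "worker2": 21}
wire = {w: encrypt(g, nonce=w) for w, g in gradients.items()}

# The switch must decrypt to aggregate, then re-encrypt the result upstream.
total = sum(decrypt(c, nonce=w) for w, c in wire.items())
to_server = encrypt(total, nonce="aggregate")

print("aggregate seen by server:", decrypt(to_server, nonce="aggregate"))  # 42
```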

We don’t want the fabric to be the bottleneck.

Planned OCS (optical circuit switching) angles for training

Different topologies need different hardware (rough cable-length vs. buffer-depth relation sketched after the list):

  • Cable length
  • Shallow buffer vs deeper buffer in switch
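
One way these two items interact (my gloss, not the speaker's): longer cables mean longer RTTs, and the in-flight data a port may have to absorb scales with bandwidth × RTT, which pushes toward deeper buffers. The line rate and cable lengths below are assumptions.

```python
# Back-of-the-envelope link between cable length and buffer depth (assumed
# numbers). Longer cables -> longer RTT -> larger bandwidth-delay product ->
# more buffering a switch port may need to absorb.
SPEED_IN_FIBER_M_PER_S = 2e8      # ~2/3 of c in optical fiber
port_gbps = 400                   # assumed per-port line rate, Gb/s

for cable_m in (3, 30, 300, 2000):
    rtt_s = 2 * cable_m / SPEED_IN_FIBER_M_PER_S          # propagation only
    bdp_bytes = port_gbps * 1e9 / 8 * rtt_s
    print(f"{cable_m:>5} m cable: RTT ~{rtt_s*1e6:.1f} us, "
          f"bandwidth-delay product ~{bdp_bytes/1e3:.0f} KB per port")
```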