[D] CS231n: "any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation / test data."

I've thought about this a fair amount but can't figure it out. Could anyone explain this paragraph from Stanford's CS231n? Specifically, why not compute the statistics (e.g. the mean) on all the data and then split?
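
To make the question concrete, here is a minimal sketch of what I understand the quoted rule to describe, next to the alternative I'm asking about. The toy data, the 800/100/100 split, and the `standardize` helper are all made up for illustration; none of it comes from the course notes. As far as I can tell, the idea is that the validation and test sets stand in for data the model has never seen, so any statistic that includes them has already "looked" at them.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Toy 1-D dataset (made-up values, purely for illustration).
data = [random.gauss(5.0, 2.0) for _ in range(1000)]

# Split FIRST, before computing any statistic.
random.shuffle(data)
train, val, test = data[:800], data[800:900], data[900:]

# Fit the preprocessing statistics on the training split only.
train_mean = statistics.fmean(train)
train_std = statistics.pstdev(train)


def standardize(xs, mean, std):
    """Apply already-fitted statistics; nothing is re-estimated here."""
    return [(x - mean) / std for x in xs]


# The same training statistics are reused for every split.
train_z = standardize(train, train_mean, train_std)
val_z = standardize(val, train_mean, train_std)
test_z = standardize(test, train_mean, train_std)

# The alternative from my question: a mean computed before splitting,
# which therefore includes the validation and test values.
leaky_mean = statistics.fmean(data)
```

With this setup only `train_z` is guaranteed to have zero mean and unit variance; `val_z` and `test_z` will be slightly off, which seems to be expected, since new data at deployment time would be shifted by the training statistics in exactly the same way.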