Explore power laws for attention in transformers. This is probably an existing idea, but I need to understand it. If it has not already been done, it would be a great experiment to run. See “Power Law in Attention” here
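
One possible starting point for the experiment, as a rough sketch only: take a single attention row (the softmax weights one query assigns to all keys), sort the weights by rank, and check whether they fall on a straight line in log-log space. This assumes "power law" means a rank-weight relation of the form w_r ∝ r^(-alpha), which may not be what the linked "Power Law in Attention" material means. The function name `fit_power_law_exponent` and the synthetic data are hypothetical, made up for illustration; real attention rows would have to come from a model.

```python
import numpy as np


def fit_power_law_exponent(attn_row, eps=1e-12):
    """Estimate alpha under the ASSUMED model w_r ~ r^(-alpha),
    where w_r is the r-th largest attention weight in the row.
    Uses a least-squares line fit in log-log space."""
    # Sort weights from largest to smallest (rank 1 = largest).
    w = np.sort(np.asarray(attn_row, dtype=np.float64))[::-1]
    # Drop near-zero weights so the logarithm stays finite.
    w = w[w > eps]
    ranks = np.arange(1, w.size + 1, dtype=np.float64)
    # log w = -alpha * log r + c, so the fitted slope is -alpha.
    slope, _intercept = np.polyfit(np.log(ranks), np.log(w), deg=1)
    return -slope


# Synthetic check (not real model data): build a row that follows
# r^(-1.5) exactly, normalize it like a softmax output, and shuffle it
# so the function has to recover the rank order itself.
rng = np.random.default_rng(0)
true_alpha = 1.5
row = np.arange(1, 513, dtype=np.float64) ** (-true_alpha)
row /= row.sum()
rng.shuffle(row)

alpha_hat = fit_power_law_exponent(row)
print(f"estimated alpha: {alpha_hat:.3f}")
```

A log-log line fit is only a quick diagnostic; a straight line there does not by itself establish a power law, and maximum-likelihood fits are generally preferred for a careful estimate. For the actual experiment, the same function could be applied per head and per layer to see whether the exponent varies.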