Intuitively, a mountain peak view in RL should be a point in parameter space at which one iteration of the algorithm produces a result similar to several iterations started from points farther from the foot of the peak. The mountain view therefore depends on the algorithm and the encoding used. The height of the mountain should be the quotient between the algorithm's velocity of convergence at the peak and its velocity at points at distance d from it, so that the velocity of convergence to the goal increases as the distance tends to zero. That is, the projection of the mountain peak onto parameter space is a stable (not thin) source of knowledge. One could say that this is similar to finding a dominating strategy or using a branch-and-cut algorithm.
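To make the height quotient concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration only: plain gradient descent on f(x) = x**2 stands in for the RL algorithm, "velocity of convergence" is taken to be 1 / (iterations needed to reach the goal), and the names `steps_to_goal`, `velocity` and `height` are hypothetical.

```python
def steps_to_goal(x0, lr=0.25, tol=1e-3, max_steps=10_000):
    """Count gradient-descent iterations on the toy loss f(x) = x**2
    until |x| <= tol (the 'goal'). Stand-in for one RL algorithm."""
    x, steps = x0, 0
    while abs(x) > tol and steps < max_steps:
        x -= lr * 2.0 * x  # gradient of x**2 is 2x, so x is halved each step
        steps += 1
    return steps


def velocity(x0):
    """Assumed definition: velocity = 1 / iterations-to-goal
    (a point already at the goal counts as one iteration)."""
    return 1.0 / max(steps_to_goal(x0), 1)


def height(peak, d):
    """Quotient between the velocity at the peak and the mean
    velocity at the two points at distance d from it (1-D case)."""
    neighbours = (peak - d, peak + d)
    mean_v = sum(velocity(p) for p in neighbours) / len(neighbours)
    return velocity(peak) / mean_v


# A parameter point from which a single iteration reaches the goal.
peak = 1.5e-3
for d in (1.0, 0.1, 0.01):
    print(d, height(peak, d))
```

In this toy setting the height shrinks towards 1 as d tends to zero, because the velocity at the neighbouring points approaches the velocity at the peak. For a real RL algorithm the velocity would have to be estimated from training runs, and the result would depend on the algorithm and encoding, as noted above.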
Some ideas on RL creating dominating strategies: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188046
https://openreview.net/pdf?id=ryl1r1BYDS