
Q-Function

Q(s, a) is the expected cumulative reward obtained by starting in state s, taking action a, and following a fixed policy π thereafter. To make the dependence on the policy explicit, it is often written Q^π(s, a).
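As a rough illustration of this definition, the sketch below computes Q(s, a) for a fixed policy on a tiny made-up problem. Everything in it is an assumption introduced for the example: the two states, the two actions, the deterministic `next_state` and `reward` tables, the discount factor `gamma` (the definition above does not mention discounting), and the helper name `evaluate_q`. It repeatedly applies the rule "reward now, plus discounted value of following the policy from the next state" until the values settle.

```python
def evaluate_q(next_state, reward, policy, gamma, sweeps=200):
    """Approximate Q(s, a) for a fixed policy on a deterministic toy problem.

    next_state[s][a] is the state reached from s by action a (hypothetical table).
    reward[s][a] is the immediate reward for taking a in s (hypothetical table).
    policy[s] is the action the policy picks in state s.
    gamma is an assumed discount factor in [0, 1).
    """
    n_states = len(next_state)
    n_actions = len(next_state[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        new_q = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                s2 = next_state[s][a]
                # Take action a now, then follow the policy from s2 onward.
                new_q[s][a] = reward[s][a] + gamma * q[s2][policy[s2]]
        q = new_q
    return q


# Toy problem: action 0 stays in place, action 1 moves to the other state.
next_state = [[0, 1], [1, 0]]
reward = [[0.0, 1.0], [2.0, 0.0]]
policy = [1, 0]  # move when in state 0, stay when in state 1
gamma = 0.5

q = evaluate_q(next_state, reward, policy, gamma)
for s, row in enumerate(q):
    for a, value in enumerate(row):
        print(f"Q({s}, {a}) = {value:.3f}")
```

Note that Q(s, a) is defined for every action, including ones the policy itself would not choose: Q(0, 0) here is the value of staying once in state 0 and only then following the policy.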