Delayed reward

AlphaGo, the program from Google's DeepMind that defeated the world's top human Go players, relies on reinforcement learning (combined with deep neural networks and tree search). Reinforcement learning has a concept called "delayed reward". The idea is very interesting, and it has a lot in common with how human intelligence handles decisions.

Delayed reward asks the agent to judge an action by its longer-term outcome (at least beyond the very next step), not by its instantaneous payoff. Today's social media makes for quite a contrast. Social media is a great way to communicate and gather information. At the same time, it puts a lot of pressure on us to respond quickly and judge fast. That leaves little time for a delayed but more accurate response, and in some cases people react before all the facts are gathered or the truth is known.
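To make "longer term" concrete, here is a minimal sketch of the discounted return, the usual way reinforcement learning adds up rewards that arrive at different times. The function name, the two reward sequences, and the discount factor `gamma = 0.9` are illustrative assumptions of mine, not anything taken from AlphaGo.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards so that a reward arriving k steps later is scaled by gamma**k."""
    total = 0.0
    # Walk backwards: each earlier step sees the later total shrunk by gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences, made up for illustration.
quick_payoff = [1.0, 0.0, 0.0, 0.0]     # small reward right away
delayed_payoff = [0.0, 0.0, 0.0, 10.0]  # bigger reward three steps later

for gamma in (0.0, 0.9):
    print(gamma,
          discounted_return(quick_payoff, gamma),
          discounted_return(delayed_payoff, gamma))
```

With `gamma = 0.0` the agent only values the immediate step, so the quick payoff looks better. With `gamma = 0.9` the delayed payoff is worth more, even after being discounted for the wait.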

A person who always looks for short-term, instantaneous rewards will have difficulty handling failures, barriers, and hardships. It takes courage and persistence to prevail under hardship, since the reward may not be anywhere near the horizon for a long time. We always need to stay optimistic and keep our hope and faith high.

The algorithm is constantly monitoring the output, computing the reward, and choosing the next action. This constant feedback helps the system gather information and improve its decision-making process.
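The loop of monitoring the output, computing the reward, and choosing the next action can be sketched with tabular Q-learning, a textbook reinforcement learning method picked here for brevity. It is not AlphaGo's actual algorithm. The five-cell corridor, the `step` and `train` helpers, and all parameter values are hypothetical choices for illustration. The only reward sits at the far end of the corridor, so the agent has to learn that early moves pay off later.

```python
import random

N_STATES = 5          # cells 0..4; cell 4 is the goal
LEFT, RIGHT = 0, 1


def step(state, action):
    """Move one cell; the only reward is 1.0 on arriving at the goal."""
    if action == RIGHT:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap the episode length
            # Explore with probability epsilon, or when both actions look equal.
            if rng.random() < epsilon or q[state][LEFT] == q[state][RIGHT]:
                action = rng.randrange(2)
            else:
                action = LEFT if q[state][LEFT] > q[state][RIGHT] else RIGHT
            # Observe the outcome and the reward (the feedback).
            next_state, reward, done = step(state, action)
            # Update the estimate toward reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action in each non-goal cell after training.
policy = [RIGHT if row[RIGHT] > row[LEFT] else LEFT for row in q_table[:-1]]
print(policy)
```

Even though moving right from cell 0 never pays anything immediately, repeated feedback lets the value of the distant goal flow backwards through the table, one cell at a time.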

