Pong is one of the earliest arcade video games. It is a table tennis sports game featuring simple two-dimensional graphics. The game was originally manufactured by Atari, which released it in 1972. Allan Alcorn created Pong as a training exercise assigned to him by Atari co-founder Nolan Bushnell. Bushnell based the idea on an electronic ping-pong game included in the Magnavox Odyssey, which later resulted in a lawsuit against Atari. Surprised by the quality of Alcorn's work, Bushnell and Atari co-founder Ted Dabney decided to manufacture the game.

Reinforcement learning

Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In the operations research and control literature, reinforcement learning is called approximate dynamic programming, or neuro-dynamic programming.

Let's Play Pong!



Computers can now learn to play Atari games automatically, they beat world champions at Go, simulated quadrupeds learn to run and leap, and robots learn complex manipulation tasks that defy explicit programming. In our project, we aim to apply deep reinforcement learning to train a model that masters Pong from pixels, building on previous studies, and to tune the parameters so that we obtain the best model in the shortest training time.

Reinforcement learning addresses the difficult problem of correlating immediate actions with the delayed returns they produce. Like humans, reinforcement learning algorithms sometimes have to wait a while to see the fruit of their decisions. They operate in a delayed-return environment, where it can be hard to tell which action led to which outcome over many time steps. In Pong, for example, a point is won or lost many frames after the paddle movements that decided it. As these algorithms improve, we expect them to handle more ambiguous, real-life environments, choosing from an arbitrary number of possible actions rather than from the limited options of a video game, and so to become valuable for achieving goals in the real world.
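One common way to spread a delayed reward back over the actions that preceded it is to discount it: step i is credited with the sum of all later rewards, each shrunk by a factor γ per step of delay. The sketch below is a hypothetical helper, `discounted_returns` (NumPy assumed), that computes this in a single backward pass.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G[i] = sum_{m >= i} gamma**(m - i) * rewards[m] for every step i.

    Walking backwards visits each step once, so this is O(n) rather than O(n^2).
    """
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running  # reward now + discounted future
        returns[i] = running
    return returns

# A point is won only on the last of four frames, yet every earlier frame
# still receives a (smaller) share of the credit.
print(discounted_returns([0, 0, 0, 1], gamma=0.5).tolist())
# -> [0.125, 0.25, 0.5, 1.0]
```

Karpathy's Pong script additionally resets the running sum whenever a point ends (a nonzero reward), treating each point as an independent game; that Pong-specific variant is not shown here.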



In the reinforcement learning framework, the agent receives an observation of the environment. Based on its internal mechanics (its policy), the agent then takes an action in response to this observation. The action changes the environment, and the environment feeds back a reward and a new observation to the agent. Figure 1 illustrates this loop.
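The observe, act, reward loop can be sketched in a few lines. `ToyEnv` is a hypothetical stand-in for the game (not part of any library): its `reset()`/`step()` methods are modelled loosely on the classic Gym interface, although newer Gym and Gymnasium releases return slightly different tuples.

```python
import random

class ToyEnv:
    """Hypothetical environment: each episode lasts `length` steps and the
    agent earns +1 whenever its action (0 or 1) matches a hidden target,
    which the observation happens to reveal."""

    def __init__(self, length=5, seed=0):
        self.length = length
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.target = self.rng.randint(0, 1)
        return self.target  # initial observation

    def step(self, action):
        reward = 1.0 if action == self.target else 0.0
        self.t += 1
        self.target = self.rng.randint(0, 1)   # the environment changes
        done = self.t >= self.length
        return self.target, reward, done, {}   # new observation, reward, done, info

def run_episode(env, policy):
    """Observation -> action -> (reward, new observation), until the episode ends."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, _ = env.step(action)
        total += reward
    return total

# An agent that copies its observation always hits the target.
print(run_episode(ToyEnv(length=5), policy=lambda obs: obs))  # -> 5.0
```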

In this project, we use a policy gradient method, which maps an observation directly to an action (Murphy). A popular solution is the Actor-Critic framework. It has a policy function (the actor), denoted p(a|s), and a critic function, denoted Q(a, s), which evaluates the value of taking a certain action in a specific state. We use a deep neural network to approximate p(a|s); the output of this network is the probability of taking action a. For Q(a, s), we do not train a separate network and simply use the observed rewards. We define

$$Q(a, S_i) = \sum_{m=i}^{i+k} \gamma^{m-i} r_m$$

in which a is an action, S_i is the observation at step i, γ is a discount factor, and r_m is the reward earned at position m of a series whose length is i + k. The cost function we will use is

$$L = -\sum_{i} Q(a_i, S_i) \log p(a_i \mid S_i)$$

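For a two-action (up/down) policy whose network outputs p(up | S_i) = sigmoid(z_i), the standard policy-gradient loss L = -Σ_i Q(a_i, S_i) log p(a_i | S_i) and its gradient with respect to each score z_i can be computed directly. This is a minimal NumPy sketch; `pg_loss_and_grad` is a hypothetical name, and the values Q(a_i, S_i) are assumed to be supplied by the caller (for example, discounted rewards). The finite-difference check at the end confirms the analytic gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pg_loss_and_grad(logits, actions, returns):
    """L = -sum_i Q_i * log p(a_i | S_i) for an up/down policy.

    logits[i]  : network score for frame i, with p(up | S_i) = sigmoid(logits[i])
    actions[i] : 1.0 if the sampled action was UP, 0.0 if DOWN
    returns[i] : Q(a_i, S_i), e.g. the discounted reward from step i onward
    """
    p_up = sigmoid(logits)
    log_p = actions * np.log(p_up) + (1.0 - actions) * np.log(1.0 - p_up)
    loss = -np.sum(returns * log_p)
    # d log p(a | s) / d logit = a - sigmoid(logit), hence:
    grad = -returns * (actions - p_up)
    return loss, grad

# Compare the analytic gradient with central finite differences.
logits = np.array([0.3, -1.2, 2.0])
actions = np.array([1.0, 0.0, 1.0])
returns = np.array([0.5, -1.0, 2.0])
loss, grad = pg_loss_and_grad(logits, actions, returns)
eps = 1e-6
numeric = np.array([
    (pg_loss_and_grad(logits + eps * e, actions, returns)[0]
     - pg_loss_and_grad(logits - eps * e, actions, returns)[0]) / (2 * eps)
    for e in np.eye(len(logits))
])
print(np.allclose(grad, numeric, atol=1e-6))  # -> True
```

A positive Q for an UP action yields a negative gradient on that score, so a gradient-descent step on L raises p(up) for that frame; a negative Q pushes the probability of the taken action down instead.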
Implementation method

Andrej Karpathy trained an agent to beat the computer opponent by building a neural network that takes in each game image and outputs a command telling the AI to move up or down.
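Such a network can be sketched as two layers: a ReLU hidden layer followed by a single score squashed into the probability of moving up, from which the action is sampled. The names `init_policy`, `policy_forward`, and `sample_action` are hypothetical, and the sizes (80x80 input, 200 hidden units, Xavier-style initialisation) are assumptions modelled on Karpathy's Pong script.

```python
import numpy as np

def init_policy(input_dim=80 * 80, hidden=200, seed=0):
    """Two-layer policy network stored as a dict of NumPy arrays."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((hidden, input_dim)) / np.sqrt(input_dim),
        "W2": rng.standard_normal(hidden) / np.sqrt(hidden),
    }

def policy_forward(model, x):
    """Return (probability of moving UP, hidden activations) for frame vector x."""
    h = np.maximum(0.0, model["W1"] @ x)   # ReLU hidden layer
    logit = model["W2"] @ h                # single scalar score
    p_up = 1.0 / (1.0 + np.exp(-logit))    # squash the score into a probability
    return p_up, h

def sample_action(p_up, rng):
    """Return 1 (UP) with probability p_up, otherwise 0 (DOWN)."""
    return 1 if rng.random() < p_up else 0

model = init_policy()
rng = np.random.default_rng(1)
frame = rng.integers(0, 2, size=80 * 80).astype(float)  # fake preprocessed frame
p_up, _ = policy_forward(model, frame)
print(0.0 < p_up < 1.0, sample_action(p_up, rng) in (0, 1))  # -> True True
```

Sampling instead of always taking the more likely action keeps the agent exploring; the hidden activations `h` are returned because backpropagation needs them later.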

Our neural network, based heavily on Andrej’s solution, does the following:

  1. Take in images from the game and preprocess them (remove color and background, downsample, etc.).
  2. Use the Neural Network to compute a probability of moving up.
  3. Sample from that probability distribution and tell the agent to move up or down.
  4. If the round is over (you or the opponent missed the ball), record whether you won or lost the point.
  5. When the episode has finished (someone has reached 21 points), pass the result through the backpropagation algorithm to compute the gradient for our weights.
  6. After 10 episodes have finished, sum up the gradients and move the weights in the direction of the gradient.
  7. Repeat this process until our weights are tuned to the point where we can beat the computer.
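Steps 1 and 6 can be sketched as follows. `preprocess` assumes a 210x160x3 RGB Pong frame and follows the cropping, 2x downsampling, and background colour values (144 and 109) used in Karpathy's Pong script; treat these constants as assumptions for other emulators or game versions. `GradientBuffer` is a hypothetical helper that sums per-episode gradients and applies them every 10 episodes. The gradients passed to it are assumed to be gradients of the expected reward (the negative of the loss gradient), so adding them moves the weights in the direction of the gradient. Karpathy's script uses RMSProp for this update; plain gradient ascent here is a simplification.

```python
import numpy as np

def preprocess(frame):
    """210x160x3 uint8 frame -> 6400-dim float vector (an 80x80 binary image)."""
    img = frame[35:195]              # crop to the playing field (160 rows)
    img = img[::2, ::2, 0]           # downsample by 2, keep one colour channel
    img = img.astype(np.float64)     # copy, so the caller's frame is untouched
    img[img == 144] = 0              # erase background (colour 1)
    img[img == 109] = 0              # erase background (colour 2)
    img[img != 0] = 1                # paddles and ball become 1
    return img.ravel()

class GradientBuffer:
    """Sum per-episode gradients and apply them every `batch_size` episodes."""

    def __init__(self, model, batch_size=10, learning_rate=1e-3):
        self.model = model           # dict of name -> NumPy array, updated in place
        self.batch_size = batch_size
        self.lr = learning_rate
        self.episodes = 0
        self.grad_sum = {k: np.zeros_like(v) for k, v in model.items()}

    def add_episode(self, grads):
        """Accumulate one episode's gradients; return True if weights were updated."""
        for k, g in grads.items():
            self.grad_sum[k] += g
        self.episodes += 1
        if self.episodes % self.batch_size != 0:
            return False
        for k in self.model:
            self.model[k] += self.lr * self.grad_sum[k]  # gradient ascent step
            self.grad_sum[k].fill(0.0)                   # start the next batch
        return True

frame = np.full((210, 160, 3), 144, dtype=np.uint8)  # plain background
frame[101, 80, 0] = 236                              # one bright "ball" pixel
x = preprocess(frame)
print(x.shape, x.sum())  # -> (6400,) 1.0
```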

Existing Resources

OpenAI Gym: Gym is a toolkit for developing and comparing reinforcement learning algorithms. It makes no assumptions about the structure of your agent and is compatible with any numerical computation library, such as TensorFlow or Theano. The gym library is a collection of test problems (environments) that you can use to develop and test your reinforcement learning algorithms. These environments share a common interface, which lets you write general algorithms.


