REINFORCEMENT LEARNING FROM HUMAN FEEDBACK