Questions for the seminar paper "Learning to Model the World With Language".
-----------------------------------------------------------------------------------------------
Please send your answers to: galessos@cs.uni-freiburg.de

1) What is the single main difference between Dynalang and the baselines considered in the paper for the HomeGrid benchmark (IMPALA, R2D2)? (1 sentence)
2) How do the authors evaluate the agents' grounding capabilities? What is distinctive about the proposed tool? (2-3 sentences)
3) The multimodal encoder receives as input not only the multimodal observations but also the current state of the world model (h_t). Why do you think this is the case? Do you think this choice is necessary? (2-3 sentences)