People increasingly interact with intelligent agents. Some of these agents, such as social robots, need to adapt to and personalize their behavior for the person they interact with. To achieve this, the agent should be able to receive feedback about its actions and learn the user’s preferences. However, people do not always provide such feedback. One possible approach is for the agent to ask the user for feedback after every action it executes, but this is undesirable because it may irritate the user. Therefore, the agent needs to decide when to ask for feedback, weighing the costs and benefits of doing so. As a step toward solving this problem, we formalize the problem of interactive Multi-Armed Bandits with dynamic query cost, lay out key challenges, and analyze how existing methods could be applied to solve it. The project is done in collaboration with Intuition Robotics.