bnelearn.environment module
This module contains environments: collections of players, and possibly state histories, that are used to control game play and to implement reward allocation to agents.
- class bnelearn.environment.AuctionEnvironment(mechanism: Mechanism, agents: Iterable[Bidder], valuation_observation_sampler: ValuationObservationSampler, batch_size=100, n_players=None, strategy_to_player_closure: Optional[Callable[[Strategy], Bidder]] = None, redraw_every_iteration: bool = False)
Bases: Environment
An environment of agents to play against and evaluate strategies.
- Args:
… (TODO: document) correlation_structure
- strategy_to_player_closure: A closure (strategy, batch_size) -> Bidder that transforms a strategy into a Bidder compatible with the environment
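To make the closure argument concrete, here is a minimal sketch of the (strategy, batch_size) -> Bidder pattern. `ToyBidder` and `make_closure` are hypothetical stand-ins written for this illustration; they are not bnelearn classes, and the real Bidder constructor may take different arguments.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyBidder:
    """Hypothetical stand-in for a Bidder; NOT the bnelearn class."""
    strategy: Callable[[float], float]
    batch_size: int
    player_position: int

    def get_actions(self, observations: List[float]) -> List[float]:
        # Apply the strategy to every observation in the batch.
        return [self.strategy(obs) for obs in observations]


def make_closure(player_position: int) -> Callable[..., ToyBidder]:
    """Build a (strategy, batch_size) -> ToyBidder closure for one seat."""
    def strategy_to_player(strategy, batch_size):
        # The seat is captured by the closure, so callers only pass
        # the strategy and the batch size.
        return ToyBidder(strategy, batch_size, player_position)
    return strategy_to_player


closure = make_closure(player_position=0)
bidder = closure(lambda v: 0.5 * v, batch_size=3)  # bid half the valuation
actions = bidder.get_actions([0.2, 0.4, 1.0])
```

The point of the pattern is that the environment can turn any candidate strategy into a player on demand without knowing how players are constructed.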
- draw_conditionals(conditioned_player: int, conditioned_observation: Tensor, inner_batch_size: Optional[int] = None, device: Optional[str] = None) -> Tuple[Tensor, Tensor]
Draws a conditional valuation / observation profile based on a (vector of) fixed observations for one player.
Total batch size will be conditioned_observation.shape[0] x inner_batch_size
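The batch-size arithmetic can be illustrated with a small pure-Python sketch. It assumes a two-player setting in which the opponent's value is drawn independently and uniformly; that prior, and the function name `draw_conditionals_sketch`, are assumptions made for this example only. The actual method returns tensors and relies on the environment's sampler.

```python
import random


def draw_conditionals_sketch(conditioned_observation, inner_batch_size, seed=0):
    """Pair each fixed observation with inner_batch_size opponent draws."""
    rng = random.Random(seed)
    rows = []
    for obs in conditioned_observation:
        for _ in range(inner_batch_size):
            # Assumption: the opponent's value is independent and uniform on [0, 1).
            opponent = rng.random()
            rows.append((obs, opponent))
    return rows


outer = [0.1, 0.5, 0.9]  # three fixed observations for the conditioned player
profile = draw_conditionals_sketch(outer, inner_batch_size=4)
total_batch_size = len(profile)  # len(outer) * inner_batch_size
```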
- draw_valuations()
Draws a new valuation and observation profile
- returns/yields:
nothing
- side effects:
updates the agents’ valuations and observation states
- get_allocation(agent, redraw_valuations: bool = False, aggregate: bool = True) -> Tensor
Returns allocation of a single player against the environment.
- get_efficiency(redraw_valuations: bool = False) -> float
Average percentage of the maximum possible welfare that the actual welfare reaches, taken over a batch.
- Args:
- redraw_valuations (bool): whether to redraw the agents’ valuations.
- Returns:
- efficiency (float): percentage of the maximum possible welfare that the actual welfare reaches, averaged over the batch.
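As a rough illustration of the efficiency measure, the sketch below averages the per-game ratio of actual to maximum welfare. The function name and the input lists are made up for this example; it assumes the maximum welfare is strictly positive in every game and reports the result as a fraction in [0, 1], since the source does not say whether the library scales it to 0-100.

```python
def efficiency(actual_welfare, maximal_welfare):
    """Mean over the batch of actual / maximal welfare per game."""
    # Assumption: every maximal welfare is > 0, so the ratio is defined.
    ratios = [a / m for a, m in zip(actual_welfare, maximal_welfare)]
    return sum(ratios) / len(ratios)


# Four games: two allocate efficiently, two reach half of the optimum.
actual = [1.0, 3.0, 2.0, 4.0]
maximal = [2.0, 3.0, 4.0, 4.0]
eff = efficiency(actual, maximal)
```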
- get_revenue(redraw_valuations: bool = False) -> float
Returns the average seller revenue over a batch.
- Args:
- redraw_valuations (bool): whether to redraw the agents’ valuations.
- Returns:
revenue (float): average of seller revenue over a batch of games.
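A minimal sketch of the revenue computation, assuming that the seller's revenue in one game is the sum of all bidders' payments. That definition and the data layout (one list of payments per game) are assumptions for illustration, not taken from the library.

```python
def average_revenue(payments):
    """Average over games of the total payment collected by the seller."""
    # Assumption: revenue per game = sum of payments of all bidders.
    per_game = [sum(game) for game in payments]
    return sum(per_game) / len(per_game)


# Batch of four two-bidder games; in each game only the winner pays.
payments = [[0.5, 0.0], [0.0, 0.25], [0.75, 0.0], [0.0, 0.5]]
revenue = average_revenue(payments)
```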
- get_reward(agent: Bidder, redraw_valuations: bool = False, aggregate: bool = True, regularize: float = 0.0, return_allocation: bool = False, smooth_market: bool = False, deterministic: bool = False) -> Tensor
Returns the reward of a single player against the environment and, optionally, that player’s allocation. The reward is calculated as the average utility across the batch_size x env_size games.
- class bnelearn.environment.Environment(agents: Iterable, n_players=2, batch_size=1, strategy_to_player_closure: Optional[Callable] = None, **kwargs)
Bases: ABC
An Environment object ‘manages’ a repeated game: it manages the current players and their models, collects players’ actions, distributes rewards, runs the game itself, and allows ‘simulations’ such as ‘how would a mutated player do in the current setting?’
- abstract get_reward(agent: Player, **kwargs) -> Tensor
Return reward for a player playing a certain strategy
- get_strategy_action_and_reward(strategy: Strategy, player_position: int, redraw_valuations=False, **strat_to_player_kwargs) -> Tensor
Returns the reward of a given strategy at a given agent position in the environment.
- get_strategy_reward(strategy: Strategy, player_position: int, redraw_valuations=False, aggregate_batch=True, regularize: float = 0, smooth_market: bool = False, deterministic: bool = False, **strat_to_player_kwargs) -> Tensor
Returns the reward of a given strategy at a given agent position in the environment.
- Args:
strategy: the strategy to be evaluated
player_position: the player position at which the agent will be evaluated
redraw_valuations: whether to redraw valuations (default False)
aggregate_batch: whether to aggregate rewards into a single scalar (True), or return batch_size many rewards (one for each sample). Default True
strat_to_player_kwargs: further arguments needed for agent creation
regularize: parameter that penalizes high action values (e.g. if we get the same utility with different actions, we prefer the lower one). The default value of zero corresponds to no regularization.
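The effect of the regularize and aggregate_batch arguments can be sketched as follows. The linear penalty `utility - regularize * action` is an assumption chosen for illustration; the source only states that high action values are penalized, and the function name `strategy_reward` is likewise hypothetical.

```python
def strategy_reward(utilities, actions, regularize=0.0, aggregate_batch=True):
    """Per-sample utility minus an action penalty, optionally averaged."""
    # Assumption: the penalty is linear in the action value.
    penalized = [u - regularize * a for u, a in zip(utilities, actions)]
    if aggregate_batch:
        # Collapse the batch into a single scalar reward.
        return sum(penalized) / len(penalized)
    # Otherwise return one reward per sample.
    return penalized


utilities = [1.0, 1.0]     # both action profiles earn the same utility
low_actions = [0.25, 0.25]
high_actions = [0.5, 0.5]

tie_low = strategy_reward(utilities, low_actions)             # no regularization
tie_high = strategy_reward(utilities, high_actions)
reg_low = strategy_reward(utilities, low_actions, regularize=0.5)
reg_high = strategy_reward(utilities, high_actions, regularize=0.5)
per_sample = strategy_reward(utilities, low_actions, 0.5, aggregate_batch=False)
```

Without regularization the two action profiles tie; with a positive coefficient the lower actions receive the higher reward, which is the tie-breaking behaviour the argument description refers to.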
- class bnelearn.environment.MatrixGameEnvironment(game: MatrixGame, agents, n_players=2, batch_size=1, strategy_to_player_closure=None, **kwargs)
Bases: Environment
An environment for matrix games.
Important features of matrix games for implementation:
- not necessarily symmetric, i.e. each player has a fixed position
- agents’ strategies do not take any input; the actions depend only on the game itself (not a Bayesian game)
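Both features can be seen in a small two-player example. The sketch below uses plain Python lists and a hypothetical helper `expected_payoffs`; it does not use the MatrixGame class. Strategies are fixed probability vectors with no input, and because the game is not symmetric, the payoff depends on which position a player occupies.

```python
def expected_payoffs(payoff_row, payoff_col, row_strategy, col_strategy):
    """Expected payoffs of the row and column player under mixed strategies."""
    row_value = 0.0
    col_value = 0.0
    for i, p in enumerate(row_strategy):
        for j, q in enumerate(col_strategy):
            weight = p * q  # probability that actions (i, j) are played
            row_value += weight * payoff_row[i][j]
            col_value += weight * payoff_col[i][j]
    return row_value, col_value


# A coordination game with asymmetric payoffs: the row player prefers
# coordinating on action 0, the column player on action 1.
payoff_row = [[2.0, 0.0], [0.0, 1.0]]
payoff_col = [[1.0, 0.0], [0.0, 2.0]]

# Strategies take no observation: they are just distributions over actions.
row_strategy = [0.5, 0.5]
col_strategy = [0.25, 0.75]

row_value, col_value = expected_payoffs(
    payoff_row, payoff_col, row_strategy, col_strategy
)
```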