bnelearn.environment module

This module contains environments: collections of players, and possibly state histories, that are used to control game playing and that implement reward allocation to agents.

class bnelearn.environment.AuctionEnvironment(mechanism: Mechanism, agents: Iterable[Bidder], valuation_observation_sampler: ValuationObservationSampler, batch_size=100, n_players=None, strategy_to_player_closure: Optional[Callable[[Strategy], Bidder]] = None, redraw_every_iteration: bool = False)[source]

Bases: Environment

An environment of agents to play against and evaluate strategies.


strategy_to_player_closure: A closure (strategy, batch_size) -> Bidder to transform strategies into a Bidder compatible with the environment.

draw_conditionals(conditioned_player: int, conditioned_observation: Tensor, inner_batch_size: Optional[int] = None, device: Optional[str] = None) Tuple[Tensor, Tensor][source]

Draws a conditional valuation / observation profile based on a (vector of) fixed observations for one player.

Total batch size will be conditioned_observation.shape[0] x inner_batch_size
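The batch-size arithmetic above can be illustrated with a minimal sketch. The helper `expand_conditioned` is hypothetical and uses plain Python lists in place of tensors; it only shows how each fixed observation would be paired with `inner_batch_size` draws, not how bnelearn implements the sampling.

```python
from typing import List


def expand_conditioned(conditioned_observation: List[float],
                       inner_batch_size: int) -> List[float]:
    """Repeat each fixed observation once per inner sample (hypothetical helper)."""
    return [obs
            for obs in conditioned_observation
            for _ in range(inner_batch_size)]


# Three fixed observations for the conditioned player (the "outer" batch).
outer = [0.2, 0.5, 0.9]
expanded = expand_conditioned(outer, inner_batch_size=4)

# Total batch size is len(outer) x inner_batch_size.
total_batch_size = len(expanded)
```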


Draws a new valuation and observation profile



Side effects: updates the agents’ valuations and observation states.

get_allocation(agent, redraw_valuations: bool = False, aggregate: bool = True) Tensor[source]

Returns allocation of a single player against the environment.

get_efficiency(redraw_valuations: bool = False) float[source]

Returns the percentage of the maximum possible welfare that the actual welfare attains, averaged over a batch.

redraw_valuations (bool): whether or not to redraw the valuations of the agents.

efficiency (float): percentage of the maximum possible welfare that the actual welfare attains, averaged over the batch.
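As a rough illustration of this quantity, the sketch below averages the per-game ratio of actual to maximum welfare. The function `average_efficiency` and its inputs are hypothetical, and the choice to take the ratio per game before averaging is an assumption based on the description, not a statement about the library's code.

```python
from typing import List


def average_efficiency(actual_welfare: List[float],
                       maximum_welfare: List[float]) -> float:
    """Mean over the batch of actual welfare / maximum possible welfare.

    Assumes every entry of maximum_welfare is strictly positive.
    """
    ratios = [a / m for a, m in zip(actual_welfare, maximum_welfare)]
    return sum(ratios) / len(ratios)


# One entry per game in the batch (illustrative numbers only).
actual = [8.0, 5.0, 10.0, 3.0]
maximum = [10.0, 5.0, 10.0, 4.0]
efficiency = average_efficiency(actual, maximum)
```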

get_revenue(redraw_valuations: bool = False) float[source]

Returns the average seller revenue over a batch.

redraw_valuations (bool): whether or not to redraw the valuations of the agents.

revenue (float): average of seller revenue over a batch of games.
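A minimal sketch of such an average, assuming that the seller's revenue in a game is the sum of all bidders' payments in that game. The function `average_revenue` and the nested-list layout are hypothetical stand-ins for the library's tensors.

```python
from typing import List


def average_revenue(payments: List[List[float]]) -> float:
    """payments[b][i] is the payment of bidder i in game b of the batch."""
    revenue_per_game = [sum(game) for game in payments]
    return sum(revenue_per_game) / len(revenue_per_game)


# Three games with two bidders each; only the winner pays (illustrative numbers).
payments = [[0.0, 0.6],
            [0.4, 0.0],
            [0.0, 0.8]]
revenue = average_revenue(payments)
```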

get_reward(agent: Bidder, redraw_valuations: bool = False, aggregate: bool = True, regularize: float = 0.0, return_allocation: bool = False, smooth_market: bool = False, deterministic: bool = False) Tensor[source]

Returns the reward of a single player against the environment and, optionally, that player’s allocation. The reward is calculated as the utility averaged over the batch_size x env_size games.
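To make the averaging concrete, here is a small sketch that assumes a quasi-linear utility (valuation times allocation minus payment) for a single-item setting. The function `reward` and its argument layout are hypothetical; the actual utility is determined by the Bidder and the mechanism.

```python
from typing import List, Union


def reward(valuations: List[float],
           allocations: List[float],
           payments: List[float],
           aggregate: bool = True) -> Union[float, List[float]]:
    """Per-game utilities of one player, optionally averaged over the batch."""
    # Assumed quasi-linear utility: value of what was won minus what was paid.
    utilities = [v * a - p for v, a, p in zip(valuations, allocations, payments)]
    if aggregate:
        return sum(utilities) / len(utilities)
    return utilities


valuations = [0.8, 0.5, 0.9]
allocations = [1.0, 0.0, 1.0]   # the player wins games 0 and 2
payments = [0.6, 0.0, 0.5]

mean_utility = reward(valuations, allocations, payments)
per_game = reward(valuations, allocations, payments, aggregate=False)
```

With `aggregate=False` the sketch returns one utility per game instead of a single scalar, mirroring the role of the `aggregate` flag in the signature.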


Prepares the interim stage of a Bayesian game (e.g. in an auction, draws bidders’ valuations).

class bnelearn.environment.Environment(agents: Iterable, n_players=2, batch_size=1, strategy_to_player_closure: Optional[Callable] = None, **kwargs)[source]

Bases: ABC

An Environment object ‘manages’ a repeated game, i.e. manages the current players and their models, collects players’ actions, distributes rewards, runs the game itself and allows ‘simulations’ as in ‘how would a mutated player do in the current setting’?

abstract get_reward(agent: Player, **kwargs) Tensor[source]

Return reward for a player playing a certain strategy

get_strategy_action_and_reward(strategy: Strategy, player_position: int, redraw_valuations=False, **strat_to_player_kwargs) Tensor[source]

Returns the reward of a given strategy at a given agent position in the environment.

get_strategy_reward(strategy: Strategy, player_position: int, redraw_valuations=False, aggregate_batch=True, regularize: float = 0, smooth_market: bool = False, deterministic: bool = False, **strat_to_player_kwargs) Tensor[source]

Returns the reward of a given strategy at a given agent position in the environment.


strategy: the strategy to be evaluated

player_position: the player position at which the agent will be evaluated

redraw_valuations: whether to redraw valuations (default False)

aggregate_batch: whether to aggregate rewards into a single scalar (True), or return batch_size many rewards (one for each sample). Default True

strat_to_player_kwargs: further arguments needed for agent creation

regularize: parameter that penalizes high action values (e.g. if we get the same utility with different actions, we prefer the lower one). The default value of zero corresponds to no regularization.
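The effect of `regularize` can be sketched as follows. The linear penalty used here is an assumption made for illustration; the documentation only states that high action values are penalized, not the functional form. The function `regularized_reward` is hypothetical.

```python
def regularized_reward(utility: float, action: float,
                       regularize: float = 0.0) -> float:
    """Utility minus a penalty that grows with the action (assumed linear)."""
    return utility - regularize * action


# Two actions that yield the same utility: the lower action is preferred.
low = regularized_reward(0.3, action=0.4, regularize=0.01)
high = regularized_reward(0.3, action=0.7, regularize=0.01)

# The default of zero leaves the utility unchanged.
unregularized = regularized_reward(0.3, action=0.7)
```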


True if there are no agents in the environment.


Prepares the interim stage of a Bayesian game (e.g. in an auction, draws bidders’ valuations).

class bnelearn.environment.MatrixGameEnvironment(game: MatrixGame, agents, n_players=2, batch_size=1, strategy_to_player_closure=None, **kwargs)[source]

Bases: Environment

An environment for matrix games.

Important features of matrix games for implementation:

  • not necessarily symmetric, i.e. each player has a fixed position

  • agents’ strategies do not take any input; the actions depend only on the game itself (not a Bayesian game)

get_reward(agent, **kwargs) tensor[source]

Simulates one batch of the environment and returns the average reward for agent as a scalar tensor.
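A minimal sketch of this kind of batch simulation, using a prisoner's dilemma payoff table for the row player. All names (`PAYOFF_ROW`, `average_reward`) are hypothetical, and strategies are represented as fixed probability vectors because, as noted above, they take no input. This is not the library's implementation.

```python
import random
from typing import List

# Row player's payoffs: PAYOFF_ROW[own_action][opponent_action],
# with action 0 = cooperate and action 1 = defect.
PAYOFF_ROW = [[-1.0, -3.0],
              [0.0, -2.0]]


def average_reward(own_probs: List[float], opp_probs: List[float],
                   batch_size: int, seed: int = 0) -> float:
    """Sample batch_size action profiles and average the row player's payoff."""
    rng = random.Random(seed)
    actions = [0, 1]
    total = 0.0
    for _ in range(batch_size):
        own = rng.choices(actions, weights=own_probs)[0]
        opp = rng.choices(actions, weights=opp_probs)[0]
        total += PAYOFF_ROW[own][opp]
    return total / batch_size


# Row player always defects, opponent always cooperates.
pure_reward = average_reward([0.0, 1.0], [1.0, 0.0], batch_size=64)

# Both players mix uniformly; the result depends on the sampled batch.
mixed_reward = average_reward([0.5, 0.5], [0.5, 0.5], batch_size=64)
```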