bnelearn.environment module¶

This module contains environments: collections of players, and possibly state histories, that are used to control game play and that implement reward allocation to agents.

class bnelearn.environment.AuctionEnvironment(mechanism: Mechanism, agents: Iterable[Bidder], valuation_observation_sampler: ValuationObservationSampler, batch_size=100, n_players=None, strategy_to_player_closure: Optional[Callable[[Strategy], Bidder]] = None, redraw_every_iteration: bool = False)[source]¶

Bases: Environment

An environment of agents to play against and evaluate strategies.

Args:

… (TODO: document) correlation_structure

strategy_to_bidder_closure: A closure (strategy, batch_size) -> Bidder to transform strategies into a Bidder compatible with the environment.
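To make the closure idea concrete, here is a minimal sketch of a (strategy, batch_size) -> bidder factory. ToyStrategy, ToyBidder and make_closure are hypothetical stand-ins invented for illustration; they are not bnelearn classes, and the real Bidder constructor may take different arguments.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyStrategy:
    """Hypothetical stand-in for a Strategy: bids a fixed fraction of the valuation."""
    shading: float

    def play(self, valuation: float) -> float:
        return self.shading * valuation


@dataclass
class ToyBidder:
    """Hypothetical stand-in for a Bidder that wraps a strategy."""
    strategy: ToyStrategy
    batch_size: int
    player_position: int


def make_closure(player_position: int) -> Callable[[ToyStrategy, int], ToyBidder]:
    """Build a (strategy, batch_size) -> bidder closure bound to one player position."""
    def closure(strategy: ToyStrategy, batch_size: int) -> ToyBidder:
        # The position is captured from the enclosing scope, so callers only
        # need to supply the strategy and the batch size.
        return ToyBidder(strategy=strategy, batch_size=batch_size,
                         player_position=player_position)
    return closure


strategy_to_bidder = make_closure(player_position=0)
bidder = strategy_to_bidder(ToyStrategy(shading=0.5), 100)
```

Binding the player position inside the closure lets an environment turn any candidate strategy into a compatible player without knowing how that player is constructed.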

draw_conditionals(conditioned_player: int, conditioned_observation: Tensor, inner_batch_size: Optional[int] = None, device: Optional[str] = None) → Tuple[Tensor, Tensor][source]¶

Draws a conditional valuation / observation profile based on a (vector of) fixed observations for one player.

The total batch size will be conditioned_observation.shape[0] x inner_batch_size.
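The batch-size arithmetic can be illustrated in plain Python. The values below are made up, and the sketch assumes only that every fixed observation is paired with inner_batch_size conditional draws; how bnelearn actually lays out the resulting tensors is not stated here.

```python
from itertools import product

# Hypothetical fixed observations for the conditioned player (the outer batch).
conditioned_observation = [0.2, 0.5, 0.9]
inner_batch_size = 4

# Pair every fixed observation index with every inner sample index.
pairs = list(product(range(len(conditioned_observation)), range(inner_batch_size)))

# Outer batch size times inner batch size.
total_batch_size = len(conditioned_observation) * inner_batch_size
```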

draw_valuations()[source]¶

Draws a new valuation and observation profile.

Returns/yields: nothing
Side effects: updates agents’ valuations and observation states

get_allocation(agent, redraw_valuations: bool = False, aggregate: bool = True) → Tensor[source]¶

Returns the allocation of a single player against the environment.

get_efficiency(redraw_valuations: bool = False) → float[source]¶

Average percentage of the maximum possible welfare that the actual welfare reaches, taken over a batch.

Args:

redraw_valuations (bool): whether or not to redraw the valuations of the agents.

Returns:

efficiency (float): percentage of the maximum possible welfare that the actual welfare reaches, averaged over the batch.
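As a worked illustration of this quantity, the sketch below averages the per-game ratio of actual to maximum welfare. The helper efficiency and the numbers are hypothetical. Reading the description as "mean of per-game ratios" (rather than a ratio of batch totals) is an assumption, as is a strictly positive maximum welfare in every game.

```python
def efficiency(actual_welfare, max_welfare):
    """Mean of per-game ratios actual / maximum welfare (assumed reading)."""
    # Assumes every entry of max_welfare is strictly positive.
    ratios = [a / m for a, m in zip(actual_welfare, max_welfare)]
    return sum(ratios) / len(ratios)


# Made-up welfare values for a batch of four games.
actual = [8.0, 5.0, 9.0, 6.0]
maximal = [10.0, 5.0, 10.0, 8.0]
eff = efficiency(actual, maximal)  # per-game ratios: 0.8, 1.0, 0.9, 0.75
```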

get_revenue(redraw_valuations: bool = False) → float[source]¶

Returns the average seller revenue over a batch.

Args:

redraw_valuations (bool): whether or not to redraw the valuations of the agents.

Returns:

revenue (float): average of seller revenue over a batch of games.
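A minimal sketch of the averaging, assuming that the seller's revenue in one game is the sum of all bidders' payments in that game. That definition and the payment values are assumptions for illustration; they are not taken from the bnelearn source.

```python
# Made-up payments: one row per game in the batch, one column per bidder.
payments = [
    [0.6, 0.0, 0.0],
    [0.0, 0.7, 0.0],
    [0.0, 0.0, 0.5],
    [0.4, 0.0, 0.0],
]

# Assumed definition: revenue of a game is the sum of payments made in it.
revenue_per_game = [sum(game) for game in payments]

# Average over the batch of games.
average_revenue = sum(revenue_per_game) / len(revenue_per_game)
```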

get_reward(agent: Bidder, redraw_valuations: bool = False, aggregate: bool = True, regularize: float = 0.0, return_allocation: bool = False, smooth_market: bool = False, deterministic: bool = False) → Tensor[source]¶

Returns the reward of a single player against the environment and, optionally, that player’s allocation. The reward is calculated as the average utility over the batch_size x env_size games.
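The averaging step can be sketched as follows. The quasi-linear utility (valuation times allocation minus payment) is an assumption chosen for illustration, since the utility function actually used depends on the bidder. The aggregate flag is assumed to switch between a single mean and one value per game, mirroring the aggregate_batch description of get_strategy_reward. The helper reward is hypothetical.

```python
def reward(valuations, allocations, payments, aggregate=True):
    """Average (or per-game) utility for one player under assumed quasi-linear utility."""
    utilities = [v * x - p for v, x, p in zip(valuations, allocations, payments)]
    if aggregate:
        # Collapse the batch into a single scalar.
        return sum(utilities) / len(utilities)
    # Otherwise keep one utility per game.
    return utilities


# Made-up batch of three games for a single player.
valuations = [0.8, 0.4, 0.6]
allocations = [1, 0, 1]       # 1 if the player wins the item, else 0
payments = [0.5, 0.0, 0.25]

mean_reward = reward(valuations, allocations, payments)
per_game = reward(valuations, allocations, payments, aggregate=False)
```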

prepare_iteration()[source]¶

Prepares the interim stage of a Bayesian game (e.g. in an auction, draws bidders’ valuations).

class bnelearn.environment.Environment(agents: Iterable, n_players=2, batch_size=1, strategy_to_player_closure: Optional[Callable] = None, **kwargs)[source]¶

Bases: ABC

An Environment object ‘manages’ a repeated game, i.e. manages the current players and their models, collects players’ actions, distributes rewards, runs the game itself and allows ‘simulations’ as in ‘how would a mutated player do in the current setting’?

abstract get_reward(agent: Player, **kwargs) → Tensor[source]¶

Returns the reward for a player playing a certain strategy.

get_strategy_action_and_reward(strategy: Strategy, player_position: int, redraw_valuations=False, **strat_to_player_kwargs) → Tensor[source]¶

Returns the reward of a given strategy at a given agent position in the environment.

get_strategy_reward(strategy: Strategy, player_position: int, redraw_valuations=False, aggregate_batch=True, regularize: float = 0, smooth_market: bool = False, deterministic: bool = False, **strat_to_player_kwargs) → Tensor[source]¶

Returns the reward of a given strategy at a given agent position in the environment.

Args:

strategy: the strategy to be evaluated.

player_position: the player position at which the agent will be evaluated.

redraw_valuations: whether to redraw valuations (default False).

aggregate_batch: whether to aggregate rewards into a single scalar (True), or return batch_size many rewards (one for each sample). Default True.

strat_to_player_kwargs: further arguments needed for agent creation.

regularize: parameter that penalizes high action values (e.g. if we get the same utility with different actions, we prefer the lower one). Default value of zero corresponds to no regularization.
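The effect of regularize can be sketched with a simple penalty. The linear form below (utility minus regularize times the action) is an assumption made for illustration. The exact penalty that bnelearn applies is not specified here, and regularized_reward is a hypothetical helper.

```python
def regularized_reward(utility, action, regularize=0.0):
    """Assumed linear penalty: higher actions cost more when regularize > 0."""
    return utility - regularize * action


# Two actions that yield the same raw utility.
same_utility = 0.3
low = regularized_reward(same_utility, action=0.4, regularize=0.01)
high = regularized_reward(same_utility, action=0.9, regularize=0.01)

# With the default of zero, the reward is left unchanged.
unregularized = regularized_reward(same_utility, action=0.9)
```

Under this sketch the lower action obtains the strictly higher reward, which breaks the tie in the direction the parameter description asks for.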

is_empty()[source]¶

True if there are no agents in the environment.

prepare_iteration()[source]¶

Prepares the interim stage of a Bayesian game (e.g. in an auction, draws bidders’ valuations).
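The shape of the Environment base class (an abstract get_reward plus concrete helpers such as is_empty) can be mirrored with a small toy hierarchy. ToyEnvironment and ConstantRewardEnvironment are hypothetical and only imitate the documented interface; they do not reproduce bnelearn's implementation.

```python
from abc import ABC, abstractmethod


class ToyEnvironment(ABC):
    """Hypothetical mirror of the documented interface, not the bnelearn class."""

    def __init__(self, agents, n_players=2, batch_size=1):
        self.agents = list(agents)
        self.n_players = n_players
        self.batch_size = batch_size

    @abstractmethod
    def get_reward(self, agent, **kwargs):
        """Subclasses decide how a player is rewarded."""

    def is_empty(self):
        # True if there are no agents in the environment.
        return len(self.agents) == 0


class ConstantRewardEnvironment(ToyEnvironment):
    """Trivial subclass: every agent receives the same reward."""

    def get_reward(self, agent, **kwargs):
        return 1.0


env = ConstantRewardEnvironment(agents=[])
```

Because get_reward is abstract, the base class cannot be instantiated directly; only subclasses that implement it can.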

class bnelearn.environment.MatrixGameEnvironment(game: MatrixGame, agents, n_players=2, batch_size=1, strategy_to_player_closure=None, **kwargs)[source]¶

Bases: Environment

An environment for matrix games.

Important features of matrix games for implementation:

not necessarily symmetric, i.e. each player has a fixed position
agents’ strategies do not take any input; the actions depend only on the game itself (no Bayesian game)

get_reward(agent, **kwargs) → tensor[source]¶

Simulates one batch of the environment and returns the average reward for agent as a scalar tensor.
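A minimal pure-Python sketch of "simulate one batch and average the reward" for a two-player matrix game. The prisoner's dilemma payoffs, the function average_reward and its parameters are assumptions for illustration; the sketch uses plain floats rather than tensors and does not call bnelearn.

```python
import random

# Assumed example game: row player's payoffs in a prisoner's dilemma.
# Index 0 = cooperate, 1 = defect; ROW_PAYOFF[row_action][col_action].
ROW_PAYOFF = [[-1, -3],
              [0, -2]]


def average_reward(row_probs, col_probs, batch_size, seed=0):
    """Average payoff of the row player over batch_size simulated games."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(batch_size):
        # Strategies take no input: actions are drawn from fixed distributions.
        row_action = rng.choices([0, 1], weights=row_probs)[0]
        col_action = rng.choices([0, 1], weights=col_probs)[0]
        total += ROW_PAYOFF[row_action][col_action]
    return total / batch_size


both_defect = average_reward([0, 1], [0, 1], batch_size=10)
both_cooperate = average_reward([1, 0], [1, 0], batch_size=10)
mixed = average_reward([0.5, 0.5], [0.5, 0.5], batch_size=100)
```

The payoff table is indexed by the row player's action first, which reflects the point above that each player occupies a fixed position in a game that need not be symmetric.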