Markov Decision Process (MDP)

A MDP is similar to a MC, but with non-determinism. More information here.

Example

In the following picture the actions are represented by the colors. Hence, by executing action blue in the leftern state, we will, with probability one, observe a and reach the bottom state.

_images/MDP.png

Creation

>>> import jajapy as ja
>>> labelling = ['a','a','a','c','b','d']
>>> transitions = [(0,'red',1,0.5),(0,'red',2,0.5),(0,'blue',2,1.0),
...                (1,'blue',3,1.0),(1,'red',4,1.0),(3,'blue',3,1.0),(3,'red',4,1.0),
...                (2,'blue',5,1.0),(4,'blue',5,1.0),(5,'blue',5,1.0)]
>>> model = ja.createMDP(transitions,labelling,initial_state=0,name="My MDP")

We can also generate a random MDP

>>> random_model = ja.MDP_random(number_states=6,
                                random_initial_state=False,
                                alphabet=['a','b','c','d'],
                                actions=['red','blue'])

Exploration

>>> model.tau(0,'red',1,'a') # probability of moving from s0 to s1 seeing 'a' after executing 'red'
0.5
>>> model.getAlphabet() # all possible observations
['init','a','b','c','d']
>>> model.getLabel(0) # label on state 0
'a'
>>> model.getActions() # all possible actions
['red','blue']
>>> model.getActions(2) # all actions available in state 2
['blue']

Running

>>> scheduler = ja.UniformScheduler(model.getActions())
>>> model.run(5,scheduler) # returns a list of 5 actions and 5 observations
['init','red','a', 'blue', 'a', 'blue', 'd', 'blue', 'd', 'blue', 'd', 'blue', 'd']
>>> s = model.generateSet(10,5,scheduler) # returns a Set containing 10 traces of size 5
>>> s.sequences
[['init','red','a', 'red', 'a', 'red', 'b', 'blue', 'd', 'blue', 'd', 'blue', 'd'],
 ['init','blue','a', 'red', 'a', 'blue', 'c', 'blue', 'c', 'red', 'b', 'blue', 'd'],
 ['init','blue','a', 'red', 'a', 'blue', 'd', 'blue', 'd', 'blue', 'd', 'blue', 'd'],
 ['init','red','a', 'blue', 'a', 'blue', 'd', 'blue', 'd', 'blue', 'd', 'blue', 'd'],
 ['init','blue','a', 'red', 'a', 'blue', 'c', 'blue', 'c', 'blue', 'c', 'blue', 'c'],
 ['init','red','a', 'red', 'a', 'blue', 'c', 'blue', 'c', 'blue', 'c', 'red', 'b']]
>>> s.times # first sequence appears 6 times, the second twice, etc...
[3, 1, 1, 2, 2, 1]

Analysis

>>> model.logLikelihood(s) # loglikelihood of this set of traces under this model
-0.5545177444479562

Saving/Loading

>>> model.save("my_mdp.txt")
>>> the_same_model = ja.loadMDP("my_mdp.txt")

Converting from/to Stormpy

>>> stormpy_sparse_model = model.toStormpy() # the next line is equivalent
>>> stormpy_sparse_model = ja.jajapyModeltoStormpy(model)
>>> same_model == ja.stormpyModeltoJajapy(stormpy_sparse_model)

Converting from/to Prism

>>> model.savePrism("my_mc.sm")
>>> same_model = ja.loadPrism("my_mc.sm")

Model

class jajapy.MDP(matrix: ndarray, labelling: list, actions: list, name: str = 'unknown_MDP')

Class representing a MDP.

Create a MDP.

Parameters

matrixndarray

Represents the transition matrix. matrix[s1][act_ID][s2][obs_ID] is the probability of moving from s1 to s2 by executing action of ID act_ID and seeing the observation of ID obs_ID.

labelling: list of str

A list of N observations (with N the nb of states). If labelling[s] == o then state of ID s is labelled by o. Each state has exactly one label.

actions: list of str

The list of all possible actions, such that: actions.index(“act”) is the ID of act.

namestr, optional

Name of the model. Default is “unknow_MDP”

a(s1: int, s2: int, action: str) float

Returns the probability of moving from state s1 to state s2, just after executing action. Parameters ———- s1 : int

ID of the source state.

s2int

ID of the destination state.

action: str

an action.

Returns

float

Probability of moving from state s1 to state s2.

Example

>>> model.a(0,1)
0.6
generateSet(set_size: int, param, scheduler: Scheduler, distribution=None, min_size=None) list

Generates a set (training set / test set) containing set_size traces generated under scheduler.

Parameters

set_size: int

number of traces in the output set.

param: a list, an int or a float.

the parameter(s) for the distribution. See “distribution”.

scheduler: Scheduler:

A scheduler used to generated all the traces.

distribution: str, optional

If distribution=='geo' then the sequence length will be distributed by a geometric law such that the expected length is min_size+(1/param). If distribution==None param can be an int, in this case all the seq will have the same length (param), or param can be a list of int. Default is None.

min_size: int, optional

see “distribution”. Default is None.

Returns

output: list

a set (training set / test set).

getActions(state: int = -1) list

If state is set, returns the list of all the actions available in state. Otherwise it returns the actions of the model.

Parameters

stateint, optional

a state ID

Returns

list of str

list of actions

Example

>>> model.getActions()
['A','B','C','D']
>>> model.getActions(1)
['A','C']
getAlphabet() list

Returns the alphabet of this model.

Returns

list of str

The alphabet of this model

Example

>>> model.getAlphabet()
['a','b','c','d','done']
getLabel(state: int) str

Returns the label of state.

Parameters

stateint

a state ID

Returns

str

a label

Example

>>> model.getLabel(2)
'Label-of-state-2'
logLikelihood(sequences: Set) float

Compute the average loglikelihood of a set.

Parameters

sequences: Set

A set.

Returns

output: float

loglikelihood of sequences under this model.

Examples

>>> model.logLikelihood(set1)
-4.442498878506513
next(state: int, action: str) tuple

Return a state-observation pair according to the distributions described by matrix.

Parameters

state: int

source state ID.

action: str

An action.

Returns

output(int, str)

A state-observation pair.

Example

>>> model.next(0,'A')
(1,'a')
>>> model.getLabel(0)
'a'
>>> model.next(0)
(1,'a')
>>> model.next(0)
(2,'a')
>>> model.a(0,1,'A')
0.6
>>> model.a(0,2,'A')
0.4
>>> model.next(0,'C')
(2,'a')
>>> model.a(0,2,'C')
1.0
pi(s: int) float

Return the probability of starting in state s.

Parameters

s: int

state ID.

Returns

outputfloat

the probability of starting in state s.

rename(name: str) None

Change the name of the model.

Parameters

namestr

new name.

run(number_steps: int, scheduler: Scheduler, current: int = -1) list

Simulates a run of length number_steps of the model under scheduler and returns the sequence of actions-observations generated.

Parameters

number_steps: int

length of the simulation.

currentint, optional.

If current it set, it starts from the state current. Otherwise it starts from an initial state.

Returns

output: list of str

List of alterning state-observation.

save(file_path: str) None

Save the model into a text file.

Parameters

file_pathstr

path of the output file.

Examples

>>> model.save("my_model.txt")
savePrism(file_path: str) None

Save this model into file_path in the Prism format.

Parameters

file_pathstr

Path of the output file.

tau(s1: int, action: str, s2: int, obs: str) float

Returns the probability of moving from state s1 executing action to s2 generating observation obs.

Parameters

s1: int

source state ID.

action: str

An action.

s2: int

destination state ID.

obs: str

generated observation.

Returns

float

A probability.

Example

>>> model.tau(0,'A',1,'a')
0.6
>>> model.getLabel(0)
'a'
>>> model.tau(0,'A',1,'b')
0.0
>>> model.tau(0,'B',1,'b')
0.0
>>> model.getActions(0)
['A']           
toStormpy()

Returns the equivalent stormpy sparse model. The output object will be a stormpy.SparseDtmc.

Returns

stormpy.SparseDtmc

The same model in stormpy format.

updateLabelling(new_labels: list) None

Update the states labelling.

Parameters

new_labels: list

a list of new labels. The length of the list must be equal to the number of states.

Other Functions

jajapy.createMDP(transitions: list, labelling: list, initial_state, name: str = 'unknown_MDP') MDP

An user-friendly way to create an MDP.

Parameters

transitions[ list of tuples (int, str, int, float)]

Each tuple represents a transition as follow: (source state ID, action, destination state ID, probability).

labelling: list of str

A list of N observations (with N the nb of states). If labelling[s] == o then state of ID s is labelled by o. Each state has exactly one label.

initial_stateint or list of float

Determine which state is the initial one (then it’s the id of the state), or what are the probability to start in each state (then it’s a list of probabilities).

namestr, optional

Name of the model. Default is “unknow_MC”

Returns

MDP

the MDP describes by transitions, labelling, and initial_state.

Examples

jajapy.loadMDP(file_path: str) MDP

Load an MDP saved into a text file.

Parameters

file_pathstr

Location of the text file.

Returns

outputMDP

The MDP saved in file_path.

jajapy.MDP_random(nb_states: int, labelling: list, actions: list, random_initial_state: bool = True, deterministic: bool = False, sseed: Optional[int] = None) MDP

Generate a random MDP.

Parameters

number_statesint

Number of states.

labellinglist of str

List of observations.

actionslist of str

List of actions.

random_initial_state: bool, optional

If set to True we will start in each state with a random probability, otherwise we will always start in state 0. Default is True.

deterministic: bool, optional

If True, the model will be determinstic: in state s, with action a, for all label o, there is at most one transition to a state labelled with o. Default is False.

sseedint, optional

the seed value.

Returns

MDP

A pseudo-randomly generated MDP.

Scheduler

Scheduler are used to solve the non-deterministic choices while running a MDP. A scheduler will basically chooses the actions to execute at each time step.

There are several kind of scheduler. The easiest one in the uniform one, which, at each time step, chooses an action uniformly at random. The memoryless scheduler will choose the action according to the current state of the MDP (thus it assumes that the MDP states are observable). It basically maps each MDP state to one action. Finally a finite memory scheduler can be seen as a generative automaton that takes on input a sequence of observation and returns a sequence of actions.

class jajapy.UniformScheduler(actions: list)

Class for an uniform scheduler. An uniform scheduler will always returns each action with an equal probability.

Creates a uniform scheduler for the given actions.

Parameters

actionslist of str

A list of actions.

getAction() str

Returns the action chosen by the scheduler

Returns

str

An action.

class jajapy.MemorylessScheduler(actions: list)

Class for memoryless scheduler.

Creates a memoryless scheduler.

Parameters

actionslist of str

list of n actions, with n the number of states in the model actions[n] will always be used in state n.

getAction(current_state: int) str

Returns the action used while in state current_state

Parameters

current_stateint

ID of the current MDP state.

Returns

str

Action chosen by the scheduler.

class jajapy.FiniteMemoryScheduler(action_matrix: dict, transition_matrix: dict)

Class for finite memoryless scheduler. A finite memory scheduler can be seen as a generative automaton that takes on input a sequence of observation and returns a sequence of actions.

Creates a finite memory scheduler.

Parameters

action_matrixdict

Describes the probability for each action to be chosen given the current scheduler state as follows: action_matrix = {scheduler_state_x: [[proba1,proba2,...],[action1,action2,...]], scheduler_state_y: [[proba1,proba2,...],[action1,action2,...]], ...}

transition_matrixdict

Given the a state s``and an observation ``o, returns to which state the scheduler move to if it receives o while in state s transition_matrix = {obs1: [scheduler_state_dest_if_current_state_=_0,scheduler_state_dest_if_current_state_=_1,...], obs2: [scheduler_state_dest_if_current_state_=_0,scheduler_state_dest_if_current_state_=_1,...] ...}

addObservation(obs: str) None

Updates the scheduler state according to the current state and the given observation obs.

Parameters

obsstr

An observation.

getAction() str

Returns the action chosen by the scheduler

Returns

str

Action chosen by the scheduler.

getActions(s: Optional[int] = None) list

Returns the list of all the possibles actions (and their probabilities) that the scheduler can choose if it is in state s. If s is not provided it considers the current scheduler state.

Parameters

sint, optional

A scheduler state ID. By default None

Returns

list

A list containing a list of probabilities and a list of actions: [[proba_action1, proba_action2,…],[action1, action2,…]]

getProbability(action: str, state: Optional[int] = None) float

Given a scheduler state ID and an action, returns the probability to execute this action in this state.

Parameters

actionstr

An action.

stateint, optional

A state ID. If not provided it considers the current state. By default None

Returns

float

A probability.

getSequenceStates(seq_obs: list) list

Given a sequence of observations, returns the sequence of states the scheduler goes through.

Parameters

seq_obslist of str

A sequence of observations.

Returns

list

A sequence of scheulder states.

reset() None

Reset the model to its initial state. Should be done before generating a new trace.