Markov Decision Process (MDP)
A MDP is similar to a MC, but with non-determinism. More information here.
Example
In the following picture the actions are represented by the colors. Hence, by executing action blue in the leftern state, we will, with probability one, observe a and reach the bottom state.
Creation
>>> import jajapy as ja
>>> labelling = ['a','a','a','c','b','d']
>>> transitions = [(0,'red',1,0.5),(0,'red',2,0.5),(0,'blue',2,1.0),
... (1,'blue',3,1.0),(1,'red',4,1.0),(3,'blue',3,1.0),(3,'red',4,1.0),
... (2,'blue',5,1.0),(4,'blue',5,1.0),(5,'blue',5,1.0)]
>>> model = ja.createMDP(transitions,labelling,initial_state=0,name="My MDP")
We can also generate a random MDP
>>> random_model = ja.MDP_random(number_states=6,
random_initial_state=False,
alphabet=['a','b','c','d'],
actions=['red','blue'])
Exploration
>>> model.tau(0,'red',1,'a') # probability of moving from s0 to s1 seeing 'a' after executing 'red'
0.5
>>> model.getAlphabet() # all possible observations
['init','a','b','c','d']
>>> model.getLabel(0) # label on state 0
'a'
>>> model.getActions() # all possible actions
['red','blue']
>>> model.getActions(2) # all actions available in state 2
['blue']
Running
>>> scheduler = ja.UniformScheduler(model.getActions())
>>> model.run(5,scheduler) # returns a list of 5 actions and 5 observations
['init','red','a', 'blue', 'a', 'blue', 'd', 'blue', 'd', 'blue', 'd', 'blue', 'd']
>>> s = model.generateSet(10,5,scheduler) # returns a Set containing 10 traces of size 5
>>> s.sequences
[['init','red','a', 'red', 'a', 'red', 'b', 'blue', 'd', 'blue', 'd', 'blue', 'd'],
['init','blue','a', 'red', 'a', 'blue', 'c', 'blue', 'c', 'red', 'b', 'blue', 'd'],
['init','blue','a', 'red', 'a', 'blue', 'd', 'blue', 'd', 'blue', 'd', 'blue', 'd'],
['init','red','a', 'blue', 'a', 'blue', 'd', 'blue', 'd', 'blue', 'd', 'blue', 'd'],
['init','blue','a', 'red', 'a', 'blue', 'c', 'blue', 'c', 'blue', 'c', 'blue', 'c'],
['init','red','a', 'red', 'a', 'blue', 'c', 'blue', 'c', 'blue', 'c', 'red', 'b']]
>>> s.times # first sequence appears 6 times, the second twice, etc...
[3, 1, 1, 2, 2, 1]
Analysis
>>> model.logLikelihood(s) # loglikelihood of this set of traces under this model
-0.5545177444479562
Saving/Loading
>>> model.save("my_mdp.txt")
>>> the_same_model = ja.loadMDP("my_mdp.txt")
Converting from/to Stormpy
>>> stormpy_sparse_model = model.toStormpy() # the next line is equivalent
>>> stormpy_sparse_model = ja.jajapyModeltoStormpy(model)
>>> same_model == ja.stormpyModeltoJajapy(stormpy_sparse_model)
Converting from/to Prism
>>> model.savePrism("my_mc.sm")
>>> same_model = ja.loadPrism("my_mc.sm")
Model
- class jajapy.MDP(matrix: ndarray, labelling: list, actions: list, name: str = 'unknown_MDP')
Class representing a MDP.
Create a MDP.
Parameters
- matrixndarray
Represents the transition matrix. matrix[s1][act_ID][s2][obs_ID] is the probability of moving from s1 to s2 by executing action of ID act_ID and seeing the observation of ID obs_ID.
- labelling: list of str
A list of N observations (with N the nb of states). If labelling[s] == o then state of ID s is labelled by o. Each state has exactly one label.
- actions: list of str
The list of all possible actions, such that: actions.index(“act”) is the ID of act.
- namestr, optional
Name of the model. Default is “unknow_MDP”
- a(s1: int, s2: int, action: str) float
Returns the probability of moving from state s1 to state s2, just after executing action. Parameters ———- s1 : int
ID of the source state.
- s2int
ID of the destination state.
- action: str
an action.
Returns
- float
Probability of moving from state s1 to state s2.
Example
>>> model.a(0,1) 0.6
- generateSet(set_size: int, param, scheduler: Scheduler, distribution=None, min_size=None) list
Generates a set (training set / test set) containing set_size traces generated under
scheduler.Parameters
- set_size: int
number of traces in the output set.
- param: a list, an int or a float.
the parameter(s) for the distribution. See “distribution”.
- scheduler: Scheduler:
A scheduler used to generated all the traces.
- distribution: str, optional
If
distribution=='geo'then the sequence length will be distributed by a geometric law such that the expected length ismin_size+(1/param). If distribution==None param can be an int, in this case all the seq will have the same length (param), orparamcan be a list of int. Default is None.- min_size: int, optional
see “distribution”. Default is None.
Returns
- output: list
a set (training set / test set).
- getActions(state: int = -1) list
If state is set, returns the list of all the actions available in state. Otherwise it returns the actions of the model.
Parameters
- stateint, optional
a state ID
Returns
- list of str
list of actions
Example
>>> model.getActions() ['A','B','C','D'] >>> model.getActions(1) ['A','C']
- getAlphabet() list
Returns the alphabet of this model.
Returns
- list of str
The alphabet of this model
Example
>>> model.getAlphabet() ['a','b','c','d','done']
- getLabel(state: int) str
Returns the label of state.
Parameters
- stateint
a state ID
Returns
- str
a label
Example
>>> model.getLabel(2) 'Label-of-state-2'
- logLikelihood(sequences: Set) float
Compute the average loglikelihood of a set.
Parameters
- sequences: Set
A set.
Returns
- output: float
loglikelihood of
sequencesunder this model.
Examples
>>> model.logLikelihood(set1) -4.442498878506513
- next(state: int, action: str) tuple
Return a state-observation pair according to the distributions described by matrix.
Parameters
- state: int
source state ID.
- action: str
An action.
Returns
- output(int, str)
A state-observation pair.
Example
>>> model.next(0,'A') (1,'a') >>> model.getLabel(0) 'a' >>> model.next(0) (1,'a') >>> model.next(0) (2,'a') >>> model.a(0,1,'A') 0.6 >>> model.a(0,2,'A') 0.4 >>> model.next(0,'C') (2,'a') >>> model.a(0,2,'C') 1.0
- pi(s: int) float
Return the probability of starting in state
s.Parameters
- s: int
state ID.
Returns
- outputfloat
the probability of starting in state s.
- run(number_steps: int, scheduler: Scheduler, current: int = -1) list
Simulates a run of length
number_stepsof the model underschedulerand returns the sequence of actions-observations generated.Parameters
- number_steps: int
length of the simulation.
- currentint, optional.
If current it set, it starts from the state current. Otherwise it starts from an initial state.
Returns
- output: list of str
List of alterning state-observation.
- save(file_path: str) None
Save the model into a text file.
Parameters
- file_pathstr
path of the output file.
Examples
>>> model.save("my_model.txt")
- savePrism(file_path: str) None
Save this model into file_path in the Prism format.
Parameters
- file_pathstr
Path of the output file.
- tau(s1: int, action: str, s2: int, obs: str) float
Returns the probability of moving from state
s1executing action tos2generating observationobs.Parameters
- s1: int
source state ID.
- action: str
An action.
- s2: int
destination state ID.
- obs: str
generated observation.
Returns
- float
A probability.
Example
>>> model.tau(0,'A',1,'a') 0.6 >>> model.getLabel(0) 'a' >>> model.tau(0,'A',1,'b') 0.0 >>> model.tau(0,'B',1,'b') 0.0 >>> model.getActions(0) ['A']
Other Functions
- jajapy.createMDP(transitions: list, labelling: list, initial_state, name: str = 'unknown_MDP') MDP
An user-friendly way to create an MDP.
Parameters
- transitions[ list of tuples (int, str, int, float)]
Each tuple represents a transition as follow: (source state ID, action, destination state ID, probability).
- labelling: list of str
A list of N observations (with N the nb of states). If labelling[s] == o then state of ID s is labelled by o. Each state has exactly one label.
- initial_stateint or list of float
Determine which state is the initial one (then it’s the id of the state), or what are the probability to start in each state (then it’s a list of probabilities).
- namestr, optional
Name of the model. Default is “unknow_MC”
Returns
- MDP
the MDP describes by transitions, labelling, and initial_state.
Examples
- jajapy.loadMDP(file_path: str) MDP
Load an MDP saved into a text file.
Parameters
- file_pathstr
Location of the text file.
Returns
- outputMDP
The MDP saved in file_path.
- jajapy.MDP_random(nb_states: int, labelling: list, actions: list, random_initial_state: bool = True, deterministic: bool = False, sseed: Optional[int] = None) MDP
Generate a random MDP.
Parameters
- number_statesint
Number of states.
- labellinglist of str
List of observations.
- actionslist of str
List of actions.
- random_initial_state: bool, optional
If set to True we will start in each state with a random probability, otherwise we will always start in state 0. Default is True.
- deterministic: bool, optional
If True, the model will be determinstic: in state s, with action a, for all label o, there is at most one transition to a state labelled with o. Default is False.
- sseedint, optional
the seed value.
Returns
- MDP
A pseudo-randomly generated MDP.
Scheduler
Scheduler are used to solve the non-deterministic choices while running a MDP. A scheduler will basically chooses the actions to execute at each time step.
There are several kind of scheduler. The easiest one in the uniform one, which, at each time step, chooses an action uniformly at random. The memoryless scheduler will choose the action according to the current state of the MDP (thus it assumes that the MDP states are observable). It basically maps each MDP state to one action. Finally a finite memory scheduler can be seen as a generative automaton that takes on input a sequence of observation and returns a sequence of actions.
- class jajapy.UniformScheduler(actions: list)
Class for an uniform scheduler. An uniform scheduler will always returns each action with an equal probability.
Creates a uniform scheduler for the given actions.
Parameters
- actionslist of str
A list of actions.
- class jajapy.MemorylessScheduler(actions: list)
Class for memoryless scheduler.
Creates a memoryless scheduler.
Parameters
- actionslist of str
list of
nactions, withnthe number of states in the modelactions[n]will always be used in staten.
- class jajapy.FiniteMemoryScheduler(action_matrix: dict, transition_matrix: dict)
Class for finite memoryless scheduler. A finite memory scheduler can be seen as a generative automaton that takes on input a sequence of observation and returns a sequence of actions.
Creates a finite memory scheduler.
Parameters
- action_matrixdict
Describes the probability for each action to be chosen given the current scheduler state as follows:
action_matrix = {scheduler_state_x: [[proba1,proba2,...],[action1,action2,...]], scheduler_state_y: [[proba1,proba2,...],[action1,action2,...]], ...}- transition_matrixdict
Given the a state
s``and an observation ``o, returns to which state the scheduler move to if it receivesowhile in state stransition_matrix = {obs1: [scheduler_state_dest_if_current_state_=_0,scheduler_state_dest_if_current_state_=_1,...], obs2: [scheduler_state_dest_if_current_state_=_0,scheduler_state_dest_if_current_state_=_1,...] ...}
- addObservation(obs: str) None
Updates the scheduler state according to the current state and the given observation
obs.Parameters
- obsstr
An observation.
- getAction() str
Returns the action chosen by the scheduler
Returns
- str
Action chosen by the scheduler.
- getActions(s: Optional[int] = None) list
Returns the list of all the possibles actions (and their probabilities) that the scheduler can choose if it is in state
s. Ifsis not provided it considers the current scheduler state.Parameters
- sint, optional
A scheduler state ID. By default None
Returns
- list
A list containing a list of probabilities and a list of actions: [[proba_action1, proba_action2,…],[action1, action2,…]]
- getProbability(action: str, state: Optional[int] = None) float
Given a scheduler state ID and an action, returns the probability to execute this action in this state.
Parameters
- actionstr
An action.
- stateint, optional
A state ID. If not provided it considers the current state. By default None
Returns
- float
A probability.
- getSequenceStates(seq_obs: list) list
Given a sequence of observations, returns the sequence of states the scheduler goes through.
Parameters
- seq_obslist of str
A sequence of observations.
Returns
- list
A sequence of scheulder states.
- reset() None
Reset the model to its initial state. Should be done before generating a new trace.