Hidden Markov Model (HMM)

An HMM is a simple probabilistic model in which the transition function and the generating (emission) function are independent. In other words, the model first generates an observation and then moves to the next state, according to two independent probability distributions.

Example

Creation

>>> import jajapy as ja
>>> transitions = [(0,1,0.5),(0,2,0.5),(1,3,1.0),(2,4,1.0),
...                (3,0,0.8),(3,1,0.1),(3,2,0.1),(4,3,1.0)]
>>> emission = [(0,"x",0.4),(0,"y",0.6),(1,"a",0.8),(1,"b",0.2),
...             (2,"a",0.1),(2,"b",0.9),(3,"x",0.5),(3,"y",0.5),(4,"y",1.0)]
>>> model = ja.createHMM(transitions,emission,initial_state=0,name="My HMM")

We can also generate a random HMM:

>>> random_model = ja.HMM_random(nb_states=5,
...                              random_initial_state=False,
...                              alphabet=['x','y','a','b'])

Exploration

>>> model.a(0,1) #probability of going from s0 to s1
0.5
>>> model.a(1,3) #probability of going from s1 to s3
1.0
>>> model.b(0,'x') #probability of seeing 'x' while in s0
0.4
>>> model.tau(0,1,'x') #probability of going from s0 to s1 seeing 'x'
0.2
>>> model.getAlphabet() #all possible observations
['x','y','a','b']
>>> model.getAlphabet(0) #all possible observations in s0
['x','y']

Running

>>> model.run(5) # Generate a run of length 5, i.e. returns a list of 5 observations
['y', 'a', 'y', 'a', 'y']
>>> s = model.generateSet(10,5) # returns a Set containing 10 traces of length 5
>>> s.sequences
[['x', 'a', 'x', 'y', 'a'], ['x', 'b', 'y', 'x', 'a'],
 ['y', 'b', 'y', 'a', 'x'], ['y', 'b', 'x', 'y', 'b'],
 ['x', 'b', 'x', 'y', 'a'], ['y', 'b', 'y', 'y', 'x'],
 ['y', 'b', 'y', 'y', 'y'], ['y', 'a', 'y', 'a', 'y'],
 ['y', 'a', 'x', 'a', 'y']]
>>> s.times
[1, 1, 1, 1, 1, 1, 2, 1, 1]
>>> # all the traces appear once in the set, except the 7th which appears twice

Analysis

>>> model.logLikelihood(s) # loglikelihood of this set of traces under this model
-4.442498878506513

Saving/Loading

>>> model.save("my_hmm.txt")
>>> model2 = ja.loadHMM("my_hmm.txt")

Model

class jajapy.HMM(matrix, output, alphabet, initial_state, name='unknown_HMM')

Creates an HMM.

Parameters

matrix: ndarray: Represents the transition matrix. matrix[s1][s2] is the probability of moving from s1 to s2.
output: ndarray or list: Represents the output matrix. output[s1][obs] is the probability of seeing alphabet[obs] in state s1.
alphabet: list: The list of all possible observations, such that alphabet.index("obs") is the ID of obs.
initial_state: int or list of float: Determines which state is the initial one (then it is the ID of the state), or the probability of starting in each state (then it is a list of probabilities).
name: str, optional: Name of the model. Default is “unknown_HMM”.

a(s1: int, s2: int) → float

Returns the probability of moving from state s1 to state s2. If s1 or s2 is not a valid state ID it returns 0.

Parameters

s1: int: ID of the source state.
s2: int: ID of the destination state.

Returns

float: Probability of moving from state s1 to state s2.

Examples

>>> model.a(0,1)
0.5
>>> model.a(0,0)
0.0

b(s: int, l: str) → float

Returns the probability of generating l in state s. If s is not a valid state ID it returns 0.

Parameters

s: int: ID of the source state.
l: str: observation.

Returns

float: probability of generating l in state s.

Examples

>>> model.b(0,'x')
0.4
>>> model.b(0,'foo')
0.0

generateSet(set_size: int, param, distribution=None, min_size=None, timed: bool = False) → Set

Generates a set (training set / test set) containing set_size traces.

Parameters

set_size: int: number of traces in the output set.
param: list, int or float: the parameter(s) for the distribution. See “distribution”.
distribution: str, optional: If distribution=='geo', the sequence lengths follow a geometric law such that the expected length is min_size+(1/param). If distribution==None, param can be an int, in which case all the sequences have the same length (param), or a list of int. Default is None.
min_size: int, optional: see “distribution”. Default is None.
timed: bool, optional: Only for timed models. Whether to generate timed or non-timed traces. Default is False.

Returns

output: Set: a set (training set / test set).

Examples

>>> set1 = model.generateSet(100,10)
>>> # set1 contains 100 traces of length 10
>>> set2 = model.generateSet(100, 1/4, "geo", min_size=6)
>>> # set2 contains 100 traces. The length of the traces is distributed following
>>> # a geometric distribution with parameter 1/4. All the traces contain at
>>> # least 6 observations, hence the average length of a trace is 6+(1/4)**(-1) = 10.
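
The length rule used with distribution=='geo' can be illustrated with a small standalone sketch. It does not use jajapy and is not the library's code: the helper sample_length is hypothetical, and it assumes the geometric variable counts trials starting from 1, so that its mean is 1/param.

```python
import math
import random

def sample_length(param, min_size, rng):
    """Hypothetical helper: draw min_size + G, where G is geometric on
    {1, 2, ...} with success probability param (requires 0 < param < 1)."""
    u = rng.random()  # uniform in [0, 1)
    # Inverse-transform sampling: P(G > k) = (1 - param) ** k
    g = math.floor(math.log(1.0 - u) / math.log(1.0 - param)) + 1
    return min_size + g

rng = random.Random(0)
lengths = [sample_length(1 / 4, 6, rng) for _ in range(20000)]
mean_length = sum(lengths) / len(lengths)
print(mean_length)  # close to 6 + 1/(1/4) = 10
```

Under this assumption every sampled length is strictly greater than min_size, and the empirical mean approaches min_size+(1/param).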

getAlphabet(state: int = -1) → list

If state is set, returns the list of all the observations we could see in state. Otherwise it returns the alphabet of the model.

Parameters

state: int, optional: a state ID

Returns

list of str: list of observations

logLikelihood(sequences: Set) → float

Computes the average loglikelihood of a set of traces.

Parameters

sequences: Set: A set.

Returns

output: float: loglikelihood of sequences under this model.

Examples

>>> model.logLikelihood(set1)
-4.442498878506513
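
To make the quantity concrete, here is a minimal standalone sketch of an average loglikelihood computed with the forward algorithm. It is not jajapy's implementation: the function names are hypothetical, and it assumes that the average is weighted by the number of times each trace appears in the set (the times list shown in the Running example). The model is the two-state HMM from the createHMM example.

```python
import math

def trace_probability(trace, pi, a, b):
    """Probability of a trace: each state first emits, then moves."""
    n = len(pi)
    alpha = list(pi)  # alpha[s]: probability of the prefix seen so far, ending in s
    for obs in trace:
        emitted = [alpha[s] * b[s].get(obs, 0.0) for s in range(n)]
        alpha = [sum(emitted[s] * a[s][s2] for s in range(n)) for s2 in range(n)]
    return sum(alpha)

def average_log_likelihood(sequences, times, pi, a, b):
    """Weighted mean of log-probabilities; math.log raises ValueError
    if a trace has probability 0 under the model."""
    total = sum(t * math.log(trace_probability(seq, pi, a, b))
                for seq, t in zip(sequences, times))
    return total / sum(times)

pi = [1.0, 0.0]                          # always start in s0
a = [[0.0, 1.0], [0.6, 0.4]]             # transition matrix
b = [{'a': 0.8, 'b': 0.2}, {'b': 1.0}]   # emission probabilities per state

p_ab = trace_probability(['a', 'b'], pi, a, b)
avg = average_log_likelihood([['a', 'b'], ['b', 'b']], [3, 1], pi, a, b)
```

Here the trace ['a','b'] has probability 0.8 (s0 emits 'a', then s1 emits 'b'), and ['b','b'] has probability 0.2.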

next(state: int) → list

Returns a state-observation pair according to the distributions described by self.matrix[state] and self.output[state].

Parameters

state: int: ID of the source state.

Returns

output: [int, str]: A state-observation pair.

next_obs(state: int) → str

Generates one observation according to the distribution described by self.output_matrix.

Returns

str: An observation.

next_state(state: int) → int

Returns one state ID at random according to the distribution described by self.matrix[state].

Parameters

state: int: ID of the source state.

Returns

int: A state ID.

pi(s: int) → float

Return the probability of starting in state s.

Parameters

s: int: state ID.

Returns

output: float: the probability of starting in state s.

rename(name: str) → None

Change the name of the model.

Parameters

name: str: new name.

run(number_steps: int, current: int = -1) → list

Simulates a run of length number_steps of the model and returns the sequence of observations generated.

Parameters

number_steps: int: length of the simulation.
current: int, optional: If current is set, the run starts from the state current. Otherwise it starts from an initial state.

Returns

output: list of str: trace generated by the run.
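
As an illustration of what such a run does, the following standalone sketch simulates the model from the Example section with the standard library only. It is not jajapy's code: simulate_run and the dictionary encoding of the model are assumptions made for this example.

```python
import random

# The HMM of the Example section, written as plain dictionaries.
a = {0: {1: 0.5, 2: 0.5}, 1: {3: 1.0}, 2: {4: 1.0},
     3: {0: 0.8, 1: 0.1, 2: 0.1}, 4: {3: 1.0}}
b = {0: {'x': 0.4, 'y': 0.6}, 1: {'a': 0.8, 'b': 0.2},
     2: {'a': 0.1, 'b': 0.9}, 3: {'x': 0.5, 'y': 0.5}, 4: {'y': 1.0}}

def simulate_run(number_steps, a, b, current, rng):
    """Hypothetical helper: at each step, emit an observation from the
    current state, then move, using two independent random draws."""
    trace = []
    for _ in range(number_steps):
        labels = list(b[current])
        obs = rng.choices(labels, weights=[b[current][l] for l in labels])[0]
        trace.append(obs)
        states = list(a[current])
        current = rng.choices(states, weights=[a[current][s] for s in states])[0]
    return trace

trace = simulate_run(5, a, b, 0, random.Random(1))
print(trace)  # a list of 5 observations
```

Starting from s0, the first observation is always 'x' or 'y' and the second is always 'a' or 'b', because s0 can only move to s1 or s2.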

save(file_path: str)

Save the model into a text file.

Parameters

file_path: str: path of the output file.

Examples

>>> model.save("my_model.txt")

tau(s1: int, s2: int, obs: str) → float

Returns the probability of generating observation obs in state s1 and moving to state s2. If s1 or s2 is not a valid state ID it returns 0.

Parameters

s1: int: ID of the source state.
s2: int: ID of the destination state.
obs: str: An observation.

Returns

float: The probability of generating observation obs in state s1 and moving to state s2.

Examples

>>> model.tau(0,1,'x')
0.2
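
Because transitions and emissions are independent in an HMM, this value factorizes as a(s1,s2) * b(s1,obs). The standalone sketch below checks this on state s0 of the model from the Example section. The tau function here is a hypothetical re-implementation over plain dictionaries, not the library's code.

```python
# Outgoing transitions and emissions of s0 in the Example section's model.
a = {0: {1: 0.5, 2: 0.5}}
b = {0: {'x': 0.4, 'y': 0.6}}

def tau(s1, s2, obs, a, b):
    """Joint probability of emitting obs in s1 and moving to s2;
    0.0 for unknown states or observations."""
    if s1 not in a or s1 not in b:
        return 0.0
    return a[s1].get(s2, 0.0) * b[s1].get(obs, 0.0)

value = tau(0, 1, 'x', a, b)  # 0.5 * 0.4
# Summing over every destination and observation gives 1 (up to rounding).
total = sum(tau(0, s2, obs, a, b) for s2 in a[0] for obs in b[0])
```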

Other Functions

jajapy.createHMM(transitions: list, emission: list, initial_state, name: str = 'unknown_HMM') → HMM

A user-friendly way to create an HMM.

Parameters

transitions: list of tuples (int, int, float): Each tuple represents a transition as follows: (source state ID, destination state ID, probability).
emission: list of tuples (int, str, float): Each tuple represents an emission probability as follows: (source state ID, emitted label, probability).
initial_state: int or list of float: Determines which state is the initial one (then it is the ID of the state), or the probability of starting in each state (then it is a list of probabilities).
name: str, optional: Name of the model. Default is “unknown_HMM”.

Returns

HMM: the HMM described by transitions, emission, and initial_state.

Examples

>>> model = ja.createHMM([(0,1,1.0),(1,0,0.6),(1,1,0.4)],[(0,'a',0.8),(0,'b',0.2),(1,'b',1.0)],0,"My_HMM")
>>> print(model)
Name: My_HMM
Initial state: s0
----STATE s0----
s0 -> s1 : 1.0
************
s0 => a : 0.8
s0 => b : 0.2   
----STATE s1----
s1 -> s0 : 0.6
s1 -> s1 : 0.4
************
s1 => b : 1.0
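
The tuple lists relate to the matrix, output, and alphabet arguments of the HMM class in a simple way. The standalone sketch below shows one possible conversion. It is an assumption for illustration, not the code jajapy runs: tuples_to_matrices is a hypothetical helper, and the alphabet order (first appearance in emission) is a choice made here.

```python
def tuples_to_matrices(transitions, emission):
    """Hypothetical helper: build dense matrix/output tables and the
    alphabet from (source, destination, p) and (source, label, p) tuples."""
    states = [s for s1, s2, _ in transitions for s in (s1, s2)]
    states += [s for s, _, _ in emission]
    nb_states = max(states) + 1

    alphabet = []
    for _, label, _ in emission:
        if label not in alphabet:
            alphabet.append(label)

    matrix = [[0.0] * nb_states for _ in range(nb_states)]
    for s1, s2, p in transitions:
        matrix[s1][s2] = p  # probability of moving from s1 to s2

    output = [[0.0] * len(alphabet) for _ in range(nb_states)]
    for s, label, p in emission:
        output[s][alphabet.index(label)] = p  # probability of label in s
    return matrix, output, alphabet

matrix, output, alphabet = tuples_to_matrices(
    [(0, 1, 1.0), (1, 0, 0.6), (1, 1, 0.4)],
    [(0, 'a', 0.8), (0, 'b', 0.2), (1, 'b', 1.0)])
```

Each row of matrix and of output sums to 1, which matches the printed model above.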

jajapy.loadHMM(file_path: str) → HMM

Load an HMM saved into a text file.

Parameters

file_path: str: Location of the text file.

Returns

output: HMM: The HMM saved in file_path.

jajapy.HMM_random(nb_states: int, alphabet: list, random_initial_state: bool = False, sseed: Optional[int] = None) → HMM

Generates a random HMM.

Parameters

nb_states: int: Number of states.
alphabet: list of str: List of observations.
random_initial_state: bool, optional: If set to True we will start in each state with a random probability, otherwise we will always start in state 0. Default is False.
sseed: int, optional: the seed value.

Returns

HMM: A pseudo-randomly generated HMM.

Examples

>>> m = ja.HMM_random(4,['a','b','x','y'])