Hidden Markov Model (HMM)

An HMM is a simple probabilistic model in which the transition function and the generating (emission) function are independent. In other words, at each step the model first generates an observation and then moves to the next state, and each of the two is drawn from its own probability distribution.
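The emit-then-move behaviour can be sketched in plain Python, independently of jajapy. The dictionaries and the helper names `draw` and `simulate` below are hypothetical and only illustrate the idea; the probabilities are those of the two-state model used in the `createHMM` example further down this page.

```python
import random

# Hypothetical plain-dictionary encoding of a two-state HMM (not a jajapy object).
transitions = {0: {1: 1.0}, 1: {0: 0.6, 1: 0.4}}      # state -> {next state: probability}
emissions = {0: {"a": 0.8, "b": 0.2}, 1: {"b": 1.0}}  # state -> {observation: probability}

def draw(distribution, rng):
    """Draw one key of `distribution` with probability given by its value."""
    keys = list(distribution)
    weights = [distribution[k] for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]

def simulate(number_steps, initial_state=0, seed=None):
    """Return a list of `number_steps` observations."""
    rng = random.Random(seed)
    state = initial_state
    trace = []
    for _ in range(number_steps):
        trace.append(draw(emissions[state], rng))  # 1. generate an observation
        state = draw(transitions[state], rng)      # 2. move, independently of it
    return trace

trace = simulate(5, seed=0)
print(trace)
```

Because the observation and the next state are drawn separately, knowing which observation was emitted in a state gives no information about where the model goes next.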

Example

[Figure: the example HMM, _images/HMM.png]

Creation

>>> import jajapy as ja
>>> transitions = [(0,1,0.5),(0,2,0.5),(1,3,1.0),(2,4,1.0),
...                (3,0,0.8),(3,1,0.1),(3,2,0.1),(4,3,1.0)]
>>> emission = [(0,"x",0.4),(0,"y",0.6),(1,"a",0.8),(1,"b",0.2),
...             (2,"a",0.1),(2,"b",0.9),(3,"x",0.5),(3,"y",0.5),(4,"y",1.0)]
>>> model = ja.createHMM(transitions,emission,initial_state=0,name="My HMM")

We can also generate a random HMM:

>>> random_model = ja.HMM_random(nb_states=5,
...                              random_initial_state=False,
...                              alphabet=['x','y','a','b'])

Exploration

>>> model.a(0,1) #probability of going from s0 to s1
0.5
>>> model.a(1,3) #probability of going from s1 to s3
1.0
>>> model.b(0,'x') #probability of seeing 'x' while in s0
0.4
>>> model.tau(0,1,'x') #probability of going from s0 to s1 seeing 'x'
0.2
>>> model.getAlphabet() #all possible observations
['x','y','a','b']
>>> model.getAlphabet(0) #all possible observations in s0
['x','y']
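
The value returned by `tau` in the example is consistent with the independence of transitions and emissions: 0.5 × 0.4 = 0.2. A minimal sketch of that relation, using hypothetical dictionaries `A` and `B` that copy the example model rather than jajapy's internal representation:

```python
# The example model of this page as plain dictionaries (illustration only).
A = {0: {1: 0.5, 2: 0.5}, 1: {3: 1.0}, 2: {4: 1.0},
     3: {0: 0.8, 1: 0.1, 2: 0.1}, 4: {3: 1.0}}
B = {0: {"x": 0.4, "y": 0.6}, 1: {"a": 0.8, "b": 0.2},
     2: {"a": 0.1, "b": 0.9}, 3: {"x": 0.5, "y": 0.5}, 4: {"y": 1.0}}

def a(s1, s2):
    """Transition probability; 0.0 for unknown states or missing transitions."""
    return A.get(s1, {}).get(s2, 0.0)

def b(s, obs):
    """Emission probability; 0.0 for unknown states or observations."""
    return B.get(s, {}).get(obs, 0.0)

def tau(s1, s2, obs):
    """Probability of emitting obs in s1 and then moving to s2."""
    return a(s1, s2) * b(s1, obs)

print(tau(0, 1, "x"))  # 0.5 * 0.4
print(list(B[0]))      # observations possible in state 0: ['x', 'y']
```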

Running

>>> model.run(5) # Generate a run of length 5, i.e. returns a list of 5 observations
['y', 'a', 'y', 'a', 'y']
>>> s = model.generateSet(10,5) # returns a Set containing 10 traces of length 5
>>> s.sequences
[['x', 'a', 'x', 'y', 'a'], ['x', 'b', 'y', 'x', 'a'],
 ['y', 'b', 'y', 'a', 'x'], ['y', 'b', 'x', 'y', 'b'],
 ['x', 'b', 'x', 'y', 'a'], ['y', 'b', 'y', 'y', 'x'],
 ['y', 'b', 'y', 'y', 'y'], ['y', 'a', 'y', 'a', 'y'],
 ['y', 'a', 'x', 'a', 'y']]
>>> s.times
[1, 1, 1, 1, 1, 1, 2, 1, 1]
>>> # all the traces appear once in the set, except the 7th which appears twice
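
The `sequences` and `times` attributes shown above store each distinct trace once, together with its number of occurrences. The following sketch reproduces that bookkeeping with `collections.Counter`; the helper `compress` is hypothetical and is not part of jajapy.

```python
from collections import Counter

def compress(traces):
    """Return (sequences, times): distinct traces and how often each occurs."""
    counts = Counter(tuple(trace) for trace in traces)  # keeps first-seen order
    sequences = [list(trace) for trace in counts]
    times = [counts[trace] for trace in counts]
    return sequences, times

raw = [["x", "a"], ["y", "b"], ["x", "a"]]
sequences, times = compress(raw)
print(sequences)  # [['x', 'a'], ['y', 'b']]
print(times)      # [2, 1]
```

The sum of `times` is the number of generated traces, which is why the example above lists 9 distinct sequences for a set of size 10.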

Analysis

>>> model.logLikelihood(s) # loglikelihood of this set of traces under this model
-4.442498878506513
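
For an HMM, the likelihood of one trace is usually computed with the forward algorithm. The sketch below is a plain-Python illustration under two assumptions that the text above does not spell out: the model emits an observation before moving (as described in the introduction), and the value for a set is the average of the per-trace log-likelihoods weighted by `times`. The names `sequence_likelihood` and `average_log_likelihood` are hypothetical, and the result is not claimed to match jajapy's output digit for digit.

```python
import math

# The example model of this page as plain dictionaries (illustration only).
PI = {0: 1.0}  # the model always starts in state 0
A = {0: {1: 0.5, 2: 0.5}, 1: {3: 1.0}, 2: {4: 1.0},
     3: {0: 0.8, 1: 0.1, 2: 0.1}, 4: {3: 1.0}}
B = {0: {"x": 0.4, "y": 0.6}, 1: {"a": 0.8, "b": 0.2},
     2: {"a": 0.1, "b": 0.9}, 3: {"x": 0.5, "y": 0.5}, 4: {"y": 1.0}}

def sequence_likelihood(seq, pi=PI, a=A, b=B):
    """Forward algorithm: probability that the model generates `seq`."""
    alpha = dict(pi)  # alpha[s]: probability of the prefix read so far, now in s
    for obs in seq:
        new_alpha = {}
        for s, p in alpha.items():
            emit = b.get(s, {}).get(obs, 0.0)      # emit obs in s ...
            if emit == 0.0:
                continue
            for s2, move in a.get(s, {}).items():  # ... then move to s2
                new_alpha[s2] = new_alpha.get(s2, 0.0) + p * emit * move
        alpha = new_alpha
    return sum(alpha.values())

def average_log_likelihood(sequences, times):
    """Average of log P(trace), each distinct trace weighted by its count."""
    total = sum(n * math.log(sequence_likelihood(seq))
                for seq, n in zip(sequences, times))
    return total / sum(times)

print(sequence_likelihood(["x", "a"]))  # 0.4 * (0.5*0.8 + 0.5*0.1), about 0.18
print(average_log_likelihood([["x", "a"], ["y", "b"]], [2, 1]))
```

In this sketch a trace that the model cannot generate has likelihood 0, so `math.log` raises a `ValueError`; a real implementation has to decide how to treat that case.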

Saving/Loading

>>> model.save("my_hmm.txt")
>>> model2 = ja.loadHMM("my_hmm.txt")

Model

class jajapy.HMM(matrix, output, alphabet, initial_state, name='unknown_HMM')

Creates an HMM.

Parameters

matrix: ndarray

Represents the transition matrix. matrix[s1][s2] is the probability of moving from s1 to s2.

output: ndarray or list

Represents the output matrix. output[s1][obs] is the probability of seeing alphabet[obs] in state s1.

alphabet: list

The list of all possible observations, such that alphabet.index("obs") is the ID of obs.

initial_state: int or list of float

Determines the initial state: either the ID of the single initial state (int), or the probability of starting in each state (list of float).

name: str, optional

Name of the model. Default is “unknown_HMM”.
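
To make the indexing conventions concrete, here is a sketch that builds `matrix`, `output` and `alphabet` for the two-state model of the `createHMM` example, using nested lists (an ndarray would be indexed the same way). The helper `build_arrays` is hypothetical; it only illustrates the layout described above and makes no claim about how jajapy converts tuples internally.

```python
def build_arrays(transitions, emission, nb_states):
    """Turn (source, destination, p) and (state, label, p) tuples into arrays."""
    alphabet = []
    for _, label, _ in emission:
        if label not in alphabet:
            alphabet.append(label)  # alphabet.index(label) is the label's ID
    matrix = [[0.0] * nb_states for _ in range(nb_states)]
    output = [[0.0] * len(alphabet) for _ in range(nb_states)]
    for s1, s2, p in transitions:
        matrix[s1][s2] = p
    for s, label, p in emission:
        output[s][alphabet.index(label)] = p
    return matrix, output, alphabet

matrix, output, alphabet = build_arrays(
    [(0, 1, 1.0), (1, 0, 0.6), (1, 1, 0.4)],
    [(0, "a", 0.8), (0, "b", 0.2), (1, "b", 1.0)],
    nb_states=2,
)
print(matrix)    # [[0.0, 1.0], [0.6, 0.4]]
print(output)    # [[0.8, 0.2], [0.0, 1.0]]
print(alphabet)  # ['a', 'b']
```

Each row of `matrix` and each row of `output` is a probability distribution and therefore sums to 1.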

a(s1: int, s2: int) → float

Returns the probability of moving from state s1 to state s2. If s1 or s2 is not a valid state ID it returns 0.

Parameters

s1: int

ID of the source state.

s2: int

ID of the destination state.

Returns

float

Probability of moving from state s1 to state s2.

Examples

>>> model.a(0,1)
0.5
>>> model.a(0,0)
0.0

b(s: int, l: str) → float

Returns the probability of generating l in state s. If s is not a valid state ID it returns 0.

Parameters

s: int

ID of the source state.

l: str

Observation.

Returns

float

Probability of generating l in state s.

Examples

>>> model.b(0,'x')
0.4
>>> model.b(0,'foo')
0.0

generateSet(set_size: int, param, distribution=None, min_size=None, timed: bool = False) → Set

Generates a set (training set / test set) containing set_size traces.

Parameters

set_size: int

number of traces in the output set.

param: a list, an int or a float.

the parameter(s) for the distribution. See “distribution”.

distribution: str, optional

If distribution=='geo', the trace lengths follow a geometric law such that the expected length is min_size+(1/param). If distribution is None, param is either an int, in which case all the traces have the same length param, or a list of int. Default is None.

min_size: int, optional

see “distribution”. Default is None.

timed: bool, optional

Only for timed models. Whether to generate timed or non-timed traces. Default is False.

Returns

output: Set

a set (training set / test set).

Examples

>>> set1 = model.generateSet(100,10)
>>> # set1 contains 100 traces of length 10
>>> set2 = model.generateSet(100, 1/4, "geo", min_size=6)
>>> # set2 contains 100 traces. The length of the traces is distributed following
>>> # a geometric distribution with parameter 1/4. All the traces contain at
>>> # least 6 observations, hence the average length of a trace is 6+(1/4)**(-1) = 10.
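
The arithmetic in the comment can be checked empirically. The sketch below assumes that the length is min_size plus a geometric variable counting trials up to and including the first success (support 1, 2, ...), whose mean is 1/param. This is an assumption chosen to match the stated expected length, and `sample_length` is a hypothetical helper, not a jajapy function.

```python
import random

def sample_length(param, min_size, rng):
    """min_size plus a geometric number of trials (at least 1), mean min_size + 1/param."""
    extra = 1
    while rng.random() >= param:  # each trial succeeds with probability param
        extra += 1
    return min_size + extra

rng = random.Random(0)
lengths = [sample_length(1 / 4, 6, rng) for _ in range(20000)]
mean_length = sum(lengths) / len(lengths)
print(mean_length)  # close to 6 + 1/(1/4) = 10
```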

getAlphabet(state: int = -1) → list

If state is set, returns the list of all the observations we could see in state. Otherwise it returns the alphabet of the model.

Parameters

state: int, optional

a state ID

Returns

list of str

list of observations

logLikelihood(sequences: Set) → float

Computes the average log-likelihood of a set of traces.

Parameters

sequences: Set

A set.

Returns

output: float

loglikelihood of sequences under this model.

Examples

>>> model.logLikelihood(set1)
-4.442498878506513

next(state: int) → list

Returns a state-observation pair drawn according to the distributions described by self.matrix[state] and self.output[state].

Parameters

state: int

ID of the source state.

Returns

output: [int, str]

A state-observation pair.

next_obs(state: int) → str

Generates one observation according to the distribution described by self.output[state].

Returns

str

An observation.

next_state(state: int) → int

Returns one state ID drawn at random according to the distribution described by self.matrix[state].

Parameters

state: int

ID of the source state.

Returns

int

A state ID.

pi(s: int) → float

Returns the probability of starting in state s.

Parameters

s: int

state ID.

Returns

output: float

the probability of starting in state s.

rename(name: str) → None

Changes the name of the model.

Parameters

name: str

new name.

run(number_steps: int, current: int = -1) → list

Simulates a run of length number_steps of the model and returns the sequence of observations generated.

Parameters

number_steps: int

Length of the simulation.

current: int, optional

If current is set, the run starts from the state current. Otherwise it starts from an initial state.

Returns

output: list of str

trace generated by the run.

save(file_path: str)

Saves the model to a text file.

Parameters

file_path: str

path of the output file.

Examples

>>> model.save("my_model.txt")

tau(s1: int, s2: int, obs: str) → float

Returns the probability of generating observation obs in state s1 and then moving to state s2. If s1 or s2 is not a valid state ID it returns 0.

Parameters

s1: int

ID of the source state.

s2: int

ID of the destination state.

obs: str

An observation.

Returns

float

The probability of generating observation obs in state s1 and then moving to state s2.

Examples

>>> model.tau(0,1,'x')
0.2

Other Functions

jajapy.createHMM(transitions: list, emission: list, initial_state, name: str = 'unknown_HMM') → HMM

A user-friendly way to create an HMM.

Parameters

transitions: list of tuples (int, int, float)

Each tuple represents a transition as follows: (source state ID, destination state ID, probability).

emission: list of tuples (int, str, float)

Each tuple represents an emission probability as follows: (source state ID, emitted label, probability).

initial_state: int or list of float

Determines the initial state: either the ID of the single initial state (int), or the probability of starting in each state (list of float).

name: str, optional

Name of the model. Default is “unknown_HMM”.

Returns

HMM

The HMM described by transitions, emission, and initial_state.

Examples

>>> model = ja.createHMM([(0,1,1.0),(1,0,0.6),(1,1,0.4)],[(0,'a',0.8),(0,'b',0.2),(1,'b',1.0)],0,"My_HMM")
>>> print(model)
Name: My_HMM
Initial state: s0
----STATE s0----
s0 -> s1 : 1.0
************
s0 => a : 0.8
s0 => b : 0.2   
----STATE s1----
s1 -> s0 : 0.6
s1 -> s1 : 0.4
************
s1 => b : 1.0

jajapy.loadHMM(file_path: str) → HMM

Loads an HMM saved in a text file.

Parameters

file_path: str

Location of the text file.

Returns

output: HMM

The HMM saved in file_path.

jajapy.HMM_random(nb_states: int, alphabet: list, random_initial_state: bool = False, sseed: Optional[int] = None) → HMM

Generates a random HMM.

Parameters

nb_states: int

Number of states.

alphabet: list of str

List of observations.

random_initial_state: bool, optional

If set to True we will start in each state with a random probability, otherwise we will always start in state 0. Default is False.

sseed: int, optional

The seed value.

Returns

HMM

A pseudo-randomly generated HMM.

Examples

>>> m = ja.HMM_random(4,['a','b','x','y'])