Gaussian observations Hidden Markov Model (GoHMM)

A GoHMM is similar to an HMM, but the observations are vectors of real numbers generated according to several Gaussian distributions. Instead of a discrete probability distribution over all possible observations, each state has several pairs of parameters mu and sigma, one pair for each Gaussian distribution. The number of Gaussian distributions is the same in every state.

Example

The following example is a GoHMM with only one Gaussian distribution in each state. The dotted arrows represent the initial state probability distribution: we will start in the yellow state with probability 0.1.

_images/GOHMM.png

Creation

We can create a GoHMM with two Gaussian distributions as follows:

>>> import jajapy as ja
>>> from numpy import array
>>> nb_states = 5
>>> s0 = ja.GoHMM_state(list(zip([0,1],[0.9,0.1])),[3.0,5.0],nb_states)
>>> s1 = ja.GoHMM_state(list(zip([0,1,2,4],[0.05,0.9,0.04,0.01])),[0.5,1.5],nb_states)
>>> s2 = ja.GoHMM_state(list(zip([1,2,3,4],[0.05,0.8,0.14,0.01])),[0.2,0.7],nb_states)
>>> s3 = ja.GoHMM_state(list(zip([2,3],[0.05,0.95])),[0.0,0.3],nb_states)
>>> s4 = ja.GoHMM_state(list(zip([1,4],[0.1,0.9])),[2.0,4.0],nb_states)
>>> matrix = array([s0[0],s1[0],s2[0],s3[0],s4[0]])
>>> output = array([s0[1],s1[1],s2[1],s3[1],s4[1]])
>>> model = ja.GoHMM(matrix,output,[0.1,0.7,0.0,0.0,0.2],"My GoHMM")
>>> #print(model)

We can also generate a random GoHMM:

>>> random_model = ja.GoHMM_random(nb_states=5,
...                                nb_distributions=2,
...                                random_initial_state=True,
...                                min_mu = 0.0,
...                                max_mu = 5.0,
...                                min_sigma = 0.5,
...                                max_sigma = 5.0)

Exploration

>>> model.a(0,0) # probability of moving from state 0 to state 0
0.9
>>> model.mu(1) # mu parameters in state 1
[0.5,2.5]
>>> model.mu_n(1,0) # first mu parameter in state 1
0.5
>>> model.mu_n(1,1) # second mu parameter in state 1
2.5
>>> # likelihood that the first distribution in state 0 generates 4.0
>>> model.b_n(s=0,n=0,l=4.0)
0.07820853879509118
>>> # likelihood that the first distribution in state 0 generates 4.0
>>> # and that the second one generates 2.0
>>> model.b(0,[4.0,2.0])
0.019676393869096122
>>> # likelihood that the first distribution in state 0 generates 4.0,
>>> # that the second one generates 2.0, and that we move from state 0
>>> # to state 1
>>> model.tau(0,1,[4.0,2.0])
0.001967639386909612
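
The three likelihood values above can be reproduced by hand. The following is a minimal sketch in plain Python, not jajapy's implementation. It assumes that the first distribution of state 0 has (mu, sigma) = (3.0, 5.0), as in the creation example, and that the second has (2.5, 1.5). That second pair is not stated in the text; it is an assumption chosen because it is consistent with the printed numbers. The helper names gaussian_pdf, b and tau are local to this sketch.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

def b(params: list, obs: list) -> float:
    """Likelihood of one observation vector: product of the per-distribution densities."""
    likelihood = 1.0
    for (mu, sigma), x in zip(params, obs):
        likelihood *= gaussian_pdf(x, mu, sigma)
    return likelihood

def tau(transition_prob: float, params: list, obs: list) -> float:
    """Likelihood of generating obs in a state and then taking a given transition."""
    return transition_prob * b(params, obs)

# (mu, sigma) pairs assumed for state 0; the second pair is hypothetical.
state0 = [(3.0, 5.0), (2.5, 1.5)]

first = gaussian_pdf(4.0, *state0[0])   # counterpart of model.b_n(s=0, n=0, l=4.0)
joint = b(state0, [4.0, 2.0])           # counterpart of model.b(0, [4.0, 2.0])
moving = tau(0.1, state0, [4.0, 2.0])   # counterpart of model.tau(0, 1, [4.0, 2.0]), with a(0,1) = 0.1

print(first, joint, moving)
```

Under these assumptions, tau is simply the transition probability multiplied by b, which is why the last value above is one tenth of the one before it.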

Running

>>> model.run(5) # returns a list of 5 observations
[-0.6265, 0.3031, -0.5885, 5.4135, 3.45367]
>>> s = model.generateSet(10,5) # returns a Set containing 10 traces of size 5
>>> s.sequences
[[-0.0069, 1.7705, -1.7140, 1.8141, -0.81803],
 [1.9338, 12.4325, -1.0763, 1.8086, 1.4970],
 [1.7609, 2.8232, 0.0197, -0.3019, 2.3554],
 [0.5744, -0.9050, 0.3306, -0.3162, 2.7524],
 [-1.4897, 1.7604, -0.2746, 0.3566, -0.3647],
 [-3.6607, 0.9767, 0.3046, -0.3125, -0.0091],
 [-1.6521, -0.2060, 0.0392, -0.1600, -0.5134],
 [1.2431, 1.0243, 0.4519, -0.6647, -0.6829],
 [-0.1490, 2.2450, 9.5885, 10.1277, 2.0458],
 [1.8807, 0.8840, -1.2561, 2.0877, -0.5899]]

>>> s.times
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Analysis

>>> model.logLikelihood(s) # loglikelihood of this set of traces under this model
-10.183743400307154

Saving/Loading

>>> model.save("my_gohmm.txt")
>>> another_model = ja.loadGoHMM("my_gohmm.txt")

Model

class jajapy.GoHMM(matrix, output, initial_state, name='unknown_GoHMM')

Creates a GoHMM.

Parameters

matrix: ndarray

Represents the transition matrix. matrix[s1][s2] is the probability of moving from s1 to s2.

output: ndarray

Contains the parameters of the Gaussian distributions. output[s1][0][0] is the mu parameter of the first distribution in s1, and output[s1][0][1] is its sigma parameter. output[s1][1][0] is the mu parameter of the second distribution in s1, and so on.

initial_state: int or list of float

Determines which state is the initial one (given as a state ID), or the probability of starting in each state (given as a list of probabilities).

name: str, optional

Name of the model. Default is “unknown_GoHMM”.

a(s1: int, s2: int) → float

Returns the probability of moving from state s1 to state s2. If s1 or s2 is not a valid state ID, it returns 0.

Parameters

s1: int

ID of the source state.

s2: int

ID of the destination state.

Returns

float

Probability of moving from state s1 to state s2.

Examples

>>> model.a(0,1)
0.5
>>> model.a(0,0)
0.0

b(s: int, l: list) → float

Returns the likelihood of generating l in state s.

Parameters

s: int

ID of the source state.

l: list of float

List of observations, one per distribution.

Returns

output: float

Likelihood of generating l in state s.

b_n(s: int, n: int, l: float) → float

Returns the likelihood of generating observation l from the nth distribution of s.

Parameters

s: int

State ID.

n: int

Index of the distribution.

l: float

The observation.

Returns

output: float

Likelihood of generating observation l from the nth distribution of this state.

generateSet(set_size: int, param, distribution=None, min_size=None, timed: bool = False) → Set

Generates a set (training set / test set) containing set_size traces.

Parameters

set_size: int

Number of traces in the output set.

param: list, int or float

The parameter(s) for the distribution. See “distribution”.

distribution: str, optional

If distribution=='geo', the sequence lengths follow a geometric law such that the expected length is min_size+(1/param). If distribution==None, param can be an int, in which case all the sequences have the same length (param), or a list of int. Default is None.

min_size: int, optional

See “distribution”. Default is None.

timed: bool, optional

Only for timed models. Generates timed or non-timed traces. Default is False.

Returns

output: Set

a set (training set / test set).

Examples

>>> set1 = model.generateSet(100,10)
>>> # set1 contains 100 traces of length 10
>>> set2 = model.generateSet(100, 1/4, "geo", min_size=6)
>>> # set2 contains 100 traces. The length of the traces is distributed following
>>> # a geometric distribution with parameter 1/4. All the traces contain at
>>> # least 6 observations, hence the average length of a trace is 6+(1/4)**(-1) = 10.
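
The expected length min_size+(1/param) can be checked empirically. The sketch below is plain Python and does not call jajapy. It assumes that the geometric part counts the number of trials up to and including the first success, which is the convention under which its mean is 1/param. The function name sample_trace_length is hypothetical.

```python
import random

def sample_trace_length(param: float, min_size: int, rng: random.Random) -> int:
    """Draw min_size + G, where G counts Bernoulli(param) trials up to and
    including the first success, so that E[G] = 1/param."""
    trials = 1
    while rng.random() >= param:  # failure, with probability 1 - param
        trials += 1
    return min_size + trials

rng = random.Random(0)  # fixed seed so the experiment is repeatable
lengths = [sample_trace_length(1/4, 6, rng) for _ in range(20000)]
mean_length = sum(lengths) / len(lengths)

# The empirical mean should be close to 6 + 1/(1/4) = 10.
print(mean_length)
```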

logLikelihood(sequences: Set) → float

Computes the average log-likelihood of a set of traces.

Parameters

sequences: Set

A set of traces.

Returns

output: float

Average log-likelihood of sequences under this model.

Examples

>>> model.logLikelihood(set1)
-4.442498878506513
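
One way to compute such a value is the scaled forward algorithm. The sketch below is an illustration in plain Python, not jajapy's code. It assumes that each observation is emitted by the state being left (as tau suggests), that every observation is a list with one value per distribution, and that the average is weighted by the number of occurrences of each trace (the times attribute of a Set). All function names are hypothetical.

```python
from math import exp, log, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

def emission(params, obs):
    """Product of the densities, one per distribution of the state."""
    likelihood = 1.0
    for (mu, sigma), x in zip(params, obs):
        likelihood *= gaussian_pdf(x, mu, sigma)
    return likelihood

def sequence_log_likelihood(matrix, output, initial, sequence):
    """Scaled forward algorithm: alpha is renormalised at every step and the
    logarithms of the scaling factors are accumulated."""
    n = len(matrix)
    alpha = list(initial)
    total = 0.0
    for obs in sequence:
        weighted = [alpha[s] * emission(output[s], obs) for s in range(n)]
        scale = sum(weighted)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under the model
        total += log(scale)
        alpha = [sum(weighted[s] * matrix[s][t] for s in range(n)) / scale
                 for t in range(n)]
    return total

def average_log_likelihood(matrix, output, initial, sequences, times):
    """Average over the set, each trace weighted by its number of occurrences."""
    weighted_sum = sum(k * sequence_log_likelihood(matrix, output, initial, seq)
                       for seq, k in zip(sequences, times))
    return weighted_sum / sum(times)

# Hypothetical two-state model with one Gaussian distribution per state.
matrix = [[0.9, 0.1], [0.2, 0.8]]
output = [[(0.0, 1.0)], [(5.0, 1.0)]]
initial = [0.5, 0.5]
sequences = [[[0.1], [0.3], [4.8]], [[5.2], [4.9], [0.0]]]
print(average_log_likelihood(matrix, output, initial, sequences, [1, 1]))
```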

mu(s: int) → ndarray

Returns the mu parameters for this state.

Parameters

s: int

ID of the source state.

Returns

ndarray

the mu parameters.

mu_n(s: int, n: int) → float

Returns the mu parameter of the nth distribution in s.

Parameters

s: int

ID of the source state.

n: int

Index of the distribution.

Returns

float

The mu parameter of the nth distribution in s.

next(state: int) → list

Returns a state-observation pair according to the distributions described by self.matrix[state] and self.output[state].

Parameters

state: int

ID of the source state.

Returns

output: [int, list of floats]

A state-observation pair.

next_obs(s: int) → list

Generates n observations according to the n normal distributions in s.

Parameters

s: int

ID of the source state.

Returns

output: list of float

The n observations, one per distribution.

next_state(state: int) → int

Returns one state ID at random according to the distribution described by self.matrix[state].

Parameters

state: int

ID of the source state.

Returns

int

A state ID.

pi(s: int) → float

Returns the probability of starting in state s.

Parameters

s: int

State ID.

Returns

output: float

the probability of starting in state s.

rename(name: str) → None

Changes the name of the model.

Parameters

name: str

new name.

run(number_steps: int, current: int = -1) → list

Simulates a run of length number_steps of the model and returns the sequence of observations generated.

Parameters

number_steps: int

Length of the simulation.

current: int, optional

If current is set, the run starts from the state current. Otherwise it starts from an initial state.

Returns

output: list

trace generated by the run.
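
The interplay between run, next_state and next_obs can be sketched as follows. This is a plain-Python illustration under stated assumptions, not jajapy's implementation: the observation is drawn in the current state before moving, current=-1 means that the first state is drawn from the initial distribution, and the model is passed explicitly as matrix, output and initial instead of being stored on an object.

```python
import random

def next_state(matrix, state, rng):
    """Draw the next state ID according to row `state` of the transition matrix."""
    return rng.choices(range(len(matrix)), weights=matrix[state])[0]

def next_obs(output, state, rng):
    """Draw one value from each Gaussian distribution of `state`."""
    return [rng.gauss(mu, sigma) for mu, sigma in output[state]]

def run(matrix, output, initial, number_steps, rng, current=-1):
    """Simulate number_steps steps and return the list of observations."""
    if current == -1:
        # No starting state given: draw it from the initial distribution.
        current = rng.choices(range(len(initial)), weights=initial)[0]
    trace = []
    for _ in range(number_steps):
        trace.append(next_obs(output, current, rng))
        current = next_state(matrix, current, rng)
    return trace

# Hypothetical two-state model with two Gaussian distributions per state.
matrix = [[0.9, 0.1], [0.2, 0.8]]
output = [[(3.0, 5.0), (2.5, 1.5)], [(0.5, 1.0), (1.5, 0.5)]]
initial = [0.5, 0.5]

trace = run(matrix, output, initial, 5, random.Random(42))
print(trace)  # 5 observations, each a list of 2 floats
```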

save(file_path: str)

Saves the model to a text file.

Parameters

file_path: str

path of the output file.

Examples

>>> model.save("my_model.txt")

tau(s1: int, s2: int, obs: list) → float

Returns the likelihood of generating, from s1, observation obs while moving to state s2.

Parameters

s1: int

A state ID.

s2: int

A state ID.

obs: list of floats

An observation.

Returns

output: float

The likelihood of generating, from s1, observation obs while moving to state s2.

Other Functions

jajapy.createGoHMM(transitions: list, output: list, initial_state: str, name: str = 'unknown_GoHMM') → GoHMM

A user-friendly way to create a GoHMM.

Parameters

transitions: list of tuples (int, int, float)

Each tuple represents a transition as follows: (source state ID, destination state ID, probability).

output: list of list of tuples (float, float)

Represents the parameters of the Gaussian distributions [(mu1, sigma1),(mu2, sigma2),…]. output[0] contains the parameters of the distributions in state 0, and output[0][0] contains the two parameters of the first distribution in state 0: output[0][0][0] is its mu parameter and output[0][0][1] is its sigma parameter.

initial_state: int or list of float

Determines which state is the initial one (given as a state ID), or the probability of starting in each state (given as a list of probabilities).

name: str, optional

Name of the model. Default is “unknown_GoHMM”.

Returns

GoHMM

The GoHMM described by transitions, output, and initial_state.

Examples
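
The sketch below builds the three arguments in the documented shapes for a hypothetical two-state model and checks that they are consistent. The helper transitions_to_matrix is not part of jajapy; it only illustrates how a list of (source, destination, probability) tuples relates to the transition matrix expected by the GoHMM constructor. The final call to createGoHMM is left as a comment so that the block runs without jajapy installed.

```python
# Hypothetical two-state model with two Gaussian distributions per state.
transitions = [(0, 0, 0.9), (0, 1, 0.1),
               (1, 0, 0.2), (1, 1, 0.8)]
output = [[(3.0, 5.0), (2.5, 1.5)],   # state 0: (mu, sigma) of each distribution
          [(0.5, 1.0), (1.5, 0.5)]]   # state 1
initial_state = [0.5, 0.5]            # probability of starting in each state

def transitions_to_matrix(transitions, nb_states):
    """Illustrative helper: dense matrix with matrix[s1][s2] = P(s1 -> s2)."""
    matrix = [[0.0] * nb_states for _ in range(nb_states)]
    for source, destination, probability in transitions:
        matrix[source][destination] += probability
    return matrix

matrix = transitions_to_matrix(transitions, len(output))

# Sanity checks: outgoing probabilities sum to 1, sigmas are positive,
# and every state has the same number of distributions.
rows_ok = all(abs(sum(row) - 1.0) < 1e-9 for row in matrix)
sigmas_ok = all(sigma > 0 for state in output for _, sigma in state)
same_count = len({len(state) for state in output}) == 1
print(rows_ok, sigmas_ok, same_count)

# With jajapy installed, the arguments would then be passed as:
# model = ja.createGoHMM(transitions, output, initial_state, "My GoHMM")
```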

jajapy.loadGoHMM(file_path: str) → GoHMM

Loads a GoHMM saved in a text file.

Parameters

file_path: str

Location of the text file.

Returns

output: GoHMM

The GoHMM saved in file_path.

jajapy.GoHMM_random(nb_states: int, nb_distributions: int, random_initial_state: bool = False, min_mu: float = 0.0, max_mu: float = 2.0, min_sigma: float = 0.5, max_sigma: float = 2.0, sseed: Optional[int] = None) → GoHMM

Generates a random GoHMM.

Parameters

nb_states: int

Number of states.

nb_distributions: int

Number of distributions in each state.

random_initial_state: bool, optional

If set to True, we will start in each state with a random probability; otherwise we will always start in state 0. Default is False.

min_mu: float, optional

Lower bound for mu. Default is 0.0.

max_mu: float, optional

Upper bound for mu. Default is 2.0.

min_sigma: float, optional

Lower bound for sigma. Default is 0.5.

max_sigma: float, optional

Upper bound for sigma. Default is 2.0.

sseed: int, optional

The seed value. Default is None.

Returns

GoHMM

A pseudo-randomly generated GoHMM.
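
What such a generator produces can be pictured with the following sketch. It is an assumption-laden illustration in plain Python, not jajapy's algorithm: it draws mu and sigma uniformly within their bounds and obtains each probability row by normalising uniform random weights. The function names and the returned tuple are hypothetical.

```python
import random

def random_distribution(size, rng):
    """Random probability vector obtained by normalising uniform weights."""
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def random_gohmm_parameters(nb_states, nb_distributions,
                            random_initial_state=False,
                            min_mu=0.0, max_mu=2.0,
                            min_sigma=0.5, max_sigma=2.0,
                            seed=None):
    """Return (matrix, output, initial) for a randomly generated model."""
    rng = random.Random(seed)
    matrix = [random_distribution(nb_states, rng) for _ in range(nb_states)]
    output = [[(rng.uniform(min_mu, max_mu), rng.uniform(min_sigma, max_sigma))
               for _ in range(nb_distributions)]
              for _ in range(nb_states)]
    if random_initial_state:
        initial = random_distribution(nb_states, rng)
    else:
        initial = [1.0] + [0.0] * (nb_states - 1)  # always start in state 0
    return matrix, output, initial

matrix, output, initial = random_gohmm_parameters(
    nb_states=5, nb_distributions=2, random_initial_state=True,
    min_mu=0.0, max_mu=5.0, min_sigma=0.5, max_sigma=5.0, seed=0)
print(len(matrix), len(output[0]), sum(initial))
```

Passing the same seed twice yields the same parameters, which is the usual reason for exposing a seed argument.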