Gaussian observations Hidden Markov Model (GoHMM)

A GoHMM is similar to an HMM, but the observations are vectors of real numbers generated according to several Gaussian distributions. Instead of having a discrete probability distribution over all the possible observations, each state has several pairs of parameters mu and sigma, one pair for each Gaussian distribution. The number of Gaussian distributions is the same in each state.

Example

The following example is a GoHMM with only one Gaussian distribution in each state. The dotted arrows represent the initial state probability distribution: we will start in the yellow state with probability 0.1.

Creation

We can create a GoHMM with two Gaussian distributions as follows:

>>> import jajapy as ja
>>> from numpy import array
>>> nb_states = 5
>>> s0 = ja.GoHMM_state(list(zip([0,1],[0.9,0.1])),[3.0,5.0],nb_states)
>>> s1 = ja.GoHMM_state(list(zip([0,1,2,4],[0.05,0.9,0.04,0.01])),[0.5,1.5],nb_states)
>>> s2 = ja.GoHMM_state(list(zip([1,2,3,4],[0.05,0.8,0.14,0.01])),[0.2,0.7],nb_states)
>>> s3 = ja.GoHMM_state(list(zip([2,3],[0.05,0.95])),[0.0,0.3],nb_states)
>>> s4 = ja.GoHMM_state(list(zip([1,4],[0.1,0.9])),[2.0,4.0],nb_states)
>>> matrix = array([s0[0],s1[0],s2[0],s3[0],s4[0]])
>>> output = array([s0[1],s1[1],s2[1],s3[1],s4[1]])
>>> model = ja.GoHMM(matrix,output,[0.1,0.7,0.0,0.0,0.2],"My GoHMM")
>>> #print(model)

We can also generate a random GoHMM:

>>> random_model = ja.GoHMM_random(nb_states=5,
...                                nb_distributions=2,
...                                random_initial_state=True,
...                                min_mu = 0.0,
...                                max_mu = 5.0,
...                                min_sigma = 0.5,
...                                max_sigma = 5.0)

Exploration

>>> model.a(0,0)# moving from state 0 to state 0
0.9
>>> model.mu(1)# mu parameters in state 1
[0.5,2.5]
>>> model.mu_n(1,0)# first mu parameter in state 1
0.5
>>> model.mu_n(1,1)# second mu parameter in state 1
2.5
>>> # probability that the first distribution in state 0 generates 4.0.
>>> model.b_n(s=0,n=0,l=4.0)
0.07820853879509118
>>> # probability that the first distribution in state 0 generates 4.0,
>>> # and that the second one generates 2.0
>>> model.b(0,[4.0,2.0])
0.019676393869096122
>>> # probability that the first distribution in state 0 generates 4.0,
>>> # and that the second one generates 2.0, and that we move from state 0
>>> # to state 1.
>>> model.tau(0,1,[4.0,2.0])
0.001967639386909612
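
The values printed above can be cross-checked by hand. The sketch below uses only the standard library and assumes that b_n is the density of a normal distribution (here mu = 3.0 and sigma = 5.0, the values passed for state 0) and that tau is a(s1,s2) multiplied by b(s1,obs). The helper gaussian_pdf is hypothetical and is not part of jajapy.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma (hypothetical helper)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# First distribution of state 0 in the example: mu = 3.0, sigma = 5.0.
density = gaussian_pdf(4.0, 3.0, 5.0)

# Values printed in the session above.
documented_b_n = 0.07820853879509118
documented_b = 0.019676393869096122
documented_tau = 0.001967639386909612

# a(0, 1) is 0.1 in the example model.
tau_from_b = 0.1 * documented_b

print(math.isclose(density, documented_b_n, rel_tol=1e-9))
print(math.isclose(tau_from_b, documented_tau, rel_tol=1e-9))
```

Both checks agree with the printed values, which suggests that the transition probability and the emission likelihood are simply multiplied in tau.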

Running

>>> model.run(5) # returns a list of 5 observations
[-0.6265, 0.3031, -0.5885, 5.4135, 3.45367]
>>> s = model.generateSet(10,5) # returns a Set containing 10 traces of size 5
>>> s.sequences
[[-0.0069, 1.7705, -1.7140, 1.8141, -0.81803],
 [1.9338, 12.4325, -1.0763, 1.8086, 1.4970],
 [1.7609, 2.8232, 0.0197, -0.3019, 2.3554],
 [0.5744, -0.9050, 0.3306, -0.3162, 2.7524],
 [-1.4897, 1.7604, -0.2746, 0.3566, -0.3647],
 [-3.6607, 0.9767, 0.3046, -0.3125, -0.0091],
 [-1.6521, -0.2060, 0.0392, -0.1600, -0.5134],
 [1.2431, 1.0243, 0.4519, -0.6647, -0.6829],
 [-0.1490, 2.2450, 9.5885, 10.1277, 2.0458],
 [1.8807, 0.8840, -1.2561, 2.0877, -0.5899]]

>>> s.times
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Analysis

>>> model.logLikelihood(s) # loglikelihood of this set of traces under this model
-10.183743400307154

Saving/Loading

>>> model.save("my_gohmm.txt")
>>> another_model = ja.loadGoHMM("my_gohmm.txt")

Model

class jajapy.GoHMM(matrix, output, initial_state, name='unknown_GoHMM')

Creates a GoHMM.

Parameters

matrix: ndarray: Represents the transition matrix. matrix[s1][s2] is the probability of moving from s1 to s2.
output: ndarray: Contains the parameters of the Gaussian distributions. output[s1][0][0] is the mu parameter of the first distribution in s1, output[s1][0][1] is the sigma parameter of the first distribution in s1, output[s1][1][0] is the mu parameter of the second distribution in s1, and so on.
initial_state: int or list of float: Determines the initial state. It is either the ID of the initial state, or a list giving the probability of starting in each state.
name: str, optional: Name of the model. Default is “unknown_GoHMM”.
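
To make the indexing of matrix and output concrete, here is a minimal NumPy sketch of the layout described above. The numbers are hypothetical (2 states, 2 distributions per state) and the arrays are built by hand rather than through jajapy.

```python
import numpy as np

# output[s][n] = [mu, sigma] of the n-th distribution in state s.
output = np.array([
    [[3.0, 5.0], [1.0, 2.0]],   # state 0
    [[0.5, 1.5], [2.5, 0.7]],   # state 1
])

# matrix[s1][s2] = probability of moving from s1 to s2; each row sums to 1.
matrix = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
])

nb_states, nb_distributions, _ = output.shape
mus = output[:, :, 0]     # shape (nb_states, nb_distributions)
sigmas = output[:, :, 1]  # same shape

# mu and sigma of the first distribution in state 0
print(output[0][0][0], output[0][0][1])
```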

a(s1: int, s2: int) → float

Returns the probability of moving from state s1 to state s2. If s1 or s2 is not a valid state ID it returns 0.

Parameters

s1: int: ID of the source state.
s2: int: ID of the destination state.

Returns

float: Probability of moving from state s1 to state s2.

Examples

>>> model.a(0,1)
0.5
>>> model.a(0,0)
0.0

b(s: int, l: list) → float

Returns the likelihood of generating l in state s.

Parameters

s: int: ID of the source state.
l: list of float: list of observations.

Returns

output: float: Likelihood of generating l in state s.
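
As an illustration, the likelihood of a whole observation vector can be sketched as the product of the per-distribution densities. This assumes the distributions of a state are independent, which the documentation does not state explicitly; emission_density is a hypothetical helper, not the jajapy implementation.

```python
import numpy as np

def emission_density(params, obs):
    """Joint density of obs under independent Gaussians (hypothetical helper).

    params: array-like of shape (nb_distributions, 2), rows are [mu, sigma].
    obs: sequence of nb_distributions floats, one value per distribution.
    """
    params = np.asarray(params, dtype=float)
    obs = np.asarray(obs, dtype=float)
    mu, sigma = params[:, 0], params[:, 1]
    z = (obs - mu) / sigma
    densities = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
    # Independence assumption: multiply the individual densities.
    return float(np.prod(densities))

# Two standard normal distributions, both observed at their mean.
value = emission_density([[0.0, 1.0], [0.0, 1.0]], [0.0, 0.0])
print(value)
```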

b_n(s: int, n: int, l: float) → float

Returns the likelihood of generating, from the nth distribution of s, observation l.

Parameters

s: int: state ID.
n: int: Index of the distribution.
l: float: The observation.

Returns

output: float: Likelihood of generating observation l from the n-th distribution of this state.

generateSet(set_size: int, param, distribution=None, min_size=None, timed: bool = False) → Set

Generates a set (training set / test set) containing set_size traces.

Parameters

set_size: int: number of traces in the output set.
param: list, int or float: the parameter(s) for the distribution. See “distribution”.
distribution: str, optional: If distribution=='geo', the sequence length follows a geometric law such that the expected length is min_size+(1/param). If distribution==None, param can be an int, in which case all the sequences have the same length (param), or a list of int. Default is None.
min_size: int, optional: see “distribution”. Default is None.
timed: bool, optional: Only for timed model. Generate timed or non-timed traces. Default is False.

Returns

output: Set: a set (training set / test set).

Examples

>>> set1 = model.generateSet(100,10)
>>> # set1 contains 100 traces of length 10
>>> set2 = model.generateSet(100, 1/4, "geo", min_size=6)
>>> # set2 contains 100 traces. The length of the traces is distributed following
>>> # a geometric distribution with parameter 1/4. All the traces contain at
>>> # least 6 observations, hence the average length of a trace is 6+(1/4)**(-1) = 10.
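
The expected length stated above can be checked with a short simulation. This sketch assumes the length is min_size plus a geometric draw supported on 1, 2, 3, ... (the convention used by NumPy); how jajapy samples the lengths internally is not shown in this page.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

param = 1 / 4
min_size = 6

# NumPy's geometric distribution is supported on 1, 2, 3, ... with mean 1/param.
lengths = min_size + rng.geometric(param, size=20000)

print(lengths.min() >= min_size)
print(lengths.mean())  # should be close to min_size + 1/param = 10
```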

logLikelihood(sequences: Set) → float

Compute the average loglikelihood of a set.

Parameters

sequences: Set: A set.

Returns

output: float: loglikelihood of sequences under this model.

Examples

>>> model.logLikelihood(set1)
-4.442498878506513
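
For intuition, the log-likelihood of a trace can be computed with a scaled forward pass. The sketch below is not the jajapy implementation. It assumes that an observation is emitted by the current state before the transition (consistent with tau), that the distributions of a state are independent, and that the average is an unweighted mean over the traces. All function names are hypothetical.

```python
import numpy as np

def log_gaussian(x, mu, sigma):
    """Log-density of a normal distribution, broadcast over arrays."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def sequence_loglikelihood(matrix, output, initial, seq):
    """Log-likelihood of one trace; each observation has one value per distribution."""
    matrix = np.asarray(matrix, dtype=float)
    output = np.asarray(output, dtype=float)
    alpha = np.asarray(initial, dtype=float)  # probability of being in each state
    loglik = 0.0
    for obs in seq:
        obs = np.asarray(obs, dtype=float)
        # log b(s, obs) for every state s: sum over the distributions.
        log_b = log_gaussian(obs, output[:, :, 0], output[:, :, 1]).sum(axis=1)
        shift = log_b[alpha > 0].max()  # rescale to avoid underflow
        alpha = (alpha * np.exp(log_b - shift)) @ matrix
        norm = alpha.sum()
        loglik += shift + np.log(norm)
        alpha /= norm
    return loglik

def average_loglikelihood(matrix, output, initial, sequences):
    """Unweighted mean of the per-trace log-likelihoods (an assumption)."""
    return float(np.mean([sequence_loglikelihood(matrix, output, initial, s)
                          for s in sequences]))

# Hypothetical 2-state model with 1 distribution per state.
matrix = [[0.9, 0.1], [0.2, 0.8]]
output = [[[0.0, 1.0]], [[5.0, 1.0]]]
initial = [1.0, 0.0]
print(average_loglikelihood(matrix, output, initial, [[[0.1], [4.8]], [[0.3], [-0.2]]]))
```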

mu(s: int) → ndarray

Returns the mu parameters for this state.

Parameters

s: int: ID of the source state.

Returns

ndarray: the mu parameters.

mu_n(s: int, n: int) → float

Returns the mu parameter of the n-th distribution in s.

Parameters

s: int: ID of the source state.
n: int: Index of the distribution.

Returns

float: the mu parameter of the n-th distribution in s.

next(state: int) → list

Returns a state-observation pair according to the distributions described by self.matrix[state] and self.output[state].

Parameters

state: int: ID of the source state.

Returns

output: [int, list of floats]: A state-observation pair.

next_obs(s: int) → list

Generates n observations according to the n normal distributions in s.

Parameters

s: int: ID of the source state.

Returns

output: list of float: The generated observations, one per distribution.

next_state(state: int) → int

Returns one state ID at random according to the distribution described by self.matrix[state].

Parameters

state: int: ID of the source state.

Returns

int: A state ID.

pi(s: int) → float

Return the probability of starting in state s.

Parameters

s: int: state ID.

Returns

output: float: the probability of starting in state s.
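
Since initial_state may be either a state ID or a list of probabilities, the two forms can be reduced to a single distribution. The sketch below shows one way to do this; initial_distribution is a hypothetical helper and the validation it performs is an assumption, not documented jajapy behavior.

```python
def initial_distribution(initial_state, nb_states):
    """Return the list of starting probabilities for both accepted forms (hypothetical helper)."""
    if isinstance(initial_state, int):
        # A state ID: start in that state with probability 1.
        dist = [0.0] * nb_states
        dist[initial_state] = 1.0
        return dist
    dist = [float(p) for p in initial_state]
    if len(dist) != nb_states or abs(sum(dist) - 1.0) > 1e-9:
        raise ValueError("expected one probability per state, summing to 1")
    return dist

print(initial_distribution(2, 5))
print(initial_distribution([0.1, 0.7, 0.0, 0.0, 0.2], 5))
```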

rename(name: str) → None

Change the name of the model.

Parameters

name: str: new name.

run(number_steps: int, current: int = -1) → list

Simulates a run of length number_steps of the model and returns the sequence of observations generated.

Parameters

number_steps: int: length of the simulation.
current: int, optional: If current is set, the run starts from state current. Otherwise it starts from an initial state.

Returns

output: list: trace generated by the run.
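
A run can be sketched as a loop that alternates between emitting an observation from the current state and sampling the next state. This is an illustration with NumPy under the assumptions that one value is drawn per distribution and that the observation is emitted before the transition; simulate_run is a hypothetical helper, not the jajapy implementation.

```python
import numpy as np

def simulate_run(matrix, output, initial, number_steps, rng):
    """Generate number_steps observation vectors (hypothetical helper)."""
    matrix = np.asarray(matrix, dtype=float)
    output = np.asarray(output, dtype=float)
    # Pick the starting state according to the initial distribution.
    state = rng.choice(len(initial), p=initial)
    trace = []
    for _ in range(number_steps):
        mu, sigma = output[state, :, 0], output[state, :, 1]
        # One draw per Gaussian distribution of the current state.
        trace.append(rng.normal(mu, sigma).tolist())
        # Move to the next state according to the current row of the matrix.
        state = rng.choice(matrix.shape[0], p=matrix[state])
    return trace

# Hypothetical 2-state model with 2 distributions per state.
matrix = [[0.9, 0.1], [0.2, 0.8]]
output = [[[3.0, 5.0], [1.0, 2.0]], [[0.5, 1.5], [2.5, 0.7]]]
trace = simulate_run(matrix, output, [0.5, 0.5], 5, np.random.default_rng(0))
print(len(trace), len(trace[0]))
```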

save(file_path: str)

Save the model into a text file.

Parameters

file_path: str: path of the output file.

Examples

>>> model.save("my_model.txt")

tau(s1: int, s2: int, obs: list) → float

Returns the likelihood of generating, from s1, observation obs while moving to state s2.

Parameters

s1: int: A state ID.
s2: int: A state ID.
obs: list of floats: An observation.

Returns

output: float: The likelihood of generating, from s1, observation obs while moving to state s2.

Other Functions

jajapy.createGoHMM(transitions: list, output: list, initial_state: str, name: str = 'unknown_GoHMM') → GoHMM

A user-friendly way to create a GoHMM.

Parameters

transitions: list of tuples (int, int, float): Each tuple represents a transition as follows: (source state ID, destination state ID, probability).
output: list of list of tuples (float, float): Represents the parameters of the Gaussian distributions [(mu1, sigma1),(mu2, sigma2),…]. output[0] contains the parameters of the distributions in state 0, and output[0][0] contains the 2 parameters of the first distribution in state 0. output[0][0][0] is the mu parameter of the first distribution in state 0, and output[0][0][1] is the sigma parameter of the first distribution in state 0.
initial_state: int or list of float: Determines the initial state. It is either the ID of the initial state, or a list giving the probability of starting in each state.
name: str, optional: Name of the model. Default is “unknown_GoHMM”.

Returns

GoHMM: the GoHMM described by transitions, output, and initial_state.

Examples
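
The sketch below does not call jajapy. It only illustrates, with hypothetical values, how a list of (source, destination, probability) tuples and a nested list of (mu, sigma) tuples map onto the dense matrix and output arrays expected by the GoHMM constructor. The helper transitions_to_matrix is hypothetical.

```python
import numpy as np

def transitions_to_matrix(transitions, nb_states):
    """Build a dense transition matrix from (source, destination, probability) tuples."""
    matrix = np.zeros((nb_states, nb_states))
    for source, destination, probability in transitions:
        matrix[source, destination] = probability
    return matrix

# Hypothetical model: 2 states, 2 distributions per state.
transitions = [(0, 0, 0.9), (0, 1, 0.1), (1, 0, 0.2), (1, 1, 0.8)]
output = [[(3.0, 5.0), (1.0, 2.0)],   # state 0: [(mu1, sigma1), (mu2, sigma2)]
          [(0.5, 1.5), (2.5, 0.7)]]   # state 1

matrix = transitions_to_matrix(transitions, nb_states=2)
params = np.array(output)  # shape (nb_states, nb_distributions, 2)

print(matrix)
print(params.shape)
```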

jajapy.loadGoHMM(file_path: str) → GoHMM

Load a GoHMM saved into a text file.

Parameters

file_path: str: Location of the text file.

Returns

output: GoHMM: The GoHMM saved in file_path.

jajapy.GoHMM_random(nb_states: int, nb_distributions: int, random_initial_state: bool = False, min_mu: float = 0.0, max_mu: float = 2.0, min_sigma: float = 0.5, max_sigma: float = 2.0, sseed: Optional[int] = None) → GoHMM

Generates a random GoHMM.

Parameters

nb_states: int: Number of states.
nb_distributions: int: Number of distributions in each state.
random_initial_state: bool, optional: If set to True we will start in each state with a random probability, otherwise we will always start in state 0. Default is False.
min_mu: float, optional: lower bound for mu. Default is 0.0.
max_mu: float, optional: upper bound for mu. Default is 2.0.
min_sigma: float, optional: lower bound for sigma. Default is 0.5.
max_sigma: float, optional: upper bound for sigma. Default is 2.0.
sseed: int, optional: the seed value.

Returns

GoHMM: A pseudo-randomly generated GoHMM.
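
To show what the bounds mean, here is a minimal sketch of drawing random parameters with NumPy. It assumes uniform draws for mu and sigma within the given bounds and row-normalised uniform draws for the transition matrix. The actual sampling scheme used by GoHMM_random is not described in this page, and random_gohmm_parameters is a hypothetical helper.

```python
import numpy as np

def random_gohmm_parameters(nb_states, nb_distributions,
                            min_mu=0.0, max_mu=2.0,
                            min_sigma=0.5, max_sigma=2.0, seed=None):
    """Draw a transition matrix and Gaussian parameters at random (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Random non-negative rows, normalised so that each row sums to 1.
    matrix = rng.random((nb_states, nb_states))
    matrix /= matrix.sum(axis=1, keepdims=True)
    mu = rng.uniform(min_mu, max_mu, size=(nb_states, nb_distributions))
    sigma = rng.uniform(min_sigma, max_sigma, size=(nb_states, nb_distributions))
    # output[s][n] = [mu, sigma] of the n-th distribution in state s.
    output = np.stack([mu, sigma], axis=-1)
    return matrix, output

rand_matrix, rand_output = random_gohmm_parameters(5, 2, max_mu=5.0, max_sigma=5.0, seed=0)
print(rand_matrix.shape, rand_output.shape)
```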