Gaussian observations Hidden Markov Model (GoHMM)
A GoHMM is similar to an HMM, but its observations are vectors of real numbers generated by several Gaussian distributions. Instead of a discrete probability distribution over all possible observations, each state has one pair of parameters (mu, sigma) per Gaussian distribution. The number of Gaussian distributions is the same in every state.
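The emission model can be illustrated with a minimal pure-Python sketch that does not use jajapy. As an assumption for this illustration, a state is represented by a list of (mu, sigma) pairs. One observation is then a vector with one value drawn from each pair.

```python
import random

def sample_observation(state_params, rng=random):
    """Draw one GoHMM-style observation from a state.

    state_params: list of (mu, sigma) pairs, one per Gaussian distribution
    (a hypothetical layout used only for this illustration).
    Returns a list with one real number per distribution.
    """
    return [rng.gauss(mu, sigma) for mu, sigma in state_params]

# A state with two Gaussian distributions (hypothetical parameters):
# N(mu=3.0, sigma=5.0) and N(mu=0.5, sigma=1.5).
state = [(3.0, 5.0), (0.5, 1.5)]
rng = random.Random(42)
obs = sample_observation(state, rng)
print(len(obs))  # 2: one value per distribution
```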
Example
The following example is a GoHMM with only one Gaussian distribution in each state. The dotted arrows represent the initial state probability distribution: we will start in the yellow state with probability 0.1.
Creation
We can create a GoHMM with two Gaussian distributions as follows:
>>> import jajapy as ja
>>> from numpy import array
>>> nb_states = 5
>>> s0 = ja.GoHMM_state(list(zip([0,1],[0.9,0.1])),[3.0,5.0],nb_states)
>>> s1 = ja.GoHMM_state(list(zip([0,1,2,4],[0.05,0.9,0.04,0.01])),[0.5,1.5],nb_states)
>>> s2 = ja.GoHMM_state(list(zip([1,2,3,4],[0.05,0.8,0.14,0.01])),[0.2,0.7],nb_states)
>>> s3 = ja.GoHMM_state(list(zip([2,3],[0.05,0.95])),[0.0,0.3],nb_states)
>>> s4 = ja.GoHMM_state(list(zip([1,4],[0.1,0.9])),[2.0,4.0],nb_states)
>>> matrix = array([s0[0],s1[0],s2[0],s3[0],s4[0]])
>>> output = array([s0[1],s1[1],s2[1],s3[1],s4[1]])
>>> model = ja.GoHMM(matrix,output,[0.1,0.7,0.0,0.0,0.2],"My GoHMM")
>>> # print(model)
We can also generate a random GoHMM:
>>> random_model = ja.GoHMM_random(nb_states=5,
...                                nb_distributions=2,
...                                random_initial_state=True,
...                                min_mu=0.0,
...                                max_mu=5.0,
...                                min_sigma=0.5,
...                                max_sigma=5.0)
Exploration
>>> model.a(0,0) # probability of moving from state 0 to state 0
0.9
>>> model.mu(1) # mu parameters in state 1
[0.5,2.5]
>>> model.mu_n(1,0) # first mu parameter in state 1
0.5
>>> model.mu_n(1,1) # second mu parameter in state 1
2.5
>>> # likelihood that the first distribution in state 0 generates 4.0
>>> model.b_n(s=0,n=0,l=4.0)
0.07820853879509118
>>> # likelihood that the first distribution in state 0 generates 4.0
>>> # and that the second one generates 2.0
>>> model.b(0,[4.0,2.0])
0.019676393869096122
>>> # likelihood that the first distribution in state 0 generates 4.0,
>>> # that the second one generates 2.0, and that we move from state 0
>>> # to state 1
>>> model.tau(0,1,[4.0,2.0])
0.001967639386909612
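Two of the values above can be reproduced by hand. This assumes that b_n is the Gaussian probability density function and that the first distribution of state 0 has mu = 3.0 and sigma = 5.0 (the [3.0,5.0] passed to s0). It also uses the relation tau(s1, s2, obs) = a(s1, s2) * b(s1, obs). That relation is inferred from the outputs shown (0.1 * 0.01967... = 0.001967...), not taken from the jajapy source.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x (sigma is the standard deviation)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# b_n(s=0, n=0, l=4.0): first distribution of state 0 is (mu=3.0, sigma=5.0)
b_n_value = gaussian_pdf(4.0, 3.0, 5.0)
print(b_n_value)  # about 0.0782085

# tau(0, 1, [4.0, 2.0]) = a(0, 1) * b(0, [4.0, 2.0]), with a(0, 1) = 0.1
b_value = 0.019676393869096122  # value of model.b(0, [4.0, 2.0]) shown above
tau_value = 0.1 * b_value
print(tau_value)  # about 0.0019676
```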
Running
>>> model.run(5) # returns a list of 5 observations
[-0.6265, 0.3031, -0.5885, 5.4135, 3.45367]
>>> s = model.generateSet(10,5) # returns a Set containing 10 traces of size 5
>>> s.sequences
[[-0.0069, 1.7705, -1.7140, 1.8141, -0.81803],
[1.9338, 12.4325, -1.0763, 1.8086, 1.4970],
[1.7609, 2.8232, 0.0197, -0.3019, 2.3554],
[0.5744, -0.9050, 0.3306, -0.3162, 2.7524],
[-1.4897, 1.7604, -0.2746, 0.3566, -0.3647],
[-3.6607, 0.9767, 0.3046, -0.3125, -0.0091],
[-1.6521, -0.2060, 0.0392, -0.1600, -0.5134],
[1.2431, 1.0243, 0.4519, -0.6647, -0.6829],
[-0.1490, 2.2450, 9.5885, 10.1277, 2.0458],
[1.8807, 0.8840, -1.2561, 2.0877, -0.5899]]
>>> s.times
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Analysis
>>> model.logLikelihood(s) # average log-likelihood of this set of traces under this model
-10.183743400307154
Saving/Loading
>>> model.save("my_gohmm.txt")
>>> another_model = ja.loadGoHMM("my_gohmm.txt")
Model
- class jajapy.GoHMM(matrix, output, initial_state, name='unknown_GoHMM')
Creates a GoHMM.
Parameters
- matrix: ndarray
Represents the transition matrix. matrix[s1][s2] is the probability of moving from s1 to s2.
- output: ndarray
Contains the parameters of the Gaussian distributions. output[s1][0][0] is the mu parameter of the first distribution in s1 and output[s1][0][1] is its sigma parameter; output[s1][1][0] is the mu parameter of the second distribution in s1, and so on.
- initial_state: int or list of float
Determines the initial state: either the ID of the single initial state, or a list giving the probability of starting in each state.
- name: str, optional
Name of the model. Default is “unknown_GoHMM”.
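The initial_state parameter accepts two forms. The following sketch shows one way to normalize both into a single probability vector. The helper initial_distribution is hypothetical and is not part of jajapy.

```python
def initial_distribution(initial_state, nb_states):
    """Turn an int (a state ID) or a list of probabilities into a list pi
    where pi[s] is the probability of starting in state s (hypothetical helper)."""
    if isinstance(initial_state, int):
        if not 0 <= initial_state < nb_states:
            raise ValueError("initial state ID out of range")
        # a single initial state: probability 1 for it, 0 elsewhere
        return [1.0 if s == initial_state else 0.0 for s in range(nb_states)]
    pi = [float(p) for p in initial_state]
    if len(pi) != nb_states:
        raise ValueError("one probability per state is required")
    if abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("initial probabilities must sum to 1")
    return pi

print(initial_distribution(2, 5))  # [0.0, 0.0, 1.0, 0.0, 0.0]
print(initial_distribution([0.1, 0.7, 0.0, 0.0, 0.2], 5))
```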
Creates an abstract model for HMM and GoHMM.
Parameters
- matrix: ndarray
Represents the transition matrix. matrix[s1][s2] is the probability of moving from s1 to s2.
- initial_state: int or list of float
Determines the initial state: either the ID of the single initial state, or a list giving the probability of starting in each state.
- name: str, optional
Name of the model.
- a(s1: int, s2: int) → float
Returns the probability of moving from state s1 to state s2. If s1 or s2 is not a valid state ID it returns 0.
Parameters
- s1: int
ID of the source state.
- s2: int
ID of the destination state.
Returns
- float
Probability of moving from state s1 to state s2.
Examples
>>> model.a(0,1)
0.5
>>> model.a(0,0)
0.0
- b(s: int, l: list) → float
Returns the likelihood of generating l in state s.
Parameters
- s: int
ID of the source state.
- l: list of float
List of observations, one for each distribution of s.
Returns
- output: float
Likelihood of generating l in state s.
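The example outputs suggest that b multiplies per-distribution densities. In those outputs, b(0,[4.0,2.0]) combines the density of 4.0 under the first distribution with that of 2.0 under the second. Under that assumption, b can be sketched as a product of Gaussian densities, here with hypothetical parameters and without jajapy.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def b(state_params, l):
    """Likelihood of the observation vector l in a state whose distributions are
    state_params = [(mu_0, sigma_0), (mu_1, sigma_1), ...] (assumed layout).
    The components are treated as independent, so their densities multiply."""
    if len(l) != len(state_params):
        raise ValueError("need one value per distribution")
    result = 1.0
    for x, (mu, sigma) in zip(l, state_params):
        result *= gaussian_pdf(x, mu, sigma)
    return result

state = [(0.0, 1.0), (0.0, 1.0)]  # hypothetical: two standard normal distributions
print(b(state, [0.0, 0.0]))  # 1 / (2*pi), about 0.159155
```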
- b_n(s: int, n: int, l: float) → float
Returns the likelihood of generating, from the nth distribution of s, observation l.
Parameters
- s: int
State ID.
- n: int
Index of the distribution.
- l: float
The observation.
Returns
- output: float
Likelihood of generating, from the nth distribution of s, observation l.
- generateSet(set_size: int, param, distribution=None, min_size=None, timed: bool = False) → Set
Generates a set (training set / test set) containing set_size traces.
Parameters
- set_size: int
number of traces in the output set.
- param: a list, an int or a float.
the parameter(s) for the distribution. See “distribution”.
- distribution: str, optional
If distribution=='geo', the sequence lengths follow a geometric law such that the expected length is min_size+(1/param). If distribution==None, param can be an int, in which case all the sequences have the same length (param), or param can be a list of int. Default is None.
- min_size: int, optional
See “distribution”. Default is None.
- timed: bool, optional
Only for timed model. Generate timed or non-timed traces. Default is False.
Returns
- output: Set
a set (training set / test set).
Examples
>>> set1 = model.generateSet(100,10)
>>> # set1 contains 100 traces of length 10
>>> set2 = model.generateSet(100, 1/4, "geo", min_size=6)
>>> # set2 contains 100 traces. The length of the traces follows
>>> # a geometric distribution with parameter 1/4. All the traces contain at
>>> # least 6 observations, hence the average length of a trace is 6+(1/4)**(-1) = 10.
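The 'geo' option can be illustrated with a hypothetical sampler. A geometric variable on {1, 2, ...} with success probability param has mean 1/param, so min_size plus such a draw has mean min_size + 1/param. This sketch only reproduces that stated distribution; how jajapy draws the lengths internally is not shown here.

```python
import random

def geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success (support 1, 2, ...)."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

def trace_lengths(set_size, param, min_size, rng):
    """Hypothetical helper: lengths of set_size traces under the 'geo' option."""
    return [min_size + geometric(param, rng) for _ in range(set_size)]

rng = random.Random(0)
lengths = trace_lengths(20000, 1 / 4, 6, rng)
# minimum length (at least 7) and empirical mean, close to 6 + 4 = 10
print(min(lengths), sum(lengths) / len(lengths))
```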
- logLikelihood(sequences: Set) → float
Computes the average log-likelihood of a set.
Parameters
- sequences: Set
A set.
Returns
- output: float
Average log-likelihood of sequences under this model.
Examples
>>> model.logLikelihood(set1)
-4.442498878506513
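An average log-likelihood of this kind can be computed with the forward algorithm. The sketch below rests on several assumptions and is not jajapy's code:

- it follows the convention suggested by tau (an observation is emitted from s1 while moving to s2);
- it takes plain lists instead of a jajapy Set;
- it weights each trace by a count, as Set.times appears to do.

No scaling is applied, so very long traces would underflow.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def b(output_s, obs):
    """Product of the per-distribution densities of one state."""
    p = 1.0
    for x, (mu, sigma) in zip(obs, output_s):
        p *= gaussian_pdf(x, mu, sigma)
    return p

def trace_likelihood(matrix, output, pi, trace):
    """Forward algorithm: alpha[s] is the likelihood of the prefix read so far, ending in s."""
    n = len(matrix)
    alpha = list(pi)
    for obs in trace:
        # alpha'[s2] = sum over s1 of alpha[s1] * tau(s1, s2, obs)
        alpha = [
            sum(alpha[s1] * matrix[s1][s2] * b(output[s1], obs) for s1 in range(n))
            for s2 in range(n)
        ]
    return sum(alpha)

def average_log_likelihood(matrix, output, pi, sequences, times):
    """Average log-likelihood, each trace weighted by how often it occurs."""
    total = sum(t * math.log(trace_likelihood(matrix, output, pi, seq))
                for seq, t in zip(sequences, times))
    return total / sum(times)

# Hypothetical one-state model with a single N(0, 1) distribution.
matrix = [[1.0]]
output = [[(0.0, 1.0)]]
pi = [1.0]
sequences = [[[0.0]], [[0.0], [0.0]]]  # two traces of observation vectors
print(average_log_likelihood(matrix, output, pi, sequences, [1, 1]))  # about -1.3784
```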
- mu(s: int) → ndarray
Returns the mu parameters for this state.
Parameters
- s: int
ID of the source state.
Returns
- ndarray
the mu parameters.
- mu_n(s: int, n: int) → float
Returns the mu parameter of the nth distribution in s.
Parameters
- s: int
ID of the source state.
- n: int
Index of the distribution.
Returns
- float
The mu parameter of the nth distribution in s.
- next(state: int) → list
Returns a state-observation pair according to the distributions described by self.matrix[state] and self.output[state].
Parameters
- state: int
ID of the source state.
Returns
- output: [int, list of float]
A state-observation pair.
- next_obs(s: int) → list
Generates an observation by drawing one value from each of the n normal distributions in s.
Parameters
- s: int
ID of the source state.
Returns
- output: list of float
An observation (one value per distribution).
- next_state(state: int) → int
Returns one state ID at random according to the distribution described by self.matrix[state].
Parameters
- state: int
ID of the source state.
Returns
- int
A state ID.
- pi(s: int) → float
Returns the probability of starting in state s.
Parameters
- s: int
State ID.
Returns
- output: float
The probability of starting in state s.
- run(number_steps: int, current: int = -1) → list
Simulates a run of length number_steps of the model and returns the sequence of observations generated.
Parameters
- number_steps: int
Length of the simulation.
- current: int, optional
If current is set, the run starts from state current. Otherwise it starts from an initial state.
Returns
- output: list
Trace generated by the run.
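The methods next_state, next_obs, next and run fit together as a simple simulation loop. This minimal pure-Python sketch mirrors the documented behaviour: it picks the next state from row matrix[s] and draws one value per distribution of s. The helper names and the (mu, sigma) layout are assumptions. Here each step yields a whole observation vector.

```python
import random

def next_state(matrix, s, rng):
    """Pick a destination state according to the probabilities in matrix[s]."""
    return rng.choices(range(len(matrix[s])), weights=matrix[s])[0]

def next_obs(output, s, rng):
    """Draw one value from each Gaussian distribution of state s."""
    return [rng.gauss(mu, sigma) for mu, sigma in output[s]]

def run(matrix, output, pi, number_steps, rng, current=-1):
    """Simulate number_steps steps; start from `current` if set, else sample from pi."""
    s = current if current >= 0 else rng.choices(range(len(pi)), weights=pi)[0]
    trace = []
    for _ in range(number_steps):
        trace.append(next_obs(output, s, rng))  # emit from the current state...
        s = next_state(matrix, s, rng)          # ...then move to the next one
    return trace

# Hypothetical two-state model with one distribution per state.
matrix = [[0.9, 0.1], [0.5, 0.5]]
output = [[(0.0, 1.0)], [(5.0, 0.5)]]
pi = [1.0, 0.0]
print(run(matrix, output, pi, 5, random.Random(1)))
```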
- save(file_path: str)
Save the model into a text file.
Parameters
- file_path: str
path of the output file.
Examples
>>> model.save("my_model.txt")
- tau(s1: int, s2: int, obs: list) → float
Returns the likelihood of generating, from s1, observation obs while moving to state s2.
Parameters
- s1: int
A state ID.
- s2: int
A state ID.
- obs: list of float
An observation.
Returns
- output: float
The likelihood of generating, from s1, observation obs while moving to state s2.
Other Functions
- jajapy.createGoHMM(transitions: list, output: list, initial_state: str, name: str = 'unknown_GoHMM') → GoHMM
A user-friendly way to create a GoHMM.
Parameters
- transitions: list of tuples (int, int, float)
Each tuple represents a transition as follows: (source state ID, destination state ID, probability).
- output: list of list of tuples (float, float)
Represents the parameters of the Gaussian distributions [(mu1, sigma1),(mu2, sigma2),…]. output[0] contains the parameters of the distributions in state 0, and output[0][0] contains the 2 parameters of the first distribution in state 0: output[0][0][0] is its mu parameter and output[0][0][1] is its sigma parameter.
- initial_state: int or list of float
Determines the initial state: either the ID of the single initial state, or a list giving the probability of starting in each state.
- name: str, optional
Name of the model. Default is “unknown_GoHMM”.
Returns
- GoHMM
The GoHMM described by transitions, output, and initial_state.
Examples
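The layout of the transitions and output arguments can be illustrated with a hypothetical pure-Python helper that does not call jajapy. The helper turns the (source, destination, probability) tuples into a dense transition matrix where matrix[src][dst] is the probability. It also checks that every state has the same number of (mu, sigma) pairs. All parameter values are hypothetical.

```python
def build_matrix(transitions, nb_states):
    """Dense transition matrix from (source, destination, probability) tuples."""
    matrix = [[0.0] * nb_states for _ in range(nb_states)]
    for src, dst, prob in transitions:
        matrix[src][dst] = prob
    for row in matrix:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("outgoing probabilities of a state must sum to 1")
    return matrix

transitions = [(0, 0, 0.9), (0, 1, 0.1), (1, 0, 0.5), (1, 1, 0.5)]
# output[s][n] = (mu, sigma) of the n-th distribution of state s (hypothetical values)
output = [[(3.0, 5.0), (0.0, 1.0)],
          [(0.5, 1.5), (2.5, 1.0)]]
matrix = build_matrix(transitions, 2)
# the same number of distributions in each state
assert len({len(params) for params in output}) == 1
print(matrix[0][1], output[0][0][0], output[0][0][1])  # 0.1 3.0 5.0
```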
- jajapy.loadGoHMM(file_path: str) → GoHMM
Loads a GoHMM saved in a text file.
Parameters
- file_path: str
Location of the text file.
Returns
- output: GoHMM
The GoHMM saved in file_path.
- jajapy.GoHMM_random(nb_states: int, nb_distributions: int, random_initial_state: bool = False, min_mu: float = 0.0, max_mu: float = 2.0, min_sigma: float = 0.5, max_sigma: float = 2.0, sseed: Optional[int] = None) → GoHMM
Generates a random GoHMM.
Parameters
- nu_statesint
Number of states.
- nb_distributionsint
Number of distributions in each state.
- alphabetlist of str
List of observations.
- random_initial_state: bool, optional
If set to True we will start in each state with a random probability, otherwise we will always start in state 0. Default is False.
- min_mufloat, optional
lower bound for mu. By default 0.0
- max_mufloat, optional
upper bound for mu. By default 2.0
- min_sigmafloat, optional
lower bound for sigma. By default 0.5
- max_sigmafloat, optional
upper bound for sigma. By default 2.0
- sseedint, optional
the seed value.
Returns
- GoHMM
A pseudo-randomly generated GoHMM.
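The following sketch shows one way to draw random GoHMM parameters within these bounds. It uses uniform mu in [min_mu, max_mu], uniform sigma in [min_sigma, max_sigma], normalized random transition rows, and a seed for reproducibility. How jajapy actually draws each parameter is not stated here, so these choices are assumptions. The helper random_gohmm_params and its seed argument (standing in for sseed) are hypothetical.

```python
import random

def random_gohmm_params(nb_states, nb_distributions, random_initial_state=False,
                        min_mu=0.0, max_mu=2.0, min_sigma=0.5, max_sigma=2.0,
                        seed=None):
    """Hypothetical helper: draw a transition matrix, Gaussian parameters and
    an initial distribution within the documented bounds."""
    rng = random.Random(seed)

    def random_distribution(size):
        # random non-negative weights normalised to sum to 1
        weights = [rng.random() for _ in range(size)]
        total = sum(weights)
        return [w / total for w in weights]

    matrix = [random_distribution(nb_states) for _ in range(nb_states)]
    # output[s][n] = (mu, sigma) of the n-th distribution of state s
    output = [[(rng.uniform(min_mu, max_mu), rng.uniform(min_sigma, max_sigma))
               for _ in range(nb_distributions)]
              for _ in range(nb_states)]
    if random_initial_state:
        pi = random_distribution(nb_states)
    else:
        pi = [1.0] + [0.0] * (nb_states - 1)  # always start in state 0
    return matrix, output, pi

matrix, output, pi = random_gohmm_params(5, 2, random_initial_state=True,
                                         max_mu=5.0, max_sigma=5.0, seed=0)
print(len(matrix), len(output[0]))
```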