verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
keonkim has quit [Ping timeout: 255 seconds]
keonkim has joined #mlpack
govg has joined #mlpack
rcurtin has joined #mlpack
gurupunskill has joined #mlpack
< gurupunskill> Hey! I'm new, but I'd like to contribute to your organization. Can someone help me get started?
gurupunskill has quit [Ping timeout: 260 seconds]
gurupunskill has joined #mlpack
< gurupunskill> Okay, so I checked the channel logs and found out how you helped another guy xD
< gurupunskill> I built mlpack from source, and I'm running through the tutorials to understand how it works. Thanks :)
gurupunskill has quit [Client Quit]
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
kaushik_ has joined #mlpack
< zoq> gurupunskill: Welcome, if you have any further questions please don't hesitate to ask here or via the mailing list.
the_neo_ has joined #mlpack
the_neo_ has left #mlpack []
the_neo_ has joined #mlpack
< the_neo_> I am a Mathematics and Computing sophomore. I want to contribute. I know C/C++, Python (intermediate) and basic ML. I don't know how to proceed further. Can anyone help me? Thanks in advance!
< rcurtin> hi there the_neo_, I would suggest taking a look at http://www.mlpack.org/involved.html
< rcurtin> (let me know if those pages do not work... the system mlpack.org was suffering some routing issues this morning, but I think the hosting company has fixed them now...)
ans has joined #mlpack
ans has quit [Client Quit]
vivekp has quit [Ping timeout: 248 seconds]
the_neo_ has quit [Quit: Page closed]
vivekp has joined #mlpack
govg has quit [Ping timeout: 240 seconds]
govg has joined #mlpack
govg has quit [Ping timeout: 260 seconds]
kaushik_ has quit [Quit: Connection closed for inactivity]
gopala has joined #mlpack
< gopala> @rcurtin checked your message in the logs..
< gopala> 08:17 < rcurtin> or if you have multiple sequences you are training on, with a single sequence consisting of an observations file and a labels file, then you can pass the list of observation files (one per line) to --input_file
< gopala> 08:17 < rcurtin> and the list of labels files to --labels_file
< gopala> to clarify using an analogy, if I were to build, say, speech phoneme HMMs, I'd have my speech feature files at some sampling rate.. and the label files would have the phoneme label of each frame.. correct ?
< rcurtin> gopala: yep, that is correct
< gopala> so where does the number of states fit in? will each phoneme have n states?
< rcurtin> you would specify the number of states to the mlpack_hmm_train program as the --states option (or '-n' for short)
< rcurtin> ah sorry hang on, let me clarify
< rcurtin> the number of states should be equal to the number of different label values that your HMM can have
< rcurtin> I think that in speech, typically a single phoneme is allowed to have multiple hidden states, which is not really the "textbook" way HMMs are done nor the way they are done in mlpack
< rcurtin> there's not really a way to say "train this HMM, but if you see label <x>, then make sure the hidden state is one of N states"
< rcurtin> however... what you could do, and I think this is what HTK does (not 100% sure on this), is train one HMM for each phoneme
< rcurtin> and when you train these HMMs, don't use labeled training; just specify the number of states for that phoneme, and only pass in training sequences for that particular phoneme
< rcurtin> so at the end of that process, you'll have ~30-70 HMMs (depending on language I guess), and then you can do prediction for a single phoneme by taking the max likelihood over all these models
< rcurtin> I'm not sure if that's all helpful, I hope I did not give way too much answer for a simple question :)
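The per-phoneme selection rcurtin describes ("take the max likelihood over all these models") can be sketched as below. Single 1-D Gaussians stand in for trained per-phoneme HMMs here; the `models` dictionary, `classify` helper, and all parameter values are illustrative, not part of mlpack:

```python
import math

# Toy per-phoneme "models": (mean, variance) of a single 1-D Gaussian.
# In practice each entry would be a trained HMM scoring the whole sequence.
models = {
    "a": (0.0, 1.0),
    "b": (5.0, 1.0),
}

def gaussian_loglik(frames, mean, var):
    """Sum of log N(x; mean, var) over all frames."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        for x in frames
    )

def classify(frames):
    """Pick the phoneme whose model assigns the highest log-likelihood."""
    scores = {p: gaussian_loglik(frames, m, v) for p, (m, v) in models.items()}
    return max(scores, key=scores.get)

print(classify([4.8, 5.1, 5.3]))  # frames near 5 -> "b"
```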
< gopala> ok it helps
< rcurtin> http://mi.eng.cam.ac.uk/~sjy/papers/gayo07.pdf might have some more clarity on how HMMs are typically used for speech recognition, and it may make it clear how to use mlpack for that
< gopala> so in supervised training, is there any Baum-Welch.. or does it just compute GMM params?
< rcurtin> if you are doing supervised training, no need for Baum-Welch; you can just estimate directly
< rcurtin> the GMMs still have to be fit with EM though
< rcurtin> but at least the transition probabilities matrix and initial state probabilities matrix can be estimated directly
< rcurtin> now, if you happen to have phonemes that are labeled not just, e.g., "a" but more like { "a0", "a1", "a2", ... } where you have one label for each hidden state of the phoneme, then you can use direct supervised estimation with mlpack_hmm_train, no need for the complex setup I just described
< rcurtin> but I don't think most speech data is typically labeled at that level
< gopala> ok thanks, this helps.. so if i start with some rough (uniform) segmentation of phones into, say, 3 states.. I can loop over iterations of train and Viterbi.. over and over again to end up with better labels.. correct ?
< rcurtin> hmm, I dunno if you could get better labels like that
< rcurtin> but I did think of a way you can get labels for individual phoneme states like you want
< rcurtin> train an HMM on only one phoneme with N hidden states
< rcurtin> so this is unlabeled baum-welch training
< rcurtin> once you get an HMM, use it to predict hidden states for all of your training data for that particular phoneme
< gopala> the problem is I'd have to cut individual segments of this phoneme and generate training data, right? or maybe this is better dealt with at the API level
< rcurtin> if you do this for all phonemes, now you have hidden states for all phonemes, and you can "merge" all of this into a single HMM that can predict for an entire sequence by training with the output "labels" from each of the phoneme-level HMMs
< rcurtin> I think if you train an HMM for an individual phoneme, there is no need to segment the data itself any further than the phoneme-level labels that you already have
< gopala> are you suggesting the idea of a merge, or is there something in mlpack to do this -- combine several HMMs into one giant HMM?
< gopala> I guess I'll start using it and complain as I go... thanks ryan
< rcurtin> sure, happy to help
< rcurtin> what I'm suggesting in the second idea is to just retrain one big HMM directly on labeled data that you've generated with several HMMs
< rcurtin> i.e. you first train an HMM on each phoneme with baum-welch (i.e. without labels) with N states, and then you use that HMM to produce "sub-phoneme" labels for each phoneme
< gopala> yea ... it involves cutting each segment of each phoneme into different files, right?
< rcurtin> where a "sub-phoneme label" is a term I'm using to refer to the particular internal state of a single phoneme
< rcurtin> right, but if you already have labels at the phoneme level it should be easy
< rcurtin> so if you have, say, 10000 labels of the "a" phoneme, you can just split those out of your data, train an "a" HMM on that "a" data without labels, then use that trained HMM to get "sub-phoneme labels" for all the "a" phoneme data
< gopala> got it.. yes..
< rcurtin> once you've done that for all phonemes (and changed the numbers of the sub-phoneme labels so they don't collide), you can train a "big" HMM model on the data and the "sub-phoneme labels" directly
< rcurtin> and then you can get predictions for a speech sequence directly
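The split-then-relabel bookkeeping from the last few messages can be sketched like this. The `predict()` stub stands in for Viterbi decoding with a trained per-phoneme HMM, and the frame data, `n_sub_states`, and offset scheme are all illustrative:

```python
# Split frames by phoneme, get per-phoneme sub-phoneme states from each
# phoneme's HMM, then offset those states so they don't collide when
# training the merged "big" HMM.
n_sub_states = 3
phonemes = ["a", "b"]

# Frame-level data: (feature, phoneme label) pairs (made up).
frames = [(0.1, "a"), (0.2, "a"), (5.0, "b"), (5.1, "b"), (0.3, "a")]

def predict(phoneme, features):
    # Placeholder: a real per-phoneme HMM would Viterbi-decode here.
    return [i % n_sub_states for i in range(len(features))]

offsets = {p: i * n_sub_states for i, p in enumerate(phonemes)}  # a->0, b->3

merged_labels = {}
for p in phonemes:
    feats = [f for f, lab in frames if lab == p]
    sub = predict(p, feats)
    merged_labels[p] = [s + offsets[p] for s in sub]

print(merged_labels)  # 'a' states land in 0..2, 'b' states in 3..5
```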
< gopala> i have to look at how you save the HMM model.. each state as a number ?
< gopala> and sequentially ?
< gopala> as in there shouldn't be any transitions between say a1 and b1, right.. but there should be between a1 and a2
< gopala> I'm using a1 and b1 to mean the first hidden states of a and b
< rcurtin> hmm, I am not sure I understand the question completely
< rcurtin> do you mean how I save the predictions from the HMM model? that would be as a number from 0 to the number of states
< gopala> so in doing the merge, I will change the number of states and something in the transition matrix.. correct ?
< rcurtin> yeah, when you create the merge model, then you will set the number of states to (number of phonemes * number of states for each phoneme)
< rcurtin> and you'll also have to change the labels for the states
< rcurtin> so i.e. if you had 'a' and 'b', and each of those phonemes had three states (so 'a0', 'a1', 'a2', 'b0', 'b1', 'b2')
< gopala> yep.. got it..
< rcurtin> then the 'a' HMM would produce class labels 0, 1, 2, and so would the 'b' HMM
< rcurtin> so you'd want to map the 'b' HMM output to, I guess, 3, 4, 5
< rcurtin> and so forth for the other models
< gopala> and will you also have a state transition matrix ?
< rcurtin> no, the state transition matrix will be estimated from the training data directly
< rcurtin> it will be a block diagonal matrix, since i.e. state 'b0' never transitions to 'a2'
< gopala> got it.. i guess that was what i was asking about
< gopala> but i guess it has to have some non-zero probability if it is to decode a long speech file, correct?
< rcurtin> the probability from 'b2' to 'a0' shouldn't be zero
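To visualize the (mostly) block-diagonal structure rcurtin describes, here is a toy assembly of two per-phoneme transition matrices into one merged matrix. The values are made up, and in the real workflow the merged matrix is re-estimated from full labeled utterances, which is what fills in small cross-phoneme entries such as b2 -> a0:

```python
# Toy within-phoneme transition matrices (values made up), 3 states each.
A_a = [[0.8, 0.2, 0.0],
       [0.0, 0.7, 0.3],
       [0.0, 0.0, 1.0]]
A_b = [[0.9, 0.1, 0.0],
       [0.0, 0.6, 0.4],
       [0.0, 0.0, 1.0]]

n = 3
merged = [[0.0] * (2 * n) for _ in range(2 * n)]
for i in range(n):
    for j in range(n):
        merged[i][j] = A_a[i][j]          # 'a' block: states 0..2
        merged[n + i][n + j] = A_b[i][j]  # 'b' block: states 3..5

# Off-diagonal blocks are zero: 'b0' (state 3) never reaches 'a2' (state 2).
print(merged[3][2])  # 0.0
```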
< gopala> yep
< rcurtin> anyway I have to run for now, I'll be back later
< gopala> ok
< rcurtin> feel free to ask any other questions if you have any problems
< gopala> thanks, i got all i needed..
< rcurtin> great, let me know if it works :)
< gopala> will do.. thanks
gopala has quit [Ping timeout: 260 seconds]