ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at http://www.mlpack.org/irc/"
< sreenik>
rcurtin: Just to tell you, I had (successfully) introduced mlpack to my friends during the application period, the only deterrent was the gpu acceleration part (as instructions are sent one by one to the gpu and not as a chunk, or that is what I know). So bandicoot has a lot of us waiting :)
< sreenik>
Maybe mlpack could shift a couple of places up in the next tweet :)
KimSangYeon-DGU has joined #mlpack
< sreenik>
Oh and one more query, is bandicoot supposed to target only nvidia gpus or amd as well?
< rcurtin>
sreenik: yeah, bandicoot is high priority for me. it will be both nvidia and amd GPUs, so it will be an OpenCL and a CUDA wrapper
< rcurtin>
(it's not possible to do OpenCL only; nvidia's lead with CUDA is too big)
< sreenik>
rcurtin: Sounds good! By the way, the "bones" and "meat" classification sounds a lot cooler than the "impl" we have in mlpack :)
< sreenik>
I wonder when amd will invest some money into software
xiaohong has joined #mlpack
jeffin143 has joined #mlpack
< jeffin143>
lozhnikov: I opened PR #1904; just have a look when you are free and let me know some of your initial comments, I left some doubts there. Also let me know about the API we would be providing to the user.
< jeffin143>
Umm, can we maintain an archive of all accepted GSoC proposals? I would love to read the other proposals, and it would also be good for someone who wants to see examples of what our organisation looks for. Just a suggestion :)
< xiaohong>
So for the blog repo, we don't need to fork it, just push what we write down, right?
xiaohong has quit [Ping timeout: 256 seconds]
Suryo has joined #mlpack
< Suryo>
zoq: I've added a test for the Ackley function as a part of Adam's tests. If you could take a look at it and let me know if it's okay, then I'll proceed with testing the remaining functions.
< Suryo>
Also, three of the new functions are non-differentiable. Would it be okay to merge them for now and test (and modify them if required) later?
Suryo has quit [Client Quit]
< lozhnikov>
jeffin143: ok, I'll look through the PR this evening or tomorrow morning.
< zoq>
Suryo: Will take a look and let you know. About the non-differentiable functions: what if we test them at least once locally using CNE or CMAES?
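For reference, a minimal sketch of such a local check, assuming ensmallen's CNE with its default parameters and a hand-written Ackley objective (the class name and starting point here are illustrative, not the actual test):

```cpp
// Local sanity check for a non-differentiable objective with a
// gradient-free optimizer: CNE only needs Evaluate(), no Gradient().
#include <ensmallen.hpp>
#include <iostream>

// Standard Ackley function; global minimum f(0) = 0.
struct AckleySketch
{
  double Evaluate(const arma::mat& x) const
  {
    const double a = 20.0, b = 0.2, c = 2.0 * arma::datum::pi;
    const double d = (double) x.n_elem;
    return -a * std::exp(-b * std::sqrt(arma::accu(arma::square(x)) / d))
           - std::exp(arma::accu(arma::cos(c * x)) / d) + a + std::exp(1.0);
  }
};

int main()
{
  AckleySketch f;
  arma::mat coordinates("2.0; -1.5");  // start away from the optimum

  ens::CNE optimizer;  // default population size and generation count
  const double objective = optimizer.Optimize(f, coordinates);

  std::cout << "final objective: " << objective << std::endl;
}
```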
< zoq>
xiaohong: Correct, you should have write permissions for the repo.
< toshal>
ShikharJ zoq: Please review the serialization PR #1770 in your free time as it is quite important to move ahead.
< zoq>
toshal: Okay, will take a look soon.
xiaohong has joined #mlpack
jeffin143 has quit [Remote host closed the connection]
< zoq>
jeffin143: The ACM Summer School schedule looks really interesting.
< sumedhghaisas>
hmm... I did something similar as well, although I couldn't get a proper form for the integration
< sumedhghaisas>
although you could just focus on integrating root(G1 * G2)
< KimSangYeon-DGU>
hmm..
< sumedhghaisas>
cos (phi) is a constant here
< sumedhghaisas>
with respect to 'x'
< sumedhghaisas>
in the actual quantum realm, cos(phi) changes with 'x', but here we take it as constant
< sumedhghaisas>
there are many reasons for that
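For context, the quantity presumably being plotted and integrated here is the two-component quantum-style mixture; a sketch of the usual formulation (notation assumed, not quoted from the log):

```latex
% Two-component quantum-style mixture: G1, G2 are Gaussian densities,
% alpha1 and alpha2 are the mixing weights, and phi is the phase, treated
% as a constant with respect to x.
\[
  P(x) = \alpha_1 G_1(x) + \alpha_2 G_2(x)
       + 2\sqrt{\alpha_1 \alpha_2\, G_1(x) G_2(x)}\,\cos\phi .
\]
% The last term is the interference term, so normalizing P(x) requires
% \int \sqrt{G_1(x) G_2(x)}\, dx, which is the integral discussed above.
% With phi = pi/2 the cosine vanishes and P(x) reduces to an ordinary
% two-component GMM, i.e. no interference pattern.
```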
< KimSangYeon-DGU>
Ah,
< sumedhghaisas>
although on the side note
< sumedhghaisas>
how are the plots looking?
< KimSangYeon-DGU>
The plots of the Quantum GMM and the GMM are similar
< KimSangYeon-DGU>
but I didn't calculate the area
< KimSangYeon-DGU>
looks similar
< sumedhghaisas>
hmm... that's strange
< sumedhghaisas>
could we just plot it without any normalization?
< KimSangYeon-DGU>
Actually, I spent most of the time deriving the Gaussian integral
< sumedhghaisas>
I mean just take any values for alphas
< KimSangYeon-DGU>
Yeah
< sumedhghaisas>
and any value between -1 and 1 for cosine
< sumedhghaisas>
and plot the function without any changes
< KimSangYeon-DGU>
I'll try it
< KimSangYeon-DGU>
Yeah
< KimSangYeon-DGU>
I'll try it right now
< sumedhghaisas>
Cool. I am online. Let me know when you are done.
< KimSangYeon-DGU>
Yeah
< sumedhghaisas>
Let's see how changing the alphas and cosines changes the plot
< sumedhghaisas>
I think you need to take care of the y-axis scale when plotting
< sumedhghaisas>
but give it a go
xiaohong2 has quit [Ping timeout: 276 seconds]
< sreenik>
What is the status of inception and resnet blocks in the ann module? Are they implemented yet?
< zoq>
sreenik: For both there is an unfinished PR.
< sreenik>
Umm okay. Then I will leave them out of the TensorFlow model translator for now and include them whenever the work is complete.
< zoq>
Right, sounds reasonable.
< KimSangYeon-DGU>
sumedhghaisas: Can you check it please?
< KimSangYeon-DGU>
I'll post our google docs
< KimSangYeon-DGU>
ahh, posted
< KimSangYeon-DGU>
sumedhghaisas: The two probabilities looked different or similar depending on the parameters
< sumedhghaisas>
cool
< sumedhghaisas>
let me check
< sumedhghaisas>
hey kim
< sumedhghaisas>
KimSangYeon-DGU: Could we zoom in on the interference pattern?
< sumedhghaisas>
Nice plots though... also for the last set of alphas can we generate plots for phi 10, 20, 30, 40, 50, 60, 70, ... and so on till 180
< sumedhghaisas>
That would be a good analysis to have
< sumedhghaisas>
I think different values of phi should create different interference patterns
< KimSangYeon-DGU>
Yeah!!
< KimSangYeon-DGU>
When I'm done, will ping you
< sumedhghaisas>
cool
< KimSangYeon-DGU>
sumedhghaisas: You mean only change phi, right?
< KimSangYeon-DGU>
10, 20, 30 until 180
< KimSangYeon-DGU>
last set of alphas
< sumedhghaisas>
correct
< sumedhghaisas>
KimSangYeon-DGU: And zoom in on the interference while you are at it :)
< KimSangYeon-DGU>
Yeah!!
< KimSangYeon-DGU>
sumedhghaisas: I'm done
< KimSangYeon-DGU>
Certainly, when phi was pi/2, we didn't see the interference phenomenon.
< KimSangYeon-DGU>
sumedhghaisas: Hey, Sumedh. I should take the last subway, so I'll reconnect in an hour.
< KimSangYeon-DGU>
sumedhghaisas: You can check the plots of the probabilities with the changing phis
KimSangYeon-DGU has quit [Quit: Page closed]
KimSangYeon-DGU has joined #mlpack
< KimSangYeon-DGU>
Although I'll only be able to run experiments in an hour, feel free to ping me :)
< KimSangYeon-DGU>
I connected using my phone
sumedhghaisas has quit [Ping timeout: 256 seconds]
KimSangYeon-DGU has quit [Ping timeout: 256 seconds]
< ShikharJ>
sakshamB: toshal: If you guys are here, let's start?
< sakshamB>
yes i am here
< ShikharJ>
sakshamB: Okay, let's wait a bit for toshal then.
< ShikharJ>
Meanwhile, have you started writing blog posts?
< sakshamB>
yes I saw the mail, I will try to have it done over the weekend.
< toshal>
Okay
< toshal>
Yes I have cloned the repo
< toshal>
Gsoc2019introduction.md looks cool
< ShikharJ>
Okay, so I see that you started off with the template code for MiniBatch discrimination; I'll review that today. And Highway Networks is also mostly complete, so I'll try and merge that after some discussion I was planning to have.
< ShikharJ>
sakshamB: You might also want to push a template code for Inception Scoring, have you thought about how you're going to implement that?
< sakshamB>
the input to the inception score should be a pretrained model and a set of generated images. Since we won't be able to use the pretrained Inception model, should we train our own model on the MNIST dataset?
< ShikharJ>
sakshamB: toshal: I would insist that the blog posts be done by Sunday. Weekly updates are something we value highly.
< toshal>
ShikharJ: Yes It will be done.
< ShikharJ>
sakshamB: Hmm, couldn't we just add that as a feature which is callable from a declared model? That way, we could just provide the output of the GAN model to the callable, and tune the model from there.
KimSangYeon-DGU has joined #mlpack
< sakshamB>
ShikharJ: so if I understand correctly, you want to implement this inside the FFN class? Or where would you implement it?
< ShikharJ>
sakshamB: Yeah, either FFN or GAN, if we're being specific.
< sakshamB>
we could implement it inside FFN, but its application is specific to GAN.
< ShikharJ>
sakshamB: Could it be used for CNNs? If so, then it might be useful inside FFN, else GAN.
< sakshamB>
I was initially thinking of implementing it as a layer but I guess that doesn’t make sense since we don’t need Backward and Gradient calls
< sakshamB>
we could have it as a function in GANs and pass a pretrained model to the function
< sakshamB>
No, it shouldn't be useful for CNNs, only for generative models.
< ShikharJ>
Or we could implement it as a member function of the GAN class and call it with the output of the declared model.
jeffin143 has joined #mlpack
< sakshamB>
ShikharJ: yes that could work. Would you have to redefine it for CycleGan?
< sakshamB>
since as you mentioned that is separate from the rest of the GANs
< ShikharJ>
sakshamB: It'll have to be a rewrite if we're making it specific to GANs, otherwise if it is implemented in FFN layer, I don't think it would require a rewrite, maybe just an overload of the previously defined function.
< sakshamB>
ShikharJ: there are other metrics for GAN evaluation, and I think toshal brought up FID in his proposal, so if we plan to implement those in the future it might be better to have them separate from the GAN and FFN implementations.
< ShikharJ>
sakshamB: Okay, then let's go with that.
< ShikharJ>
Hmm, maybe we can have a subfolder as GAN_metrics or something.
< sakshamB>
yes that would be nice
< ShikharJ>
sakshamB: Okay, feel free to push a draft for that as well. By the weekend, we should have the Highway networks PR merged in and mini-batch close to ready.
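For reference, a rough sketch of the Inception Score computation itself, independent of where it ends up living in mlpack; it assumes a hypothetical matrix `preds` of classifier softmax outputs, one column per generated image:

```cpp
// Inception Score sketch: IS = exp( E_x[ KL( p(y|x) || p(y) ) ] ), where
// p(y|x) is the classifier's softmax output for a generated image x and
// p(y) is the marginal over all generated images.
#include <armadillo>
#include <cmath>

double InceptionScoreSketch(const arma::mat& preds)
{
  // Marginal class distribution p(y), averaged over generated images.
  const arma::vec marginal = arma::mean(preds, 1);

  // Mean KL divergence between each p(y|x) and p(y).
  double meanKL = 0.0;
  for (size_t i = 0; i < preds.n_cols; ++i)
  {
    const arma::vec p = preds.col(i);
    meanKL += arma::accu(p % arma::log((p + 1e-12) / (marginal + 1e-12)));
  }
  meanKL /= preds.n_cols;

  return std::exp(meanKL);
}
```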
< ShikharJ>
toshal: How is the Label Smoothing PR coming up? Did you happen to write the template code for that?
< sakshamB>
ShikharJ alright will have that done
< toshal>
shikharJ:
< toshal>
Hi
< toshal>
Yes, I am working on it. But it would be great if my serialization PR gets merged in the next two days
< toshal>
Because it contains some important changes
< ShikharJ>
toshal: Is it a requirement for the further changes in label-smoothing?
< toshal>
Yes
< ShikharJ>
Hmm, in that case, just open a draft with the code, don't worry about the builds. I'll review the Serialization PR in the meantime. I'll also have to run the tests on that.
< toshal>
Okay
< ShikharJ>
sakshamB: toshal: Anything else that you would like me to do, apart from reviewing the PRs?
< toshal>
Is there anything else remaining
< toshal>
No sir
< toshal>
I am going off now. See you soon.
< ShikharJ>
toshal: Lmao, you're the same age as me (possibly older). We're on a first-name basis now.
< sakshamB>
no, nothing right now. Will communicate through irc if I have any doubts later
konquerer has joined #mlpack
< ShikharJ>
Okay, cool. Have a fun weekend.
< konquerer>
hello all. I'm interested in performing a query involving two point sets, which is a sort of padded intersection. Given a reference set R, a query set Q, and a padding distance p, return all the points of Q whose nearest neighbor in R is at a distance less than p away. Does mlpack have a class that will do this out of the box?
< konquerer>
It's sort of a range search, but with the output lists of indices merged together
< konquerer>
It seems like I ought to be able to avoid a significant amount of duplicate work by using a Dual Tree method, traversing two binary space tree versions of the point sets and pruning any leaves of the query set whose nearest possible point in the reference set is further than the specified padding distance.
< konquerer>
so I guess my question is, has somebody already done that work, or are my only options to do something more naive with a nearest-neighbor search or implement my own tree traversal method?
< jeffin143>
Zoq : yes indeed, excited to attend the sessions
< jeffin143>
zoq, rcurtin, lozhnikov: I need some help. All our string utility functions take a vector of strings as input, but now, while trying to build CLI bindings for those functions, I am facing trouble since we can't take a vector as input.
< jeffin143>
So my plan was to take a filename as input, parse the file (simple file handling), store the data in a vector, pass that to the function, and then write the contents back to a file.
< jeffin143>
The support could be there for 3 file types: .txt, .csv, and .arff, since the first one uses \t to separate columns and the latter two use ',', so we can easily split and store the fields in a vector using any split function.
< jeffin143>
Not sure if these ideas are good.
< jeffin143>
Any input would be appreciated, so that we can build upon the binding.
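A minimal sketch of the file-handling workaround described above; `LoadColumn` is a hypothetical helper, and the delimiter/column conventions are assumptions that would need to match the real binding:

```cpp
// Read one column of a delimited file into the std::vector<std::string>
// that the string-processing utilities expect ('\t' for .txt, ',' for
// .csv/.arff, following the plan described above).
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> LoadColumn(const std::string& filename,
                                    const char delimiter,
                                    const size_t column)
{
  std::vector<std::string> result;
  std::ifstream file(filename);
  std::string line;
  while (std::getline(file, line))
  {
    std::stringstream tokens(line);
    std::string field;
    size_t index = 0;
    while (std::getline(tokens, field, delimiter))
    {
      if (index++ == column)
      {
        result.push_back(field);
        break;
      }
    }
  }
  return result;
}
```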
< akhandait>
sreenik: The residual blocks PR has already been merged.
< akhandait>
So, I guess you can include that.
vivekp has joined #mlpack
< akhandait>
Also, I sadly had to cancel my plans to travel, but the good thing is that I will probably be available most of next week.
< akhandait>
Anyways, let's have a meet tomorrow at 11 P.M. if that's okay with you.
< rcurtin>
konqueror: couldn't you just do a range search with radius p, and then if a query point has at least one reference point in that range, you keep the query point (otherwise remove it from your set)?
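A rough sketch of this range-search approach using mlpack's RangeSearch (class and method names as of mlpack 3.x, worth double-checking); `PaddedIntersection` is a hypothetical helper, not an existing mlpack function:

```cpp
// Keep every query point that has at least one reference point within
// distance p of it.
#include <mlpack/core.hpp>
#include <mlpack/methods/range_search/range_search.hpp>

using namespace mlpack;

std::vector<size_t> PaddedIntersection(const arma::mat& referenceSet,
                                       const arma::mat& querySet,
                                       const double p)
{
  range::RangeSearch<> rs(referenceSet);

  std::vector<std::vector<size_t>> neighbors;
  std::vector<std::vector<double>> distances;
  rs.Search(querySet, math::Range(0.0, p), neighbors, distances);

  std::vector<size_t> kept;
  for (size_t i = 0; i < neighbors.size(); ++i)
    if (!neighbors[i].empty())  // at least one reference point within p
      kept.push_back(i);
  return kept;
}
```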
< sreenik>
akhandait: That's perfectly fine with me. So 11pm tomorrow. Meanwhile I will take a look at the PR implementing residual blocks
< akhandait>
Okay then, see ya tomorrow.
< zoq>
akhandait: Ahh, you are right.
< akhandait>
zoq: Too many PRs to keep track of :p
< jeffin143>
zoq: is this possible, vector<vector>?
< zoq>
jeffin143: Not sure right now. Have you tested it?
< zoq>
rcurtin: Might be able to provide some insight.
< zoq>
If it's not already there, it might be a good idea to implement support for that instead of using a workaround.
KimSangYeon-DGU has quit [Ping timeout: 256 seconds]
< jeffin143>
zoq: I thought of that, but I guess I am not so good with the background of the bindings; I might have to thoroughly go through it then. Will do so now. I have used all those PARAM macros but didn't go deep into them.
< zoq>
Okay, pretty sure once Ryan has some time he can point us to the right lines.
< jeffin143>
Yeah, I will schedule a meeting with him when he is free, just to discuss this :) Will ping you too.
< zoq>
sounds good
< rcurtin>
we don't have a good way to do vector<vector>, program_options only knows how to read vector<string> or vector<int> or vector<float>, etc.
< rcurtin>
I'm not sure how we would do vector<vector<T>>, but it would be really nice to figure it out---that would allow us to finally provide range_search correctly to other languages :)
< jeffin143>
rcurtin: could we do something like vector<class>?
< jeffin143>
And the class would have, say, an int and a string, which would be the column values in the csv file?
< rcurtin>
jeffin143: I don't think we can; how would you pass a 'class' on the command line?
< rcurtin>
i.e. if we have vector<float> we do --option float1 --option float2 --option float3 ...
< rcurtin>
I'd suggest instead maybe the better way to do something like vector<class> is to write a wrapper ClassWrapper which internally holds vector<class>, and then you can use the existing serializable model support
< rcurtin>
like PARAM_MODEL_IN(ClassWrapper, ...) and PARAM_MODEL_OUT(ClassWrapper, ...)
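A sketch of that wrapper idea, assuming the boost::serialization-based model support mlpack used at the time; `StringDatasetWrapper` and the parameter names are made up for illustration:

```cpp
// Hypothetical serializable wrapper holding the vector<vector<string>>,
// so it can be passed through the existing model-parameter machinery.
#include <mlpack/core.hpp>
#include <boost/serialization/string.hpp>
#include <boost/serialization/vector.hpp>

class StringDatasetWrapper
{
 public:
  std::vector<std::vector<std::string>>& Data() { return data; }

  template<typename Archive>
  void serialize(Archive& ar, const unsigned int /* version */)
  {
    ar & BOOST_SERIALIZATION_NVP(data);
  }

 private:
  std::vector<std::vector<std::string>> data;
};

// Then, in the binding's _main.cpp, roughly:
//   PARAM_MODEL_IN(StringDatasetWrapper, ... /* id, description, alias */);
//   PARAM_MODEL_OUT(StringDatasetWrapper, ... /* id, description, alias */);
```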
< jeffin143>
Oh, so this is how a vector input is taken; so if a user has to convert 1000 strings, he has to give them as 1000 options?
< rcurtin>
I think maybe --option float1,float2,float3,... works too, don't remember
< rcurtin>
what's the end goal? is it that you want to allow the user to pass a file that's full of strings, one on each line?
< rcurtin>
(and it could be very many strings)
< jeffin143>
Yes ,
< jeffin143>
Or maybe a csv which has a particular column as a string, and he can specify that column
< rcurtin>
yeah, this happens with hmm_train_main.cpp too---the solution is not good, but basically for now it is PARAM_STRING_IN() and you specify a file that contains all the other filenames to load from
< jeffin143>
And then I can take all the rows.
< rcurtin>
that works for the command-line, but from Python it's awkward... instead of passing a list of filenames, you have to write that list of filenames to somewhere and then pass the filename where you wrote it to...
< rcurtin>
maybe can you use PARAM_MATRIX_AND_INFO_IN()? that can load a CSV with strings in it
< rcurtin>
and it will automatically map them using DatasetInfo (basically it calls the overload of data::Load() with a DatasetInfo)
< jeffin143>
I tried that, but then the loaded matrix has no strings...
< jeffin143>
Or maybe we could try to load it with an option where, instead of DatasetInfo, the user can use dictionary encoding or tf-idf or something else?
< sreenik>
Do we have LRN (local response normalization) layer support?
< zoq>
sreenik: no
< sreenik>
zoq: Okay..
< zoq>
sreenik: pretty sure this time :)
< sreenik>
Haha :)
< rcurtin>
jeffin143: yeah, the loaded matrix will not have strings, instead it will have numbers and you have to use DatasetInfo::UnmapString() and MapString() to get the strings back
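A sketch of that round trip (DatasetInfo method names from the mlpack 3.x headers; exact signatures worth double-checking):

```cpp
// Load a CSV that contains string columns; categorical values come back as
// numeric indices that DatasetInfo can map back to the original tokens.
#include <mlpack/core.hpp>
#include <iostream>

using namespace mlpack;

int main()
{
  arma::mat dataset;
  data::DatasetInfo info;
  data::Load("data.csv", dataset, info, true /* fatal on failure */);

  const size_t dim = 0;  // hypothetical string column (matrix row after load)
  if (info.Type(dim) == data::Datatype::categorical)
  {
    for (size_t i = 0; i < dataset.n_cols; ++i)
    {
      const size_t value = (size_t) dataset(dim, i);
      std::cout << info.UnmapString(value, dim) << std::endl;
    }
  }
}
```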
< jeffin143>
Ok, this information could be useful; UnmapString() and MapString(), I didn't know about them, thanks. Will give everything a try tomorrow and will check which suits best.
< rcurtin>
yeah, hopefully it helps :)
< rcurtin>
but I think it is true that this part of mlpack could be improved; it is not the prettiest
< jeffin143>
Yes, I was facing too many difficulties
< jeffin143>
I wish Armadillo had a string datatype
< rcurtin>
there's arma::field<> but I don't think that works well
< jeffin143>
That would have made my summer good :-p
< rcurtin>
:)
< jeffin143>
That brings me to ask you, why doesn't arma support strings?
< rcurtin>
it's always possible to make an auxiliary class that can hold strings or any type inside of each column, with the idea that you use this to pass the data or load it
< rcurtin>
and then that will need to be converted to a numeric matrix before doing any machine learning
< rcurtin>
the reason is that Armadillo is kind of just a wrapper around LAPACK and BLAS, for linear algebra
< rcurtin>
but linear algebra on strings doesn't make sense, so the library doesn't support it
< jeffin143>
Oh I see
< rcurtin>
i.e., what does it mean to multiply a matrix of strings with another matrix of strings? :)
< jeffin143>
Haha
< jeffin143>
True
< rcurtin>
but you are right, it makes our life difficult when we deal with real data
< jeffin143>
Thanks for all the help , off to sleep after a heavy session :-p
< jeffin143>
Good night
< rcurtin>
sounds good, talk to you later :)
< ShikharJ>
sreenik: sakshamB: toshal: jeffin143: favre49: mulx10: Did you guys get the mlpack stickers yet?
< sreenik>
Ohh I think I had forgotten to send the email. I am doing it right away :D
< sreenik>
Done
< konquerer>
rcurtin: Thank you for the reply! Yes, that would work fine. I was concerned that I would do too much unnecessary work by finding all the points in R around the points Q, when I only need to verify that one exists... but I think I'm just trying to prematurely optimize. I'll give it a shot and see if it's fast enough (which it probably is).
< rcurtin>
konqueror: might be faster to do it the other way around, where you do a 1-NN search, then simply go through the distances and see if they are less than p
< rcurtin>
in fact, I think that is likely to be faster... well... I dunno, it depends on the data really
< rcurtin>
try it and see? :)
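A sketch of the 1-NN variant with mlpack's KNN typedef (again, API names as of mlpack 3.x); `PaddedIntersectionKNN` is a hypothetical helper:

```cpp
// Find each query point's nearest neighbor in the reference set, then keep
// the query points whose nearest-neighbor distance is below p.
#include <mlpack/core.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>

using namespace mlpack;

std::vector<size_t> PaddedIntersectionKNN(const arma::mat& referenceSet,
                                          const arma::mat& querySet,
                                          const double p)
{
  neighbor::KNN knn(referenceSet);

  arma::Mat<size_t> neighbors;
  arma::mat distances;
  knn.Search(querySet, 1, neighbors, distances);

  std::vector<size_t> kept;
  for (size_t i = 0; i < distances.n_cols; ++i)
    if (distances(0, i) < p)
      kept.push_back(i);
  return kept;
}
```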
< konquerer>
rcurtin: yep, that's what I plan to do. I was thinking that I could formulate my specific problem as a dual-tree problem and create the requisite basecase and score functions to make a better algorithm... but since I'm new to these types of things I thought there might also be a standard name/implementation for what I'm trying to do that I wasn't familiar with.
< konquerer>
But until I test the more generic algorithms it doesn't make much sense to pursue something more specific
< rcurtin>
konquerer: yeah, agreed (sorry I am misspelling your nick... I am thinking of the old KDE browser :))
< rcurtin>
I think that you could write a BaseCase() and Score() for this, it wouldn't be too hard to do
< rcurtin>
basically I guess Score() can prune if for a given query point, the reference points are all farther than p
< rcurtin>
and it can *also* prune when it can be shown that there exists a reference point with distance less than p: you just mark that query point as in the result set and prune
< rcurtin>
(or that query node, if it can be shown to be true for all query points in the query node)
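A very rough sketch of that Score() logic, following the shape of mlpack's dual-tree RuleType interface; `PaddedRules` is a made-up name, and a real rules class would also need BaseCase(), Rescore(), and a TraversalInfo:

```cpp
// Pruning rule sketch: prune a node pair when it can be fully decided,
// otherwise recurse with a preference for closer reference nodes.
#include <mlpack/core.hpp>
#include <cfloat>
#include <set>

template<typename TreeType>
class PaddedRules
{
 public:
  PaddedRules(const double p, std::set<size_t>& results) :
      p(p), results(results) { }

  double Score(TreeType& queryNode, TreeType& referenceNode)
  {
    // All reference descendants are farther than p from all query
    // descendants: nothing in this branch can match, so prune.
    if (queryNode.MinDistance(referenceNode) > p)
      return DBL_MAX;

    // Every query descendant provably has a reference point within p:
    // mark them all as results and prune the recursion.
    if (queryNode.MaxDistance(referenceNode) <= p)
    {
      for (size_t i = 0; i < queryNode.NumDescendants(); ++i)
        results.insert(queryNode.Descendant(i));
      return DBL_MAX;
    }

    // Undecidable at this level; keep recursing, preferring closer pairs.
    return queryNode.MinDistance(referenceNode);
  }

 private:
  const double p;
  std::set<size_t>& results;
};
```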