verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#896 (master - 06cae13 : Ryan Curtin): The build was broken.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#897 (master - 5b8fdce : Ryan Curtin): The build passed.
travis-ci has left #mlpack []
mentekid has joined #mlpack
Mathnerd314 has quit [Ping timeout: 250 seconds]
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#899 (master - 719f47f : Ryan Curtin): The build passed.
travis-ci has left #mlpack []
nilay has joined #mlpack
< nilay> when i run make on my machine, it hangs after the following output: http://pastebin.com/JhjfN26Y
nilay has quit [Ping timeout: 250 seconds]
nilay has joined #mlpack
benchmark has joined #mlpack
benchmark has quit [Client Quit]
mentekid-mobile has joined #mlpack
< zoq> nilay: Is this the complete output or does something get cut off?
< nilay> zoq: Hello, it is the complete output.
< nilay> this happened one or two times before also.
< nilay> if i remake everything again, it works then on my personal machine. right now i am debugging on the remote machine you provided.
< zoq> nilay: hm, okay, so if you delete the build folder or run 'make clean; make' it works?
< zoq> nilay: Maybe there is something wrong with your Makefile, not sure, never seen such an error before.
< nilay> it is not even reaching up to there, the making has not started i guess
< zoq> nilay: Can you just run 'cmake ..' and does this return?
< nilay> no i can't run cmake .. even
< zoq> nilay: The same output?
< nilay> i guess i'll have to clone the repo again and do cmake ..
< zoq> nilay: Probably the easiest solution.
< nilay> and the most time consuming one
< nilay> zoq: do you know a way in which i could open more than one terminal on the remote machine?
< zoq> nilay: You can use tmux or screen you can also login as often as you like.
< nilay> zoq: ok, thanks.
mentekid-mobile has quit [Quit: Bye]
< nilay> zoq: can i do scp from my computer to remote
< zoq> nilay: yes
< nilay> zoq: ok, and for that i should write, scp -r nilay@ip-remote-host:directory-in-remote
< nilay> zoq: sorry, scp -r directory-in-personal nilay@ip-remote-host:directory-in-remote
< zoq> nilay: yes, also you need to specify the port 'scp -P 8088 ...'
< nilay> ok, i think i was missing that before
< nilay> and i think i messed up the command also. i'll correct it now
< nilay> zoq: it says permission denied
< zoq> nilay: can you post the command?
< nilay> sure
< nilay> scp -P 8088 -r ./example/ nilay@138.201.57.103:/.
< zoq> nilay: try scp -P 8088 -r ./example/ nilay@138.201.57.103:/home/nilay/
< nilay> zoq: yeah, i forgot /home/nilay, assumed the . directory would be that only.. it works now
< nilay> oops , sorry
< nilay> i thought i was pushing to my fork
< nilay> how do i undo this
< nilay> zoq: you there?
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#902 (master - 6d84f11 : nilayjain): The build has errored.
travis-ci has left #mlpack []
< zoq> hm, did you revert the last merge commit?
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#903 (master - fddfc18 : nilayjain): The build has errored.
travis-ci has left #mlpack []
< nilay> yeah i revert it now
< nilay> no problem
< nilay> it'll get resolved
< nilay> haha, sorry about that
< zoq> hm, okay, beware you can't just always rewrite the history, it's safer to make a commit which reverts the changes.
benchmark has joined #mlpack
benchmark has quit [Client Quit]
< nilay> zoq: btw, why did the build fail, this was building and running so nicely on remote
nilay has quit [Ping timeout: 250 seconds]
nilay has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#904 (master - 1f562a1 : Ryan Curtin): The build failed.
travis-ci has left #mlpack []
< nilay> zoq: so i created a PR. the last build error, from the commit i accidentally pushed, was due to no arma::ind2sub in armadillo 4.1.,
nilay has quit [Ping timeout: 250 seconds]
nilay has joined #mlpack
benchmark has joined #mlpack
benchmark has quit [Client Quit]
Mathnerd314 has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#906 (master - 5b8fdce : Ryan Curtin): The build is still failing.
travis-ci has left #mlpack []
benchmark has joined #mlpack
benchmark has quit [Client Quit]
< zoq> I think I fixed the benchmark output ... let's see
< keonkim> is there a reason why categorical features are mapped to incrementing integers instead of one hot encoding?
< keonkim> I've been struggling to implement a one-hot encoder but I can't find a way to fit it inside mlpack.
< keonkim> i mean a way to make one-hot-encoded arrays work with armadillo matrices.
< rcurtin> keonkim: there are very few algorithms that work with categorical data at all,
< rcurtin> specifically the hoeffding tree
< rcurtin> I wrote the DatasetInfo class only for the HoeffdingTree, which handles categorical data
< rcurtin> one-hot encoding is a thing we can definitely do, but the only issue is that if you do that the dataset size can get very large
< rcurtin> couldn't you write a function like OneHotEncoder::Encode(arma::Mat<eT>& input, arma::Mat<eT>& output)? I think that would fit fine
< rcurtin> but maybe I am not understanding the problem fully
< keonkim> the problem is representing the encoded data. [a, b, c, d] should be encoded to [1000, 0100, 0010, 0001].
< rcurtin> hm, so, since categorical features are mapped to incrementing integers, that is already how they are stored in binary
< rcurtin> I thought by one-hot encoding you meant that you would add a feature for each value
< rcurtin> so the dataset [[a, b, c, d]] maps to
< rcurtin> [[1, 0, 0, 0] [0, 1, 0, 0] [0, 0, 1, 0] [0, 0, 0, 1]]
< keonkim> oh yes I meant that
< keonkim> i tried with array of char to test. but not with armadillo.
< rcurtin> let the user specify... they may have datasets with some numeric features and some categorical features they want to one-hot encode
< rcurtin> and since Armadillo does not allow different data types inside of a matrix, we would want to just use 'double' there
< rcurtin> but if the function for one hot encoding is templated and accepts ElemType as a parameter, then the user can use uint8_t if they want
< keonkim> hm, I don't think I understood correctly.
< keonkim> how should I fit [[1, 0, 0, 0] [0, 1, 0, 0] [0, 0, 1, 0] [0, 0, 0, 1]] inside Mat<double> ?
nilay has quit [Ping timeout: 250 seconds]
< rcurtin> do you mean, how should you represent the numbers? you can use insert_cols() to add the columns in the right place
< rcurtin> and for the values you can just cast 0 and 1 to their double representations
< keonkim> oh I get what you mean
< keonkim> I was wondering how to use that with the original matrix.
nilay has joined #mlpack
< nilay> rcurtin: sorry about the force push, i just came to know about it now.
< rcurtin> it's okay, fortunately this time it was easy to fix
< rcurtin> zoq and I looked into the permissions that github allows, and it turns out you can disable force pushes for repositories, so we went ahead and did that
< rcurtin> since ideally nobody should be force pushing anyway, so this should help prevent accidents :)
< keonkim> Hmm.. the original matrix will still hold the incrementing integers... I will think more about it.
< nilay> i was trying to undo my push, and well didn't what manifestations the commands i used could have
< nilay> didn't know*
< rcurtin> it's okay, git is a complex tool and takes a long time to learn fully :)
< rcurtin> keonkim: I'm still not sure what you mean... if you want to operate in place, you can add the new features with insert_rows() (sorry, I misspoke earlier, it should not be insert_cols()), and then you can remove the original categorical feature with remove_rows()
< keonkim> rcurtin: ok
< rcurtin> maybe I have overlooked something, I hope what I wrote is helpful, but it's possible I am not understanding the actual problem
< keonkim> I think I was overthinking :p its clear to me now.
< rcurtin> ok, glad I could help :)
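The one-hot encoding discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not mlpack code: it uses a hypothetical `Matrix` alias over `std::vector` instead of `arma::Mat<double>` so that it stays self-contained, and the name `OneHotEncode` and its parameters are made up for the example. Rows are features and columns are points, which is why rcurtin points to insert_rows() and remove_rows() for the in-place Armadillo version. The categorical feature is assumed to already hold the incrementing integers 0..numCategories-1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for arma::Mat<double>: data[f][p] is the value of
// feature f for point p, so each row is a feature.
using Matrix = std::vector<std::vector<double>>;

// Replace the categorical feature stored in row `feature` (assumed to hold
// the values 0 .. numCategories - 1) with numCategories indicator rows.
// All other rows are copied unchanged.
Matrix OneHotEncode(const Matrix& input,
                    const std::size_t feature,
                    const std::size_t numCategories)
{
  Matrix output;
  for (std::size_t f = 0; f < input.size(); ++f)
  {
    if (f != feature)
    {
      output.push_back(input[f]); // Numeric features are kept as-is.
      continue;
    }

    // One new row per category; row c is 1.0 where the point has value c.
    for (std::size_t c = 0; c < numCategories; ++c)
    {
      std::vector<double> indicator(input[f].size(), 0.0);
      for (std::size_t p = 0; p < input[f].size(); ++p)
      {
        if (static_cast<std::size_t>(input[f][p]) == c)
          indicator[p] = 1.0;
      }
      output.push_back(indicator);
    }
  }
  return output;
}
```

With one numeric feature and one categorical feature taking 4 values, the result has 5 rows, which also shows why the dataset can grow quickly when a feature has many categories.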
< keonkim> rcurtin: I have another bigger problem. Currently, when there is a missing variable inside what is supposed to be a number feature, the missing variable is converted to 0 and DatasetInfo changes it to categorical feature.
< keonkim> so it becomes impossible to track missing value after mapping.
< keonkim> we talked about how to redesign it, but I cannot come up with a new strategy :(
< rcurtin> the issue is, what do we take to represent a missing variable?
< rcurtin> sometimes this can be the string "NULL", sometimes this can just be a lack of anything (like "5, , 7" in a CSV)
< rcurtin> so I think the user needs to be able to specify what they consider to be a missing value
< rcurtin> I suppose we could modify the behavior of DatasetInfo to map certain strings not to categorical features but instead to a specifically chosen value to represent missing values (like NaN for doubles)
< rcurtin> or actually I guess that modification would be for data::Load()
< keonkim> maybe specifying it while loading can make it work
< keonkim> yup and I was thinking while loading the data::Load can pass the specified missing variable to MapString to make it NaN.
< keonkim> I tested with just "".
< rcurtin> let's see what tham thinks, I wonder if DatasetInfo is the right thing to use for the imputer
< rcurtin> I think if we just modified data::Load() to have two more parameters, like a string (or set of strings) that should map to a certain value, and then that value (used to represent missing values)
< rcurtin> then you could just load a matrix that did no mapping except, e.g., "" to NaN
< rcurtin> then your imputer functions are easy since they just need to look for NaN
< rcurtin> I dunno, do you think that would work? the issue with DatasetInfo is that it is made for encoding categorical features, but if you try to encode NaNs in a feature that's mostly doubles, then it will end up mapping all of the doubles
< rcurtin> and that could be a huge number of values to map, so it would be very slow
tsathoggua has joined #mlpack
tsathoggua has quit [Client Quit]
< keonkim> hmm that way data::Load should always take double type matrix. or provide different strategy for integer matrix right?
< rcurtin> yeah, if the user can specify the value that a missing value should be mapped to, it is no problem
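A rough sketch of what an imputer could look like once the missing value is user-specified, as suggested above. The names `IsMissing` and `Impute` are hypothetical and this is not the design that was settled on in the channel. The element type is a template parameter so the same code covers a double matrix with NaN as the marker and an integer matrix with some chosen sentinel.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical check: is `value` the user-specified missing marker?
// NaN never compares equal to itself, so a NaN sentinel is handled separately.
// For integer types the cast to double never yields NaN.
template<typename T>
bool IsMissing(const T value, const T sentinel)
{
  if (std::isnan(static_cast<double>(sentinel)))
    return std::isnan(static_cast<double>(value));
  return value == sentinel;
}

// Hypothetical imputer: overwrite every missing entry of one feature with
// `replacement` and return how many entries were changed.
template<typename T>
std::size_t Impute(std::vector<T>& feature,
                   const T sentinel,
                   const T replacement)
{
  std::size_t replaced = 0;
  for (T& value : feature)
  {
    if (IsMissing(value, sentinel))
    {
      value = replacement;
      ++replaced;
    }
  }
  return replaced;
}
```

For example, an integer feature could use -1 as the sentinel, while a double feature would pass NaN.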
< rcurtin> I think, unfortunately, that our loading needs are becoming too complex to keep using Armadillo's load functionality
< rcurtin> and instead maybe we will have to switch to using boost::spirit or something like that, to handle situations like this
< rcurtin> I don't like maintaining our own loading code, but maybe there is no alternative here
< rcurtin> keonkim: your tutorial for VS2015 is really, really nice! do you mind if I link to it in the mlpack docs and wiki?
< keonkim> rcurtin: thank you :) and sure I don't mind
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#907 (master - da7a2c0 : Ryan Curtin): The build was fixed.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#908 (master - 56371c8 : Ryan Curtin): The build was fixed.
travis-ci has left #mlpack []
nilay has quit [Ping timeout: 250 seconds]
< rcurtin> zoq: tham: did I miss any C++11 features we are using? https://github.com/mlpack/mlpack/commit/20f8bc08afb6f4a445d07dc95896625fde552507
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#909 (master - 8068530 : Ryan Curtin): The build was fixed.
travis-ci has left #mlpack []
< zoq> rcurtin: cxx_nullptr cxx_noexcept cxx_static_assert cxx_variadic_templates
< zoq> rcurtin: I don't think we need nullptr, noexcept or static_assert, but they're used at some points.
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#910 (master - 1dc6883 : Ryan Curtin): The build passed.
travis-ci has left #mlpack []
< rcurtin> zoq: great, thanks. ck.mitted a fix with those new feature requirements
< rcurtin> *committed, maybe I am not the best at phone typing :)
marcosirc has joined #mlpack
benchmark has joined #mlpack
benchmark has quit [Client Quit]
< rcurtin> nice, pretty significant speedups for LSH
< lozhnikov> rcurtin: I am testing RectangleTree without the dataset variable. It seems I have to do the refactoring for all tree traversals and all pruning rules and base cases. Tell me if I am mistaken.
< mentekid> rcurtin: nice, is this from the second hash table?
< rcurtin> unfortunately I think you are right, but fortunately I think the refactoring is straightforward
< rcurtin> mentekid: yeah, I think so
< mentekid> cool :D
< rcurtin> er wait no it is from the hybrid search
< rcurtin> lozhnikov: basically we will need to make the Rules classes stop holding references to the dataset and instead always use node->Dataset()
< rcurtin> but it should be as simple as that and it should be easy to test
< mentekid> ah. Still cool :)
< rcurtin> if you want I can do that refactoring but it may be a few days
< lozhnikov> I'll do the refactoring since the R tree doesn't work without that.
< rcurtin> okay, thanks. like I said earlier, the changes you have made are great, the RectangleTree code is in *much* better shape now
< rcurtin> :)
< lozhnikov> thanks:)
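The refactoring described above (Rules classes no longer holding a dataset reference and reading points through the node instead) might look roughly like this. All types here are hypothetical stand-ins written for the sketch; they are not mlpack's RectangleTree or its actual rules classes, and the only name taken from the discussion is the `Dataset()` accessor.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for a point and a dataset.
using Point = std::vector<double>;
using Data = std::vector<Point>;

// Hypothetical tree node that knows which dataset its indices refer to.
struct Node
{
  const Data* dataset;              // Dataset the point indices refer to.
  std::vector<std::size_t> points;  // Indices of the points in this node.

  const Data& Dataset() const { return *dataset; }
};

// Hypothetical rules class with no dataset member: every base case looks the
// reference point up through the node it was given.
struct Rules
{
  // Euclidean distance between `query` and the i-th point held by `node`.
  double BaseCase(const Point& query,
                  const Node& node,
                  const std::size_t i) const
  {
    const Point& reference = node.Dataset()[node.points[i]];
    double sum = 0.0;
    for (std::size_t d = 0; d < query.size(); ++d)
      sum += (query[d] - reference[d]) * (query[d] - reference[d]);
    return std::sqrt(sum);
  }
};
```

Since the rules object stores nothing about the data, it keeps working if the tree itself no longer owns a single dataset variable.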
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#911 (master - c0d0563 : Ryan Curtin): The build passed.
travis-ci has left #mlpack []
< zoq> I think it would be a good idea to replicate armadillo's deprecated functionality, instead of adding a comment.
< rcurtin> hmm is that with __attribute__((deprecated))?
< rcurtin> if that is portable I agree we should do that but I did not know if there was a portable solution for that
< zoq> yes, it uses __attribute__((deprecated)) https://gist.github.com/zoq/a97d0da26ce5231772e68411dfecfdab
< zoq> should work with gcc, clang and windows
< rcurtin> we could just #define deprecated arma_deprecated and use 'deprecated'
< rcurtin> maybe there is a better word, but I think 'arma_deprecated' would be weird to use directly since that's from a dependency and not mlpack itself
< zoq> yes, right
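A portable deprecation macro along the lines discussed above could be sketched as follows. The macro name `EXAMPLE_DEPRECATED` and the functions are hypothetical; this is neither Armadillo's arma_deprecated definition nor the contents of the linked gist, only an assumption about what such a macro typically looks like for gcc, clang and MSVC.

```cpp
// Hypothetical macro: expands to the compiler-specific deprecation marker,
// or to nothing on compilers that are not recognized.
#if defined(_MSC_VER)
  #define EXAMPLE_DEPRECATED __declspec(deprecated)
#elif defined(__GNUC__) || defined(__clang__)
  #define EXAMPLE_DEPRECATED __attribute__((deprecated))
#else
  #define EXAMPLE_DEPRECATED
#endif

// The function users should call.
int Sum(const int a, const int b)
{
  return a + b;
}

// Old name kept for compatibility. Code that calls it still compiles, but
// the compiler emits a deprecation warning at each call site.
EXAMPLE_DEPRECATED int OldSum(const int a, const int b)
{
  return Sum(a, b);
}
```

Compared with a comment, the warning shows up in the build output of whoever still uses the old function.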
< rcurtin> do you want to apply those changes? I have a lot of other things to fix this week :)
< zoq> I guess I could do it, or I could also open a new issue. Not sure right now.