#mlpack on 2014-06-10 — irc logs at libera.irclog.whitequark.org

2014-05-21 16:24 naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

02:03 sumedh_ has joined #mlpack

02:07 sumedhghaisas has quit [Ping timeout: 265 seconds]

02:52 < sumedh_> naywhayare: Hey ryan, have time??

03:32 < naywhayare> sumedh_: sure, go ahead

03:33 < sumedh_> Ohh sorry I solved it... There was some error in CMake file I was trying to create...

03:34 < sumedh_> By the way I ran svn commit yesterday...

03:34 < sumedh_> It showed me lines like sending files and all...

03:35 < sumedh_> I have to run svn up to update the files online right??

03:41 < naywhayare> your commit went through fine

03:41 < naywhayare> you should be doing svn up regularly anyway, because we have a lot of people committing to the repo

03:41 < sumedh_> I think I have made a mistake in amf.hpp :(

03:42 < sumedh_> not a big one...

03:42 < sumedh_> did my commit build okay??

03:44 < jenkins-mlpack> Starting build #1939 for job mlpack - svn checkin test (previous build: SUCCESS)

03:44 < naywhayare> for some reason jenkins didn't start the job

03:44 < naywhayare> so I just started it now

03:44 < sumedh_> ohh.... can you terminate it???

03:45 < sumedh_> I will make a quick fix...

03:45 Anand_ has joined #mlpack

03:46 < jenkins-mlpack> Project mlpack - svn checkin test build #1939: FAILURE in 2 min 11 sec: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1939/

03:46 < jenkins-mlpack> sumedhghaisas: * Modified AMF module so that now it uses tolerance checking rather

03:46 < jenkins-mlpack> than minResidue checking

03:46 < jenkins-mlpack> * Added SVD batch learning

03:47 < sumedh_> naywhayare: there is a mistake in amf.hpp

03:47 < sumedh_> in include files ...

03:47 < sumedh_> prefix "mlpack/methods" has to be added... and build will be fine...

03:56 < naywhayare> ok, then fix it and check that in

03:56 < naywhayare> you should always check the build in your own local build environment before you commit anything :)

03:59 < sumedh_> Okay done... I did... but I added search directory... which I forgot to remove... I sometimes do that for testing... :)

04:04 < jenkins-mlpack> Starting build #1940 for job mlpack - svn checkin test (previous build: FAILURE -- last SUCCESS #1938 9 hr 51 min ago)

04:42 sumedh_ has quit [Ping timeout: 276 seconds]

04:46 < jenkins-mlpack> Yippie, build fixed!

04:46 < jenkins-mlpack> Project mlpack - svn checkin test build #1940: FIXED in 42 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1940/

04:46 < jenkins-mlpack> sumedhghaisas: * fixed include error

06:17 Anand_ has quit [Ping timeout: 246 seconds]

09:45 Anand_ has joined #mlpack

11:05 Anand_ has quit [Ping timeout: 246 seconds]

12:26 andrewmw94 has joined #mlpack

12:50 oldbeardo has joined #mlpack

13:02 oldbeardo has quit [Ping timeout: 246 seconds]

13:03 oldbeardo has joined #mlpack

13:03 < oldbeardo> naywhayare: I tried what you suggested, but it doesn't work out

13:04 < oldbeardo> that's because most columns contain just one rating, and their relative cosines become almost 1

13:06 < oldbeardo> however, I tried out another dataset present in the trunk, namely 'test_data_3_1000.csv'

13:06 < oldbeardo> it works okay for that, and converges in just two iterations, it may be a good choice for the test

13:21 oldbeardo has quit [Quit: Page closed]

13:32 sumedh_ has joined #mlpack

14:49 Anand_ has joined #mlpack

14:52 oldbeardo has joined #mlpack

15:09 < naywhayare> oldbeardo: ok, using test_data_3_1000.csv is just fine

15:10 oldbeardo_ has joined #mlpack

15:11 < oldbeardo_> naywhayare: sorry about that

15:11 < naywhayare> oldbeardo_: no problem, did you get the message I sent?

15:12 < oldbeardo_> naywhayare: yes, I will make that test for QUIC-SVD, what about trees?

15:12 < oldbeardo_> CosineTree I mean

15:12 < naywhayare> I've been thinking about it but I don't have a complete solution yet

15:13 < oldbeardo_> okay, even the decreasing error thing won't work

15:13 < oldbeardo_> because it does not decrease strictly

15:13 oldbeardo has quit [Ping timeout: 246 seconds]

15:14 < naywhayare> alright; I will keep that in mind

15:14 < oldbeardo_> well, the actual error does, but the estimate does not

15:16 < naywhayare> ok. fortunately, for testing, we can compare with the actual error

15:18 < oldbeardo_> right, if the matrix is small enough

15:22 < oldbeardo_> naywhayare: you were also going to think about merging the two classes, any luck with that?

15:23 < naywhayare> I haven't had time to approach that yet

15:24 < oldbeardo_> okay

15:49 oldbeardo_ has quit [Quit: Page closed]

16:09 govg has joined #mlpack

16:25 govg has quit [Ping timeout: 245 seconds]

17:06 govg has joined #mlpack

17:06 govg has quit [Changing host]

17:06 govg has joined #mlpack

17:12 < Anand_> Marcus : I have added the metrics to scikit. I need your help in changing the RunMethod function as you did in mlpack

17:12 < Anand_> I have added the RunMetrics function

17:15 < Anand_> Also modified RunMethod a bit but it won't work. It needs to be changed

17:20 govg has quit [Ping timeout: 260 seconds]

17:28 Anand_ has quit [Ping timeout: 246 seconds]

17:28 < marcus_zoq> Anand: Hello, okay, the current version is on github?

17:37 oldbeardo has joined #mlpack

17:38 < oldbeardo> naywhayare: hey, just saw your mail, these tests seem quite comprehensive to me!

17:38 < oldbeardo> anyway, I will start working on this and get back in 2-3 days

17:52 < naywhayare> okay, thanks. let me know if you need any help

17:53 < oldbeardo> well, I can think of one question right now

17:54 < oldbeardo> my MGS uses the basis vectors already stored in the class object, how can I test it?

18:05 sumedh_ has quit [Ping timeout: 260 seconds]

18:07 < naywhayare> oldbeardo: yeah, I wasn't sure about that. I think maybe moving MGS to a standalone function may be a good solution

18:08 < oldbeardo> naywhayare: you mean somewhere in core math?

18:08 < naywhayare> yeah, maybe in lin_alg.hpp

18:08 < naywhayare> seems like a good fit to me

18:09 < oldbeardo> it would be, but only if there is a way to remove columns from a matrix in armadillo

18:09 < oldbeardo> because the CosineTree construction involves removing basis vectors as well

18:10 < oldbeardo> and from the results of my Google search I don't think there is a way to do that

18:15 < oldbeardo> okay, sorry, that was really stupid

18:15 < oldbeardo> we can always overload

18:24 < naywhayare> sorry, I stepped out

18:24 < naywhayare> there is shed_rows(), shed_cols()

18:25 < oldbeardo> oh great, that solves it then

18:29 < oldbeardo> oh no, it does not

18:30 < oldbeardo> naywhayare: how do I keep an account of which column to remove

18:30 < oldbeardo> since removing one column will affect the indices of other basis vectors
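
Note on the index-shift concern: when several columns have to go at once, removing from the highest index down keeps the remaining indices valid. The following is a minimal sketch, assuming a plain std::vector of columns as a stand-in for the arma::mat basis; RemoveColumns is a hypothetical helper, not an Armadillo or mlpack function, and it does not address the separate question of which node owns which column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for one column of the basis matrix (assumption: no Armadillo here).
using Column = std::vector<double>;

// Hypothetical helper: remove the columns at the given indices.
// Indices are processed from highest to lowest, so erasing one column
// never shifts the position of a column that is still waiting to be erased.
inline void RemoveColumns(std::vector<Column>& columns,
                          std::vector<std::size_t> indices)
{
  std::sort(indices.begin(), indices.end(), std::greater<std::size_t>());
  // Drop duplicate requests so no column is erased twice.
  indices.erase(std::unique(indices.begin(), indices.end()), indices.end());

  for (const std::size_t idx : indices)
  {
    assert(idx < columns.size());
    columns.erase(columns.begin() + static_cast<std::ptrdiff_t>(idx));
  }
}
```

For example, removing indices {1, 3} from five columns leaves the original columns 0, 2 and 4, in that order.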

18:31 < naywhayare> I don't see what the issue is

18:32 < naywhayare> if you move ModifiedGramSchmidt() to lin_alg.hpp and add a parameter for the existing basis, then you just call ModifiedGramSchmidt(basis, node->Centroid(), newBasisVector) in CosineTree::AddToBasis()

18:33 < oldbeardo> no, that's not where the problem is

18:33 sumedh_ has joined #mlpack

18:34 < oldbeardo> look at the CosineTree condition, there is an if condition

18:34 < oldbeardo> *CosineTree constructor

18:34 < naywhayare> oh, ok, I see what the problem is

18:35 < naywhayare> because the basis is constantly growing and shrinking in size, I think you're better off holding a std::vector<arma::vec&>

18:35 < naywhayare> to represent the basis vectors

18:36 < naywhayare> instead of an arma::mat that's constantly changing size

18:36 < naywhayare> this way adding and removing elements doesn't cost an entire resize of the basis matrix but instead just removing a reference from the std::vector
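
Note: std::vector<arma::vec&> as typed above would not compile, because standard containers cannot hold references. The usual substitutes are pointers or std::reference_wrapper. The following is a minimal sketch of the idea under that substitution, assuming std::vector<double> as a stand-in for arma::vec; BasisRefs and RemoveBasis are hypothetical names, not part of mlpack.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for arma::vec so the sketch builds without Armadillo.
using Vec = std::vector<double>;

// Non-owning handles to basis vectors that live elsewhere (for example,
// inside tree nodes). std::vector<Vec&> itself would not compile.
using BasisRefs = std::vector<std::reference_wrapper<const Vec>>;

// Hypothetical helper: drop the i-th basis vector from the set.
// Only the pointer-sized wrappers after position i are shifted;
// no vector elements are copied, whatever the dimensionality.
inline void RemoveBasis(BasisRefs& basis, std::size_t i)
{
  assert(i < basis.size());
  basis.erase(basis.begin() + static_cast<std::ptrdiff_t>(i));
}
```

A caller would fill the set with basis.push_back(std::cref(v)) and must keep each v alive for as long as the handle is in use.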

18:37 < oldbeardo> right, but should that be the default data structure we expect in MGS?

18:38 < naywhayare> in this case, I think yes

18:38 < naywhayare> so maybe it's inappropriate to put it in lin_alg.hpp, but it at least doesn't need to be part of CosineTree

18:38 < naywhayare> and as a result it can be tested separately

18:39 < oldbeardo> so where do we put it?

18:39 < naywhayare> I dunno, pick somewhere :) cosine_tree_util.hpp ?

18:40 < naywhayare> we could put it in lin_alg if you want, but you're right that std::vector<arma::vec&> is a weird thing to accept for the basis matrix and is very specific to CosineTree

18:40 < oldbeardo> wait, before that, won't we face the same problem in std::vector<arma::vec&>

18:41 < oldbeardo> and if we are looking at a particular solution, I think the existing one does the job

18:41 < naywhayare> no, because you don't need the considerBasis object anymore

18:42 < oldbeardo> no, I mean the referencing

18:42 < oldbeardo> how do we know which element to remove?

18:43 < oldbeardo> the same if condition problem

18:49 < naywhayare> isn't the element to be removed always going to be the first basis vector?

18:50 < naywhayare> no, hang on, I don't think that's necessarily true because of the priority queue structure

18:51 < oldbeardo> yup, exactly

19:05 < naywhayare> I think we can make some modifications to guarantee that the element to be removed is the first basis vector

19:05 sumedh_ has quit [Ping timeout: 260 seconds]

19:07 < naywhayare> actually, maybe we can do this entirely differently

19:08 < naywhayare> you could modify CosineNode so that it also stored its associated orthonormalized basis vector

19:08 < naywhayare> then, when you call ModifiedGramSchmidt(), pass the entire CosineNodeQueue

19:08 < naywhayare> and loop over the queue, obtaining the orthonormalized basis vector for each entry in it

19:09 < oldbeardo> well, we could, but what advantage would it give over the existing implementation?

19:09 < naywhayare> speed

19:09 < naywhayare> the current implementation uses join_rows()

19:10 < naywhayare> each call to join_rows() costs an allocation and copy equal to the size of the entire basis matrix

19:10 < naywhayare> for large dimensions or large numbers of basis vectors, this can get quite costly

19:10 < oldbeardo> ahh, I didn't know it copied the whole thing

19:10 < naywhayare> yeah, unfortunately there's no other way to do it because there's never a guarantee that you can resize a block of memory to be arbitrarily larger

19:11 < naywhayare> removing the considerBasis object should provide some additional speedup

19:11 < oldbeardo> right, but that's how std::vector works right?

19:11 < oldbeardo> by resizing that is

19:11 < naywhayare> right

19:11 < naywhayare> but holding a std::vector<arma::vec&> means that the total cost doesn't depend on the dimensionality -- just the number of bases

19:11 < naywhayare> *the number of basis vectors

19:12 < naywhayare> because the vector only holds references, not the vectors themselves

19:12 < naywhayare> whereas an arma::mat has to guarantee that all of the vectors it holds are located contiguously in memory

19:12 < oldbeardo> got it

19:12 < naywhayare> we can't avoid some dependence on the number of basis vectors, though

19:12 < naywhayare> an insertion into the priority queue will have cost related to the number of basis vectors

19:13 < naywhayare> so there's no way around that. but the idea of storing an orthonormalized basis vector in a cosine node means that we only have to manage one priority queue

19:13 < naywhayare> although storing that vector in the cosine node has some additional memory cost, it shouldn't be significant since it's only one vector per node

19:14 < oldbeardo> it does not add any cost, since we are storing that vector anyway

19:15 < naywhayare> ah, you're right

19:16 < oldbeardo> thanks Ryan, I will try to complete this soon

19:16 < naywhayare> sure. I'm sorry I misunderstood the problem with my initial ideas and it took so long to come up with something better

19:17 < oldbeardo> one more thing, if we are going with this, I think it will be a good idea to retain MGS as a part of the CosineTree class

19:18 < oldbeardo> which will again make it difficult to test, wow, this is crazy

19:20 < naywhayare> what advantage do you get by retaining it as part of the class?

19:20 < naywhayare> you still need to pass the CosineNodeQueue because that's not a member of the CosineTree class

19:21 < naywhayare> so the arguments end up the same. you can leave it in the CosineTree class, that's just fine; but I don't see how that'll be more difficult to test

19:21 < naywhayare> since you should be able to just make some fake CosineNodeQueue that holds lots of fake CosineNode objects that hold random basis vectors, then pass that to the MGS function
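
Note: the test setup described here could look roughly like the sketch below. It is not the mlpack implementation. It assumes std::vector<double> in place of arma::vec, a plain std::vector in place of CosineNodeQueue, and a hypothetical FakeCosineNode that holds nothing but its orthonormalized basis vector. The argument order follows the call mentioned earlier in the discussion, ModifiedGramSchmidt(basis, centroid, newBasisVector); the bool return value and the tolerance parameter are additions for the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in for CosineNode: only the stored basis vector matters.
struct FakeCosineNode
{
  Vec basisVector;
};

inline double Dot(const Vec& a, const Vec& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += a[i] * b[i];
  return sum;
}

// Orthonormalize `centroid` against the basis vectors stored in `queue`
// (assumed to be orthonormal already). Returns false when the centroid
// lies numerically in their span, so no new direction can be produced.
inline bool ModifiedGramSchmidt(const std::vector<FakeCosineNode>& queue,
                                const Vec& centroid,
                                Vec& newBasisVector,
                                const double tolerance = 1e-12)
{
  newBasisVector = centroid;
  for (const FakeCosineNode& node : queue)
  {
    // Modified variant: project the running residual, not the original.
    const double projection = Dot(node.basisVector, newBasisVector);
    for (std::size_t i = 0; i < newBasisVector.size(); ++i)
      newBasisVector[i] -= projection * node.basisVector[i];
  }

  const double norm = std::sqrt(Dot(newBasisVector, newBasisVector));
  if (norm <= tolerance)
    return false;

  for (double& x : newBasisVector)
    x /= norm;
  return true;
}
```

A test can then fill the queue with known orthonormal vectors, call the function, and check that the result has unit norm and a zero dot product with every stored vector.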

19:21 < oldbeardo> yeah, that can be done

19:22 < oldbeardo> just for correction, CosineNodeQueue is a part of the CosineTree class

19:23 < naywhayare> yeah, you are right, so it'd be easier to hold MGS in the CosineTree class; but the actual queue used for building isn't a member variable, so you'll need to pass it

19:23 < oldbeardo> right, that solves it then

19:24 < oldbeardo> thanks again, I'll see you tomorrow

19:25 < naywhayare> sure, talk to you later

19:26 oldbeardo has quit [Quit: Page closed]

19:57 < naywhayare> andrewmw94: so I went reading about ending sentences with prepositions, and wow, this is a much deeper subject than I thought

19:58 < naywhayare> I think I'll stick to C++ and avoid English...

20:01 < andrewmw94> haha

20:01 < andrewmw94> As Churchill said: "This is the sort of nonsense, up with which I shall not put."

20:02 < naywhayare> hah

20:28 < andrewmw94> I think I'm going to need to add some constructors for making empty nodes for use in the split. I don't think we want them to be public, since the user shouldn't need them. I see a few possible solutions: I could just use the current constructor but pass an empty matrix. I could make the new ones, and add the SplitType class to the RectangleTree class so it can use the private constructors, or we could have a namespace for me to

20:34 < naywhayare> message clipped after "have a namespace for me to "

20:36 < naywhayare> (aside: http://tools.ietf.org/html/rfc2812 -- section 2.3: messages shall not exceed 512 characters in length; I'm surprised your client doesn't auto-wrap to multiple messages, but I don't think mine (irssi) does either)

20:43 < andrewmw94> I think I'm going to need to add some constructors for making empty nodes for use in the split. I don't think we want them to be public, since the user shouldn't need them. I see a few possible solutions: I could just use the current constructor but pass an empty matrix.

20:43 < andrewmw94> I could make the new ones, and add the SplitType class to the RectangleTree class so it can use the private constructors, or we could have a namespace for me to use and a separate one to hide these constructors from the end user? Any thoughts on which solution is best?

20:45 < naywhayare> do you think there is a particularly compelling reason to hide that constructor from the user?

20:46 < naywhayare> or, do we have some way to sidestep the problem so that you never need to make an empty node?

20:48 < andrewmw94> I think making an empty node is going to be necessary since the tree grows upwards so to speak. I don't really see any reason it NEEDS to be hidden from the user, but I might find it confusing to have constructors that I'm not supposed to use.

20:49 < andrewmw94> I don't know if that's common in C++. In Java, you can get around this by not specifying public or private, and then it's only accessible from the same package

20:52 < naywhayare> fascinating, I didn't know that about Java

20:54 < naywhayare> so the situation where this occurs is when all of the levels overflow and you need to add a new root?

20:54 < naywhayare> or does this happen when any level overflows?

20:54 < andrewmw94> when the root overflows

20:55 < andrewmw94> that's another question I had, the root can change, so the constructor won't necessarily return a pointer to the root node. Should I add another argument--a pointer that can be set to the root node--or is there a way to have the constructor return an address other than its own?

20:56 < naywhayare> hm, I don't think we can overload the constructor's return value (or really the return value of the new operator) and even if we can that's pretty hacky

20:57 < naywhayare> right now I'm trying to think of if there is a way that we can avoid creating a node above the current node, and instead do some modifications to the current node and work with its children

20:58 < naywhayare> i.e. instead of making node A have parent newNode, make node A the parent of newNode (where newNode = A), and then modify the bounds of node A

20:58 < andrewmw94> but doesn't that mean we will have a lot of extra copying?

20:59 < andrewmw94> it might not matter since points are only inserted once though

20:59 < naywhayare> you're right; let me think about that for a second

20:59 < naywhayare> copy cost will be sizeof(RectangleTree) + children.size()

21:03 < naywhayare> if that strategy is only used for the root node, then the maximum number of copies is equal to the final depth of the tree

21:04 < andrewmw94> I think it will have to be used for all nodes

21:04 < naywhayare> for other nodes you can modify parent->children[i]

21:04 < naywhayare> where parent->children[i] is the node that you're replacing with a new node

21:10 < naywhayare> I'm trying to understand the problem better to come up with a better solution

21:10 < andrewmw94> yeah.

21:12 < naywhayare> right now, the RTreeSplit class (and the other split policies) appear to propagate changes up the tree

21:12 < andrewmw94> yeah. That's how they have it in the paper.

21:12 < naywhayare> I think we might have an easier time if, instead, the RTreeSplit policy class simply turns an overfilled node into two nodes (regardless of whether or not it's a leaf)

21:12 < naywhayare> and we let the constructor do the propagation

21:13 < naywhayare> hmm... hang on. let me think about how this might change the way RTreeSplit is designed

21:13 < andrewmw94> would that have to be done at the end though? Because some of the algorithms might get too slow if we first split nodes and then try to build the tree

21:15 < naywhayare> yeah, I was assuming the splitting is done after the point is added

21:16 < naywhayare> I was following the idea of RectangleTree::InsertPoint(), rectangle_tree_impl.hpp:94

21:17 < naywhayare> ok, I think an idea has congealed a little better

21:17 < naywhayare> the idea is, remove the call to splitNode() from line 94. a leaf node doesn't check if it needs to be split

21:18 < naywhayare> instead, its parent checks if it needs to be split, after line 109, by adding something like

21:18 < naywhayare> SplitNode(children[bestIndex])

21:18 < naywhayare> because children[bestIndex] is the only child that could possibly need to be split

21:19 < andrewmw94> what would we do for the first node that doesn't have a parent?

21:19 < naywhayare> if SplitNode() does split children[bestIndex], we find some way to add its new child to the children vector, then we return (not if we're the root -- I'll get to that momentarily; for now, assume we're not the root)

21:19 < naywhayare> ok, then let's jump ahead to the root case :)

21:20 < naywhayare> but I think to do that we sort of need to define a new SplitNode() API

21:21 < naywhayare> you can revise this if you like, but basically let's suppose we have SplitNode(a, b), where a is the node to be split; if a needs to be split, the function returns true and { a, b } are the new two nodes; otherwise, a doesn't change and b can be ignored

21:22 < naywhayare> so if we're the root node, we can call SplitNode(*this, b); if the function returned true, then we have to make a copy of *this, do children.clear(), then children.push_back(copy of *this), children.push_back(b)

21:22 < naywhayare> probably have to update the bound and other miscellany too, but I think that can be done quickly
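
Note: the root case of the proposed SplitNode(a, b) API could be sketched as follows. This is an illustration only. Node is a hypothetical stand-in for RectangleTree that stores integer point ids, it handles only a leaf root, the split simply cuts the point list in half rather than applying any R-tree heuristic, and the maxPoints parameter and SplitRootIfNeeded are additions for the sketch. Moving the root's contents into a new child keeps the root object at its original address.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a RectangleTree node.
struct Node
{
  std::vector<int> points;                      // point ids held by a leaf
  std::vector<std::unique_ptr<Node>> children;  // owned children
};

// If leaf `a` holds more than maxPoints points, move the second half of
// them into `b` and return true. Otherwise leave both nodes untouched.
// Cutting in the middle is a placeholder for a real split heuristic.
inline bool SplitNode(Node& a, Node& b, const std::size_t maxPoints)
{
  assert(a.children.empty());
  if (a.points.size() <= maxPoints)
    return false;

  const auto mid =
      a.points.begin() + static_cast<std::ptrdiff_t>(a.points.size() / 2);
  b.points.assign(mid, a.points.end());
  a.points.erase(mid, a.points.end());
  return true;
}

// Root case: the root object must stay where it is, so its old contents
// are moved into a fresh child and the root becomes the parent of both halves.
inline void SplitRootIfNeeded(Node& root, const std::size_t maxPoints)
{
  Node sibling;
  if (!SplitNode(root, sibling, maxPoints))
    return;

  auto left = std::make_unique<Node>(std::move(root));
  auto right = std::make_unique<Node>(std::move(sibling));

  // The moved-from root is reset explicitly before it takes on its new role.
  root.points.clear();
  root.children.clear();
  root.children.push_back(std::move(left));
  root.children.push_back(std::move(right));
}
```

Because the point lists are moved rather than copied, the cost of the root split in this sketch does not grow with the number of points; updating bounds is left out.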

21:22 < naywhayare> do you think this approach makes sense?

21:23 < andrewmw94> I suspect it's faster to have a pointer passed whenever we do something that could change the root, and have the pointer updated to the new root if necessary

21:24 < naywhayare> well but the thing is that the memory address of the root can't ever change; otherwise, when we type 'RectangleTree* r = new RectangleTree(dataset);', we end up with r pointing to somewhere in the middle of the tree

21:25 < naywhayare> the location in memory that corresponds to the root is set as soon as the constructor is called, unfortunately

21:26 < naywhayare> (this makes more sense when a pointer isn't used during construction: 'RectangleTree r(dataset);')

21:26 < andrewmw94> yeah. But I'm concerned with swapping nodes around. I suspect that we would have to copy all of the child points. I currently have each leaf node storing its child points, so there's no "central" matrix for storing those

21:27 < naywhayare> right, I remember that

21:28 < naywhayare> crap, my ride is here, so I have to go

21:28 < naywhayare> but I think that because you're storing a reference or pointer to the dataset, it doesn't need to be a deep copy of all the elements in the matrix

21:28 < andrewmw94> if we had something like class RTree{ RTreeNode* root};

21:28 < andrewmw94> oh. See you later

21:28 < naywhayare> I'll spend some time thinking about this and hopefully have a better solution later tonight or tomorrow

21:29 < andrewmw94> sounds good
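
Note: the alternative raised just before the discussion ended, class RTree{ RTreeNode* root};, can be sketched as a thin handle that owns the root. This is a hypothetical illustration rather than a design anyone settled on: the member names and GrowRoot() are made up for the sketch, and std::unique_ptr replaces the raw pointer so that ownership is explicit. Users hold the RTree object, so the node it points to is free to change.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node type; only the child links matter for this sketch.
struct RTreeNode
{
  std::vector<std::unique_ptr<RTreeNode>> children;
};

// Thin handle that owns the root. The address of the RTree object is
// stable for its users even though the root node is replaced over time.
class RTree
{
 public:
  RTree() : root(std::make_unique<RTreeNode>()) { }

  // Grow the tree upward: a new root adopts the old root as its child.
  // No node contents are copied; only ownership of one pointer moves.
  void GrowRoot()
  {
    auto newRoot = std::make_unique<RTreeNode>();
    newRoot->children.push_back(std::move(root));
    root = std::move(newRoot);
  }

  const RTreeNode& Root() const { return *root; }

 private:
  std::unique_ptr<RTreeNode> root;
};
```

The trade-off against the in-place approach discussed above is one extra indirection on every access to the tree, in exchange for never having to move the old root's contents.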

23:30 andrewmw94 has quit [Quit: Leaving.]