#mlpack on 2021-08-04 — irc logs at libera.irclog.whitequark.org

2021-07-27 15:44 rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack

02:51 <jonpsy[m]> <ShahAnwaarKhalid> "c2078893-7d08-4a64-a3d4-89a26a6c..." <- My day just got 1000000x better

02:51 <jonpsy[m]> but if it's really looking for heat, try compiling mlpack with ```-j4``` ;)

12:43 <RishabhGarg108Ri> @rcurtin I need your suggestion on one thing.

12:44 <RishabhGarg108Ri> in `DecisionTreeRegressor::Train()`, we have size checks like [this](https://github.com/mlpack/mlpack/blob/e6c8f7dec0a741deba6ba2a6c4558630b8930d11/src/mlpack/methods/decision_tree/decision_tree_regressor_impl.hpp#L439)

12:46 <RishabhGarg108Ri> This function compares `dataset.n_cols == labels.n_elem`. Since we are passing labels as a matrix it is causing an error because our labels is a matrix having two rows.

12:47 <RishabhGarg108Ri> So, would it make sense to change `dataset.n_cols == labels.n_elem` to `dataset.n_cols == labels.n_cols` or shall I try to change these checks in `DecisionTreeRegressor::Train()` ?

12:50 <RishabhGarg108Ri> Is there any case in mlpack other than xgboost where the labels are a matrix?

15:02 <rcurtin[m]> Shah Anwaar Khalid: haha, I worked hard to train my cats to stay off the keyboard 😃

15:02 <rcurtin[m]> jjb: able to join the meeting with Nippun this morning?

15:03 <rcurtin[m]> RishabhGarg108 (RishabhGarg108): I think it's just fine to use `labels.n_cols` for the check; responses should at least be a row vector anyway, meaning that the number of columns should be the same as the data

15:03 <rcurtin[m]> I don't think there is currently any other case than XGBoost where the labels/responses are a matrix 👍️

15:03 <RishabhGarg108Ri> Ok. Thanks!
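
The size check discussed above can be illustrated with a minimal sketch. `Shape`, `MakeShape`, `SizesMatchByElem`, and `SizesMatchByCols` are hypothetical names that only mirror the `n_rows`, `n_cols`, and `n_elem` fields of an Armadillo matrix; this is not the code in `DecisionTreeRegressor::Train()`.

```cpp
#include <cstddef>

// Hypothetical stand-in for the size fields of an Armadillo matrix, used only
// so that this sketch compiles without Armadillo.
struct Shape
{
  std::size_t n_rows;
  std::size_t n_cols;
  std::size_t n_elem; // Always n_rows * n_cols.
};

Shape MakeShape(const std::size_t rows, const std::size_t cols)
{
  return Shape{rows, cols, rows * cols};
}

// Compares against the total element count: only correct when the labels are
// a single row vector.
bool SizesMatchByElem(const Shape& dataset, const Shape& labels)
{
  return dataset.n_cols == labels.n_elem;
}

// Compares against the column count: correct for a row vector and for a
// multi-row label matrix alike.
bool SizesMatchByCols(const Shape& dataset, const Shape& labels)
{
  return dataset.n_cols == labels.n_cols;
}
```

With 100 data points and a two-row label matrix, `n_elem` is 200, so the element-count check rejects an input that the column-count check accepts.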

16:23 <RishabhGarg108Ri> @ryan:ratml.org A couple of days back, we were discussing the optimal value for `MultipleRandomDimensionSelect` in the regression case. Since we are implementing `XGBoostTreeRegressor`, should we worry about it now or is it fine?

16:33 <heisenbuugGopiMT> @rcurtin what is it comparing exactly?

16:33 <heisenbuugGopiMT> [Failed Test](https://dev.azure.com/mlpack/mlpack/_build/results?buildId=7075&view=logs&j=24d3abe3-ef0b-5deb-3aab-64d839de2c3c&t=b3d23cc5-b695-5043-0f03-2084bf2ff0b5&l=33)

16:34 <heisenbuugGopiMT> When I print the values from the matrix I am getting the right values, but somehow this test is failing.

16:34 <heisenbuugGopiMT> Any idea why?

16:35 <heisenbuugGopiMT> Also about this [comment](https://github.com/mlpack/mlpack/pull/2942#issuecomment-892262144)

16:40 <shrit[m]> heisenbuug (Gopi M Tatiraju): I will give it a look soon, if you do not hear from me, do not hesitate to ping me 👍️

16:49 <rcurtin[m]> heisenbuug (Gopi M Tatiraju): that's just checking that the first dimension of the data has the values `{1, 2, 3, 4, 5}`

17:28 <heisenbuugGopiMT> Okay, I am looking for `canParse` now, I will update on that soon.

17:30 <shrit[m]> Perfect,

17:55 <shrit[m]> I have exported ensmallen to nuget using vcpkg from Linux

17:55 <shrit[m]> I have now a .nupkg file on my machine, does anyone have an idea how to verify if the file is correctly exported before submitting it?

18:08 <zoq[m]> There is not really a submitting process, you basically create a nuget account and upload the file, you can always remove the package afterwards.

18:08 <zoq[m]> So I would just do that and trigger the CI.

18:50 <swaingotnochill[> zoq if I load a gan model, how can i access the generator and discriminator? I can't seem to find any method for that.

18:51 <swaingotnochill[> ```data::Save("./saved_csv_files/ouput_mnist.csv", generatedData, false, false); ```

18:52 <swaingotnochill[> ```data::Load("./saved_models/ganMnist.bin", "ganMnist", ganModel)```

18:55 <say4n[m]> Line #295 of this: https://www.mlpack.org/doc/mlpack-3.2.1/doxygen/gan_8hpp_source.html ?

18:55 <say4n[m]> swaingotnochill: ^

18:57 <swaingotnochill[> say4n[m]: I can use the generator in the training file itself where I created my GAN. But, I am not able to access it after loading a trained GAN model.

18:58 <say4n[m]> <swaingotnochill[> "```data::Load("./saved_models/ga" <- Taking a wild shot but does ganModel.Generator() not work?

18:59 <swaingotnochill[> say4n[m]: sadly no...

18:59 <say4n[m]> Ah :/

19:00 <heisenbuugGopiMT> @shrit:matrix.org I found this on boost's documentation [page](https://theboostcpplibraries.com/boost.spirit-api#:~:text=boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse()%20does,of%20boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse().)

19:00 <heisenbuugGopiMT> If I understand correctly if we are anyways trimming the token, we won't enter this case, right?

19:01 <swaingotnochill[> swaingotnochill[: I am not sure how it is serialized. It might be one of the reasons I can't access it...Or I am just wrong from the start 😞

19:08 <shrit[m]> heisenbuug (Gopi M Tatiraju): which case are you referring to?

19:09 <heisenbuugGopiMT> Regarding `canParse`

19:09 <heisenbuugGopiMT> `canParse = qi::parse(...)`

21:04 <shrit[m]> heisenbuug (Gopi M Tatiraju): Yeah, agreed, In this case you can remove the code related to `canParse()`.

21:15 <heisenbuugGopiMT> @shrit Did you see my comment on github?

21:15 <heisenbuugGopiMT> So I tried doing that and it's passing `Keon's Harder Test` as well.

21:16 <heisenbuugGopiMT> Other than that we just need to handle the load-with-header case.

21:16 <heisenbuugGopiMT> I am working on it now

21:22 <heisenbuugGopiMT> Somehow we didn't get that `\t` error on CI.

21:26 <shrit[m]> I saw your comment

21:27 <shrit[m]> This is related directly to parsing the `\` and any letter that comes after it

21:28 <heisenbuugGopiMT> Yea, I started handling the `load with header case`

21:29 <shrit[m]> I am still not sure how these cases should be handled, because imagine you have a dataset with `\n` then you need to start a new line right?

21:30 <shrit[m]> I mean, does a dataset exist that comes with `\t`? I have never seen that

21:30 <shrit[m]> Did you try the same thing on mlpack master, by using a dataset with a `\t` inside, to see how it is mapped?

21:30 <rcurtin[m]> shrit: that is pretty common, they are called TSVs (tab separated); I think the default SQL export is typically TSV

21:31 <shrit[m]> Yeah I know about tsv, but in this case, we are having an example like this `4, 4, \t, 3`

21:32 <shrit[m]> in which the `\t` is a token rather than a delimiter

21:32 <heisenbuugGopiMT> Yea, Keon's comment mentions that `\t` and `""` should be mapped to the same value.

21:32 <heisenbuugGopiMT> That's why the failing test cases said that there are 3 mapping as it was mapping `\t` and `""` differently

21:32 <heisenbuugGopiMT> but we should only have 2 mappings.

21:33 <rcurtin[m]> ok, sorry, I did not read the context of the discussion 😃

21:33 <rcurtin[m]> I think Keon's idea was that once whitespace (which includes tab characters) is removed, both things should be empty strings

21:34 <rcurtin[m]> (and thus should map to the same value)

21:34 <heisenbuugGopiMT> Yea, and that's why the tests are designed around that implementation.

21:35 <rcurtin[m]> 👍️ I dunno if my comments are helpful here, sorry if I am distracting the discussion 😃 I can at least agree with Omar that I have never seen a `\t` as a *token* in a dataset, only a delimiter

21:36 <heisenbuugGopiMT> If that's the case, then maybe we should consider changing that in the tests?

21:38 <rcurtin[m]> I dunno, I feel like if a user passes lines `4, 4, \tHello\t, 3` and `4, 4, Hello, 3`, then that `Hello` should be seen as the same in both cases

21:39 <rcurtin[m]> if you are stripping whitespace off the front and off the back using `std::isspace()`, that should remove both spaces and tab characters, so I think that should work fine?
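
The whitespace stripping described above can be sketched as follows. `TrimWhitespace` is a hypothetical helper written for illustration, not the trim function from the parser under review; the only library call it relies on is `std::isspace()`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: removes leading and trailing whitespace (spaces, tabs,
// newlines, ...) from a token, as classified by std::isspace().
std::string TrimWhitespace(const std::string& token)
{
  std::size_t begin = 0;
  std::size_t end = token.size();

  // The cast avoids undefined behavior for negative char values.
  while (begin < end &&
         std::isspace(static_cast<unsigned char>(token[begin])))
    ++begin;

  while (end > begin &&
         std::isspace(static_cast<unsigned char>(token[end - 1])))
    --end;

  return token.substr(begin, end - begin);
}
```

Under this sketch a token holding real tab characters around `Hello` trims to `Hello`, and a token that is only a tab trims to the empty string, so both would map to the same values as their untabbed counterparts. A token containing a literal backslash followed by `t` is left unchanged.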

21:39 <heisenbuugGopiMT> Okay, let me check if that is working in our case.

21:47 <heisenbuugGopiMT> @ryan:ratml.org @shrit:matrix.org they are getting mapped differently.

21:47 <shrit[m]> what is the output you are getting?

21:48 <shrit[m]> how are they mapped?

21:50 <heisenbuugGopiMT> 1 Hello

21:50 <heisenbuugGopiMT> 2 \tHello\t

21:55 <shrit[m]> are you getting this with the master version?

22:00 <heisenbuugGopiMT> Yes

22:00 <heisenbuugGopiMT> On both, master and new-parser

22:07 <rcurtin[m]> Are you sure you have an actual tab character in the input and not just the two-character string "\t"?

22:09 <heisenbuugGopiMT> data file

22:09 * heisenbuugGopiMT < https://libera.ems.host/_matrix/media/r0/download/libera.chat/df882a87f99515a51d8183218885fd2f41744aab/message.txt >

22:09 <rcurtin[m]> yeah, I think you need to replace `\t` with an actual tab character

22:10 * heisenbuugGopiMT < https://libera.ems.host/_matrix/media/r0/download/libera.chat/ebb71b818ae7073a433afe6b5a8a7e9b6265755e/message.txt >

22:11 <rcurtin[m]> are you sure that's a tab character? at least in the chat it looks like four spaces (but maybe on your system it is a tab character)

22:11 <rcurtin[m]> some editors will put in several spaces when you hit the `tab` key

22:11 <heisenbuugGopiMT> I used tab key

22:11 <heisenbuugGopiMT> to create those spaces

22:11 <heisenbuugGopiMT> Pressed it once

22:13 <heisenbuugGopiMT> New `csv-parser` is also mapping them the same.

22:15 <zoq[m]> <swaingotnochill[> "I am not sure how it is serializ" <- I guess you are using the GAN class? If that is the case `model.Generator()` should return the Generator.

22:15 <zoq[m]> zoq[m]: Checked the serialization as well - https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/ann/gan/gan_impl.hpp#L489 looks correct.

22:15 <zoq[m]> zoq[m]: Do you think you can share the code to reproduce the issue?

22:16 <shrit[m]> heisenbuug (Gopi M Tatiraju): it depends on the editor config, which determines how many spaces it inserts when you hit the tab key

22:16 <heisenbuugGopiMT> Yea, I had it on 4 spaces in my vscode

22:16 <heisenbuugGopiMT> I opened the csv file in vscode

22:17 <shrit[m]> Would you use another editor such as vim

22:18 <heisenbuugGopiMT> okayy

22:18 <zoq[m]> <ShahAnwaarKhalid> "zoq: Did you get a chance to..." <- Yes, we are looking into it as part of https://github.com/mlpack/mlpack/pull/3007 as well, I think it's related.

22:18 <shrit[m]> in any case I do not think they should be mapped differently, whether the tab was 2, 4 or 6 spaces

22:19 <heisenbuugGopiMT> Yea...

22:19 <heisenbuugGopiMT> The point is how `getline()` interprets it.

22:20 <heisenbuugGopiMT> If we have `\t` in the csv file, then when we `getline` it, does it replace that with spaces?

22:22 <heisenbuugGopiMT> If yes, then trim function is handling it, but if it keeps it as `\t` as first 2 chars of the token then we need to remove them ourselves when we are using it to map.

22:22 <rcurtin[m]> `\t` is not two characters; we are writing it here with two characters, but it is just *one* character that represents a tab

22:23 <rcurtin[m]> specifically, a tab is represented as ASCII character 9 (i.e. `0x09` is the byte value used to represent that character): https://www.asciitable.com/

22:23 <rcurtin[m]> if someone actually passes *two* characters, a backslash and a `t`, that is not a tab

22:24 <rcurtin[m]> note that if I wanted to write a backslash followed by a `t` in a C/C++ program, I would have to write it like this:

22:24 <rcurtin[m]> `std::string s("\\t")`

22:24 <heisenbuugGopiMT> Oh okay, my bad...

22:24 <heisenbuugGopiMT> `\t` is an escape char itself

22:24 <heisenbuugGopiMT> Not 2 chars

22:24 <rcurtin[m]> right, exactly 👍️
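
The distinction above between a real tab and the two-character sequence can be checked with a small sketch. The names `realTab`, `backslashT`, and `ContainsTab` are illustrative only and do not come from mlpack.

```cpp
#include <string>

// One character: the tab itself, ASCII 9 (0x09).
const std::string realTab("\t");

// Two characters: a backslash followed by the letter 't'.
const std::string backslashT("\\t");

// Returns true if the string holds at least one real tab character.
bool ContainsTab(const std::string& s)
{
  return s.find('\t') != std::string::npos;
}
```

Here `realTab.size()` is 1 and `backslashT.size()` is 2. Only the first string contains a character that `std::isspace()` treats as whitespace, so a data file in which `\t` was typed as two characters will not be trimmed the way a file with a real tab is.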

22:25 <heisenbuugGopiMT> I will do one thing, I will write a file using C++ and I will use `\t` there and will try to load the same file.

22:25 <heisenbuugGopiMT> That should clear everything I think?

22:27 <rcurtin[m]> that should work, I think 👍️

22:27 <heisenbuugGopiMT> I will update ASAP

22:38 <heisenbuugGopiMT> It is throwing an exception in `mlpack-master`

22:38 <shrit[m]> Would you show the C++ code

22:38 <shrit[m]> It is passing the tests, so I do not know how it might be throwing an exception

22:38 <heisenbuugGopiMT> For writing out a csv file?

22:39 <shrit[m]> Yes

22:39 * heisenbuugGopiMT < https://libera.ems.host/_matrix/media/r0/download/libera.chat/470eb5b520494fbb7bd9b3aef36eb0e560ac138d/message.txt >

22:40 <shrit[m]> in this case you should use the \ before hello

22:40 <shrit[m]> myfile << "3,\"Hello\",4,\n";

22:40 <heisenbuugGopiMT> Okay, I will change that and see

22:41 <shrit[m]> You should differentiate the C++ strings and the dataset,

22:42 <shrit[m]> myfile << "3,\"Hello\",4,\n";

22:42 <shrit[m]> this is for the C++ string,

22:43 <shrit[m]> When this is sent to your file then it should be written as follows:

22:43 <shrit[m]> `3,"Hello", 4,`

22:45 <heisenbuugGopiMT> Yea now getting it like that

22:48 <shrit[m]> Perfect
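
The difference between the C++ string literal and what ends up in the dataset can be sketched like this. `WriteQuotedRow` is a hypothetical function, and a `std::ostringstream` stands in for the `myfile` stream so the written text can be inspected directly.

```cpp
#include <sstream>
#include <string>

// Returns the text that the stream insertion writes. In the source code the
// quotes are escaped as \", but the output contains plain " characters and no
// backslashes.
std::string WriteQuotedRow()
{
  std::ostringstream out;
  out << "3,\"Hello\",4,\n";
  return out.str();
}
```

The returned text is the row `3,"Hello",4,` followed by a newline: the backslashes exist only in the C++ source, not in the file.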

22:49 <shrit[m]> Now, are you only getting errors with the tests ? Did you try your implementation on an external dataset?

22:49 * heisenbuugGopiMT < https://libera.ems.host/_matrix/media/r0/download/libera.chat/22b1db52685bbee7f10576d8cb4c224adccb87a4/message.txt >

22:49 <heisenbuugGopiMT> Getting mapped differently in `mlpack-master`

22:50 <shrit[m]> what are these numbers, which ones are related to the mapped strings?

22:50 <heisenbuugGopiMT> 0 1 2 2

22:51 <shrit[m]> 0 is hello with the tab, 1 is hello without tab right?

22:51 * heisenbuugGopiMT < https://libera.ems.host/_matrix/media/r0/download/libera.chat/2f260f3b462623ad267649ade651d02acbd871c0/message.txt >

22:51 <shrit[m]> I see

22:51 <heisenbuugGopiMT> yup yup

22:52 <heisenbuugGopiMT> I should add this in that issue

22:52 <heisenbuugGopiMT> I renamed it as `Feature List`

22:52 <shrit[m]> OK, in the tests are they mapped differently or the same?

22:53 <heisenbuugGopiMT> Let me check the test file

22:54 <heisenbuugGopiMT> In the failing one we had only `\t`

22:55 <shrit[m]> no, I mean on mlpack-master: are the tests mapping them differently or not?

22:56 <heisenbuugGopiMT> Yes they are

22:56 <heisenbuugGopiMT> I tested on `mlpack-master` only

22:58 <shrit[m]> Okay, because you said they are mapped the same at minute 00:10

22:58 <shrit[m]> I want to be sure 😂

22:59 <heisenbuugGopiMT> Sorry my bad, I think I confused it with something

23:00 <shrit[m]> okay, so the mlpack-master is mapping them differently

23:00 <shrit[m]> however your implementation is mapping them the same

23:00 <shrit[m]> so this particular test is failing in the new parser

23:00 <heisenbuugGopiMT> I need to check on that one, my system hanged, give me 2 mins.

23:04 <heisenbuugGopiMT> Even my implementation is mapping them differently

23:05 <heisenbuugGopiMT> It's not...

23:05 <heisenbuugGopiMT> No tests are failing

23:06 <heisenbuugGopiMT> For some reason I got that `\t` error in Keon's hard test: it mapped `\t` as `\t` on my system, but it should be mapped to `""`

23:06 <shrit[m]> So it is passing the keon test?

23:07 <heisenbuugGopiMT> Yuppp

23:07 <shrit[m]> Perfect then

23:07 <heisenbuugGopiMT> Yea, see, on CI only the with-header case is failing...

23:08 <heisenbuugGopiMT> I need to rewrite `arma`'s implementation

23:08 <heisenbuugGopiMT> Since until now we were using it to load with a header

23:09 <shrit[m]> I believe the \\" \\" tests are passing now

23:10 <heisenbuugGopiMT> Yea, it would have happened yesterday only but I made a small mistake in the trim function

23:11 <shrit[m]> 👍️

23:11 <shrit[m]> So the new parser implementation should be working perfectly now

23:11 <heisenbuugGopiMT> You plan staying up all night?

23:12 <shrit[m]> nope,

23:12 <shrit[m]> I think you should go sleep

23:12 <heisenbuugGopiMT> Oh, it's already 5:00AM here

23:12 <heisenbuugGopiMT> 😂😂

23:13 <shrit[m]> oh, you are like me synchronized with the west 👍️

23:13 <heisenbuugGopiMT> Yea, I will make sure all tests are passing before I sleep...

23:14 <shrit[m]> Perfect, do not hesitate to push the modification, I will review everything tomorrow 👍️

23:17 <heisenbuugGopiMT> Okay...

23:18 <shrit[m]> Great, what a pleasure to celebrate the removal of boost spirit before the weekend.

23:23 <heisenbuugGopiMT> Yupp

23:23 <heisenbuugGopiMT> It's gonna be fun

23:37 <rcurtin[m]> that is super exciting! 💯! If you want me to leave a review just let me know when it's ready. But be warned I might slow down the merge until after the weekend, I always have lots of comments 😃

23:38 <heisenbuugGopiMT> Yea, it's fine...

23:38 <heisenbuugGopiMT> It's gonna be a busy week...

23:39 <shrit[m]> Yeah, for the merge it might take more time, there are a couple of things to handle

23:39 <rcurtin[m]> 👍️ still it is a huge accomplishment just to get everything working! awesome job

23:39 <shrit[m]> Agreed 🚀

23:40 <heisenbuugGopiMT> It was amazing working on this, and you guys are always available to clear doubts and issues...

23:42 <heisenbuugGopiMT> And I will keep working on this...

23:42 <heisenbuugGopiMT> You can only do by learning and Open-Source contribution is the best way to learn and explore things...