rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack
<jonpsy[m]> <ShahAnwaarKhalid> "c2078893-7d08-4a64-a3d4-89a26a6c..." <- My day just got 1000000x better
<jonpsy[m]> but if it's really looking for heat, try compiling mlpack with ```-j4``` ;)
<RishabhGarg108Ri> @rcurtin I need your suggestion in one thing.
<RishabhGarg108Ri> This function compares `dataset.n_cols == labels.n_elem`. Since we are passing labels as a matrix it is causing an error because our labels is a matrix having two rows.
<RishabhGarg108Ri> So, would it make sense to change `dataset.n_cols == labels.n_elem` to `dataset.n_cols == labels.n_cols` or shall I try to change these checks in `DecisionTreeRegressor::Train()` ?
<RishabhGarg108Ri> Is there any case in mlpack other than xgboost where labels are matrix?
<rcurtin[m]> Shah Anwaar Khalid: haha, I worked hard to train my cats to stay off the keyboard 😃
<rcurtin[m]> jjb: able to join the meeting with Nippun this morning?
<rcurtin[m]> RishabhGarg108 (RishabhGarg108): I think it's just fine to use `labels.n_cols` for the check; responses should at least be a row vector anyway, meaning that the number of columns should be the same as the data
<rcurtin[m]> I don't think there is currently any other case than XGBoost where the labels/responses are a matrix 👍️
<RishabhGarg108Ri> Ok. Thanks!
<RishabhGarg108Ri> @ryan:ratml.org A couple of days back, we were discussing about the optimal value for `MultipleRandomDimensionSelect` for regression case. Since we are implementing `XGBoostTreeRegressor` should we worry about it now or is it fine?
<heisenbuugGopiMT> @rcurtin what is it comparing exactly?
<heisenbuugGopiMT> When I am printing the values from matrix in I am getting the right values, but somehow this test is failing.
<heisenbuugGopiMT> Any idea why?
<heisenbuugGopiMT> Also about this [comment](https://github.com/mlpack/mlpack/pull/2942#issuecomment-892262144)
<shrit[m]> heisenbuug (Gopi M Tatiraju): I will give it a look soon, if you did not hear from me, do not hesitate in pinging me 👍️
<shrit[m]> * heisenbuug (Gopi M Tatiraju): I will give it a look soon, if you do not hear from me, do not hesitate in pinging me 👍️
<rcurtin[m]> heisenbuug (Gopi M Tatiraju): that's just checking that the first dimension of the data has the values `{1, 2, 3, 4, 5}`
<heisenbuugGopiMT> Okay, I am looking for `canParse` now, I will update on that soon.
<shrit[m]> Perfect,
<shrit[m]> I have exported ensmallen to nuget using vcpkg from Linux
<shrit[m]> I have now a .nupkg file on my machine, does anyone have an idea how to verify if the file is correctly exported before submitting it?
<zoq[m]> There is not really a submitting process, it's basically create a nuget accoutn and upload the file, you can always remove the package afterwards.
<zoq[m]> So I would just do that and trigger the CI.
<swaingotnochill[> zoq if I load a gan model, how can i access the generator and discriminator? I can't seem to find any method for that.
<swaingotnochill[> ```data::Save("./saved_csv_files/ouput_mnist.csv", generatedData, false, false); ```
<swaingotnochill[> ```data::Load("./saved_models/ganMnist.bin", "ganMnist", ganModel)```
<say4n[m]> swaingotnochill: ^
<swaingotnochill[> say4n[m]: I can use the generator in the training file itself where I created my GAN. But, I am not able to access it after loading a trained GAN model.
<say4n[m]> <swaingotnochill[> "```data::Load("./saved_models/ga" <- Taking a wild shot but does ganModel.Generator() not work?
<swaingotnochill[> say4n[m]: sadly no...
<say4n[m]> Ah :/
<heisenbuugGopiMT> @shrit:matrix.org I found this on boost's documentation [page](https://theboostcpplibraries.com/boost.spirit-api#:~:text=boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse()%20does,of%20boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse().)
<heisenbuugGopiMT> If I understand correctly if we are anyways trimming the token, we won't enter this case, right?
<heisenbuugGopiMT> * @shrit:matrix.org I found this on boost's documentation [page](https://theboostcpplibraries.com/boost.spirit-api#:~:text=boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse()%20does,of%20boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse())
<heisenbuugGopiMT> If I understand correctly if we are anyways trimming the token, we won't enter this case, right?
<swaingotnochill[> swaingotnochill[: I am not sure how it is serialized. It might be one of the reason I can't access it...Or I am just wrong from the start 😞
<heisenbuugGopiMT> * @shrit:matrix.org I found this on boost's documentation page
<heisenbuugGopiMT> https://theboostcpplibraries.com/boost.spiritapi#:~:text=boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse()%20does,of%20boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse()
<heisenbuugGopiMT> If I understand correctly if we are anyways trimming the token, we won't enter this case, right?
<heisenbuugGopiMT> * @shrit:matrix.org I found this on boost's documentation page
<heisenbuugGopiMT> https://theboostcpplibraries.com/boost.spirit-api#:~:text=boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse()%20does,of%20boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse().
<heisenbuugGopiMT> If I understand correctly if we are anyways trimming the token, we won't enter this case, right?
<shrit[m]> heisenbuug (Gopi M Tatiraju): which case you are referring too?
<heisenbuugGopiMT> Regarding `canParse`
<heisenbuugGopiMT> `canParse = qi::parse(...)`
<swaingotnochill[> (edited) zoq if ... => zoq Kartik K. Khullar if ...
<swaingotnochill[> (edited) ... Khullar if ... => ... Khullar if ...
<swaingotnochill[> (edited) zoq Kartik K. Khullar if ... => zoq if ...
<shrit[m]> heisenbuug (Gopi M Tatiraju): Yeah, agreed, In this case you can remove the code related to `canParse()`.
<heisenbuugGopiMT> Did you saw my comment on github?
<heisenbuugGopiMT> So I tried doing that and it's passing `keon's Harder Test as well`.
<heisenbuugGopiMT> Other than that we just need to handle the, Load with header case.
<heisenbuugGopiMT> I am working on it now
<heisenbuugGopiMT> * @shirit Did you saw my comment on github?
<heisenbuugGopiMT> So I tried doing that and it's passing `keon's Harder Test as well`.
<heisenbuugGopiMT> * @shrit Did you saw my comment on github?
<heisenbuugGopiMT> So I tried doing that and it's passing `keon's Harder Test as well`.
<heisenbuugGopiMT> Somehow we didn't get that `\t` error on CI.
<shrit[m]> I saw your comment
<shrit[m]> This is related directly to parsing the `\` and any letter comes behind
<heisenbuugGopiMT> Yea, I started handling the `load with header case`
<shrit[m]> I am still not sure how these cases should be handled, because imagine you have a dataset with `\n` then you need to start a new line right?
<shrit[m]> I mean does exist a dataset that comes with `\t` I have never seen that
<shrit[m]> Did you try that the same way on the mlpack master ? by using a dataset with a \t inside and see how it is mapped
<rcurtin[m]> shrit: that is pretty common, they are called TSVs (tab separated); I think the default SQL export is typically TSV
<shrit[m]> Yeah I know about tsv, but in this case, we are having an example like this `4, 4, \t, 3`
<shrit[m]> in which the `\t` is a token rather than a delimiter
<heisenbuugGopiMT> Yea, Keon's comment mentions that `\t` and `""` should be mapped to the same value.
<heisenbuugGopiMT> That's why the failing test cases said that there are 3 mapping as it was mapping `\t` and `""` differently
<heisenbuugGopiMT> but we should only have 2 mappings.
<rcurtin[m]> ok, sorry, I did not read the context of the discussion 😃
<rcurtin[m]> I think Keon's idea was that once whitespace (which includes tab characters) are removed, both things should be empty strings
<rcurtin[m]> (and thus should map to the same value)
<heisenbuugGopiMT> Yea, and that's why tests are designed around those implementation.
<rcurtin[m]> 👍️ I dunno if my comments are helpful here, sorry if I am distracting the discussion 😃 I can at least agree with Omar that I have never seen a `\t` as a *token* in a dataset, only a delimiter
<heisenbuugGopiMT> If that's the case maybe then should we consider changing that in the case?
<heisenbuugGopiMT> * If that's the case maybe then should we consider changing that in the tests?
<rcurtin[m]> I dunno, I feel like if a user passes lines `4, 4, \tHello\t, 3` and `4, 4, Hello, 3`, then that `Hello` should be seen as the same in both cases
<rcurtin[m]> if you are stripping whitespace off the front and off the back using `std::isspace()`, that should remove both spaces and tab characters, so I think that should work fine?
<heisenbuugGopiMT> Okay, let me check if that is working in our case.
<heisenbuugGopiMT> @ryan:ratml.org @shrit:matrix.org they are getting mapped differently.
<shrit[m]> what is the ouput your are getting?
<shrit[m]> how they are mapped?
<heisenbuugGopiMT> Hello
<heisenbuugGopiMT> \tHello\t
<heisenbuugGopiMT> * 1 Hello
<heisenbuugGopiMT> 2 \tHello\t
<shrit[m]> are you getting this with the master version?
<heisenbuugGopiMT> Yes
<heisenbuugGopiMT> On both, master and new-parser
<rcurtin[m]> Are you sure you have an actual tab character in the input and not just the two-character string "\t"?
<heisenbuugGopiMT> data file
<heisenbuugGopiMT> ```
<rcurtin[m]> yeah, I think you need to replace `\t` with an actual tab character
<rcurtin[m]> are you sure that's a tab character? at least in the chat it looks like four spaces (but maybe on your system it is a tab character)
<rcurtin[m]> some editors will put in several spaces when you hit the `tab` key
<heisenbuugGopiMT> I used tab key
<heisenbuugGopiMT> to create those spaces
<heisenbuugGopiMT> Pressed it once
<heisenbuugGopiMT> New `csv-parser` is also mapping them same.
<zoq[m]> <swaingotnochill[> "I am not sure how it is serializ" <- I guess you are using the GAN class? If that is the case `model.Generator()` should return the Generator.
<zoq[m]> zoq[m]: Checked the serialization as well - https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/ann/gan/gan_impl.hpp#L489 looks correct.
<zoq[m]> zoq[m]: Do you think you can share the code to reproduce the issue?
<shrit[m]> heisenbuug (Gopi M Tatiraju): it depends on the editor config considering how many spaces it will put when you tab the tab key
<shrit[m]> * heisenbuug (Gopi M Tatiraju): it depends on the editor config considering how many spaces it will put when you hit the tab key
<heisenbuugGopiMT> Yea, I had it on 4 spaces in my vscode
<heisenbuugGopiMT> I opened the csv file in vscode
<shrit[m]> Would you use another editor such as vim
<heisenbuugGopiMT> okayy
<zoq[m]> <ShahAnwaarKhalid> "zoq: Did you get a chance to..." <- Yes, we are looking into it as part of https://github.com/mlpack/mlpack/pull/3007 as well, I think it's related.
<shrit[m]> in all cases I do not think they should be mapped differently either in the tab was 2, 4 or 6 spaces
<heisenbuugGopiMT> Yea...
<heisenbuugGopiMT> The point is how does `getline()` interprets it.
<heisenbuugGopiMT> If we have `\t` is csv file so when we `getline` it does it replaces that with spaces?
<heisenbuugGopiMT> If yes, then trim function is handling it, but if it keeps it as `\t` as first 2 chars of the token then we need to remove them ourselves when we are using it to map.
<rcurtin[m]> `\t` is not two characters; we are writing it here with two characters, but it is just *one* character that represents a tab
<rcurtin[m]> specifically, a tab is represented as ASCII character 9 (e.g. `0x09` are the byte values used to represent that character): https://www.asciitable.com/
<rcurtin[m]> if someone actually passes *two* characters, a backslash and a `t`, that is not a tab
<rcurtin[m]> note that if I wanted to write a backslash followed by a `t` in a C/C++ program, I would have to write it like this:
<rcurtin[m]> `std::string s("\\t")`
<heisenbuugGopiMT> Oh okay, my bad...
<heisenbuugGopiMT> `\t` is a escape char itself
<heisenbuugGopiMT> Not 2 chars
<rcurtin[m]> right, exactly 👍️
<heisenbuugGopiMT> I will do one thing, I will write a file using C++ and I will use `\t` there and will try to load the same file.
<heisenbuugGopiMT> That should clear everything I think?
<rcurtin[m]> that should work, I think 👍️
<heisenbuugGopiMT> I will update ASAP
<heisenbuugGopiMT> It is throwing an exception in `mlpack-master`
<shrit[m]> Would you show the C++ code
<shrit[m]> It is passing the tests, so I do not know how it might throwing an exception
<heisenbuugGopiMT> For writing out a csv file?
<shrit[m]> Yes
<shrit[m]> in this case you should use the \ before hello
<shrit[m]> myfile << "3,\"Hello\",4,\n";
<heisenbuugGopiMT> Okay, I will change that and see
<shrit[m]> * myfile << "3,\"Hello\",4,\n";
<shrit[m]> You should differentiate the C++ strings and the dataset,
<shrit[m]> myfile << "3,\\"Hello\\",4,\n";
<shrit[m]> this is for the C++ string,
<shrit[m]> When this is sent to your file then it should be written as follows:
<shrit[m]> `3,"Hello", 4,
<shrit[m]> * `3,"Hello", 4,`
<heisenbuugGopiMT> Yea now getting it like that
<shrit[m]> Perfect
<shrit[m]> Now, are you only getting errors with the tests ? Did you try your implementation on an external dataset?
<heisenbuugGopiMT> Gettting mapped differently in `mlpack-master`
<shrit[m]> what are these numbers, which one are related to the mapped strings?
<heisenbuugGopiMT> 0 1 2 2
<shrit[m]> 0 is hello with the tab, 1 is hello without tab right?
<shrit[m]> I see
<heisenbuugGopiMT> yup yup
<heisenbuugGopiMT> I should add this in that issue
<heisenbuugGopiMT> I renamed it as `Feature List`
<shrit[m]> OK, in the tests are they mapped differently or the same?
<heisenbuugGopiMT> Let me check the test file
<heisenbuugGopiMT> In the failing one we had only `\t`
<shrit[m]> no, I mean the mlpack-master, is the tests are mapping them differently or not?
<heisenbuugGopiMT> Yes they are
<heisenbuugGopiMT> I tested on `mlpack-master` only
<shrit[m]> Okay, because you said they are mapped the same at minute 00:10
<shrit[m]> I want to be sure 😂
<heisenbuugGopiMT> Sorry my bad, I think I confused it with something
<shrit[m]> okay, so the mlpack-master is mapping them differently
<shrit[m]> however you implementation is mapping them the same
<shrit[m]> so this particular test is failling in the new parser
<heisenbuugGopiMT> I need to check on that one, my system hanged, give me 2 mins.
<heisenbuugGopiMT> Even my implementation is mapping them differently
<heisenbuugGopiMT> It's not...
<heisenbuugGopiMT> No tests are failing
<heisenbuugGopiMT> For some reason I got that `\t` error, which was Leon's hard test it mapped `\t` as \t on my system, but it should be mapped to `""`
<shrit[m]> So it is passing the keon test?
<heisenbuugGopiMT> Yuppp
<shrit[m]> Perfect then
<heisenbuugGopiMT> Yea, see on CI only with header case is failing...
<heisenbuugGopiMT> I need to rewrite `arma`'s implementation
<heisenbuugGopiMT> As before this we were using that only to load with header
<shrit[m]> I believe the \\" \\" tests are passing now
<heisenbuugGopiMT> Yea, it would have happened yesterday only but I made a small mistake in trim fucntion
<shrit[m]> 👍️
<shrit[m]> So the new parser implementation should be working perfectly now
<heisenbuugGopiMT> You plan staying up all night?
<shrit[m]> nope,
<shrit[m]> I think you should go sleep
<heisenbuugGopiMT> Oh, it's already 5:00AM here
<heisenbuugGopiMT> 😂😂
<shrit[m]> oh, you are like me synchronized with the west 👍️
<heisenbuugGopiMT> Yea, I will make sure all tests are passing before I sleep...
<shrit[m]> Perfect, do not hesitate in pushing the modification, I will review everything tomorrow 👍️
<heisenbuugGopiMT> Okay...
<shrit[m]> Great, what a pleasure to celebrate the removal of boost spirit before the weekend.
<heisenbuugGopiMT> Yupp
<heisenbuugGopiMT> It's gonna be fun
<rcurtin[m]> that is super exciting! 💯! If you want me to leave a review just let me know when it's ready. But be warned I might slow down the merge until after the weekend, I always have lots of comments 😃
<heisenbuugGopiMT> Yea, it's fine...
<heisenbuugGopiMT> It's gonna be a busy week...
<shrit[m]> Yeah, for the merge it might take more time, there are a couple of things to handle
<rcurtin[m]> 👍️ still it is a huge accomplishment just to get everything working! awesome job
<shrit[m]> Agreed 🚀
<heisenbuugGopiMT> It was amazing working on this, and you guys are always available to clear doubts and issues...
<heisenbuugGopiMT> And I will keep working on this...
<heisenbuugGopiMT> You can only do by learning and Open-Source contribution is the best way to learn and explore things...