<RishabhGarg108Ri>
This function compares `dataset.n_cols == labels.n_elem`. Since we are passing labels as a matrix it is causing an error because our labels is a matrix having two rows.
<RishabhGarg108Ri>
So, would it make sense to change `dataset.n_cols == labels.n_elem` to `dataset.n_cols == labels.n_cols` or shall I try to change these checks in `DecisionTreeRegressor::Train()` ?
<RishabhGarg108Ri>
Is there any case in mlpack other than xgboost where labels are matrix?
<rcurtin[m]>
Shah Anwaar Khalid: haha, I worked hard to train my cats to stay off the keyboard 😃
<rcurtin[m]>
jjb: able to join the meeting with Nippun this morning?
<rcurtin[m]>
RishabhGarg108 (RishabhGarg108): I think it's just fine to use `labels.n_cols` for the check; responses should at least be a row vector anyway, meaning that the number of columns should be the same as the data
<rcurtin[m]>
I don't think there is currently any other case than XGBoost where the labels/responses are a matrix 👍️
<RishabhGarg108Ri>
Ok. Thanks!
<RishabhGarg108Ri>
@ryan:ratml.org A couple of days back, we were discussing about the optimal value for `MultipleRandomDimensionSelect` for regression case. Since we are implementing `XGBoostTreeRegressor` should we worry about it now or is it fine?
<heisenbuugGopiMT>
@rcurtin what is it comparing exactly?
<shrit[m]>
heisenbuug (Gopi M Tatiraju): I will give it a look soon, if you did not hear from me, do not hesitate in pinging me 👍️
<shrit[m]>
* heisenbuug (Gopi M Tatiraju): I will give it a look soon, if you do not hear from me, do not hesitate in pinging me 👍️
<rcurtin[m]>
heisenbuug (Gopi M Tatiraju): that's just checking that the first dimension of the data has the values `{1, 2, 3, 4, 5}`
<heisenbuugGopiMT>
Okay, I am looking for `canParse` now, I will update on that soon.
<shrit[m]>
Perfect,
<shrit[m]>
I have exported ensmallen to nuget using vcpkg from Linux
<shrit[m]>
I have now a .nupkg file on my machine, does anyone have an idea how to verify if the file is correctly exported before submitting it?
<zoq[m]>
There is not really a submitting process, it's basically create a nuget accoutn and upload the file, you can always remove the package afterwards.
<zoq[m]>
So I would just do that and trigger the CI.
<swaingotnochill[>
zoq if I load a gan model, how can i access the generator and discriminator? I can't seem to find any method for that.
<swaingotnochill[>
say4n[m]: I can use the generator in the training file itself where I created my GAN. But, I am not able to access it after loading a trained GAN model.
<say4n[m]>
<swaingotnochill[> "```data::Load("./saved_models/ga" <- Taking a wild shot but does ganModel.Generator() not work?
<swaingotnochill[>
say4n[m]: sadly no...
<say4n[m]>
Ah :/
<heisenbuugGopiMT>
@shrit:matrix.org I found this on boost's documentation [page](https://theboostcpplibraries.com/boost.spirit-api#:~:text=boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse()%20does,of%20boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse().)
<heisenbuugGopiMT>
If I understand correctly if we are anyways trimming the token, we won't enter this case, right?
<heisenbuugGopiMT>
* @shrit:matrix.org I found this on boost's documentation [page](https://theboostcpplibraries.com/boost.spirit-api#:~:text=boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse()%20does,of%20boost%3A%3Aspirit%3A%3Aqi%3A%3Aparse())
<heisenbuugGopiMT>
If I understand correctly if we are anyways trimming the token, we won't enter this case, right?
<swaingotnochill[>
swaingotnochill[: I am not sure how it is serialized. It might be one of the reason I can't access it...Or I am just wrong from the start 😞
<heisenbuugGopiMT>
* @shrit:matrix.org I found this on boost's documentation page
<heisenbuugGopiMT>
If I understand correctly if we are anyways trimming the token, we won't enter this case, right?
<shrit[m]>
heisenbuug (Gopi M Tatiraju): which case you are referring too?
<heisenbuugGopiMT>
Regarding `canParse`
<heisenbuugGopiMT>
`canParse = qi::parse(...)`
<swaingotnochill[>
(edited) zoq if ... => zoq Kartik K. Khullar if ...
<swaingotnochill[>
(edited) ... Khullar if ... => ... Khullar if ...
<swaingotnochill[>
(edited) zoq Kartik K. Khullar if ... => zoq if ...
<shrit[m]>
heisenbuug (Gopi M Tatiraju): Yeah, agreed, In this case you can remove the code related to `canParse()`.
<heisenbuugGopiMT>
Did you saw my comment on github?
<heisenbuugGopiMT>
So I tried doing that and it's passing `keon's Harder Test as well`.
<heisenbuugGopiMT>
Other than that we just need to handle the, Load with header case.
<heisenbuugGopiMT>
I am working on it now
<heisenbuugGopiMT>
* @shirit Did you saw my comment on github?
<heisenbuugGopiMT>
So I tried doing that and it's passing `keon's Harder Test as well`.
<heisenbuugGopiMT>
* @shrit Did you saw my comment on github?
<heisenbuugGopiMT>
So I tried doing that and it's passing `keon's Harder Test as well`.
<heisenbuugGopiMT>
Somehow we didn't get that `\t` error on CI.
<shrit[m]>
I saw your comment
<shrit[m]>
This is related directly to parsing the `\` and any letter comes behind
<heisenbuugGopiMT>
Yea, I started handling the `load with header case`
<shrit[m]>
I am still not sure how these cases should be handled, because imagine you have a dataset with `\n` then you need to start a new line right?
<shrit[m]>
I mean does exist a dataset that comes with `\t` I have never seen that
<shrit[m]>
Did you try that the same way on the mlpack master ? by using a dataset with a \t inside and see how it is mapped
<rcurtin[m]>
shrit: that is pretty common, they are called TSVs (tab separated); I think the default SQL export is typically TSV
<shrit[m]>
Yeah I know about tsv, but in this case, we are having an example like this `4, 4, \t, 3`
<shrit[m]>
in which the `\t` is a token rather than a delimiter
<heisenbuugGopiMT>
Yea, Keon's comment mentions that `\t` and `""` should be mapped to the same value.
<heisenbuugGopiMT>
That's why the failing test cases said that there are 3 mapping as it was mapping `\t` and `""` differently
<heisenbuugGopiMT>
but we should only have 2 mappings.
<rcurtin[m]>
ok, sorry, I did not read the context of the discussion 😃
<rcurtin[m]>
I think Keon's idea was that once whitespace (which includes tab characters) are removed, both things should be empty strings
<rcurtin[m]>
(and thus should map to the same value)
<heisenbuugGopiMT>
Yea, and that's why tests are designed around those implementation.
<rcurtin[m]>
👍️ I dunno if my comments are helpful here, sorry if I am distracting the discussion 😃 I can at least agree with Omar that I have never seen a `\t` as a *token* in a dataset, only a delimiter
<heisenbuugGopiMT>
If that's the case maybe then should we consider changing that in the case?
<heisenbuugGopiMT>
* If that's the case maybe then should we consider changing that in the tests?
<rcurtin[m]>
I dunno, I feel like if a user passes lines `4, 4, \tHello\t, 3` and `4, 4, Hello, 3`, then that `Hello` should be seen as the same in both cases
<rcurtin[m]>
if you are stripping whitespace off the front and off the back using `std::isspace()`, that should remove both spaces and tab characters, so I think that should work fine?
<heisenbuugGopiMT>
Okay, let me check if that is working in our case.
<heisenbuugGopiMT>
@ryan:ratml.org @shrit:matrix.org they are getting mapped differently.
<shrit[m]>
what is the ouput your are getting?
<shrit[m]>
how they are mapped?
<heisenbuugGopiMT>
Hello
<heisenbuugGopiMT>
\tHello\t
<heisenbuugGopiMT>
* 1 Hello
<heisenbuugGopiMT>
2 \tHello\t
<shrit[m]>
are you getting this with the master version?
<heisenbuugGopiMT>
Yes
<heisenbuugGopiMT>
On both, master and new-parser
<rcurtin[m]>
Are you sure you have an actual tab character in the input and not just the two-character string "\t"?
<rcurtin[m]>
are you sure that's a tab character? at least in the chat it looks like four spaces (but maybe on your system it is a tab character)
<rcurtin[m]>
some editors will put in several spaces when you hit the `tab` key
<heisenbuugGopiMT>
I used tab key
<heisenbuugGopiMT>
to create those spaces
<heisenbuugGopiMT>
Pressed it once
<heisenbuugGopiMT>
New `csv-parser` is also mapping them same.
<zoq[m]>
<swaingotnochill[> "I am not sure how it is serializ" <- I guess you are using the GAN class? If that is the case `model.Generator()` should return the Generator.
<zoq[m]>
zoq[m]: Do you think you can share the code to reproduce the issue?
<shrit[m]>
heisenbuug (Gopi M Tatiraju): it depends on the editor config considering how many spaces it will put when you tab the tab key
<shrit[m]>
* heisenbuug (Gopi M Tatiraju): it depends on the editor config considering how many spaces it will put when you hit the tab key
<heisenbuugGopiMT>
Yea, I had it on 4 spaces in my vscode
<heisenbuugGopiMT>
I opened the csv file in vscode
<shrit[m]>
Would you use another editor such as vim
<heisenbuugGopiMT>
okayy
<zoq[m]>
<ShahAnwaarKhalid> "zoq: Did you get a chance to..." <- Yes, we are looking into it as part of https://github.com/mlpack/mlpack/pull/3007 as well, I think it's related.
<shrit[m]>
in all cases I do not think they should be mapped differently either in the tab was 2, 4 or 6 spaces
<heisenbuugGopiMT>
Yea...
<heisenbuugGopiMT>
The point is how does `getline()` interprets it.
<heisenbuugGopiMT>
If we have `\t` is csv file so when we `getline` it does it replaces that with spaces?
<heisenbuugGopiMT>
If yes, then trim function is handling it, but if it keeps it as `\t` as first 2 chars of the token then we need to remove them ourselves when we are using it to map.
<rcurtin[m]>
`\t` is not two characters; we are writing it here with two characters, but it is just *one* character that represents a tab
<rcurtin[m]>
specifically, a tab is represented as ASCII character 9 (e.g. `0x09` are the byte values used to represent that character): https://www.asciitable.com/
<rcurtin[m]>
if someone actually passes *two* characters, a backslash and a `t`, that is not a tab
<rcurtin[m]>
note that if I wanted to write a backslash followed by a `t` in a C/C++ program, I would have to write it like this:
<rcurtin[m]>
`std::string s("\\t")`
<heisenbuugGopiMT>
Oh okay, my bad...
<heisenbuugGopiMT>
`\t` is a escape char itself
<heisenbuugGopiMT>
Not 2 chars
<rcurtin[m]>
right, exactly 👍️
<heisenbuugGopiMT>
I will do one thing, I will write a file using C++ and I will use `\t` there and will try to load the same file.
<heisenbuugGopiMT>
That should clear everything I think?
<rcurtin[m]>
that should work, I think 👍️
<heisenbuugGopiMT>
I will update ASAP
<heisenbuugGopiMT>
It is throwing an exception in `mlpack-master`
<shrit[m]>
Would you show the C++ code
<shrit[m]>
It is passing the tests, so I do not know how it might throwing an exception
<heisenbuugGopiMT>
I should add this in that issue
<heisenbuugGopiMT>
I renamed it as `Feature List`
<shrit[m]>
OK, in the tests are they mapped differently or the same?
<heisenbuugGopiMT>
Let me check the test file
<heisenbuugGopiMT>
In the failing one we had only `\t`
<shrit[m]>
no, I mean the mlpack-master, is the tests are mapping them differently or not?
<heisenbuugGopiMT>
Yes they are
<heisenbuugGopiMT>
I tested on `mlpack-master` only
<shrit[m]>
Okay, because you said they are mapped the same at minute 00:10
<shrit[m]>
I want to be sure 😂
<heisenbuugGopiMT>
Sorry my bad, I think I confused it with something
<shrit[m]>
okay, so the mlpack-master is mapping them differently
<shrit[m]>
however you implementation is mapping them the same
<shrit[m]>
so this particular test is failling in the new parser
<heisenbuugGopiMT>
I need to check on that one, my system hanged, give me 2 mins.
<heisenbuugGopiMT>
Even my implementation is mapping them differently
<heisenbuugGopiMT>
It's not...
<heisenbuugGopiMT>
No tests are failing
<heisenbuugGopiMT>
For some reason I got that `\t` error, which was Leon's hard test it mapped `\t` as \t on my system, but it should be mapped to `""`
<shrit[m]>
So it is passing the keon test?
<heisenbuugGopiMT>
Yuppp
<shrit[m]>
Perfect then
<heisenbuugGopiMT>
Yea, see on CI only with header case is failing...
<heisenbuugGopiMT>
I need to rewrite `arma`'s implementation
<heisenbuugGopiMT>
As before this we were using that only to load with header
<shrit[m]>
I believe the \\" \\" tests are passing now
<heisenbuugGopiMT>
Yea, it would have happened yesterday only but I made a small mistake in trim fucntion
<shrit[m]>
👍️
<shrit[m]>
So the new parser implementation should be working perfectly now
<heisenbuugGopiMT>
You plan staying up all night?
<shrit[m]>
nope,
<shrit[m]>
I think you should go sleep
<heisenbuugGopiMT>
Oh, it's already 5:00AM here
<heisenbuugGopiMT>
😂😂
<shrit[m]>
oh, you are like me synchronized with the west 👍️
<heisenbuugGopiMT>
Yea, I will make sure all tests are passing before I sleep...
<shrit[m]>
Perfect, do not hesitate in pushing the modification, I will review everything tomorrow 👍️
<heisenbuugGopiMT>
Okay...
<shrit[m]>
Great, what a pleasure to celebrate the removal of boost spirit before the weekend.
<heisenbuugGopiMT>
Yupp
<heisenbuugGopiMT>
It's gonna be fun
<rcurtin[m]>
that is super exciting! 💯! If you want me to leave a review just let me know when it's ready. But be warned I might slow down the merge until after the weekend, I always have lots of comments 😃
<heisenbuugGopiMT>
Yea, it's fine...
<heisenbuugGopiMT>
It's gonna be a busy week...
<shrit[m]>
Yeah, for the merge it might take more time, there are a couple of things to handle
<rcurtin[m]>
👍️ still it is a huge accomplishment just to get everything working! awesome job
<shrit[m]>
Agreed 🚀
<heisenbuugGopiMT>
It was amazing working on this, and you guys are always available to clear doubts and issues...
<heisenbuugGopiMT>
And I will keep working on this...
<heisenbuugGopiMT>
You can only do by learning and Open-Source contribution is the best way to learn and explore things...