#mlpack on 2021-08-24 — irc logs at libera.irclog.whitequark.org

2021-07-27 15:44 rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack

00:25 <zoq[m]> Is the auto-approver hanging?

02:25 aakashi2001 has joined #mlpack

02:25 aakashi2001 has quit [Changing host]

02:25 aakashi2001 has joined #mlpack

04:47 <Aakash-kaushikAa> Reference: https://github.com/mlpack/mlpack/pull/2990#issuecomment-903291032

04:47 <Aakash-kaushikAa> Hi, now that we are past the parsing error, we need to discuss how we want to present the documentation and which direction it should take. Can we hold a meeting this Friday (27th Aug) at the same time (1700 UTC) as the mlpack meeting, in the same Zoom room?

04:47 <Aakash-kaushikAa> Everybody is welcome to join the meeting.

06:31 aakashi2001 has quit [Ping timeout: 250 seconds]

06:55 aakashi2001 has joined #mlpack

07:02 aakashi2001 has quit [Ping timeout: 250 seconds]

07:43 aakashi2001 has joined #mlpack

07:43 aakashi2001 has quit [Changing host]

07:43 aakashi2001 has joined #mlpack

09:06 aakashi2001 has quit [Ping timeout: 250 seconds]

12:02 <rcurtin[m]> zoq: looks like it is working still? I saw it just approved #3028. let me know if there is some other issue

12:03 <rcurtin[m]> Aakash-kaushik (Aakash kaushik): that is really exciting and awesome! I would love to attend but unfortunately I am booked pretty much all of Friday :( if I can help out, please just let me know 👍️

12:06 <DavidportlouisDa> Hi @marcusedel:matrix.org @kartikdutt18 can we move today's meeting to either Thursday (26/08/21) or Friday (27/08/21) @ 19:00 IST ?

12:41 <zoq[m]> <DavidportlouisDa> "Hi @marcusedel:matrix.org @karti" <- Thursday works for me.

12:43 <zoq[m]> <rcurtin[m]> "Aakash-kaushik (Aakash kaushik):" <- rcurtin: we could move the meeting to another day/time; the main idea is to get people together who are interested in improving the documentation and discuss the direction. Aakash-kaushik (Aakash kaushik) put a really nice demo together (it's more than a demo).

12:44 <zoq[m]> <rcurtin[m]> "zoq: looks like it is working st" <- Yeah, looks like I wasn't patient enough.

12:54 <heisenbuugGopiMT> @rcurtin can we use already one-hot encoded data directly for training?

12:56 <heisenbuugGopiMT> Also, a doubt, when we one hot encode data, it maps it into bits, right?... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/68830eb15eeac90df71cc11aa0f9ca825a0a314d)

12:57 <heisenbuugGopiMT> In that case we get three columns, right?

12:59 <zoq[m]> heisenbuugGopiMT: Example for onehot encoding - https://github.com/mlpack/examples/blob/62526e824cf446f4c8bcebb4e4b6d8550b7e83bb/avocado_price_prediction_with_linear_regression/avocado_price_prediction_with_lr_cpp.ipynb

13:00 <heisenbuugGopiMT> I was trying to help him out with [this](https://github.com/mlpack/mlpack/issues/3041) issue but just wanted to make sure I am pointing him in the right direction.

13:08 <rcurtin[m]> heisenbuug (Gopi M Tatiraju): I saw the issue, thanks for answering that! I was thinking they could do one of two things... they can change the CSV so instead of 0/1 it's something that doesn't parse as a number, like `false`/`true`. Or, they can manually set `datasetInfo.Type(i) = Datatype::categorical` for all categorical dimensions before calling `data::Load()` (I might have the names of the functions or syntax slightly wrong there)

13:08 <rcurtin[m]> an example that's kind of like that is in the notebook Marcus linked to, in cell [10]
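
[Editor's sketch] The first option rcurtin describes (rewriting 0/1 as values that do not parse as numbers) could be done in the script that produces the CSV. The following is a minimal sketch in plain Python using only the standard library; the helper name `relabel_binary_columns`, the sample data, and the column indices are hypothetical, and it makes no claim about how mlpack's loader will treat the result beyond what is said above.

```python
import csv
import io


def relabel_binary_columns(csv_text, columns):
    """Rewrite 0/1 values in the given column indices as false/true.

    `columns` is a hypothetical list of the indices of the categorical
    columns; all other columns are passed through unchanged.
    """
    labels = {"0": "false", "1": "true"}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        for i in columns:
            # Leave any value other than 0 or 1 untouched.
            row[i] = labels.get(row[i].strip(), row[i])
        writer.writerow(row)
    return out.getvalue()


# Made-up sample: one numeric column followed by two 0/1 columns.
sample = "5.1,0,1\n4.9,1,0\n"
converted = relabel_binary_columns(sample, columns=[1, 2])
print(converted)
```

The numeric column is left as-is, so only the columns listed in `columns` stop parsing as numbers.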

13:10 <heisenbuugGopiMT> Yea, the model needs to interpret differently based on if it's one hot encoding or `datasetmapper`, right?

13:10 <heisenbuugGopiMT> I mean with one hot encoding number of columns increases?

13:10 <heisenbuugGopiMT> How does the model know that it should take those 2 columns combined?

13:27 <rcurtin[m]> if the dataset has already been one-hot encoded, then I think the user will need to reverse that so that just the original categories are in one column

13:29 <heisenbuugGopiMT> Oh okay, I just wanted to know that...

13:29 <rcurtin[m]> zoq: Aakash-kaushik (Aakash kaushik) : if you want to move the meeting, feel free---I'd love to attend, but I can also catch up later. I am not sure how much time I have to help with the effort, but I definitely want to do whatever I can to make sure that people who do want to contribute to the effort are able to do so effectively. so, I can handle, e.g., updating website scripts, etc. (and maybe we can finally throw away my awful doxygen postprocessing scripts 😃)

13:29 <kartikdutt18kart> Works for me as well

13:32 <heisenbuugGopiMT> So each library might have a different way to store one-hot encoded data?

13:32 <heisenbuugGopiMT> So when we one hot encode from one library we can't use it somewhere else?

13:33 <rcurtin[m]> I'm not sure I understand the question---one-hot encoding is a standard transformation where we take, e.g., a category (like suppose we can have categories "a", "b", "c", "d"), and we make one dimension for each category. so, for example, if a point had category "c", then we would encode it as [0, 0, 1, 0], and if it had category "d", we would encode it as [0, 0, 0, 1]
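
[Editor's sketch] The encoding rcurtin describes can be reproduced in a few lines of plain Python. This only illustrates the transformation itself, using the same categories "a" to "d" as the message above; the function name `one_hot` is hypothetical and this is not mlpack's implementation.

```python
def one_hot(category, categories):
    """Return a list with a 1 at the position of `category` and 0 elsewhere."""
    return [1 if c == category else 0 for c in categories]


categories = ["a", "b", "c", "d"]
encoded_c = one_hot("c", categories)
encoded_d = one_hot("d", categories)
print(encoded_c)  # [0, 0, 1, 0]
print(encoded_d)  # [0, 0, 0, 1]
```

One categorical column with four possible values therefore becomes four 0/1 columns, which is why the column count grows after encoding.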

13:34 <rcurtin[m]> I think, glancing over the responses on the issue, the best idea would be to modify the Python script to output the data in categorical form, not in one-hot encoded form

13:36 <heisenbuugGopiMT> yup, I suggested that to him. [skmultiflow.transform.OneHotToCategorical](https://scikit-multiflow.readthedocs.io/en/stable/api/generated/skmultiflow.transform.OneHotToCategorical.html) might work.

13:37 <rcurtin[m]> right, we don't currently have a "backwards" transformation for one-hot encoded data, so if they can't modify the python scripts, then that skmultiflow tool might be a good way to go 👍️
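
[Editor's sketch] The "backwards" transformation discussed here, collapsing one-hot columns into a single categorical column, might look like the following in plain Python. It is a sketch under assumptions: the function name `one_hot_to_category` and the sample rows are hypothetical, the category order must match the order used when encoding, and it is neither the skmultiflow implementation nor an mlpack API.

```python
def one_hot_to_category(row, categories):
    """Map a one-hot row back to its category label.

    Raises ValueError if the row does not contain exactly one 1,
    since such a row is not a valid one-hot encoding.
    """
    if len(row) != len(categories):
        raise ValueError("row length does not match number of categories")
    hot = [i for i, value in enumerate(row) if value == 1]
    if len(hot) != 1:
        raise ValueError("expected exactly one 1 in a one-hot row")
    return categories[hot[0]]


categories = ["a", "b", "c", "d"]
rows = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
decoded = [one_hot_to_category(r, categories) for r in rows]
print(decoded)  # ['c', 'd', 'a']
```

The decoded labels can then be written out as a single column, so that the original categories sit in one dimension as suggested earlier in the discussion.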

13:37 <Aakash-kaushikAa> hey @ryan:ratml.org it's more about feedback, because I am not sure how it should look or how the RST files should be written and processed.

13:37 <heisenbuugGopiMT> We might want to add the "backward" transformation?

13:38 <Aakash-kaushikAa> So maybe you can pick a suitable time and we can walk through it. If you want to take a look yourself, you should be able to set it up from the PR; the guide I have written should be sufficient.

13:46 aakashi2001 has joined #mlpack

14:54 <zoq[m]> <Aakash-kaushikAa> "So maybe you guys can pick up a..." <- Right, so how about we discuss this in the next mlpack meeting, which is Friday next week? In the meantime we can discuss on the open PR and think about what the direction could look like.

15:09 aakashi2001 has quit [Remote host closed the connection]

16:28 aakashi2001 has joined #mlpack

16:45 <swaingotnochill[> <kartikdutt18kart> "Works for me as well" <- I will join with David on that day itself as we discussed earlier :) Hope that's fine

17:32 aakashi2009 has joined #mlpack

17:32 aakashi2009 has quit [Changing host]

17:32 aakashi2009 has joined #mlpack

17:36 aakashi2001 has quit [Ping timeout: 248 seconds]

17:41 aakashi2009 has quit [Remote host closed the connection]