#ruby on 2022-05-06 — irc logs at libera.irclog.whitequark.org

2022-04-12 14:08 jhass[m] changed the topic of #ruby to: Rules: https://ruby-community.com | Ruby 3.1.2, 3.0.4, 2.7.6: https://www.ruby-lang.org | Paste 4+ lines to: https://gist.github.com | Books: https://goo.gl/wpGhoQ

00:03 giorgian has joined #ruby

00:07 z4kz has quit [Quit: Client closed]

00:08 giorgian has quit [Ping timeout: 252 seconds]

00:12 Thanzex has quit [Read error: Connection reset by peer]

00:13 Thanzex has joined #ruby

00:38 nebiros has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]

00:42 giorgian has joined #ruby

00:47 giorgian has quit [Ping timeout: 256 seconds]

01:00 roadie has joined #ruby

01:06 giorgian has joined #ruby

01:10 nebiros has joined #ruby

01:10 nebiros has quit [Changing host]

01:10 nebiros has joined #ruby

01:13 giorgian has quit [Ping timeout: 250 seconds]

01:16 roadie has quit [Ping timeout: 248 seconds]

01:26 giorgian has joined #ruby

01:29 RickHull has quit [Ping timeout: 250 seconds]

01:31 giorgian has quit [Ping timeout: 248 seconds]

01:42 Rounin has quit [Ping timeout: 250 seconds]

02:30 John_Ivan has quit [Ping timeout: 276 seconds]

02:46 John_Ivan has joined #ruby

02:46 roadie has joined #ruby

03:01 John_Ivan has quit [Read error: Connection reset by peer]

03:09 Sankalp has quit [Ping timeout: 276 seconds]

03:26 giorgian has joined #ruby

03:32 giorgian has quit [Ping timeout: 256 seconds]

03:37 Sankalp has joined #ruby

03:40 roadie has quit [Ping timeout: 260 seconds]

03:51 emcb54 has quit [Read error: Connection reset by peer]

03:53 emcb54 has joined #ruby

03:54 Ziyan has joined #ruby

04:07 roadie has joined #ruby

04:17 roadie has quit [Ping timeout: 248 seconds]

04:31 \{} has joined #ruby

04:37 roadie has joined #ruby

04:37 roadie has quit [Read error: Connection reset by peer]

05:09 \{} has quit [Quit: leaving]

05:17 ur5us has quit [Ping timeout: 260 seconds]

05:21 hanzo has joined #ruby

05:27 giorgian has joined #ruby

05:33 giorgian has quit [Ping timeout: 276 seconds]

05:46 giorgian has joined #ruby

05:48 z4kz has joined #ruby

05:51 giorgian has quit [Ping timeout: 248 seconds]

06:14 jpn has joined #ruby

06:15 teclator has joined #ruby

06:29 giorgian has joined #ruby

06:35 giorgian has quit [Ping timeout: 260 seconds]

06:42 <rapha> weaksauce: it's a single FTS5 table, so already indexed. and sqlite always has a rowid column which might be used for that.

06:43 <rapha> leah2: yes, that does. and it's what i'm doing right now. weaksauce was right and by morning after about 5 million records it had gotten so slow it wasn't much moving at all anymore.

06:43 <rapha> maybe i posed this in a way that it became an XY problem

06:44 Ziyan has quit [Quit: My iMac has gone to sleep. ZZZzzz…]

06:47 <rapha> what i need is basically File.write("results.csv", DB[:table].all.map{|r| r[:text]}.join("\n").split(/\s+/).tally.map{|x| x.join(',')}.join("\n")) ... but at 23GB of data that won't work RAM-wise and would take very, very long, CPU-wise. That's what got me thinking about how to parallelize it.

06:52 giorgian has joined #ruby

07:19 Ziyan has joined #ruby

07:20 fowl has quit [Ping timeout: 276 seconds]

07:32 Rounin has joined #ruby

07:38 entropy has joined #ruby

07:38 entropie has quit [Ping timeout: 276 seconds]

07:38 entropy is now known as entropie

07:40 Ziyan has quit [Ping timeout: 276 seconds]

07:40 Ziyan has joined #ruby

07:42 fowl has joined #ruby

07:45 Ziyan has quit [Ping timeout: 248 seconds]

07:47 _ht has joined #ruby

07:57 jpn has quit [Ping timeout: 276 seconds]

07:59 Ziyan has joined #ruby

08:31 dionysus69 has joined #ruby

08:35 Guest26nakilon has joined #ruby

08:35 <Guest26nakilon> has anyone generated an MHTML file?

08:38 <Guest26nakilon> I mean I have an MHTML file saved from the browser, now I want to edit it programmatically, but when I process it and then encode the <html> chunk like this: str.gsub("=", "=3D").force_encoding("ascii").b.gsub(/[\x80-\xF0]/n){ |_| "=#{[_].pack"M"}" }.gsub(/.{75}/, "\\0=\r\n") -- the resulting file encoding seems to be broken

08:38 <Guest26nakilon> https://i.imgur.com/Je0Ijsf.png

08:41 <Guest26nakilon> and when I use this code [str].pack("M") -- it puts =0D at the end of every line and then chrome can't render it at all

08:53 <Guest26nakilon> oh, changed .force_encoding("ascii").b.gsub(/[\x80-\xF0]/n){ |_| "=#{[_].pack"M"}" } to .b.gsub(/[\x80-\xF0]/n){ |_| "=%X" % _.ord } and it almost did it: https://i.imgur.com/KYD91qk.png

08:54 <Guest26nakilon> weird that https://datatracker.ietf.org/doc/html/rfc2557 has nothing at all about this trailing '=' thing

08:55 <Guest26nakilon> after each line that cuts on the 76th byte (and the 74th in case of Ruby's "M", which didn't work for me at all)

08:55 <Guest26nakilon> looks like there should not be a trailing "==" but I don't know how to make such a regex

08:56 <Guest26nakilon> it's currently .gsub(/.{75}/, "\\0=\r\n")

08:58 <kjetilho> Guest26nakilon: it is assumed you know MIME - look at the description of base64

08:59 <Guest26nakilon> it's not base64

09:01 <Guest26nakilon> chrome encoder https://i.imgur.com/aEgLkne.png ; my encoder - https://i.imgur.com/AKOFNAW.png

09:02 <Guest26nakilon> I feel like it's not possible to split correctly with one gsub pass

09:06 <Guest26nakilon> hm, .gsub(/.{,74}[^=](?=.)/, "\\0=\r\n") looks better https://i.imgur.com/eJiISaU.png but for some reason still not enough https://i.imgur.com/KYD91qk.png

09:08 <Guest26nakilon> looks like the =%X encoded chars can't be split by trailing =\r\n

09:10 <Guest26nakilon> this did the trick: gsub(/.{,73}[^=][^=](?=.)/, "\\0=\r\n")

09:15 <Guest26nakilon> not sure what's going on here though https://i.imgur.com/x283Jse.png -> https://i.imgur.com/Vak6yXL.png

09:17 perrierjouet has joined #ruby

09:21 <kjetilho> Guest26nakilon: sorry, typo - I meant quoted-printable of course

09:22 <Guest26nakilon> oh didn't know that, thanks
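The encoding being hand-rolled above is MIME quoted-printable, where a trailing "=" marks a soft line break. A minimal sketch of the same idea follows, assuming only "=" and bytes >= 0x80 need escaping (as in the chat); the helper names `qp_escape` and `qp_soft_wrap` are made up for illustration. Instead of one regex pass, it walks the escaped text token by token, so an "=XX" escape can never be cut in half by a line break. Note that the range `\x80-\xF0` used in the chat leaves out bytes 0xF1 to 0xFF, and that "=%X" would produce a single hex digit for bytes below 0x10, so the sketch uses `\x80-\xFF` and "%02X".

```ruby
# Sketch only: escapes "=" and bytes >= 0x80. Other quoted-printable rules
# (control characters, trailing whitespace) are deliberately left out.
def qp_escape(str)
  str.b.gsub(/[=\x80-\xFF]/n) { |byte| format("=%02X", byte.ord) }
end

# Wrap one already-escaped line so that each output line, including the
# trailing "=" of a soft line break, is at most `limit` bytes long.
# Scanning token by token keeps every "=XX" escape on a single line.
def qp_soft_wrap(escaped_line, limit = 76)
  chunks = [String.new]
  escaped_line.b.scan(/=[0-9A-F]{2}|./mn) do |token|
    chunks << String.new if chunks.last.bytesize + token.bytesize > limit - 1
    chunks.last << token
  end
  chunks.join("=\r\n")
end

wrapped = qp_soft_wrap(qp_escape("x=é" * 30))
puts wrapped
```

For real input with hard line breaks, each line would be wrapped separately and re-joined with "\r\n".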

09:23 <Guest26nakilon> I'm making a MHTML shrinker

09:23 <Guest26nakilon> the youtube.com HTML weighs 1 MB and half of it is, for example, unused SVG defs

09:24 <kjetilho> heh.

09:24 <kjetilho> Opera Mini was (is?) great

09:25 <Guest26nakilon> together with an aggressive webp reconverter I'm going to shrink the MHTML severalfold before adding it to my repo where it's used as a test file

09:30 oxfuxxx has joined #ruby

09:38 AndreYuh1i has joined #ruby

09:39 <AndreYuh1i> Hey everyone! Can I enable protect_from_forgery depending on an ENV var? For example I want to have it on production but not on staging.

09:39 roadie has joined #ruby

09:44 kjetilho has left #ruby [#ruby]

09:45 Ziyan_ has joined #ruby

09:46 Ziyan has quit [Ping timeout: 260 seconds]

09:50 <Rounin> AndreYuh1i: There's apparently an if: option on protect_from_forgery which takes a method of your own choosing, so you could read the env var there

09:52 <AndreYuh1i> Rounin: oh I didn't know about that option. Thank you!

09:52 <Rounin> Np :)
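A rough sketch of what Rounin describes, under stated assumptions: the variable name `ENABLE_CSRF_PROTECTION` and the helper `csrf_protection_enabled?` are hypothetical, and the Rails wiring is shown only as a comment because it cannot run outside a Rails app. The helper itself is plain Ruby and defaults to keeping protection on.

```ruby
# Hypothetical helper: the variable name ENABLE_CSRF_PROTECTION is made up
# for this sketch. Protection stays on unless the variable is set to "false".
def csrf_protection_enabled?(env = ENV)
  env.fetch("ENABLE_CSRF_PROTECTION", "true") != "false"
end

# Inside a Rails controller (not runnable outside Rails), this could be
# wired up roughly as:
#
#   class ApplicationController < ActionController::Base
#     protect_from_forgery with: :exception, if: -> { csrf_protection_enabled? }
#   end
```

Defaulting to "on" means a missing or misspelled variable on production does not silently disable the check.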

09:54 Thanzex has quit [Read error: Connection reset by peer]

09:54 Thanzex has joined #ruby

10:06 jpn has joined #ruby

10:07 dionysus69 has quit [Ping timeout: 250 seconds]

10:11 <Guest26nakilon> by throwing out unused svg defs the html size shrank from 905895 to 443879

10:12 oxfuxxx has quit [Ping timeout: 256 seconds]

10:16 dionysus69 has joined #ruby

10:21 jpn has quit [Ping timeout: 248 seconds]

10:25 SteveR has joined #ruby

10:26 Furai has quit [Quit: WeeChat 3.5]

10:26 <leah2> rapha: can't you do that in sql directly?

10:30 Furai has joined #ruby

10:32 ___nick___ has joined #ruby

10:35 <rapha> leah2: that complete bit of code? i'd find that pretty amazing and would have no clue how to go about it.

10:44 <Guest26nakilon> with better "id" regex even 905895 -> 303053

10:46 SteveR has quit [Quit: Client closed]

10:49 z4kz has quit [Quit: Client closed]

10:58 jpn has joined #ruby

11:02 jpn has quit [Ping timeout: 246 seconds]

11:05 Ziyan_ has quit [Quit: Textual IRC Client: www.textualapp.com]

11:44 szkl has joined #ruby

11:45 dionysus69 has quit [Ping timeout: 260 seconds]

11:54 John_Ivan has joined #ruby

12:09 RickHull has joined #ruby

12:25 dionysus69 has joined #ruby

12:38 Bish has quit [Quit: leaving]

12:41 Ziyan has joined #ruby

12:53 ua_ has quit [Excess Flood]

12:53 ua_ has joined #ruby

12:56 crankharder has joined #ruby

12:58 Ziyan_ has joined #ruby

12:59 Ziyan has quit [Ping timeout: 260 seconds]

13:06 Furai has quit [Quit: WeeChat 3.5]

13:09 Furai has joined #ruby

13:20 Ziyan_ has quit [Quit: My iMac has gone to sleep. ZZZzzz…]

13:24 Ziyan has joined #ruby

13:27 jpn has joined #ruby

13:32 emcb543 has joined #ruby

13:34 jpn has quit [Ping timeout: 250 seconds]

13:34 emcb54 has quit [Ping timeout: 260 seconds]

13:34 emcb543 is now known as emcb54

13:50 szkl has quit [Quit: Connection closed for inactivity]

13:57 Ziyan has quit [Quit: My iMac has gone to sleep. ZZZzzz…]

13:58 idiocrash_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

14:01 bit4bit has joined #ruby

14:04 Ziyan has joined #ruby

14:06 jpn has joined #ruby

14:13 RickHull has quit [Ping timeout: 240 seconds]

14:14 jpn has quit [Ping timeout: 252 seconds]

14:17 AndreYuh1i has quit [Quit: Lost terminal]

14:19 Ziyan has quit [Ping timeout: 252 seconds]

14:20 Ziyan has joined #ruby

14:21 hanzo has quit [Quit: Connection closed for inactivity]

14:28 jpn has joined #ruby

14:44 jpn has quit [Ping timeout: 248 seconds]

14:58 victori has quit [Ping timeout: 248 seconds]

15:08 victori has joined #ruby

15:10 jpn has joined #ruby

15:21 jpn has quit [Ping timeout: 256 seconds]

15:23 hololeap_ has quit [Ping timeout: 240 seconds]

15:28 hololeap_ has joined #ruby

15:36 jpn has joined #ruby

15:41 jpn has quit [Ping timeout: 250 seconds]

15:42 g0zart has joined #ruby

15:47 CrazyEddy has quit [Ping timeout: 256 seconds]

15:47 <Guest26nakilon> so html:900kb->300kb, css:2500kb->750kb, images(with quality loss):500kb->50kb -- that is 3800kb->1180kb

15:49 CrazyEddy has joined #ruby

15:52 hololeap_ is now known as hololeap

15:56 oxfuxxx has joined #ruby

15:57 jpn has joined #ruby

16:02 goldfish has joined #ruby

16:02 jpn has quit [Ping timeout: 276 seconds]

16:07 oxfuxxx has quit [Ping timeout: 246 seconds]

16:08 oxfuxxx has joined #ruby

16:15 emcb548 has joined #ruby

16:17 emcb54 has quit [Ping timeout: 248 seconds]

16:17 emcb548 is now known as emcb54

16:23 oxfuxxx has quit [Remote host closed the connection]

16:32 Guest26nakilon has quit [Quit: Client closed]

16:43 g0zart has quit [Quit: Leaving]

16:48 hanzo has joined #ruby

16:52 jpn has joined #ruby

16:58 jpn has quit [Ping timeout: 256 seconds]

17:06 emcb545 has joined #ruby

17:08 emcb54 has quit [Ping timeout: 252 seconds]

17:08 emcb545 is now known as emcb54

17:18 ___nick___ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

17:19 ___nick___ has joined #ruby

17:20 ___nick___ has quit [Client Quit]

17:22 ___nick___ has joined #ruby

17:58 <weaksauce> rapha i think your best bet is to use an indexed column like I suggested

17:58 <weaksauce> and definitely not use offset

18:00 <weaksauce> you might be able to sql wizard your way out of it but I am not that good at sql

18:04 emcb540 has joined #ruby

18:06 emcb54 has quit [Ping timeout: 276 seconds]

18:06 emcb540 is now known as emcb54

18:24 roadie has quit [Ping timeout: 252 seconds]

18:26 jpn has joined #ruby

18:26 dionysus70 has joined #ruby

18:26 dionysus69 has quit [Read error: Connection reset by peer]

18:26 dionysus70 is now known as dionysus69

18:30 hololeap has quit [Ping timeout: 240 seconds]

18:31 jpn has quit [Ping timeout: 276 seconds]

18:34 hololeap has joined #ruby

18:43 crankharder has quit [Quit: leaving]

19:00 dionysus69 has quit [Read error: Connection reset by peer]

19:00 dionysus69 has joined #ruby

19:01 emcb543 has joined #ruby

19:03 emcb54 has quit [Ping timeout: 246 seconds]

19:03 emcb543 is now known as emcb54

19:08 havenwood has quit [Quit: The Lounge - https://thelounge.chat]

19:08 havenwood has joined #ruby

19:15 hololeap has quit [Ping timeout: 240 seconds]

19:19 hololeap has joined #ruby

19:20 hololeap has quit [Client Quit]

20:02 _ht has quit [Remote host closed the connection]

20:02 BiHi has joined #ruby

20:03 Thanzex has quit [Read error: Connection reset by peer]

20:03 Thanzex has joined #ruby

20:03 BiHi has quit []

20:04 ___nick___ has quit [Ping timeout: 248 seconds]

20:11 Ziyan has quit [Quit: Textual IRC Client: www.textualapp.com]

20:14 jpn has joined #ruby

20:19 jpn has quit [Ping timeout: 248 seconds]

20:34 Vonter has quit [Ping timeout: 276 seconds]

20:39 oxfuxxx has joined #ruby

20:53 <leah2> nah

20:53 <leah2> just dump the table into a csv and parse that imo :p

20:55 oxfuxxx has quit [Ping timeout: 276 seconds]

20:56 oxfuxxx has joined #ruby

21:05 oxfuxxx has quit [Quit: [H]EAT ROX FUCK R0X SHIT BRIX. = The Yankies M0th3Rphackers Coconut Aerospace =]

21:17 osXnut has joined #ruby

21:22 bit4bit has quit [Ping timeout: 248 seconds]

21:33 shokohsc has quit [Quit: The Lounge - https://thelounge.chat]

21:37 shokohsc has joined #ruby

21:39 shokohsc has quit [Client Quit]

21:40 bit4bit has joined #ruby

21:44 ur5us has joined #ruby

21:44 FetidToot6 has joined #ruby

21:45 <rapha> weaksauce: as mentioned, it's an fts5 table, hence indexed by default. anyhow, going with .each and just doing everything in that one loop worked fine for getting the data i needed. the question came more out of curiosity how this could be parallelized. but that was just me trying to shave a yak, so, yeah, next topic.
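The single-loop approach rapha mentions could look roughly like the sketch below. It is not rapha's actual code: the helper names are invented, and rows are assumed to be hashes with a `:text` key, as in the earlier one-liner. The point is that only the running counts are kept in memory, never the full text of every row.

```ruby
# Tally whitespace-separated words across rows, one row at a time.
# `rows` is anything that responds to #each and yields hashes with :text.
def tally_words(rows)
  counts = Hash.new(0)
  rows.each do |row|
    row[:text].split.each { |word| counts[word] += 1 }
  end
  counts
end

# Write "word,count" lines, matching the shape of the original one-liner.
def write_counts(path, counts)
  File.open(path, "w") do |file|
    counts.each { |word, count| file.puts "#{word},#{count}" }
  end
end

# With Sequel this might be called as (assumption, not run here):
#   write_counts("results.csv", tally_words(DB[:table].select(:text)))
```

Whether memory really stays bounded also depends on how the database adapter fetches rows, so this is worth checking against the actual setup.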

21:47 FetidToot has quit [Ping timeout: 272 seconds]

21:47 FetidToot6 is now known as FetidToot

21:49 teclator has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

21:50 swaggboi has quit [Ping timeout: 260 seconds]

21:55 <weaksauce> i imagine you could have made it parallel using that technique by first getting the partition ids to search from as starting points, and then making that parallel.

21:55 <weaksauce> but cool

21:56 <weaksauce> and by indexed I just meant the column you are partitioning on would have to be indexed

22:02 jpn has joined #ruby

22:03 swaggboi has joined #ruby

22:03 <rapha> i do believe (https://www.sqlite.org/lang_createtable.html; Ctrl-F 'retrieving or sorting records by rowid is fast') that rowid in sqlite is always indexed.

22:05 <weaksauce> seems like it's even faster than an index by the nature of the btree

22:07 jpn has quit [Ping timeout: 248 seconds]

22:10 <rapha> as for using the partition idea for parallelisation, i'll try that when i feel like shaving a yak the next time

22:10 <rapha> it'd certainly be nice for smaller datasets that require more extensive calculations on each record

22:11 FetidToot0 has joined #ruby

22:12 <weaksauce> almost a naive reimplementation of google's map/reduce actually

22:13 FetidToot has quit [Ping timeout: 260 seconds]

22:13 FetidToot0 is now known as FetidToot

22:16 <weaksauce> or maybe more accurately, a single-server map/reduce

22:17 <rapha> huh?

22:17 * rapha googles google's map/reduce

22:18 <rapha> oh! wow!
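The partition idea discussed above can be sketched as a tiny single-machine map/reduce. This is an illustration under assumptions, not anything the participants wrote: `partition_ranges`, `parallel_tally`, and the in-memory `FAKE_TABLE` are all made up, with the hash standing in for a table keyed by rowid. Threads are used for brevity; on MRI they share a global lock, so CPU-bound work would likely need processes or Ractors to see a real speedup.

```ruby
# Split the inclusive id range min_id..max_id into at most `parts`
# contiguous, non-overlapping ranges.
def partition_ranges(min_id, max_id, parts)
  return [] if max_id < min_id
  size = ((max_id - min_id + 1) / parts.to_f).ceil
  (min_id..max_id).step(size).map { |start| start..[start + size - 1, max_id].min }
end

# "Map": tally the words of each partition in its own thread.
# "Reduce": merge the partial tallies by adding up the counts.
def parallel_tally(ranges, &fetch_texts)
  partials = ranges.map do |range|
    Thread.new { fetch_texts.call(range).flat_map(&:split).tally }
  end.map(&:value)
  partials.reduce({}) { |total, part| total.merge(part) { |_word, a, b| a + b } }
end

# Stand-in for the table: id => text. Note the gap at id 4.
FAKE_TABLE = { 1 => "a b", 2 => "b c", 3 => "a", 5 => "c c" }.freeze

ranges = partition_ranges(1, 5, 2)
counts = parallel_tally(ranges) { |range| FAKE_TABLE.values_at(*range).compact }
p counts
```

With Sequel, the block might instead fetch something like `DB[:table].where(rowid: range).select_map(:text)`, which is an untested assumption here. Each range is a bounded lookup on rowid, so no OFFSET is needed.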

22:26 shokohsc has joined #ruby

23:06 bit4bit has quit [Ping timeout: 256 seconds]

23:15 RickHull has joined #ruby

23:16 ur5us has quit [Ping timeout: 250 seconds]

23:20 naina has joined #ruby

23:39 naina has quit [K-Lined]

23:50 jpn has joined #ruby

23:55 jpn has quit [Ping timeout: 256 seconds]