jhass[m] changed the topic of #ruby to: Rules: https://ruby-community.com | Ruby 3.1.2, 3.0.4, 2.7.6: https://www.ruby-lang.org | Paste 4+ lines to: https://gist.github.com | Books: https://goo.gl/wpGhoQ
giorgian has joined #ruby
z4kz has quit [Quit: Client closed]
giorgian has quit [Ping timeout: 252 seconds]
Thanzex has quit [Read error: Connection reset by peer]
Thanzex has joined #ruby
nebiros has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]
giorgian has joined #ruby
giorgian has quit [Ping timeout: 256 seconds]
roadie has joined #ruby
giorgian has joined #ruby
nebiros has joined #ruby
nebiros has quit [Changing host]
nebiros has joined #ruby
giorgian has quit [Ping timeout: 250 seconds]
roadie has quit [Ping timeout: 248 seconds]
giorgian has joined #ruby
RickHull has quit [Ping timeout: 250 seconds]
giorgian has quit [Ping timeout: 248 seconds]
Rounin has quit [Ping timeout: 250 seconds]
John_Ivan has quit [Ping timeout: 276 seconds]
John_Ivan has joined #ruby
roadie has joined #ruby
John_Ivan has quit [Read error: Connection reset by peer]
Sankalp has quit [Ping timeout: 276 seconds]
giorgian has joined #ruby
giorgian has quit [Ping timeout: 256 seconds]
Sankalp has joined #ruby
roadie has quit [Ping timeout: 260 seconds]
emcb54 has quit [Read error: Connection reset by peer]
emcb54 has joined #ruby
Ziyan has joined #ruby
roadie has joined #ruby
roadie has quit [Ping timeout: 248 seconds]
\{} has joined #ruby
roadie has joined #ruby
roadie has quit [Read error: Connection reset by peer]
\{} has quit [Quit: leaving]
ur5us has quit [Ping timeout: 260 seconds]
hanzo has joined #ruby
giorgian has joined #ruby
giorgian has quit [Ping timeout: 276 seconds]
giorgian has joined #ruby
z4kz has joined #ruby
giorgian has quit [Ping timeout: 248 seconds]
jpn has joined #ruby
teclator has joined #ruby
giorgian has joined #ruby
giorgian has quit [Ping timeout: 260 seconds]
<rapha> weaksauce: it's a single FTS5 table, so already indexed. and sqlite always has a rowid column which might be used for that.
<rapha> leah2: yes, that does. and it's what i'm doing right now. weaksauce was right and by morning after about 5 million records it had gotten so slow it was barely moving at all anymore.
<rapha> maybe i posed this in a way that it became an XY problem
Ziyan has quit [Quit: My iMac has gone to sleep. ZZZzzz…]
<rapha> what i need is basically File.write("results.csv", DB[:table].all.map{|r| r[:text]}.join("\n").split(/\s+/).tally.map{|x| x.join(',')}.join("\n")) ... but at 23GB of data that won't work RAM-wise and would take very, very long, CPU-wise. That's what got me thinking about how to parallelize it.
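A minimal sketch of the same word count done as a stream, so that only the table of distinct words is held in memory rather than every row. The helper names `tally_words` and `write_counts` are hypothetical, and the rows are assumed to be hashes with a `:text` key, as in the one-liner above.

```ruby
# Count whitespace-separated words one row at a time.
# `rows` is any object that responds to #each and yields hashes with a :text key.
def tally_words(rows, counts = Hash.new(0))
  rows.each do |row|
    # String#split without arguments splits on runs of whitespace.
    row[:text].to_s.split.each { |word| counts[word] += 1 }
  end
  counts
end

# Write one "word,count" line per distinct word to any IO-like object.
# Words that contain commas are not quoted in this sketch.
def write_counts(counts, io)
  counts.each { |word, n| io.puts("#{word},#{n}") }
end

if __FILE__ == $PROGRAM_NAME
  sample_rows = [{ text: "to be or" }, { text: "not to be" }]
  write_counts(tally_words(sample_rows), $stdout)
end
```

With Sequel, a dataset such as `DB[:table]` yields row hashes from `each`, so it could presumably be passed in place of `sample_rows`, with `File.open("results.csv", "w")` supplying the output object.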
giorgian has joined #ruby
Ziyan has joined #ruby
fowl has quit [Ping timeout: 276 seconds]
Rounin has joined #ruby
entropy has joined #ruby
entropie has quit [Ping timeout: 276 seconds]
entropy is now known as entropie
Ziyan has quit [Ping timeout: 276 seconds]
Ziyan has joined #ruby
fowl has joined #ruby
Ziyan has quit [Ping timeout: 248 seconds]
_ht has joined #ruby
jpn has quit [Ping timeout: 276 seconds]
Ziyan has joined #ruby
dionysus69 has joined #ruby
Guest26nakilon has joined #ruby
<Guest26nakilon> has anyone generated an MHTML file?
<Guest26nakilon> I mean I have an MHTML file saved from the browser, now I want to edit it programmatically, but when I process it and then encode the <html> chunk like this: str.gsub("=", "=3D").force_encoding("ascii").b.gsub(/[\x80-\xF0]/n){ |_| "=#{[_].pack"M"}" }.gsub(/.{75}/, "\\0=\r\n") -- the resulting file's encoding seems to be broken
<Guest26nakilon> and when I use this code [str].pack("M") -- it puts =0D at the end of every line and then chrome can't render it at all
<Guest26nakilon> oh, changed .force_encoding("ascii").b.gsub(/[\x80-\xF0]/n){ |_| "=#{[_].pack"M"}" } to .b.gsub(/[\x80-\xF0]/n){ |_| "=%X" % _.ord } and it almost did it: https://i.imgur.com/KYD91qk.png
<Guest26nakilon> weird that https://datatracker.ietf.org/doc/html/rfc2557 has nothing at all about this trailing '=' thing
<Guest26nakilon> after each line that cuts on the 76th byte (and the 74th in case of Ruby's "M", which didn't work for me at all)
<Guest26nakilon> looks like there should not be trailing "==" but I don't know how to make such regex
<Guest26nakilon> it's currently .gsub(/.{75}/, "\\0=\r\n")
<kjetilho> Guest26nakilon: it is assumed you know MIME - look at the description of base64
<Guest26nakilon> it's not base64
<Guest26nakilon> chrome encoder https://i.imgur.com/aEgLkne.png ; my encoder - https://i.imgur.com/AKOFNAW.png
<Guest26nakilon> I feel like it's not possible to split correctly with one gsub pass
<Guest26nakilon> hm, .gsub(/.{,74}[^=](?=.)/, "\\0=\r\n") looks better https://i.imgur.com/eJiISaU.png but for some reason still not enough https://i.imgur.com/KYD91qk.png
<Guest26nakilon> looks like the =%X encoded chars can't be split by trailing =\r\n
<Guest26nakilon> this did the trick: gsub(/.{,73}[^=][^=](?=.)/, "\\0=\r\n")
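A rough sketch of the encoding being worked out above, written as a token-based wrapper instead of a single regex pass. It assumes the goal is quoted-printable as described in RFC 2045: "=" and non-ASCII bytes become =XX, and lines are folded with a trailing "=" so that no encoded line exceeds 76 characters and no =XX escape is split. The helper names are hypothetical, and trailing whitespace at line ends is not handled.

```ruby
MAX_QP_LINE = 76 # longest allowed encoded line, counting the trailing "="

# Escape "=" and every byte that is not printable ASCII (tab excepted) as =XX.
def qp_escape(line)
  line.b.gsub(/[^\t\x20-\x3C\x3E-\x7E]/n) { |byte| format("=%02X", byte.ord) }
end

# Fold one escaped line with soft breaks ("=" followed by CRLF).
def qp_wrap(escaped)
  lines = [String.new]
  # A token is either a whole =XX escape or a single character,
  # so a soft break can never land inside an escape.
  escaped.scan(/=[0-9A-F]{2}|./n) do |token|
    # Keep one column free for the soft-break "=".
    if lines.last.bytesize + token.bytesize > MAX_QP_LINE - 1
      lines.last << "="
      lines << String.new
    end
    lines.last << token
  end
  lines.join("\r\n")
end

# Encode a whole text, treating LF or CRLF in the input as hard line breaks.
def qp_encode(text)
  text.b.split(/\r?\n/, -1).map { |line| qp_wrap(qp_escape(line)) }.join("\r\n")
end

if __FILE__ == $PROGRAM_NAME
  puts qp_encode("<p>caf\u00E9 = coffee</p>")
end
```

Splitting into tokens first is what keeps the fold from cutting an escape in half, which seems to be the problem the `[^=][^=]` regex works around.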
<Guest26nakilon> not sure what's going on here though https://i.imgur.com/x283Jse.png -> https://i.imgur.com/Vak6yXL.png
perrierjouet has joined #ruby
<kjetilho> Guest26nakilon: sorry, typo - I meant quoted-printable of course
<Guest26nakilon> oh didn't know that, thanks
<Guest26nakilon> I'm making a MHTML shrinker
<Guest26nakilon> the youtube.com HTML weighs 1 MB and half of it is, for example, unused SVG defs
<kjetilho> heh.
<kjetilho> Opera Mini was (is?) great
<Guest26nakilon> together with an aggressive webp reconverter I'm going to compress the MHTML several times before adding to my repo where it's used as a test file
oxfuxxx has joined #ruby
AndreYuh1i has joined #ruby
<AndreYuh1i> Hey everyone! Can I enable protect_from_forgery depending on an ENV var? For example I want to have it on production but not on staging.
roadie has joined #ruby
kjetilho has left #ruby [#ruby]
Ziyan_ has joined #ruby
Ziyan has quit [Ping timeout: 260 seconds]
<Rounin> AndreYuh1i: protect_from_forgery apparently takes an if: option, which accepts a method of your own choosing, so you could read the env var there
<AndreYuh1i> Rounin: oh I didn't know about that option. Thank you!
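A minimal sketch of that suggestion, assuming a Rails controller and a made-up variable name, `ENABLE_CSRF_PROTECTION`. The `CsrfToggle` module is hypothetical; the only Rails API relied on is the `if:` option of `protect_from_forgery`. The controller part is skipped when Rails is not loaded.

```ruby
# Hypothetical helper; the environment variable name is an assumption.
module CsrfToggle
  TRUTHY = %w[1 true yes on].freeze

  # Protection stays on when the variable is missing,
  # and is switched off only by a non-truthy value such as "false".
  def self.enabled?(env = ENV)
    value = env.fetch("ENABLE_CSRF_PROTECTION", "true")
    TRUTHY.include?(value.strip.downcase)
  end
end

# Only evaluated when Action Controller is available.
# In a real app this line would go into the existing ApplicationController.
if defined?(ActionController::Base)
  class ApplicationController < ActionController::Base
    protect_from_forgery with: :exception, if: -> { CsrfToggle.enabled? }
  end
end
```

Defaulting to "on" means a missing variable cannot silently disable the check. Rails also has a per-environment setting, `config.action_controller.allow_forgery_protection`, which may be a simpler fit when staging has its own environment file.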
<Rounin> Np :)
Thanzex has quit [Read error: Connection reset by peer]
Thanzex has joined #ruby
jpn has joined #ruby
dionysus69 has quit [Ping timeout: 250 seconds]
<Guest26nakilon> by throwing out unused svg defs the html size shrank from 905895 to 443879
oxfuxxx has quit [Ping timeout: 256 seconds]
dionysus69 has joined #ruby
jpn has quit [Ping timeout: 248 seconds]
SteveR has joined #ruby
Furai has quit [Quit: WeeChat 3.5]
<leah2> rapha: cant you do that in sql directly?
Furai has joined #ruby
___nick___ has joined #ruby
<rapha> leah2: that complete bit of code? i'd find that pretty amazing and would have no clue how to go about it.
<Guest26nakilon> with better "id" regex even 905895 -> 303053
SteveR has quit [Quit: Client closed]
z4kz has quit [Quit: Client closed]
jpn has joined #ruby
jpn has quit [Ping timeout: 246 seconds]
Ziyan_ has quit [Quit: Textual IRC Client: www.textualapp.com]
szkl has joined #ruby
dionysus69 has quit [Ping timeout: 260 seconds]
John_Ivan has joined #ruby
RickHull has joined #ruby
dionysus69 has joined #ruby
Bish has quit [Quit: leaving]
Ziyan has joined #ruby
ua_ has quit [Excess Flood]
ua_ has joined #ruby
crankharder has joined #ruby
Ziyan_ has joined #ruby
Ziyan has quit [Ping timeout: 260 seconds]
Furai has quit [Quit: WeeChat 3.5]
Furai has joined #ruby
Ziyan_ has quit [Quit: My iMac has gone to sleep. ZZZzzz…]
Ziyan has joined #ruby
jpn has joined #ruby
emcb543 has joined #ruby
jpn has quit [Ping timeout: 250 seconds]
emcb54 has quit [Ping timeout: 260 seconds]
emcb543 is now known as emcb54
szkl has quit [Quit: Connection closed for inactivity]
Ziyan has quit [Quit: My iMac has gone to sleep. ZZZzzz…]
idiocrash_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
bit4bit has joined #ruby
Ziyan has joined #ruby
jpn has joined #ruby
RickHull has quit [Ping timeout: 240 seconds]
jpn has quit [Ping timeout: 252 seconds]
AndreYuh1i has quit [Quit: Lost terminal]
Ziyan has quit [Ping timeout: 252 seconds]
Ziyan has joined #ruby
hanzo has quit [Quit: Connection closed for inactivity]
jpn has joined #ruby
jpn has quit [Ping timeout: 248 seconds]
victori has quit [Ping timeout: 248 seconds]
victori has joined #ruby
jpn has joined #ruby
jpn has quit [Ping timeout: 256 seconds]
hololeap_ has quit [Ping timeout: 240 seconds]
hololeap_ has joined #ruby
jpn has joined #ruby
jpn has quit [Ping timeout: 250 seconds]
g0zart has joined #ruby
CrazyEddy has quit [Ping timeout: 256 seconds]
<Guest26nakilon> so html:900kb->300kb, css:2500kb->750kb, images(with quality loss):500kb->50kb -- that is 3800kb->1180kb
CrazyEddy has joined #ruby
hololeap_ is now known as hololeap
oxfuxxx has joined #ruby
jpn has joined #ruby
goldfish has joined #ruby
jpn has quit [Ping timeout: 276 seconds]
oxfuxxx has quit [Ping timeout: 246 seconds]
oxfuxxx has joined #ruby
emcb548 has joined #ruby
emcb54 has quit [Ping timeout: 248 seconds]
emcb548 is now known as emcb54
oxfuxxx has quit [Remote host closed the connection]
Guest26nakilon has quit [Quit: Client closed]
g0zart has quit [Quit: Leaving]
hanzo has joined #ruby
jpn has joined #ruby
jpn has quit [Ping timeout: 256 seconds]
emcb545 has joined #ruby
emcb54 has quit [Ping timeout: 252 seconds]
emcb545 is now known as emcb54
___nick___ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
___nick___ has joined #ruby
___nick___ has quit [Client Quit]
___nick___ has joined #ruby
<weaksauce> rapha i think your best bet is to use an indexed column like I suggested
<weaksauce> and definitely not use offset
<weaksauce> you might be able to sql wizard your way out of it but I am not that good at sql
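A small sketch of the "indexed column instead of offset" idea: each batch starts after the last id already seen, so the database never has to skip over earlier rows. `fetch_after` is a hypothetical callable standing in for a query along the lines of `WHERE rowid > ? ORDER BY rowid LIMIT ?`, and ids are assumed to be positive integers.

```ruby
# Yield batches of rows in id order without using OFFSET.
# `fetch_after.call(last_id, limit)` must return up to `limit` rows
# (hashes with an :id key) whose id is greater than `last_id`, sorted by id.
def each_batch(fetch_after, limit: 1000)
  last_id = 0 # assumes ids start above zero
  loop do
    batch = fetch_after.call(last_id, limit)
    break if batch.empty?

    yield batch
    last_id = batch.last[:id]
  end
end

if __FILE__ == $PROGRAM_NAME
  # In-memory stand-in for the table.
  rows = (1..10).map { |i| { id: i, text: "row #{i}" } }
  fetch = ->(last_id, limit) { rows.select { |r| r[:id] > last_id }.first(limit) }
  each_batch(fetch, limit: 4) { |batch| puts batch.map { |r| r[:id] }.inspect }
end
```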
emcb540 has joined #ruby
emcb54 has quit [Ping timeout: 276 seconds]
emcb540 is now known as emcb54
roadie has quit [Ping timeout: 252 seconds]
jpn has joined #ruby
dionysus70 has joined #ruby
dionysus69 has quit [Read error: Connection reset by peer]
dionysus70 is now known as dionysus69
hololeap has quit [Ping timeout: 240 seconds]
jpn has quit [Ping timeout: 276 seconds]
hololeap has joined #ruby
crankharder has quit [Quit: leaving]
dionysus69 has quit [Read error: Connection reset by peer]
dionysus69 has joined #ruby
emcb543 has joined #ruby
emcb54 has quit [Ping timeout: 246 seconds]
emcb543 is now known as emcb54
havenwood has quit [Quit: The Lounge - https://thelounge.chat]
havenwood has joined #ruby
hololeap has quit [Ping timeout: 240 seconds]
hololeap has joined #ruby
hololeap has quit [Client Quit]
_ht has quit [Remote host closed the connection]
BiHi has joined #ruby
Thanzex has quit [Read error: Connection reset by peer]
Thanzex has joined #ruby
BiHi has quit []
___nick___ has quit [Ping timeout: 248 seconds]
Ziyan has quit [Quit: Textual IRC Client: www.textualapp.com]
jpn has joined #ruby
jpn has quit [Ping timeout: 248 seconds]
Vonter has quit [Ping timeout: 276 seconds]
oxfuxxx has joined #ruby
<leah2> nah
<leah2> just dump the table into a csv and parse that imo :p
oxfuxxx has quit [Ping timeout: 276 seconds]
oxfuxxx has joined #ruby
oxfuxxx has quit [Quit: [H]EAT ROX FUCK R0X SHIT BRIX. = The Yankies M0th3Rphackers Coconut Aerospace =]
osXnut has joined #ruby
bit4bit has quit [Ping timeout: 248 seconds]
shokohsc has quit [Quit: The Lounge - https://thelounge.chat]
shokohsc has joined #ruby
shokohsc has quit [Client Quit]
bit4bit has joined #ruby
ur5us has joined #ruby
FetidToot6 has joined #ruby
<rapha> weaksauce: as mentioned, it's an fts5 table, hence indexed by default. anyhow, going with .each and just doing everything in that one loop worked fine for getting the data i needed. the question came more out of curiosity how this could be parallelized. but that was just me trying to shave a yak, so, yeah, next topic.
FetidToot has quit [Ping timeout: 272 seconds]
FetidToot6 is now known as FetidToot
teclator has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
swaggboi has quit [Ping timeout: 260 seconds]
<weaksauce> i imagine you could have made it parallel using that technique, by first getting the partition ids to use as starting points, and then running those partitions in parallel.
<weaksauce> but cool
<weaksauce> and by indexed I just meant the column you are partitioning on would have to be indexed
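A sketch of the partitioning idea, under the assumption that rows can be fetched by an inclusive id range. `rowid_ranges` and `count_words_in_partitions` are hypothetical names, and `fetch_texts` stands in for a query such as `WHERE rowid BETWEEN ? AND ?`. Threads are used only to show the shape: on MRI they will not speed up CPU-bound counting, so separate processes would be needed for a real gain.

```ruby
# Split the inclusive id span into at most `parts` contiguous ranges.
def rowid_ranges(min_id, max_id, parts)
  return [] if max_id < min_id

  size = ((max_id - min_id + 1) / parts.to_f).ceil
  (min_id..max_id).step(size).map { |start| start..[start + size - 1, max_id].min }
end

# "Map" step: one worker per range builds a partial word count.
# "Reduce" step: the partial counts are summed into one hash.
def count_words_in_partitions(ranges, fetch_texts)
  workers = ranges.map do |range|
    Thread.new do
      counts = Hash.new(0)
      fetch_texts.call(range).each { |text| text.split.each { |w| counts[w] += 1 } }
      counts
    end
  end
  workers.map(&:value).reduce(Hash.new(0)) do |total, part|
    total.merge!(part) { |_word, a, b| a + b }
  end
end

if __FILE__ == $PROGRAM_NAME
  # In-memory stand-in for the table; id 3 is deliberately missing.
  table = { 1 => "a b", 2 => "b c", 4 => "c a" }
  fetch = ->(range) { range.filter_map { |id| table[id] } }
  p count_words_in_partitions(rowid_ranges(1, 4, 2), fetch)
end
```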
jpn has joined #ruby
swaggboi has joined #ruby
<rapha> i do believe (https://www.sqlite.org/lang_createtable.html; Ctrl-F 'retrieving or sorting records by rowid is fast') that rowid in sqlite is always indexed.
<weaksauce> seems like it's even faster than an index by the nature of the btree
jpn has quit [Ping timeout: 248 seconds]
<rapha> as for using the partition idea for parallelisation, i'll try that when i feel like shaving a yak the next time
<rapha> it'd certainly be nice for smaller datasets that require more extensive calculations on each record
FetidToot0 has joined #ruby
<weaksauce> almost a naive reimplementation of google's map/reduce actually
FetidToot has quit [Ping timeout: 260 seconds]
FetidToot0 is now known as FetidToot
<weaksauce> or maybe more accurately, a single-server map/reduce
<rapha> huh?
* rapha googles google's map/reduce
<rapha> oh! wow!
shokohsc has joined #ruby
bit4bit has quit [Ping timeout: 256 seconds]
RickHull has joined #ruby
ur5us has quit [Ping timeout: 250 seconds]
naina has joined #ruby
naina has quit [K-Lined]
jpn has joined #ruby
jpn has quit [Ping timeout: 256 seconds]