Thanzex has quit [Read error: Connection reset by peer]
Thanzex has joined #ruby
nebiros has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]
giorgian has joined #ruby
giorgian has quit [Ping timeout: 256 seconds]
roadie has joined #ruby
giorgian has joined #ruby
nebiros has joined #ruby
nebiros has quit [Changing host]
nebiros has joined #ruby
giorgian has quit [Ping timeout: 250 seconds]
roadie has quit [Ping timeout: 248 seconds]
giorgian has joined #ruby
RickHull has quit [Ping timeout: 250 seconds]
giorgian has quit [Ping timeout: 248 seconds]
Rounin has quit [Ping timeout: 250 seconds]
John_Ivan has quit [Ping timeout: 276 seconds]
John_Ivan has joined #ruby
roadie has joined #ruby
John_Ivan has quit [Read error: Connection reset by peer]
Sankalp has quit [Ping timeout: 276 seconds]
giorgian has joined #ruby
giorgian has quit [Ping timeout: 256 seconds]
Sankalp has joined #ruby
roadie has quit [Ping timeout: 260 seconds]
emcb54 has quit [Read error: Connection reset by peer]
emcb54 has joined #ruby
Ziyan has joined #ruby
roadie has joined #ruby
roadie has quit [Ping timeout: 248 seconds]
\{} has joined #ruby
roadie has joined #ruby
roadie has quit [Read error: Connection reset by peer]
\{} has quit [Quit: leaving]
ur5us has quit [Ping timeout: 260 seconds]
hanzo has joined #ruby
giorgian has joined #ruby
giorgian has quit [Ping timeout: 276 seconds]
giorgian has joined #ruby
z4kz has joined #ruby
giorgian has quit [Ping timeout: 248 seconds]
jpn has joined #ruby
teclator has joined #ruby
giorgian has joined #ruby
giorgian has quit [Ping timeout: 260 seconds]
<rapha>
weaksauce: it's a single FTS5 table, so already indexed. and sqlite always has a rowid column which might be used for that.
<rapha>
leah2: yes, that does. and it's what i'm doing right now. weaksauce was right: by morning, after about 5 million records, it had gotten so slow it was barely moving anymore.
<rapha>
maybe i posed this in a way that it became an XY problem
Ziyan has quit [Quit: My iMac has gone to sleep. ZZZzzz…]
<rapha>
what i need is basically File.write("results.csv", DB[:table].all.map{|r| r[:text]}.join("\n").split(/\s+/).tally.map{|x| x.join(',')}.join("\n")) ... but at 23GB of data that won't work RAM-wise and will take very, very long CPU-wise. That's what got me thinking about how to parallelize it.
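A streaming version of that one-liner sidesteps both limits by tallying as it iterates instead of materializing all 23GB at once. A minimal sketch, assuming Sequel over SQLite; the database path, batch size, and output name are placeholders:

    require "sequel"

    DB = Sequel.sqlite("corpus.db")  # placeholder path
    tally = Hash.new(0)

    # paged_each streams rows in batches instead of loading them all with .all;
    # ordering by rowid keeps the paging deterministic on an FTS5 table
    DB[:table].select(:text).order(:rowid).paged_each(rows_per_fetch: 10_000) do |row|
      row[:text].split(/\s+/).each { |word| tally[word] += 1 }
    end

    File.open("results.csv", "w") do |f|
      tally.each { |word, count| f.puts("#{word},#{count}") }
    end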
giorgian has joined #ruby
Ziyan has joined #ruby
fowl has quit [Ping timeout: 276 seconds]
Rounin has joined #ruby
entropy has joined #ruby
entropie has quit [Ping timeout: 276 seconds]
entropy is now known as entropie
Ziyan has quit [Ping timeout: 276 seconds]
Ziyan has joined #ruby
fowl has joined #ruby
Ziyan has quit [Ping timeout: 248 seconds]
_ht has joined #ruby
jpn has quit [Ping timeout: 276 seconds]
Ziyan has joined #ruby
dionysus69 has joined #ruby
Guest26nakilon has joined #ruby
<Guest26nakilon>
has anyone generated the MHTML file?
<Guest26nakilon>
I mean I have an MHTML file saved from the browser, now I want to edit it programmatically, but when I process it and then encode the <html> chunk like this: str.gsub("=", "=3D").force_encoding("ascii").b.gsub(/[\x80-\xF0]/n){ |_| "=#{[_].pack("M")}" }.gsub(/.{75}/, "\\0=\r\n") -- the resulting file encoding seems to be broken
<Guest26nakilon>
and when I use this code: [str].pack("M") -- it puts =0D at the end of every line and then chrome can't render it at all
<Guest26nakilon>
oh, changed .force_encoding("ascii").b.gsub(/[\x80-\xF0]/n){ |_| "=#{[_].pack("M")}" } to .b.gsub(/[\x80-\xF0]/n){ |_| "=%X" % _.ord } and it almost did it: https://i.imgur.com/KYD91qk.png
<kjetilho>
Guest26nakilon: sorry, typo - I meant quoted-printable of course
<Guest26nakilon>
oh didn't know that, thanks
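For reference, pack/unpack with "M" is Ruby's built-in quoted-printable codec, and the =0D at every line end most likely comes from CR bytes already in the browser-saved input, since pack("M") encodes a bare \r as =0D. A sketch of a round trip that normalizes line endings first; html_chunk is a placeholder:

    # encode: strip CRs so pack("M") doesn't emit them as literal =0D,
    # then restore the CRLF line endings that MIME expects
    encoded = [html_chunk.gsub("\r\n", "\n")].pack("M").gsub("\n", "\r\n")

    # decode again, handy for checking the result survives a round trip
    decoded = encoded.gsub("\r\n", "\n").unpack1("M")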
<Guest26nakilon>
I'm making an MHTML shrinker
<Guest26nakilon>
the youtube.com HTML weighs 1 MB and half of it is, for example, unused SVG defs
<kjetilho>
heh.
<kjetilho>
Opera Mini was (is?) great
<Guest26nakilon>
together with an aggressive webp reconverter I'm going to compress the MHTML several times over before adding it to my repo, where it's used as a test file
oxfuxxx has joined #ruby
AndreYuh1i has joined #ruby
<AndreYuh1i>
Hey everyone! Can I enable protect_from_forgery depending on an ENV var? For example I want to have it on in production but not in staging.
roadie has joined #ruby
kjetilho has left #ruby [#ruby]
Ziyan_ has joined #ruby
Ziyan has quit [Ping timeout: 260 seconds]
<Rounin>
AndreYuh1i: There's apparently a protect_from_forgery if: option which takes a method of your own choosing, so you could read the env var there
<AndreYuh1i>
Rounin: oh I didn't know about that option. Thank you!
<Rounin>
Np :)
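Spelled out: since protect_from_forgery hands its if:/unless: options to the underlying before_action, a lambda reading the environment is enough. A sketch; CSRF_PROTECTION is a made-up variable name:

    class ApplicationController < ActionController::Base
      # CSRF protection only where the env var turns it on
      protect_from_forgery with: :exception,
                           if: -> { ENV["CSRF_PROTECTION"] == "enabled" }
    end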
Thanzex has quit [Read error: Connection reset by peer]
Thanzex has joined #ruby
jpn has joined #ruby
dionysus69 has quit [Ping timeout: 250 seconds]
<Guest26nakilon>
by throwing out unused svg defs the html size shrank from 905895 to 443879
oxfuxxx has quit [Ping timeout: 256 seconds]
dionysus69 has joined #ruby
jpn has quit [Ping timeout: 248 seconds]
SteveR has joined #ruby
Furai has quit [Quit: WeeChat 3.5]
<leah2>
rapha: can't you do that in SQL directly?
Furai has joined #ruby
___nick___ has joined #ruby
<rapha>
leah2: that complete bit of code? i'd find that pretty amazing and would have no clue how to go about it.
<Guest26nakilon>
with a better "id" regex it's even 905895 -> 303053
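One way such pruning could look with Nokogiri instead of a bare regex: collect every id referenced as url(#...) or href="#...", then drop <defs> children whose id never shows up. Not Guest26nakilon's actual shrinker, just a sketch; the file names are placeholders and the reference pattern is rough:

    require "nokogiri"

    doc  = Nokogiri::HTML(File.read("page.html"))
    html = doc.to_html
    # ids referenced anywhere as url(#id), href="#id" or xlink:href="#id"
    used = html.scan(/(?:url\(#|href="#)([\w-]+)/).flatten.uniq

    doc.css("defs > [id]").each do |node|
      node.remove unless used.include?(node["id"])
    end
    File.write("page.min.html", doc.to_html)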
<rapha>
weaksauce: as mentioned, it's an fts5 table, hence indexed by default. anyhow, going with .each and just doing everything in that one loop worked fine for getting the data i needed. the question came more out of curiosity about how this could be parallelized. but that was just me trying to shave a yak, so, yeah, next topic.
<weaksauce>
i imagine you could have made it parallel using that technique: first get the partition ids to search from as starting points, and then search each partition in parallel.
<weaksauce>
but cool
<weaksauce>
and by indexed I just meant the column you are partitioning on would have to be indexed
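Put together, weaksauce's partitioning idea might look like this: split the rowid space into ranges and tally each range in its own forked process, merging the per-chunk files afterwards. A sketch under stated assumptions only; the worker count, paths, and table name are made up:

    require "sequel"

    WORKERS = 8
    max_id  = Sequel.sqlite("corpus.db")[:table].max(:rowid)
    step    = (max_id / WORKERS.to_f).ceil

    pids = (0...WORKERS).map do |i|
      fork do
        db    = Sequel.sqlite("corpus.db")  # fresh handle per process
        tally = Hash.new(0)
        range = (i * step + 1)..((i + 1) * step)
        db[:table].where(rowid: range).select(:text).each do |row|
          row[:text].split(/\s+/).each { |word| tally[word] += 1 }
        end
        File.write("tally.#{i}.csv", tally.map { |w, c| "#{w},#{c}" }.join("\n"))
      end
    end
    pids.each { |pid| Process.wait(pid) }
    # merging the tally.N.csv files into one count is left out here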