00:04
ur5us has joined #crystal-lang
03:28
aquijoule__ has joined #crystal-lang
03:31
aquijoule_ has quit [Ping timeout: 265 seconds]
04:09
Chillfox has quit [Ping timeout: 250 seconds]
04:13
Chillfox has joined #crystal-lang
04:19
ur5us has quit [Ping timeout: 250 seconds]
06:02
sagax has quit [Read error: Connection reset by peer]
06:37
sagax has joined #crystal-lang
06:50
pusewic|away_ has quit [Ping timeout: 256 seconds]
06:51
pusewic|away_ has joined #crystal-lang
07:57
<
frojnd >
Let me take a look
07:59
taskylizard has joined #crystal-lang
07:59
<
FromGitter >
<paulocoghi> Sorry, now I understand your problem better
08:00
<
FromGitter >
<paulocoghi> You are correctly extracting the text under the desired <table> tag
08:01
<
FromGitter >
<paulocoghi> I'm looking into it
08:04
<
FromGitter >
<paulocoghi> I will continue in a separate thread, here
08:04
<
FromGitter >
<paulocoghi> Now I understand your problem better
08:05
<
FromGitter >
<paulocoghi> Since the text you want has undesired extra text inside it, wrapped in extra "span" tags
08:06
<
frojnd >
Yeah, the problem is those extra span tags, and I don't know how to ignore them
08:06
<
FromGitter >
<paulocoghi> One approach would be to separately select the undesired texts, with an extra "span" selector
08:06
<
FromGitter >
<paulocoghi> and later search and eliminate their occurrences on the main text
08:07
<
FromGitter >
<paulocoghi> cleaning it
08:08
<
FromGitter >
<paulocoghi> There is no CSS selector for the "pure text" inside an element
08:09
<
FromGitter >
<paulocoghi> So one possible approach would be to separately select the undesired texts, search and remove them from the main text
08:13
<
FromGitter >
<paulocoghi> Another approach would be to download the original Bible document used by Biblija.net, which seems to be provided by The Digital Bible Library
08:13
<
FromGitter >
<paulocoghi> also from United Bible Societies
08:15
<
frojnd >
Problem is that Slovenian language doesn't have any public API so I'm stuck with biblija.net
08:17
<
frojnd >
And while I'm doing it for one language I thought I might just support all that are listed (en,si,fr,es,ca,eu)
08:17
ur5us has joined #crystal-lang
08:17
<
frojnd >
Since content is similar if not the same with those spans
08:18
taskylizard has quit [Remote host closed the connection]
08:18
taskylizard has joined #crystal-lang
08:19
<
FromGitter >
<paulocoghi> I found the Slovenian version used on Biblija.net on The Digital Bible Library, as well as the other languages
08:19
<
FromGitter >
<paulocoghi> I understand this is not your desired approach,
08:19
<
FromGitter >
<paulocoghi> but it may be easier and more durable
08:20
<
frojnd >
Hm can't access it
08:20
<
frojnd >
Ah just slow
08:22
<
frojnd >
Haha searching for that "Download" button ;D
08:24
<
FromGitter >
<paulocoghi> Haha, it's fine :)
08:24
<
FromGitter >
<paulocoghi> If my previous suggestions don't help
08:25
<
FromGitter >
<paulocoghi> I found this list of other Slovenian bibles online
08:25
<
frojnd >
It's not open lol
08:25
<
frojnd >
So there is no download
08:26
<
FromGitter >
<paulocoghi> Maybe one of the alternatives provides a better HTML structure, allowing an easier extraction
08:26
<
FromGitter >
<paulocoghi> Please inform me if one of them helps you
08:30
<
frojnd >
No, most of them point to biblija.net so...
08:30
<
frojnd >
Format is also not acceptable. I'm stuck with biblija.net
08:31
taupiqueur has joined #crystal-lang
09:01
ur5us has quit [Ping timeout: 268 seconds]
09:08
ur5us has joined #crystal-lang
09:19
ur5us has quit [Ping timeout: 268 seconds]
11:26
r0bby has quit [Ping timeout: 256 seconds]
11:28
r0bby has joined #crystal-lang
11:54
<
frojnd >
I'm looping over redundat text and then removing it from main text
11:54
<
frojnd >
s/redundat/redundant
12:00
<
FromGitter >
<paulocoghi> Considering the limitations on CSS selectors, by now I believe your approach is the appropriate one (maybe not the fastest, but it works pretty well).
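[Editor's note] The approach discussed above (select the unwanted span texts separately, then remove them from the main text) might look roughly like the following. This is only a sketch on plain strings: `strip_fragments` is a hypothetical helper name, the sample text is made up, and the HTML parsing step that would produce the two inputs is not shown.

```crystal
# Hypothetical helper: removes each unwanted fragment from the main text,
# then collapses the whitespace left behind.
def strip_fragments(text : String, fragments : Array(String)) : String
  # String#sub replaces only the first occurrence of each fragment.
  cleaned = fragments.reduce(text) { |acc, fragment| acc.sub(fragment, "") }
  # Collapse runs of whitespace and trim the ends.
  cleaned.gsub(/\s+/, " ").strip
end

# Made-up sample: the text of an element, including its nested span texts.
main_text = "The quick (note a) brown fox (ref 1) jumps."
# Made-up sample: the span texts, selected separately.
unwanted = ["(note a)", "(ref 1)"]

puts strip_fragments(main_text, unwanted)
```

Using `sub` instead of `gsub` limits each removal to one occurrence per selected span, which lowers the risk of deleting wanted text that happens to match a fragment. It can still go wrong if a fragment also appears earlier in the wanted text.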
12:02
<
straight-shoota >
It allows defining a custom transform policy which could take care of your special sanitization need
12:03
<
frojnd >
straight-shoota: interesting
12:46
ua_ has quit [Ping timeout: 250 seconds]
13:00
ua_ has joined #crystal-lang
13:05
taskylizard_ has joined #crystal-lang
13:08
taskylizard has quit [Ping timeout: 268 seconds]
13:21
taskylizard_ has quit [Remote host closed the connection]
13:21
taskylizard_ has joined #crystal-lang
13:52
raz has quit [Ping timeout: 256 seconds]
13:59
raz has joined #crystal-lang
13:59
raz has quit [Changing host]
13:59
raz has joined #crystal-lang
14:56
rymiel has joined #crystal-lang
15:00
hightower2 has quit [Ping timeout: 265 seconds]
15:11
hightower2 has joined #crystal-lang
15:47
taskylizard_ has quit [Quit: Leaving]
15:54
HumanG33k has quit [Ping timeout: 265 seconds]
16:03
HumanG33k has joined #crystal-lang
20:15
ur5us has joined #crystal-lang
20:23
taupiqueur has quit [Remote host closed the connection]
20:23
taupiqueur has joined #crystal-lang
20:40
dmgk has joined #crystal-lang
21:40
<
riza >
is there an equivalent in crystal? looking at .hexbytes, but that returns a Bytes not a numeric of any sort
21:44
<
FromGitter >
<Blacksmoke16> `pp "0x0a".to_i prefix: true`
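[Editor's note] A small sketch of the string-to-integer conversions in question, using only `String#to_i` and `String#to_i?` from the standard library. The sample strings are illustrative.

```crystal
# With prefix: true, a leading 0x / 0b / 0o selects the base.
with_prefix = "0x0a".to_i(prefix: true)

# Without a prefix, pass the base explicitly.
explicit_base = "0a".to_i(16)
max_byte = "ff".to_i(16)

# to_i? returns nil instead of raising when the input is invalid.
invalid = "zz".to_i?(16)

pp with_prefix, explicit_base, max_byte, invalid
```

Unlike `hexbytes`, which returns a `Bytes` slice, these calls return an `Int32` (or `nil` for `to_i?` on invalid input).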
21:48
<
riza >
though it looks like that might be a new method in 1.2.2 and carc.in isn't updated
21:48
<
FromGitter >
<Blacksmoke16> pretty sure it's been around for a while
21:49
<
FromGitter >
<Blacksmoke16> is it not working for you?
21:49
<
riza >
maybe I need to take an eye break and get a snack
21:50
<
FromGitter >
<Blacksmoke16> ah, need to `require "big"`
22:10
taupiqueur has quit [Ping timeout: 260 seconds]
22:29
hightower2 has quit [Ping timeout: 265 seconds]
22:53
wolfshappen has quit [Quit: later]
23:38
wolfshappen has joined #crystal-lang