dude12312414 has quit [Remote host closed the connection]
Vercas has quit [Remote host closed the connection]
gxt_ has quit [Remote host closed the connection]
gxt_ has joined #osdev
Vercas has joined #osdev
nyah has quit [Ping timeout: 260 seconds]
frogurt_vendor is now known as kingoffrance
<MA-SA-YU-KI>
I wasn't trying to spam people. I am indeed doing a study for my Masters degree at the University of Waterloo.
<MA-SA-YU-KI>
I wanted to contact a moderator to see if this was alright to post, but there was no moderator on here so I pasted here. Or is there a channel or discord community you can recommend?
<Mutabah>
Really, the biggest issue is that you just slapped a pastebin with zero context - it reeks of low effort
<gog>
mew
<heat>
woof
liz has joined #osdev
wowaname is now known as opal
heat has quit [Ping timeout: 248 seconds]
gog has quit [Ping timeout: 255 seconds]
zaquest has joined #osdev
<MA-SA-YU-KI>
Sorry about that. I am on IRCCloud and it recommended that multi-line messages should be as a paste-bin. I assumed I was following IRC etiquette by complying with the recommendations since it's my first time on IRC.
<MA-SA-YU-KI>
Again, my apologies. I was not trying to spam this group.
tsraoien has quit [Ping timeout: 255 seconds]
<Mutabah>
It's a balance.
<Mutabah>
Large walls of text should be avoided, or put in a pastebin
<Mutabah>
But - they should be prefixed by a reason why somebody would want to open that link
chartreuse has quit [Remote host closed the connection]
<geist>
it's deleted so i dont know what the topic was
<geist>
MA-SA-YU-KI: i have a relative at waterloo, visited last year
<geist>
it's a... cold place.
<geist>
in the winter at least
kkd has quit [Quit: Connection closed for inactivity]
<Mutabah>
geist: Iirc it was a survey/study on effectiveness of copilot in solving problems?
<geist>
ah
<geist>
is copilot one of those ML based code gen things?
<Mutabah>
yeah, Github Copilot
<geist>
ah kinda outta principle i dont wanna touch one of those
<geist>
though i suppose it might be maybe helpful when learning a new language
<geist>
or maybe it just teaches you bad habits
GreaseMonkey has joined #osdev
elastic_dog has quit [Ping timeout: 255 seconds]
elastic_dog has joined #osdev
GeDaMo has joined #osdev
<zid>
I'm sure it's great for gluing frameworks together
<zid>
but I worry if you use it for anything technical that it will spit out good looking but ultimately flawed code
<zid>
and I am *much* better at writing silly bithacks than reading them
<zid>
plus there's a whole shit load of legal ramifications
MiningMa- has joined #osdev
MiningMarsh has quit [Ping timeout: 256 seconds]
MiningMa- is now known as MiningMarsh
papaya has joined #osdev
papaya has quit [Quit: leaving]
papaya has joined #osdev
pretty_dumm_guy has joined #osdev
Likorn has joined #osdev
the_lanetly_052 has joined #osdev
the_lanetly_052_ has quit [Ping timeout: 260 seconds]
Likorn has quit [Quit: WeeChat 3.4.1]
Likorn has joined #osdev
Likorn has quit [Client Quit]
Likorn has joined #osdev
Likorn has quit [Quit: WeeChat 3.4.1]
papaya has quit [Quit: leaving]
terminalpusher has joined #osdev
<mrvn>
zid: legal ramifications? Does the AI own your code? :)
<Mutabah>
mrvn: The AI was trained on a wide corpus of open-source code, and has been known (... with some priming) to reproduce significant chunks of its training set
<Mutabah>
Which leads to the question - is the AI a derivative work of the training set, and is something produced by the AI also a derivative work?
<mrvn>
Even if the AI isn't, is the output?
<mrvn>
Is copilot just a neural net and pattern matches ASTs?
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
eroux has joined #osdev
Vercas has quit [Remote host closed the connection]
Vercas has joined #osdev
<mrvn>
Can I write this shorter for a vector? [](const auto &v) { return v.front();}
<Mutabah>
don't think so - if you're doing it a lot, you could make a helper function?
terminalpusher has quit [Remote host closed the connection]
the_lanetly_052 has quit [Remote host closed the connection]
<mrvn>
Would be nice if one could pass a member function pointer to sort()
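A minimal sketch of the member-pointer idea mrvn raises here. The `Task` type and the `by` helper are hypothetical names invented for illustration; `by` wraps a pointer to a member (function or data) with `std::mem_fn` and returns a less-than comparator for `std::sort`. Since C++20, `std::ranges::sort` also accepts a member pointer directly as its projection argument. For `v.front()` on a `std::vector` specifically, the lambda is probably still the safer form, because taking the address of a standard library member function is not something the standard promises will work.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical element type, used only for this example.
struct Task {
    std::string name;
    int priority;
    int prio() const { return priority; }
};

// Hypothetical helper: turns a pointer to a member (function or data)
// into a less-than comparator usable with std::sort.
template <typename Member>
auto by(Member member) {
    auto get = std::mem_fn(member);
    return [get](const auto& a, const auto& b) { return get(a) < get(b); };
}

// Sorts a copy of the tasks by the result of the member function prio().
std::vector<Task> sorted_by_priority(std::vector<Task> tasks) {
    std::sort(tasks.begin(), tasks.end(), by(&Task::prio));
    return tasks;
}

// Sorts a copy of the tasks by the data member name.
std::vector<Task> sorted_by_name(std::vector<Task> tasks) {
    std::sort(tasks.begin(), tasks.end(), by(&Task::name));
    return tasks;
}
```

With C++20 available, the equivalent call would be `std::ranges::sort(tasks, {}, &Task::prio);`, with no helper needed.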
kkd has joined #osdev
gareppa has joined #osdev
Vercas has quit [Remote host closed the connection]
gareppa has quit [Remote host closed the connection]
Vercas has joined #osdev
dennis95 has joined #osdev
gog has joined #osdev
arch-angel has quit [Ping timeout: 268 seconds]
arch-angel has joined #osdev
arch-angel has quit [Read error: Connection reset by peer]
arch-angel has joined #osdev
arch-angel has quit [Remote host closed the connection]
arch-angel has joined #osdev
nyah has joined #osdev
<MA-SA-YU-KI>
mrvn The study is looking at the quality of code developers produce when they are assisted by CoPilot versus when they are not. We are specifically looking at systems programmers.
<mrvn>
That was my question. I was asking how CoPilot works.
liz has quit [Quit: Lost terminal]
<MA-SA-YU-KI>
mrvn For how CoPilot works, you add it as a VSCode extension. And once it's enabled you can write a comment let's say `// print out hello world 8 times but append an emoji on even lines` and it would spit out a suggestion. You can use "TAB" to accept a suggestion, "ALT + ]" for the next suggestion, "ALT + [" for previous suggestion or "ESC" to reject a suggestion.
<mrvn>
That's how you use it, not how it works.
<MA-SA-YU-KI>
Oh right. That's probably beyond my understanding at the moment. But as far as I know there are these things called language models i.e. GPT-3. So OpenAI trained GPT-3 model on a corpus of open-source code from Github and called it Codex. That's really as far as I understand from the papers they published. I can attach links to Codex research paper if you're interested.
<MA-SA-YU-KI>
mrvn
<mrvn>
So basically it cut&pastes code snippets from the learning set. That makes the answers GPL if any of the learning set is GPL. Very dangerous if you aren't writing GPL code.
<MA-SA-YU-KI>
mrvn I don't think it cuts and pastes code snippets from the learning set. Language models are more complex than they seem tbh. Sure there was a recent legal debate about it. They used the terms "copyleft" and "licenses". But I think that CoPilot released a FAQ about the ownership of the code you write using the tool. Not sure if that's a good enough legal framework.
<MA-SA-YU-KI>
geist You could be right about it either teaching you a new language or teaching you bad habits. But I think this would apply largely to beginner programmers or students. But the thing is, we don't really know. That's kind of one of the research questions we are looking at. Does experience affect the quality of code when using CoPilot? So far it's only been students that have participated in my study so I still don't know.
gildasio1 has quit [Quit: WeeChat 3.5]
<mrvn>
It will highly depend on what you feed it to train it. If you only hand pick good sources as input it's likely to learn good practices. If you feed it random open source projects from github then there will be tons of bad habits in the input. No way for the AI to learn good habits from that.
<mrvn>
And then there is the problem when you have 2 ways of doing the same small thing. And which way is a good habit depends on the larger context. If the AI doesn't consider a large enough context then it can't learn which solution is the good one.
<mrvn>
Remember when they put up an AI to learn and respond to twitter posts and within hours it was spewing hate speech? Garbage in, garbage out.
<mrvn>
MA-SA-YU-KI: can you tell CoPilot what version of c++ you want to use? C++03/11/17/20 are quite different.
<zid>
Apparently what it's super useful for is writing documentation
<mrvn>
oh that might be useful. see what other monkeys typed as comments for some code
<MA-SA-YU-KI>
It's largely framed as a programming assistant. Well technically, I can tell it to use a specified CPP standard. If I have a "CMakeLists.txt" file, I can go in and type as a comment "# use c++ 17 standard" and it would give me an appropriate suggestion.
<MA-SA-YU-KI>
I personally used this approach when I was designing the C++ problems for the study. Of course I have to make sure that VSCode makes the ".cpp" file aware of the CMakeLists.txt
<MA-SA-YU-KI>
mrvn
<j`ey>
what did it generate that was useful to you?
<j`ey>
adding the flag is easy, generating actual C++ that's compliant to that version is different
<mrvn>
This could be a fun afternoon: Generate random code from CoPilot and see if it does anything useful.
<MA-SA-YU-KI>
One interesting thing we did was ask it to create a simple mapreduce program for a matrix multiplication in Java.
<mrvn>
maybe you should write an optimizer
<zid>
glue and boilerplate it can just straight up give you out of a corpus, and it's more like a free set of macros :p
<MA-SA-YU-KI>
Initially, it spat out rubbish until I went to the top of the .java file and added a comment. // import the necessary packages for a simple mapreduce program
<MA-SA-YU-KI>
Also, I do appreciate the discussions we are having. I get the sense there is quite some apprehension to CoPilot or even going near it.
<mrvn>
have you tried feeding it stackoverflow questions and seeing if the answers are any good?
<MA-SA-YU-KI>
I initially thought the point of contention would be that "CoPilot would put developers out of a job". But from my usage, it's not even close.
<j`ey>
i would never use it
<j`ey>
and I can't imagine anyone I know would
<MA-SA-YU-KI>
mrvn I have not tried that tbh. I have my doubts that it would bring anything useful for something that was asked on SO. SO questions are notoriously more complex than they appear to be.
<MA-SA-YU-KI>
j`ey even for 1 hour for a study for someone's master's thesis?
<mrvn>
My feeling is that CoPilot could fix broken code but at best it can create average code. That's what it learns: "What code does everyone else use?" Unless you feed it only excellent code it will always be mediocre.
srjek|home has quit [Ping timeout: 248 seconds]
<j`ey>
MA-SA-YU-KI: well its not free anymore right?
<zid>
MA-SA-YU-KI: cobol proved that wouldn't be the case 50 years ago.
<zid>
The *point* of cobol was to be able to write "business logic" into natural language.
<zid>
It turns out, the problem with business logic is that people have no idea how to write it down in the first place
<zid>
not the language they use
<mrvn>
zid: The problem of "Do what I mean, not what I said"
<MA-SA-YU-KI>
j`ey there's a 60-day free trial, and you don't pay anything during it. So you can use it for the study and cancel the free trial immediately after the study. Students with the Github Student Developer Package and verified open-source contributors can also use it for free.
<j`ey>
I see, but no I'm too lazy to sign up
<zid>
I don't write boilerplatey code that I feel it'd be exceptionally useful for
<mrvn>
MA-SA-YU-KI: Do I have to give my credit card number to sign up?
<zid>
if I were a javascript programmer plugging frameworks into each other or whatever I bet it would do 90% of my job for me
<mrvn>
Might be good detecting code that should be using c++ algorithm
<mrvn>
or ranged-for
<MA-SA-YU-KI>
mrvn unfortunately yes (I already did that because I didn't know about the student developer pack thing at the time). But yes, you put your card and then it says you have a 60-day free trial. And then you click cancel, so it doesn't charge you after the 60 days are up.
<mrvn>
MA-SA-YU-KI: another idea to maybe look into: Use it to upgrade legacy code to modern.
<zid>
buut even then, legal issues
<mrvn>
MA-SA-YU-KI: too close to oh so many scams. I wouldn't do it just for that.
<zid>
as it's well known that if you start typing various very-copyrighted pieces of source that are famous, copilot will just straight paste them in
<zid>
I've seen the video tweets
<zid>
// fast inverse sq *carmack's intestines tumble onto the screen*
<mrvn>
zid: I bet that's in there to show how helpful it can be because that's surely something testers would ask. There is a reason such codes are famous and copyrighted.
<MA-SA-YU-KI>
mrvn You subscribe on the Github website with your Github account, so it's Github that manages the payments. It's no less secure than subscribing to Github Pro. I don't know if there is an option for people to use my own Github account, though.
<mrvn>
If you create a metric for how close some source is to something CoPilot suggests, you would have a metric for plagiarism.
SpikeHeron has quit [Quit: WeeChat 3.5]
<mrvn>
MA-SA-YU-KI: what comes out when you type in // Miller Rabin primality test?
<mrvn>
Can you paste that somewhere?
SpikeHeron has joined #osdev
<MA-SA-YU-KI>
mrvn there's a FAQ that kind of answers your question about plagiarism. The frequently asked questions section of https://github.com/features/copilot
SpikeHeron has quit [Client Quit]
SpikeHeron has joined #osdev
<bauen1>
MA-SA-YU-KI: Where does it mention plagiarism? The only thing I can find is that it can (and will) reproduce some input code verbatim. And I don't think taking the word of the company that wants to sell you something is a good idea, obviously they will tell you it's fine (or at least tiptoe around the problem as much as possible), after all they have little incentive to do otherwise.
<j`ey>
I can't see how you could use this inside a company
<j`ey>
lmao return 0
<j`ey>
very nice
<MA-SA-YU-KI>
I did the int main and return 0 myself. Then added the comment for Miller Rabin primality test.
<MA-SA-YU-KI>
j`ey is return zero cringe or?
<j`ey>
oh
<j`ey>
MA-SA-YU-KI: what about 'write a function to determine if a number is prime'
<bauen1>
Also, if copilot was trained on code under a certain license, doesn't that make it "a derived work of that code", I guess the answer is no ?
<MA-SA-YU-KI>
bauen1 it says here that "GitHub Copilot is a tool, like a compiler or a pen. GitHub does not own the suggestions GitHub Copilot generates. The code you write with GitHub Copilot’s help belongs to you, and you are responsible for it."
<j`ey>
aka it might be under a license
<bauen1>
MA-SA-YU-KI: Well, what does the "and you are responsible for it." imply ?
<MA-SA-YU-KI>
j`ey I'll put that comment in a function called 'is_prime' that takes an int and returns a bool. Is that fine?
<j`ey>
sure
<j`ey>
MA-SA-YU-KI: btw copilot was explicitly banned where I work
jafarlihi has joined #osdev
<GeDaMo>
I expect most companies will ban it
<jafarlihi>
Anyone knows what dumping means in context of netlink and how it is different from a normal request?
<bauen1>
same for the module where I'm a TA in university, also "don't use it" with an implied: "If you do use it and there's a problem you will be the first person that has to figure out the legality of it", i.e. why companies are already banning it to avoid being the first to find out if they can use it.
<MA-SA-YU-KI>
j`ey seems like I didn't have to add the comment, the first suggestion after putting the function signature is what you see above. I also did not compile this. https://www.irccloud.com/pastebin/ftzJM4XR/
<nur>
I can't even think about turning it on without Ben Kenobi's disembodied voice in my ear going "turn off the copilot...uuuse the foorrce"
<GeDaMo>
It's not very efficient
<MA-SA-YU-KI>
j`ey on closer look it seems rather inefficient. I feel like if I explicitly tell it to use the gcd implementation of a primality test.
<MA-SA-YU-KI>
bauen1 I can understand why banning CoPilot would make sense for a company. Interestingly enough, I don't know if students for the course I am TAing know about CoPilot. Again, it's a first year programming course in Python. But from the office hours, it doesn't seem like it.
<j`ey>
I just find it bizarre
<MA-SA-YU-KI>
I get the sense that finding professional systems programmers to participate in the study would be extremely difficult if not impossible. I was thinking about talking to members of serenityOS but I was warned about their community so I am holding off on that.
<MA-SA-YU-KI>
But thank you for all the feedback.
<j`ey>
well that in itself might count as some results
<bauen1>
MA-SA-YU-KI: well, if a student turns in code that contains e.g. 10-50 lines of code verbatim copied from a project on github, but does not attribute it, who is at fault ?
<clever>
bauen1: something that may help in detecting that, is grep.app
<clever>
its a search tool, that lets you search for strings in almost any project on github
<clever>
faster and less error prone than the github native search
<MA-SA-YU-KI>
bauen1 and I think at UWaterloo they use MOSS.
<clever>
i mostly use it to find the source behind an error message
<bauen1>
clever: oh interesting, i'll bookmark that, until now i've mostly used debians source code search
<bauen1>
clever: but i'm more looking at this from the perspective of the person that has to grade the code, so assuming that it was already turned in
<clever>
yeah
<clever>
grep.app may help to find out if the code being submitted is copied
<bauen1>
if you argue that it was the student's fault, then you can't ever use copilot (except in some very small niches where copyright does not matter), but if it wasn't the student's fault, then what is the difference between copilot and just copying some source code, and just how close can you get to the thin line that separates these two
<mrvn>
MA-SA-YU-KI: That is a horrible primality test.
heat has joined #osdev
<mrvn>
Whats the gcd implementation of a primality test?
<mrvn>
Odd that it found the wikipedia link for Miller Rabin but not any implementation
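For comparison, a hand-written sketch of the Miller-Rabin test mrvn asked about. This is not what Copilot produced (that paste is not part of this log). It assumes 32-bit inputs, where testing against the witness bases 2, 7 and 61 is known to give a deterministic answer; the names `pow_mod` and `is_prime` are chosen for illustration.

```cpp
#include <cstdint>
#include <initializer_list>

// Computes (base^exp) mod m by square-and-multiply.
// Assumes m < 2^32 so that every intermediate product fits in 64 bits.
static std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

// Deterministic Miller-Rabin for 32-bit unsigned integers.
bool is_prime(std::uint32_t n) {
    if (n < 2) return false;
    // Small primes, including the witness bases, are handled by division.
    for (std::uint32_t p : {2u, 3u, 5u, 7u, 61u}) {
        if (n % p == 0) return n == p;
    }
    // Write n - 1 = d * 2^r with d odd.
    std::uint32_t d = n - 1;
    int r = 0;
    while ((d & 1u) == 0) {
        d >>= 1;
        ++r;
    }
    for (std::uint64_t a : {2ull, 7ull, 61ull}) {
        std::uint64_t x = pow_mod(a, d, n);
        if (x == 1 || x == n - 1) continue;  // n passes for this base
        bool composite = true;
        for (int i = 1; i < r; ++i) {
            x = x * x % n;
            if (x == n - 1) {
                composite = false;
                break;
            }
        }
        if (composite) return false;
    }
    return true;
}
```

Each base costs O(log n) modular multiplications, which is the efficiency gap compared with dividing by every candidate up to n.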
<MA-SA-YU-KI>
bauen1 I feel like that argument is on the spectrum of using autocomplete and intellisense tools like TabNine. Where copying code directly from somewhere is one end and manually writing every line of code without using any documentation whatsoever is the other. I feel like using CoPilot is somewhere on that spectrum.
<heat>
CoPilot is copying code directly
<heat>
you're not even aware of where it's coming from so you can't attribute copyright
<heat>
it's a stupid idea
<heat>
very legally dubious
<mrvn>
I would want to use CoPilot with a private instance. Give me the untrained tool, I will add a training corpus of my own choosing and then let me use it.
<mrvn>
heat: I would even think that CoPilot is to blame when copying code because it gives it without attribution.
<clever>
basically, you give it the type of a function (haskell only), and some testcases it must meet
<heat>
If I use copilot and it starts giving me linux's GPLv2 nvme driver, who's at risk?
<heat>
I can tell you who, it's me
<mrvn>
If I had lots of time and money and CoPilot gave out lines from my code then I would sue them.
<clever>
it will then brute-force combine functions to meet the type, and then test if they meet your requirements
<clever>
and auto-generate code for you
<mrvn>
clever: that seems to only be feasible for some small problems.
<mrvn>
and very good and complete unit tests
<clever>
yeah, its only good for short bits of code
<clever>
like say how to turn "foo bar" into ["foo","bar"]
<mrvn>
I have something like that for boolean expressions and 74xx gates.
<clever>
i have manually worked it out with 74xx gates before, first write up an entire truth table, every possible input, and the desired output, then look for places where you can shove in an AND gate and such to combine inputs
<heat>
mrvn, how can you know it was copilot who gave out your code and it wasn't copied manually?
<heat>
how little of a snippet does it need to be in order to be validly copyrightable?
jafarlihi has quit [Quit: WeeChat 3.5]
<clever>
> "foo bar".split(" ")
<clever>
mrvn: the solution i wanted, implemented in JS :P
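Since most of the code in this discussion is C++, here is a rough C++ counterpart to clever's JS one-liner. The function name `split_words` is made up for illustration. It splits on runs of whitespace via stream extraction, so unlike `split(" ")` it never produces empty strings for consecutive spaces.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits text into whitespace-separated words, e.g. "foo bar" -> {"foo", "bar"}.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```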
<heat>
it's a super legally dubious field IMO
<mrvn>
clever: that's one of the normal forms. Gives you huge AND/OR gates with many inputs but then has constant depth. Since you usually don't have a 12 input AND gate you end up with deep nets.
<mrvn>
clever: and you don't take advantage of NAND, NOR, XOR, NXOR, .. gates.
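The truth-table approach clever describes, and the normal form mrvn refers to, can be sketched as follows. This is an illustrative example only: `sum_of_products` is a made-up name, inputs are labelled `a`, `b`, `c`, ... from the lowest bit upward, and no minimisation or gate selection is attempted. It emits one AND term per truth-table row whose output is true and ORs the terms together, which is exactly why the result uses wide gates and ignores NAND/NOR/XOR.

```cpp
#include <functional>
#include <string>

// Builds a sum-of-products (disjunctive normal form) expression for a
// boolean function of `inputs` variables. Bit i of the row index is the
// value of variable 'a' + i. Intended for a small number of inputs.
std::string sum_of_products(int inputs, const std::function<bool(unsigned)>& f) {
    std::string expr;
    for (unsigned row = 0; row < (1u << inputs); ++row) {
        if (!f(row)) continue;  // only rows with output 1 contribute a term
        std::string term;
        for (int bit = 0; bit < inputs; ++bit) {
            if (!term.empty()) term += " & ";
            if (((row >> bit) & 1u) == 0) term += "!";
            term += static_cast<char>('a' + bit);
        }
        if (!expr.empty()) expr += " | ";
        expr += "(" + term + ")";
    }
    return expr.empty() ? "0" : expr;  // "0" for a function that is never true
}
```

For a two-input XOR this yields `(a & !b) | (!a & b)`: two AND gates, two inverters and an OR, where a single XOR gate would do.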
<clever>
yeah, thats where automation could help
<clever>
oh, and now i remember what i was doing with that
<clever>
it wasnt logic gates, it was an if statement, lol
<mrvn>
My code has a list of 74xx chips I own and tries to use the minimum number of chips. Often a chip has multiple instances of a gate type.
<clever>
same basic idea
<clever>
yeah, thats where automation can help speed it up
<mrvn>
well, it's not exactly speedy or memory friendly. It grows exponentially.
<clever>
well, compared to doing it by hand
<heat>
MA-SA-YU-KI, what's the problem with serenityOS's community?
<j`ey>
heat: only issue i have is that they dont provide the iso :P
<heat>
j`ey, it's also not written in... you know
<heat>
does it even support arm64
<j`ey>
heh
<MA-SA-YU-KI>
heat When I was looking for systems devs, I went on the discord for SkiftOS. They thought I was a scammer. But after that was cleared up, they suggested I head to osdev or SerenityOS but "be warned SerenityOS has a toxic community".
<j`ey>
i think nico added some skeleton stuff for arm64
<heat>
ah, no, it supports aarch64
<j`ey>
looks like it's for rpi4 only
<heat>
the one issue I have with serenity OS is that it's technically dubious
<j`ey>
oh?
<heat>
the kernel is a bit primitive AFAIK
<heat>
most of the work goes elsewhere
<j`ey>
ah well, the kernel is just one part of an OS :P
<heat>
maybe technically dubious was the wrong term to use
mzxtuelkl has joined #osdev
<heat>
most of the work goes to weird shit
<heat>
like js
<GeDaMo>
Are they using JS for the UI? Because I could sort of understand that
<j`ey>
no, they have a browser
<MA-SA-YU-KI>
heat I hear they're trying to implement a browser that runs on their OS.
<heat>
well, sure
<heat>
but its an OS, not a web browser
<heat>
its feature-creep galore
<GeDaMo>
Is the browser the OS? :P
<j`ey>
well they seem to have enough people working on it to make it work sooo
<heat>
most of the work goes into flashy shit like the web browser, GUI apps
<heat>
i simply don't appreciate that
<heat>
im a kernel weirdo
<j`ey>
just different goals
<heat>
like at some point they could just replace their kernel with linux and nothing would change
<heat>
and what they're doing is fine, but it's just not my cup of tea
<heat>
(also the way they capitalize everything gives me chills)
<j`ey>
KernelObjectBufferManagerFactory
<heat>
LibC
seer has joined #osdev
seer has quit [Quit: quit]
seer has joined #osdev
<bauen1>
mrvn: it might be enough to have your own code on github and sue either github or someone using copilot on the grounds that copilot is derived from your code ? Although that is much more fragile than "someone copied my code without attribution"
<mrvn>
bauen1: the data set copilot produces when trained is derived from the code I would say. But that is fine, the data set is GPL. Nothing for me to sue about.
<mrvn>
it's when they distribute the code that they violate the license.
<bauen1>
mrvn: but the data set includes code that has licenses incompatible with the GPL, so you can't actually distribute the data set (so you can't distribute copilot) without violating the licenses of the codes used ? I'm sure I'm missing something because they obviously had lawyers that signed off on at least the distributing copilot part
sympt has joined #osdev
<mrvn>
bauen1: where do they distribute CoPilot? Isn't that a webservice?
<bauen1>
mrvn: oh, i missed that, but then what about code licensed under the AGPL ?
<MA-SA-YU-KI>
mrvn I don't know if it's a webservice but it's generally used as a VSCode extension but I haven't tried using it without internet access though.
<mrvn>
bauen1: not my problem.
<mrvn>
MA-SA-YU-KI: how many GB is it? the data set must be huge.
Vercas has quit [Quit: buh bye]
<bauen1>
mrvn: true, but that just got me thinking, if instead of going after straight up copy pasting, couldn't you argue that any code copilot generates is derived from _all_ the data it was trained on, in fact it would be very hard (or rather impossible as far as i understand) to prove it is not.
<bauen1>
i really need to sign up for a course that covers copyright law, i find it very interesting
<bauen1>
I really can't wait for the first case about copyright concerning copilot, irrespective of the way it ends