<ukky>
braewoods_: the idea of rewriting pkgutils in C has been bothering me for a few months now
<braewoods_>
It would be easier in C++. The only drawback I see to using C++ is library usage.
<braewoods_>
But pkgutils being a C++ library would be beneficial for prt-get.
<braewoods_>
But so would C.
<braewoods_>
ukky, so what phase are you at then?
<ukky>
Just started writing Makefile
<braewoods_>
that'll probably be fine. I'm using meson in this project due to it not being distribution or Linux specific.
<braewoods_>
But meson requires python and ninja which is a lot of dependencies for a core project.
<ukky>
I cannot use anything but 'make'
<braewoods_>
and C or C++ or shell.
<ukky>
No C++
<braewoods_>
Fair just saying it's available.
<braewoods_>
the main allure of C++ is the ease of data structure usage.
<ukky>
I know that it is an option, but that is the point, I don't want to use C++, though I know it fairly well
<braewoods_>
Indeed. I know a fair bit about the low level code that the existing pkgutils uses. I also know some performance improvements that can be made.
<ukky>
C++ is good as demo of possibilities
<braewoods_>
As for licensing, I'd suggest reusing the one that the old one does.
<braewoods_>
GPL tends to get pretty invasive but it's already what is used.
<ukky>
That is why I asked about licensing, but because this will be new code, any license can be selected.
<braewoods_>
Indeed but everything in CRUX I have seen is GPL. Though even if a license conflict could occur, it's unlikely to matter here.
<ukky>
I am new to the CRUX community and don't know many things. It is easier to search the main site and then ask here. The site says it is up to a developer
<braewoods_>
ukky, how much have you used C?
<ukky>
braewoods_: hidss seems well written
<ukky>
do you mean for how long and in how many projects?
<braewoods_>
roughly. C is a more difficult language than most.
<braewoods_>
You'll get a lot of crashes and memory corruptions if you're not careful.
<ukky>
I have been using C for a bit more than 30 years.
<braewoods_>
Ok so you're more experienced than I am.
<braewoods_>
Embedded C? Userspace C?
<braewoods_>
Most of mine is writing C code for a kernel's userspace.
<ukky>
But I also can reverse-engineer a program written in C, so I know what assembler code the compiler generates.
<ukky>
Mostly embedded C
<braewoods_>
Ok. I can offer my assistance if you want to try a joint rewrite. CRUX usually has more memory than a lot of embedded targets so I often write code that will use more RAM if it helps runtime in some way.
<braewoods_>
What C standards are you considering? I would suggest C11 at least.
<ukky>
I did a lot of Windows drivers development; also UEFI BIOS development; PCI BIOS development (assembler); but I didn't do much Linux development, just wrote some small tools
<braewoods_>
Ah so you don't know much about our C hosted environment then.
<braewoods_>
You may want to read up on POSIX.
<ukky>
My target is very strict C, maybe even C89
<braewoods_>
Why? We're using toolchains that can be used with C99 or newer.
<braewoods_>
And portability isn't our main concern for pkgutils.
<ukky>
C99 would be maximum I am willing to go, no C11 for me
<braewoods_>
ANSI C is fine but C99 did add a lot of useful stuff for developer convenience.
<braewoods_>
snprintf for one.
<braewoods_>
Ok fair enough. C11 doesn't really add that much as it is.
<braewoods_>
C99 is way more useful with what it added for macros and snprintf.
<ukky>
I try to avoid macros
<braewoods_>
What's your reason? I sometimes use convenience macros like my allocator one to aid with error checking to a degree.
<braewoods_>
But I avoid complex macros. I only use them for wrapping stuff in a minor way.
<braewoods_>
It's fine. Macros have always been a highly debated topic. C developers use them all differently.
<braewoods_>
I generally use inline functions instead.
<ukky>
I am not fan of the inlines either
<braewoods_>
Ok. Sorry to bother you then.
<ukky>
You are not bothering me
<ukky>
All devs have preferences
<ukky>
I can explain why I don't use macros nor inlines
<braewoods_>
Go ahead. I have some ideas of why for macros but not for inline.
<braewoods_>
Seeing as inline functions were added to replace macros in more contexts.
<braewoods_>
I also try to avoid macros but I sometimes find their convenience worth using them.
<braewoods_>
But they can obfuscate what's really going on for someone who is not familiar with the codebase.
<ukky>
exactly, macros hide stuff. If some developer in a team wrote a macro, every other developer has to look up what is in that macro. Plus, you cannot set a breakpoint inside a macro.
<braewoods_>
I see. I considered my macros mostly harmless because I just use them to rewrite a code pattern for very common things like allocating for an array and also checking that I entered the right type.
<braewoods_>
if the type is wrong I'll usually get a compiler warning with that approach.
<braewoods_>
ukky, do you ever use goto? I've found a single use-case in simplifying function exit cleanup.
<ukky>
wrt inlines, the compiler can optimize code pretty well, if the code is written properly. It is much easier to debug non-inlined code
<braewoods_>
also worth noting that inline strictly speaking is a compiler hint. the compiler is free to just ignore it. which GCC does when not optimizing.
<braewoods_>
GCC will inline functions at times when not even declared inline, particularly static ones.
<ukky>
there are more important things to care about than considering what to inline. Inlining, imo, is just fine-tuning of already perfect, bug-free code.
<ukky>
wrt goto, it depends. My current project at work uses it, so I am using it too. And I have only 88 KiB to fit my program into.
<braewoods_>
Right. I figured. I think I developed different habits because my C targets can afford to be more wasteful.
<braewoods_>
Most desktop platforms have larger stacks. Linux defaults to 8MB typically.
<braewoods_>
In any case if I can assist you with understanding POSIX / Linux kernel apis when writing pkgutils let me know.
<ukky>
But I use a _lot_ of '#if defined' :)
<braewoods_>
I do that too sometimes but I often only need to do so for different unix kernels.
<ukky>
In your hidss there are quite a lot of '#if' to accommodate multiple targets
<braewoods_>
ukky, yep. that's because I need to support different native HID kernel apis.
<braewoods_>
there's no standard abstraction I can leverage.
<ukky>
and Windows, and *BSD
<braewoods_>
hidraw for Linux, one for Windows, and uhid APIs that vary a bit across the BSDs.
<braewoods_>
i consolidated all the unix ones into uhidraw because of how much common code fragments there were.
<braewoods_>
I can't say I enjoy writing for Windows but Windows 10 certainly added better utf8 support which makes it less painful to work with nowadays.
<ukky>
and 'meson' was your choice of build system because it is multi-platform?
<braewoods_>
Yes and it is an easy to use build system compared to things like autotools.
<braewoods_>
autotools has a lot of legacy stuff that is largely irrelevant to open source developers. It mostly targets proprietary unix clones that I can't test code on anyway.
<ukky>
yes, autotools is so complicated. I used cmake for a few years, but now switched to 'make' because I need to cross-compile to different CPU arches.
<braewoods_>
meson can also get complicated but it is far easier to follow than autotools.
<braewoods_>
the 'cf' folder in my project has cross compilation for building with mingw toolchains.
<braewoods_>
So I can build for windows easier.
<braewoods_>
But for pkgutils meson is overkill.
<ukky>
yes, I have seen it. But it just redefines toolchains, right? Can you compile two different targets for different CPU in a single pass?
<braewoods_>
Kinda. You can make different build directories and have meson setup an appropriate build for each one. and run these in parallel.
<braewoods_>
But not in the same build directory.
<braewoods_>
They're treated as separate build environments and I totally understand why.
<braewoods_>
On Linux for example the libc is different for each CPU ABI.
<braewoods_>
ukky, I can link you to the document about x86-64 ABI for unix clones which Linux also uses if you want to try making direct system calls.
<ukky>
It is like cmake then. Single pass, single build directory, single toolchain.
<braewoods_>
Though I normally don't recommend it for the sake of portability. bypassing libc is usually unwise.
<braewoods_>
'system interfaces' specifies the C stuff.
<braewoods_>
A lot of it is ISO C.
<ukky>
Is this the same web-site that has POSIX shell specs?
<braewoods_>
In general you may be better off using the manpages instead for specific functions.
<braewoods_>
Yes.
<braewoods_>
Because in practice you care about what glibc implements more so than POSIX itself at the C level.
<braewoods_>
and glibc has a lot of GNU, BSD, and POSIX extensions on top of ISO C.
<ukky>
I remember the CSS style; somebody from Void Linux sent me a reference to that web-site about the dash specs.
<braewoods_>
Some of which are very convenient, like asprintf.
<braewoods_>
or getline.
<braewoods_>
getline is like fgets but uses a dynamic buffer instead.
<braewoods_>
asprintf dynamically allocates the formatted string.
<ukky>
for a fast pkgutils, then, asprintf cannot be used.
<braewoods_>
Indeed. It would be better to just use strdup after building it in a stack buffer.
<ukky>
even strdup would slow down pkgutils
<braewoods_>
Erm. What are you thinking of? the way I use strdup is so limited that the slowdown is generally a rounding error.
<braewoods_>
When I was attempting to rewrite pkgutils, I only used strdup to make a permanent copy of the text database into a usable data structure.
<braewoods_>
Anything temporary was stack based.
<braewoods_>
As anything relying on malloc() can end up becoming a bottleneck.
<ukky>
thinking about pre-allocating big chunks of memory for database via malloc
<braewoods_>
I see. In that case I have a different suggestion.
<ukky>
i.e. no small malloc's
<ukky>
what is your idea?
<braewoods_>
Given the existing database format, you could mmap the database text file into your program's memory space and then replace newlines with zeros to get viable strings. You just need to dynamically allocate the structures so the strings are organized correctly.
<braewoods_>
less memory waste.
<braewoods_>
The downside is you can't easily edit the database in this way.
<braewoods_>
As you can't just 'free' it like regular memory blocks.
<ukky>
we would still need to add a sorting algo to that mmap
<braewoods_>
The sorting can be applied to the dynamic data structure that manages it all. Plus the existing program already sorts the data because it uses binary search.
<ukky>
we need to insert/remove items into/from database
<braewoods_>
I am sure you know of qsort and bsearch already.
<braewoods_>
Array based functions we could leverage.
<ukky>
my idea is to implement a doubly-linked list plus some sorting algo, like binary search, but it should be manual, you cannot call bsearch()
<braewoods_>
why do you prefer a linked list? arrays are generally faster on modern hardware.
<ukky>
arrays are harder to expand
<braewoods_>
fair point. even the best mitigations only do so much.
<braewoods_>
linked lists can always use a pool allocator to mitigate malloc overhead.
<ukky>
is a 'pool allocator' like virtual memory, mapped upon first access?
<braewoods_>
No. It's a design pattern used in some C programs for fixed-size allocations. Allocations are returned to a pool and recycled instead of going to malloc every time.
<braewoods_>
I've seen it used in realtime code databases such as game servers or MUDs.
<braewoods_>
Linked lists are a popular target for these.
<braewoods_>
If the pool has no allocations available, it makes one using malloc. Otherwise it uses one from its reserves.
<braewoods_>
It has similar semantics to malloc and free.
<braewoods_>
But it has the benefit of not needing to search for available memory.
<ukky>
But this pool allocator is part of your code then?
<braewoods_>
Currently no. But I have used them before.
<braewoods_>
The backing memory can even be static memory.
<ukky>
I mean a developer's code, like it will be part of pkgutils
<braewoods_>
Oh, for a project like pkgutils it may have little to offer. But if you frequently need to build linked list nodes, it's a very valuable feature.
<braewoods_>
pkgutils is a quick run situation typically so it may not offer much speedup.
<braewoods_>
Due to setup costs.
<braewoods_>
I have only ever used pool allocators in programs that run for long periods.
<ukky>
There are some design pattern terms that I just don't know about.
<braewoods_>
Understandable. I studied a lot of memory management to help me better design my C programs. Most of them are specialized so they have limitations for general use.
<braewoods_>
And you are probably used to writing code for space efficiency rather than speed efficiency. So many of these wouldn't be useful since they are more wasteful with memory.
<ukky>
In my case I mostly never allocate memory on embedded systems, just reserve static storage.
<braewoods_>
Right. I've also written pool allocators that used static memory.
<ukky>
But some time ago I was part of a team working on a hardware-accelerated OpenGL driver, and memory strategy was very important
<braewoods_>
You reserve memory for a fixed number of a given type and have another array that contains the addresses of all of them organized as a stack. Then you push or pop addresses to that stack to manage the memory.
<braewoods_>
That's how it worked for my pool allocator.
<braewoods_>
Very simple way if you don't want to use malloc for linked lists but you have to reserve memory in advance so it has its own problems.
<braewoods_>
OpenGL... ARM? Most of the Linux OpenGL stuff is open source.
<braewoods_>
The only proprietary drivers I know of are on ARM.
<braewoods_>
or NVIDIA.
<ukky>
No, it was OpenGL for Windows, X86_64
<braewoods_>
Oh. Was this the application side instead of a hardware driver?
<braewoods_>
I know little about using OpenGL besides as a desktop user.
<ukky>
and it was almost 20 years ago. And no, it was HW OpenGL driver, i.e. implementing OpenGL API, not using OpenGL API
<braewoods_>
Ah. The kernel side.
<ukky>
yes
<ukky>
My idea for pkgutils was to pre-allocate space for about 500 doubly-linked structures per package via malloc, and then add more if needed
<braewoods_>
Ok. The headers you'll need for most file descriptors are as follows:
<braewoods_>
unistd.h
<braewoods_>
sys/stat.h
<braewoods_>
fcntl.h
<ukky>
one structure for one package
<braewoods_>
open(), close(), read(), write() are the basic file descriptor APIs. you can open more than just regular files.
<braewoods_>
ioctl() is sometimes used for kernel APIs to device files.
<braewoods_>
Especially on BSDs.
<braewoods_>
Linux also uses them but not as much as BSD due to /sys eliminating much of the need for ioctl().
<braewoods_>
You can just read the data as regular files under the special /sys sysfs filesystem.
<braewoods_>
Very shell friendly.
<ukky>
Do we need to handle device files in pkgutils?
<braewoods_>
I would say no, unless you are interfacing with the terminal.
<braewoods_>
But current pkgutils doesn't do any special terminal stuff.
<braewoods_>
Terminals are manipulated usually to create special visual effects or to create a UI.
<ukky>
pkgadd seems to use libarchive to create new files
<braewoods_>
It probably does. I have also wanted to change some of pkgutils behavior during database operations.
<braewoods_>
Such as the ability to recover from an interrupted operation.
<ukky>
like what? (me too)
<braewoods_>
Currently it has no way to recover if pkgadd or pkgrm is interrupted.
<braewoods_>
Such as power failure.
<ukky>
I didn't know that
<braewoods_>
Also the ability to process more than one package per pkgrm or pkgadd run.
<braewoods_>
This is a slowdown due to the repeated setup costs.
<ukky>
agree
<braewoods_>
If you can process multiples in one run of the program, you can reduce runtime somewhat.
<ukky>
it would be much faster to handle all 'add' or 'rm' operations in a single pass
<braewoods_>
Which brings me to another bottleneck that isn't obvious to new shell scripters.
<braewoods_>
Shell is one of the slowest things out there but it is one of the easiest ways to combine unix utilities for building packages and the like.
<braewoods_>
And the main reason comes down to the overhead of setting up command pipelines.
<braewoods_>
If you know revdep, it used to be 100% shell script. The rewrite I did reduced runtimes by over 90%.
<braewoods_>
You may be able to reuse some of revdep to start pkgutils C rewrite.
<braewoods_>
The database reader at least.
<braewoods_>
Oh. I forgot. It's C++ right now.
<ukky>
revdep is/was a good candidate for C/C++ implementation, instead of shell
<braewoods_>
rofl
<braewoods_>
But you can use it to get an idea of how to read it at least.
<braewoods_>
pkgutils uses a BSD extension, flock.
<braewoods_>
it's a kernel level lock instead of a lock file.
<ukky>
okay
<braewoods_>
it's invisible to anything that isn't using the same API.
<braewoods_>
and if you didn't know, unix file descriptors are all of type int. They are integers that are an index into the kernel's file descriptor table for your process.
<braewoods_>
They are guaranteed to be between 0 and one less than your process' file descriptor limit.
<braewoods_>
Inclusive between.
<braewoods_>
Most processes have 3 file descriptors already open by the time main is called.
<braewoods_>
stdin, stdout, stderr.
<braewoods_>
0, 1, 2.
<ukky>
thanks, I have seen 'int fd' when declaring file descriptors
<braewoods_>
FILE is a wrapper around the native file descriptors.
<braewoods_>
POSIX also has some special FILE creator functions.
<ukky>
I understand that, it is just new C type
<braewoods_>
fmemopen for one.
<braewoods_>
and open_memstream.
<braewoods_>
All IO operations go to memory.
<braewoods_>
And are compatible with existing STDIO function calls.
<ukky>
I see, open_memstream() is declared in stdio.h, and defined in libc
<braewoods_>
I'd suggest using the manpage to know the details though.
<braewoods_>
libc functions are usually well documented.
<braewoods_>
it's also possible to construct FILE from a file descriptor if you wanted to open it first yourself for some reason.
<braewoods_>
fdopen() does this.
<braewoods_>
You may also want to know about opendir() and closedir(). These are used to iterate over the file names of a directory.
<braewoods_>
Oh and readdir()
<braewoods_>
dirent.h is their header.
<braewoods_>
fdopendir() also for existing file descriptors.
<braewoods_>
One reason these exist is so you can control how the file descriptor is opened, like if you wanted to use openat() to open relative to an open directory.
<braewoods_>
ftw() and nftw() for file tree walking aids.
<braewoods_>
I have used these to implement a reduced version of 'rm' before.
<ukky>
Those functions for processing directory entries are used in farkuhar's version of 'prt-get sync'
<braewoods_>
yea they probably are. there's only one low level API for directory access after all.
<ukky>
he enumerates port drivers in the /etc/port/drivers directory
<braewoods_>
port drivers... does that mean the transport method? http and such.
<ukky>
no, just executing port drivers, instead of 'ports -u'
<braewoods_>
Oh.
<ukky>
braewoods_: thanks for all the info. If you are interested in creating a C version of pkgutils, we should cooperate. It is time to get some rest for me.
<braewoods_>
ukky, ok.