[Dev] [RFC] Yet another fullpkg/treepkg/libretools replacement

Michał Masłowski mtjm at mtjm.eu
Thu Nov 1 20:17:16 GMT 2012


Hello.  The following long text is my raw notes for deepbuild, a tool
that I plan to write: a reimplementation of a part of libretools used
for building mips64el packages.

As these notes should suggest, my main motivation for this project is to
implement solutions to some problems that I personally consider too
difficult to solve in fullpkg and related scripts.  Some other problems
that could be more easily solved using the designs described here might
just be fun to implement, even if the solutions are not really needed.

I no longer remember the specific problems that motivated me to choose
some of the approaches listed below, so I'm not able to answer "this is
too complex" questions now.

What other problems does this have?  Are there problems with ordering
mips64el builds or processing PKGBUILDs for which these methods would be
bad?


A set of tools for ordered building of Parabola packages.


* TODO to plan
** code modules
** formats of databases, etc
** iterative milestones


* design goals
** in Python 3, GPL3+-licensed
** fast
** easy to maintain
** tested
** documented
** not too Parabola-specific
e.g. should be possible to reuse some parts for a more portable distro

look for clean solutions, don't use hacks specific to how Arch
packages are typically developed (like parsing shell scripts with
regular expressions or comparing build dates)


* host-like object names
could use different names
** master
has the graph, does fetching of sources and packages, sends jobs to
chroots, releases packages
** chroot
just builds a package, doing network communication only with the
master


* TODO abs crawler
(I already have a script for this, it just gets the metadata of
PKGBUILDs in parallel using a separate shell process for each one)

check that it sources the PKGBUILDs in their directories

support quickly and reliably finding which PKGBUILDs were updated, so
it can be automatically done for each invocation of a command using
the database

initially just require running it manually after abs changes
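
One possible way to find updated PKGBUILDs without comparing dates is to
keep a content digest per file in the abs db.  This is only a sketch:
the function names and the use of SHA-256 are assumptions, and it
ignores other files in the directory that a PKGBUILD might read.

```python
import hashlib
from pathlib import Path


def pkgbuild_digests(abs_root):
    """Map each PKGBUILD path (relative to abs_root) to its SHA-256 digest."""
    root = Path(abs_root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("PKGBUILD"))
    }


def changed_pkgbuilds(old, new):
    """Compare two digest maps.

    Returns (updated, removed): paths that are new or have different
    content, and paths that no longer exist.
    """
    updated = sorted(p for p, digest in new.items() if old.get(p) != digest)
    removed = sorted(p for p in old if p not in new)
    return updated, removed
```

Only the paths in `updated` would need a new shell process to read
their metadata; `removed` ones would be dropped from the database.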


* TODO graph construction
understanding the set of packages to build as a tree leads to complex
code and many potential problems

depends, makedepends and checkdepends for outdated packages

only depends for non-outdated packages (can change their
installability)

a package depends on possibly virtual packages

which packages provide a package: use x86_64 pacman dbs for this

packages not in abs are always up to date, mips64el arch=any packages
are an example; get their dependencies from pacman dbs

choosing the provider: ignore ones replaced by other providers, use
the "first" remaining one found

ignore the existence of package conflicts, it will result in a build
failure if it's a problem (otherwise could have a difficult and slow
graph generation algorithm, could not get a single graph for all
outdated packages)
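
The provider choice described above could look roughly like this.  The
shape of `packages` (an ordered mapping with "provides" and "replaces"
lists) is a made-up stand-in for the pacman db data, and versioned
provides such as foo=1.0 are not handled.

```python
def choose_provider(name, packages):
    """Pick the package that satisfies the dependency `name`.

    packages: ordered mapping of package name -> dict with optional
    "provides" and "replaces" lists (hypothetical format).
    Returns None if nothing provides the name.
    """
    candidates = [
        pkg for pkg, meta in packages.items()
        if pkg == name or name in meta.get("provides", ())
    ]
    # Drop providers that another candidate declares it replaces.
    replaced = {
        old
        for pkg in candidates
        for old in packages[pkg].get("replaces", ())
    }
    remaining = [pkg for pkg in candidates if pkg not in replaced]
    # "First" here means first in the mapping's iteration order.
    return remaining[0] if remaining else None
```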


* TODO graph build
a topological sort; could schedule several packages at once in
different chroots (probably on different hosts)

assume all enabled chroots are equal and independent, ignore all the
theory of scheduling

build only already fetched packages; if none are available, wait until
one is fetched, provided the fetcher is working (error otherwise)

package fails before build: save the log (e.g. a dependency
installation problem)

package fails during build/check/package: save the build directory

package fails: keep building packages which don't depend on it

package succeeds: stage it (also add to a local repo on the chroot)
and save logs
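
A minimal sketch of the build loop, assuming Python 3.9+ for the
standard graphlib module.  The `build` callable stands in for sending a
job to a chroot; fetching, staging and logs are left out.  Packages
depending on a failed one are skipped, everything else keeps building.

```python
from graphlib import TopologicalSorter


def build_graph(deps, build):
    """Build packages in dependency order.

    deps: mapping of package -> set of packages it depends on.
    build: callable(package) -> True on success (hypothetical chroot job).
    Returns (built, failed, skipped) lists.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if there is a cycle
    built, failed, skipped = [], [], []
    bad = set()  # failed or skipped packages
    while sorter.is_active():
        # Every package returned here has all its dependencies done, so
        # these could be sent to different chroots at the same time.
        for pkg in sorter.get_ready():
            if bad & set(deps.get(pkg, ())):
                skipped.append(pkg)
                bad.add(pkg)
            elif build(pkg):
                built.append(pkg)
            else:
                failed.append(pkg)
                bad.add(pkg)
            # Mark as done even on failure so unrelated packages proceed.
            sorter.done(pkg)
    return built, failed, skipped
```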


* TODO chroot management
can be remote
** use a local repo
for packages not released yet; keep it on the master
** operations
*** create a chroot, run a job in a chroot
maybe replace mkarchroot

have no network access in the chroot, except to a single program
sending the tasks and the loopback (for package testsuites)

have premade /etc/makepkg.conf and /etc/pacman.conf, use only a single
repo from the master
*** change packages to listed ones (or groups) with dependencies and upgrade
can fail for many reasons, consider them failing the package

problem: might be interactive (parabola-keyring), find a workaround
*** build single package
send a full source package to nonlocal chroots or assume a shared
filesystem?
*** pacman cache management
will introduce multiple packages of the same version, should also
avoid downloading multiple times

do all downloads on master only
*** run an interactive shell


* master directories
** abs trees
multiple ones, equally supported
** released package cache
** unreleased package repo
** staged packages
** graph
** source cache
** configuration
** repo dbs
** abs db
all PKGBUILD metadata + paths to PKGBUILDs
** maybe other databases generated from the above


* TODO caches
each is a content-addressable filesystem (avoiding the common
conflicts of multiple cached files with the same names) with an index
of name -> content mapping (not used for sources unless the user asks
for it: they can handle multiple contents); for packages the newest
one/from the repo db is used

ugly thing: will use multiple hash algorithms, not accepting
collisions on any of them
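
An in-memory sketch of such a cache follows.  The class name, the
choice of SHA-256 and SHA-512, and keeping everything in dicts instead
of on disk are all assumptions made for illustration.

```python
import hashlib

ALGORITHMS = ("sha256", "sha512")  # assumed; the first one is the storage key


class ContentStore:
    """Content-addressable cache with a name -> content index."""

    def __init__(self):
        self.blobs = {}    # primary digest -> content
        self.by_hash = {}  # (algorithm, digest) -> primary digest
        self.index = {}    # name -> primary digests, oldest first

    def add(self, name, data):
        digests = [(a, hashlib.new(a, data).hexdigest()) for a in ALGORITHMS]
        key = digests[0][1]
        # Refuse the file if any algorithm maps it to different content.
        for entry in digests:
            known = self.by_hash.get(entry)
            if known is not None and self.blobs[known] != data:
                raise ValueError(f"{entry[0]} collision while adding {name}")
        self.blobs[key] = data
        for entry in digests:
            self.by_hash[entry] = key
        versions = self.index.setdefault(name, [])
        if key not in versions:
            versions.append(key)
        return key

    def get(self, name):
        """Return the newest content stored under this name."""
        return self.blobs[self.index[name][-1]]
```

Two files with the same name and different contents end up as two
blobs, with the index remembering both.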


* TODO make the separate released and unreleased package caches useful as pacman's cache
have a single mirror on the system that's provided by deepbuild?

use the same solution for chroots


* TODO source fetching
** background process fetching specified packages
try to reuse http connections

continue fetching partial files
** notify build-graph after a package is fetched
** use an appropriate order for it
try to always keep a package ready to build, maybe just use a
topological sort of the graph as fetch order

(could use the scheduling theory for this)
** rewrite URLs to use faster mirrors
keep lists of mirrors of important services
** fail a package if sources cannot be fetched
also if the non-URL files are missing in the tree
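
The URL rewriting could be a simple prefix substitution.  In this
sketch the mirror list and the mirror host name are invented examples,
not real configuration.

```python
# Hypothetical mirror list: upstream URL prefix -> faster replacements.
MIRRORS = {
    "https://ftp.gnu.org/gnu/": ["https://mirror.example.org/gnu/"],
}


def candidate_urls(url, mirrors=MIRRORS):
    """Return the URLs to try, mirrors first and the original last."""
    urls = []
    for prefix, replacements in mirrors.items():
        if url.startswith(prefix):
            rest = url[len(prefix):]
            urls.extend(replacement + rest for replacement in replacements)
    urls.append(url)
    return urls
```

Keeping the original URL as the last candidate means a broken mirror
only slows the fetch down instead of failing the package.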


* UI
a single command, deepbuild, with subcommands
** TODO name these consistently
** sync-db
gets the databases for pacman -Sy (for multiple arches) using the
mirrorlist, saves the data in its own format
** sync-abs
gets all PKGBUILD metadata
** make-graph
generate a dependency graph of all outdated packages, specified
packages (i.e. the full graph without parts which aren't needed to
build these packages), or specified packages and all their reverse
dependencies (e.g. for xorg-server ABI changes)

can use different architecture repos to determine outdatedness, e.g.
to fix mips64el build problems on x86_64

should determine if the graph is not buildable, e.g. due to cycles or
lack of dependency providers (listing the unarched providers from
another arch)
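
A rough sketch of selecting the subgraph and checking for cycles,
assuming the graph is a plain mapping of package -> set of dependencies
and Python 3.9+ (graphlib).  All function names are made up here.

```python
from graphlib import CycleError, TopologicalSorter


def closure(start, edges):
    """All nodes reachable from `start` by following `edges`."""
    seen, stack = set(), list(start)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return seen


def reverse(deps):
    """Invert the graph: package -> set of packages depending on it."""
    rdeps = {}
    for pkg, needed in deps.items():
        for dep in needed:
            rdeps.setdefault(dep, set()).add(pkg)
    return rdeps


def select(deps, targets, with_reverse=False):
    """Subgraph needed to build targets, optionally with reverse deps."""
    wanted = set(targets)
    if with_reverse:
        wanted = closure(wanted, reverse(deps))
    wanted = closure(wanted, deps)
    return {pkg: set(deps.get(pkg, ())) for pkg in wanted}


def find_cycle(deps):
    """Return one dependency cycle as a list of packages, or None."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as error:
        return error.args[1]
    return None
```

A cycle found this way is what edit-graph's "forget a dependency" would
then be used to break.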
** build-graph
build packages from the graph
** show-graph
show the graph, what's built and what isn't
** edit-graph
add/remove package, forget a dependency (e.g. to fix a cycle), rebuild
a package (e.g. after fixing it without changing the rest of the
graph, or after fixing a chroot update problem)
** input-graph
the user manually specifies the graph, e.g. for toolchain builds (with
two glibc and two binutils builds)
** fetch-source
fetch sources for the graph, for specified packages or for all
packages
** clean-source
remove sources not for any package in abs

print size statistics
** list-staged
** release
signs the staged packages and releases them to repo

then remove these packages from the staged list

don't do multiple releases at once

is a nontrivial scheduling algorithm needed for this?
** unstage
remove a package/list of packages from the list to be released
** arch-diff
list packages available for one arch but not for another
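
With the repo dbs loaded this is little more than a filter.  The sketch
assumes a mapping of package name -> set of arches, and treats "any" as
available everywhere; both are assumptions.

```python
def arch_diff(arches_by_package, present, missing):
    """Names of packages built for `present` but not for `missing`."""

    def has(arches, arch):
        return arch in arches or "any" in arches

    return sorted(
        name
        for name, arches in arches_by_package.items()
        if has(arches, present) and not has(arches, missing)
    )
```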
** clean-packages
like pacman -Sc with support for multiple arches
** clean-local-packages
removes all unreleased local packages
** chroot-shell
** chroot-new
** TODO more commands if needed for the above features
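
The single command with subcommands maps directly onto argparse.  In
this sketch the subcommand names come from the list above, while the
make-graph arguments (`packages`, `--reverse`) are hypothetical.

```python
import argparse

# Subcommands that take no arguments in this sketch.
SIMPLE_COMMANDS = (
    "sync-db", "sync-abs", "build-graph", "show-graph", "edit-graph",
    "input-graph", "fetch-source", "clean-source", "list-staged",
    "release", "unstage", "arch-diff", "clean-packages",
    "clean-local-packages", "chroot-shell", "chroot-new",
)


def make_parser():
    parser = argparse.ArgumentParser(prog="deepbuild")
    commands = parser.add_subparsers(dest="command", required=True)
    for name in SIMPLE_COMMANDS:
        commands.add_parser(name)
    make_graph = commands.add_parser("make-graph")
    # Hypothetical arguments: no packages means "all outdated packages".
    make_graph.add_argument("packages", nargs="*")
    make_graph.add_argument(
        "--reverse", action="store_true",
        help="also include reverse dependencies of the given packages",
    )
    return parser
```

For example, `make_parser().parse_args(["make-graph", "--reverse",
"xorg-server"])` would select the xorg-server ABI rebuild case.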


* TODO configuration
- list of abs trees (find an algorithm to determine which one to use
  when more than one has the package)
- list of repos
- list of chroots
- key fingerprint for signing packages
- packager name and email for use in the packages
- lists of repos to use for building packages for a given repo
  (e.g. only multilib has multilib available, only ~user has ~user,
  standard repos have only standard repos)
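
The last item could be derived instead of listed by hand.  A sketch,
where the list of standard repo names is only an example and not the
real configuration:

```python
STANDARD_REPOS = ["libre", "core", "extra", "community"]  # example names


def build_repos(target, standard=STANDARD_REPOS):
    """Repos available when building a package for the `target` repo."""
    if target in standard:
        return list(standard)
    # Non-standard repos (multilib, ~user) see themselves and the standard ones.
    return list(standard) + [target]
```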


* TODO Parabola connection settings
PARABOLAHOST=parabola
LIBREDESTDIR=/srv/http/repo/public

have a recommended ssh config in the documentation


* libretools scripts: usefulness for deepbuild and replacements
** abslibre-commit
no need for it; produces useless commit messages and discourages using
git directly
** add-mips64el
leads to arching packages we don't need; make-graph and arch-diff will
show data needed for this
** aur
out of scope
** buildenv
will use a different chroot management solution
** chcleanup
will manage chroot packages differently
** createworkdir
will automatically create needed directories; will NOT clone abs repos
since users should know and use git and the script won't know what
repos they prefer (e.g. Parabola already has two)
** diff-unfree
out of scope
** fullpkg
separately use make-graph and build-graph
** fullpkg-build
build-graph will perform a similar task with different algorithms
** fullpkg-find
make-graph, completely different algorithm; don't deal with copying
files; split sourcing PKGBUILDs and generating the graph
** is_built
will have something similar internally
** is_unfree
will assume that all packages are free, we don't have (and shouldn't
have) nonfree packages in our abs trees
** lb
useless
** libreaddiff
{make,show,edit}-graph will replace it in a more manageable way
** librebasebuilder
out of scope
** librechroot
will reimplement it
** librecommit
like abslibre-commit
** librediff
use diff manually
** libremakepkg
will reimplement with a different interface
** libremessages
not a script
** libremkchroot
chroot-new
** librerelease
release
** librerepkg
out of scope
** librestage
will be automatically done by build-graph
** libretools.conf
will have defaults in the program and user's configuration file
overriding some settings

won't have these:

- BLACKLIST
- DIFFTOOL
- WORKDIR: just ~/deepbuild
- ARCHES: usually just some specific ones are used
- CHROOTDIR: each configured chroot can be in any directory
- HOOKPRERELEASE
- ABSLIBREGIT
- COMMITCMD: should not have an option with only one supported value
- FULLBUILDCMD
- TORUPATH: will have fixed layout under the workdir
- SIGEXT: only one correct value
- HOOKPKGBUILDMOD
- HOOKLOCALRELEASE: the only example given is obsolete
** mips-add
like add-mips64el
** mips-release
obsolete; will have different local repo handling
** pkgbuild-check-nonfree
like is_unfree
** prtools
haven't checked how they differ from the main tools
** toru
sync-abs and its database will do the part that I need
** toru-info
useful for debugging only?
** toru-path
what does it do?
** toru-utils
not a script
** toru-where
will have a different internal replacement
** treepkg
like fullpkg; will have different algorithms
** updateabslibre
has the problems of createworkdir
** update-cleansystem
won't have the cleansystem file doing upgrade, cleaning and installing
packages on the chroot at once


* distributed package building notes
** why not distcc
- much work done is not building (e.g. tests, non-C/C++ code)
- not all buildsystems support it well (and some don't support it at
  all)
** potential uses
- building a package in a virtual machine instead of a chroot
- building many packages on multiple mips64el machines
- building many packages in multiple i686-hurd virtual machines on an
  SMP host (since Hurd doesn't support SMP yet)


* TODO master-master build notifications
somehow be able to notify all other masters (e.g. via the repo server)

notify of a build starting, failing or ending (separate states:
released or aborted), so other masters will treat it as if it failed
(not building any packages depending on it) until it's released or
aborted
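
One reading of the above as code: a master remembers every package for
which it has seen a notification that is not yet final, and treats
those as failed when scheduling.  The event names and the class are
assumptions; how notifications travel between masters is not shown.

```python
IN_PROGRESS = {"started", "failed", "ended"}
FINAL = {"released", "aborted"}


class RemoteBuilds:
    """Tracks packages that other masters are working on."""

    def __init__(self):
        self.pending = {}  # package -> last non-final event

    def notify(self, package, event):
        if event in FINAL:
            # Released or aborted: stop treating the package as failed.
            self.pending.pop(package, None)
        elif event in IN_PROGRESS:
            self.pending[package] = event
        else:
            raise ValueError(f"unknown event: {event}")

    def blocking(self, dependencies):
        """Dependencies that must be treated as failed for now."""
        return sorted(set(dependencies) & set(self.pending))
```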