Bug 794175 - LLVM buildbot on Gentoo Infra
Summary: LLVM buildbot on Gentoo Infra
Status: CONFIRMED
Alias: None
Product: Gentoo Infrastructure
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Infrastructure
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-04 16:11 UTC by Michał Górny
Modified: 2022-09-13 11:55 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Description Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2021-06-04 16:11:14 UTC
TL;DR: I'm wondering if we could dedicate a fast-ish machine to running an LLVM buildbot worker on Gentoo, and possibly give someone a bounty to implement it.


Background
==========
The LLVM project uses buildbot-based CI to test for regressions.  Platforms that provide buildbot workers have a major advantage over others: regressions affecting them are reported to upstream developers automatically.  Developers tend to respect that and either fix their regressions or revert their commits.

Gentoo currently doesn't have a buildbot.  It relies entirely on me periodically testing stuff for regressions, manually bisecting them and trying to get upstream developers' attention on them.  Unfortunately, the later I find a regression, the harder it is to get anyone to care.

This has become a proper PITA lately.  12.0.0 had multiple regressions that weren't fixed during the RC cycle.  I had to put a lot of effort into getting people to fix them, and now I have trouble getting them to backport the fixes to 12.0.1.  Things would be much better if we detected these regressions earlier.

I believe having a buildbot worker is a worthwhile goal because:

1. Gentoo generally uses newer packages than the other buildbots available upstream.

2. Gentoo uses a comparatively unique build design via ebuilds, and upstream doesn't have anything like that.


Goals
=====
To satisfy the goal, I would like to request:

1. Infra's opinion on dedicating one of the stronger machines to running a buildbot worker.  LLVM is very compile-heavy, and we would need to rebuild it quite often for upstream to accept us.

2. Trustees' opinion on requesting a bounty for someone to do the necessary work, i.e. prepare a buildbot setup for Gentoo, set up the buildbot worker and get it all accepted upstream (amount TBD).
Comment 1 Aaron Bauman (RETIRED) gentoo-dev 2021-06-07 22:51:14 UTC
(In reply to Michał Górny from comment #0)

> Goals
> =====
> To satisfy the goal, I would like to request:
> 
> 1. Infra's opinion of dedicating one of the stronger machines for the
> purpose of running a buildbot worker.  LLVM is quite heavy on compiling and
> we would need to be able to rebuild it quite often to have upstream accept
> us.
> 

If infra cannot support this with current hardware, then the Foundation could vote on allocating funding for more resources.

> 2. Trustees' opinion on requesting a bounty for someone to do the necessary
> work, i.e. prepare buildbot setup for Gentoo, set the buildbot worker up and
> get it all accepted upstream (amount TBD).

Could you please expand on the buildbot? Is it an existing framework that needs to be adapted from other buildbots to a Gentoo environment, a from-the-ground-up project, or something else?

Overall, I love the idea and support both requests... barring any surprises or barriers that would require substantial financial obligations or leave a single dev maintaining it long term.
Comment 2 Alec Warner (RETIRED) archtester gentoo-dev Security 2021-06-08 04:29:16 UTC
For (1) we need to fund some kind of machine.
For (2) I assume for each target we want to support we follow these instructions: https://llvm.org/docs/HowToAddABuilder.html ?

Then, in theory, the Gentoo worker runs LLVM builds and reports to LLVM when they fail. We would likely need to provide an owner on our side for ongoing maintenance (base machine install, buildbot config, troubleshooting, etc.).

For (1) we need to work out what 'fast' means. Looking at the pool, "FAST" seems to mean at least 12 cores and 64G of RAM; most machines are significantly larger than this (e.g. >40 cores and 128G or more of memory).

I can spec out something relatively cheap-o:

SuperMicro 6028U-TR4T+ 2U SuperServer W/ X10DRU-i+
Processors: 2x Xeon E5-2680 v3 2.5GHz 12-Core Processors - $480.00
Memory: 256GB (16x 16GB) DDR4 Registered Memory - $1,440.00
Storage Controller: No Raid Controller Installed (2x SAS Cables)
Hard Drive (1): 800GB 12Gbps SAS SSD (0CN3JH) - $249.00
Hard Drive (2): 800GB 12Gbps SAS SSD (0CN3JH) - $249.00
Hard Drive (3): 800GB 12Gbps SAS SSD (0CN3JH) - $249.00
Hard Drive (4): 800GB 12Gbps SAS SSD (0CN3JH) - $249.00
Hard Drives (5-12): NO Trays Included
PCIe Add-On Card (1): No PCIe Network Card Included
PCIe Add-On Card (2): No PCIe Network Card Included
Power Supply: 2x 1000W 80+ Platinum Power Supplies
4-Post Rack Rail Kit: Supermicro Sliding Rails Included - $85.00

That's $3,350.00, plus $200 for a RAID controller, and it comes with disks (though we could source those elsewhere, for example).

A similar box at Hetzner is about 120 euro/mo ($150). So call the main server $4k; $4,000 / $150 per month ≈ 27 months (a bit over 2 years) before the new box pays for itself over renting (minus power and cooling, which OSUOSL donates to us in kind). We'd burn 2U and a lot of watts at the OSL, though (obviously the watts matter more).
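The cost figures in this comment can be sanity-checked with a short calculation. This is a sketch using only the numbers quoted above: the ~$150/month Hetzner rate and the rounded $4k purchase price are the comment's own approximations, and the 12-core/64G minimum is the "FAST" pool threshold described earlier. The itemized parts sum to less than the quoted total; the remainder presumably covers the base chassis, which the quote does not price separately (an assumption). Power and cooling are ignored, as in the comment.

```python
from dataclasses import dataclass

# Itemized prices (USD) copied from the quote above.
ITEMS = {
    "2x Xeon E5-2680 v3": 480.00,
    "256GB DDR4 registered memory": 1440.00,
    "4x 800GB SAS SSD": 4 * 249.00,
    "sliding rails": 85.00,
}
QUOTED_TOTAL = 3350.00         # quoted price, presumably including the chassis
ROUNDED_SERVER_COST = 4000.00  # "call the main server 4k"
HETZNER_MONTHLY = 150.00       # ~120 EUR/month, as quoted


@dataclass
class Spec:
    cores: int
    ram_gb: int


def meets_fast(spec: Spec, min_cores: int = 12, min_ram_gb: int = 64) -> bool:
    """True if the machine meets the minimum inferred for the "FAST" pool."""
    return spec.cores >= min_cores and spec.ram_gb >= min_ram_gb


def break_even_months(purchase: float, monthly_rent: float) -> float:
    """Months of renting that cost as much as buying outright."""
    return purchase / monthly_rent


if __name__ == "__main__":
    itemized = sum(ITEMS.values())
    print(f"itemized parts: ${itemized:,.2f}")
    print(f"unitemized remainder: ${QUOTED_TOTAL - itemized:,.2f}")
    # Two 12-core CPUs and 256GB of RAM, per the quote.
    print(f"meets FAST minimum: {meets_fast(Spec(cores=2 * 12, ram_gb=256))}")
    months = break_even_months(ROUNDED_SERVER_COST, HETZNER_MONTHLY)
    print(f"break-even: {months:.1f} months")
```

Running it prints an itemized parts total of $3,001.00, an unitemized remainder of $349.00, True for the FAST check, and a break-even of 26.7 months.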
Comment 3 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2021-06-08 07:21:21 UTC
(In reply to Alec Warner from comment #2)
> For (2) I assume for each target we want to support we follow these
> instructions: https://llvm.org/docs/HowToAddABuilder.html ?
> 
> Then in theory the Gentoo worker runs builds from LLVM and reports to LLVM
> when they fail; An owner for it for ongoing maintenance seems likely from
> our side (base machine install, buildbot config and troubleshooting, etc.)

Yes, roughly that.  The overall idea is to:

1. Set up a buildbot worker.

2. Write a zorg configuration for our workflow [1].

3. Get it deployed and running.

The primary problem is that we aren't going to fit into any of the existing workflows, so it's not as simple as their 'new buildbot' instructions.  I don't think we can do it without setting our own testing buildbot server and getting the config patch ready for submission while testing it locally.

As for the exact hardware requirements, I'd defer establishing them to whoever wants to do the work.

[1] https://github.com/llvm/llvm-zorg
Comment 4 Aaron Bauman (RETIRED) gentoo-dev 2021-06-08 16:45:47 UTC
(In reply to Alec Warner from comment #2)
> For (1) we need to fund some kind of machine.

<snip>

> 4-Post Rack Rail Kit: Supermicro Sliding Rails Included - $85.00
> 
> That's $3,350.00, plus $200 for a RAID controller, and it comes with disks
> (though we could source those elsewhere, for example).
> 
> A similar box at Hetzner is about 120 euro/mo ($150). So call the main
> server $4k; $4,000 / $150 per month ≈ 27 months (a bit over 2 years) before
> the new box pays for itself over renting (minus power and cooling, which
> OSUOSL donates to us in kind). We'd burn 2U and a lot of watts at the OSL,
> though (obviously the watts matter more).

I am a fan of just "renting" the box from Hetzner: we could refresh it after a couple of years at no additional cost, we wouldn't burden OSL with the electricity, and there are a couple of other factors.
Comment 5 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2021-06-09 05:10:54 UTC
b-man: you're not on IRC again, so you missed the discussion about this.

Specifically, I proposed that it initially run on the new releng system builder, using excess CPU there, to prove the value.

If that's good, move to a dedicated host or purchased system.

Per mgorny's note, most of this work isn't going to be getting the hardware, but rather setting up the buildbot: e.g. making the zorg buildbot config use the ebuild instead of any other build wrapper. I'd like to see some progress made towards that while the new host comes online.
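Having the builder drive the ebuild rather than upstream's own build wrappers could look roughly like the following. This is a hypothetical sketch: `gentoo_llvm_steps` is not part of zorg, the live ebuild path is supplied by the caller, and the `EGIT_OVERRIDE_COMMIT_*` variable name used to pin the git-r3 checkout to the revision under test is an assumption to verify against git-r3.eclass. In an actual zorg config, each argv list would presumably be wrapped in a buildbot `ShellCommand` step, which accepts `command=` and `env=` arguments.

```python
from typing import Dict, List, Tuple

# Phases run in order for every revision; "clean" first so each build starts
# from an empty work directory.
PHASES = ["clean", "compile", "test"]


def gentoo_llvm_steps(ebuild_path: str, revision: str) -> Tuple[Dict[str, str], List[List[str]]]:
    """Hypothetical helper: return (extra environment, argv lists) for building
    one LLVM revision through the Gentoo ebuild instead of a CMake wrapper."""
    env = {
        # Enable the test phase (and USE=test where the ebuild supports it).
        "FEATURES": "test",
        # ASSUMPTION: git-r3 override variable pinning the llvm-project checkout
        # to the revision under test; verify the exact name in git-r3.eclass.
        "EGIT_OVERRIDE_COMMIT_LLVM_LLVM_PROJECT": revision,
    }
    # One `ebuild <file> <phase>` invocation per phase, so a failure is
    # attributed to the phase that broke.
    steps = [["ebuild", ebuild_path, phase] for phase in PHASES]
    return env, steps


if __name__ == "__main__":
    env, steps = gentoo_llvm_steps("/path/to/llvm-9999.ebuild", "0123abcd")
    for argv in steps:
        print(" ".join(argv))
```

Splitting the phases into separate invocations is a design choice: `ebuild` runs any earlier phases a command depends on, so `compile` also performs unpack, prepare and configure, while a failure still shows up as a distinct compile or test step on the buildbot.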
Comment 6 Aaron Bauman (RETIRED) gentoo-dev 2021-06-09 18:31:30 UTC
(In reply to Robin Johnson from comment #5)
> b-man: you're not on IRC again, so you missed the discussion about this.
> 

Yeah, I don't have much time to be on there these days, but I try to keep up with items such as this.

> Specifically, I proposed that it initially runs on the new releng system
> builder, using excess CPU there, to prove the value.
> 

That sounds like a good start!

> If that's good, move to a dedicated host or purchased system.
> 

Perfect!

> Per mgorny's note, most of this work isn't going to be getting the hardware,
> but rather setting up the buildbot: e.g. zorg-buildbot use the ebuild
> instead of any other wrapper for building. I'd like to see some progress
> made towards that, while the new host comes online.

Are you implying progress must be made before we discuss a possible bounty for such work as proposed by Michał?
Comment 7 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2021-07-07 04:40:12 UTC
mgorny: If you intend the bounty to be the way forward here, could you please produce a statement of work and/or request for proposal?

The document needs to include:
- summary of project
- acceptance specifications
- expected duration
- possible bounty amount