metrics.opensuse.org

My initial proposal for metrics collection by running queries against the OBS database was turned down. I later tackled the problem using the OBS API with a lot of post-processing. The results are stored in InfluxDB and presented using Grafana and can be viewed on metrics.opensuse.org. I plan to provide a dedicated article walking through what the numbers mean and pointing out the effect of various tooling changes over the last year. Now that the instance is live the OBS team has expressed interest in co-maintaining the machine and providing some general OBS statistics.

During deployment a number of resource related issues were resolved. The memory footprint of the ingest portion was reduced by 60% (12GB to 4.5GB). Overall, the end result is fairly reasonable considering the ~800MB of XML data that is processed to produce over 2 million rows of data for Factory alone.

There remain a few issues to resolve, most importantly in regards to regular ingestation of new data. Short of querying all 800MB+ of data each time it would be preferred to have OBS allow for sorting the data so only the last couple pages can be queried. Excluding requests in the pseudo-state ignored will likely remove many of the graph peaks and thus clarify the data.

Tumbleweed snapshots

For Hack Week 16 I spent the week researching and refactoring the tool to provide Tumbleweed Snapshots on Amazon S3 storage. The reason for the switch was a lack of tangible progress on official openSUSE hosting after a year. The primary focus was to reworked snapshot tool to handle the non-file-system S3 storage which does not support soft or hard links while avoiding duplicate files.

The solution was to use S3 static website hosting redirection rules. A full copy of the root level files which do not contain a version in the name are copied in a directory matching the snapshot number. The directories containing the versioned rpms are then diff and any files unique between the current and previous snapshot are placed in the shared directory. Rules are then setup to redirect to shared whenever a file is not found in specific snapshot path. When a snapshot is removed the unique rpms between snapshot being removed and the next newest snapshot are removed from shared and the snapshot directory removed. The current snapshot is always redirected to download.opensuse.org so the existing mirror networks receives the bulk of the load. The resulting directory structure looks like the following.

  • 20171115
  • 20171116
  • 20171117
  • shared

The full layout is included within the directories. zypper then accesses via the specific snapshot number (ex. 20171117) and the redirection rules handle the rest.

Each snapshot directory contains the following files created by the tool.

  • disk: summary about disk usage for snapshot
  • rpm.list: full list of rpms provided by the snapshot
  • rpm.unique.list: unique list of rpms provided by the snapshot when compared to previous snapshot

The rpm.list file is used for diffing to determine what should be placed in the shared directory.

During development I had attempting to use s3fs-fuse with the intention of rsyncing the files into a s3fs mount, but that proved extremely fruitless. Using the aws-cli sync command works great, but thus requires more space since the files have to be sourced locally.

The snapshots are currently being triggered manually as rsync.opensuse.org is regularly out-of-sync and using specific mirrors requires a delay. I have planned out how to automate it, but have yet to work on it. stage.opensuse.org access would be preferred, but seems to require a static IP which is not required otherwise.

The end result will be published to the opensuse-factory mailing list (and this blog) within a week.

repo-checker

In order to continue decreasing the staging process time, the repo-checker will now review requests in staging projects that build properly but have openQA failures. This removes the need to wait for repo-checker once the openQA failures are resolved.

The repo-checker was moved to a more powerful machine to speed up processing time significantly. Combined with the build hash skips from the previous week the repo-checker loop time dropped from several hours to typically under half in hour. Some example batches with the hash skip:

[D] requests: 442 skipped, 0 queued
[D] requests: 439 skipped, 3 queued

After openSUSE:Factory:Staging:D finished building:

[D] requests: 380 skipped, 59 queued

deployment

The openSUSE-release-tools.spec was changed to only run %systemd_postun for oneshot services to avoid needless restarts on update. This package is used to deploy the tools in production.

miscellaneous

The daily CI run was disable due to empty submit request generation. The intention is to catch integration issues as a result of OBS or osc changes as quickly as possible.

The NonFree workflow is partially broken and being worked around manually.

As a follow-up to enabling factory-auto for Leap last report a fix for delete requests was merged.

The DNS CNAME record for this blog release-tools.opensuse.org was created!

last year

Last year at this time I was working on Tumbleweed snapshot repositories.

  • research into various layers of OBS repo and kiwi product publishing workflow
  • research into libzypp repo structure info for URL pattern
  • developed plan and started on prototype