GitHub Actions and GitLab as the XWiki CI

vmassol · April 26, 2020, 5:25pm

This weekend I researched GitHub Actions (GA) a bit more (I had already looked at it a while back) to see whether we could use it if we wanted to.

Learnings

The limits are pretty nice and would fit us (and it’s nice since it’s hosted and we don’t need to manage that): https://help.github.com/en/actions/getting-started-with-github-actions/about-github-actions#usage-limits
- We also get 20 parallel executors, which is more than what we currently have (8 ATM).
Even if GitHub changes its mind in the future, there’s the ability to have local executor machines for it, which is pretty cool.
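
The fallback to local executor machines could look roughly like the workflow below. This is a minimal sketch, not something tested in this evaluation: the file name, job name and Maven command line are assumptions, and it presumes a self-hosted runner that is already registered with the repository and has a JDK and Maven installed.

```yaml
# .github/workflows/build.yml (hypothetical file name)
name: Build
on: [push]
jobs:
  build:
    # "self-hosted" targets a runner we operate ourselves instead of a GitHub-hosted one.
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v2
      # Assumes mvn is available on the runner's PATH.
      - run: mvn --batch-mode clean install
```

Switching `runs-on` back to `ubuntu-latest` would move the same job to a hosted executor.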

Examples of some basic workflows:

Current blockers

It doesn’t provide any way to gather junit test results OOB (See https://github.community/t5/GitHub-Actions/Publishing-Test-Results/td-p/31242) so we need to find an action for that or create one. I’ve found 2 that could work but they both have problems:
- https://www.check-run-reporter.com/: Problem reported at https://github.com/check-run-reporter/feedback/issues/3. Results at https://github.com/vmassol/xwiki-commons/runs/618539158
- Scope: https://scope.dev/. Nice thing is that you get test results on the go + flaky test identification, test perf, etc. However 2 problems that I need to report (not done yet):
  - What’s the cost?
  - It got stuck when I tested it on commons and I had to kill the workflow after 1 hour.

Thoughts

It’s very interesting and very well integrated with GH.
We could use it to build our Docker images very simply. Example from JUnit: https://github.com/junit-team/build-env/blob/master/.github/workflows/dockerimage.yml
Since we have a Docker build image for XWiki, we don’t need that much from the CI, which is good since that allows us to depend less on it.
The usage of remote servers for gathering tests and their history is both good and bad. Good because it allows us to go further with test analysis and handling in general (and be independent of the CI), and bad because it ties us to another service which may not be available (fragility).
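
For the Docker image build mentioned above, a workflow along these lines would probably be enough. It is a sketch only: the file name, the branch, the image tag and the assumption that a Dockerfile sits at the repository root are all illustrative and not taken from our repos.

```yaml
# .github/workflows/dockerimage.yml (hypothetical file name)
name: Docker image
on:
  push:
    branches: [master]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Assumes a Dockerfile at the repository root; the tag is illustrative.
      - name: Build the image
        run: docker build . --file Dockerfile --tag xwiki/build:latest
```

Publishing the image is left out on purpose, since it would need registry credentials stored as repository secrets.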
Next steps:
- be able to gather test data and failing tests and see why they failed.
- find out if we can prevent a build at each commit and instead stack them as we do in our CI (now it’s less of an issue since we don’t operate the agents).
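
On stacking builds: the GitHub Actions workflow syntax reference documents a `concurrency` key that looks like it covers this. It was not part of the tests above, so take the following as an untested sketch; the group name and the Maven command are assumptions.

```yaml
name: Build
on: [push]
# All runs for the same branch share one group. While a run is in progress,
# at most one newer run stays pending; a newer push replaces the pending one.
concurrency:
  group: build-${{ github.ref }}
  cancel-in-progress: false
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: mvn --batch-mode clean install
```

With `cancel-in-progress: true` the running build would be aborted instead, which is a different trade-off from what we do today.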

To be continued.

mleduc · April 28, 2020, 9:41am

That sounds like a really promising path toward a more stable and maintainable CI.
The fact that it is well integrated with GH is a big pro for sure.

I’m not sure how GitLab’s CI compares to GH Actions, but it might be an alternative to consider if we want to keep some control over the CI.

PS: @vmassol the first and third example workflow links return 404

vmassol · May 3, 2020, 9:30am

Actually, I think I lost them when I recreated the repos…

vmassol · May 3, 2020, 9:40am

I’ve now tried GitLab CI:

We can have it work even though our repos are on GitHub (it sets up a mirroring clone)
Seems a bit more advanced than GitHub Actions (GA)
We can use gitlab.com, which is very nice for open source projects. However, note that the CPU/memory for the agents is lower than on GA and might not be enough for our platform builds (https://docs.gitlab.com/ee/user/gitlab_com/#shared-runners - 3.75GB of RAM). Also, we get 10 parallel pipelines (https://docs.gitlab.com/ee/user/gitlab_com/#gitlab-cicd).
Same as GA, it’s possible to set up our own runners (even for gitlab.com).
Same as GA, junit test reporting is currently not working, see https://gitlab.com/groups/gitlab-org/-/epics/2854

In order not to lose it, here’s the current .gitlab-ci.yml script I have:

variables:
  # This will suppress any download for dependencies and plugins or upload messages which would clutter the console log.
  # `showDateTime` will show the passed time in milliseconds. You need to specify `--batch-mode` to make this work.
  MAVEN_OPTS: "-Xmx2048m -Xms512m -Dhttps.protocols=TLSv1.2 -Dmaven.repo.local=$CI_PROJECT_DIR/.m2/repository -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=WARN -Dorg.slf4j.simpleLogger.showDateTime=true -Djava.awt.headless=true"
  # As of Maven 3.3.0 instead of this you may define these options in `.mvn/maven.config` so the same config is used
  # when running from the command line.
  # `installAtEnd` and `deployAtEnd` are only effective with recent version of the corresponding plugins.
  MAVEN_CLI_OPTS: "--batch-mode --errors --fail-at-end --show-version -DinstallAtEnd=true -DdeployAtEnd=true -Dmaven.test.failure.ignore"

image: xwiki/build:latest

# Cache downloaded dependencies and build results across jobs
# Note: We don't cache the target directory, see
# https://blog.deniger.net/post/gitlab-maven-optimize-build/
cache:
  paths:
    - .m2/

# Note that GitLab only caches /builds and /cache ATM and since the XWiki build image
# puts the m2 repo in /root/.m2/repository we need to symlink it below.
# Current directory set by GitLab is /builds/<github org name>/<github repo name>
# Variable name is $CI_PROJECT_DIR
before_script:
    - ln -fs /root/.m2 $CI_PROJECT_DIR/.m2

build:
  stage: build
  script:
    # Apparently gitlab runner doesn't run our bashrc, I think it's because it
    # overrides the root user's home directory to be /builds, so we need to force
    # execute it to have the mvn executable in the PATH.
    - . /root/.bashrc
    # Remove the xwiki dependencies from the cache repo so that we build them or
    # download them and have reproducible builds
    - rm -Rf $CI_PROJECT_DIR/.m2/repository/org/xwiki
    - rm -Rf $CI_PROJECT_DIR/.m2/repository/com/xpn
    # We run the clean goal because we cache target/ so that the test job won't have
    # to rebuild everything and thus will be faster. However this seems to carry over
    # across pipeline executions.
    - mvn $MAVEN_CLI_OPTS clean install -DskipTests -Plegacy,snapshot
  artifacts:
    # When not specifying expire_in, the default is used, which is 30 days or forever, see
    # https://docs.gitlab.com/ee/ci/yaml/#artifactsexpire_in
    # Note: see https://blog.deniger.net/post/gitlab-maven-optimize-build/ for the syntax used.
    paths:
      - "*/target"

test:
  stage: test
  script:
    - . /root/.bashrc
    # Setup a display for functional tests requiring one (non-docker tests)
    - vncserver :1 -geometry 1280x960 -localhost -nolisten tcp
    - export DISPLAY=:1
    # Make sure we don't recompile sources, see:
    # https://blog.deniger.net/post/gitlab-maven-optimize-build/
    - find . -name "*.class" -exec touch {} \+
    - mvn $MAVEN_CLI_OPTS test -Plegacy,integration-tests,docker,snapshot
  artifacts:
    reports:
      junit:
        - target/surefire-reports/TEST-*.xml
        - target/failsafe-reports/TEST-*.xml

vmassol · May 3, 2020, 12:15pm

I’ve now reviewed our current CI pipeline and extracted out the requirements we need from a CI system. I’ve documented this at https://dev.xwiki.org/xwiki/bin/view/Community/ContinuousBuild#HCIRequirements

We can now build a matrix and check whether GA or GitLab CI fulfill them. We already know that neither handles the Test Report part well FTM.

vmassol · May 3, 2020, 3:50pm

I’ve now started evaluating the 3 CI systems against the listed requirements at https://dev.xwiki.org/xwiki/bin/view/Community/ContinuousBuild#HCIRequirements

TODO:

Add requirements on performance (memory, CPU, etc)
Add cost requirements
Add risk/control requirement (e.g. ability to have local agents, ability to install the software locally)
Add a section about known problems with each CI system (for example, to explain why we would want to change from jenkins to something else).

vmassol · May 3, 2020, 5:28pm

I’ve also just read the doc from circleci.com and it’s really nice. It’s the best doc I’ve seen and I really like the features. The concepts are very clear and they’ve paid a lot of attention to making the parts easily reusable.

They offer free executors for open source projects (up to 400K credits per month): https://circleci.com/open-source/ (not yet sure what this allows though…).

They seem to support test results too: https://circleci.com/docs/2.0/collect-test-data/#maven-surefire-plugin-for-java-junit-results and https://circleci.com/blog/how-to-output-junit-tests-through-circleci-2-0-for-expanded-insights/.
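
Going by those two pages, collecting surefire results on CircleCI would look something like the config below. It is a sketch under assumptions: the job name and Maven command line are made up for illustration, and it presumes `mvn` is on the PATH inside the image (the GitLab script above had to source /root/.bashrc for that).

```yaml
# .circleci/config.yml (sketch)
version: 2.1
jobs:
  build:
    docker:
      - image: xwiki/build:latest
    steps:
      - checkout
      - run: mvn --batch-mode -Dmaven.test.failure.ignore clean install
      - run:
          name: Collect surefire reports
          # Run even when an earlier step failed, so failures are still reported.
          when: always
          command: |
            mkdir -p ~/test-results/junit/
            find . -type f -regex ".*/target/surefire-reports/.*xml" -exec cp {} ~/test-results/junit/ \;
      - store_test_results:
          path: ~/test-results
```

Copying the XML files into a single directory first is what makes this work for a multi-module build, since `store_test_results` takes one path.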

Only (potentially big) downside is that it’s not open source.

EDIT: Found some interesting doc about limits https://circleci.com/docs/2.0/oss/

These credits can be spent on Linux-medium resources. According to https://circleci.com/docs/2.0/configuration-reference/#resource_class, this means 2 vCPUs and 4GB of RAM.
Each organization can have a maximum of four concurrent jobs running. That’s a bit on the low side for us.

vmassol · May 11, 2020, 8:19am

FTR I’ve tested CircleCI over the weekend. It’s very nice. There’s a test report but I need to check it more (in case of test failures) to see the extent of the reporting. I had another problem, though: I ran the build for platform (see https://github.com/vmassol/xwiki-platform/blob/master/.circleci/config.yml) and it stops after 30 minutes or so with “Received “killed” signal” (see https://app.circleci.com/pipelines/github/vmassol/xwiki-platform/4/workflows/95df2595-a729-4d2a-9c50-8fbcb3326949/jobs/4). This seems to indicate that there’s not enough memory and that the OS killed the build. However, we’re supposed to have 4GB and we’re supposed to use 2GB for Maven. That said, I haven’t checked what we set up for surefire; that could be the issue.
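
To make the memory budget explicit, here is a sketch of where the two heap settings would go in a CircleCI config. This is not the config linked above: the job layout is assumed, the 1024m value for the test JVMs is a guess, and `-DargLine` may have no effect if the POM already hardcodes `argLine` in the surefire configuration.

```yaml
# .circleci/config.yml (sketch)
version: 2.1
jobs:
  build:
    docker:
      - image: xwiki/build:latest
    # "medium" is the class the open source credits apply to (2 vCPUs, 4GB RAM).
    resource_class: medium
    environment:
      # Heap for the Maven JVM itself; test JVMs forked by surefire come on top.
      MAVEN_OPTS: "-Xmx2048m -Xms512m"
    steps:
      - checkout
      # argLine caps the heap of the forked test JVMs (illustrative value).
      - run: mvn --batch-mode clean install -DargLine="-Xmx1024m"
```

The point is that the 2GB given to Maven and whatever the forked test JVM uses add up, along with non-heap memory, against the 4GB of the container.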