There are some ideas floating around about using Kubernetes in deployment-prep to catch up with the use of k8s in the WMF production environment. To date nobody has wanted to provision a Kubernetes cluster in deployment-prep using either the wikikube or toolforge puppet profiles. OpenStack Magnum offers another option, and it can be automated using OpenTofu.
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | Spike | bd808 | T372498 Figure out how to provision a Kubernetes cluster using Magnum and OpenTofu |
| Resolved | | dcaro | T372353 Request creation of deploymentpreps3 VPS project |
| Resolved | BUG REPORT | bd808 | T372365 OpenTofu fails to provision a Magnum managed k8s cluster in deployment-prep |
| Invalid | BUG REPORT | None | T372835 Push by kokkuri to registry.cloud.releng.team/bd808/deployment-prep-opentofu failing after working last week |
| Resolved | BUG REPORT | bd808 | T372848 Images from registry.cloud.releng.team should be usable by the "wmcs" runners |
| Resolved | | brennen | T372937 Create new GitLab project group: cloudvps-repos/deployment-prep |
| Resolved | BUG REPORT | Andrew | T373227 Provisioning of Kubernetes cluster via Magnum stopped working around time of OpenStack upgrade |
| Resolved | | Andrew | T369044 Upgrade cloud-vps openstack to version 'Caracal' |
| Resolved | | Andrew | T371707 Update designate sink plugins to work with caracal |
Event Timeline
I have now set up and torn down a small two-node k8s cluster a number of times from my laptop using various stages of the config in https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu. Things there steal blatantly from prior art by @rook in https://github.com/toolforge/paws/tree/main/tofu as well as things that @taavi did in https://gitlab.wikimedia.org/repos/cloud/metricsinfra/tofu-provisioning/.
My work so far:
- T372353: Request creation of deploymentpreps3 VPS project
- Created a service account named "BetaDevOpsBot" to run tofu from inside the beta cluster
- The account has not been added to the keystone password safelist as it will be using application credentials for authn (see the provider sketch after this list).
- Created application credentials for the BetaDevOpsBot service account in deploymentpreps3 project
- Created application credentials for the BetaDevOpsBot service account in deployment-prep project
- Created ec2 credential for the BetaDevOpsBot service account in deploymentpreps3 project
- Created an empty "opentofu" s3 container in deploymentpreps3 project to store the tofu apply state
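For orientation, here is a minimal sketch of how the tofu provider could consume those application credentials, assuming the terraform-provider-openstack provider. The variable names, Keystone URL, and region are assumptions, not values copied from the repo:

```hcl
# Sketch only: configuring the OpenStack provider with the application
# credentials created for BetaDevOpsBot. Variable names, the Keystone URL,
# and the region name are assumptions.
terraform {
  required_providers {
    openstack = {
      source = "terraform-provider-openstack/openstack"
    }
  }
}

variable "application_credential_id" {
  type = string
}

variable "application_credential_secret" {
  type      = string
  sensitive = true
}

provider "openstack" {
  auth_url                      = "https://openstack.eqiad1.wikimediacloud.org:25000/v3" # assumed Keystone endpoint
  region                        = "eqiad1-r"                                             # assumed region name
  application_credential_id     = var.application_credential_id
  application_credential_secret = var.application_credential_secret
}
```

In CI the two variables would be supplied as masked pipeline variables rather than committed to the repo.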
Interesting things learned so far:
- Application credentials need to include the "Unrestricted (dangerous)" permission or things will blow up when Magnum tries to create a service account related to the cluster (T372365#10063201)
- The bucket setting for the tofu s3 backend needs to be "$PROJECT_ID:$CONTAINER_NAME" and $PROJECT_ID is not the same as $PROJECT_NAME for newer Cloud VPS projects. We switched to using UUIDs for the $PROJECT_ID value via T274268: Wind down use of project ID and project name equivalency in OpenStack
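To make that bucket naming rule concrete, here is a hedged sketch of a tofu S3 backend block using the deploymentpreps3 project UUID and "opentofu" container noted later in this task. The state key, endpoint URL, and the various skip_* settings are assumptions about what a radosgw-backed, non-AWS object store typically needs, not the repo's actual config:

```hcl
terraform {
  backend "s3" {
    # "$PROJECT_ID:$CONTAINER_NAME" -- the deploymentpreps3 project UUID plus
    # the "opentofu" container created above.
    bucket = "675d02a3344846919fd7fdee700b53a2:opentofu"
    key    = "deployment-prep.tfstate" # hypothetical state key
    region = "us-east-1"               # ignored by radosgw, required by the backend
    endpoints = {
      s3 = "https://object.eqiad1.wikimediacloud.org" # assumed Cloud VPS object storage endpoint
    }
    # Settings commonly needed for S3-compatible, non-AWS object stores.
    skip_credentials_validation = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
    skip_s3_checksum            = true
    use_path_style              = true
  }
}
```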
The tofu config provisions a Kubernetes client config file at kubeconfig for accessing the cluster. The cluster can only be reached from inside of the Cloud VPS network, so to use the config from my laptop I need to tunnel the requests over ssh. That can look something like:
```
$ tofu apply
openstack_containerinfra_clustertemplate_v1.template_v126: Refreshing state... [id=28705ec4-e720-4857-8c92-3eaf98d86bc9]
openstack_containerinfra_cluster_v1.k8s_v126: Refreshing state... [id=aa888fb2-8d36-4a96-9738-981a52ccddc4]
local_file.kubeconfig: Refreshing state... [id=9f69d57013b1e95a8b4cc56e49248cd9a25c86d2]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

$ ssh -q -N -f -D 1080 deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud
$ HTTPS_PROXY=socks5://localhost:1080 KUBECONFIG=kubeconfig kubectl get nodes
NAME                              STATUS   ROLES    AGE   VERSION
beta-v126-44nrazht5jhg-master-0   Ready    master   42m   v1.26.8
beta-v126-44nrazht5jhg-node-0     Ready    <none>   33m   v1.26.8
```
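For context, here is a rough sketch of the three resources named in that output, using the OpenStack provider's Magnum support. The image, flavors, network, and labels below are placeholders rather than the repo's actual values:

```hcl
# Sketch of a Magnum-backed cluster: template, cluster, and a local copy of the
# generated kubeconfig. Image, flavors, and network values are placeholders.
resource "openstack_containerinfra_clustertemplate_v1" "template_v126" {
  name                = "k8s-v126-template"     # hypothetical
  coe                 = "kubernetes"
  image               = "magnum-fedora-coreos"  # assumed Magnum-capable image
  flavor              = "g3.cores2.ram4.disk20" # assumed worker flavor
  master_flavor       = "g3.cores2.ram4.disk20" # assumed control plane flavor
  external_network_id = "wan-transport-eqiad"   # assumed external network
  network_driver      = "calico"
  labels = {
    kube_tag = "v1.26.8"
  }
}

resource "openstack_containerinfra_cluster_v1" "k8s_v126" {
  name                = "beta-v126"
  cluster_template_id = openstack_containerinfra_clustertemplate_v1.template_v126.id
  master_count        = 1
  node_count          = 1
}

# Write the generated client config to ./kubeconfig for kubectl/helm to use.
resource "local_file" "kubeconfig" {
  content         = openstack_containerinfra_cluster_v1.k8s_v126.kubeconfig.raw_config
  filename        = "${path.module}/kubeconfig"
  file_permission = "0600"
}
```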
bd808 updated https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/merge_requests/1
ci: Validate and apply OpenTofu config via pipelines
bd808 merged https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/merge_requests/1
ci: Validate and apply OpenTofu config via pipelines
bd808 opened https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/merge_requests/2
ci(tofu-apply): Allow passing args to tofu apply
bd808 merged https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/merge_requests/2
ci(tofu-apply): Allow passing args to tofu apply
bd808 opened https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/merge_requests/3
ci(tofu-apply): Run job from WMCS runners
bd808 merged https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/merge_requests/3
ci(tofu-apply): Run job from WMCS runners
bd808 opened https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/merge_requests/5
ci: Split tofu plan and apply into separate jobs
bd808 merged https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/merge_requests/5
ci: Split tofu plan and apply into separate jobs
The GitLab CI pipeline at https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/pipelines/71739 shows a gitops automation loop for running OpenTofu against the deployment-prep project in action. This works a lot like the existing automation in https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner that is used to manage the Digital Ocean runners.
TODO:
- Run the tofu apply stage from inside Cloud VPS so that it can use helm to interact with the provisioned Kubernetes cluster. Currently blocked by T372848: Images from registry.cloud.releng.team should be usable by the "wmcs" runners.
- Figure out ingress for the Kubernetes cluster and add setup for that to the OpenTofu config. The Cloud VPS OpenStack does not yet include an LBaaS feature, but we can borrow from things Taavi figured out in https://gitlab.wikimedia.org/repos/cloud/metricsinfra/tofu-provisioning to set up HAProxy nodes if that helps the design.
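If we do go the HAProxy route, the instance-provisioning half might look roughly like the sketch below. Every value here (names, image, flavor, security groups, network) is an assumption, and the HAProxy configuration itself would still come from Puppet, Ansible, or some other config management layer:

```hcl
# Sketch only: provision instances intended to run HAProxy in front of the
# cluster's worker nodes. All values are placeholders.
resource "openstack_compute_instance_v2" "haproxy" {
  count           = 2
  name            = "deployment-k8s-haproxy-${count.index + 1}"
  image_name      = "debian-12.0-bookworm"     # assumed base image
  flavor_name     = "g3.cores1.ram2.disk20"    # assumed flavor
  security_groups = ["default", "k8s-ingress"] # assumed security groups

  network {
    name = "lan-flat-cloudinstances2b" # assumed project network
  }
}
```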
bd808 opened https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/merge_requests/6
ci(tofu-apply): Run job from WMCS runners
bd808 merged https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/merge_requests/6
ci(tofu-apply): Run job from WMCS runners
I haven't made a whole lot of progress on the ingress question. I was able to determine that the HAProxy setup used in the metricsinfra project is mostly handled by a custom Puppet role and profile which configure HAProxy. The tofu module provisions instances and configures them to have the role applied, which is neat but not really reusable in its current state.
After discussion with @thcipriani I think that if we do end up needing to provision and configure things like HAProxy nodes for this use case, we should experiment with the ansible provider as a mechanism for managing the needed instances. This is not to say that Puppet cannot be used, but if we need to invent net-new things it may be simpler in the long term to do so without the ops/puppet.git system, which requires active assistance from production SREs.
I have also updated the tofu config to manage {masters,workers}.k8s.svc.deployment-prep.eqiad1.wikimedia.cloud A records in DNS. I believe the workers.k8s.svc.deployment-prep.eqiad1.wikimedia.cloud service name and the known port mappings from https://wikitech.wikimedia.org/wiki/Kubernetes/Service_ports may be most of what we need to point the CDN edge at MediaWiki containers in the cluster once we progress to having a mechanism for creating and deploying those containers.
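As a rough illustration, records like those can be managed through the OpenStack provider's Designate resources. The zone name and addresses in this sketch are assumptions, not the actual config:

```hcl
# Sketch: an A record for the worker service name, published in the project's
# Designate zone. Zone name and record data are placeholders.
data "openstack_dns_zone_v2" "svc_zone" {
  name = "svc.deployment-prep.eqiad1.wikimedia.cloud." # assumed zone
}

resource "openstack_dns_recordset_v2" "k8s_workers" {
  zone_id = data.openstack_dns_zone_v2.svc_zone.id
  name    = "workers.k8s.svc.deployment-prep.eqiad1.wikimedia.cloud."
  type    = "A"
  ttl     = 300
  records = ["172.16.0.10", "172.16.0.11"] # placeholder worker node addresses
}
```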
While higher availability is nice, is it valuable in this case? PAWS, Quarry, and Superset all run without HAProxy nodes, and I have yet to see a case where a service was unreachable because the instance its DNS pointed at had failed. In addition, regardless of how redundant the networking is, Magnum in our current setup only gives us one control node. Perhaps waiting until Octavia is available is the better approach? It would allow for cloud-native load balancing/redundancy and should allow multiple control nodes.
Is ingress separate from the HAProxy setup in your case? If not, what is the desired ingress setup?
This needs research. My hoped for end result is that we can apply the helm charts from https://gerrit.wikimedia.org/g/operations/deployment-charts to this cluster. I believe this means supporting both things that are using https://wikitech.wikimedia.org/wiki/Kubernetes/Service_ports behind LVS in the prod cluster and also things using the https://wikitech.wikimedia.org/wiki/Kubernetes/Ingress istio ingress layer. I may be totally confused about how the prod wikikube bits connect however. I assume that there are folks on the Service Operations team who could help flesh out the meaningful requirements here.
I think working out the ingress system is outside of the scope of this tech spike, but a critical next step if management decides we can continue to invest in the direction of introducing a wikikube work-alike Kubernetes cluster in deployment-prep.
My hoped for end result is that we can apply the helm charts from https://gerrit.wikimedia.org/g/operations/deployment-charts to this cluster.
I'm seeing about 80 helm charts in that repo. Is each of these deployed to prod? And the hope for this project is to get each of them deployed to a cluster in deployment-prep?
I believe all of these charts are in use in some "production" Kubernetes cluster, but not all of them are part of the WikiKube cluster's services.
And the hope for this project is to get each of them deployed to a cluster in deployment-prep?
I would like to see MediaWiki and its direct support services (changeprop, citoid, kask, shellbox, etc.) that are currently deployed in deployment-prep move to a Kubernetes cluster to better match the production deployment and configuration mechanisms. Today MediaWiki is deployed matching the "bare metal" Puppet-based legacy method from production. Support services seem to have been largely deployed via containers, but containers managed by systemd + Docker instead of Kubernetes. This deployment difference has caused some support confusion in the past as folks used to taking care of services in production find out that they need to learn about an alternate deployment stack in deployment-prep.
I am not currently advocating for everything deployed in production to be deployed in deployment-prep. I have taken that position in the past, but today I am attempting to be more pragmatically focused on keeping the MediaWiki testing functionality of deployment-prep working as production MediaWiki completes the switch from bare metal to containers and then continues to evolve in a container-centered manner. If we accomplish that goal then I think we would be in a better place to discuss the potential value and cost of doing pre-production testing of additional projects in deployment-prep vs elsewhere.
There is a larger conversation to be had about the future utility of deployment-prep and how we might better support the variety of use cases it has grown to encompass. T215217: deployment-prep (beta cluster): Code stewardship request has some past discussion about challenges and frustrations related to the environment. A number of folks have burned themselves out attempting to provide solo or shallow group support for the environment. There has been at least one serious attempt by the Foundation to "build a better beta" that ultimately failed. I think I can include myself in the camp of folks who want various workflows currently supported by deployment-prep to continue in the future, but really do not care much where those workflows are realized as long as it is somewhere that is as accessible to the Wikimedia community as it is to Wikimedia Foundation staff.
I agree that moving from a docker/systemd approach to a k8s approach is better, both to better resemble prod and to be a more expected framework. I also agree that the first step should be to update what is already in deployment-prep to look more like production rather than add anything new to it; the broader conversation about whether deployment-prep is the right place for things that are not already in it, or whether elsewhere would be a better option, is a second-day conversation.
I've read through T215217 and discussed with a few people what is happening with deployment-prep. The burnout and failures have been mentioned. The general feeling I get, though, is that regardless of deployment-prep being a mess, it isn't going away, even if some of its use cases are moved to other projects. So it does seem of value to make it more reliable: probably first as something that can be deployed reliably, then as something that resembles prod. Nicely, these two steps can at first be performed in the same manner with what you're doing in this ticket.
I find this project interesting and if you don't mind I would like to look a little more at what you're working on. Though I probably won't get too far in the immediate future as I disappear for a while next week.
I've created a T372498 branch on the https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/ repo. The primary change was to introduce multi-datacenter logic, mostly to be able to test without fear of upsetting anything that may be happening in eqiad1 deployment-prep, though with a view that in the distant future a "prod" datacenter could be added. I took out the ssh keys for the cluster, as I haven't found them useful; the times I have used them were to dig around in the control node when the cluster was deploying but the worker nodes were not, and all the logs in the control node appear to belong to parts of OpenStack anyway. Additionally, if access to a control or worker node is needed after the cluster deploys, it is possible with:
kubectl debug node/<node name> -it --image=ubuntu -- chroot /host/
If you're finding other uses of the ssh key please let me know.
Looks like what you found in https://github.com/opentofu/opentofu/issues/388 has been resolved, so the ec2 keys can probably be handled in a nicer fashion than is currently set up. I haven't tinkered with this yet.
The secrets are encrypted with git-crypt; the encryption key can be found in my home directory on bastion.alphacluster.codfw1dev.wikimedia.cloud.
The whole thing should be deployable by running bash deploy.sh <datacenter>. It seems to be working in codfw1dev, but I haven't got it running in eqiad1 so far. I'm getting an error requesting that the bucket be created before the state is stored there, but also an error saying that projects with - in their name cannot have a bucket. Did you get the state in a bucket working before?
I've started tinkering with deploying the charts from https://gerrit.wikimedia.org/g/operations/deployment-charts. Are all the referenced images private to production? So far the ones I've tried seem to be. From what is described above it seems that the images do exist somewhere for use in deployment-prep; where would I learn more about where they are pulled from so I can update the values files?
As noted in the comments in main.tf, the variable approach requires tofu v1.8.0+. Currently the runtime container that the CI pipelines use only has tofu v1.7.2 available to it via the https://apt.wikimedia.org/wikimedia/dists/bookworm-wikimedia/thirdparty/tofu/ component. Getting a newer version of tofu packaged in our allowed apt repos would be great assuming it doesn't cause problems elsewhere.
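As a guess at the construct involved (I have not checked the actual main.tf comment), OpenTofu 1.8.0 added early evaluation of variables and locals, which allows things like referencing an input variable from backend configuration. A minimal sketch of that pattern:

```hcl
# Sketch only: referencing an input variable from the backend block relies on
# OpenTofu 1.8.0+ early evaluation; tofu 1.7.x rejects this.
variable "datacenter" {
  type    = string
  default = "eqiad1"
}

terraform {
  backend "s3" {
    bucket = "PROJECT_ID:opentofu"       # placeholder
    key    = "${var.datacenter}.tfstate" # variable use here needs tofu >= 1.8.0
  }
}
```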
The object store in use by the GitLab CI automation (and from my local laptop) is connected to the deploymentpreps3 Cloud VPS project that was created specifically to work around the project id issue. The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables in https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu/-/settings/ci_cd have the auth data needed to work with the existing "675d02a3344846919fd7fdee700b53a2:opentofu" bucket.
I have some local uncommitted things that use the ssh key to enable using the loafoe/ssh provider, but I do not yet have a deeply compelling use case for this.
Trying to improve my understanding of how deploys are done in prod. I've been asking around in a few places, mostly related to what is happening here, and @bd808 knows a lot of things, so I figured I would ask here too.
My current understanding of deployments is: code is updated, then the chart is updated in https://gerrit.wikimedia.org/r/operations/deployment-charts. After an update to the chart is merged, deploys can be done from a deployment server (either through deployment.eqiad.wmnet or directly) by moving to the desired service with cd /srv/deployment-charts/helmfile.d/services/<service> and running helmfile -e <env> -i apply.
Helmfiles seem to reference /etc/helmfile-defaults/general-{{ .Environment.Name }}.yaml files, which appear to contain environment variables for each environment. Though I'm not sure where these files themselves are generated.
Is my basic understanding of how helm is deployed correct?
Can any deployment server deploy to any environment? deploy1003.eqiad.wmnet could deploy to codfw?
Is there a way to deploy all the projects to an environment?
I'm also not sure where cluster access is granted for helmfile to make updates to k8s. Where is that managed?
Would it be feasible to add environment variables and k8s connection information for beta cluster, and then have the deployment server be able to deploy to beta cluster? The hope being that beta cluster could benefit from any updates that are made to production without having to track them independently of how production updates are being tracked.
I think I understand why you would want this, but it also seems like the exact inverse of the intent of a pre-production integration environment.
For iteration 0 the thing that needs to move to Kubernetes is MediaWiki, and that is all basically driven by scap. Today in deployment-prep MediaWiki is deployed by scap through the jobs in https://integration.wikimedia.org/ci/view/Beta/ which 1) update the git clones on the deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud server, 2) run scap to sync the staged deployment directory to the MediaWiki hosts, and finally 3) apply database schema updates and data migrations needed by the current release. scap knows how to do things with the helmfile setup on the production deploy hosts. I naively imagine that we will be able to translate that to what is needed to update the deployment-prep deployments as well once we design that layer, hopefully with a reasonable overlap with the prod interface.
The thought is to have a simple method to make deployment-prep mostly the same as production (at this point only in the k8s bits) while allowing developers to overwrite services at will. For example, deployment-prep citoid is currently using an image from 2019; I would imagine such divergence hinders testing. The hope would be to get all the k8s bits updated to production versions, then allow anyone to update a given service with the patch versions they want to test/demo, and have some kind of periodic reset (not sure what an appropriate period would be at this point) to bring everything back up to production versions.
Though I may not understand the desire. Is this approach outside of the intent of this project?
I am going to close this tech spike task as completed. @rook should probably start a new task to track the changes they are working on to move beyond the initial tech spike into a more full featured WikiKube work-alike system.