
AWS EKS deployment of Synapse results in /run/secrets/synapse permissions issue #532

@ShaneCray

Description


I tried this:

We have deployed Synapse to AWS EKS, but are having difficulty getting workflows to run.

Our operator config is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: operator-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: operator
  template:
    metadata:
      labels:
        app: operator
    spec:
      volumes:
      - name: synapse-pvc
        persistentVolumeClaim:
          claimName: synapse-pvc
      serviceAccountName: operator
      containers:
        - name: operator
          image: ghcr.io/serverlessworkflow/synapse/operator:1.0.0-alpha5.15
          env:
            - name: CONNECTIONSTRINGS__REDIS
              value: garnet:6379
            - name: SYNAPSE_RUNNER_IMAGE
              value: ghcr.io/serverlessworkflow/synapse/runner:1.0.0-alpha5.15
            - name: SYNAPSE_OPERATOR_NAMESPACE
              value: default
            - name: SYNAPSE_OPERATOR_NAME
              value: operator-1
            - name: SYNAPSE_RUNNER_API
              value: <DOMAIN>
            - name: SYNAPSE_RUNNER_LIFECYCLE_EVENTS
              value: "true"
            - name: SYNAPSE_RUNNER_CONTAINER_PLATFORM
              value: kubernetes
            - name: SYNAPSE_RUNTIME_MODE
              value: kubernetes
            - name: SYNAPSE_RUNTIME_K8S_SERVICE_ACCOUNT
              value: operator
            - name: SYNAPSE_RUNTIME_K8S_NAMESPACE
              value: ssmo-dev-shared-synapse
          # volumeMounts is a per-container field, so it sits under the container entry
          volumeMounts:
            - name: synapse-pvc
              mountPath: /run/secrets/synapse
---
apiVersion: v1
kind: Service
metadata:
  name: operator
  namespace: ssmo-dev-shared-synapse
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: operator
  type: ClusterIP
---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: operator-role
rules:
- apiGroups: [""]
  resources: ["pods", "secrets", "configmaps", "persistentvolumeclaims", "serviceaccounts", "services"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: operator-role-binding
subjects:
- kind: ServiceAccount
  name: operator
  namespace: ssmo-dev-shared-synapse
roleRef:
  kind: ClusterRole
  name: operator-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: synapse-pv
spec:
  capacity:
    storage: "10Gi"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: "efs-sc"
  csi:
    driver: efs.csi.aws.com
    volumeHandle: "${AWS_EFS_FILESYSTEM_ID}::${AWS_EFS_FULL_ACCESS_AP_ID}"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: synapse-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: "10Gi"
  storageClassName: "efs-sc"
  volumeName: synapse-pv
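Two details in manifests like these are easy to get wrong and cheap to check before digging into Synapse itself. First, in the Pod API `volumeMounts` belongs to each entry under `containers`; a `volumeMounts` key placed directly under the pod `spec` is not a valid field, so depending on validation settings it is either rejected or silently dropped, and nothing gets mounted. Second, `kubectl apply` does not expand shell-style placeholders such as `${AWS_EFS_FILESYSTEM_ID}`, so unless a tool in the pipeline (for example `envsubst`) substitutes them first, the literal text reaches the API server. The sketch below checks both. It assumes the Deployment has already been parsed into a Python dict; `volume_mount_problems` and `unresolved_placeholders` are hypothetical helpers, not part of Synapse or kubectl.

```python
import re
from typing import Any

# Matches shell-style ${NAME} placeholders, which kubectl does not expand.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def unresolved_placeholders(manifest_text: str) -> list[str]:
    """Return the sorted names of ${...} placeholders still present in the text."""
    return sorted(set(_PLACEHOLDER.findall(manifest_text)))


def volume_mount_problems(deployment: dict[str, Any]) -> list[str]:
    """Report volumeMounts that are misplaced or reference undeclared volumes."""
    problems: list[str] = []
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})

    # volumeMounts is a container-level field; at pod level it is not valid.
    if "volumeMounts" in pod_spec:
        problems.append("volumeMounts is set on the pod spec instead of a container")

    declared = {volume.get("name") for volume in pod_spec.get("volumes") or []}
    for container in pod_spec.get("containers") or []:
        for mount in container.get("volumeMounts") or []:
            if mount.get("name") not in declared:
                problems.append(
                    f"container {container.get('name')!r} mounts undeclared volume "
                    f"{mount.get('name')!r}"
                )
    return problems


if __name__ == "__main__":
    # Hypothetical Deployment dict with volumeMounts placed at pod level.
    example = {
        "spec": {"template": {"spec": {
            "volumes": [{"name": "synapse-pvc",
                         "persistentVolumeClaim": {"claimName": "synapse-pvc"}}],
            "containers": [{"name": "operator"}],
            "volumeMounts": [{"name": "synapse-pvc",
                              "mountPath": "/run/secrets/synapse"}],
        }}}
    }
    print(volume_mount_problems(example))
    # ['volumeMounts is set on the pod spec instead of a container']
    print(unresolved_placeholders(
        'volumeHandle: "${AWS_EFS_FILESYSTEM_ID}::${AWS_EFS_FULL_ACCESS_AP_ID}"'))
    # ['AWS_EFS_FILESYSTEM_ID', 'AWS_EFS_FULL_ACCESS_AP_ID']
```

To run it against real files, the manifests could be loaded with a YAML parser such as PyYAML's `yaml.safe_load_all`, passing each `Deployment` document to `volume_mount_problems` and the raw file text to `unresolved_placeholders`.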

This happened:

Whenever we try to run a workflow, the runner logs the warning below and the workflow request then times out:

[14:57:32] warn: Synapse.Runner.Services.SecretsManager[0]
      Failed to load secrets because there are none or because they are improperly configured. Error: Access to the path '/run/secrets/synapse' is denied.
[14:57:32] info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
[14:57:32] info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
[14:57:32] info: Microsoft.Hosting.Lifetime[0]
      Content root path: /app
[14:57:32] info: System.Net.Http.HttpClient.Default.LogicalHandler[100]
      Start processing HTTP request GET https://synapse.dev-shared.ssmo.appdat.jsc.nasa.gov:8080/.well-known/openid-configuration
[14:57:32] info: System.Net.Http.HttpClient.Default.ClientHandler[100]
      Sending HTTP request GET https://synapse.dev-shared.ssmo.appdat.jsc.nasa.gov:8080/.well-known/openid-configuration
[14:59:12] fail: Synapse.Runner.Services.RunnerApplication[0]
      An error occurred while running the specified workflow instance: System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
       ---> System.TimeoutException: A task was canceled.
       ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
         at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
         at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at Microsoft.Extensions.Http.Logging.LoggingHttpMessageHandler.<SendCoreAsync>g__Core|4_0(HttpRequestMessage request, Boolean useAsync, CancellationToken cancellationToken)
         at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.<SendCoreAsync>g__Core|4_0(HttpRequestMessage request, Boolean useAsync, CancellationToken cancellationToken)
         at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellat
         --- End of inner exception stack trace ---
         --- End of inner exception stack trace ---
         at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
         at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellat
         at IdentityModel.Client.HttpClientDiscoveryExtensions.GetDiscoveryDocumentAsync(HttpMessageInvoker client, DiscoveryDocumentRequest request, CancellationToken cancellationToken)
         at Synapse.Core.Infrastructure.Services.OAuth2TokenManager.GetTokenAsync(OAuth2AuthenticationSchemeDefinitionBase configuration, CancellationToken cancellationToken) in /src/src/core/Synapse.Core.Infrastructure/Services/OAuth2TokenManager.cs:line 77
         at Program.<>c__DisplayClass0_2.<<<Main>$>b__14>d.MoveNext() in /src/src/runner/Synapse.Runner/Program.cs:line 60
      --- End of stack trace from previous location ---
         at Synapse.Api.Client.Services.ApiClientBase.ProcessRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /src/src/api/Synapse.Api.Client.Http/Services/ApiClientBase.cs:line 62
         at Synapse.Api.Client.Services.ResourceHttpApiClient`1.GetAsync(String name, String namespace, CancellationToken cancellationToken) in /src/src/api/Synapse.Api.Client.Http/Services/ResourceHttpApiClient.cs:line 63
         at Synapse.Runner.Services.RunnerApplication.RunAsync(CancellationToken cancellationToken) in /src/src/runner/Synapse.Runner/Services/RunnerApplication.cs:line 117
[14:59:12] info: Microsoft.Hosting.Lifetime[0]
      Application is shutting down...
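The stack trace suggests where the run actually stalls: the runner's `OAuth2TokenManager` requests `/.well-known/openid-configuration` from the configured API host and gives up when the 100-second `HttpClient.Timeout` elapses. The secrets message is logged at `warn` level, while this timeout is the `fail` entry that precedes shutdown. One way to separate a network, DNS, or TLS problem from an authentication problem is to fetch the same discovery URL from a pod in the same namespace with a short timeout. The sketch below uses only the Python standard library; the default target is a placeholder for the `SYNAPSE_RUNNER_API` value, and `discovery_url` and `probe_discovery` are hypothetical helper names.

```python
import http.client
import json
import sys
import urllib.error
import urllib.request

DISCOVERY_PATH = "/.well-known/openid-configuration"


def discovery_url(base_url: str) -> str:
    """Build the OpenID Connect discovery URL that the runner log shows it requesting."""
    return base_url.rstrip("/") + DISCOVERY_PATH


def probe_discovery(base_url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Fetch the discovery document; return (ok, detail) instead of raising."""
    url = discovery_url(base_url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            document = json.load(response)
            issuer = document.get("issuer") if isinstance(document, dict) else None
            return True, f"HTTP {response.status}, issuer={issuer!r}"
    except urllib.error.HTTPError as exc:
        # The server answered, so DNS, routing and TLS worked; the endpoint returned an error.
        return False, f"HTTP {exc.code} from {url}"
    except (OSError, http.client.HTTPException) as exc:
        # URLError, timeouts and TLS errors are OSError subclasses: no usable answer.
        return False, f"cannot reach {url}: {exc!r}"
    except ValueError as exc:
        # The server answered, but the body was not valid JSON.
        return False, f"invalid JSON from {url}: {exc}"


if __name__ == "__main__":
    # Placeholder default; pass the SYNAPSE_RUNNER_API value as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:8080"
    ok, detail = probe_discovery(target)
    print("OK" if ok else "FAIL", detail)
```

A quick `FAIL ... cannot reach` result points at connectivity (Service name, port, or scheme), whereas an `HTTP 4xx/5xx` result means the host is reachable and the problem lies with the API's authentication setup.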

I expected this:

Because an EFS volume is mounted at the path the operator config specifies (/run/secrets/synapse), we expected the runner not to hit this permission error.
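The warning comes from the runner's `SecretsManager` failing to access `/run/secrets/synapse` inside the runner container. In Kubernetes runtime mode the runner appears to run in its own pod, and volumes are defined per pod, so a volume mounted into the operator pod is not visible to runner pods unless the operator adds it to their spec; any check therefore has to run where the error occurs. Whether the process can read the directory depends on its permissions and the container's UID (for EFS, partly set by the access point's POSIX user and root-directory settings), and if the directory does not exist, creating it requires write access to `/run/secrets`. The sketch below is a hypothetical, POSIX-only helper (it uses `os.geteuid`) that reports these facts; it approximates, but is not, what the .NET runner does.

```python
import os
from dataclasses import dataclass


@dataclass
class DirReport:
    path: str
    uid: int
    gid: int
    exists: bool
    is_dir: bool
    listable: bool
    parent_writable: bool
    error: str = ""


def check_secrets_dir(path: str = "/run/secrets/synapse") -> DirReport:
    """Report whether the current process can list `path`, or create it if missing."""
    parent = os.path.dirname(os.path.abspath(path))
    report = DirReport(
        path=path,
        uid=os.geteuid(),
        gid=os.getegid(),
        exists=os.path.exists(path),
        is_dir=os.path.isdir(path),
        listable=False,
        # Creating a missing directory needs write and search permission on the parent.
        parent_writable=os.access(parent, os.W_OK | os.X_OK),
    )
    if report.is_dir:
        try:
            os.listdir(path)  # enumerating entries, roughly what loading secret files needs
            report.listable = True
        except OSError as exc:  # PermissionError is the expected case; NFS may raise others
            report.error = str(exc)
    return report


if __name__ == "__main__":
    print(check_secrets_dir())
```

Running it in a debug pod that uses the same image, service account, and volumes as a runner pod would show the UID/GID the process runs as and whether the failure is on listing an existing directory or on creating a missing one.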

Also, should we mount the EFS volume into the correlator, API, and Garnet pods as well? I'm not sure which file-system paths a local Synapse deployment binds to.

Is there a workaround?

No response

Anything else?

No response

Platform(s)

No response

Community Notes

  • Please vote by adding a 👍 reaction to the issue to help us prioritize.
  • If you are interested in working on this issue, please leave a comment.

Metadata

Assignees

No one assigned

Labels

type: bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
