stream terminated by RST_STREAM with error code: PROTOCOL_ERROR


Setup

We have a Golang gRPC microservice mesh running on EKS with Kubernetes (1.27), Istio (1.18) and Knative (1.11). Deployments are made via Knative Serving, and every pod contains a queue-proxy container provided by Knative. For traceability, we pass contexts between adapters by calling a helper function:

How context propagation is invoked before each adapter call:

outgoingCtx, err := common.PropagateContext(ctx)
if err != nil {
    return nil, err
}

result, err := client.PerformAction(outgoingCtx)
if err != nil {
    return nil, err
}

How context propagation is implemented:

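import (
    "context"
    "fmt"
    "google.golang.org/grpc/metadata"
)

// Field is assumed to be a plain key/value pair (its definition was not shown).
type Field struct{ Key, Value string }

// PropagateContext copies all incoming gRPC metadata into an outgoing context and sets any extra fields.
func PropagateContext(ctx context.Context, newFields ...*Field) (context.Context, error) {
    md, ok := metadata.FromIncomingContext(ctx)
    if !ok {
        return nil, fmt.Errorf("unable to propagate context")
    }
    for _, field := range newFields {
        md.Set(field.Key, field.Value)
    }
    return metadata.NewOutgoingContext(ctx, md), nil
}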

Traffic is routed from Route53 to an AWS ALB, then to the istio-ingressgateway service, which communicates with the Knative services.

Problem

Three days ago, the services suddenly stopped working without any changes to the code. We narrowed the issue down to context propagation between microservices.

Custom metadata that we add to the request, such as authorization tokens, causes no problems. However, forwarding the metadata fields inserted by Istio and the ALB (listed below) into the context of downstream service calls causes the following error:

Error: stream terminated by RST_STREAM with error code: PROTOCOL_ERROR.

Metadata:

k-proxy-request [activator]
x-envoy-attempt-count [1]
x-envoy-external-address [IP3]
x-envoy-peer-metadata-id [router~IP4~istio-ingressgateway-REMOVED.istio-system~istio-system.svc.cluster.local]
x-b3-sampled [0]
x-envoy-decorator-operation [SERVICE-REVISION.default.svc.cluster.local:81/*]
x-b3-spanid [SPAN-ID]
x-forwarded-for [IP1, IP2, IP3, IP4, IP5]
x-request-id [REQUEST-ID]
:authority [SERVICE.domain.com]
x-forwarded-port [443]
content-type [application/grpc]
x-forwarded-proto [http]
x-amzn-trace-id [Root=1-TRACE-ID]
content-length [15]
x-b3-parentspanid [PARENT-SPAN-ID]
authorization [AUTH-TOKEN]
forwarded [for=IP1;proto=http, for=IP2, for=IP3, for=IP4]
x-envoy-peer-metadata [REMOVED]
x-b3-traceid [TRACE-ID]
user-agent [grpc-go/1.56.2]
authorization [<JWT>]

Note that the failure is not consistent: sometimes the request succeeds when a connection is reopened.

We worked around the problem by removing all metadata other than authorization from the list above, but we still cannot understand the root cause of the issue.
