According to Wikipedia, Resilience (emphasis mine) is:

…the ability to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation.

Or, to rephrase it, it means being able to handle errors in such a way that our service can still operate. Those errors could come from inside our implementation or be caused by the inputs that we're receiving from our customers.

In this post I will focus on HTTP servers, more specifically on the type net/http.Server that is part of the standard library in Go, and on the configuration options we need to set when using it: timeouts.

I believe the value used by SDKs is static and will always be sqs.

Is it possible there were any IAM authentication/authorization changes to the credentials that your Pods are using? Maybe a change to the SQS queue policy or how your producer is placing messages in the queue? Any known networking changes within your GKE cluster's nodes that may have caused this?

And for setting the host in NewRequest, I think it would be best to open an issue/PR for this, or an issue to get some more information from the maintainers, as I'm not very familiar with the reason why it isn't being set.

I assume you mean you're using Google Kubernetes Engine and talking to SQS over the public internet. From my understanding, SQS won't send a response back to your client until it collects the maximum number of messages you requested or the wait time expires, whichever happens first.

Simply restarting the pods fixed the issue, but I would like to not have this happen again if we can help it.

Thanks for the details! Does your queue also have its ReceiveMessageWaitTimeSeconds attribute set to enable long polling? I imagine so if you're able to get messages after changing the header timeout value.

Another thing you may want to consider is what you have MaxNumberOfMessages set to: is a single consumer (goroutine, Pod, etc.) processing more than one message at a time?
We recently started running into issues with the aws-sdk; primarily we are using sqs/sns. We found out that it's now possible to specify the HTTP client to the aws-sdk. We found an example in the docs, however it seems our services complain when using this custom httpClient.

// this function is needed to configure the default http client of the AWS sdk,
// if this is not done no timeouts are set leaving connections hanging forever and causing any disconnect from SNS
// SQS to stay disconnected
// values taken from:
//
// we also wrap the http client inside the faceit tripper so we have metrics about the http request from the sdk
func getDefaultHttpClient() *http.Client

Log.WithError(err).Error("can not get message from the queue")

I kept increasing the ResponseHeaderTimeout value until we stopped seeing as many errors, and we are now at 25 seconds. (This particular app I'm testing with has plenty of spare resources and is not under contention at all.)

We don't have any contracts per se, but usually it is best practice to configure these things. We have been using it without timeouts for 2 years and never noticed issues, but a few days ago we experienced network issues between GCP and AWS, and the applications that run on GCP were basically no longer receiving messages from the queue. I assume this is because there is no default timeout set on the HTTP client, so the operation never fails and it gets stuck in that thread waiting forever for a response (would that scenario make sense to you as well?).

I don't think the default behavior would be a hang, but I'm not really sure.