Transcript
Introduction
In this video, we are going to look at how to avoid a common mistake in error handling with the integration platform Tray.ai. After working with Tray.ai for a while, you will need to centralize error handling. Let’s look at the best way to do that, based on my experience.
Error Alerting Workflow Overview
In Tray, there is the concept of an error alerting workflow, which is a triggering mechanism that launches a workflow execution when an error occurs. You can have an error alerting workflow set up at the account level or at the workflow level. When both of them are set up, the workflow-level alerting will prevail.
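The precedence rule just described can be sketched in a few lines of Python. This is only a mental model, not Tray's API: the function name `resolve_alerting_workflow` and its parameters are made up for illustration.

```python
from typing import Optional


def resolve_alerting_workflow(
    workflow_level: Optional[str],
    account_level: Optional[str],
) -> Optional[str]:
    """Return the alerting workflow that receives the error, or None.

    Hypothetical model of the rule in the transcript: a workflow-level
    setting, when present, wins over the account-level (global) one.
    """
    if workflow_level is not None:
        return workflow_level
    return account_level
```

If neither level is set, the function returns `None`, which corresponds to the case where the workflow just stops and nobody is told.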
The Tray account can be set up to use a global error handler so that every error will trigger that workflow. If we look at our settings right now, we don’t have that setting set up, so if an error happens, the workflow stops, and we don’t know about it unless we look at the executions.
Setting Up the Global Error Handler
Now we select a global error handler that we created earlier. Let’s look at it. It has an alerting trigger, some filtering logic, an extraction of the error message, and then a call to a notification workflow.
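As a rough sketch, the three steps of that handler (filter, extract the message, call the notification workflow) could look like this in Python. The event fields `workflow_name` and `error.message` are assumptions made for the example, not the actual payload of Tray's alerting trigger, and `notify` stands in for the call to the notification workflow.

```python
from typing import Callable, FrozenSet, Optional


def handle_alert(
    event: dict,
    notify: Callable[[str], None],
    ignored_workflows: FrozenSet[str] = frozenset(),
) -> Optional[str]:
    """Filter the alert, extract the error message, then notify.

    The event shape is hypothetical and used only for illustration.
    """
    workflow_name = event.get("workflow_name", "unknown workflow")

    # Filtering step: skip alerts we have decided not to report.
    if workflow_name in ignored_workflows:
        return None

    # Extraction step: pull a readable message out of the error details.
    error = event.get("error") or {}
    message = error.get("message", "unknown error")
    summary = f"{workflow_name}: {message}"

    # Notification step: hand the summary to the notification workflow.
    notify(summary)
    return summary
```

For example, `handle_alert({"workflow_name": "Sync orders", "error": {"message": "timeout"}}, print)` would pass the summary to `print`.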
Notification Workflows for Error Handling
Now let’s say that we use a separate workflow to notify the end system. That might be a data warehouse, a system like Airtable or SmartSuite, Sentry, or even sending a message to Slack, whichever is your preference.
Let’s look at the workflow. Right now, there is no real logic in it; as an example, it simply terminates. So let’s see what happens. We should be good to go. Let’s run the error one more time. We can see the execution of the error alerting workflow. We can see that it called the notification workflow, and we can see that it terminated successfully.
Common Mistake: Infinite Error Loop
Now let’s look at the common mistake which can easily happen. Let’s say that we notify an external system and we want this to be a robust notification. If it fails, we need to make sure that the error does not trigger another error and snowball into a high usage bill.
Let’s see how we can burn through an unlimited number of tasks. Basically, the error alerting workflow calls the child workflow, which fails, and that failure triggers another alert, which fails again, and so on.
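To make the cycle concrete, here is a small Python simulation of it. It is a toy model under stated assumptions: the workflow names are invented, only an account-level handler is configured, and the notification workflow always fails. The `max_executions` cap exists only so the sketch terminates; in the real scenario nothing caps the loop.

```python
def count_executions(max_executions: int = 50) -> int:
    """Count the executions caused by a single original error.

    Toy model: the global handler calls the notification workflow, the
    notification workflow fails, and its failure alerts the global
    handler again.
    """
    executions = 0
    pending = ["global_error_handler"]  # triggered by the original error

    while pending and executions < max_executions:
        workflow = pending.pop()
        executions += 1
        if workflow == "global_error_handler":
            # The handler calls its child workflow.
            pending.append("notification")
        elif workflow == "notification":
            # The child fails, and account-level alerting fires again.
            pending.append("global_error_handler")

    return executions
```

Whatever cap you pass in, the simulation uses all of it, because there is always one more workflow waiting to run.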
Example of Infinite Loop in Error Handling
Let’s see that in practice. Let’s move this termination to the bottom. We have an error script here. Now we run the workflow with an error. It has an error. We look at the global alerting workflow. We look at the notification execution, and we can see a failure, and then another one.
That is why we added a random delay, so that we don’t trigger too many errors at the same time. We can see that this keeps on triggering, and the only way to stop it is by disabling this workflow.
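The random delay can be pictured as a jitter helper like the one below. This is a sketch only: the function name and the 1 to 10 second range are assumptions, not values from the demo. Note that a delay only slows the loop down; it does not stop it.

```python
import random
from typing import Optional


def jittered_delay(
    min_seconds: float = 1.0,
    max_seconds: float = 10.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Pick a random wait time so repeated failures are spread out.

    The default bounds are hypothetical example values.
    """
    rng = rng or random.Random()
    return rng.uniform(min_seconds, max_seconds)
```

A caller would then wait with something like `time.sleep(jittered_delay())` before raising the error.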
Fixing the Infinite Loop
OK, now it’s stopped. We can re-enable the workflow, and let’s see how we can fix it. Now we can see that on the left-hand side, we pre-created another workflow. Let’s open that up.
This is another alerting trigger workflow, which does nothing, and I’m going to use it for a specific purpose now. So let’s go to the global error handler. We go to the workflow settings, and at the bottom, we select that specific alerting workflow as the alerting workflow of the global error handler. We do the same for the notification workflow.
Testing the Fix
Let’s see if that makes a difference. We trigger the error again. We look at the global error handler. Let’s look at the notification workflow. It executed once, and there is an error. And let’s look at the ignore error handler, which executed once as well. So, as you can see, that stopped the infinite loop.
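The same behaviour can be reproduced in a small Python simulation. It is a toy model, not Tray's engine: the workflow names, the `overrides` mapping (standing in for the workflow-level alerting setting), and the assumption that the notification workflow always fails are all made up for illustration.

```python
from collections import deque
from typing import Dict, List

GLOBAL_HANDLER = "global_error_handler"


def simulate(overrides: Dict[str, str], max_runs: int = 20) -> List[str]:
    """Return the executions triggered by one failure in a main workflow.

    `overrides` maps a workflow to its workflow-level alerting workflow;
    anything not listed falls back to the account-level global handler.
    """
    calls = {GLOBAL_HANDLER: ["notification"]}  # handler calls its child
    failing = {"notification"}                  # the child always fails

    def alerting_for(workflow: str) -> str:
        return overrides.get(workflow, GLOBAL_HANDLER)

    runs: List[str] = []
    queue = deque([alerting_for("main_workflow")])
    while queue and len(runs) < max_runs:  # cap only guards the sketch
        workflow = queue.popleft()
        runs.append(workflow)
        queue.extend(calls.get(workflow, []))
        if workflow in failing:
            queue.append(alerting_for(workflow))
    return runs


# The fix: both the handler and its child alert a do-nothing workflow.
fix = {GLOBAL_HANDLER: "ignore_errors", "notification": "ignore_errors"}
fixed_runs = simulate(fix)
looping_runs = simulate({})
```

With the overrides in place, `fixed_runs` contains one execution each of the global handler, the notification workflow, and the ignore workflow. Without them, `looping_runs` fills up to the cap.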
Conclusion
So basically, to overcome the problem, I use a global error handler workflow and a secondary alerting trigger workflow that ignores errors. The secondary workflow is set up as the alerting workflow for both the global error handler and any other child or callable workflow.
In our case, we only have one: the notification workflow. And this is how you can implement a safer error handling strategy on Tray.ai.