MQ Summit 2025 - Gateway Architecture 3.0 with Matt Pavlovich

Video Summary

Matt Pavlovich discussed the benefits of messaging gateways, emphasizing their role in reducing human touchpoints and simplifying application development. He highlighted the importance of centralizing applications across physical locations to handle network performance and availability issues. Pavlovich shared his experience with ActiveMQ, mentioning the upcoming 6.0 release with virtual thread support. He also discussed the shift from orchestration to choreography, the introduction of batch queues, and the need for stricter platform policies to improve developer experience. The discussion also covered the importance of security, cost-effective monitoring, and the potential for industry standardization to streamline messaging adoption.

The video can be seen on YouTube, hosted by Code Sync’s channel. MQ Summit can be found online or on LinkedIn.


Video Contents (28:00)

Abstract

Messaging gateways need to adapt to modern organization challenges. Organizations that require multi-location connectivity, hybrid-cloud or multi-cloud deployments need to deploy and manage messaging gateways and application connectivity to the gateway in a way that reduces human touch points, provides self-service, and limits the need to change and redeploy code.

This session will quickly introduce messaging gateway architecture and provide an overview of how it has changed and what capabilities are needed for modern platform teams to deliver on excellence.

Talk Objectives

Attendees will learn what a messaging gateway architecture is, what problems it solves, and what use cases it is best suited for.

A short trip through how early messaging gateways worked historically, and how modern challenges require new capabilities in order for organizations to be successful long term. This goes beyond what messaging systems can technically do: gateway solutions must address scaling of message throughput, but also scaling the platform ops team and keeping a handle on costs.

Target Audience

Directors, Architects, Platform Ops, DevOps, and Senior Software Engineers.


Introduction

(0:18) Yeah, and thank you to the conference organizers. I think it's been a great conference. We were talking to folks at lunch, and I don't remember the last time I saw this kind of cross-connect between industries. I think it's really great. So really appreciate y'all opening it up industry-wide. It's already been worth it, even before the chance to speak. So thank you.

Overview

(0:38) And as Richard mentioned, I'm going to talk about messaging gateways. We're messaging people here, and we've already covered a lot of the key ideas of what a gateway is, so we'll blow through that pretty quickly. But I want to talk about our experience implementing some pretty large gateways, and then cross-connect that with the end user base: what the enterprise developer and operations teams look like, what the makeup of that whole ecosystem is, and how we can build upon what has been done in the past. The idea here isn't to blow everything up and make something new, but to take another iterative step, really help enterprises and organizations move their platforms forward, and share some of the things that we've done in the platforms that we've implemented. Again, a lot of this is going to be about reducing human touch points. Self-service is really everything these days, and things of that nature. So let's go quickly.

What is a Messaging Gateway?

(1:44) You guys know what a messaging gateway is. In short, it's really just the ability to move messages between physical locations. That's the easiest way to think about it. We just had a great presentation from Matteo about connecting cars. The use case I'm going to speak on is more the retail endpoint, or something like vending machines: things with fixed physical locations.

Why?

(2:07) The main benefit is that applications using a gateway don't need to handle all the problems that come with inconsistent network performance and availability, or maintain the logic of how to connect to servers. Sam talked about what early developers have to do, and kind of breezed over the error handling and the retry, but that's really everything anymore when you're dealing with distributed systems. When you centralize your applications across physical locations using a gateway, you're reducing the amount of code that developers are responsible for and the errors they've got to handle. You're taking a ton off their plate, and you're going to provide a lot more consistency, because things will be handled in the same way. In the hello-world sales demo of sending a message down to a store, it's maybe half a dozen lines; when you start adding retry configuration and error handling, all of a sudden that balloons out to something much larger. And the savings is really massive, and I see it rarely quantified. It's really this idea of: hey, let's create some data highways and give application teams a nice code repository of, this is how you send, this is how you get, this is how you handle the failover and the retry, and then let everybody multiply and get that organizational velocity, which is one layer up above the team sprint layer.
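The ballooning described above is easy to see in code. Below is a minimal sketch of the retry-with-backoff boilerplate every producer ends up re-implementing when there is no gateway library handling it; `send_fn` is a stand-in for whatever transport call an app would make, not a real client API:

```python
import time

def send_with_retry(send_fn, message, retries=5, backoff_s=0.5):
    """Retry a send with exponential backoff -- the kind of boilerplate
    a gateway's common client library takes off developers' plates."""
    delay = backoff_s
    for attempt in range(1, retries + 1):
        try:
            return send_fn(message)
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: surface the failure
            time.sleep(delay)
            delay *= 2  # exponential backoff between attempts
```

And this still omits failover server lists, poison-message handling, logging, and metrics, which is how half a dozen demo lines turn into hundreds in production.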

About

(3:33) This is me: Apache ActiveMQ member. JB did a good job talking about what we're doing in the projects. A lot of what we work on is ActiveMQ; I'm leading a lot of the observability migration and modernization. I think JB is kicking the release out now; the next 6.0 release includes virtual thread support, which is really exciting. Historically, I've done a lot of work in hybrid messaging: I've migrated well over 10,000 IBM MQ queue managers, and I've also done a lot of work with permanent hybrid IBM MQ, ActiveMQ, and several other vendors. I got my start in the early SOA days. I have, I think, an interesting background in that I started open source first, through high school and college, and then got out into the enterprise. I was working for a mid-size telecommunications company, and we knew we needed to do an API platform, which back then was called SOA with messaging, and we had to do the bake-off between the top three products. Of course, we went through and ranked them and gave that to management: hey, we feel really good about number one, price, capability, it's going to do everything we need. Number two, kind of expensive, a couple whistles and bells, but we could do without it. Number three, we're like, don't touch it, please God, no, we can't do option three. So of course, what did we go with? Option three... There was supposed to be some sort of dealing between the companies, you know, they needed some internet access. Of course, that didn't happen. So me and my team were stuck with option three, which literally didn't work. And that's really how I got into adopting some of these open source technologies. We were out there digging around in the early days of Mule and ActiveMQ and ServiceMix to find, please, something that we can deal with.
And that's how I got into it. So I think it brings kind of an interesting perspective, being on the enterprise side first. We've always carried that with us.

Shift

(5:37) How we attack things is thinking about how this impacts the people developing and managing it. It's great to be in a room with really bright people, but we're seeing trends where a developer in an enterprise might be in the seat for only two years, and an ops team member maybe a year, a year and a half. So if we're thinking about having these really complex systems do really cool and awesome things, it's: how do we make it easier? Easier to comprehend, easier to manage, easier to run, and how do we gain organizational confidence? We might solve all the technical problems, but if we don't solve the people problems, the organization isn't going to be able to realize that. So a lot of this shift, the idea of shifting concepts from what we've done in the past, is that gateways now really must provide more of everything, all the technology, security, and performance capabilities, but require less from DevOps and messaging platform operations teams. That's kind of a contradictory goal, but I feel like it's the reality of what we've seen in organizations.

Gateway Architecture v3.0

(6:43) Some of the concepts in what we call our gateway 3.0 architecture: we've really noticed a move toward choreography over orchestration. So instead of having big clusters that know about everything that's going on, we want to build individual operating units with as few runtime dependencies as possible. We're still retaining a lot of key elements of store and forward; there are a lot of really great things in 30-year-old architectures. Every time I think we've kind of seen the end, I go back and look at some old architectures from 25, 30 years ago, and it's like: aha, that's why they did that. But we want to do things to really help developers. I think our policies are a little too permissive, so we want to layer on stricter policies about what can get sent into the platform. There are conversations about messaging schema; I think we could take that a step further. Obviously, self-service must be there, and more observability. This takes load off the operations teams and puts it in the hands of the developers, which enables them. We're not trying to throw it over the fence; we're trying to shorten that support latency, and if we can give them a common set of tools, documentation, and example code, then the developer teams can be more self-sustaining. What Matteo just talked about, what they did with virtual topics and MQTT and Pulsar, is very similar to the transmission queue concept in IBM MQ, and we apply that same concept in ActiveMQ. Not every queue needs to have the same characteristics, and queues can be used differently, as he showed. We also introduced the concept of a batch queue and routing queues, for which other products have similar things. And we want to focus on automation and the developer experience.
And then a key thing here: the solution for handling back pressure for multi-threaded producers starts looking like a whole broker really quickly. If you've got multiple producing threads and you want to make that highly reliable and handle the back pressure, pretty soon you're going to start taking compensating actions, like paging to disk, or throwing an error and backing off. All of a sudden, everything on that producer side starts looking like a broker. And really, this gateway architecture becomes a core building block of the broker mesh. As JB mentioned in his ActiveMQ talk, one of the key things ActiveMQ has is the VM transport, and a second capability of that is you can code an app, say a REST endpoint receiving data from a third party that you want to put on a queue, with a URI in your app. You start off deploying that endpoint talking to a broker over the network; then, just by flipping the URI, I can make an embedded broker run inside that service and forward the data to another cluster of brokers. So we can do some really cool things, which is really more of a broker mesh. The concept is that what's out on the edge, the leaf nodes, doesn't always need to look like a server. They don't always need to look like a whole broker. We kind of have these micro or mini brokers running inside existing applications that may have originally just been written to be producers. We'll be sharing more information and examples on that later, but I just wanted to talk through the building block and where this leads.
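The URI flip described above can be sketched with ActiveMQ connection URIs. Host names, the broker name, and options here are illustrative examples, not a drop-in config:

```
# Day one: the app's connection URI points at a remote broker over TCP,
# with failover handling reconnects.
failover:(tcp://broker-a.example:61616,tcp://broker-b.example:61616)

# Later: flip the URI to the VM transport. An embedded broker spins up
# in-process (create=true) and the app talks to it with no network hop.
vm://edge-broker?create=true
```

The store-and-forward link from the embedded broker out to the cluster would be configured in that broker's own configuration (e.g. a network connector), which is omitted here.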

Gateway Architecture Diagram 1

(10:08) Quick gateway architecture, yay. We've got a cloud and one or two apps. This could get into the hundreds or thousands, but we need to handle the routing and network mediation between physical locations.

Gateway Architecture Diagram 2

(10:21) This starts getting wider. We really like doing partitions over big, wide clusters that know everything. This allows for gradual upgrades and rollbacks. Some things that can happen when you're deploying to physical environments: you can have large natural disasters, fiber cuts, hurricanes, and you can lose whole swaths of nodes, not just a single node failure. So when you have broker architectures like this, you can shuffle that traffic around and handle big events. There's a lot of focus on single node and single element failure, but there are times you can get the cascading failures, as I mentioned.

Gateway Architecture Diagram 3

(11:00) And when you get to gateway 3.0 architecture, we talked about having a routing layer on the top. This is about managing and mediating. When you're sending to five or ten thousand physical endpoints and you've got a dozen applications, the number of destinations really explodes. So this is really about making steps to move from a large number of inbound connections to a small number of queues, then from a small number of queues to a large number of connections, by leveraging different queue types. At the top layer, we'd be using what we call a routing queue; in the middle layer, we'd be using a transmission queue. By changing the architecture that way and making hops, we enable the ability to manage the things that message brokers spend tons of time on: numbers of connections, numbers of destinations, numbers of threads, things like that.
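The fan-in/fan-out step described above can be modeled in a few lines. This is an illustrative sketch of the idea, not the product's routing layer; the `store_id` property name is an assumed example:

```python
from collections import defaultdict

def fan_out(messages, routing_key="store_id"):
    """Collapse many inbound sends into per-endpoint transmission queues.
    Producers never address endpoints directly; the routing layer reads a
    message property and decides which transmission queue each message
    lands on, so connections and destinations stay manageable."""
    transmission_queues = defaultdict(list)
    for msg in messages:
        transmission_queues[msg[routing_key]].append(msg)
    return transmission_queues
```

Each transmission queue then drains to one physical endpoint, turning N-apps-times-M-stores worth of destinations into N inbound queues plus M outbound links.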

Batch Queues

(11:55) So, batch queues. This is pretty cool. We needed to solve a problem, and ActiveMQ has got a couple of features that allow us to do this. One of the challenges you have is if you want to load a large number of related messages, like a pricing data set, and you don't want it released until it's all loaded. That's what a batch queue does. It's a one-time-use queue: the messages get loaded up, and once the message group is closed, the messages flow down. The messages are never forwarded to the first hop if the batch isn't completely loaded, and the queue is garbage collected when done or expired. The mindset here is: think cattle versus pets for your queues. We don't need these queues to always be long-lived, right? If we're doing a pricing data set, one of the cool things this allows us to do is still get the effectiveness of a large transaction while using asynchronous sends, which is kind of counterintuitive. But since no messages are delivered to the next hop until the batch is closed, we fast-load everything, and once it's closed we know we received everything, and then messages start moving to the next layer. So it looks like a large transaction, but you can overcome max commit counts, you don't have the latency of transactions, you get async send, and with ActiveMQ, when you turn all the async flags on, you start getting stream-like performance, which is really useful. There are other key use cases for one-time-use queues, specifically large, long-running scheduled tasks. If you have a CI job, you want to take a single message off a queue, but you're building a piece of software that takes 30 or 45 minutes. You want to keep that message in a transaction, but you can't have that consumer open for that long; it starts looking like a slow consumer.
By using these one-time-use patterns, we can still get the reliable delivery, the transacted semantics, and the individual processing, but overcome some of these things that become contradictory to each other, that we run into in messaging all the time. So batch queues have been a key, fun part of this.
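The batch queue lifecycle described above can be sketched as a tiny state machine. This is an illustrative model of the idea (load, close, then deliver), not the ActiveMQ implementation:

```python
class BatchQueue:
    """One-time-use queue: messages are accepted while the batch is open,
    but nothing flows to the next hop until the batch is closed. After
    draining, the queue can be thrown away (cattle, not pets)."""

    def __init__(self):
        self._messages = []
        self._closed = False

    def load(self, message):
        # Fast, async-style load: no per-message transaction latency.
        if self._closed:
            raise RuntimeError("batch already closed")
        self._messages.append(message)

    def close(self):
        # Closing the message group is what releases the batch.
        self._closed = True

    def drain(self):
        # Nothing is forwarded downstream until the whole batch is loaded.
        if not self._closed:
            return []
        msgs, self._messages = self._messages, []
        return msgs
```

The effect is transaction-like all-or-nothing delivery without holding a long-lived transaction open on the producer side.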

Message Routing

(14:17) With message routing, in previous iterations of gateways there were a lot of static addresses and maps, or, in what I call gen two, a lot of Camel or Mule integration. A problem we found with that is organizations struggle with code upkeep. If you're an operations team doing DevOps, your coding time is typically spent on automation around deployments, monitoring, alerting, and scaling, and getting into languages and frameworks becomes a challenge, as does finding resources for that. So we want more things built in: platform teams need fewer parts, fewer things to know about, so it just works and they can scale and increase velocity. We like to look at metrics like how many ops team people you have versus the number of brokers, and how many services each developer can support. We think those are good metrics to help guide organizations. So the shift would be: in routing, this isn't new, but we want to do routing without additional disk writes. We want to abstract client-coded addresses from the actual queues, broker names, and topology, support per-tenant routing maps, and provide self-service over those routing maps. If you've got an application moving things to stores, and we're pushing things through different routing and transmission queues, day one they might have one queue for their app, and then they want to fan out using different message properties. We'd want them to be able to update their maps. Once they've tied into the platform, they should be able, within reason, to add the ability for their messages to chain, or to add a new flow, that kind of thing.
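The per-tenant routing map idea above can be sketched as a lookup that apps never bypass. Tenant names, logical addresses, and queue names here are invented examples; in a real platform the map would live in self-service configuration, not in code:

```python
# Apps send to a logical address; the platform resolves the physical
# queue. A tenant team can edit its own map without code changes or
# redeploys, which is the self-service piece.
ROUTING_MAPS = {
    "pricing-app": {
        "price.update": "TX.STORES.PRICING",
    },
}

def resolve(tenant, logical_address):
    """Resolve a tenant's logical address to a physical queue name.
    Unknown routes fail loudly instead of silently dropping messages."""
    try:
        return ROUTING_MAPS[tenant][logical_address]
    except KeyError:
        raise LookupError(f"no route for {tenant}/{logical_address}")
```

Because client code only ever knows the logical address, the topology behind it (broker names, hops, transmission queues) can change without touching the app.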

Platform Policies

(15:59) Platform policies. I think this has really been a big challenge. The dead letter queue problem becomes a bigger conversation when we get into application development, where we typically talk about errors in a way of: we'll just retry. But there are a lot of errors that we already know up front are never going to get better. If you send a JPEG of the funny cat meme and a consumer is expecting JSON, that's never going to work. We don't need to do redeliveries, we don't need to do retries, and it doesn't need to be looked at in a dead letter queue. So really think about dead letter queues. I think the mindset was: hey, we want to be cool admins, we're going to let you send whatever you want, and you've got this place to go manage it. As platform managers, we're being very accommodating, but it's a little too accommodating, because it actually puts more work on the developers, who then have to have jobs to do all these things. It requires constant monitoring, rework, and all this stuff. You guys get it. So we want the concept of a dead letter queue and a retry letter queue. We already know there's a whole set of things that can get fixed with a retry, and a whole bunch of things that can't. So if you're dead, just be dead: we don't want to redeliver, we don't want to retry. We might put you in a dead letter queue because it may be unexpected and you can get alerted about it, or it could be in a dead letter queue that quickly expires and just goes away.
On the policies: in the past, we were very permissive in receiving messages. We want to really tighten this up. If there are header requirements, if there's a message type, if we don't allow you to use priority, if we require these other capabilities, or you're not connected with failover, really reject early, at the per-tenant level. Before a developer even sends anything, the platform can say: nope, you're not using the failover protocol, you can't talk to us; or, you connected a producer to the consumer-only transport connector, you're not able to do that. There's a lot we can do to really increase that immediate feedback to developers about things that would cause their application problems. On the dead letter queue, as I mentioned, we want to only DLQ things that are really dead, and then we've got this concept of a retry letter queue. A retry letter queue is cool because if you move everything you know is retryable there, it can be automatically replayed. Some things will go through and some may not, but that's fine; it makes an automatable way to handle that error fallout. And again, the whole goal here is to provide that immediate feedback to developers once DevOps have established their processes. This enforces it at a technical level. In the past, it's been things like: there's a wiki page, or there's an example config somewhere, and everyone's supposed to just know to put this in your URI to make it compliant with our platform. We want to really do more enforcement.
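The dead-vs-retryable split above amounts to a classification policy. A minimal sketch, with invented error-code names standing in for whatever failure taxonomy a platform actually defines:

```python
# Failures that a retry can plausibly fix vs. ones that never get better.
# These code names are illustrative, not a standard taxonomy.
RETRYABLE = {"timeout", "connection_reset", "broker_unavailable"}
DEAD = {"malformed_payload", "schema_mismatch", "unauthorized"}

def classify(error_code):
    """Route a failed message: retryable errors go to a retry letter
    queue for automatic replay; permanently-dead ones go straight to
    the DLQ with no redelivery attempts."""
    if error_code in RETRYABLE:
        return "retry-letter-queue"
    if error_code in DEAD:
        return "dead-letter-queue"
    # Unknown failure: fail safe to the DLQ so a human gets alerted.
    return "dead-letter-queue"
```

The payoff is that the retry letter queue can be replayed by automation on a schedule, while the DLQ only ever holds things worth a human's attention (or a fast expiry).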

Enterprises should demand more

(18:57) As I mentioned, I started out open source, then I went enterprise, then services, and now I'm doing more product. So I've always carried that enterprise hat with us, which is kind of a contradiction, but we try to be open and transparent and just share. I think it allows us to connect with our customers and say: hey, we've been in your seat. For customers, if they have to constantly replatform, it's a major disruption, and there are confidence gaps in the industry. So I really liked the things I was hearing from Richard in conversations about standardization, things we could do to make the experience better for enterprises as a whole. Ideally, we'd have more refresh on standardization of APIs and protocols. The real uplift in any enterprise isn't changing the deployed brokers, right? It's the app code. It's the cost of having to change the five to ten thousand apps that most modern enterprises have these days. The more we drive toward that, the better a deal it is for the industry. And coming from the enterprise seat, that'd be something I'd be wanting. So that's something we share with our enterprises: hey, you guys have a say in this, you're part of the ecosystem, you all should demand this. Control source, well, we'll skip over that for now. Then lastly, application code should not change when topology or product provider changes. That's really the expensive part. You can move providers, you can move cloud vendors, but if you've got to go change the code, that's really where the pain is, and really the cost and the impact to the business.
So we try to encourage customers to think about it this way: the company with the lowest cost to deliver change will win their segment. If their platforms are really expensive, or their cost of change is high, they're going to struggle, and their competitors will have an advantage.

Struggles with streaming

(20:55) We're seeing some struggles with streaming at the gateway layer. For example, a change in a message retention policy at the broker level is a change in the original contract the app was coded against. If an app was coded against a 14-day retention policy, and then costs are high and management comes down and finance says, hey, we're now doing three days, well, if your app was coded to something with a much higher retention, there's a conflict there. Also, the same message is rarely going to every endpoint anymore. If you're sending the price of toothpaste, that's changing by region, by store, so you're typically doing a lot of really targeted messaging. Batch queues are really valuable here because they don't deliver if there's an error in the batch, and this can be preferable to constantly redelivering the same data. So this batch queue concept we've seen as being very fruitful; it's just another way to think about queues. Queues are really great for moving data. We're talking a lot about API and app integration, but if we're really just moving data, queues are excellent for it. And then there's the idea of retaining tons of messages that don't ever get replayed: we see a lot of effort in replication and replaying all these things, and a lot of that replay never actually happens, so there's this big cost. I think we're starting to see conversations about which workloads work best on which kinds of platforms.

Quick wins

(22:22) I want to put together just a couple of quick wins. One of the questions we get is: hey, should we put a broker in the store or the physical location, like a distribution center? And the answer is yes, absolutely. Get good at it. The number of apps and devices in physical locations is only going to continue to grow with the demand for automation, integration, robotics, AI, all these things, so it's to your benefit to get good at putting things where you're doing critical business usage. The other thing: going from a physical location straight to the cloud, that's a big hop, right? Think about the number of layers in between; it's not one network link. If you're connecting up to a cloud provider in Chicago, and you're out in Missouri or North Carolina, that's a lot of network links. I'm in the telco world, and I'll tell you, there's a lot of Perl still there, a lot of duct tape. The network works, but there are just a lot of layers. When people ask us where to put the broker, the answer we always give is: put it as close to your app on all ends as you can. Same physical location, same Kubernetes cluster, sometimes the same pod, even the same process. The closer you put the broker to the producer, and the closer to the consumer, that's where you're going to get the benefit of doing messaging. As soon as those hops get longer, you start losing some of the win, which is the ability to manage all the network changes.

The mindset of partitioning is not one big cluster. With this term partition, we're really talking about segmenting traffic. I'd go back to that earlier word, choreography. We design a system where these nodes can talk to these endpoints by themselves. They don't need any other third-party dependencies, and they're not talking to a cluster controller. There is an operator, and configuration automation that's being pushed or pulled, but the brokers themselves can run with just that, a lot of times without a load balancer. Again, the idea is that when there are fewer things in the stack, there are fewer security patches, fewer updates, less downtime, and less interruption of the apps.

I definitely recommend doing security early; it's very hard to add later. In our experience, once dev teams get their arms around two-way SSL, they'll share it with others. The best way we've seen to communicate how to do hard things is to get one app dev team to figure it out, and they'll do the work for you: they'll go tell the other app teams once they have success with it. Two-way SSL solves all the problems. I think the jump to tokens creates another third-party dependency. We've seen some challenges trying to plug into OAuth systems; back-end data processing isn't really what OAuth was designed for. It can be done, and I think there's a halfway with it, but if you just go all the way to two-way SSL, you solve a lot of other problems as well.
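For concreteness, here is a small sketch of what two-way (mutual) TLS looks like on the client side using Python's standard `ssl` module; certificate file paths are deployment-specific placeholders:

```python
import ssl

def mutual_tls_context(certfile=None, keyfile=None, cafile=None):
    """Build a client-side context for two-way SSL: verify the broker's
    certificate AND present our own client certificate. Paths are
    examples; a real deployment supplies its own PKI files."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.verify_mode = ssl.CERT_REQUIRED  # verify the server (the default)
    if certfile:
        # Presenting a client cert is what makes the handshake *two-way*:
        # the broker authenticates us from the cert, no tokens needed.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

The same idea applies in any language: the client both validates the broker's certificate chain and offers its own, so identity rides on the connection itself rather than on a separate token service.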

Monitoring cost is a real concern. We're talking thousands of endpoints, each with dozens to thousands of queues, and keeping that time series data is really expensive. An alternative is using health checks: just monitor the health checks that come back for those nodes. At the end of the day, if your gateway is flowing and the data is being processed, if the health checks pass on the store endpoint and you monitor the flow at the brokers, you don't need time series data for every queue on every endpoint for all time. And also, if broker support cost is a factor, you might have the wrong vendor. There are definitely alternative ways to implement these things in a really cost-effective manner. Going back to number one, one of the things we hear is that costs, like licensing costs, can be expensive, so we really want to encourage: definitely get brokers close to where you're doing the work.
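The health-check alternative described above boils down to aggregating cheap boolean signals instead of storing per-queue time series. A minimal sketch; the inputs are whatever per-node checks and end-to-end flow probe a deployment already has:

```python
def gateway_healthy(endpoint_checks, flow_ok):
    """Cheap monitoring: one health check per endpoint plus one
    end-to-end flow check, instead of time-series data for every queue.

    endpoint_checks: {endpoint_name: bool} from per-node health probes
    flow_ok: bool from a single message-flow probe through the brokers
    Returns (overall_healthy, list_of_failed_endpoints)."""
    failed = [name for name, ok in endpoint_checks.items() if not ok]
    return (len(failed) == 0 and flow_ok, failed)
```

Time-series storage can then be reserved for the small set of aggregate broker metrics you actually alert on, rather than every queue at every store forever.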

Call for collaboration

We need a ‘curl’ for messaging

(26:14) Lastly, I added this slide after some of the conversations; we had a lot of really good conversations. In our experience, one of the things that's really been a challenge in messaging adoption is there's not a 'curl' for messaging. With REST and API integration, and the challenges Sam spoke on earlier today that app teams have, with all the threads trying to handle their own retry and connectivity and errors and security and logging, all this microservices stuff, at the end of the day, one of the things that makes it really attractive is you've got really simple tools that everybody knows how to use. You can hit a REST endpoint, you can post a payload; an admin or a Unix ops guy can do it, somebody in a Kubernetes pod can do it. curl is just installed everywhere, and we don't really have an equivalent of that in messaging. So I guess my challenge to us as an industry would be: hey, we can do this. OpenSSL has already provided a pattern, right? It supports multiple protocols, multiple algorithms. We could do something similar with a set of command-line tools. So, yeah, anyway, appreciate you guys having me. Any questions? All right, excellent, got it in time. Thank you very much. Oh, coffee's next, sorry. Oh, I didn't mean to steal your thunder!
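To make the 'curl for messaging' idea concrete, here is a sketch of what such a tool's command-line surface could look like. The tool (`mqcurl`), its subcommands, and its flags are all hypothetical, invented for illustration; no such tool is being proposed by name in the talk:

```python
import argparse

def build_cli():
    """Sketch of a hypothetical 'curl for messaging' CLI: one tool,
    multiple protocols behind it, simple enough for an ops person or a
    Kubernetes pod to use without writing client code."""
    p = argparse.ArgumentParser(prog="mqcurl",
                                description="send/receive one message")
    sub = p.add_subparsers(dest="command", required=True)

    send = sub.add_parser("send", help="send a single message")
    send.add_argument("--url", required=True)    # e.g. a broker URI
    send.add_argument("--queue", required=True)  # destination name
    send.add_argument("--body", required=True)   # message payload

    recv = sub.add_parser("receive", help="receive a single message")
    recv.add_argument("--url", required=True)
    recv.add_argument("--queue", required=True)
    return p
```

Like OpenSSL's single front-end over many algorithms, the value would be one familiar command shape over AMQP, MQTT, STOMP, and friends.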

Code Sync YouTube Video

Matt Pavlovich

Matt Pavlovich, the Chief Technology Officer and Technical Practice Lead at HYTE Technologies, directs the HYTE Product Development Team. With a wealth of experience in the Open Source Software community, Matt is also a Committer on the Apache ActiveMQ project. Known for his technical prowess and leadership skills, Matt has successfully led numerous large-scale ActiveMQ implementations worldwide. Under his guidance, HYTE's services and tools enable accelerated Enterprise application development and enhance the supportability of middleware solutions.

Next

Converting ActiveMQ to Jakarta - Part 3 - Final