Organizational impact of micro-service architecture

Micro-services have been where the hype is at in engineering organizations for the past few years. As others like Martin Fowler have pointed out, adopting a micro-services architecture based on the hype is probably a bad idea: a true micro-services architecture comes with a whole set of new challenges that are probably not worth it for a good chunk of use cases. Since the technical comparison of monoliths and micro-services has been done several times (here and here), I will not go into whether or not you should pick a micro-services architecture. Instead, I want to focus on what happens to your engineering teams' communication, responsibilities and your engineering organization overall if you were to pick one.

Team driven vs. standardized architecture

One of the big decisions an engineering organization makes as it goes the micro-services route is whether all micro-services will look the same or whether teams will be given the freedom to use the tools, languages and architecture that make sense for them. I am all for experimenting and picking a good fit based on the makeup of the engineering team. As such, I have preferred giving teams the freedom to choose their own implementation strategy as long as the overall ecosystem adhered to a few key rules that impacted ease of communication between services, log collection/analysis and monitoring.

Decentralized decisioning

By giving teams the ability to pick and choose what they think is best for their use case, engineering organizations essentially start pushing technical decision making from a centralized model to a more distributed one where decisions are made at the edges. The first thing this does is spark more architectural discussions, debates and designs within your teams.
This creates an opportunity (and a good challenge) for engineers to start communicating their ideas to other engineers more effectively, and helps them develop their verbal and written communication skills.

Making experience years count

If you end up with teams that are autonomous in their design and architecture decisions, your engineers will probably end up making some unproven assumptions. But as long as these are incremental, small micro-services that are released to production in a controlled way as part of A/B tests, your engineers will quickly start getting feedback in the form of hard data points about their assumptions, architectural decisions and implementation choices. Compared to a monolithic architecture, where junior engineers are expected to incrementally introduce new functionality by replicating existing behaviors, micro-services give them the ability to learn quickly from real-world feedback. As a result, the experience your engineers gain actually means something, because they are constantly learning new things and forming their views on good engineering based on real-world evidence rather than executing a cookie-cutter approach over and over for years.

It's not all rainbows and unicorns

All of this sounds great, and at this point you may be wondering why any organization would not choose micro-services if they have all these positive impacts on engineers. The answer is in the numbers: as your organization scales from a handful of services to a few dozen, discovery, documentation, deprecation, tracing and overall coordination among teams require a well-oiled machine. Setting up the necessary systems and processes and encouraging the desired positive team behavior is no small feat. As such, a conscious effort needs to be made to understand the needs of teams, find solutions that work internally, and then learn from these experiments to improve.
I would even argue that if you cannot afford to dedicate the time and engineering-management capacity to restructure your organization, create efficient communication paths and introduce systematic improvements, you probably should not roll out a micro-services architecture. Simply because, if not tended to, expecting to reap the benefits I highlighted above would be a dice roll.
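As an illustration, the "few key rules" I mentioned earlier can be small and concrete. Here is a hedged sketch (the field names are my own, not a standard) of one such rule: every service, whatever its internal stack, emits log lines in a single JSON shape so a central collector can parse them all the same way:

```python
import json
import time

def log_event(service: str, level: str, message: str, **context) -> str:
    """Emit one log line in the ecosystem-wide JSON shape.

    Any service, in any language, producing these fields can be indexed
    by the same central log collection/analysis pipeline.
    """
    record = {
        "ts": time.time(),      # epoch seconds
        "service": service,     # stable service name used for filtering
        "level": level,         # e.g. "INFO", "WARN", "ERROR"
        "message": message,
    }
    record.update(context)      # free-form, service-specific fields
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line
```

A team remains free to build its service in whatever language and architecture it likes, as long as its logs land in this shape; that is the kind of ecosystem-level rule that keeps decentralized decisioning from turning into chaos.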
This is a pet peeve of mine: every time I see a manager trying to get something done by carrying messages between the folks they manage and upper management, it makes me cringe. If you are an engineering manager and find yourself in this situation, change the company or change your company. You need to ask for things you can manage. What do we mean by "manage"? A well-defined area in which you have accountability and decision-making ability. To make sure you are set up for success, you should look for a few indicators:

- What are the boundaries?
- Is there a clear objective I am managing towards?
- What is my accountability?
- What is my decision-making ability?

These four simple questions may reveal that although you hold a manager title, you are merely a communication channel that carries messages back and forth between the decision maker and the engineer. I call these folks messengers. This is not a role anyone signs up for; it usually shapes up this way after the fact, due to the existing processes and culture of the company. However, engineering managers have the opportunity to change this. Ask for more accountability and decision-making authority. If you find yourself in this situation, just ask, and take on one more area of true management every quarter. You will be amazed how things start changing for the better. If you are a Director or VP of Engineering and you find yourself making these decisions for your managers, it is time to change. The more responsibility you give away, the more you gain in terms of direction setting and higher-order thinking space.
By now all of us architects are very much used to the idea of spinning up a new server in the cloud and scaling our solution horizontally as needed. Virtually nonexistent setup times and good APIs from cloud vendors make this possible. Infrastructure has come a long way in the form of IaaS and PaaS, and open source software has been a great enabler of this movement, namely deployment automation technologies and Linux distributions customized per use case.

The situation in most enterprise systems, however, is not that great. It is not rare for me to run into a horizontally scaled solution that is not efficiently utilizing its computing resources. Take the following example: a software solution that serves an end-to-end business process is created as a single service. When client volume increases, the service is scaled by creating an exact replica and using a load balancer to distribute the load.

To elaborate on the issue at hand, I will use a hypothetical car insurance company. As you may already know, there are some basic concepts in the car insurance business, such as finding a quote that fits your needs, comparing it with competitors', signing up for a policy and finally paying for it. If everything goes well and you don't get into an accident, your interaction with the insurance company may just end there. Their "service" oriented software may be running on a server that looks like the following:

When this company starts becoming successful and sees permanent growth, the current computing capacity may no longer serve its needs. By permanent growth I mean a net gain in the number of users, say from X to 2X. This is a happy and desirable scenario, which can be the result of geographic expansion, an acquisition or a marketing campaign. Any sane architect would just double the capacity, distribute the traffic with a load balancer, and go on with life.
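That replicate-and-balance step is trivial to sketch. Assuming two identical replicas (the server names here are hypothetical), a round-robin balancer is all it takes:

```python
import itertools

class RoundRobinBalancer:
    """Spread requests evenly across identical replicas of one service."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)  # endless rotation over replicas

    def route(self, request):
        """Return (replica, request): the next replica in turn handles it."""
        return next(self._cycle), request

# Doubling capacity = adding one exact replica behind the balancer.
lb = RoundRobinBalancer(["app-server-1", "app-server-2"])
```

Note that every replica runs the entire end-to-end process (quoting, comparison, sign-up, payment), which is exactly the inefficiency the rest of this post picks apart.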
The server stacks may look something like this after the capacity upgrade:

While this is going on, an X-ray of the business processes may show us a service resource usage distribution that looks like this:

Now let's think about a slightly more interesting scenario. Hurricane Sandy happens. A lot of cars are damaged, and as a result the organization experiences a spike in the number of claim requests it gets. Since the servers are tuned to handle the current capacity (with a foreseen +/- 5% elasticity), things start slowing down. Customers on the phone with customer service reps start experiencing longer wait times. Mobile claim submission requests start timing out. Overall, customer satisfaction goes down, and there is very little this organization can do about it, because it cannot procure and deploy new servers overnight and configure its client software to handle the spike. If it had access to cloud resources, it could certainly grow the number of servers temporarily in the cloud and spin them down later. In the absence of that, however, there is not much this organization can do.

This is exactly why a DRA-like framework is needed in the enterprise. Imagine if this organization could temporarily limit (or even turn off) its capacity to sell more policies and re-allocate those computing resources to the business function it needs at the moment. The service allocation could look something like this:

This re-configuration could save the company thousands of unhappy customers and, more importantly, ensure that business resources were utilized to the maximum when they were needed. This concept is not entirely new; there are similarities to Cory Isacson's Software Pipelines and SOA: Releasing the Power of Multi-Core Processing, and to cloud computing in general.
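To make the re-allocation idea concrete, here is a minimal sketch; the business function names and server counts are hypothetical, not taken from the figures above:

```python
def reallocate(capacity, spike_fn, donors, needed):
    """Shift server slots from donor business functions to the one
    experiencing a spike, keeping total capacity constant and always
    leaving each donor at least one server."""
    plan = dict(capacity)
    for donor in donors:
        if needed <= 0:
            break
        give = max(0, min(plan[donor] - 1, needed))  # never drain a donor fully
        plan[donor] -= give
        plan[spike_fn] += give
        needed -= give
    return plan

# Hurricane hits: pull capacity from quoting and sign-up toward claims.
before = {"quote": 4, "signup": 3, "claims": 2}
after = reallocate(before, "claims", ["quote", "signup"], needed=4)
```

The point is that the total number of servers never changes; the fixed hardware is simply re-pointed at the business function that needs it right now.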
Some advanced organizations with very good engineering teams are able to achieve this scenario by leveraging technologies like Chef/Puppet in the cloud. However, it is not easily accessible to the common enterprise. What if there was an application/services framework that facilitated this? I believe a solution like this would truly align the business's needs with the IT capabilities of the enterprise.
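The core of such a framework could be quite small: a control loop that periodically compares utilization across business functions and moves one server slot per pass. A hedged sketch, with entirely hypothetical names and thresholds:

```python
def rebalance_step(load, capacity, threshold=0.8):
    """One pass of a DRA-style control loop: if the hottest business
    function's utilization exceeds the threshold, move one server slot
    to it from the coldest function (never draining a function fully)."""
    util = {fn: load[fn] / capacity[fn] for fn in capacity}
    hot = max(util, key=util.get)    # most overloaded function
    cold = min(util, key=util.get)   # most idle function
    plan = dict(capacity)
    if hot != cold and util[hot] > threshold and plan[cold] > 1:
        plan[cold] -= 1
        plan[hot] += 1
    return plan
```

Run on a schedule, a loop like this is what would let the insurer's claims function borrow servers from quoting during the spike, and give them back once utilization evens out again.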