Maintaining large software systems is hard. Such systems are usually complex by nature. On diagrams they look nice: a couple of figures exchanging messages with each other. The devil is in the details, as always. Microservices add extra complexity. We moved from monoliths to farms of services. Now we can deploy changes separately, because we decoupled the responsibilities (SOLID rocks!). But now we also have more nodes that our messages flow to and from.
That said, the overall picture does not differ from how it looked before the microservices era. It is still hard to maintain and extend large systems.
I want to touch on just one problem here.
One of our systems generated wrong message timestamps after an update.
So we have crap in our database :) It is way easier to notice that a system or subsystem is down. It's much harder to notice that it is malfunctioning or unstable. And of course this only happens on PRODUCTION!
So the dev team is somewhat protected by the excuse that it was simply not possible to foresee such an issue.
How about introducing test actors into the production system?
If we have an entity that generates test messages, for example a test-company used by automated tests, then we could have continuous testing on production. Such messages would (usually) have to be filtered out before being sent to 3rd party systems, but overall we could have handlers or other monitors hooked in and expecting certain test messages to arrive at certain moments in time.
The concept is a bit similar to Chaos Monkey in terms of doing extra work on production environments.
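To make the test-actor idea a bit more concrete, here is a rough Python sketch. Everything in it is an assumption made up for illustration: the `Message` shape, the `TEST_COMPANY_ID` value, the `outbound_filter` function and the `SyntheticTrafficMonitor` class are hypothetical names, not something taken from our actual system. It shows the two pieces mentioned above: a filter that keeps test traffic away from 3rd party systems, and a monitor that expects test messages to keep arriving and checks that their timestamps look sane.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical identifier of the test actor; a real system would pick its own.
TEST_COMPANY_ID = "test-company"


@dataclass(frozen=True)
class Message:
    company_id: str
    payload: str
    timestamp: float  # seconds since epoch, set by the producing system


def is_test_message(msg: Message) -> bool:
    """A message counts as test traffic if it comes from the test actor."""
    return msg.company_id == TEST_COMPANY_ID


def outbound_filter(messages: List[Message]) -> List[Message]:
    """Drop test traffic before messages are forwarded to 3rd party systems."""
    return [m for m in messages if not is_test_message(m)]


class SyntheticTrafficMonitor:
    """Expects a test message at least every `max_gap` seconds and
    flags test messages whose timestamp is off by more than `max_skew`."""

    def __init__(self, max_gap: float, max_skew: float) -> None:
        self.max_gap = max_gap
        self.max_skew = max_skew
        self.last_seen: Optional[float] = None
        self.alerts: List[str] = []

    def on_message(self, msg: Message, now: float) -> None:
        """Hook this into the message handler; real traffic is ignored."""
        if not is_test_message(msg):
            return
        self.last_seen = now
        if abs(now - msg.timestamp) > self.max_skew:
            self.alerts.append(f"bad timestamp: {msg.timestamp} seen at {now}")

    def check(self, now: float) -> None:
        """Call periodically; alerts if test traffic stopped arriving."""
        if self.last_seen is None or now - self.last_seen > self.max_gap:
            self.alerts.append(f"no test message within {self.max_gap}s")
```

In this sketch the monitor would have caught the wrong-timestamp case from above: a test message arriving with a timestamp far from the current time lands in `alerts`, and so does a silent gap in test traffic. Passing `now` in explicitly instead of reading the clock inside the class is just a choice to keep the example easy to test.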
The idea is really tempting. Let me list all the pros and cons.