Redundancy and Replication
Redundancy means duplicating critical data or services to increase the system's reliability.
For example, if only one copy of a file is stored on a single server, then losing that server means losing the file. Since losing data is seldom a good thing, we can store duplicate, or redundant, copies of the file on other servers to solve this problem.
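As a rough illustration of the idea, the sketch below copies a file to several locations and reads it back from whichever copy survives. It is a minimal sketch, not a real replication system: the temporary directories stand in for separate servers, and the names `replicate`, `read_any`, `demo`, `server_a`, and `server_b` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def replicate(source: Path, replica_dirs: list[Path]) -> list[Path]:
    """Copy `source` into every replica directory and return the copy paths."""
    copies = []
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copies.append(Path(shutil.copy2(source, directory / source.name)))
    return copies


def read_any(copies: list[Path]) -> bytes:
    """Return the contents of the first replica that can still be read."""
    for copy in copies:
        try:
            return copy.read_bytes()
        except OSError:
            continue  # this replica is gone; try the next one
    raise FileNotFoundError("all replicas are unavailable")


def demo() -> bytes:
    # Assumption: each temporary directory plays the role of one server.
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        original = base / "report.txt"
        original.write_bytes(b"quarterly numbers")
        copies = replicate(original, [base / "server_a", base / "server_b"])
        copies[0].unlink()  # simulate losing the first server's copy
        return read_any(copies)
```

Calling `demo()` still returns the file contents even though one copy was deleted, because the second copy remains readable. With only one copy, the same deletion would have lost the data.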
The same principle applies to services. If a service is critical to our system, running multiple copies or versions of it simultaneously protects against the failure of a single node.
Creating redundancy in a system can remove single points of failure and provide backups when they are needed in a crisis. For example, if two instances of a service are running in production and one fails or degrades, the system can fail over to the other. These failovers can happen automatically or be performed manually.
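An automatic failover between two instances might look roughly like the sketch below. It assumes each instance is a plain callable that raises an error when it cannot serve a request. `ServiceUnavailable`, `call_with_failover`, `failing_primary`, and `healthy_secondary` are illustrative names, not part of any real library.

```python
from typing import Callable, Sequence


class ServiceUnavailable(Exception):
    """Raised by an instance that cannot serve the request."""


def call_with_failover(
    instances: Sequence[Callable[[str], str]], request: str
) -> str:
    """Try each instance in order and return the first successful response."""
    last_error = None
    for instance in instances:
        try:
            return instance(request)
        except ServiceUnavailable as error:
            last_error = error  # remember the failure, then fail over
    raise ServiceUnavailable("no healthy instance left") from last_error


def failing_primary(request: str) -> str:
    # Hypothetical instance that is currently down.
    raise ServiceUnavailable("primary is down")


def healthy_secondary(request: str) -> str:
    # Hypothetical redundant instance that is still healthy.
    return f"secondary handled {request}"
```

For example, `call_with_failover([failing_primary, healthy_secondary], "GET /status")` returns the secondary's response. A production system would typically add health checks and timeouts rather than relying only on errors raised at call time.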
Another essential part of service redundancy is a shared-nothing architecture, in which each node can operate independently of the others. There should be no central service managing state or orchestrating activities for the other nodes. This greatly helps scalability, since new servers can be added without special conditions or knowledge. Most importantly, such systems are more resilient to failure because there is no single point of failure.
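One way to picture a shared-nothing setup is the sketch below. It assumes each request carries everything a node needs to answer it, so no node consults its peers or a central store. The `Node` class and the `dispatch` helper are hypothetical. `dispatch` stands in for whatever picks a node (a client or a load balancer) and holds no state on behalf of the nodes.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Node:
    """A node whose only state is local and never shared with its peers."""
    name: str
    handled: int = 0  # local counter, private to this node

    def handle(self, request: dict) -> str:
        # Everything needed to answer is in the request itself.
        self.handled += 1
        return f"{self.name}: hello {request['user']}"


def dispatch(nodes: list[Node], requests: list[dict]) -> list[str]:
    """Spread requests across the nodes in round-robin order."""
    return [
        node.handle(request)
        for node, request in zip(itertools.cycle(nodes), requests)
    ]
```

Because nodes know nothing about each other, scaling out in this sketch is just `nodes.append(Node("c"))`. The new node starts serving requests without any coordination or state transfer.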