discos · giuseppe-carboni · May 5, 2020 · Sep 14, 2020 · marco-buttu · May 5, 2020
diff --git a/doc/production.rst b/doc/production.rst
@@ -4,6 +4,15 @@
 Production
 **********
 
+Unlike the development environment, which uses pre-configured Vagrant
+virtual machines, production machines require some preliminary tasks
+before the provisioning procedure can complete successfully. You have to
+configure the network interfaces and the disk partitions of the machines
+to be provisioned, and install the required operating system on them:
+CentOS 6.8 for the machines running ACS and CentOS 7.2 for the storage
+machine. Without these preliminary tasks, the provisioning procedure will
+most likely fail.
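The operating-system requirement above can be verified before starting the provisioning. The following is a minimal sketch, not part of the DISCOS tooling, that parses the contents of ``/etc/redhat-release`` and compares the version with the one expected for the machine's role. The role names ``acs`` and ``storage`` and the functions ``parse_centos_release`` and ``release_matches`` are hypothetical.

```python
import re

# Hypothetical role names; the versions are the ones required above.
EXPECTED_RELEASE = {
    "acs": (6, 8),      # machines running ACS: CentOS 6.8
    "storage": (7, 2),  # storage machine: CentOS 7.2
}


def parse_centos_release(text):
    """Return (major, minor) from the contents of /etc/redhat-release."""
    match = re.search(r"CentOS.*?release (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"not a CentOS release string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def release_matches(role, text):
    """True if the release string matches the version expected for role."""
    return parse_centos_release(text) == EXPECTED_RELEASE[role]
```

On a target machine the check could be called as ``release_matches("acs", open("/etc/redhat-release").read())``. Only the major and minor numbers are compared, so a string such as ``CentOS Linux release 7.2.1511 (Core)`` counts as 7.2.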
+
 Machines deployment
 ===================
 To deploy the system in production, you have to specify a *cluster* of machines,
@@ -74,3 +83,51 @@ tag you want to install on the machines:
    argument from both the ``discos-deploy`` and ``discos-get`` scripts. If you
   pass the ``--station`` argument anyway and it does not match the correct
   station, you will receive an error and the procedure will stop.
+
+Replace the Manager in case of failure
+--------------------------------------
+If the Manager machine fails for any reason, it has to be replaced. To do
+so, first put the new Manager's IP address in the hosts file of the Ansible
+inventory, then perform the provisioning procedure on a newly installed
+machine. If the DISCOS control system runs in a distributed environment, as
+it does at the SRT and Medicina stations, some manual tweaks on the other
+DISCOS machines are also necessary for the whole system to behave correctly.
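Updating the inventory usually amounts to replacing the old Manager address with the new one. A plain string replacement can corrupt addresses that merely start with the old one (replacing ``192.168.1.1`` would also alter ``192.168.1.10``), so the hypothetical helpers below, a sketch rather than part of DISCOS, only replace standalone occurrences. They assume the inventory still lists the old address; the inventory layout used in any example is an assumption.

```python
import re


def replace_ip(text, old_ip, new_ip):
    """Replace standalone occurrences of old_ip; return (text, count).

    The lookarounds reject matches embedded in a longer address or number,
    so replacing 192.168.1.1 leaves 192.168.1.10 and 10.192.168.1.1 intact.
    """
    pattern = r"(?<!\d)(?<!\d\.)" + re.escape(old_ip) + r"(?!\.?\d)"
    # A function replacement avoids any backslash interpretation in new_ip.
    return re.subn(pattern, lambda _m: new_ip, text)


def update_file(path, old_ip, new_ip):
    """Rewrite a text file (e.g. the inventory hosts file) in place.

    Returns the number of replacements; 0 means old_ip was not found and
    the file was left untouched.
    """
    with open(path) as f:
        text = f.read()
    new_text, count = replace_ip(text, old_ip, new_ip)
    if count:
        with open(path, "w") as f:
            f.write(new_text)
    return count
```

Because ``update_file`` returns the number of replacements, a result of ``0`` is a quick signal that the given file did not contain the old address.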
+
+The following tweaks are required for the DISCOS control system to work as
+expected:
+
+- In the ``/discos-sw/config/misc/bash_profile`` file on the
+  ``discos-console`` machine, replace the old ACS Manager IP address with the
+  new one. The address is stored in the ``MNG_IP`` environment variable.
+- Replace the old Manager IP address with the new one in the DISCOS CDB.
+  Specifically, one file must be corrected so that the control system can
+  communicate with the ``TotalPower`` backend. In the repository of the
+  currently deployed DISCOS release, the file is
+  ``SRT/Configuration/CDB/alma/BACKENDS/TotalPower/TotalPower.xml``, and the
+  variable to correct is ``DataIPAddress``. This change has to be made on the
+  new Manager machine itself, before launching the control system.
+- Make sure that all the station systems and machines accept incoming
+  connections from the new Manager's IP address. In particular, the
+  ``TotalPower`` backend and the ``CalMux`` machines have to be adjusted so
+  that the new Manager can control them.
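The first two edits can be scripted. This is a minimal sketch with hypothetical function names, assuming that ``bash_profile`` sets the variable with plain shell syntax (``export MNG_IP=...``, optionally quoted) and that ``DataIPAddress`` appears in ``TotalPower.xml`` as an XML attribute (``DataIPAddress="..."``). Both assumptions should be checked against the actual files before use.

```python
import re

# Assumed form in bash_profile: [export ]MNG_IP=<address>, optionally quoted.
_MNG_IP = re.compile(r"^([ \t]*(?:export[ \t]+)?MNG_IP=)([\"']?)[^\s\"']*\2",
                     re.MULTILINE)
# Assumed form in TotalPower.xml: an attribute DataIPAddress="<address>".
_DATA_IP = re.compile(r"(\bDataIPAddress\s*=\s*)([\"'])[^\"'\n]*\2")


def _keep_quotes(new_ip):
    """Build a replacement that keeps the prefix and the original quotes."""
    return lambda m: m.group(1) + m.group(2) + new_ip + m.group(2)


def set_mng_ip(profile_text, new_ip):
    """Return (text, count) with MNG_IP pointing at new_ip."""
    return _MNG_IP.subn(_keep_quotes(new_ip), profile_text)


def set_data_ip(xml_text, new_ip):
    """Return (text, count) with the DataIPAddress attribute set to new_ip."""
    return _DATA_IP.subn(_keep_quotes(new_ip), xml_text)
```

Regular expressions are used instead of an XML parser so that the rest of the file is left unchanged; rewriting the file through ``xml.etree.ElementTree`` may rename namespace prefixes and drop comments. A returned count of ``0`` means the expected pattern was not found.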
+
+For the whole environment to work properly, some further tweaks that are not
+related to the control system itself are also necessary on the other DISCOS
+machines:
+
+- If the DISCOS control software runs in a distributed environment, replace
+  the old Manager IP address with the new one in the ``/etc/hosts`` file on
+  the ``discos-console`` and ``discos-storage`` machines. This allows other
+  services, such as the Lustre service on the ``discos-storage`` machine, to
+  point to the correct IP address again.
+- Perform the SSH key exchange between the ``discos`` user of the new Manager
+  and the ``discos`` users on the ``discos-console`` and ``discos-storage``
+  machines, then repeat it for the ``root`` users. This allows some services
+  and scripts, such as the Lustre service on the ``discos-storage`` machine
+  and the ``discos-addProject`` and ``discos-removeProject`` scripts on the
+  ``discos-console`` machine, to perform remote tasks that would otherwise be
+  impossible.
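Both steps can be partly scripted. The sketch below is hypothetical: a Manager hostname such as ``discos-manager`` is an assumption, ``update_hosts`` rewrites the ``/etc/hosts`` text in memory, and ``key_exchange_plan`` only lists the ``ssh-copy-id`` calls to run, assuming each user already has a key pair generated with ``ssh-keygen``. Each call has to be run on the indicated machine as the indicated user, because ``ssh-copy-id`` installs the public key of the local user who runs it.

```python
def update_hosts(hosts_text, hostname, new_ip):
    """Point every /etc/hosts entry that lists hostname at new_ip.

    Returns (new_text, count). Comment lines and other entries are kept
    unchanged; an inline comment on a rewritten entry is preserved.
    """
    out, count = [], 0
    for line in hosts_text.splitlines(keepends=True):
        stripped = line.rstrip("\r\n")
        newline = line[len(stripped):]  # original line ending, if any
        body, sep, comment = stripped.partition("#")
        fields = body.split()
        # fields[0] is the address, the remaining fields are host names.
        if len(fields) >= 2 and hostname in fields[1:]:
            line = new_ip + "\t" + " ".join(fields[1:])
            if sep:
                line += " #" + comment
            line += newline
            count += 1
        out.append(line)
    return "".join(out), count


def key_exchange_plan(manager, peers, users=("discos", "root")):
    """List the ssh-copy-id calls for a two-way key exchange.

    Each step is (run_on, run_as, argv): argv must be executed on host
    run_on as user run_as, since ssh-copy-id installs the public key of
    the local user who runs it.
    """
    plan = []
    for peer in peers:
        for user in users:
            # New Manager -> peer, then peer -> new Manager.
            plan.append((manager, user, ["ssh-copy-id", f"{user}@{peer}"]))
            plan.append((peer, user, ["ssh-copy-id", f"{user}@{manager}"]))
    return plan
```

``update_hosts`` matches on the hostname rather than on the old address, so it still works when the old address is no longer known.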