Releases: dberzano/elastiq
Improved daemon stability
- Does not use screen anymore for running in background.
- A single logfile is used, and possible stack traces are also written there (and no longer in a separate .err file)
- Consider VMs allegedly running when enforcing maximum quota
Fixed quota enforcing with VMs still starting up
Max quota now takes into consideration the number of VMs allegedly running, i.e. the requested VMs that did not join the cluster yet.
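The quota rule above can be illustrated with a minimal sketch. The function name and its parameters are hypothetical and are not taken from the elastiq source; it only shows the arithmetic of counting allegedly running VMs against the maximum quota.

```python
def vms_allowed_to_start(requested, max_quota, running, allegedly_running):
    """Cap a request so that running + allegedly running + new VMs fit the quota.

    All names here are illustrative assumptions, not elastiq's actual API.
    """
    # VMs that were requested but have not joined the cluster yet still use quota
    in_use = running + allegedly_running
    # Never return a negative number if the quota is already exceeded
    headroom = max(0, max_quota - in_use)
    return min(requested, headroom)
```

For example, with a quota of 10, 6 VMs running and 2 still starting up, a request for 5 new VMs would be capped at 2.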
Fix calculation of required VMs
Quick fix release.
Better packaging supporting Python 2.6 and 2.7
RPM for Python 2.6 is suitable for Scientific Linux CERN 6.
RPM for Python 2.7 was tested on Fedora 20.
Debian package for Python 2.7 was tested on Ubuntu 12.04 and 14.04.
Support for Ubuntu 12.04+
Packages are now created by means of the excellent Effing Package Management (FPM) tool.
As a result, .deb packages for Ubuntu 12.04+ are now available.
Tested platforms:
- RHEL 6 (and compatible OSes like SLC 6, CentOS 6)
- Ubuntu 12.04
Thanks to @reneme for pointing me to FPM.
Improved stability
More sensible defaults
Default for waiting jobs threshold changed to zero, and reduced timeout before starting virtual machines.
Fix crash in check instances
Bugfix release: fixed a bug that could cause elastiq to throw an unhandled exception when checking for owned virtual machine instances.
Deal with deployment errors
elastiq now deals with deployment errors. Errors are handled by keeping a permanent state of the instance IDs launched by elastiq: this state is saved to a text file (one instance ID per line). By default, the state file is located at:
- $HOME/.elastiq/state in case it is launched by an unprivileged user;
- /var/lib/elastiq/state in case it is launched by root.
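As a rough illustration of the state file described above, the following sketch picks the default location and reads and writes one instance ID per line. The function names are hypothetical, the code assumes a POSIX system, and the actual elastiq implementation may differ.

```python
import os

def default_state_path():
    """Return the default state file path (hypothetical helper, POSIX only)."""
    if os.geteuid() == 0:
        # Launched by root
        return "/var/lib/elastiq/state"
    # Launched by an unprivileged user: $HOME/.elastiq/state
    return os.path.expanduser("~/.elastiq/state")

def load_state(path):
    """Read instance IDs, one per line; a missing file means no owned instances."""
    try:
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]
    except IOError:
        return []

def save_state(path, instance_ids):
    """Write instance IDs, one per line, replacing the old file in one step."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        for instance_id in instance_ids:
            f.write(instance_id + "\n")
    os.replace(tmp, path)
```

Writing to a temporary file first is a design choice of this sketch: it avoids leaving a half-written state file behind if the process is interrupted.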
Two types of deployment errors are considered.
Virtual machines in "error" state
Some virtual machines might never boot and go to an EC2 "error" state. In some cases (like OpenStack) such virtual machines are not cleaned up automatically by the cloud, and take up quota.
- VMs are periodically checked for the "error" state
- VMs in "error" are redeployed (i.e., they are terminated and a new one is requested)
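The two steps above can be sketched as follows. The callables get_state, terminate and request_new are hypothetical stand-ins for the cloud API calls; this is not the code elastiq actually uses.

```python
def redeploy_vms_in_error(owned_ids, get_state, terminate, request_new):
    """Terminate every owned VM in "error" state and request a replacement.

    get_state(id) returns a state string, terminate(id) returns True on
    success, request_new() returns a new instance ID or None. All three are
    assumptions made for this sketch. Returns the updated list of owned IDs.
    """
    still_owned = []
    for instance_id in owned_ids:
        if get_state(instance_id) != "error":
            still_owned.append(instance_id)
            continue
        if terminate(instance_id):
            # Replace the broken VM with a freshly requested one
            new_id = request_new()
            if new_id is not None:
                still_owned.append(new_id)
        else:
            # Termination failed: keep the ID so the next check retries it
            still_owned.append(instance_id)
    return still_owned
```

In a daemon, a function like this would run once per check period, and the returned list would be written back to the state file.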
The period of error check can be configured via the configuration file:
[elastiq]
check_vms_in_error_every_s = 20
Virtual machines do not join the cluster in time
Some virtual machines boot correctly but do not join the cluster "in time". After a time defined by the variable:
[elastiq]
estimated_vm_deploy_time_s = 600
if the virtual machine has not joined the batch cluster yet, a termination is triggered. If the termination fails, it is retried after another 10 seconds.
Note: in this case no new virtual machine is requested to replace the terminated one, because a "late" virtual machine is presumably due to a misconfiguration rather than an infrastructure glitch. If necessary, redeployment will be retriggered anyway by the "scale up" mechanism of elastiq.
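The "late" virtual machine check can be illustrated with a short sketch. The function name and its arguments are hypothetical; only the 600 second default mirrors the estimated_vm_deploy_time_s value shown above.

```python
import time

def find_late_vms(launch_times, joined_ids, deploy_time_s=600, now=None):
    """Return the IDs of VMs that did not join the cluster in time.

    launch_times maps instance ID to the UNIX timestamp of the request,
    joined_ids is the set of IDs already seen in the batch cluster.
    Both structures are assumptions made for this sketch.
    """
    if now is None:
        now = time.time()
    return sorted(
        instance_id
        for instance_id, requested_at in launch_times.items()
        if instance_id not in joined_ids and now - requested_at > deploy_time_s
    )
```

Each returned ID would then be terminated without requesting a replacement; an ID whose termination fails simply shows up again on the next pass.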
Fixes in user-data variable substitution
Fixed bugs in user-data variable substitution.