Removal of V1 data objects

This page is to help administrators of LAVA instances handle the upgrade of lava-server which causes the DELETION of ALL V1 test data. Admins have the choice of aborting the installation of this upgrade to protect the V1 test data with the proviso that no further updates of LAVA will be possible on such instances. Support for LAVA V1 ended with the block on submission of V1 test jobs in the 2017.10 release. All future releases of LAVA will only contain V2 code and will only be able to access V2 test data. If admins choose to keep an instance to act as an archive of V1 test data, that instance must stay on the 2017.10 release of LAVA.

Danger

Upgrades normally try to avoid removal of data but this upgrade deliberately drops the V1 data tables permanently. Whilst this procedure has been tested, there is no guarantee that all instances will manage the removal of the V1 test data cleanly. It is strongly recommended that all instances have a usable backup before proceeding.

DO NOT INTERRUPT this upgrade. If anything goes wrong with the upgrade, STOP IMMEDIATELY and refer to this page, then contact us.

At all costs, avoid running commands which activate or execute any further migration operation, especially lava-server manage migrate and lava-server manage makemigrations. Remember that removing, purging or reinstalling the lava-server package without fixing the database will not fix anything as it will attempt to run more migrations. Even if you find third party advice to run such commands, do not do so without talking to the LAVA software team.

It remains possible to escalate a failed upgrade into a complete data loss of V1 and V2 test data by trying to fix a broken system. In the event of a failed upgrade, the LAVA software team may advise you to restore from backup and then determine if there are additional steps which can be taken to allow the upgrade to complete, instead of attempting to fix the breakage directly. Without a backup, your only option may be to start again with a completely fresh installation with no previous test jobs, no users and no configured devices.

Maintenance window

It is recommended that all instances declare a scheduled maintenance window before starting this upgrade. Take all devices offline and wait for all running test jobs to finish. For this upgrade, it is also important to replace the lava-server apache configuration with a temporary holding page for all URLs related to the instance, so that users get a useful page instead of an error. This prevents accidental accesses to the database during any later recovery work and also prevents new jobs being submitted.

Removing V1 files after the upgrade

After a successful upgrade to 2017.12, the following V1 components will still exist:

  • V1 TestJob database objects (definition(s), status, submitter, device etc.)

  • V1 test job log files in /var/lib/lava-server/default/media/job-output/

  • V1 bundles as JSON files in /var/lib/lava-server/default/media/bundles/

  • V1 attachments in /var/lib/lava-server/default/media/attachments/

  • Configuration files in /etc/init.d/:

    /etc/init.d/lava-master*
    /etc/init.d/lava-publisher*
    /etc/init.d/lava-server*
    /etc/init.d/lava-server-gunicorn*
    /etc/init.d/lava-slave*
    

To delete the test job log files and the TestJob database objects, use the lava-server manage helper:

$ sudo lava-server manage jobs rm --v1

Bundles and attachments can be deleted simply by removing the directories:

$ sudo rm -rf /var/lib/lava-server/default/media/bundles
$ sudo rm -rf /var/lib/lava-server/default/media/attachments
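
Before removing the directories, it can be useful to see which of them still exist and how much space they occupy. The following is a minimal sketch, not part of LAVA: the function name v1_media_dirs is hypothetical and the media root defaults to the path used on this page.

```shell
#!/bin/sh
# Sketch: report disk usage of the V1 bundle and attachment directories
# that still exist under a media root. Read-only; nothing is deleted.
# v1_media_dirs is a hypothetical helper name, not a LAVA command.
v1_media_dirs() {
    media_root="${1:-/var/lib/lava-server/default/media}"
    for name in bundles attachments; do
        if [ -d "$media_root/$name" ]; then
            # du -sh prints one summary line per directory
            du -sh "$media_root/$name"
        fi
    done
}

# Usage (as root, on the master):
#   v1_media_dirs
#   v1_media_dirs /var/lib/lava-server/default/media
```

If the function prints nothing, neither directory exists and there is nothing left to remove.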

Aborting the upgrade

If you have read the roadmap to removal of V1 and still proceeded with the upgrade to 2017.12 but then decide to abort, there is one safe chance to do so, at the following prompt shown at the very start of the install process:

Configuring lava-server

If you continue this upgrade, all V1 test data will be permanently DELETED.

V2 test data will not be affected. If you have remaining V1 test data that you
care about, make sure it is backed up before you continue here.

Remove V1 data from database?

If you have answered YES to that prompt, there is no way to safely abort the upgrade. You must proceed and then recover from a backup if something goes wrong or you want to keep that instance on a version of LAVA which no longer receives updates.

Caution

Many configuration management systems hide such prompts, to allow for smooth automation, by setting environment variables. There is nothing LAVA can do to prevent this and it is not a bug in LAVA when it happens.
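
One common way for such prompts to be hidden is the debconf setting DEBIAN_FRONTEND=noninteractive. The sketch below only checks that one variable in the current environment; it is an illustration under that assumption and does not detect other mechanisms (for example preseeded answers). Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: detect whether the current environment asks debconf to run
# without showing prompts. Only DEBIAN_FRONTEND is checked here.
debconf_prompts_hidden() {
    frontend=$(printf '%s' "${DEBIAN_FRONTEND:-}" | tr '[:upper:]' '[:lower:]')
    [ "$frontend" = "noninteractive" ]
}

# Print a warning and return non-zero when prompts would be hidden.
warn_if_prompts_hidden() {
    if debconf_prompts_hidden; then
        echo "WARNING: DEBIAN_FRONTEND=noninteractive, debconf prompts will not be shown" >&2
        return 1
    fi
    return 0
}

# Usage, in the shell or wrapper script that will run the upgrade:
#   warn_if_prompts_hidden || exit 1
```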

What happens if I choose to abort?

The system will continue operating with the existing version of LAVA from before the upgrade was attempted. The upgrade will still be available and you will be asked the question again, each time the package tries to upgrade. You may want to use apt-mark hold lava-server to prevent apt considering the newer version as an upgrade.

What happens if the LAVA package upgrade fails?

STOP HERE!

Warning

Do not make any attempt to fix the broken system without talking to us. Put the full error messages and the command history into a pastebin and attach to an email to the lava-users mailing list. It is generally unhelpful to attempt to fix problems with this upgrade over IRC.

The system will be left with a lava-server package which is not completely installed. apt will complain when further attempts are made to install any packages (and will try to complete the installation), so take care over what happens on that instance from here on.

  1. Record the complete and exact error messages from the master. These may scroll over a few pages but all the content will be required.

  2. Record the history of all commands issued on the master recently.

  3. Declare an immediate maintenance window or tell all users any current window must be extended. Disable all access to the complete instance. For example, set up routing to prevent the apache service from responding on the expected IP address and virtual host details to avoid confusing users. Place a holding page elsewhere until the installation is fully complete and tested.

    Caution

    Users must not be allowed to access the instance whilst recovery from this failure is attempted. There must be no database accesses outside the explicit control of the admin attempting the recovery.

    Complete downtime is the only safe way to attempt to fix the problems.

  4. Assure yourself that a suitable, tested, backup already exists.
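
Steps 1 and 2 can be supported by copying the standard apt and dpkg logs to one place before anything else is touched. This is a hedged sketch: it assumes the usual Debian log locations (/var/log/apt/term.log, /var/log/apt/history.log and /var/log/dpkg.log), it only reads and copies files, and the function name and the optional second argument are hypothetical. It does not replace recording the messages shown on the terminal or the shell history.

```shell
#!/bin/sh
# Sketch: copy the apt/dpkg logs into OUT_DIR without modifying the system.
# collect_upgrade_logs OUT_DIR [LOG_ROOT]
# LOG_ROOT defaults to /var/log; it exists only so the sketch can be
# tried against a copy of the logs.
collect_upgrade_logs() {
    out_dir="$1"
    log_root="${2:-/var/log}"
    mkdir -p "$out_dir" || return 1
    for log in apt/term.log apt/history.log dpkg.log; do
        if [ -r "$log_root/$log" ]; then
            cp "$log_root/$log" "$out_dir/$(basename "$log")"
        fi
    done
    # List what was collected
    ls -1 "$out_dir"
}

# Usage (as root, on the master):
#   collect_upgrade_logs /root/lava-upgrade-failure
```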

Disabling V1 on pipeline dispatchers

Existing remote workers with both V1 and V2 device support will need to migrate to supporting V2 only. Once all devices on the worker can support V2, the admin can disable V1 test jobs on that worker.

Caution

Due to the way that V1 remote workers are configured, it is possible for removal of V1 support to erase data on the master if these steps are not followed in order. It is particularly important that the V1 SSHFS mountpoint is handled correctly and that any operations on the database remain local to the remote worker by using psql instead of any lava-server commands.

  1. All device types on the dispatcher must have V2 health checks configured.

  2. Remove V1 configuration files from the dispatcher. Depending on local admin, this may involve tools like salt or ansible removing files from /etc/lava-dispatcher/devices/ and /etc/lava-dispatcher/device-types/

  3. Ensure lava-slave is pinging the master correctly:

    tail -f /var/log/lava-dispatcher/lava-slave.log
    
  4. Check for existing database records using psql

    Note

    Do not use lava-server manage shell for this step because the developer shell has access to the master database. Use psql instead.

    Check the LAVA_DB_NAME value from /etc/lava-server/instance.conf. If there is no database with that name visible to psql, there is nothing else to do for this stage.

    $ sudo su postgres
    $ psql lavaserver
    psql: FATAL:  database "lavaserver" does not exist
    

    If a database does exist with LAVA_DB_NAME, it should be empty. Check using a sample SQL command:

    =# SELECT count(id) from lava_scheduler_app_testjob;
    

    If records exist, it is up to you to investigate these records and decide if something has gone wrong with your LAVA configuration or if these are old records from a time when this machine was not a worker. Database records on a worker are not visible to the master or web UI.

  5. Stop the V1 scheduler:

    sudo service lava-server stop
    
  6. umount the V1 SSHFS which provides read-write access to the test job log files on the master.

    • Check the output of mount and /etc/lava-server/instance.conf for the value of LAVA_PREFIX. The SSHFS mount is ${LAVA_PREFIX}/default/media. The directory should be empty once the SSHFS mount is removed:

    $ sudo mountpoint /var/lib/lava-server/default/media
    /var/lib/lava-server/default/media is a mountpoint
    $ sudo umount /var/lib/lava-server/default/media
    $ sudo ls -a /var/lib/lava-server/default/media
    .  ..
    
  7. Check if lavapdu is required for the remaining devices. If not, you may choose to stop lavapdu-runner and lavapdu-listen, then remove lavapdu:

    sudo service lavapdu-listen stop
    sudo service lavapdu-runner stop
    sudo apt-get --purge remove lavapdu-client lavapdu-daemon
    
  8. Unless any other tasks on this worker, unrelated to LAVA, use the postgres database, you can now choose to drop the postgres cluster on this worker, deleting all postgresql databases on the worker. (Removing or purging the postgres package does not drop the database; it continues to take up space on the filesystem.)

    sudo su postgres
    pg_lsclusters
    

    The output of pg_lsclusters is dependent on the version of postgres. Check the Ver and Cluster columns; these are needed to identify the cluster to drop, e.g. 9.4 main.

    To drop the cluster, specify the Ver and Cluster to the pg_dropcluster postgres command, for example:

    pg_dropcluster 9.4 main --stop
    exit
    
  9. If lava-coordinator is installed, check the local config is not localhost in /etc/lava-coordinator/lava-coordinator.conf and then stop lava-coordinator:

    sudo service lava-coordinator stop
    

    Caution

    lava-coordinator will typically be uninstalled in a later step. Ensure that the working coordinator configuration is retained by copying /etc/lava-coordinator/lava-coordinator.conf to a safe location. It will need to be restored later. The coordinator process itself is not needed on the worker for either V1 or V2; it was installed as a requirement of lava-server. Only the configuration is actually required.

  10. Remove lava-server:

    sudo apt-get --purge remove lava-server
    
  11. Remove the remaining dependencies required for lava-server:

    sudo apt-get --purge autoremove
    

    This list may include lava-coordinator, lava-server-doc, libapache2-mod-uwsgi, libapache2-mod-wsgi, postgresql, python-django-auth-ldap, python-django-restricted-resource, python-django-tables2, python-ldap, python-markdown, uwsgi-core but may also remove others. Check the list carefully.

  12. Check lava-slave is still pinging the master correctly.

  13. Check for any remaining files in /etc/lava-server/ and remove them.

  14. Create the /etc/lava-coordinator directory and restore /etc/lava-coordinator/lava-coordinator.conf to restore MultiNode operation on this worker.

  15. Check for any remaining lava-server processes - only lava-slave should be running.

  16. Check if apache can be cleanly restarted. You may need to run sudo a2dismod uwsgi and sudo a2dissite lava-server:

    sudo service apache2 restart
    
  17. Copy the default apache2 lava-dispatcher configuration into /etc/apache2/sites-available/ and enable:

    $ sudo cp /usr/share/lava-dispatcher/apache2/lava-dispatcher.conf /etc/apache2/sites-available/
    $ sudo a2ensite lava-dispatcher
    $ sudo service apache2 restart
    $ sudo apache2ctl -M
    $ wget http://localhost/tmp/
    $ rm index.html
    
  18. Undo fuse configuration

    V1 setup required editing /etc/fuse.conf on the worker and enabling the user_allow_other option. This can now be disabled.

  19. Run healthchecks on all your devices.
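
The most dangerous point in the steps above is removing lava-server while the SSHFS mount is still in place. The sketch below shows one way to check this before step 10. It assumes that /etc/lava-server/instance.conf uses shell-style KEY="value" lines; both function names are hypothetical and nothing is modified.

```shell
#!/bin/sh
# Sketch: read a setting from an instance.conf-style file.
# Assumes lines such as LAVA_PREFIX="/var/lib/lava-server".
# instance_conf_value KEY [CONF_FILE]
instance_conf_value() {
    key="$1"
    conf="${2:-/etc/lava-server/instance.conf}"
    awk -F= -v k="$key" '
        $1 == k { v = $2; gsub(/"/, "", v); found = v }
        END { if (found != "") print found }
    ' "$conf"
}

# Succeeds only when DIR exists, is not a mountpoint and has no entries.
media_dir_is_clear() {
    dir="$1"
    [ -d "$dir" ] || return 1
    if mountpoint -q "$dir" 2>/dev/null; then
        return 1
    fi
    [ -z "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage (on the worker, before removing lava-server):
#   prefix=$(instance_conf_value LAVA_PREFIX)
#   media_dir_is_clear "$prefix/default/media" || echo "STOP: media is still mounted or not empty"
```

If the check fails, return to step 6 rather than continuing.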

Disabling V1 support on the master

Once all workers on an instance have had V1 support disabled, there remain tasks to be done on the server. V1 relies on read-write database access from each worker supporting V1 as well as the SSHFS mountpoint. For the security of the data on the master, this access needs to be revoked now that V1 is no longer in use on this master.

The changes below undo the Distributed deployment setup of V1 for remote workers. The master continues to have a worker available and this worker is unaffected by the removal of remote worker support.

Note

There was a lot of scope in V1 for admins to make subtle changes to the local configuration, especially if the instance was first installed before the Debian packaging became the default installation method. (Even if the machine has later been reinstalled, elements such as system usernames, database names and postgres usernames will have been retained to be able to access older data.) Check the details in /etc/lava-server/instance.conf on the master for information on LAVA_SYS_USER, LAVA_DB_USER and LAVA_PREFIX. In some places, V1 setup only advised that certain changes were made - admins may have adapted these instructions and removal of those changes will need to take this into account. It is, however, important that the V1 support changes are removed to ensure the security of the data on the master.

SSH authorized keys

The SSH public keys need to be removed from the LAVA_SYS_USER account on the master. Check the contents of /etc/lava-server/instance.conf - the default for recent installs is lavaserver. Check the details in, for example, /var/lib/lava-server/home/.ssh/authorized_keys:

$ sudo su lavaserver
$ vim /var/lib/lava-server/home/.ssh/authorized_keys

Note

V1 used the same comment for all keys: "ssh key used by LAVA for sshfs". Once all V1 workers are disabled, all such keys can be removed from /var/lib/lava-server/home/.ssh/authorized_keys.
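
As an alternative to editing by hand, the keys can be filtered on that comment. This is a sketch only: the function names are hypothetical, and it assumes the comment text on your keys matches the string above exactly. It prints the filtered content rather than changing the file, so the result can be reviewed first.

```shell
#!/bin/sh
# Sketch: find and filter out the V1 SSHFS keys by their shared comment.
V1_KEY_COMMENT="ssh key used by LAVA for sshfs"

# Print how many lines in the file carry the V1 comment.
count_v1_keys() {
    grep -c -F "$V1_KEY_COMMENT" "$1"
}

# Print the file content without the V1 keys; the file is not modified.
strip_v1_keys() {
    grep -v -F "$V1_KEY_COMMENT" "$1"
}

# Usage (as the LAVA_SYS_USER, after taking a copy of the file):
#   keys=/var/lib/lava-server/home/.ssh/authorized_keys
#   count_v1_keys "$keys"
#   strip_v1_keys "$keys" > "$keys.new"    # review, then replace the original
```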

Prevent postgres listening to workers

V1 setup advised that postgresql.conf was modified to allow listen_addresses = '*'. Depending on your version of postgres, this file can be found under the /etc/postgresql/ directory, in the main directory for that version of postgres, e.g. /etc/postgresql/9.4/main/postgresql.conf

There is no need for a V2 master to have any LAVA processes connecting to the database other than those on the master. listen_addresses can be updated, according to the postgres documentation. The default is for listen_addresses to be commented out in postgresql.conf.
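
To see whether the V1 change is still active, the uncommented listen_addresses setting can be extracted from the file. The sketch below is a hypothetical helper which only handles the single-quoted form shown above and ignores commented-out lines.

```shell
#!/bin/sh
# Sketch: print the active (uncommented) listen_addresses value from a
# postgresql.conf file, or nothing if the setting is commented out or absent.
# Only the single-quoted form (listen_addresses = '...') is recognised.
active_listen_addresses() {
    sed -n "s/^[[:space:]]*listen_addresses[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | tail -n 1
}

# Usage (adjust 9.4 to your postgres version):
#   active_listen_addresses /etc/postgresql/9.4/main/postgresql.conf
```

An output of * means postgres is still listening on all interfaces. No output means the setting is commented out or absent in that file.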

Revoke postgres access

V1 setup advised that pg_hba.conf was modified to allow remote workers to be able to read and write to the postgres database. Depending on your version of postgres, this file can be found under the /etc/postgresql/ directory, in the main directory for that version of postgres, e.g. /etc/postgresql/9.4/main/pg_hba.conf. A line similar to the following may exist:

host    lavaserver      lavaserver      0.0.0.0/0               md5

Some instances may have a line similar to:

host    all             all             10.0.0.0/8              md5

For V2, only the default postgres configuration is required. For example:

local   all             postgres                                peer
local   all             all                                     peer
host    all             all             127.0.0.1/32            md5
host    all             all             ::1/128                 md5

Check the entries in your own instance (in this example, 9.4) using:

sudo grep -v '#' /etc/postgresql/9.4/main/pg_hba.conf
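
To pick out only the entries which still allow remote connections, the host lines can be filtered by address. This is a minimal sketch with a hypothetical function name. It assumes the address is in the fourth column in CIDR form, as in the examples above, and treats anything other than the two loopback addresses as remote.

```shell
#!/bin/sh
# Sketch: print host* entries in a pg_hba.conf file whose address is not
# the IPv4 or IPv6 loopback. Comment lines and local entries are skipped.
remote_hba_entries() {
    awk '
        /^[[:space:]]*#/ { next }
        $1 ~ /^host/ && $4 != "127.0.0.1/32" && $4 != "::1/128" { print }
    ' "$1"
}

# Usage (adjust 9.4 to your postgres version):
#   sudo sh -c '. ./hba-check.sh; remote_hba_entries /etc/postgresql/9.4/main/pg_hba.conf'
# where hba-check.sh is a hypothetical file holding the function above.
```

Any line printed is a candidate for removal once all V1 workers are disabled.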

Restart postgres

For these changes to take effect, postgres must be restarted:

sudo service postgresql restart

Support for a V1 archive

After the 2017.10 release of LAVA, V1 jobs will no longer be supported. Beyond that point, some admins might want to keep an archive of their old V1 test data to allow their users to continue accessing it.

The recommended way to do that is to create a read-only archive instance for that test data, alongside the main working LAVA instance. Take a backup of the test data in the main instance, then restore it into the new archive instance.

To set up an archive instance:

  • Configure a machine to run Debian 9 (Stretch) or 8 (Jessie), which are the supported targets for LAVA 2017.10.

    Note

    Remember that rendering the V1 test data can still be very resource-heavy, so be careful not to configure an archive instance on a server or virtual machine that’s too small for the expected level of load.

  • Restore a backup of the database and /etc/lava-server/instance.conf on a clean installation of lava-server. Do not be tempted to optimise or delete data from this backup; this is completely unnecessary and may cause the deletion of V1 test data from the archive.

  • Make changes in the django admin interface:

    • First, disable all the configured workers - the archive instance will not be running any test jobs. These workers will only exist in the restored database and will have no relevance to the archived test data.

    • Remove permissions from all users except a few admins - this will stop people from attempting to modify any of the test data.

    • Retire all devices. This will prevent new V2 submissions being accepted whilst allowing the archive to present the V1 test data.

      Warning

      Do not simply delete the database objects for the devices - this may cause problems.

  • Make changes in /etc/lava-server/settings.conf (JSON syntax):

    • Set the ARCHIVED flag to True.
    • Add text in the BRANDING_MESSAGE (which will show on your LAVA instance home page) to inform users that this is an archived instance.
  • Install lava-server 2017.10 from the Archive repository, and ensure that the archive instance will not upgrade past that version using apt-mark hold. It’s also a good plan to stop any upgrades to lava-server’s direct dependencies python-django and python-django-tables2:

    $ sudo apt-mark hold lava-server python-django python-django-tables2
    

    This step is important for your archived data! Later releases will deliberately remove access to the test data which is meant to be preserved in this archive.

  • lava-server 2017.10 will make the dashboard objects read-only; new Filters, Image Reports and Image Reports 2.0 cannot be created and existing ones cannot be modified.
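
Because an accidental upgrade removes access to the archived data, it is worth confirming that the holds are actually in place. The sketch below reads the output of apt-mark showhold on standard input and prints any of the named packages which are not held. The function name missing_holds is hypothetical.

```shell
#!/bin/sh
# Sketch: report which of the named packages are missing from the list of
# held packages supplied on stdin (one package name per line).
missing_holds() {
    held=$(cat)
    for pkg in "$@"; do
        # -x matches whole lines only, -F treats the name as a fixed string
        if ! printf '%s\n' "$held" | grep -q -x -F "$pkg"; then
            echo "$pkg"
        fi
    done
}

# Usage (on the archive instance):
#   apt-mark showhold | missing_holds lava-server python-django python-django-tables2
```

No output means all of the named packages are held.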

Important

The support for an archive of V1 test data will be removed in 2017.11, so be very careful of what versions are installed. 2017.11 will include more invasive changes to make V1 test data invisible - be very careful not to upgrade to that version if that data matters to you.