Configuration Server is Single Point of Failure, causing lots of freezes [was #7518]
When the configuration server is down or responds slowly (as it did on Beta Thursday), many actions in the Lab freeze and the user gets no useful feedback. To the user, the Lab looks completely broken.
One of the problems is the way the conf client currently works: when a value is requested that has not been requested before, the client requests only this one value from the web service. The request often happens on the main thread, and it often cannot easily be moved to the background because the requesting operation depends on the value.
Since conf values form a small set of data that changes rarely but is requested often, I suggest the following implementation:
- The lab requests all configuration variables with a single request and saves the results. Values are re-requested only when a refresh is forced (e.g. when a request to an EPR retrieved via the Conf service fails).
- On a user's first lab startup, a background job is started which retrieves all variables from the EPR and initializes the variables correspondingly. The background job should use the official Eclipse APIs and provide feedback to the users.
- The values retrieved by the background job are saved to the local disk using one of Eclipse's preferences mechanisms. On the next start, the lab first reads the locally stored variables and only then tries to contact the configuration server. Thus, if the user requests a service which needs a configuration variable and the conf service could not yet be reached, the lab does not freeze but simply uses the value from disk.
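To make the proposal above more concrete, here is a minimal sketch of such a cache in plain Java. It is only an illustration under assumptions: `ConfCache`, `fetchAll`, and the file-based store are hypothetical names, and a `java.util.Properties` file stands in for the Eclipse preferences mechanism so that the example runs without Eclipse. In the lab, `refresh()` would presumably be called from the background job.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import java.util.function.Supplier;

/** Hypothetical cache; not the actual conf client API. */
final class ConfCache {
    // Replaced as a whole on refresh, so readers never see a half-updated map.
    private volatile Map<String, String> values = Map.of();
    // Assumption: a properties file stands in for an Eclipse preference store.
    private final Path store;
    // Assumption: one call to the conf service returns every variable.
    private final Supplier<Map<String, String>> fetchAll;

    ConfCache(Path store, Supplier<Map<String, String>> fetchAll) {
        this.store = store;
        this.fetchAll = fetchAll;
        loadFromDisk(); // values from the previous run are available at once
    }

    /** Never contacts the server; returns whatever is known locally. */
    Optional<String> get(String key) {
        return Optional.ofNullable(values.get(key));
    }

    /**
     * Fetches all variables with a single request and persists them.
     * Meant for the background job and for forced refreshes.
     * Returns false (and keeps the old values) if the server cannot be reached.
     */
    boolean refresh() {
        Map<String, String> fresh;
        try {
            fresh = Map.copyOf(fetchAll.get());
        } catch (RuntimeException e) {
            return false;
        }
        values = fresh;
        saveToDisk(fresh);
        return true;
    }

    private void loadFromDisk() {
        if (!Files.exists(store)) {
            return; // first startup: nothing cached yet
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(store)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> loaded = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            loaded.put(name, props.getProperty(name));
        }
        values = Map.copyOf(loaded);
    }

    private void saveToDisk(Map<String, String> snapshot) {
        Properties props = new Properties();
        snapshot.forEach(props::setProperty);
        try (OutputStream out = Files.newOutputStream(store)) {
            props.store(out, "cached configuration values");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The important property is that `get` never blocks on the network: callers on the main thread only ever read the in-memory snapshot, and a failed `refresh()` leaves the values from disk in place.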
For the remaining case (no variables initialized yet, value is being retrieved), we should evaluate a method of providing feedback ("Contacting TextGridRep ...") to the users (without forcing every Conf user to implement this in their own code).
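One possible shape for this central feedback is sketched below in plain Java. All names (`ConfFirstLoad`, `await`, the `feedback` callback, the failure message) are hypothetical; in the lab the callback would presumably be wired to Eclipse's progress reporting, which is omitted here to keep the example self-contained. The point is that the wrapper, not each Conf user, emits the message and bounds the wait.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper for the "nothing cached yet" case. */
final class ConfFirstLoad {
    private final CompletableFuture<Map<String, String>> pending;
    // Assumption: the UI layer supplies a way to show a status message.
    private final Consumer<String> feedback;

    ConfFirstLoad(Supplier<Map<String, String>> fetchAll, Consumer<String> feedback) {
        this.feedback = feedback;
        // Start the initial retrieval in the background right away.
        this.pending = CompletableFuture.supplyAsync(fetchAll);
    }

    /**
     * Waits at most timeoutMillis for the initial retrieval and tells the
     * user what is going on. Returns empty if the server did not answer in
     * time, the retrieval failed, or the key is unknown.
     */
    Optional<String> await(String key, long timeoutMillis) {
        if (!pending.isDone()) {
            feedback.accept("Contacting TextGridRep ...");
        }
        try {
            Map<String, String> all = pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Optional.ofNullable(all.get(key));
        } catch (TimeoutException | ExecutionException e) {
            feedback.accept("Configuration server could not be reached");
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Optional.empty();
        }
    }
}
```

With a bounded wait, a slow server costs the user at most the timeout instead of an indefinite freeze, and the caller can decide what to do with an empty result.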