Document control

Area: EGI Federation Operations
Procedure status: FINAL
Owner: Alessandro Paolini
Approvers: Operations Management Board
Approval status: APPROVED
Approved version and date: v5, 29th Oct 2015
Statement: This procedure documents the steps for requesting a correction in the SAM test results and in the related availability/reliability statistics.
Next procedure review: on demand

Procedure reviews

The following table is updated after every review of this procedure.

Date | Review by | Summary of results | Follow-up actions / Comments
(no date) | Alessandro Paolini | Copied from PROC10_Recomputation_of_SAM_results_or_availability_reliability_statistics in the EGI Wiki |




Overview

This procedure documents the steps for requesting a correction in the OPS VO test results and in the related availability/reliability statistics if applicable.

Figures are available through the ARGO web UI.

DISCLAIMER: This procedure applies only to EGI OPS test results. Procedures for computing VO-specific availability reports are defined by each VO and are outside the scope of this procedure.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Who can submit a request?

Re-computations can be requested by:

  • site administrators
  • regional operations staff.

Re-computation policy

Starting from 1 May 2012:

  • monitoring results can be recomputed only in the case of problems with the monitoring infrastructure itself;
  • no re-computations will be performed for issues with the deployed middleware (e.g. documented bugs affecting the availability of a production service end-point); such issues will consequently be reflected in lower availability/reliability figures.

Some examples of possible issues justifying a re-computation request:

  • invalid proxy certificate used for submitting the monitoring probes in a Nagios instance;
  • problems with the Storage Element used for replica management tests, resulting in errors in the CE metrics.


Deadline: requests must be submitted within 10 calendar days of the publication and announcement of the monthly Availability/Reliability (A/R) report for a given month X (the announcement is typically distributed on the 1st day of month X+1).

Based on the re-computation requests received, A/R reports are regenerated only once for each month, after the 10th of month X+1.
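
The deadline arithmetic can be illustrated with a short Python sketch. It is not part of the ARGO tooling: the function names are hypothetical, and it assumes that the announcement date itself is day 0 and that a request submitted on the 10th calendar day after it is still accepted.

```python
from datetime import date, timedelta

# Length of the request window, in calendar days, as stated in the policy.
REQUEST_WINDOW_DAYS = 10


def typical_announcement_date(year, month):
    """Return the 1st day of month X+1 for the report covering month X.

    Hypothetical helper: the real announcement date may differ.
    """
    if month == 12:
        return date(year + 1, 1, 1)
    return date(year, month + 1, 1)


def request_deadline(announcement):
    """Return the last day on which a request is assumed to be accepted."""
    return announcement + timedelta(days=REQUEST_WINDOW_DAYS)


def is_within_deadline(submitted, announcement):
    """Check a submission date against the deadline (inclusive by assumption)."""
    return submitted <= request_deadline(announcement)


if __name__ == "__main__":
    announced = typical_announcement_date(2015, 10)  # report for October 2015
    print(announced, request_deadline(announced))
```

For a report covering October 2015 announced on 1 November 2015, the sketch gives 11 November 2015 as the last day for requests.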

Steps

Step 1
Responsible: Site administrator / ROD team
Action: As soon as the problem is detected, fill in the form on the re-computation page (login needed) and provide an explanation for the request. Submitting the form informs the ARGO team, and the request is placed in pending status.
Prerequisites: none

Step 2
Responsible: ARGO team
Action: A member of the staff validates the request. The requester is informed by email whether the request is confirmed or rejected.
Prerequisites: none

Step 3
Responsible: ARGO team
Action: If the request is accepted, the recomputation is triggered as soon as possible. The status of the recomputation is visible through a web page (the link is given in the email sent in the previous step).
Prerequisites: none
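
The lifecycle of a request described in these steps can be read as a small state machine. The Python sketch below is only an illustration of that reading: the status names and the `advance` function are hypothetical and do not reflect the actual ARGO implementation.

```python
from enum import Enum


class RequestStatus(Enum):
    """Hypothetical status names derived from the steps of this procedure."""
    PENDING = "pending"                  # step 1: form submitted
    ACCEPTED = "accepted"                # step 2: request confirmed
    REJECTED = "rejected"                # step 2: request rejected
    TRIGGERED = "recomputation triggered"  # step 3


# Transitions implied by the procedure; anything else is treated as invalid.
ALLOWED_TRANSITIONS = {
    RequestStatus.PENDING: {RequestStatus.ACCEPTED, RequestStatus.REJECTED},
    RequestStatus.ACCEPTED: {RequestStatus.TRIGGERED},
    RequestStatus.REJECTED: set(),
    RequestStatus.TRIGGERED: set(),
}


def advance(current, new):
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


if __name__ == "__main__":
    status = RequestStatus.PENDING
    status = advance(status, RequestStatus.ACCEPTED)
    status = advance(status, RequestStatus.TRIGGERED)
    print(status.value)
```

A rejected request has no outgoing transitions in this sketch, which mirrors the fact that the procedure describes no further action after a rejection.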