Operations oversight of the EGI Infrastructure is provided by:
- EGI Foundation Operations team
- Regional Operators on Duty (ROD) teams
Oversight of the NGI infrastructures is needed to detect problems, coordinate their diagnosis, and monitor them throughout their lifecycle until resolution. It is based on monitoring the status of the services operated by sites, opening tickets, and following them up until the problems are resolved. EGI.org supports and actively oversees the overall status of services and sites, opens tickets to request problem fixes, and tackles residual problems that could not be successfully handed over to the NGIs.
EGI Foundation Operations team
The EGI.eu Operations team is the central team responsible for the EGI Production Infrastructure. It is also responsible for providing:
- Coordination of activities with the Operations Management Board and the User Community Board.
- Central Technical Support to site administrators, NGI operators and new user communities. This includes:
  - technical support to the EGI Foundation operations activities
  - technical support to ROD teams through targeted training activities
  - coordination of technical working groups
  - technical support to new resource centres in their certification phase, when requested by the Operations Centre due to a lack of sufficient local expertise
  - certification and technical support for new infrastructures being integrated, by providing assistance and training on EGI operations services, policies and procedures, and by developing documentation as needed
- Resource Allocation, including:
  - defining service management processes for resource allocation and other EGI.eu operations services
  - communicating, adapting and enforcing these processes at NGI level, including training
  - defining the requirements for the operations tools that arise from the provisioning of these new services
EGI Foundation Operator on Duty (OD)
The EGI Foundation Operator on Duty (OD) is the member of the central EGI Foundation Operations team primarily responsible for responding to tickets. The duty rotates among team members according to a rota, which ensures that everyone in the team gains hands-on experience in handling tickets and coordinating operations activities.
Duties at the start of the week
- Closing the weekly report ticket opened during the previous week, after checking its status and whether any actions are still pending
- Creating a new ticket (type: Weekly report) under the SDIS project to log the week's activities:
  - The summary must be: Weekly report about OD work - YYYY-MM-DD
  - Due date: the following Monday, to allow gathering feedback on events that may occur over the weekend
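As an illustration of the naming and due-date rule, the following minimal Python sketch derives the summary and due date of the weekly report ticket for a given week. It assumes that the YYYY-MM-DD in the summary is the Monday on which the OD week starts, which the procedure does not state explicitly; the function name `weekly_report_fields` is hypothetical, and the sketch only builds the field values without talking to Jira.

```python
from datetime import date, timedelta


def weekly_report_fields(week_start: date) -> dict:
    """Build the summary and due date for the OD weekly report ticket.

    Assumption (not stated in the procedure): the YYYY-MM-DD in the
    summary is the Monday of the OD week.
    """
    # Snap to the Monday of the given week, in case another weekday is passed.
    monday = week_start - timedelta(days=week_start.weekday())
    summary = f"Weekly report about OD work - {monday.isoformat()}"
    # Due date: the following Monday, leaving the weekend for feedback.
    due = monday + timedelta(days=7)
    return {"summary": summary, "due_date": due.isoformat()}
```

For example, `weekly_report_fields(date(2024, 3, 6))` snaps the Wednesday back to Monday 2024-03-04 and sets the due date to 2024-03-11. Snapping to Monday keeps the summary stable even if the ticket is created later in the week.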
Duties during the week
- Ensuring a timely response to emails sent to: email@example.com
- Managing Security Vulnerability Handling tickets
- Managing Changes
  - The following Jira board gives an overview of the open tickets: Overview of Open CHM tickets
  - Checking for new tickets in the EGI CHM Jira queue, especially changes that require the CAB to be convened urgently, and, if so, informing the CHM Manager
  - Checking for new tickets in the EOSC-hub CHM Jira queue, especially changes to EGI services that may have an impact on other services, and being prepared to engage with the EOSC-hub CHM CAB to discuss such changes
- Verifying the status of GGUS tickets
  - Note that editing tickets requires the "GGUS supporter" role; see GGUS registration.
  - Reviewing tickets that are marked as URGENT or that have not been attended to in a timely manner
  - Ensuring that there are no tickets in the system created more than 3 years ago: such tickets are very likely unattended and need to be addressed properly. Use this search to list very old tickets ordered by creation date (oldest first).
    - The ISRM process defines two metrics, one for incidents older than 1 year and one for service requests older than 3 years, which should ideally be zero. They are collected every 4 months, and a report listing the tickets still open is created in the reports section of ISRM: follow up the tickets in that list.
- Checking the status of and managing ongoing tasks
  - Within the SDIS Jira project:
    - Check tasks that are overdue: move them forward when possible, or ping the people involved for a status update, and update their due date.
    - Check tasks that have no due date and add one if possible.
    - Check tasks that are not assigned to anybody and decide whether they can be worked on or should be closed.
    - When going through GGUS tickets or other activities, it can be useful to create a new task in Ops so that operators can discuss its management privately.
    - When a task is being handled, move it to the In progress status, and mark it as Done once finished.
  - Within the Ops support Asana board:
    - Tasks should be closed or moved to Jira.
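The GGUS age checks reduce to comparing creation dates against a cutoff. The minimal Python sketch below works on a hypothetical, simplified `Ticket` record (not the GGUS data model) and computes both the list of open tickets older than 3 years and the two ISRM age metrics. The `kind` labels and function names are assumptions for illustration, "older than N years" is taken to mean created strictly before the same calendar day N years ago, and retrieving the tickets from GGUS is out of scope.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical, simplified ticket record; the field names are illustrative
# and are not the GGUS data model.
@dataclass
class Ticket:
    ticket_id: int
    kind: str        # assumed labels: "incident" or "service_request"
    created: date
    is_open: bool


def years_before(ref: date, years: int) -> date:
    """Return the same calendar day `years` years earlier (29 Feb maps to 28 Feb)."""
    try:
        return ref.replace(year=ref.year - years)
    except ValueError:
        return ref.replace(year=ref.year - years, day=28)


def very_old_tickets(tickets, today: date, years: int = 3):
    """Open tickets created more than `years` years ago, oldest first."""
    cutoff = years_before(today, years)
    old = [t for t in tickets if t.is_open and t.created < cutoff]
    return sorted(old, key=lambda t: t.created)


def isrm_age_metrics(tickets, today: date) -> dict:
    """Count open incidents older than 1 year and open service requests older than 3 years."""
    incident_cutoff = years_before(today, 1)
    request_cutoff = years_before(today, 3)
    return {
        "incidents_older_than_1y": sum(
            1 for t in tickets
            if t.is_open and t.kind == "incident" and t.created < incident_cutoff
        ),
        "service_requests_older_than_3y": sum(
            1 for t in tickets
            if t.is_open and t.kind == "service_request" and t.created < request_cutoff
        ),
    }
```

Both metrics should ideally be zero; any ticket counted by `isrm_age_metrics` is one to follow up.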
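The weekly SDIS checks (overdue tasks, tasks without a due date, unassigned tasks) can be sketched as a simple grouping. This minimal Python sketch assumes a hypothetical, simplified `Task` record rather than the Jira issue model, uses made-up issue keys, and treats any status other than Done as unfinished; it only flags tasks, while the follow-up actions remain manual.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical, simplified view of an SDIS task; not the Jira issue model.
@dataclass
class Task:
    key: str
    status: str                     # e.g. "To Do", "In progress", "Done"
    due: Optional[date] = None
    assignee: Optional[str] = None


def triage(tasks, today: date) -> dict:
    """Group unfinished tasks by the weekly checks.

    A task can appear in more than one group.
    """
    active = [t for t in tasks if t.status != "Done"]
    return {
        # Overdue: move forward, ping people, update the due date.
        "overdue": [t.key for t in active if t.due is not None and t.due < today],
        # No due date: add one if possible.
        "no_due_date": [t.key for t in active if t.due is None],
        # Unassigned: decide whether to work on them or close them.
        "unassigned": [t.key for t in active if t.assignee is None],
    }
```

For instance, an unfinished task with neither a due date nor an assignee shows up under both `no_due_date` and `unassigned`.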
Duties at the end of the week
- Reviewing and updating the status of the week in the weekly report ticket created under the SDIS project
- Re-assigning the task to the next OD
Regional Operators on Duty (ROD)
The Regional Operators on Duty (ROD) team is responsible for solving problems on the infrastructure within its NGI according to agreed procedures. It ensures that problems are properly recorded and progress according to specified timelines, and that the necessary information is available to all parties. The team is provided by each NGI, and its work requires procedural knowledge of the process rather than technical skills. Depending on how an NGI is organized, the ROD team may have several members working on a duty roster (daily or weekly shifts), or one person may act as ROD on a daily basis, with a few deputies who take over when necessary. The latter model is generally more suitable for small NGIs.