C3 Training Strategy

This guide is for the C3 team. It covers daily tasks, the documents to follow and maintain, and the standard operating procedures (SOPs) for effective monitoring and issue resolution.


Roles and Responsibilities

  1. Daily Monitoring Tasks:

    • Monitor Build Status:

      • Trigger builds based on tickets and update their status.
      • Investigate build failures:
        • If the issue is covered by your SOPs, resolve it and update the ticket.
        • If it is not covered or remains unresolved, escalate it to the developer or DevOps team with a detailed report.
      • Ensure build pipelines are operational and nightly builds complete without errors.
    • Check Website Status:

      • Ensure all websites and services are running smoothly.
      • Use the monitoring dashboard to verify uptime.
      • Record any problems in the Incident Log Sheet and create tickets if needed.
    • Check Domain Certificates:

      • Verify all domain certificates are valid and not close to expiring.
      • Flag certificates approaching expiration so they can be renewed in time.
    • Monitor Production Web Servers:

      • Use monitoring tools to review server health and performance metrics like CPU, memory, disk usage, and network traffic.
      • Ensure servers are online and responsive, addressing any downtime promptly.
    • Monitor Server and Application Health:

      • Check server metrics such as CPU usage, memory, disk space, and network activity.
      • Ensure applications are functioning without errors.
    • Report Issues:

      • Create a ticket in Elitical for each identified issue.
      • Include details like issue description, affected systems, severity, and steps taken so far.
      • Assign the ticket to the appropriate team and follow up until resolved.
    • Provide Technical Support:

      • Respond to technical support calls or emails, logging details in the Technical Call Log Sheet.
      • Perform first-level troubleshooting or escalate complex issues as needed.
    • Complete Routine Operations:

      • Perform standard tasks listed in the Operations Checklist.
      • Document completed tasks in the Daily Operations Sheet.
    • Incident Escalation and Resolution:

      • Follow up on escalated issues using the Escalation Sheet.
      • Work with internal teams to resolve critical incidents and update stakeholders.
  2. Documentation and Reporting:

    • Maintain an Attendance Sheet to log shift entries.
    • Update the Monitoring Sheet with daily checks in real time and review it at the end of each shift.
    • Generate reports for requested services and document resolutions for recurring problems.
  3. Observability and Alerting:

    • Use tools like Grafana or Prometheus to monitor alerts.
    • Follow the Observability Document for predefined resolutions.
    • Update the document when resolving new issues.
  4. Task Review and Analysis:

    • Analyze assigned tasks and use SOPs to resolve them.
    • Document the steps taken for future reference.
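
The certificate check in the daily monitoring tasks can be partly automated. The sketch below reads a site's certificate expiry date and flags it when renewal is near. The 30-day window (DEFAULT_WARN_DAYS) and all function names are assumptions made for illustration; this guide does not define what "close to expiring" means, so the team should set its own threshold.

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional

# Assumption: 30 days is a placeholder renewal window, not a value from this guide.
DEFAULT_WARN_DAYS = 30


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the certificate's notAfter string, e.g. 'Jan  1 00:00:00 2030 GMT'.

    The default context verifies the certificate, so an expired or invalid
    certificate raises an ssl.SSLError during the handshake instead.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[datetime] = None) -> int:
    """Whole days between `now` (UTC) and the certificate expiry time."""
    now = now or datetime.now(timezone.utc)
    expiry_seconds = ssl.cert_time_to_seconds(not_after)
    expiry = datetime.fromtimestamp(expiry_seconds, tz=timezone.utc)
    return int((expiry - now).total_seconds() // 86400)


def needs_renewal(days_left: int, warn_days: int = DEFAULT_WARN_DAYS) -> bool:
    """True when the certificate has expired or expires within the window."""
    return days_left <= warn_days
```

For a reachable host, needs_renewal(days_until_expiry(fetch_not_after("example.com"))) would return True when the certificate should be noted for renewal; example.com is a placeholder domain.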

Standard Operating Procedures (SOPs)

  1. Alert Resolution:

    • Check monitoring tools for alert details.
    • Refer to the Observability Document for solutions.
    • If no documented solution exists, resolve the problem and record the steps in the Observability Document.
  2. Daily Monitoring:

    • Focus on website uptime, server health, and SSL certificates.
    • Perform regular checks during shifts to prevent issues.
  3. Escalation Workflow:

    • If an issue is beyond your expertise:
      • Collect logs and details.
      • Escalate to the appropriate team with a clear report.
  4. Documentation Updates:

    • Regularly update attendance, monitoring, and observability documents.
    • Ensure all documents are easy to access and understand.
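
One way to keep escalation reports consistent is to build them from a fixed template. The sketch below collects the details this guide asks for (issue description, affected systems, severity, steps taken so far, and logs) and renders them as plain text. The class name, field names, and output layout are hypothetical; they are not an existing team format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EscalationReport:
    """Hypothetical container for the details an escalation should carry."""

    description: str
    affected_systems: List[str]
    severity: str
    steps_taken: List[str] = field(default_factory=list)
    log_excerpts: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of fields that are still empty, in declaration order."""
        values = {
            "description": self.description.strip(),
            "affected_systems": self.affected_systems,
            "severity": self.severity.strip(),
            "steps_taken": self.steps_taken,
            "log_excerpts": self.log_excerpts,
        }
        return [name for name, value in values.items() if not value]

    def render(self) -> str:
        """Plain-text report suitable for pasting into a ticket or email."""
        lines = [
            f"Issue: {self.description}",
            f"Severity: {self.severity}",
            "Affected systems: " + ", ".join(self.affected_systems),
            "Steps taken so far:",
        ]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps_taken, start=1)]
        lines.append("Log excerpts:")
        lines += [f"  {line}" for line in self.log_excerpts]
        return "\n".join(lines)
```

Checking missing_fields() before sending makes it harder to escalate with an incomplete report; whether an empty field should block the escalation is left to the team.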

Tools and Resources

  • Monitoring Tools:

    • Grafana
    • Prometheus
  • Key Metrics to Monitor:

    • Build status: Ensure all builds complete without errors.
    • Website uptime: Confirm all websites are online and accessible.
    • Resource utilization: Monitor CPU, memory, and disk usage of the nodes.
  • Documentation:

    • Observability Document for Alert Resolutions
    • Daily Monitoring and Attendance Sheets
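
As a complement to the Grafana dashboards, active alerts can also be read from the Prometheus HTTP API. The sketch below queries the /api/v1/alerts endpoint and keeps only alerts in the firing state. The server address is an assumption and must be replaced with the team's own; the severity label and summary annotation are common conventions that may not be set on every alert rule.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen

# Assumption: placeholder address; replace with the real Prometheus server.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_alerts(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> Dict[str, Any]:
    """GET /api/v1/alerts and return the decoded JSON body."""
    with urlopen(f"{base_url}/api/v1/alerts", timeout=timeout) as response:
        return json.load(response)


def firing_alerts(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return name, severity, and summary for each alert in the 'firing' state."""
    alerts = payload.get("data", {}).get("alerts", [])
    rows = []
    for alert in alerts:
        if alert.get("state") != "firing":
            continue  # skip pending alerts
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        rows.append({
            "name": labels.get("alertname", "unknown"),
            # 'severity' and 'summary' are conventions, so fall back to defaults.
            "severity": labels.get("severity", "unspecified"),
            "summary": annotations.get("summary", ""),
        })
    return rows
```

Calling firing_alerts(fetch_alerts()) at the start of a shift would give a short list to compare against the Observability Document.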

Final Notes

  • Consistency in monitoring and documentation helps avoid critical issues.
  • Update documents during and after shifts to maintain accurate records.
  • Regular training and reviews keep the team up to date on processes and tools.