Phase 1, Session 1 Handout

Session 1: What is DevOps & Why It Matters


The Old World: Dev vs Ops

In traditional software companies, Development and Operations were completely separate teams with conflicting goals:

Development                  | Operations
Ship features fast           | Keep systems stable
Move fast, break things      | Don’t touch what’s working
Measured by feature delivery | Measured by uptime

The result? A wall between teams. Developers wrote code and threw it over to Ops. Ops had never seen the code. Deployments broke. Blame games followed. Customers waited.


What is DevOps?

DevOps is a culture and set of practices that brings Development and Operations together. It is NOT just a tool or a job title.

The goal: Deliver software faster AND more reliably - not one or the other, both.

The core idea: The team that builds the software is also responsible for running it in production. You build it, you run it.

The DevOps Lifecycle

Plan → Code → Build → Test → Release → Deploy → Operate → Monitor
  ↑                                                          |
  └──────────────────────────────────────────────────────────┘

This is a continuous loop, not a one-way street.


The CALMS Framework

CALMS captures what DevOps really means:

Culture

  • Shared responsibility between Dev and Ops
  • No more “that’s not my job”
  • Blameless environment - when things break, we learn together

Automation

  • If you do something manually more than twice, automate it
  • Deployments, testing, infrastructure setup - all automated
  • Reduces human error, increases speed

Lean

  • Work in small batches
  • Instead of releasing once a quarter with 500 changes, release daily with 5 changes
  • Small changes are easier to debug when something breaks
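
One way to see why small batches are easier to debug: when a release breaks, the batch is the search space for the culprit. The sketch below is only an illustration, assuming changes are applied in order and that a hypothetical `is_broken(n)` check can tell whether the system is broken after the first n changes. The culprit positions (137 and 3) are made up.

```python
from typing import Callable, Tuple


def find_first_bad_change(num_changes: int,
                          is_broken: Callable[[int], bool]) -> Tuple[int, int]:
    """Binary-search for the first change that breaks the system.

    is_broken(n) is a hypothetical check: True if the system is broken
    after applying the first n changes (1-based). Assumes the system was
    healthy before the batch and is broken after the whole batch.
    Returns (index of the first bad change, number of checks performed).
    """
    lo, hi = 1, num_changes  # the first bad change lies in [lo, hi]
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        checks += 1
        if is_broken(mid):
            hi = mid       # culprit is at mid or earlier
        else:
            lo = mid + 1   # culprit is after mid
    return lo, checks


# Quarterly release: 500 changes, change 137 is the culprit (made-up number).
quarterly = find_first_bad_change(500, lambda n: n >= 137)
# Daily release: 5 changes, change 3 is the culprit (made-up number).
daily = find_first_bad_change(5, lambda n: n >= 3)
print(quarterly, daily)  # (137, 9) (3, 2)
```

Even with an efficient search, the large batch needs 9 build-and-test rounds here against 2 for the small one, and in practice each round on a large batch is itself slower.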

Measurement

  • You can’t improve what you don’t measure
  • Track: deployment frequency, lead time, failure rate, recovery time
  • Data-driven decisions, not gut feelings
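
The four metrics above can be computed from a plain log of deployments. This is a minimal sketch, not a standard tool: the `Deployment` record shape, the `summarize` function, and the sample data are all assumptions made for illustration, and recovery time is measured from the failed deployment to restoration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def summarize(deploys: List[Deployment], period_days: int) -> dict:
    """Compute the four metrics over a reporting period."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Assumption: the outage starts at the failed deployment.
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "avg_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "avg_recovery_time": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Made-up sample data: four deployments over two days, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0 + timedelta(hours=5), t0 + timedelta(hours=9),
               failed=True, restored_at=t0 + timedelta(hours=9, minutes=30)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=1)),
    Deployment(t0 + timedelta(days=1, hours=3), t0 + timedelta(days=1, hours=4)),
]
report = summarize(sample, period_days=2)
print(report)
```

For this sample the report shows 2 deployments per day, an average lead time of 2 hours, a 25% change failure rate, and a 30-minute average recovery time.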

Sharing

  • No knowledge silos
  • Documentation, shared runbooks
  • When something breaks, we share the learning (postmortems)

DevOps vs SRE vs Platform Engineering

These three terms are related but different:

DevOps

  • A cultural movement and philosophy
  • Says: “Dev and Ops should work together”
  • Broad principles and practices

SRE (Site Reliability Engineering)

  • Invented at Google
  • A specific discipline with concrete practices
  • Applies software engineering to operations problems
  • Key concepts: SLOs, error budgets, toil reduction
  • Ben Treynor Sloss of Google: SRE is “what happens when you ask a software engineer to design an operations team”
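
To make SLOs and error budgets concrete: an availability SLO leaves a fixed allowance of unavailability per window, and that allowance is the error budget. The sketch below assumes a simple time-based availability SLO; the function names and the numbers (99.9% over 30 days, 20 minutes of downtime) are illustrative only.

```python
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, window_days: int, downtime_minutes: float) -> float:
    """Minutes of budget left; a negative value means the SLO was missed."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


# Illustrative numbers: a 99.9% SLO over 30 days, with 20 minutes of downtime so far.
budget = error_budget_minutes(0.999, 30)
left = budget_remaining(0.999, 30, downtime_minutes=20)
print(f"budget: {budget:.1f} min, remaining: {left:.1f} min")
# prints "budget: 43.2 min, remaining: 23.2 min"
```

While budget remains, the team can keep shipping changes; once it is spent, the usual response is to slow releases and prioritize reliability work.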

Platform Engineering

  • Building an internal self-service platform for developers
  • Developers don’t need to understand every detail of infrastructure
  • They push code and the platform handles the rest

How They Relate

Approach             | The Question It Answers
DevOps               | How should Dev and Ops work together? (the culture)
SRE                  | How do we keep systems reliable? (the practice)
Platform Engineering | What do we build to scale this? (the product)

They don’t compete - they complement each other.


Key DevOps Principles

1. Automate Everything

  • Manual processes are slow, error-prone, and don’t scale
  • Automate builds, tests, deployments, infrastructure

2. Continuous Improvement

  • Always look for ways to improve
  • Measure → Identify bottleneck → Fix → Repeat

3. Fail Fast, Learn Fast

  • Small changes = small failures = easy to fix
  • Failures are learning opportunities, not blame opportunities

4. Infrastructure as Code

  • Treat infrastructure the same as application code
  • Version controlled, reviewed, tested, reproducible
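
The core idea behind Infrastructure as Code can be sketched as: describe the desired state as data, compare it with what actually exists, and derive the actions needed. The code below is a toy illustration only. The `plan` function and the resource names are hypothetical, and real tools such as Terraform do far more (state tracking, dependency ordering, provider APIs).

```python
from typing import Dict, List

Resource = Dict[str, object]  # e.g. {"size": "small", "count": 3}


def plan(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> List[str]:
    """List the actions that would bring `actual` in line with `desired`."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(f"create {name}")
        elif name not in desired:
            actions.append(f"delete {name}")
        elif desired[name] != actual[name]:
            actions.append(f"update {name}")
        # identical resources need no action
    return actions


# The desired state lives in version control; the actual state is what is running.
desired = {
    "web": {"size": "small", "count": 3},
    "db": {"size": "large", "count": 1},
}
actual = {
    "web": {"size": "small", "count": 2},
    "cache": {"size": "small", "count": 1},
}
print(plan(desired, actual))  # ['delete cache', 'create db', 'update web']
```

Because the desired state is just a file, it can be reviewed, tested, and reproduced like any other code change.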

5. Monitoring and Feedback

  • Know the state of your systems at all times
  • Fast feedback loops for developers

The DevOps Toolchain Overview

This is what we’ll cover in upcoming sessions:

Stage                  | Tools                   | Session
Version Control        | Git, GitHub             | Session 2
CI/CD                  | Jenkins, GitHub Actions | Session 3
Infrastructure as Code | Terraform, Ansible      | Session 4
Containers             | Docker                  | Session 5
Orchestration          | Kubernetes              | Session 6
Monitoring             | Prometheus, Grafana     | Phase 2

Real-World Example: How DevOps Changes Everything

Without DevOps

  1. Developer writes code for 3 weeks
  2. Sends it to QA team - they test for 1 week, find 20 bugs
  3. Developer fixes bugs for another week
  4. Sends to Ops team for deployment
  5. Ops deploys on a Saturday night (maintenance window)
  6. Deployment fails - wrong config, missing dependency
  7. Rollback. Start over.
  8. Total time: 6+ weeks for one release

With DevOps

  1. Developer writes a small change (few hours of work)
  2. Pushes code → automated tests run immediately
  3. Tests pass → automatically deployed to staging
  4. Quick review → deployed to production
  5. Monitoring confirms everything is healthy
  6. Total time: same day
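
The flow above can be sketched as a pipeline of stages that run in order and stop at the first failure, so a change that fails its tests never reaches staging or production. This is an illustrative sketch: the stage names mirror the steps above, and the stage functions are hypothetical stand-ins for a real test runner and deploy tooling.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]  # (stage name, step returning True on success)


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order; stop at the first failure so nothing broken moves on."""
    log = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            log.append("pipeline stopped")
            break
    return log


def make_stages(tests_pass: bool) -> List[Stage]:
    """Hypothetical stages; real ones would call a test runner, deploy tool, etc."""
    return [
        ("run automated tests", lambda: tests_pass),
        ("deploy to staging", lambda: True),
        ("review", lambda: True),
        ("deploy to production", lambda: True),
        ("check monitoring", lambda: True),
    ]


good_run = run_pipeline(make_stages(tests_pass=True))
bad_run = run_pipeline(make_stages(tests_pass=False))
print(good_run)
print(bad_run)
```

In the failing run the log ends after the test stage, which is the fast feedback the developer gets within minutes of pushing.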

The Numbers (Industry Benchmarks)

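Metric                | Traditional (low performers) | DevOps (elite performers)
Deployment frequency  | Once per month               | Multiple times per day
Lead time for changes | 1-6 months                   | Less than 1 hour
Change failure rate   | 46-60%                       | 0-15%
Recovery time         | 1 week - 1 month             | Less than 1 hour

Source: DORA (DevOps Research and Assessment) State of DevOps Reports. Exact thresholds vary between report years, so treat these figures as approximate.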


Toil: The Enemy of Productivity

Toil is work that is:

  • Manual
  • Repetitive
  • Automatable
  • Reactive (not proactive)
  • Without lasting value
  • Linear in effort as the service grows

Examples of Toil

Toil                                         | Automated Solution
Manually restarting crashed services         | Auto-restart with health checks
Resizing disks when full                     | Auto-scaling storage
Rotating passwords every quarter             | Automated secret rotation
Manually scaling servers before sales events | Auto-scaling policies
Running database backups every night         | Managed automated backups
Checking if services are healthy             | Automated monitoring + alerting
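
As an illustration of the first row, auto-restart with health checks boils down to a loop that polls a health check and restarts the service when it fails. The sketch below is an assumption-laden toy: `supervise` and `FakeService` are hypothetical names, and a real supervisor would wait between checks and run indefinitely rather than for a fixed number of checks.

```python
from typing import Callable


def supervise(is_healthy: Callable[[], bool], restart: Callable[[], None],
              checks: int, max_restarts: int = 3) -> int:
    """Poll a health check and restart on failure; return the number of restarts.

    A real supervisor would sleep between checks and run indefinitely; this
    sketch runs a fixed number of checks so it terminates.
    """
    restarts = 0
    for _ in range(checks):
        if is_healthy():
            continue
        if restarts >= max_restarts:
            # Repeated crashes usually need a person, not another restart.
            raise RuntimeError("restart limit reached; escalate to on-call")
        restart()
        restarts += 1
    return restarts


class FakeService:
    """Stand-in for a real service: reports unhealthy on the 2nd and 4th check."""

    def __init__(self) -> None:
        self.checks_seen = 0
        self.restart_count = 0

    def is_healthy(self) -> bool:
        self.checks_seen += 1
        return self.checks_seen not in (2, 4)

    def restart(self) -> None:
        self.restart_count += 1


service = FakeService()
restarts = supervise(service.is_healthy, service.restart, checks=5)
print(f"restarts performed: {restarts}")  # restarts performed: 2
```

The restart limit is a deliberate design choice: automation handles the routine crash, while a service that keeps failing is escalated instead of being restarted forever.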

The rule (from Google's SRE practice): keep toil below 50% of each engineer's time. If you spend more than half your time on toil, you can’t do the engineering work that eliminates future toil.
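
Checking the 50% rule only needs a rough log of where the hours went. A minimal sketch, assuming a hypothetical weekly log keyed by task name and a hand-maintained set of tasks classified as toil (all numbers are made up):

```python
from typing import Dict, Set


def toil_share(hours_by_task: Dict[str, float], toil_tasks: Set[str]) -> float:
    """Fraction of logged hours spent on tasks classified as toil."""
    total = sum(hours_by_task.values())
    if total == 0:
        return 0.0
    toil = sum(hours for task, hours in hours_by_task.items() if task in toil_tasks)
    return toil / total


# Made-up week: 40 logged hours, 22 of them on toil.
week = {
    "manual restarts": 6,
    "password rotation": 4,
    "ticket triage": 12,
    "automation project": 14,
    "design review": 4,
}
toil = {"manual restarts", "password rotation", "ticket triage"}
share = toil_share(week, toil)
print(f"toil share: {share:.0%}, over the limit: {share > 0.5}")
# prints "toil share: 55%, over the limit: True"
```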


Platform Engineering: The Golden Path

Instead of every team figuring out deployments on their own:

Without a Platform:

  • Team A uses shell scripts to deploy
  • Team B uses Terraform
  • Team C clicks around in AWS console
  • Team D has no monitoring
  • Total chaos

With a Platform:

# Developer just writes this:
name: my-service
language: python
port: 8000
database: postgres
replicas: 3

The platform handles everything: CI/CD, containers, networking, monitoring, security - all built-in with company standards.

A good platform makes the right thing the easy thing.
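
Behind a manifest like the one above, a platform typically validates what the developer wrote and fills in company standards for everything else. The sketch below shows that idea only. The required fields, the default values, and the `expand_manifest` function are all hypothetical, not the behavior of any particular platform.

```python
from typing import Any, Dict

# Hypothetical company standards applied to every service.
PLATFORM_DEFAULTS: Dict[str, Any] = {"replicas": 2, "monitoring": True, "tls": True}
# Hypothetical minimum a developer must provide.
REQUIRED_FIELDS = ("name", "language", "port")


def expand_manifest(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a service manifest and fill in platform defaults."""
    missing = [field for field in REQUIRED_FIELDS if field not in manifest]
    if missing:
        raise ValueError(f"manifest is missing required fields: {missing}")
    # Developer-supplied values override defaults; everything else is standard.
    return {**PLATFORM_DEFAULTS, **manifest}


# The manifest from the handout, written as a Python dict.
manifest = {
    "name": "my-service",
    "language": "python",
    "port": 8000,
    "database": "postgres",
    "replicas": 3,
}
service_config = expand_manifest(manifest)
print(service_config)
```

The developer's `replicas: 3` is kept, while monitoring and TLS are switched on without the developer having to ask for them.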


Session 1 Key Takeaways

  1. DevOps is a culture first, tools second
  2. The CALMS framework: Culture, Automation, Lean, Measurement, Sharing
  3. DevOps, SRE, and Platform Engineering complement each other
  4. Small, frequent changes are safer than big, infrequent ones
  5. Automate repetitive work (toil) to focus on engineering
  6. You build it, you run it

Discussion Questions

Think about these before next session:

  1. What does the current dev-to-production flow look like at your workplace?
  2. Where are the biggest bottlenecks?
  3. What manual tasks do you do repeatedly that could be automated?
  4. How long does it take for a code change to reach production?

Next Session: Git in Practice - branching strategies, pull request workflows, and why Git is the foundation for everything in DevOps.