Session 1: What is DevOps & Why It Matters

The Old World: Dev vs Ops

In traditional software companies, Development and Operations were completely separate teams with conflicting goals:

Development	Operations
Ship features fast	Keep systems stable
Move fast, break things	Don’t touch what’s working
Measured by feature delivery	Measured by uptime

The result? A wall between teams. Developers wrote code and threw it over to Ops. Ops had never seen the code. Deployments broke. Blame games followed. Customers waited.

What is DevOps?

DevOps is a culture and set of practices that brings Development and Operations together. It is NOT just a tool or a job title.

The goal: Deliver software faster AND more reliably - not one or the other, both.

The core idea: The team that builds the software is also responsible for running it in production. You build it, you run it.

The DevOps Lifecycle

Plan → Code → Build → Test → Release → Deploy → Operate → Monitor
  ↑                                                          |
  └──────────────────────────────────────────────────────────┘

This is a continuous loop, not a one-way street.

The CALMS Framework

CALMS captures what DevOps really means:

Culture

Shared responsibility between Dev and Ops
No more “that’s not my job”
Blameless environment - when things break, we learn together

Automation

If you do something manually more than twice, automate it
Deployments, testing, infrastructure setup - all automated
Reduces human error, increases speed

Lean

Work in small batches
Instead of releasing once a quarter with 500 changes, release daily with 5 changes
Small changes are easier to debug when something breaks
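
A rough calculation shows why batch size matters. The sketch below assumes every change independently carries the same small chance of causing a failure; the 1% figure and the function name are illustrative assumptions, not measured values.

```python
def release_failure_probability(changes: int, per_change_risk: float) -> float:
    """Probability that at least one change in a release causes a failure.

    Assumes changes fail independently with the same probability,
    which is a simplification for illustration only.
    """
    return 1.0 - (1.0 - per_change_risk) ** changes


# Illustrative assumption: each change has a 1% chance of breaking something.
PER_CHANGE_RISK = 0.01

small = release_failure_probability(5, PER_CHANGE_RISK)
large = release_failure_probability(500, PER_CHANGE_RISK)

print(f"5 changes:   {small:.1%} chance of a failure, 5 suspects to check")
print(f"500 changes: {large:.1%} chance of a failure, 500 suspects to check")
```

Under these assumptions the large release is almost certain to contain a failure, and the search for the cause covers a hundred times as many changes.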

Measurement

You can’t improve what you don’t measure
Track: deployment frequency, lead time, failure rate, recovery time
Data-driven decisions, not gut feelings
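
As a minimal sketch, the four metrics above can be computed from a simple deployment log. The `Deployment` record shape, the `summarize` function, and the sample data are hypothetical; a real team would pull these timestamps from its CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def summarize(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four metrics for a period of `period_days` days."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_recovery_time": median(recoveries) if recoveries else None,
    }


# Made-up sample: three deployments in one day, one of which failed.
start = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(
        start + timedelta(hours=3),
        start + timedelta(hours=4),
        failed=True,
        restored_at=start + timedelta(hours=4, minutes=30),
    ),
    Deployment(start + timedelta(hours=5), start + timedelta(hours=8)),
]

metrics = summarize(sample, period_days=1)
for name, value in metrics.items():
    print(f"{name}: {value}")
```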

Sharing

No knowledge silos
Documentation, shared runbooks
When something breaks, we share the learning (postmortems)

DevOps vs SRE vs Platform Engineering

These three terms are related but different:

DevOps

A cultural movement and philosophy
Says: “Dev and Ops should work together”
Broad principles and practices

SRE (Site Reliability Engineering)

Invented at Google
A specific discipline with concrete practices
Applies software engineering to operations problems
Key concepts: SLOs, error budgets, toil reduction
“What happens when you ask a software engineer to design an operations function”
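
To make SLOs and error budgets concrete, here is a small sketch of the arithmetic. It assumes an availability SLO measured over a 30-day window; the function names and the example numbers are illustrative, not taken from any specific tool.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for an availability SLO."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


# Example: a 99.9% availability SLO over 30 days.
budget = error_budget_minutes(0.999)
remaining = budget_remaining(0.999, downtime_minutes=10.8)

print(f"Error budget: {budget:.1f} minutes per 30 days")
print(f"Budget remaining after 10.8 minutes of downtime: {remaining:.0%}")
```

While budget remains, the team can keep shipping changes. Once it is spent, the usual practice is to prioritize reliability work until the service is back within its SLO.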

Platform Engineering

Building an internal self-service platform for developers
Developers don’t need to understand every detail of infrastructure
They push code, the platform handles the rest

How They Relate

	The Question It Answers
DevOps	How should Dev and Ops work together? (the culture)
SRE	How do we keep systems reliable? (the practice)
Platform Engineering	What do we build to scale this? (the product)

They don’t compete - they complement each other.

Key DevOps Principles

1. Automate Everything

Manual processes are slow, error-prone, and don’t scale
Automate builds, tests, deployments, infrastructure

2. Continuous Improvement

Always look for ways to improve
Measure → Identify bottleneck → Fix → Repeat

3. Fail Fast, Learn Fast

Small changes = small failures = easy to fix
Failures are learning opportunities, not blame opportunities

4. Infrastructure as Code

Treat infrastructure the same as application code
Version controlled, reviewed, tested, reproducible
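
The idea behind Infrastructure as Code can be sketched as comparing a declared desired state with what actually exists, then deriving the actions needed to close the gap. This is a simplified illustration with hypothetical resource names; it is not how Terraform or Ansible are implemented.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Return the (action, resource) steps that turn `actual` into `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Desired state lives in version control; actual state is what is running today.
desired = {"web": {"size": "medium"}, "db": {"size": "large"}}
actual = {"web": {"size": "small"}, "old-cache": {"size": "small"}}

for action, resource in plan(desired, actual):
    print(f"{action}: {resource}")

# Once actual matches desired, running the plan again yields no actions.
print(plan(desired, desired))
```

Because the plan is derived from the declared state, running it twice is safe: the second run finds nothing to change, which is what makes the setup reproducible.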

5. Monitoring and Feedback

Know the state of your systems at all times
Fast feedback loops for developers

The DevOps Toolchain Overview

This is what we’ll cover in upcoming sessions:

Stage	Tools	Session
Version Control	Git, GitHub	Session 2
CI/CD	Jenkins, GitHub Actions	Session 3
Infrastructure as Code	Terraform, Ansible	Session 4
Containers	Docker	Session 5
Orchestration	Kubernetes	Session 6
Monitoring	Prometheus, Grafana	Phase 2

Real-World Example: How DevOps Changes Everything

Without DevOps

Developer writes code for 3 weeks
Sends it to QA team - they test for 1 week, find 20 bugs
Developer fixes bugs for another week
Sends to Ops team for deployment
Ops deploys on a Saturday night (maintenance window)
Deployment fails - wrong config, missing dependency
Rollback. Start over.
Total time: 6+ weeks for one release

With DevOps

Developer writes a small change (few hours of work)
Pushes code → automated tests run immediately
Tests pass → automatically deployed to staging
Quick review → deployed to production
Monitoring confirms everything is healthy
Total time: same day
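
The flow above works because each step is a gate: the next step runs only if the previous one succeeded. A minimal sketch of that gating logic follows, with stand-in stage functions (the lambdas are placeholders for real test and deploy commands).

```python
from typing import Callable

# A stage is a name plus a function that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop at the first failure. Returns a log."""
    log = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later stages never run, so a bad change cannot reach production
    return log


healthy = run_pipeline([
    ("automated tests", lambda: True),
    ("deploy to staging", lambda: True),
    ("deploy to production", lambda: True),
    ("health check", lambda: True),
])

broken = run_pipeline([
    ("automated tests", lambda: False),
    ("deploy to staging", lambda: True),
])

print(healthy)
print(broken)
```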

The Numbers (Industry Benchmarks)

Metric	Traditional	DevOps (Elite)
Deployment frequency	Once per month	Multiple times per day
Lead time for changes	1-6 months	Less than 1 hour
Change failure rate	46-60%	0-15%
Recovery time	1 week - 1 month	Less than 1 hour

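Source: DORA (DevOps Research and Assessment) State of DevOps Reports. The "Traditional" column corresponds to DORA’s low-performer tier; exact thresholds vary between report years.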

Toil: The Enemy of Productivity

Toil is work that is:

Manual
Repetitive
Automatable
Reactive (not proactive)
Without lasting value
Growing linearly with the service

Examples of Toil

Toil	Automated Solution
Manually restarting crashed services	Auto-restart with health checks
Resizing disks when full	Auto-scaling storage
Rotating passwords every quarter	Automated secret rotation
Manually scaling servers before sales events	Auto-scaling policies
Manually running database backups every night	Managed automated backups
Checking if services are healthy	Automated monitoring + alerting

The rule of thumb (from Google’s SRE practice): keep toil below 50% of your time. If you spend more than half your time on toil, you can’t do the engineering work that eliminates future toil.
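
One way to check yourself against the 50% limit is to log a week of work and compute the toil share. The task names and hours below are made up for illustration, and which tasks count as toil is a judgment call each team has to make.

```python
TOIL_LIMIT = 0.5  # the 50% threshold from the rule above


def toil_share(hours_by_task: dict[str, float], toil_tasks: set[str]) -> float:
    """Fraction of logged hours spent on tasks classified as toil."""
    total = sum(hours_by_task.values())
    if total == 0:
        raise ValueError("no hours logged")
    toil = sum(hours for task, hours in hours_by_task.items() if task in toil_tasks)
    return toil / total


# Hypothetical week of work, in hours.
week = {
    "manual restarts": 6,
    "password rotation": 2,
    "manual scaling": 4,
    "feature work": 20,
    "automation projects": 8,
}
toil = {"manual restarts", "password rotation", "manual scaling"}

share = toil_share(week, toil)
status = "over" if share > TOIL_LIMIT else "within"
print(f"Toil share: {share:.0%} ({status} the limit)")
```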

Platform Engineering: The Golden Path

Instead of every team figuring out deployments on their own:

Without a Platform:

Team A uses shell scripts to deploy
Team B uses Terraform
Team C clicks around in AWS console
Team D has no monitoring
Total chaos

With a Platform:

# Developer just writes this:
name: my-service
language: python
port: 8000
database: postgres
replicas: 3

The platform handles everything: CI/CD, containers, networking, monitoring, security - all built-in with company standards.
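
As a rough sketch of what "handles everything" could mean, a platform might validate the developer's short manifest and fill in company-standard defaults. The default values, required fields, and the `expand_manifest` function are hypothetical and not part of any real platform.

```python
# Hypothetical company standards applied to every service.
PLATFORM_DEFAULTS = {
    "replicas": 2,
    "monitoring": True,
    "tls": True,
    "ci_pipeline": "standard",
}

# Fields the developer must always provide (an assumption for this sketch).
REQUIRED_FIELDS = ("name", "language", "port")


def expand_manifest(manifest: dict) -> dict:
    """Validate a service manifest and merge in platform defaults.

    Values the developer sets explicitly win over the defaults.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in manifest]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return {**PLATFORM_DEFAULTS, **manifest}


# The manifest from the example above, as a Python dict.
manifest = {
    "name": "my-service",
    "language": "python",
    "port": 8000,
    "database": "postgres",
    "replicas": 3,
}

expanded = expand_manifest(manifest)
for key, value in expanded.items():
    print(f"{key}: {value}")
```

The developer states only what is specific to the service; monitoring, TLS, and the CI pipeline arrive without anyone having to remember to ask for them.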

A good platform makes the right thing the easy thing.

Session 1 Key Takeaways

DevOps is a culture first, tools second
The CALMS framework: Culture, Automation, Lean, Measurement, Sharing
DevOps, SRE, and Platform Engineering complement each other
Small, frequent changes are safer than big, infrequent ones
Automate repetitive work (toil) to focus on engineering
You build it, you run it

Discussion Questions

Think about these before next session:

What does the current dev-to-production flow look like at your workplace?
Where are the biggest bottlenecks?
What manual tasks do you do repeatedly that could be automated?
How long does it take for a code change to reach production?

Next Session: Git in Practice - branching strategies, pull request workflows, and why Git is the foundation for everything in DevOps.