Agile Infrastructure

IT operations

differentiator

enabler

faster feedback

more flexible

Infrastructure renaissance

you can change faster

you can change easier

developers and operations can work together

Hero culture

Heroism is a virtue

Patching a live production system at 5 am

Bad mistakes

NO! Heroism is not good for operations

BECAUSE everyone takes for granted that you'll keep the machines up 24/7

Different environments

it works in the test environment but not in production

you cannot keep the test and production environments in sync

the dev environment should not differ much from the production environment

Done is deployed

Techniques

Version Control

Network configurations

System configurations

Application configurations

Application code

Database schema

Documentation

preferably executable

Anything that matters

Configuration Management

put systems into a known state

audit and enforce consistency

manage server lifecycle

reason about services, instead of systems

apply dev-test-prod cycle to infrastructure
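
A minimal sketch of "put systems into a known state" and "audit and enforce consistency", assuming a host can be modelled as a dictionary of settings. The names `Host`, `audit` and `enforce` are hypothetical and stand in for whatever configuration management tool is actually used.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """In-memory stand-in for a real machine (hypothetical)."""
    name: str
    settings: dict = field(default_factory=dict)


def audit(host, desired):
    """Return {key: (current, wanted)} for every setting that is off."""
    return {
        key: (host.settings.get(key), wanted)
        for key, wanted in desired.items()
        if host.settings.get(key) != wanted
    }


def enforce(host, desired):
    """Bring the host to the desired state and report what was changed."""
    changes = audit(host, desired)
    for key, (_, wanted) in changes.items():
        host.settings[key] = wanted
    return changes


desired_web = {"ntp": "enabled", "max_connections": 512}
web1 = Host("web1", {"ntp": "disabled"})

first_run = enforce(web1, desired_web)   # two settings corrected
second_run = enforce(web1, desired_web)  # already in the known state
```

Running `enforce` twice is safe: the second run finds nothing to change, which is what makes it usable for both auditing and repair.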

Build from source

automated provisioning and deployment of services

roll config changes forward (dev-test-prod)

dev, test and prod not out of sync

no one edits config files by hand; they are pulled automatically from svn

test from a known state

setup process

scaling

building infrastructure is not a big manual process

speed of thought

disaster recovery

One step deploy

one automated process from version control to live services

one process for devs, testers and IT operations; across all environments

computers are really good at running the same commands over and over

you DO NOT want to have manual scripts

having people deploy manually is immoral

manual deployment is error-prone

lower the fixed cost of deploy
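
A rough sketch of the one-step idea, assuming deployment can be reduced to an ordered list of commands. The function `deploy`, the step names and the revision `r1234` are all hypothetical; the point is only that the environment is a parameter while the process stays the same.

```python
def deploy(revision, environment, run=print):
    """Run the same ordered steps for any environment; return the steps run."""
    steps = [
        f"checkout {revision}",
        f"build {revision}",
        f"configure for {environment}",
        f"install on {environment}",
        f"smoke-test {environment}",
    ]
    for step in steps:
        run(step)  # a real runner would execute the command, not just record it
    return steps


# Same call for every environment; only the target name changes.
test_log, prod_log = [], []
deploy("r1234", "test", run=test_log.append)
deploy("r1234", "prod", run=prod_log.append)
```

Because devs, testers and ops all call the same entry point, the path to production has already been exercised many times before it is used on production.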

Monitoring

what does 'normal' look like?

Feedback

You have to know what 'green' looks like to know what 'red' looks like

don't just look at the data when things are bad

need baseline chart, trends

test driven?
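
One way to make "know what normal looks like" concrete, sketched under the assumption that a metric's history is a plain list of numbers. The function `is_abnormal`, the latency figures and the three-standard-deviation tolerance are illustrative choices, not something the notes prescribe.

```python
from statistics import mean, stdev


def is_abnormal(history, value, tolerance=3.0):
    """Flag a value that strays too far from the recorded baseline."""
    baseline = mean(history)
    spread = stdev(history)  # needs at least two data points
    return abs(value - baseline) > tolerance * spread


# Baseline collected while everything was 'green' (made-up numbers).
normal_latency_ms = [102, 98, 101, 99, 100, 103, 97]

slightly_slow = is_abnormal(normal_latency_ms, 104)
clearly_broken = is_abnormal(normal_latency_ms, 250)
```

The baseline only exists if the data is collected when things are fine, not just when they are bad.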

Continuous Integration

test new builds

assert services are running

run functional tests
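
A small sketch of "assert services are running", assuming a service counts as up when it accepts a TCP connection. The helper name `service_is_up` is hypothetical; a real check would usually go further and make an actual request.

```python
import socket


def service_is_up(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat all of them as 'down'.
        return False
```

In a build, a call such as `assert service_is_up("127.0.0.1", 8080)` (address and port are placeholders) would fail the build as soon as the freshly deployed service does not answer.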

Deploy early and often

there should be no ritual

the ceremony is a waterfall process

Tag everything

who?

what?

when?

synchronization - get all machines sync'd

Correlate

gives the same power as a failed test in TDD: you know exactly why something is wrong
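
A sketch of how tagged changes can be correlated with an incident, assuming every change is recorded with who, what and when. The function `changes_before`, the 30-minute window and the sample events are invented for illustration.

```python
from datetime import datetime, timedelta


def changes_before(events, incident_time, window=timedelta(minutes=30)):
    """Return the tagged changes made shortly before an incident."""
    return [
        event for event in events
        if incident_time - window <= event["when"] <= incident_time
    ]


events = [
    {"who": "alice", "what": "deploy r1234", "when": datetime(2010, 1, 1, 4, 50)},
    {"who": "bob", "what": "edit firewall rule", "when": datetime(2010, 1, 1, 1, 0)},
]

incident = datetime(2010, 1, 1, 5, 0)
suspects = changes_before(events, incident)
```

This only works if the timestamps are comparable, which is why the machines' clocks need to agree.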

Information radiators

share metrics

dev and ops see the same thing, in the same place

helps two groups (dev and ops) to have a conversation

Share the repository

keep configs in sync with application code

everyone knows where to look

everyone sees everyone else working

minimize surprise

ops can see the work devs are doing

conversations early in the process

get rid of ceremony, pagers, antagonism, etc.

boundary objects

Always ship trunk

Configuration drift

inconsistencies between machines (2, 4, 8, 100?)

mistakes

confusion

changes are painful
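
Drift can be detected mechanically. A minimal sketch, assuming the contents of the same config file have already been collected from each machine into a dictionary; the function `find_drift` and the machine names are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_drift(configs):
    """Group machines by config fingerprint, largest group first.

    configs maps machine name -> config file contents.
    More than one group means the machines have drifted apart.
    """
    groups = defaultdict(list)
    for machine, text in sorted(configs.items()):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(machine)
    return sorted(groups.values(), key=len, reverse=True)


collected = {
    "web1": "max_connections=512\n",
    "web2": "max_connections=512\n",
    "web3": "max_connections=256\n",  # someone edited this one by hand
}

groups = find_drift(collected)
```

Here `groups` has two entries, so `web3` stands out as the inconsistent machine.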

Agile for development; waterfall for deployment?!

Communication between the dev team and the ops team is mediated by a ticket system

Operations are stakeholders!

Non-functional requirements

Your site cannot be down - you are losing money!

The mystery machine

The machine in the corner that everyone is afraid to turn off, but no one knows why it's on

Infrastructure is code

API driven abstraction (cloud computing, etc.)

Infrastructure is application

Fail happens

Questions

Can you afford to be down?

How long?

How fast can you be back up?

Fail safe

Try not to cause it

Practice makes perfect

"Out of the window" test

Fire drills

Go and unplug your system :D

Try different failure cases

Be confident

Culture

There is only us

Learning and respect

Work together

Manage flow

Planning for fires is hard

The best way to fight fires is never let them get started

You are not as special as you think you are