    Lesson 37 • Advanced

    Automation & Scripting for DevOps and System Tasks

    Master Python automation for infrastructure management, deployment pipelines, monitoring, backups, and production system orchestration

    What You'll Learn

    • File and directory automation for logs, backups, and cleanup
    • System command execution with subprocess
    • Task scheduling and cron alternatives
    • Server health monitoring and metrics collection
    • API and webhook automation for CI/CD
    • Docker container automation and management
    • Kubernetes deployment automation
    • Automated backups and data rotation
    • Log processing and real-time monitoring
    • Zero-downtime deployment scripts
    • Infrastructure-as-Code patterns
    • Building custom orchestration tools

    Why Python for DevOps Automation?

    Python is widely used for DevOps automation, often replacing long shell scripts with code that is easier to test, debug, and maintain.

    Feature            Bash Scripts                  Python Automation
    Error handling     Cryptic exit codes            try/except with clear messages
    Cross-platform     Mostly Linux/macOS            Linux, macOS, and Windows
    API integration    Usually shells out to curl    Libraries such as requests and boto3
    Maintainability    Hard to read at scale         Clean, testable code

    Common Use Cases

    • Deployment automation and orchestration
    • Server provisioning and configuration
    • Backup and disaster recovery
    • Log aggregation and analysis
    • Monitoring and alerting
    • Infrastructure health checks
    • Secret rotation and security hardening

    File & Directory Automation

    Every DevOps workflow involves managing files: rotating logs, cleaning temporary data, synchronizing directories, and organizing backups.

    Common Tasks

    • Cleanup - Remove old temporary files and logs
    • Log rotation - Compress and archive logs when they exceed size limits
    • Directory sync - Keep backup directories in sync
    • Backup management - Create and rotate backups automatically
    • File monitoring - Watch for changes and trigger actions

    Real-World Example

    A production CI server runs a cleanup script every hour to remove build artifacts older than 7 days, preventing disk space exhaustion. This same pattern applies to log management, cache cleanup, and temporary file handling.
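The cleanup pattern described above can be sketched roughly as follows. This is a minimal illustration, not a production tool: the `dry_run` flag and the returned list are assumptions added here so the effect can be inspected before anything is deleted.

```python
import time
from pathlib import Path
from typing import List


def cleanup_old_files(directory: str, days: int = 7, dry_run: bool = True) -> List[Path]:
    """Return (and, unless dry_run is set, delete) files older than `days` days."""
    cutoff = time.time() - days * 86400  # 86400 seconds per day
    removed = []
    for path in Path(directory).rglob("*"):
        # Compare each file's last-modification time against the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed
```

Running it once with `dry_run=True` and logging the result is a cheap way to confirm the age threshold before scheduling the real deletion.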

    System Command Execution

    The subprocess module provides safe, controlled execution of system commands with proper error handling and timeout management.

    Best Practices

    • Always use lists - ["ls", "-la"] not "ls -la"
    • Set timeouts - Prevent hanging on unresponsive commands
    • Capture output - Capture stdout/stderr for logging and debugging
    • Check return codes - Non-zero means failure
    • Avoid shell=True - Without a shell, arguments cannot be reinterpreted as shell syntax, which blocks shell injection
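A small wrapper can apply these practices in one place. The sketch below is one possible shape; the function name `run_command` and the fallback codes 124 and 127 are choices made for this illustration (they mirror common shell conventions) rather than anything `subprocess` defines.

```python
import subprocess
import sys
from typing import List, Tuple


def run_command(cmd: List[str], timeout: float = 30.0) -> Tuple[int, str, str]:
    """Run `cmd` without a shell and return (returncode, stdout, stderr)."""
    try:
        result = subprocess.run(
            cmd,                  # a list, so no shell parsing takes place
            capture_output=True,  # keep stdout/stderr for logging and debugging
            text=True,            # decode bytes to str
            timeout=timeout,      # avoid hanging on an unresponsive command
        )
    except subprocess.TimeoutExpired:
        return 124, "", f"timed out after {timeout}s"  # 124 is an arbitrary convention here
    except FileNotFoundError as exc:
        return 127, "", str(exc)  # executable not found
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    code, out, err = run_command([sys.executable, "--version"])
    print(code, out.strip() or err.strip())
```

Callers then check the first element of the tuple: anything non-zero is treated as a failure.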

    Common Operations

    • Restarting systemd services
    • Checking service status
    • Running Docker and Kubernetes commands
    • Executing build and deployment scripts
    • Managing SSH connections

    Task Scheduling

    Cron handles fixed schedules well, but it offers no built-in retries, shared state, or coordination across machines. Python provides flexible alternatives.

    Tool           Best For                    Complexity
    Cron           Simple recurring scripts    Low
    APScheduler    In-process scheduling       Medium
    Celery Beat    Distributed, high-volume    High
    Scheduling Options

    Traditional Cron

    0 2 * * * /usr/bin/python3 /scripts/backup.py

    Simple but limited

    APScheduler (Python)

    Runs inside your Python process: cron, interval, and date triggers, thread or process pool executors, persistent job stores, and event listeners

    Celery Beat

    Scheduler for the Celery distributed task queue: it enqueues periodic tasks that workers on any machine can pick up

    Typical Scheduled Tasks

    • Daily database backups at 2 AM
    • Log rotation every 6 hours
    • Health checks every 5 minutes
    • Cleanup scripts at midnight
    • Certificate renewal checks weekly
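To make the idea of an in-process scheduler with retries concrete, here is a minimal sketch using only the standard library's `sched` module. It is a stand-in for illustration, not APScheduler or Celery Beat; the helper names `run_with_retry` and `schedule_every` are hypothetical.

```python
import sched
import time
from typing import Callable


def run_with_retry(job: Callable[[], None], retries: int = 3, delay: float = 0.0) -> bool:
    """Call `job`, retrying on any exception; return True once it succeeds."""
    for attempt in range(1, retries + 1):
        try:
            job()
            return True
        except Exception as exc:
            print(f"attempt {attempt}/{retries} failed: {exc}")
            time.sleep(delay)
    return False


def schedule_every(scheduler: sched.scheduler, interval: float,
                   job: Callable[[], None], runs: int) -> None:
    """Run `job` now, then re-schedule it every `interval` seconds, `runs` times in total."""
    if runs <= 0:
        return
    run_with_retry(job)
    if runs > 1:
        # The next run is timed from the end of this one, so long jobs cause drift.
        scheduler.enter(interval, 1, schedule_every, (scheduler, interval, job, runs - 1))


if __name__ == "__main__":
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    schedule_every(scheduler, 0.1, lambda: print("health check"), runs=3)
    scheduler.run()  # blocks until no events remain
```

A dedicated scheduler library becomes worthwhile once jobs need persistence across restarts or wall-clock triggers such as "daily at 2 AM".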

    Server Health Monitoring

    Proactive monitoring prevents outages. Python can track system resources and alert teams before problems escalate.

    Metrics to Monitor

    • CPU usage - Alert on sustained high usage
    • Memory consumption - Prevent OOM kills
    • Disk space - Alert before running out
    • Network I/O - Detect unusual traffic patterns
    • Process health - Ensure critical services are running
    • System uptime - Track stability

    The psutil Library

    psutil is a widely used library for cross-platform system monitoring in Python:

    pip install psutil

    Provides CPU, memory, disk, network, and process information on Linux, macOS, and Windows.
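A basic health check might sample a few of these metrics and compare them with limits. The sketch below assumes psutil is installed; the threshold values and the function names `collect_metrics` and `find_breaches` are illustrative choices, not recommendations.

```python
from typing import Dict, List

# Hypothetical alert thresholds (percent), for illustration only.
THRESHOLDS: Dict[str, float] = {"cpu": 90.0, "memory": 85.0, "disk": 80.0}


def collect_metrics() -> Dict[str, float]:
    """Sample CPU, memory, and root-disk usage as percentages via psutil."""
    import psutil  # third-party; imported lazily so the threshold logic works without it

    return {
        "cpu": psutil.cpu_percent(interval=1),      # averaged over one second
        "memory": psutil.virtual_memory().percent,
        "disk": psutil.disk_usage("/").percent,
    }


def find_breaches(metrics: Dict[str, float],
                  thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return one human-readable message per metric that exceeds its threshold."""
    return [
        f"{name} at {value:.1f}% (limit {thresholds[name]:.1f}%)"
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    ]


if __name__ == "__main__":
    for message in find_breaches(collect_metrics()):
        print("ALERT:", message)
```

Keeping the comparison separate from the sampling makes the alert logic easy to unit-test with fixed numbers.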

    API & Webhook Automation

    Modern infrastructure is API-driven. Python integrates seamlessly with CI/CD systems, monitoring tools, and cloud platforms.

    Common Integrations

    • CI/CD triggers - GitHub Actions, GitLab CI, Jenkins
    • Alerting - Slack, PagerDuty, Discord webhooks
    • Monitoring - Datadog, Prometheus, Grafana APIs
    • Cloud providers - AWS, GCP, Azure management APIs
    • Container registries - Docker Hub, ECR, GCR

    Automation Patterns

    Event-driven deployment

    Git push → trigger pipeline → deploy

    Automated alerting

    High CPU → send Slack alert → scale infrastructure

    Self-healing systems

    Service down → restart automatically → notify team
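The alerting step in these patterns usually comes down to an HTTP POST. The sketch below builds a Slack-style incoming-webhook request with the standard library; the `{"text": ...}` payload is the simple format Slack incoming webhooks accept, while the URL and the function names are placeholders for this example.

```python
import json
import urllib.request


def build_slack_alert(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a Slack-style webhook."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder URL; a real webhook URL would come from a secret store.
    req = build_slack_alert("https://hooks.example.invalid/services/PLACEHOLDER", "CPU at 95%")
    print(req.get_method(), req.full_url)
```

Separating request construction from sending lets the payload be checked in tests without network access.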

    Docker Automation

    The Docker Python SDK enables comprehensive container lifecycle management from within Python scripts.

    Installation

    pip install docker

    Automation Tasks

    • Cleanup - Remove stopped containers and dangling images
    • Health checks - Monitor container health status
    • Auto-restart - Restart unhealthy containers
    • Log collection - Aggregate logs from all containers
    • Image management - Build, tag, and push images
    • Resource limits - Monitor and enforce CPU/memory limits

    Production Use Case

    A maintenance script runs nightly to clean up stopped containers and dangling images, preventing disk space issues. It also restarts any containers marked as unhealthy by Docker's health checks.
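A nightly job along those lines could look roughly like this with the Docker SDK for Python. It assumes the SDK is installed and a Docker daemon is reachable; `summarize` and `nightly_maintenance` are names invented for this sketch, and the `health` filter is passed through to the Docker Engine's container-list API.

```python
from typing import Dict, List, Optional


def summarize(pruned_containers: Dict, pruned_images: Dict, restarted: List[str]) -> Dict:
    """Condense the prune responses into a small report suitable for logging."""
    deleted_containers: Optional[list] = pruned_containers.get("ContainersDeleted")
    deleted_images: Optional[list] = pruned_images.get("ImagesDeleted")
    return {
        # The daemon may report None instead of an empty list when nothing was removed.
        "containers_removed": len(deleted_containers or []),
        "images_removed": len(deleted_images or []),
        "bytes_reclaimed": pruned_containers.get("SpaceReclaimed", 0)
        + pruned_images.get("SpaceReclaimed", 0),
        "restarted": restarted,
    }


def nightly_maintenance() -> Dict:
    """Prune stopped containers and dangling images, then restart unhealthy containers."""
    import docker  # third-party Docker SDK; imported lazily

    client = docker.from_env()
    pruned_containers = client.containers.prune()
    pruned_images = client.images.prune(filters={"dangling": True})
    restarted = []
    for container in client.containers.list(filters={"health": "unhealthy"}):
        container.restart()
        restarted.append(container.name)
    return summarize(pruned_containers, pruned_images, restarted)


if __name__ == "__main__":
    print(nightly_maintenance())
```

Restarting only containers that Docker itself marks unhealthy keeps the script from second-guessing each service's own health check.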

    Kubernetes Automation

    The Kubernetes Python client allows programmatic cluster management, enabling GitOps-style automation.

    Installation

    pip install kubernetes

    Automation Capabilities

    • Deployment management - Scale, update, rollback deployments
    • Pod operations - List, inspect, delete pods
    • ConfigMap/Secret updates - Rotate configurations safely
    • Health monitoring - Check pod and node health
    • Auto-scaling - Adjust replicas based on metrics
    • Resource cleanup - Remove completed jobs and old pods

    Advanced Patterns

    • Blue/green deployments - maintain two production environments
    • Canary releases - gradually roll out changes to a subset of users
    • Automatic rollback - revert on health check failure
    • Multi-cluster management - orchestrate across regions
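As one example of the auto-scaling capability listed above, the sketch below computes a replica count with a proportional rule (the same shape as the formula documented for the Horizontal Pod Autoscaler) and applies it through the Kubernetes Python client. The deployment name, namespace, bounds, and function names are assumptions for illustration, and it expects a local kubeconfig.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Scale proportionally to metric/target, clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    wanted = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, wanted))


def scale_deployment(name: str, namespace: str, replicas: int) -> None:
    """Patch the deployment's scale subresource to the requested replica count."""
    from kubernetes import client, config  # third-party; imported lazily

    config.load_kube_config()  # inside a pod, load_incluster_config() would be used instead
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )


if __name__ == "__main__":
    # Hypothetical deployment: 4 replicas at 90% CPU with a 60% target.
    scale_deployment("web", "default", desired_replicas(4, 90.0, 60.0))
```

In most clusters a HorizontalPodAutoscaler resource is the simpler route; a script like this fits cases where the metric lives outside Kubernetes.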

    Backup Automation & Data Rotation

    Regular, automated backups are essential for disaster recovery. Python orchestrates the entire backup lifecycle.

    Backup Strategy

    • Database dumps - MySQL, PostgreSQL, MongoDB
    • File system backups - Compress and archive directories
    • Cloud sync - Upload to S3, Google Cloud Storage, Azure Blob
    • Rotation policy - Keep daily (7 days), weekly (4 weeks), monthly (12 months)
    • Verification - Test restore capability periodically
    • Encryption - Encrypt backups before storage

    3-2-1 Backup Rule

    3 copies of data • 2 different media types • 1 offsite copy

    Python scripts can implement this automatically: local disk, network storage, cloud backup.
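The rotation policy above (daily for 7 days, weekly for 4 weeks, monthly for 12 months) can be expressed as a pure function over backup dates. This is one possible interpretation: "weekly" is taken to mean the newest backup in each ISO week and "monthly" the newest in each calendar month; the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple


def backups_to_keep(backup_dates: Iterable[date], today: date,
                    daily: int = 7, weekly: int = 4, monthly: int = 12) -> Set[date]:
    """Return the backup dates that the daily/weekly/monthly policy retains."""
    dates = sorted({d for d in backup_dates if d <= today}, reverse=True)  # newest first
    keep: Set[date] = set()

    # Daily tier: every backup from the last `daily` days.
    keep.update(d for d in dates if (today - d).days < daily)

    # Weekly tier: newest backup in each of the most recent `weekly` ISO weeks.
    weeks: Dict[Tuple[int, int], date] = {}
    for d in dates:
        iso = d.isocalendar()
        weeks.setdefault((iso[0], iso[1]), d)
    keep.update(list(weeks.values())[:weekly])

    # Monthly tier: newest backup in each of the most recent `monthly` months.
    months: Dict[Tuple[int, int], date] = {}
    for d in dates:
        months.setdefault((d.year, d.month), d)
    keep.update(list(months.values())[:monthly])
    return keep


if __name__ == "__main__":
    today = date(2024, 6, 30)
    all_backups = [today - timedelta(days=n) for n in range(400)]
    print(len(backups_to_keep(all_backups, today)), "of", len(all_backups), "backups retained")
```

Anything not in the returned set is a candidate for deletion, ideally after the verification step has confirmed that a retained backup restores cleanly.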

    Log Processing & Automated Alerts

    Logs contain critical information about system health, security events, and errors. Automated analysis prevents issues from going unnoticed.

    Log Analysis Tasks

    • Error detection - Count and categorize errors
    • Pattern matching - Find security threats or anomalies
    • Performance analysis - Identify slow queries and requests
    • Real-time monitoring - Tail logs and alert immediately
    • Aggregation - Combine logs from multiple services
    • Visualization - Generate reports and dashboards

    Alert Triggers

    • Error threshold - Alert when error rate exceeds 1%
    • Security events - Failed login attempts, suspicious patterns
    • Performance degradation - Response time above threshold
    • Service crashes - Application or container restarts
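The error-threshold trigger can be sketched as a small log scan. The example assumes each log line contains a standard level name such as INFO or ERROR; real formats vary, so the pattern and the function names here are illustrative.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed log format: the level appears as a standalone word somewhere in the line.
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def error_rate(lines: Iterable[str]) -> Tuple[float, Counter]:
    """Return (share of ERROR/CRITICAL lines, per-level counts)."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    total = sum(counts.values())
    errors = counts["ERROR"] + counts["CRITICAL"]
    return (errors / total if total else 0.0), counts


def should_alert(lines: Iterable[str], threshold: float = 0.01) -> bool:
    """True when the error rate exceeds `threshold` (1% by default)."""
    rate, _ = error_rate(lines)
    return rate > threshold


if __name__ == "__main__":
    sample = ["2024-01-01 INFO request ok"] * 98 + ["2024-01-01 ERROR db timeout"] * 2
    print(error_rate(sample)[0], should_alert(sample))
```

For real-time monitoring the same functions can be applied to a sliding window of recent lines rather than a whole file.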

    Zero-Downtime Deployment

    Production deployments must minimize or eliminate downtime. Python orchestrates sophisticated deployment strategies.

    Deployment Pipeline

    1. Pull latest code from Git

    2. Run test suite - abort on failure

    3. Build Docker image

    4. Push to container registry

    5. Update Kubernetes deployment

    6. Wait for health checks to pass

    7. Rollback automatically if unhealthy

    8. Send deployment notification

    Safety Mechanisms

    • Rolling updates - replace pods gradually
    • Health checks - verify each new pod before proceeding
    • Automatic rollback on failure
    • Smoke tests after deployment
    • Traffic shifting strategies
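The pipeline and its safety mechanisms can be modelled as an ordered list of steps with an abort-or-rollback rule. The sketch below is a skeleton under stated assumptions: each step is a callable returning True on success, and the `rollback` and `notify` callables stand in for real Kubernetes and messaging calls.

```python
from typing import Callable, List, Tuple

# (step name, action returning True on success, roll back if this step fails?)
Step = Tuple[str, Callable[[], bool], bool]


def run_pipeline(steps: List[Step],
                 rollback: Callable[[], None],
                 notify: Callable[[str], None]) -> bool:
    """Run steps in order; stop at the first failure, rolling back where flagged."""
    for name, action, rollback_on_failure in steps:
        try:
            ok = bool(action())
        except Exception:
            ok = False  # an exception counts as a failed step
        if not ok:
            if rollback_on_failure:
                rollback()
            notify(f"deployment failed at step: {name}")
            return False
    notify("deployment succeeded")
    return True


if __name__ == "__main__":
    # Stub actions for illustration; real ones would call git, docker, kubectl, etc.
    demo_steps: List[Step] = [
        ("run tests", lambda: True, False),         # abort only, nothing deployed yet
        ("update deployment", lambda: True, True),
        ("health checks", lambda: False, True),     # simulated unhealthy rollout
    ]
    run_pipeline(demo_steps, rollback=lambda: print("rolling back"), notify=print)
```

Steps before the deployment update only abort, while failures after it trigger the rollback, matching the order of the pipeline above.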

    Security Considerations

    Automation scripts often run with elevated privileges. Security must be a top priority.

    ✓ Best Practices

    • Store secrets in environment variables or secret managers
    • Never hardcode passwords, API keys, or tokens
    • Use least-privilege principles for automation accounts
    • Validate all inputs before executing system commands
    • Avoid shell=True in subprocess calls
    • Keep logs free of sensitive information
    • Implement audit trails for all automation actions
    • Use encrypted connections for remote operations

    ✗ Security Anti-Patterns

    • Hardcoding credentials in scripts
    • Running automation as root unnecessarily
    • Accepting user input without validation
    • Logging sensitive data
    • Storing backups without encryption
    • Ignoring certificate validation
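Two of these practices, reading secrets from the environment and keeping them out of logs, fit in a few lines. The helper names `require_secret` and `mask` and the variable name `DEMO_API_TOKEN` are made up for this sketch; showing even the last few characters of a secret is a judgment call that some teams avoid entirely.

```python
import os


def require_secret(name: str) -> str:
    """Read a secret from the environment and fail loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters so the value is safe to log."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    os.environ.setdefault("DEMO_API_TOKEN", "abcd1234efgh")  # demo value only
    token = require_secret("DEMO_API_TOKEN")
    print("using token", mask(token))
```

Failing at startup when a secret is missing is usually preferable to discovering it halfway through a deployment.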

    Building Production-Ready Automation

    Professional automation systems require more than working code. They need reliability, observability, and maintainability.

    Essential Components

    • Logging - Comprehensive, structured logs with context
    • Error handling - Graceful failure and recovery
    • Monitoring - Track automation success/failure rates
    • Documentation - Clear runbooks and troubleshooting guides
    • Testing - Unit and integration tests for automation logic
    • Version control - Git for all automation scripts
    • Idempotency - Scripts can run multiple times safely
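Idempotency is easiest to see in a small example: an operation that checks the current state before changing it, and logs what it did. The sketch below appends a line to a file only if it is absent; the function name `ensure_line` and the logger name are illustrative.

```python
import logging
from pathlib import Path

logger = logging.getLogger("automation")


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in `path`; return True only if the file changed."""
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        logger.info("no change needed for %s", path)
        return False
    # Start on a fresh line if the existing content lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(f"{prefix}{line}\n")
    logger.info("appended line to %s", path)
    return True
```

Running it twice leaves the file exactly as one run would, which is what makes retrying a failed automation run safe.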

    The DevOps Loop

    Write automation → Test thoroughly → Deploy → Monitor → Learn from failures → Improve → Repeat

    Every automation failure is an opportunity to make the system more resilient.

    Complete DevOps Automation Examples

    Explore comprehensive automation scripts for file management, system monitoring, Docker, Kubernetes, and more

    Python
    # DevOps Automation & Scripting Examples
    
    # ============================================
    # 1. FILE & DIRECTORY AUTOMATION
    # ============================================
    
    import os
    import shutil
    import time
    from pathlib import Path
    from typing import List
    
    def cleanup_old_files(directory: str, days: int = 7):
        """Remove files older than specified days"""
        cutoff_time = time.time() - (days * 86400)
        
        for file_path in Path(directory).rglob('*'):
            if file_path.is_file() and file_
    ...

    Key Takeaways

    • Python is widely used for DevOps automation thanks to cross-platform support and rich libraries
    • Automate repetitive tasks: file cleanup, backups, deployments, monitoring
    • Use subprocess safely with timeouts and proper error handling
    • Modern scheduling goes beyond cron - build intelligent task runners
    • Monitor system health proactively with psutil and automated alerts
    • Docker and Kubernetes Python SDKs enable comprehensive container orchestration
    • Implement zero-downtime deployments with health checks and automatic rollback
    • Security is critical - never hardcode secrets, validate inputs, use least privilege
    • Production automation requires logging, monitoring, testing, and documentation
    • Build self-healing systems that detect and correct issues automatically

    📋 Quick Reference - DevOps Automation

    Tool / Module                      What it does
    pathlib.Path                       Modern file and directory manipulation
    subprocess.run(cmd, check=True)    Run external commands; raises CalledProcessError on a non-zero exit
    shutil.copy2 / shutil.rmtree       Copy files with metadata / delete directory trees
    docker SDK                         Manage Docker containers from Python
    psutil                             Monitor CPU, memory, disk, network, and processes

    🎉 Great work! You've completed this lesson.

    You can now automate deployments, manage infrastructure, and build self-healing systems using Python's DevOps toolkit.

    Up next: Language Integration - call C and Rust code from Python for maximum performance.
