Google Data Engineer

Overview

This course consists of three modules:
    • Participants will be able to design, build, operationalize, secure & monitor data processing systems, with an emphasis on security & compliance, scalability & efficiency, and reliability & fidelity
    • Participants will be able to leverage, deploy, and continuously train pre-existing ML models
    • Participants will be able to ensure solution quality and design data processing systems
Duration
4 Days

Pre-Requisites
Participants must have an understanding of how data works and what it can deliver for the organization

Course Outline

  • Theory, Practice, and Tests
  • Lab: Setting Up A GCP Account
  • Lab: Using the Cloud Shell
    • About this section
    • Compute Options
    • Google Compute Engine (GCE)
    • Practicals:
    • Lab: Creating a VM Instance
    • Editing a VM Instance
    • Creating a Virtual Machine Instance Using the Command Line
    • Creating and Attaching a Persistent Disk
  • More GKE
  • Creating a Kubernetes Cluster
  • Deploying a WordPress Container
  • App Engine
  • Contrasting App Engine, Compute Engine & Container Engine
  • Lab: Deploy and Run an App Engine App
  • Storage Options
  • Quick Take
  • Cloud Storage
    • Lab: Working with Cloud Storage Buckets
    • Bucket and Object Permissions
    • Lifecycle Management on Buckets
  • Fix for AccessDeniedException: 403 Insufficient Permission
    • Lab: Running A Program on a Virtual Machine Instance and Storing Results on Cloud Storage
  • Live Migration
  • Machine Types and Billing
  • Sustained Use and Committed Use Discounts
  • Rightsizing Recommendations
  • RAM Disk
  • Images
  • Startup Scripts and Baked Images
  • VPCs And Subnets
  • Global VPCs, Regional Subnets
  • IP Addresses
  • Lab: Working with Static IP Addresses
  • Routes
  • Firewall Rules
  • Practicals:
    • Lab: Working with Firewalls
    • Working with Auto Mode and Custom Mode Networks
    • Bastion Host
  • Lab: Working with Cloud VPN
  • Cloud Router
  • Lab: Using Cloud Routers for Dynamic Routing
  • Dedicated Interconnect, Direct & Carrier Peering
  • Shared VPCs
  • Lab: Shared VPCs
  • VPC: Network Peering
  • Lab: VPC Peering
  • Cloud DNS & Legacy Networks
  • Networking
  • Managed and Unmanaged Instance Groups
  • Types of Load Balancing
  • Overview of HTTP(S) Load Balancing
  • Forwarding Rules, URL Maps & Target Proxy
  • Preview
  • Backend Service & Backends
  • Load Distribution & Firewall Rules
    • Lab: HTTP(S) Load Balancing
    • Content-Based Load Balancing
  • SSL Proxy & TCP Proxy Load Balancing
    • Lab: SSL Proxy Load Balancing
  • Network Load Balancing
  • Internal Load Balancing
  • Autoscalers
    • Lab: Autoscaling with Managed Instance Groups

 

Ops & Security

  • Stackdriver
  • Stackdriver Logging
    • Stackdriver Resource Monitoring
    • Stackdriver Error Reporting & Debugging
  • Lab: Using Deployment Manager
  • Deployment Manager & Stackdriver
  • Cloud IAM: User accounts, Service accounts and API Credentials
  • Cloud IAM: Roles, Identity-Aware Proxy, Best Practices
    • Lab: Cloud IAM
  • Operations and Security
  • Practicals: Migrating Data Using the Transfer Service, gcloud init
  • Lab: Cloud Storage Versioning, Directory Sync
  • Cloud SQL
    • Lab: Creating A Cloud SQL Instance
    • Running Commands on Cloud SQL Instance
    • Bulk Loading Data into Cloud SQL Tables
  • More Cloud Spanner
  • Lab: Working with Cloud Spanner
  • Bigtable Intro
  • Columnar Store
  • Denormalised
  • Column Families
  • Bigtable Performance
  • Getting the HBase Prompt
  • Lab: Bigtable Demo
  • Datastore
  • Lab: Datastore Demo
  • Dataflow Intro
  • Apache Beam
    • Lab: Running A Python Dataflow Program
    • Running A Java Dataflow Program
    • Implementing Word Count in Dataflow Java
    • Executing the Word Count Dataflow
    • Executing MapReduce in Dataflow in Python
    • Executing MapReduce in Dataflow in Java
  • Dataproc
    • Lab: Creating & Managing A Dataproc Cluster
    • Creating A Firewall Rule to Access Dataproc
    • Running A PySpark Job on Dataproc
    • Running PySpark REPL Shell & Pig Scripts on Dataproc
    • Submitting A Spark Jar to Dataproc
    • Working with Dataproc Using the gcloud CLI
  • Pub/Sub
    • Working with Pub/Sub on the Command Line
    • Working with Pub/Sub Using the Web Console
    • Setting Up a Pub/Sub Publisher Using the Python Library
    • Setting Up a Pub/Sub Subscriber Using the Python Library
    • Publishing Streaming Data into Pub/Sub
    • Reading Streaming Data from Pub/Sub and Writing It to BigQuery
    • Executing a Pipeline to Read Streaming Data and Write to BigQuery
    • Pub/Sub Source, BigQuery Sink
  • Datalab
    • Creating and Working on a Datalab Instance
    • Importing and Exporting Data Using Datalab
    • Using the Charting API In Datalab
  • Taxicab Prediction: Setting Up the Dataset
  • Taxicab Prediction: Training and Running the Model
  • The Vision, Translate, NLP and Speech APIs
  • The Vision API for Label and Landmark Detection
  • Introducing the Hadoop Ecosystem
  • Hadoop
  • HDFS
  • MapReduce
  • YARN
  • Hive
  • Hive vs. RDBMS
  • HQL vs. SQL
  • OLAP in Hive
  • Windowing in Hive
  • Pig
  • Spark
  • Streams Intro
  • Microbatches
  • Window Types
  • Hadoop Ecosystem
  • Introduction