Preface


Key Points

1. AWS Cloud Overview

AWS Regions

  • A Region is a cluster of data centers

  • To reuse an SSH key pair across AWS Regions
    • Generate the public SSH key from the private SSH key, then import the public key into each AWS Region

AWS Availability Zone

  • A Region has 3 to 6 AZs, and each AZ is physically separate and isolated from the others


AWS Edge Locations

  • Deliver content to users with low latency (the closer the edge location is to the user, the faster the delivery)


2. IAM

AWS IAM

  • Root Account: Created by default; don't share it with others
  • Users: People within the organization
  • Groups: Contain only Users, not other Groups (a User can belong to multiple Groups)

  • AWS needs about 5 weeks of usage data to generate budget forecasts
  • Note: by default, IAM users cannot access the AWS Billing and Cost Management console; access must be granted explicitly

IAM Permissions

  • Define the permissions of a User or Group
    • Follow the Principle of Least Privilege

As an AWS security best practice, do not create an IAM user and pass the user's credentials to the application or embed the credentials in the application; use an IAM role instead.


IAM Policies

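  • A Statement contains: Sid (optional), Effect, Principal (required only in resource-based policies), Action, Resource, Condition (optional)
    • An Inline Policy is embedded directly in a single user, group, or role
  • Note: the only resource-based policy that IAM supports is the role trust policy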

IAM policies define permissions for an action regardless of the method used to perform the operation (Console, CLI, or SDK).
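
To make the Statement elements concrete, here is a minimal sketch of an identity-based policy built as a Python dict and serialized to the JSON that IAM expects. The bucket name `example-bucket` is a hypothetical placeholder. There is no `Principal`, because an identity-based policy applies to whichever user, group, or role it is attached to.

```python
import json

# Hypothetical least-privilege identity-based policy: read-only access to one bucket.
policy = {
    "Version": "2012-10-17",  # current policy language version
    "Statement": [
        {
            "Sid": "ReadOnlyExampleBucket",  # optional statement ID
            "Effect": "Allow",               # Allow or Deny
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",    # bucket ARN (used by ListBucket)
                "arn:aws:s3:::example-bucket/*",  # object ARNs (used by GetObject)
            ],
            # No "Principal": the policy applies to the identity it is attached to.
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Granting only the two actions the application needs, on only one bucket, is the Least Privilege Principle in practice.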


IAM MFA

  • Protect Root Accounts and IAM Users
  • MFA options: Virtual MFA Device, U2F Security Key (USB)


IAM CLI & SDK

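  • AWS can be accessed through the AWS Management Console, the AWS CLI, or the AWS SDK
    • Some settings are only available from the command line, e.g. setting the DeleteOnTermination attribute to false on a running instance's root volume
    • Exam tip: if you don't know the answer, choose the CLI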


IAM Roles

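  • Grant AWS services permission to perform actions on your behalf
  • Common roles: EC2 Instance Role, Lambda Function Role, etc.
  • IAM Instance Role: gives an EC2 instance temporary credentials so the instance can securely access other AWS services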

An IAM role is an IAM entity that defines a set of permissions for making requests to AWS services and is used (assumed) by an AWS service.


IAM Security Tools

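  • IAM Credentials Report (account level): lists all account users and the status of their credentials
  • IAM Access Advisor (user level): shows the services a user is granted permissions to and when those services were last accessed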


3. EC2 Fundamentals

AWS EC2

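  • EC2 = Elastic Compute Cloud = IaaS; each instance is tied to one AZ
  • SSH into the EC2 instance using its public IP; from inside the instance, the public IP can be read from the metadata endpoint (remember 169.254.169.254)
    • http://169.254.169.254/latest/meta-data/public-ipv4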

  • User Data: a script that runs when the EC2 instance launches
    • By default, user data runs only during the boot cycle when you first launch the instance
    • By default, scripts entered as user data are executed with root user privileges

  • To enable detailed monitoring on an EC2 instance: aws ec2 monitor-instances --instance-ids {id}
  • To relieve high CPU utilization on EC2-based web servers caused by SSL/TLS processing, offload it to the load balancer:
    • Configure an SSL/TLS certificate on an Application Load Balancer via AWS Certificate Manager (ACM)
    • Create an HTTPS listener on the Application Load Balancer with SSL termination

EC2 Instance Type

  • There are 7 EC2 instance type categories, but these 4 are used the most
    • General Purpose: Balanced
    • Compute Optimized: High performance (batch processing, media transcoding, HPC)
    • Memory Optimized: Process large data sets in memory (e.g. caches)
    • Storage Optimized: High read/write throughput on local storage (e.g. OLTP)


EC2 Security Groups

  • Control how traffic is allowed into and out of an EC2 instance
    • Security groups contain only allow rules (when EC2 connectivity fails, check the Security Group first)
  • The source of an inbound rule can be an IP address, a CIDR block, or another Security Group



EC2 Instance Purchasing Options

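  • On-Demand Instances: Short workloads, pay by the second (urgent, short-term use)
  • Reserved: 1 or 3 years (up to 72% discount, can reserve capacity in an AZ)
    • Reserved Instances: Long workloads (1 or 3 years; the longer term gives a bigger discount)
    • Convertible Reserved Instances (RI): Long workloads with flexible instances (up to 66% discount, and you can change the instance type)
  • Savings Plans: 1 or 3 years, commit to an amount of usage in $/hour (up to 72% discount; usage beyond the commitment is billed at the On-Demand rate)
    • Compute Savings Plans: up to 66% discount, also apply to Lambda and Fargate
  • Spot Instances: Short workloads, the instance can be reclaimed (cheapest, up to 90% discount)
  • Dedicated Hosts: An entire physical server is yours (most expensive, lets you control how instances are placed on the host)
  • Dedicated Instances: Run on hardware dedicated to you, but you don't control the physical host (the hardware may be shared with other instances in the same account)
  • Capacity Reservations: Reserve On-Demand capacity in a specific AZ for any duration (e.g. when EC2 is needed only a few hours a day); you pay the On-Demand rate whether or not instances run

  • Reserved Instances can be Regional or Zonal; Zonal RIs provide a capacity reservation, Regional RIs do not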

4. EC2 Instance Storage

AWS EBS

  • Block-level storage, like a network drive; tied to an AZ; capacity is provisioned in advance
    • Can be detached from one EC2 instance and attached to another
    • Remember: an EBS volume is locked to its AZ (very important)

  • EBS - Delete on Termination attribute
    • By default, the root volume is deleted on termination (important)
    • By default, other attached EBS volumes are not deleted on termination (important)
    • Can be controlled via the AWS Console / AWS CLI

  • Two key points about EBS encryption
    • Encryption by default is a Region-specific setting. If you enable it for a Region, you cannot disable it for individual volumes or snapshots in that Region
    • A volume restored from an encrypted snapshot, or a copy of an encrypted snapshot is always encrypted

EBS Snapshots

  • Make a backup of an EBS volume
    • The snapshot can be used to restore the volume in another AZ or Region

  • EBS Snapshot Archive: up to 75% cheaper
  • Recycle Bin: Set up rules to retain deleted snapshots so they can be recovered


AWS AMI

  • AMI: Amazon Machine Image, a customization of an EC2 instance
    • Makes EC2 instances easier to manage and to launch
  • AMIs are built for a specific AWS Region and can be copied across Regions

When an AMI is copied from Region A into Region B, a snapshot is automatically created in Region B (note: copying an AMI across Regions creates a snapshot).



EC2 Instance Store

  • Block-level storage on a disk physically attached to the host (temporary storage; can be used for databases)
  • High random I/O performance, good for buffers and caches at low cost (important)
  • Instance Store data is lost when the instance is stopped or terminated (ephemeral)
    • Bottom line: an Instance Store cannot be reused by another instance, and its data is not preserved


EBS Volume Types

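  • General Purpose SSD (gp2 / gp3): Balanced, cost-effective
  • Provisioned IOPS SSD (io1 / io2): High performance, supports Multi-Attach
    • Low latency, high throughput
    • Remember: io volumes cost more than gp volumes, but io volumes also deliver more IOPS
    • gp2 baseline is 3 IOPS per GiB; it reaches its maximum of 16,000 IOPS at 5,334 GiB (about 5.2 TiB)
  • Hard Disk Drives (HDD): Throughput-intensive, frequently accessed data (st1) or less frequently accessed data (sc1)
    • HDD volumes cannot be used as EC2 boot volumes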
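
The gp2 numbers above follow a simple formula. This sketch assumes the documented gp2 rule of 3 IOPS per GiB with a floor of 100 and a cap of 16,000 IOPS, and ignores burst credits.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume: 3 IOPS per GiB, at least 100, at most 16,000."""
    if size_gib < 1:
        raise ValueError("gp2 volumes are at least 1 GiB")
    return max(100, min(16_000, 3 * size_gib))


# Smallest volume size that reaches the 16,000 IOPS cap.
cap_size_gib = next(s for s in range(1, 20_000) if gp2_baseline_iops(s) == 16_000)
print(cap_size_gib, "GiB is about", round(cap_size_gib / 1024, 1), "TiB")
```

Growing a gp2 volume past 5,334 GiB adds capacity but no extra baseline IOPS; beyond that point io1 / io2 (or gp3 with provisioned IOPS) is the way to get more.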


EBS Multi-Attach

  • Attach the same EBS volume to multiple EC2 instances in the same AZ (remember: same AZ)
  • Only for the io1 / io2 family (Provisioned IOPS SSD, important)


EBS RAID

  • RAID 0 (prioritizes I/O performance)
    • Use RAID 0 when I/O performance is more important than fault tolerance
    • Can be used to increase EBS volume performance
  • RAID 1 (prioritizes fault tolerance)
    • Use RAID 1 when fault tolerance is more important than I/O performance


AWS EFS

  • NFS (Network File System) that can be mounted on multiple EC2 instances
    • Scales automatically, highly available, no capacity planning
  • EC2 instances can access files on an EFS file system across AZs, Regions, and VPCs (important)
    • Access to EFS can be controlled with VPC Security Groups or IAM policies
  • EFS Infrequent Access (choose it when a question mentions POSIX-compliant file storage)
    • Because EFS works only with Linux (POSIX) instances, not Windows

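  • EFS has 2 Performance Modes (set when the file system is created)
    • General Purpose: Default (latency-sensitive use cases such as web servers)
    • Max I/O: Higher latency, higher throughput, highly parallel (big data, media processing)
  • EFS has 3 Throughput Modes (important)
    • Bursting: Throughput scales with the amount of data stored, with bursts above the baseline
    • Provisioned: Set throughput regardless of storage size (e.g. high throughput when migrating large numbers of files)
    • Elastic: Throughput scales up and down automatically with the workload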

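EFS is not always the most expensive option, but S3 is generally the cheapest.
For performance, Amazon EBS is the fastest because it can deliver high IOPS. For a shared, elastically scaling file system, Amazon EFS is the best choice.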


5. ELB & ASG

Scalability & High Availability

  • Vertical Scaling: Increase the size of the instance
  • Horizontal Scaling: Increase the number of instances
  • High Availability: Run the application in at least 2 AZs (disaster recovery, important)

High availability + high scalability: ALB + ECS, or API Gateway + Lambda.


AWS ELB

  • Forward traffic to multiple EC2 instances (traffic distribution)
    • High availability across AZs, handles failures of downstream instances
  • Health Checks: let the load balancer know whether an EC2 instance is healthy
    • Typical setup: ELB in public subnets, ASG instances in private subnets (important)
  • ELB cannot distribute traffic to targets deployed in a different Region
  • Two reasons to use an ELB
    • Separate public traffic from private traffic
    • Build a highly available system
  • Requests originating from Elastic Load Balancing use ephemeral ports 1024-65535 (relevant when writing NACL rules)

  • Application Load Balancer (HTTP / HTTPS, Layer 7)
  • Network Load Balancer (TCP / UDP, Layer 4)
    • NLB provides a static DNS name and static IPs; ALB provides only a static DNS name
  • Gateway Load Balancer (Security, Layer 3, pay special attention)

  • If a question asks why the ELB routes more traffic to one instance or AZ, there are 2 possible causes
    • Instances of a specific capacity type aren't equally distributed across AZs
    • Sticky sessions are enabled for the load balancer

Application Load Balancer (ALB)

  • Handles Layer 7 (HTTP / HTTPS, WebSocket)
  • Routes based on path (/home) or query string (?Platform=Mobile)
  • ALB adds a header called X-Forwarded-For that contains the client's IP
  • ALB can be configured to redirect HTTP to HTTPS (important)
  • An ALB cannot be assigned an Elastic IP; an NLB can
  • If an ALB has no registered targets, it returns HTTP 503: Service Unavailable
  • ALB access logs can be used to analyze incoming requests for latencies and client IP addresses
  • You cannot register publicly routable IP addresses as ALB targets
  • An ALB has three possible target types: Instance, IP, and Lambda

  • Routing rules to different target groups
    • Route based on path in URL
    • Route based on hostname in URL
    • Route based on Query String, Headers, Client IP

To forward all requests to the website so that the requests will use HTTPS, a solutions architect can create a listener rule on the ALB that redirects HTTP traffic to HTTPS.


Network Load Balancer (NLB)

  • Handles Layer 4 (TCP / UDP); very fast, handles very high request volumes (e.g. banking, gaming)
    • NLB is best suited for low-latency, high-throughput workloads
  • One static IP per AZ, or use an Elastic IP
  • NLB supports HTTP, HTTPS, and TCP health checks

If you specify targets using an instance ID, traffic is routed to instances using the primary private IP address specified in the primary network interface for the instance.


Gateway Load Balancer (GWLB)

  • Handles Layer 3 (security appliances, third-party virtual appliances)
  • GWLB lets you perform inline inspection of traffic from multiple spoke VPCs in a simplified and scalable way
  • Use GENEVE protocol on port 6081


Sticky Sessions (Session Affinity)

  • The same client is always redirected to the same instance behind the load balancer
  • Ensures the user doesn't lose session data (e.g. a logged-in session)

  • Application-based cookies: custom cookie names must not be AWSALB, AWSALBAPP, or AWSALBTG (reserved)
  • Duration-based cookies: generated by the load balancer, named AWSALB for ALB and AWSELB for CLB


Cross Zone Load Balancing

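  • ELB distributes traffic evenly across all registered EC2 instances in all AZs
    • Every instance receives the same share of traffic, regardless of its AZ
    • If cross-zone is disabled, traffic is split evenly across AZs first, then evenly among the instances within each AZ
  • Enabled by default for ALB, disabled by default for NLB (important)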
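
A small model of the difference, assuming each AZ's load balancer node receives an equal share of client traffic (as with DNS round robin) and ignoring health checks and sticky sessions. The AZ names are hypothetical.

```python
from typing import Dict


def per_instance_share(instances_per_az: Dict[str, int], cross_zone: bool) -> Dict[str, float]:
    """Fraction of all traffic that one instance in each AZ receives.

    Simplified: every AZ's load balancer node gets an equal share of client
    traffic; health checks and sticky sessions are ignored.
    """
    if any(n < 1 for n in instances_per_az.values()):
        raise ValueError("each AZ needs at least one registered instance")
    total_instances = sum(instances_per_az.values())
    az_count = len(instances_per_az)
    if cross_zone:
        # Every registered instance in every AZ gets the same share.
        return {az: 1 / total_instances for az in instances_per_az}
    # Each AZ gets 1/az_count of the traffic, split only among its own instances.
    return {az: (1 / az_count) / n for az, n in instances_per_az.items()}


print(per_instance_share({"az-a": 2, "az-b": 8}, cross_zone=True))
print(per_instance_share({"az-a": 2, "az-b": 8}, cross_zone=False))
```

With 2 instances in one AZ and 8 in the other, cross-zone gives every instance 10%; without it, each instance in the smaller AZ takes 25% while each one in the larger AZ gets only 6.25%.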


SSL Certificates

  • Encrypt traffic in transit between clients and the load balancer (HTTPS)
    • Manage SSL/TLS certificates with ACM (AWS Certificate Manager)
  • If a question involves managing many SSL/TLS certificates, choose SNI (important)
  • The AWS services used to deploy SSL/TLS certificates are AWS Certificate Manager and IAM (important)

  • Multiple certificates on ALB & NLB are supported through SNI
  • SNI (Server Name Indication): load multiple SSL certificates onto one web server

Server Name Indication (SNI) allows you to expose multiple HTTPS applications each with its own SSL certificate on the same listener.


Deregistration Delay (Connection Draining)

  • Waiting for existing connections to complete
  • Connection Draining enables the load balancer to complete in-flight requests made to instances that are de-registering or unhealthy


AWS ASG

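  • ASG works with ELB to scale in or out based on load (the ASG itself is free)
  • ASG keeps the number of running EC2 instances between a minimum and a maximum (min, max)
    • ASG scales within one Region across the AZs available to it
    • An ASG cannot span multiple Regions
  • When an instance is unhealthy, the ASG terminates it
    • With the default termination policy, the ASG picks the AZ with the most instances, then terminates the instance with the oldest launch template or launch configuration
    • When rebalancing unbalanced AZs, the ASG launches new instances first, then terminates old ones
    • For unhealthy instances, the ASG terminates the unhealthy instance first, then launches a replacement
  • If a question asks about scaling ahead of a specific holiday, the answer is an ASG Scheduled Action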

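  • Launch Template: a template used to launch EC2 instances (e.g. AMI, instance type, security groups)
    • Launch Templates support versioning and mixing purchase options (e.g. Spot, On-Demand)
    • Launch Configurations are the legacy option and also define instance details (e.g. AMI, security group)
  • A Launch Configuration cannot be modified: create a new one and update the ASG to use it (note)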

  • Works with CloudWatch Alarms to scale out or in based on load
    • Scaling on a metric EC2 doesn't publish by default (e.g. memory usage) requires a CloudWatch custom metric

  • ASG Scaling Cooldowns
    • After a scaling activity, the ASG is in a cooldown period (default 300 seconds) during which it does not launch or terminate additional instances

  • Cases where the ASG does not terminate an instance
    • The health check grace period for the instance has not expired
    • The instance may be in the Impaired status
    • The instance failed the Elastic Load Balancing (ELB) health check, but the ASG only uses EC2 health checks
  • If the ASG doesn't replace an unhealthy EC2 instance, change the ASG health check type from EC2 to ELB (e.g. in the configuration file)

Scaling Policies

  • Target Tracking Scaling: Keep a metric at a target value (e.g. average CPU usage or SQS queue length)
  • Simple / Step Scaling: Triggered by CloudWatch Alarms
  • Scheduled Actions: Scale at a known time (e.g. on Thanksgiving)
  • Predictive Scaling: Forecast load and schedule scaling ahead of time

  • Note: an EC2 scaling policy can never take the group above its maximum capacity (important)
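
A rough proportional model of how a target tracking policy sizes the group, and why the result is always clamped to the maximum. This is a simplification, not the exact algorithm AWS uses: it assumes load spreads evenly, so metric multiplied by instance count stays roughly constant.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Rough proportional estimate of the group size that brings the metric back to target.

    Assumes load is spread evenly across instances. A simplification, not AWS's algorithm.
    """
    if current < 1 or target <= 0:
        raise ValueError("need at least one instance and a positive target")
    needed = math.ceil(current * metric / target)
    # Whatever the policy computes, the ASG stays within [min_size, max_size].
    return max(min_size, min(max_size, needed))
```

For example, 4 instances at 80% average CPU with a 40% target suggests 8 instances, but with max = 6 the group stops at 6; at 10% CPU the same group would shrink only as far as its minimum.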

ASG Instance Refresh

  • Update the launch template, then re-create (replace) all EC2 instances in the group


6. AWS Fundamentals

AWS RDS

  • Benefits of RDS (compared with running a database on EC2)
    • Automated provisioning, storage auto scaling, backups
    • Read Replicas (up to 15) and Multi-AZ
  • RDS Read Replica: If the primary database is encrypted, the replica is also encrypted
    • Note: Read Replicas scale reads, not database storage (and replicas cost money); use them for reporting or analytics workloads
    • Read Replicas don't provide High Availability; Multi-AZ does
    • Cross-Region Read Replica: can serve as a DB backup (replica in a different Region)
  • RDS Multi-AZ: RDS creates a primary DB instance and synchronously replicates the data to a standby instance in a different AZ (High Availability)
    • For a DB engine version upgrade in Multi-AZ, the primary and the standby are upgraded at the same time (causes downtime)
  • Supports MySQL, PostgreSQL, MariaDB, Oracle, MS SQL Server, and Amazon Aurora

  • RDS - Storage Auto Scaling (important)
    • For applications with unpredictable workloads

  • RDS - Disaster Recovery
    • Create a Read Replica in a different Region and enable Multi-AZ on that Read Replica
  • IAM Database Authentication: can be used to connect to RDS
  • You can migrate from an RDS Read Replica to an Aurora Read Replica with minimal changes
    • Note: this applies only to the Read Replica scenario; otherwise just enable auto scaling
  • RDS OS updates in Multi-AZ: update the standby first, promote the standby to primary, then update the old primary, which becomes the new standby (important)

Read Replicas vs. Multi AZ

  • Read Replicas: Scalability, asynchronous replication
    • Within an AZ, across AZs, or across Regions
  • Replication traffic to Read Replicas in the same Region (across AZs) is free; cross-Region replication is not
  • You can't create encrypted Read Replicas from an unencrypted RDS DB instance.

  • Use the production database for day-to-day traffic, and a Read Replica as a copy for analytics

  • Multi-AZ: High Availability, Disaster Recovery, synchronous replication
    • In Multi-AZ, when the primary database goes down, the DNS CNAME is updated to point to the standby
    • Multi-AZ doesn't require changing the SQL connection string



AWS Aurora

  • Supports PostgreSQL and MySQL (no storage management needed)
  • Aurora has no standby database; Aurora Replicas act as failover targets
  • Aurora is generally faster than RDS

  • High Availability & Read Scaling (Aurora Read Replicas)
    • One writer and up to 15 Read Replicas, 16 instances in total
    • If a question about Aurora involves a sudden surge in requests, choose Aurora Replicas
    • Replicas with higher priority (lower tier number; tier 0 is highest) are promoted first during failover
  • If a question asks about Aurora auto scaling, the answer is Aurora Serverless
  • If a question asks about an Aurora test database, choose Aurora Database Cloning


RDS Security

  • At-rest encryption: AWS KMS (Key Management Service)
  • In-flight encryption: AWS TLS Certificates
  • IAM Authentication: IAM roles to connect to database (重要)


RDS Proxy

  • A proxy for accessing RDS or Aurora that reduces database load and minimizes open connections
    • Lets applications pool and share DB connections (improves DB efficiency)
    • Reduces RDS & Aurora failover time by up to 66% (important)
    • Enforces IAM authentication for the DB
  • Serverless, auto scaling, highly available (Multi-AZ)
  • Not publicly accessible (must be accessed from within a VPC)


AWS ElastiCache

  • Managed Redis or Memcached (ElastiCache solves high-frequency database reads, important)
    • ElastiCache is not a relational database and doesn't support SQL queries
  • Caches are in-memory databases with high performance and low latency (useful for results of compute-intensive queries)
  • Caching reduces database load for read-intensive workloads (read heavy)
  • Helps make applications stateless (AWS handles most of the operational work)
  • For caching S3 content, use CloudFront, not ElastiCache (remember)
  • ElastiCache for Memcached supports multi-threading
  • Two points about ElastiCache for Redis clusters
    • All the nodes in a Redis cluster must reside in the same Region
    • While using Redis with cluster mode enabled, you cannot manually promote any of the replica nodes to primary

  • DB Cache: Read data from ElastiCache
    • If it isn't there, read it from RDS and store it in ElastiCache

  • User Session: Write session data into ElastiCache
    • A user routed to another instance is still logged in

  • Redis vs Memcached
    • Redis: Read Replicas, High Availability, backup and restore (very important)
    • Memcached: No High Availability, no backup, risk of data loss, multi-threaded

The maximum number of Read Replicas you can add in an ElastiCache Redis Cluster with Cluster Mode Disabled is 5.


ElastiCache Strategies

  • Lazy Loading / Cache-Aside / Lazy Population
    • Pros: Only requested data is cached (important)
    • Cons: A cache miss costs 3 round trips, and cached data can be stale (not the latest)

  • Write Through (writes take longer, but read latency is very low)
    • Pros: Data in the cache is never stale; the penalty moves to writes (each write takes 2 calls)
    • Cons: Data is missing from the cache until it is added or updated in the DB

  • Cache Evictions & Time-to-Live (TTL)
    • Cache eviction: items are removed when the cache memory is full (least recently used first), when they are deleted explicitly, or when their TTL expires; set a TTL so stale entries are cleaned up on schedule
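
The two strategies and TTL can be sketched in a few lines of Python. `CacheClient` is a hypothetical in-memory stand-in for ElastiCache and a plain dict plays the database; a real application would use a Redis or Memcached client instead. The injectable clock only exists to make expiry easy to demonstrate.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CacheClient:
    """Hypothetical in-memory stand-in for a cache such as ElastiCache."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._items: Dict[str, Tuple[Any, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:  # TTL expired: evict and report a miss
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (value, self._clock() + ttl_seconds)


def read_lazy(cache: CacheClient, db: Dict[str, Any], key: str, ttl: float = 300) -> Any:
    """Lazy loading / cache-aside: only requested data ends up in the cache."""
    value = cache.get(key)               # round trip 1: ask the cache
    if value is None:
        value = db.get(key)              # round trip 2: cache miss, read the DB
        if value is not None:
            cache.set(key, value, ttl)   # round trip 3: populate the cache
    return value


def write_through(cache: CacheClient, db: Dict[str, Any], key: str, value: Any,
                  ttl: float = 300) -> None:
    """Write-through: every write goes to the DB and the cache (2 calls), so reads are never stale."""
    db[key] = value
    cache.set(key, value, ttl)
```

If the DB row changes after a lazy-loaded read, later reads keep returning the old cached value until the TTL expires, which is exactly the stale-data drawback listed above; write-through avoids it at the cost of a second call on every write.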


MemoryDB for Redis

  • Redis-compatible, durable, in-memory database service
  • Durable in-memory data storage with a Multi-AZ transactional log


7. Route 53

What’s DNS

  • Translate hostname into IP address: www.google.com = 172.217.18.36


AWS Route 53

  • Lets users update DNS records (highly available, scalable)
  • Route 53 is also a Domain Registrar

  • Route 53 Records:
    • Domain Name, Record Type, Value, Routing Policy, TTL

  • Route 53 Record Types
    • A: IPv4
    • AAAA: IPv6
    • CNAME: Map hostname to another hostname
    • NS: Name Servers for the Hosted Zone

  • Route 53: Hosted Zones
    • Public Hosted Zones: Public domain name
    • Private Hosted Zones: Private domain name (VPC)

Note: DNS hostnames and DNS resolution are required settings for private hosted zones.

  • GoDaddy + Route 53 (Route 53 as the DNS service provider)
    • Create a Public Hosted Zone in Route 53, then update the NS records at the third-party registrar to point to it
  • Route 53 Health Check
    • For DNS-level failover when an ELB has problems, use Route 53 Health Checks

Route 53 TTL

  • TTL: Time To Live
    • How long resolvers should cache the record (updating the backend doesn't update records that are already cached)


CNAME vs Alias

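  • CNAME: Point a hostname to another hostname (only for non-root domains)
    • acme.example.com → zenith.example.org (a CNAME cannot be created at the zone apex, so example.com → example.net is not allowed)
  • Alias: Point a hostname to an AWS resource (e.g. CloudFront, S3); works for both root and non-root domains
    • covid19survey.com → www.covid19survey.com
    • app.mydomain.com → app.amazonaws.com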


Health Checks

  • Health Checks are only for public resources
  • Health Checks can monitor: endpoints, other Health Checks, and CloudWatch Alarms (important)


Routing Policy (Simple)

  • Defines how Route 53 responds to DNS queries (Route 53 has 8 routing policies; Traffic Flow is a visual editor for combining them, not a policy itself)

  • Simple: Route traffic to a single resource
    • If multiple values are returned, the client picks one at random


Routing Policy (Weighted)

  • Weighted: Control the percentage of requests that go to each resource
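
A sketch of the weighted policy's arithmetic plus a simulated DNS answer. It assumes, as the Route 53 documentation describes, that a group whose weights are all 0 is treated as if the weights were equal. The record names in the usage are hypothetical.

```python
import random
from typing import Dict


def expected_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Share of DNS answers per record: weight / sum of all weights in the group."""
    total = sum(weights.values())
    if total == 0:
        # All weights 0: every record is returned with equal probability.
        return {record: 1 / len(weights) for record in weights}
    return {record: w / total for record, w in weights.items()}


def pick_record(weights: Dict[str, int], rng: random.Random) -> str:
    """Simulate one weighted DNS answer."""
    records = list(weights)
    shares = expected_shares(weights)
    return rng.choices(records, weights=[shares[r] for r in records], k=1)[0]


print(expected_shares({"blue": 70, "green": 20, "canary": 10}))
```

Weights are relative, not percentages: 70 / 20 / 10 and 7 / 2 / 1 produce the same split, and setting one record's weight to 0 stops sending it traffic (useful for blue/green cut-overs).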


Routing Policy (Latency)

  • Latency: Redirect to the resource with the lowest latency for the user
  • Latency is measured between users and AWS Regions


Routing Policy (Failover)

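  • Failover: A primary resource and a secondary resource (important)
    • When the primary fails its health check, traffic switches to the secondary
  • Note: when a question asks about failover, choose the active-passive failover routing policy
    • The Failover routing policy itself is always active-passive; active-active setups use other routing policies combined with health checks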


Routing Policy (Geolocation)

  • Geolocation: Route based on the user's location
    • Create a “Default” record for locations that have no match
    • Use cases: website localization, restricting content distribution, etc.


Routing Policy (Geoproximity)

  • Geoproximity: Route traffic to resources based on the geographic location of users and resources
    • Can shift more traffic to a resource using a bias (important)
    • A positive bias routes more traffic to a resource, a negative bias routes less


Routing Policy (Traffic Flow)

  • Visual editor to manage complex routing decision trees
    • Configuration can be saved as Traffic Flow Policy
    • Can be applied to different Route 53 Hosted Zone


Routing Policy (IP-Based)

  • IP-Based: Routing based on client’s IP address
    • Provide a list of CIDRs for your clients


Routing Policy (Multi-Value)

  • Multi-Value: Route traffic to multiple resources (returns multiple healthy records)
    • Not a substitute for ELB


8. VPC Fundamentals

AWS VPC

  • VPC: Virtual Private Cloud; every AWS account has a default VPC
  • VPC CIDRs should not overlap; the largest CIDR block allowed in AWS is /16 and the smallest is /28
  • To use custom DNS domain names in a VPC, enable both enableDnsHostnames and enableDnsSupport
  • You can create a Shared Services VPC so that every VPC can access the services it needs (important)

  • Each Amazon EC2 instance that you launch into a VPC has a tenancy attribute (important)
    • After launch, tenancy can be changed between dedicated and host


VPC Subnet

  • Partition the network inside the VPC
    • A VPC can have public subnets and private subnets
  • AWS reserves 5 IP addresses in each subnet (the first 4 and the last 1, important)
  • Every subnet is associated with a route table (important)
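
The reserved-address rule is easy to check with Python's standard `ipaddress` module. This sketch assumes an IPv4 subnet and enforces the /16 to /28 size range mentioned above.

```python
import ipaddress
from typing import List, Tuple


def aws_subnet_addresses(cidr: str) -> Tuple[List[ipaddress.IPv4Address], int]:
    """Return (addresses AWS reserves, usable host count) for an IPv4 subnet CIDR.

    Reserved: network address, VPC router (+1), DNS (+2), future use (+3),
    and the last address (network broadcast, which AWS does not support).
    """
    net = ipaddress.IPv4Network(cidr)
    if not 16 <= net.prefixlen <= 28:
        raise ValueError("AWS subnet CIDR blocks must be between /16 and /28")
    reserved = [net[i] for i in range(4)] + [net[-1]]
    return reserved, net.num_addresses - len(reserved)


reserved, usable = aws_subnet_addresses("10.0.0.0/24")
print([str(ip) for ip in reserved], usable)
```

A /24 leaves 251 usable addresses and a /28 only 11, so a question asking for 29 hosts rules out /27 (32 - 5 = 27) and needs a /26.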


Internet Gateway (IGW) & Route Table

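  • An IGW lets resources in a VPC (e.g. EC2 instances) connect to the internet (important)
    • In short, it gives resources in the VPC internet access
    • An IGW requires route table entries (important)
  • If internet access through the IGW fails
    • Check the route table first (the IGW only works with a route pointing to it)
    • Check whether the Security Group allows the traffic
  • The Internet Gateway performs network address translation for instances with public IPv4 addresses
  • An Internet Gateway cannot be used directly by instances in a private subnet (important)
  • NAT operates at the subnet level (deployed in a subnet), while the IGW operates at the VPC level (attached to the VPC) (important)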



NAT Gateway (NATGW)

  • AWS-managed NAT, AZ-specific, uses an Elastic IP (IPv4 only; for IPv6 use an Egress-only Internet Gateway)
    • Lets EC2 instances in a private subnet connect to the internet
    • Requires an IGW; the NAT Gateway itself sits in a public subnet (note)

  • Resilient within a single AZ; multiple AZs need multiple NAT Gateways
    • Create one NAT Gateway per AZ for fault tolerance


NACL & Security Groups

  • Requests pass through the NACL before reaching the subnet (subnet level)
    • NACLs are stateless (inbound and outbound traffic are both evaluated)
  • Requests pass through the Security Group before reaching the EC2 instance (instance level)
    • Security Groups (SG) are stateful (if inbound traffic is allowed, the return traffic is allowed automatically)
  • Bottom line: NACL is subnet level (stateless), Security Group is instance level (stateful)

  • Network Access Control List (NACL)
    • NACLs control traffic to and from subnets (e.g. block an IP)
    • One NACL per subnet; each new subnet is assigned the default NACL (note)
      • The default NACL allows all inbound and outbound traffic
      • A custom NACL denies all inbound and outbound traffic by default (note)
    • NACL rules are numbered; a lower number means higher precedence (the first matching rule wins)
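
A simplified model of NACL rule evaluation: rules are checked in ascending number order, the first match decides, and the final `*` rule denies everything else. Protocols and traffic direction are left out for brevity, and the rule values are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class NaclRule:
    number: int     # evaluated in ascending order; lower number = higher precedence
    cidr: str       # source CIDR for an inbound rule
    port_from: int
    port_to: int
    allow: bool


def is_allowed(rules: List[NaclRule], source_ip: str, port: int) -> bool:
    """First matching rule decides; if nothing matches, the implicit '*' rule denies."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if ip in ipaddress.ip_network(rule.cidr) and rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False


inbound = [
    NaclRule(100, "0.0.0.0/0", 443, 443, allow=True),       # HTTPS from anywhere
    NaclRule(90, "203.0.113.7/32", 0, 65535, allow=False),  # evaluated first: block one IP
]
print(is_allowed(inbound, "203.0.113.7", 443), is_allowed(inbound, "198.51.100.1", 443))
```

Because NACLs are stateless, the outbound rules also need an allow for the return traffic (typically the ephemeral ports 1024-65535), which this sketch does not model.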


VPC Flow Logs

  • Capture information about IP traffic going to and from network interfaces (monitors IP traffic)
    • Monitor & troubleshoot connectivity issues


VPC Peering

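  • VPC Peering: Privately connect two VPCs over the AWS network (suitable only for a small number of VPCs, important)
    • Peering is not transitive: every pair of VPCs needs its own peering connection (similar to the no-chaining rule in S3 Replication)
  • You can create VPC Peering between VPCs in different AWS accounts / Regions
  • Update the route tables in each VPC's subnets so the VPCs can communicate
    • If there is a problem, check the route tables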
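
Because peering is not transitive, connecting N VPCs in a full mesh needs N(N-1)/2 peering connections, which is why peering only suits a small number of VPCs. A quick sketch with hypothetical VPC names:

```python
from itertools import combinations
from typing import List, Tuple


def full_mesh_peerings(vpcs: List[str]) -> List[Tuple[str, str]]:
    """Every pair of VPCs that must talk needs its own peering (no transitive routing)."""
    return list(combinations(vpcs, 2))


links = full_mesh_peerings(["vpc-a", "vpc-b", "vpc-c", "vpc-d"])
print(len(links), links)
```

4 VPCs already need 6 connections and 10 VPCs need 45, each with its own route table entries.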


VPC Sharing

  • VPC Sharing: Allows multiple AWS accounts to create resources (e.g. EC2, RDS) in a shared, centrally managed VPC (important)
    • When a question mentions "centrally managed", the answer is VPC Sharing
    • The VPC owner shares one or more subnets with the other accounts (note)


VPC Endpoints

  • VPC Endpoints allow private access to AWS services from within a VPC (important)
    • They reach AWS services over the private network; they have nothing to do with on-premises connectivity
  • Two types of endpoints (Interface / Gateway)
    • Interface Endpoints: Support most AWS services (paid)
    • Gateway Endpoints: Support only S3 and DynamoDB, and must be used as a target in a route table (free, frequently tested)
  • Note: for S3 and DynamoDB, choose a VPC Gateway Endpoint (not a NAT Gateway or an IGW)


Site-to-Site VPN (VPN Connection)

  • Site-to-Site VPN: Connect AWS to a corporate data center over the public internet (key point: public internet, on-premises to AWS)
    • Needs a Virtual Private Gateway (VGW) on the VPC (AWS side)
    • Needs a Customer Gateway (CGW) in the data center (on-premises side)
  • Compared with DX, Site-to-Site VPN does not provide a low-latency, high-throughput connection
  • Site-to-Site VPN provides encrypted network connectivity between on-premises and the VPC


Direct Connect (DX)

  • Provides a dedicated private connection from a remote network to a VPC (key point: private connection, on-premises to AWS)
    • Requires a VGW; as a backup for DX, choose Site-to-Site VPN (the two services serve similar purposes)
    • Compared with Site-to-Site VPN, DX provides a low-latency, high-throughput connection
    • DX traffic is not encrypted by default while Site-to-Site VPN traffic is; to encrypt a DX connection, combine it with AWS Site-to-Site VPN (note)
    • DX takes a long time to set up, so it is not a quick solution
  • All private, no public network involved (note: if a question says "all private", the answer is DX)
    • Direct Connect does not involve the internet
  • Can access both public resources (S3) and private resources (EC2)

  • Two connection types (bandwidth options; Dedicated is the fastest)
    • Dedicated Connections: 1 Gbps to 100 Gbps
    • Hosted Connections: 50 Mbps to 10 Gbps

  • Direct Connect Gateway (connects to many VPCs in different Regions)
    • A Direct Connect connection to one or more VPCs in many different Regions

  • PrivateLink vs DX

    • AWS PrivateLink provides a connection between VPCs (Virtual Private Clouds) and AWS services while bypassing the public Internet. It is a private network connection that securely transfers data without leaving the AWS network

    • AWS Direct Connect is a dedicated, private connection between the customer’s on-premises infrastructure at a data center and an AWS location. The main features of the connection are ultra-fast data transfer rates, low latency, and improved security since it bypasses the public Internet


9. S3

AWS S3

  • S3 Buckets (S3 is a global service, but each bucket is Regional)
    • Store objects (files) in “buckets”
    • Bucket names must be globally unique
    • Buckets are defined at the Region level

  • Objects (files) have a key; the maximum object size is 5 TB
  • Uploads larger than 5 GB must use Multi-Part Upload (S3 Transfer Acceleration can also speed up uploads)

  • S3 sync command: uses the CopyObject API to copy objects between S3 buckets
  • S3 always returns the latest version of the object (strong read-after-write consistency, important)
  • S3 cannot encrypt object metadata
  • S3 is serverless
  • For serving static content, the answer is S3 + CloudFront (important)
  • For S3 questions about image uploads, choose S3 Event Notifications, not EventBridge (important)

  • If S3 request performance becomes a bottleneck, spread objects across more prefixes in the bucket (each prefix supports 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second, important)
  • Note: S3 is not a database
  • To control access to data stored in Amazon S3, you can use
    • Query String Authentication, Access Control Lists (ACLs)
    • Bucket policies, Identity and Access Management (IAM) policies
  • To get object access logs, the object owner must also be the bucket owner
  • S3 Object Ownership is an S3 bucket setting that controls ownership of new objects uploaded to the bucket (important)

By default, an Amazon S3 object is owned by the AWS account that uploaded it. This is true even when the bucket is owned by another account (重要)

  • Two points about the S3 data consistency model
    • A process deletes an existing object and immediately tries to read it. Amazon S3 will not return any data as the object has been deleted
    • If you delete a bucket and immediately list all buckets, the deleted bucket might still appear in the list

S3 Bucket Policy

  • User-Based Security
    • IAM Policies: Which API calls are allowed for a specific IAM user (important)
  • Resource-Based Security
    • Bucket Policies: Bucket-wide rules (e.g. make objects public)
  • Note: there is no precedence between IAM policies and bucket policies; AWS evaluates them together, an explicit Deny always wins, and within the same account an Allow in either one is enough

  • S3 Bucket Policies (JSON-based policies)
    • Resources: buckets and objects
    • Effect: Allow / Deny
    • Actions: the set of APIs to allow or deny
    • Principal: the account or user the policy applies to
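
An example bucket policy showing these four elements, built in Python and serialized to JSON. The bucket name is hypothetical. The first statement makes objects publicly readable (this also requires S3 Block Public Access to be turned off for the bucket), and the second uses the `aws:SecureTransport` condition key to explicitly deny any request that doesn't use HTTPS.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",                         # anyone
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",   # objects, not the bucket itself
        },
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",                         # an explicit Deny always wins
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
    ],
}

policy_json = json.dumps(bucket_policy, indent=2)
print(policy_json)
```

Unlike an identity-based IAM policy, a bucket policy must name a `Principal`, because it is attached to the resource rather than to a user.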


S3 Website

  • Websites hosted on S3 use one of two endpoint formats (dash-Region or dot-Region)
    • s3-website-Region: http://bucket-name.s3-website-Region.amazonaws.com
    • s3-website.Region: http://bucket-name.s3-website.Region.amazonaws.com


S3 Versioning

  • Enabled at the bucket level (buckets are unversioned by default)
  • Best practice: version your buckets (allows rollback, important)
    • Protects against accidental deletion of objects
  • Once versioning is enabled on a bucket, it can never return to the unversioned state; it can only be suspended (important)
  • Versioning applies to all objects in the bucket, so you can't enable it for a single folder


S3 Replication (CRR & SRR)

  • CRR: Cross-Region Replication (compliance, lower-latency access)
  • SRR: Same-Region Replication (e.g. creating a test environment)
    • Versioning must be enabled
    • Replication is asynchronous
    • S3 must be given the proper IAM permissions

  • Only new objects are replicated; existing objects need S3 Batch Replication
  • No “chaining” in replication
    • e.g. to replicate A to both B and C, configure A → B and A → C (objects replicated from A into B are not replicated on from B to C)
  • S3 Lifecycle actions are not replicated with S3 Replication


S3 Storage Classes

  • 7 storage classes in total: Standard + 2 IA + 3 Glacier + Intelligent-Tiering

  • S3 Standard - General Purpose
    • Used for frequently accessed data (the most common)

  • S3 Infrequent Access (Standard IA & One Zone IA)
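    • For data that is accessed less frequently but requires rapid access when needed (important)
    • Lower cost than Standard
    • Objects must stay in Standard for at least 30 days before a lifecycle rule can transition them to Standard-IA or One Zone-IA
    • One Zone-IA is not a High Availability choice (note: its single AZ can go down)
    • If a question says access is always required, don't choose One Zone-IA, because the AZ may go down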

  • S3 Glacier Storage Classes (for archiving)
    • Low-cost object storage for archiving / backup
    • Price: storage + retrieval
    • S3 Glacier Instant Retrieval: the fastest retrieval within Glacier, but the highest storage cost of the three
    • S3 Glacier Flexible Retrieval: three retrieval tiers (Expedited, Standard, Bulk)
    • S3 Glacier Deep Archive: for the longest-term storage; two retrieval tiers (Standard, Bulk), up to 48 hours retrieval

  • S3 Intelligent Tiering
    • Moves objects automatically between access tiers based on usage
    • In the lifecycle transition order, Intelligent-Tiering sits below Standard and Standard-IA, and lifecycle rules cannot transition objects upward


10. AWS CLI, SDK, IAM Roles & Policies

EC2 Instance Metadata (IMDS)

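  • Allows an EC2 instance to “learn about itself” without using an IAM Role
  • The AWS CLI (and SDKs such as boto3 for Python) retrieve the instance role's temporary credentials from the instance metadata
    • Metadata = info about the EC2 instance (you can retrieve the IAM role name, but not the IAM policy)
    • User Data = launch script of the EC2 instance

  • IMDSv2 vs. IMDSv1
    • IMDSv1 accesses http://169.254.169.254/latest/meta-data directly (not recommended)
    • IMDSv2 is more secure and takes two steps: get a session token with an HTTP PUT, then send that token in a header with each metadata request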
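
The two IMDSv2 steps, sketched with Python's standard `urllib.request`. Building the requests works anywhere, but actually sending them (`fetch_public_ipv4`) only works from inside an EC2 instance; 21600 seconds is the maximum token TTL (6 hours).

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: PUT to obtain a session token (TTL up to 21600 s = 6 hours)."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(token: str, path: str = "meta-data/public-ipv4") -> urllib.request.Request:
    """Step 2: GET a metadata path, presenting the token in a header."""
    return urllib.request.Request(
        f"{IMDS}/latest/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_public_ipv4(timeout: float = 2.0) -> str:
    """Run both steps; only works from inside an EC2 instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(token), timeout=timeout) as resp:
        return resp.read().decode()
```

With IMDSv1, the second request alone (without the token header) would be enough, which is what makes v1 easier to abuse through request-forgery bugs.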


MFA with CLI

  • To use MFA with the CLI, you must create a temporary session
    • Call the STS GetSessionToken API

  • To check IAM permissions without actually making the request, use the AWS CLI --dry-run option

AWS SDK

  • Perform actions on AWS directly from application code
  • If no default Region is configured, us-east-1 is chosen by default


AWS Limits (Quotas)

  • API Rate Limits
    • For intermittent errors: implement Exponential Backoff
    • For consistent errors: request an API throttling limit increase (through AWS)
  • Service Quotas / Limits
    • Request a service limit increase by opening a ticket
    • Request a service quota increase through the Service Quotas API

  • Exponential Backoff
    • If you get ThrottlingException intermittently, use exponential backoff
    • It's a retry mechanism already included in AWS SDK API calls
    • Retry only on 5XX server errors and throttling, not on other 4XX client errors
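
The SDKs already do this for you, but the idea fits in a short sketch. `RetryableError` is a hypothetical exception standing in for 5XX and throttling responses; any other exception (such as a 4XX client error) is raised immediately without retrying. The delay uses the common "full jitter" variant of exponential backoff.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying (5XX responses, throttling)."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry `call` on RetryableError, doubling the maximum wait after each failure."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
            sleep(rng.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise ValueError("max_attempts must be at least 1")
```

The random jitter keeps many throttled clients from retrying in lockstep, and the cap on the delay bounds the worst-case wait.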


AWS CLI Credentials Provider Chain

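  • The CLI looks for credentials in this order
    • Command line options -> Environment variables -> CLI credentials file -> CLI configuration file -> Container credentials (ECS) -> EC2 Instance Profile credentials
  • Because environment variables come before the instance profile, credentials set in environment variables take priority over the EC2 instance role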
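To see why environment variables "win", here is a toy model of a provider chain: the first source that returns credentials is used and later ones are never consulted. The provider names and helper functions are hypothetical; only the variable names AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are the standard ones.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

Credentials = Tuple[str, str]  # (access key id, secret access key)
ProviderFn = Callable[[], Optional[Credentials]]

def env_provider(env: Mapping[str, str]) -> ProviderFn:
    """Reads the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables."""
    def provide() -> Optional[Credentials]:
        key, secret = env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY")
        return (key, secret) if key and secret else None
    return provide

def static_provider(creds: Optional[Credentials]) -> ProviderFn:
    """Stand-in for a source that either has credentials or not (file, instance profile...)."""
    return lambda: creds

def resolve_credentials(chain: Sequence[Tuple[str, ProviderFn]]) -> Tuple[str, Credentials]:
    # Walk the chain in order; the first provider that returns credentials wins
    for name, provider in chain:
        creds = provider()
        if creds is not None:
            return name, creds
    raise LookupError("Unable to locate credentials")

chain = [
    ("command line", static_provider(None)),
    ("environment", env_provider({"AWS_ACCESS_KEY_ID": "AKIAENV", "AWS_SECRET_ACCESS_KEY": "env-secret"})),
    ("instance profile", static_provider(("ASIAROLE", "role-secret"))),
]
print(resolve_credentials(chain)[0])  # environment
```

This is the classic exam trap: an instance role with the right permissions is ignored because stale keys in environment variables are found first.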


AWS Sigv4

  • AWS SigV4 is the authentication method for signing AWS requests, ensuring security and integrity (2 options)
    • HTTP Header: Signature in Authorization header
    • Query String: Signature in X-Amz-Signature
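The core of SigV4 is an HMAC-SHA256 chain that derives a signing key from the secret key, date, Region, and service, then signs a "string to sign". A minimal sketch with the standard library; building the canonical request (method, path, sorted query, signed headers, payload hash) is omitted, and the canonical request used in the test is only a placeholder.

```python
import hashlib
import hmac

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # Derivation chain: "AWS4"+secret -> date -> region -> service -> "aws4_request"
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")

def string_to_sign(amz_date: str, date_stamp: str, region: str, service: str,
                   canonical_request: str) -> str:
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    hashed_request = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    return "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, hashed_request])

def signature(key: bytes, to_sign: str) -> str:
    # Hex value sent in the Authorization header or as the X-Amz-Signature query parameter
    return hmac.new(key, to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the key is scoped to one day, Region, and service, a leaked signature cannot be reused elsewhere.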


11. Advanced S3

S3 Lifecycle Rules

  • Transition objects between storage classes (e.g., from Standard-IA to Glacier)
    • Transitions require Lifecycle Rules (important)
  • For example, data imported with Snowball lands in S3 first, so moving it to Glacier requires a Lifecycle Rule

  • Transition Actions: Configure objects to transition to another storage class (move)
  • Expiration Actions: Configure objects to expire after some time (delete)

  • S3 Analytics: Help decide when to transition objects to the right storage class
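A lifecycle rule with both action types can be written as the JSON body that S3's PutBucketLifecycleConfiguration expects. A minimal sketch: the prefix and day counts are hypothetical examples, and in the API "GLACIER" refers to S3 Glacier Flexible Retrieval.

```python
import json
from typing import Any, Dict

def lifecycle_configuration(prefix: str, ia_days: int = 30, glacier_days: int = 90,
                            expire_days: int = 365) -> Dict[str, Any]:
    # S3 requires objects to be at least 30 days old before moving to STANDARD_IA
    if ia_days < 30 or not (ia_days < glacier_days < expire_days):
        raise ValueError("need 30 <= ia_days < glacier_days < expire_days")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Transition actions: move objects to cheaper storage classes
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                # Expiration action: delete objects after expire_days
                "Expiration": {"Days": expire_days},
            }
        ]
    }

print(json.dumps(lifecycle_configuration("logs/"), indent=2))
```

Saved as lifecycle.json, the output could be applied with aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json.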


S3 Event Notifications

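  • Automatically react to certain events that happen in S3 (e.g., an image upload)
    • The destination needs a resource-based policy that allows S3 to send to it
    • S3 Event Notification destinations are SQS, SNS, and Lambda (memorize)
  • Most S3 event-related questions point to S3 Event Notifications, not EventBridge (important)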

  • If two writes are made to a single non-versioned object at the same time, it is possible that only a single event notification will be sent

S3 Performance

  • For Upload (important)
    • Multi-Part Upload: recommended for files larger than 100 MB and required for files larger than 5 GB; parallelizes uploads
    • S3 Transfer Acceleration: uploads the file to an AWS edge location first; can be combined with Multi-Part Upload
    • S3 Transfer Acceleration (S3TA) can speed up content transfers to and from S3
    • Note: S3 Transfer Acceleration cannot be used to copy objects between buckets

  • For Download
    • S3 Byte-Range Fetches: Parallelize GETs, retrieve only part of the data (important)
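Byte-Range Fetches are plain HTTP Range headers on GET requests. A minimal sketch that splits an object into inclusive ranges that could each be fetched in parallel; the function names are hypothetical and the actual GET calls are left out.

```python
from typing import List

def byte_ranges(object_size: int, part_size: int) -> List[str]:
    """Splits an object into HTTP Range header values for parallel GETs."""
    if object_size <= 0 or part_size <= 0:
        raise ValueError("sizes must be positive")
    ranges = []
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1  # Range end is inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges

def head_range(num_bytes: int) -> str:
    """Range for fetching only the first bytes of a file (e.g., a file header)."""
    return f"bytes=0-{num_bytes - 1}"
```

For example, byte_ranges(10, 4) gives ["bytes=0-3", "bytes=4-7", "bytes=8-9"]; head_range covers the "retrieve only part of the data" use case.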


S3 Select & Glacier Select

  • S3 Select uses SQL to do server-side filtering, reducing the amount of data returned
  • Less network transfer, less CPU cost client-side


S3 Object Tags & Metadata

  • S3 User-Defined Object Metadata: keys must begin with x-amz-meta-
  • S3 Object Tags: Useful for fine-grained permissions or analytics purposes
  • You cannot search the object metadata or object tags (important)


12. S3 Security

S3 Encryption

  • Server-Side Encryption (SSE-S3 is free, SSE-KMS incurs KMS charges, SSE-C)
  • Client-Side Encryption (if the user already has their own encryption method)

  • Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
    • Enabled by default for new buckets & objects
    • SSE-S3 的 header 是 x-amz-server-side-encryption:AES256
    • With SSE-S3, each object is encrypted with a unique key
    • SSE-S3 uses 256-bit Advanced Encryption Standard (AES-256)
    • SSE-S3 gives you no control over key rotation; if you need to manage key rotation, use SSE-KMS (important)

  • Server-Side Encryption with KMS Keys stored in AWS KMS (SSE-KMS)
    • User control + audit key usage in CloudTrail (you can manage key rotation)
    • SSE-KMS 的 header 是 x-amz-server-side-encryption:aws:kms

  • Server-Side Encryption with Customer-Provided Keys (SSE-C)
    • The user provides the keys; AWS will not store the key, and HTTPS must be used (important)
    • Choose it when the question says the user wants to use their own key but have AWS perform the encryption
    • S3 will reject any requests made over HTTP when using SSE-C (important)

  • Client-Side Encryption
    • User fully manages the keys and encryption cycle
    • Data is already encrypted before it is sent to AWS

  • Encryption in Transit (SSL/TLS)
    • Enforce it with a bucket policy that denies requests where aws:SecureTransport is false (for S3, SSL/TLS in transit basically means HTTPS)
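The three server-side options differ mainly in the request headers. A minimal sketch that builds those headers; the header names are the documented S3 ones, while sse_headers itself is a hypothetical helper and the actual PUT request is not shown.

```python
import base64
import hashlib
from typing import Dict, Optional

def sse_headers(mode: str, kms_key_id: Optional[str] = None,
                customer_key: Optional[bytes] = None) -> Dict[str, str]:
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:
            # If omitted, S3 falls back to the AWS managed key (aws/s3)
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        # The key travels with every request, which is why S3 rejects SSE-C over plain HTTP
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode(),
        }
    raise ValueError(f"unknown mode {mode!r}")
```

The MD5 header lets S3 check that the key arrived intact; S3 keeps only a salted HMAC of the key, never the key itself.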


S3 CORS

  • CORS: Cross-Origin Resource Sharing (cross-origin requests)

  • S3 CORS (important)
    • If a client makes a cross-origin request to an S3 bucket, the bucket needs the correct CORS headers


S3 MFA Delete

  • MFA Delete: prevents accidental deletion; the user must verify their identity with MFA first (important)
    • To use MFA Delete: Versioning must be enabled
  • Only the bucket owner (root user) can enable/disable MFA Delete (important)


S3 Access Logs

  • Any request made to S3 can be logged to another S3 bucket (for analysis)
    • Useful as a data analysis tool


S3 Pre-signed URL

  • An S3 Pre-signed URL grants temporary access to a specific S3 object


S3 Access Points

  • Access Points simplify security management for S3 Buckets (permissions scoped per group of users or applications)
  • Each Access Point: DNS name + Access Point Policy


S3 Object Lambda

  • Use AWS Lambda to change an object before it is retrieved by the caller (before it reaches the caller)
    • Use cases: watermarking images, invoking a Lambda function that removes PII (important)
  • Requires an S3 Access Point and an S3 Object Lambda Access Point


13. CloudFront

AWS CloudFront

  • Content Delivery Network (CDN), 可以 serve static & dynamic content
    • Improve read performance, content is cached at the edge
  • Provides DDoS protection (together with AWS Shield), can route based on content type, and can specify primary & secondary origins for high availability & failover
  • CloudFront Origins: S3 Bucket (OAC) or Custom Origin (HTTP)
    • OAC: Origin Access Control (important for questions involving S3)
  • If a CloudFront question requires encryption, choose field-level encryption (not KMS)
    • Field-level encryption allows you to enable your users to securely upload sensitive information to your web servers.

  • CloudFront vs S3 Cross Region Replication
    • CloudFront: uses the global edge network, has a TTL, for static content
    • S3 Cross Region Replication (CRR): must be set up in each Region, no TTL, for dynamic content (updated in near real-time, read-only)
    • If the question is about S3 with high availability and low latency, the answer is S3 Cross Region Replication

  • CloudFront lets proxy methods and dynamic content bypass the regional edge cache
  • To restrict access, besides using WAF to block IPs, you can also use an OAI (origin access identity, important)
    • OAI can also be used to secure communication between CloudFront & S3
  • Use CloudFront signed URLs and CloudFront signed cookies to restrict access to documents (e.g., for subscribers)
  • Can configure CloudFront to require HTTPS from clients
  • 2 points about CloudFront with Origin Groups
    • CloudFront routes all incoming requests to the primary origin, even when a previous request failed over to the secondary origin
    • CloudFront fails over to the secondary origin only when the HTTP method of the viewer request is GET, HEAD or OPTIONS

CloudFront Caching

  • The cache lives at each CloudFront Edge Location
  • CloudFront identifies each object in the cache using Cache Key
  • If CloudFront serves content in the wrong language, use query string forwarding and caching (this exam question is long)

  • CloudFront Cache Key
    • By default, consists of hostname + resource portion of the URL
    • Could add other elements to the Cache Key using CloudFront Cache Policies

  • Cache Policy: can include HTTP Headers, Cookies, Query Strings (important)

  • Origin Request Policy
    • Values that you want to include in origin requests without including them in the Cache Key
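A conceptual model of the cache key (not CloudFront's real internal format): by default only hostname + path count, and a cache policy can add selected query strings or headers. This is why the language problem above is solved by including the language parameter in the key. The function and the d123.cloudfront.net domain are hypothetical.

```python
from typing import Iterable, Mapping, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit

def cache_key(url: str, headers: Optional[Mapping[str, str]] = None,
              query_allowlist: Iterable[str] = (), header_allowlist: Iterable[str] = ()) -> str:
    parts = urlsplit(url)
    key = parts.netloc + parts.path  # default: hostname + resource portion of the URL
    allowed_q = set(query_allowlist)
    query = sorted((k, v) for k, v in parse_qsl(parts.query) if k in allowed_q)
    if query:
        key += "?" + urlencode(query)
    allowed_h = {h.lower() for h in header_allowlist}
    picked = sorted((k.lower(), v) for k, v in (headers or {}).items() if k.lower() in allowed_h)
    if picked:
        key += "|" + ";".join(f"{k}={v}" for k, v in picked)
    return key
```

Values forwarded only through an Origin Request Policy would reach the origin without ever appearing in this key, so they do not fragment the cache.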


CloudFront Cache Invalidation

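  • Makes CloudFront skip the TTL and refresh immediately
    • E.g., you updated the backend, but CloudFront keeps serving the cached version until the TTL expires; a CloudFront Invalidation bypasses the TTL so the update is served immediately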


CloudFront Cache Behaviors

  • Apply different cache behaviors to different URL path patterns
  • Maximize cache hits by separating static and dynamic distributions


CloudFront Geo Restriction

  • Allowlist & Blocklist (restrict access by country, determined from the viewer's IP address)
    • CloudFront Geo Restriction cannot be used with a VPC


CloudFront Signed URL / Cookies

  • Similar to S3 Pre-Signed URL
    • Signed URL: access to individual files
    • Signed Cookies: access to multiple files
  • Using a Trusted Key Group is recommended over a CloudFront Key Pair (important)

  • CloudFront Signed URL vs. S3 Pre-Signed URL
    • Accessing content through CloudFront -> CloudFront Signed URL
    • Accessing S3 directly -> S3 Pre-Signed URL

  • 2 points about CloudFront signers
    • When you create a signer, the public key is uploaded to CloudFront and the private key is used to sign a portion of the URL
    • When you use the root user to manage CloudFront key pairs, you can only have up to two active CloudFront key pairs per AWS account
  • A CloudFront Key Pair can only be created by the root user
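A canned-policy signed URL carries three query parameters: Expires, Signature, and Key-Pair-Id. A minimal sketch of how the URL is assembled, assuming a hypothetical sign callback, because the real signature needs an RSA private key (from your trusted key group) and therefore a crypto library outside the standard library. The CloudFront-specific base64 character replacements are real; the domain and key ID are placeholders.

```python
import base64
import json
from typing import Callable

def cloudfront_safe_b64(data: bytes) -> str:
    # CloudFront's URL-safe variant of base64: '+' -> '-', '=' -> '_', '/' -> '~'
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")

def canned_policy(resource_url: str, expires_epoch: int) -> str:
    policy = {
        "Statement": [
            {
                "Resource": resource_url,
                "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
            }
        ]
    }
    return json.dumps(policy, separators=(",", ":"))  # compact JSON, no whitespace

def signed_url(resource_url: str, expires_epoch: int, key_pair_id: str,
               sign: Callable[[bytes], bytes]) -> str:
    # `sign` is a hypothetical callback that signs the policy with the private key
    signature = sign(canned_policy(resource_url, expires_epoch).encode("utf-8"))
    separator = "&" if "?" in resource_url else "?"
    return (f"{resource_url}{separator}Expires={expires_epoch}"
            f"&Signature={cloudfront_safe_b64(signature)}&Key-Pair-Id={key_pair_id}")
```

Only the public key lives in CloudFront, matching the signer point above: CloudFront verifies with the public key what your application signed with the private key.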

CloudFront Advanced Concepts

  • Price Classes (3 in total)
    • Price Class All: all regions - best performance
    • Price Class 200: most regions, but excludes the most expensive regions
    • Price Class 100: only the least expensive regions

  • CloudFront - Origin Groups
    • Increase high-availability and do failover
    • Origin Group: one primary and one secondary origin

  • CloudFront - Field Level Encryption
    • Protect user sensitive information through the application stack (important)


CloudFront Real Time Logs

  • Send real-time logs of requests received by CloudFront to a Kinesis Data Stream


AWS Global Accelerator

  • Uses the AWS internal network to route traffic to your application
    • Provides 2 global static anycast IPs (important)
    • Improves availability and performance of applications (globally)
  • Global Accelerator improves performance for applications over TCP or UDP
  • Global Accelerator has automatic failover
  • Global Accelerator is more expensive because it adds an extra layer of infrastructure (not a cost-effective choice compared to CloudFront)

  • AWS Global Accelerator vs CloudFront
    • CloudFront: content is served at the edge, e.g., cacheable images and videos, plus dynamic content such as API acceleration & dynamic site delivery
    • Global Accelerator: good for TCP or UDP, e.g., gaming and IoT (static IPs)

AWS Global Accelerator is a network service that can provide a global traffic management solution. By creating a standard accelerator in AWS Global Accelerator, you can guide user traffic to the endpoint closest to them, thereby improving the performance and availability of the application.


14. ECS, ECR & Fargate

AWS ECS

  • Elastic Container Service (Manage Docker containers on AWS)
  • ECS has two Launch Types: EC2 and Fargate (Fargate is serverless)
  • EC2 Launch Type: you must provision the EC2 instances, and each runs the ECS Agent (key point)

  • Fargate Launch Type: Serverless, nothing to provision (key point)

  • IAM Roles for ECS (important; these are IAM roles)
    • EC2 Instance Profile: EC2 Launch Type only, used by the ECS agent
    • ECS Task Role: each Task has its own Role, which controls what that task can access

  • ECS can be used with a Load Balancer (ALB / NLB)
  • ECS can be used with EFS (Fargate + EFS = serverless shared file storage)
  • If terminating an ECS container instance leaves the cluster out of sync (the instance is not deregistered), the instance was in the STOPPED state when it was terminated
  • If launching an ECS cluster runs into problems, check ecs.config first
  • Use the awslogs log driver to send ECS logs to CloudWatch; for the Fargate launch type, the task definition also needs the logConfiguration parameter

Amazon ECS with EC2 launch type is charged based on EC2 instances and EBS volumes used. Amazon ECS with Fargate launch type is charged based on vCPU and memory resources that the containerized application requests


ECS Auto Scaling

  • Automatically increase / decrease the number of ECS tasks
    • Target Scaling: Scale based on target value for CloudWatch metric
    • Step Scaling: Scale based on CloudWatch Alarm (important)
    • Scheduled Scaling: Scale based on specific time (important)

  • Auto Scaling EC2 Instances
    • Accommodate ECS Service Scaling by adding underlying EC2 Instances
    • Auto Scaling Group Scaling / ECS Cluster Capacity Provider


ECS Rolling Updates

  • Controls how many tasks can be started and stopped when updating from v1 to v2


ECS Task Definitions

  • Task definitions are metadata in JSON form to tell ECS how to run Docker container
  • Environment Variable: Hardcoded, SSM Parameter Store, Secrets Manager, S3

  • ECS Load Balancing - EC2 Launch Type
    • You get Dynamic Host Port Mapping if you define only the container port in the task definition
    • The EC2 Instance's Security Group must allow any port from the ALB's Security Group (important)

  • ECS Load Balancing - Fargate
    • Each task gets a unique private IP; define only the container port

  • ECS Data Volumes (Bind Mounts)
    • Share data between multiple containers in the same Task Definition
    • Work for both EC2 and Fargate
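A task definition is just JSON, so it can be built as a Python dict. A minimal sketch showing the points above: hostPort 0 for dynamic host port mapping on EC2, awsvpc networking on Fargate, and the awslogs log driver. The family, image, ports, and sizes are hypothetical; a real Fargate task also needs an executionRoleArn so awslogs and ECR pulls work (left out here).

```python
import json
from typing import Any, Dict

def task_definition(family: str, image: str, container_port: int, region: str,
                    log_group: str, launch_type: str = "EC2") -> Dict[str, Any]:
    if launch_type not in ("EC2", "FARGATE"):
        raise ValueError("launch_type must be EC2 or FARGATE")
    port_mapping: Dict[str, Any] = {"containerPort": container_port, "protocol": "tcp"}
    if launch_type == "EC2":
        # hostPort 0 -> dynamic host port mapping, so several copies fit on one instance
        port_mapping["hostPort"] = 0
    container: Dict[str, Any] = {
        "name": family,
        "image": image,
        "essential": True,
        "memory": 512,
        "portMappings": [port_mapping],
        "logConfiguration": {  # awslogs driver -> CloudWatch Logs
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": log_group,
                "awslogs-region": region,
                "awslogs-stream-prefix": family,
            },
        },
    }
    definition: Dict[str, Any] = {"family": family, "containerDefinitions": [container]}
    if launch_type == "FARGATE":
        # awsvpc: each task gets its own ENI and private IP; CPU/memory set per task
        definition.update({"requiresCompatibilities": ["FARGATE"], "networkMode": "awsvpc",
                           "cpu": "256", "memory": "512"})
    return definition

print(json.dumps(task_definition("web", "1234567890.dkr.ecr.eu-west-1.amazonaws.com/demo:latest",
                                 8080, "eu-west-1", "/ecs/web"), indent=2))
```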


ECS Task Placements

  • For the EC2 Launch Type, ECS decides which EC2 instance to place a task on, based on CPU, memory, etc.
  • ECS Task Placement Strategies can be mixed together (Binpack, Random, and Spread)

  • ECS Task Placement Strategies - Binpack
    • Place tasks based on the least available amount of CPU or memory
    • This minimizes the number of instances in use (cost saving, important)

  • ECS Task Placement Strategies - Random
    • Place the task randomly
    • To enable random host ports, set the host port to 0 or leave it empty, which allows multiple containers of the same type to launch on the same EC2 container instance
    • If the first container runs successfully but the second one fails, the host port is the problem

  • ECS Task Placement Strategies - Spread
    • Place the task evenly based on the specified value

  • ECS Task Placement Constraints
    • distinctInstance: Place each task on a different container instance
    • memberOf: Place task on instances that satisfy an expression
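A toy simulation (not the real ECS scheduler) contrasting binpack and spread, plus the same idea written as an ECS placementStrategy list, whose "binpack"/"spread" types and fields are the documented ones. The ContainerInstance class and helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class ContainerInstance:
    name: str
    az: str
    free_memory: int  # MiB still available on the instance
    tasks: int = 0

def binpack(instances: List[ContainerInstance], memory: int) -> Optional[ContainerInstance]:
    # Pick the instance with the LEAST free memory that still fits the task
    fits = [i for i in instances if i.free_memory >= memory]
    return min(fits, key=lambda i: i.free_memory, default=None)

def spread_by_az(instances: List[ContainerInstance], memory: int) -> Optional[ContainerInstance]:
    # Balance task counts across Availability Zones, then across instances
    per_az: Dict[str, int] = {}
    for inst in instances:
        per_az[inst.az] = per_az.get(inst.az, 0) + inst.tasks
    fits = [i for i in instances if i.free_memory >= memory]
    return min(fits, key=lambda i: (per_az[i.az], i.tasks), default=None)

def place(instances: List[ContainerInstance], memory: int,
          strategy: Callable[[List[ContainerInstance], int], Optional[ContainerInstance]]) -> str:
    target = strategy(instances, memory)
    if target is None:
        raise RuntimeError("no container instance has enough free memory")
    target.free_memory -= memory
    target.tasks += 1
    return target.name

# Same idea as an ECS placementStrategy list (spread across AZs first, then binpack on memory)
PLACEMENT_STRATEGY = [
    {"type": "spread", "field": "attribute:ecs.availability-zone"},
    {"type": "binpack", "field": "memory"},
]
```

Binpack fills the fullest instance first (fewer instances, lower cost), while spread trades density for availability.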


AWS ECR

  • Elastic Container Registry (ECR, manages Docker images, not Docker containers)
    • Store and manage Docker images on AWS

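  • How to use the CLI to log in to ECR, then pull and push images
    • aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 1234567890.dkr.ecr.eu-west-1.amazonaws.com
    • docker pull 1234567890.dkr.ecr.eu-west-1.amazonaws.com/demo:latest
    • docker push 1234567890.dkr.ecr.eu-west-1.amazonaws.com/demo:latest
    • The older $(aws ecr get-login --no-include-email) form only works with AWS CLI v1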


AWS CoPilot

  • CLI tool to build, release, and operate production-ready containerized apps
    • Run your apps on App Runner, ECS, and Fargate


AWS EKS

  • Elastic Kubernetes Service (Manage Kubernetes clusters on AWS)
    • Supports 2 deployment modes: EC2 & Fargate (same as ECS)

  • EKS Data Volumes (requires a StorageClass)
    • Leverages a Container Storage Interface (CSI) compliant driver
    • Supports EBS, EFS, FSx


15. Elastic Beanstalk

AWS Elastic Beanstalk

  • Elastic Beanstalk is a developer-centric view of deploying an application on AWS
  • Automatically handles capacity provisioning, load balancing, scaling, etc.

  • Web Server Tier vs. Worker Tier (important)
    • One runs the web server; the other processes background work (SQS, SNS, etc.)
  • Periodic tasks in a Worker environment are defined in a cron.yaml file

  • To deploy to Elastic Beanstalk at the lowest cost, choose Single Instance mode
  • To use an unsupported runtime such as Rust, run Elastic Beanstalk with a Docker image
  • If a deployment fails, EB replaces the failed instances with instances running the application version from the most recent successful deployment (important)
  • To set up HTTPS on Beanstalk, add a config file in the .ebextensions folder that configures the Load Balancer
  • To migrate a Beanstalk environment across accounts
    • Create a saved configuration and download it to your local machine. Make the account-specific parameter changes and upload it to the S3 bucket in the other account. From the Elastic Beanstalk console, create an application from ‘Saved Configurations’

Beanstalk Deployment Modes

  • There are 6 Beanstalk Deployment Modes in total
  • Of these, Immutable and Traffic Splitting can cause the EC2 burst balance to be lost (important)
  • Note: Rolling reduces capacity during the deployment; Rolling with additional batches avoids this by launching an extra batch first (at extra cost)
  • All at once (update everything at once, with downtime)
    • Fastest, but has downtime
  • To avoid downtime with Beanstalk, choose a blue/green deployment (important)

  • Rolling (update batch by batch)
    • Update a few instances at a time (bucket), then move onto the next bucket

  • Rolling with additional batches (like Rolling, but the old version keeps serving; small additional cost)
    • Like rolling, but spins up new instances to move the batch, old application still available

  • Immutable (related to quick rollback)
    • Spin up new instances in a new ASG, deploys version to these instances, and then swaps all the instances when everything is healthy
    • If you need to maintain at least full capacity and minimize the impact of a failed deployment, choose Immutable

  • Blue Green (creates a new environment first)
    • Create a new environment and switch over when ready

  • Traffic Splitting (send part of the traffic to the new version)
    • Canary testing, send a small % of traffic to new deployment


Beanstalk Lifecycle Policy

  • Elastic Beanstalk can store at most 1000 application versions
  • To phase out old application versions, use a lifecycle policy (important)


Beanstalk Extensions

  • A zip file containing code must be deployed to Elastic Beanstalk
    • Config files go in the .ebextensions/ directory at the root of the bundle (they must be inside the .ebextensions folder)
    • .config extensions (file names must end with .config)
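A quick way to check the two rules above (root-level .ebextensions/ folder, .config suffix) on a source bundle before deploying. A minimal sketch using zipfile; both helper functions are hypothetical, and ENV_CONFIG uses the real aws:elasticbeanstalk:application:environment namespace with a made-up variable.

```python
import io
import zipfile
from typing import Dict, List

ENV_CONFIG = """option_settings:
  aws:elasticbeanstalk:application:environment:
    STAGE: prod
"""

def build_bundle(files: Dict[str, str]) -> bytes:
    """Builds an in-memory source bundle (zip) from file name -> content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        for name, content in files.items():
            bundle.writestr(name, content)
    return buffer.getvalue()

def ebextension_configs(bundle_bytes: bytes) -> List[str]:
    """Lists files matching .ebextensions/*.config at the root of the bundle."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        return sorted(
            name for name in bundle.namelist()
            if name.startswith(".ebextensions/") and name.endswith(".config")
        )
```

A file in a nested folder (src/.ebextensions/...) or with another suffix (notes.txt) is not picked up, which is a common reason a Beanstalk extension "does nothing".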


Beanstalk with CloudFormation

  • Under the hood, Elastic Beanstalk relies on CloudFormation
  • CloudFormation is used to provision other AWS services


Beanstalk Cloning

  • Clone an environment with the exact same configuration
  • Useful for deploying a test version of your application


16. AWS CloudFormation

AWS CloudFormation

  • A declarative way of outlining AWS Infrastructure (like a template that creates what you describe)
    • You declare what you need (e.g., an EC2 Instance, S3), and CloudFormation creates it
    • CloudFormation is IaC (Infrastructure as Code)
  • Can leverage existing templates on the web

  • Templates must be uploaded in S3 and referenced in CloudFormation (important)
  • To update a template, need to re-upload a new version of the template (important)
    • Typically, update the CloudFormation template locally, then upload it to S3 and apply it in the CloudFormation console

  • Deploying CloudFormation Templates (2 ways)
    • Manual way: Edit templates in Application Composer or a code editor
    • Automated way: Edit templates in a YAML file and deploy them with the AWS CLI

  • Note: the services that rely on CloudFormation to provision resources are Elastic Beanstalk and SAM
  • To upload Lambda code and the CloudFormation template, use aws cloudformation package and aws cloudformation deploy
  • To declare a Lambda function in CloudFormation
    • Write the AWS Lambda code inline in the AWS::Lambda::Function block, as long as there are no third-party dependencies
    • Upload all the code as a zip to S3 and reference the object in the AWS::Lambda::Function block

CloudFormation Resources

  • Resources represent different AWS Components that will be created and configured
    • You don't need to define the order in which resources are created
    • Resources are mandatory (important)
  • Resource type format: service-provider::service-name::data-type-name (e.g., AWS::EC2::Instance)


CloudFormation Parameters

  • Parameters are a way to provide inputs to AWS CloudFormation Template
    • Templates can be reused, and you don't need to re-upload the template to change inputs
    • Parameter types include AWS::EC2::KeyPair::KeyName, CommaDelimitedList, String

  • Parameters Settings (remember these 2)
    • AllowedValues: give users a fixed set of options, e.g., t2.micro, t2.small, t2.medium
    • NoEcho: mask secrets such as passwords so their values are not displayed

  • Pseudo Parameters
    • Can be used anytime and are enabled by default
    • AWS::AccountId, AWS::Region, AWS::StackId, AWS::StackName, AWS::NotificationARNs, AWS::NoValue


CloudFormation Mappings

  • Mappings are fixed variables within CloudFormation template
    • Used to differentiate between environments (dev vs. prod), regions, AMIs, etc.
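Parameters, Mappings, and pseudo parameters usually appear together in one template. A minimal sketch of a template written as a Python dict (CloudFormation accepts the same structure as JSON); the AMI IDs and resource names are hypothetical.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "InstanceType": {
            "Type": "String",
            "Default": "t2.micro",
            "AllowedValues": ["t2.micro", "t2.small", "t2.medium"],  # user picks from a list
        },
        "DBPassword": {"Type": "String", "NoEcho": True},  # value is masked when displayed
        "KeyName": {"Type": "AWS::EC2::KeyPair::KeyName"},
    },
    "Mappings": {
        "RegionMap": {  # hypothetical AMI IDs, one per Region
            "us-east-1": {"AMI": "ami-0123456789abcdef0"},
            "eu-west-1": {"AMI": "ami-0fedcba9876543210"},
        }
    },
    "Resources": {
        "MyInstance": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": {"Ref": "InstanceType"},
                "KeyName": {"Ref": "KeyName"},
                # Fn::FindInMap takes exactly 3 args: MapName, TopLevelKey, SecondLevelKey;
                # the pseudo parameter AWS::Region picks the row for the current Region
                "ImageId": {"Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, "AMI"]},
            },
        }
    },
}

print(json.dumps(template, indent=2))
```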


CloudFormation Outputs & Exports

  • Outputs declare optional output values that can be imported into other stacks (important)
    • Outputs expose resource values so they can be used in other stacks (they must be exported first)
    • Exported output names must be unique within a Region (important)
  • To leverage a previous template's resources in a second template, use Fn::ImportValue (important)
  • Note: ImportValue is used to reference a value from another stack; Export is what exports a value to other stacks

  • Note: in CloudFormation, all of the imports must be removed before you can delete the exporting stack or modify the output value; so if Stacks B and C reference Stack A, Stack A must be deleted last
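The export/import rules can be modeled in a few lines. A toy sketch (not a real API): template fragments show the documented Outputs/Export and Fn::ImportValue syntax, and the hypothetical RegionExports class enforces the two exam rules: unique export names per Region, and no deleting a stack whose exports are still imported. Stack and export names are made up.

```python
from collections import defaultdict
from typing import Dict, Set

# Template fragments (as Python dicts) for an exporting and an importing stack
EXPORTING_OUTPUTS = {
    "VpcId": {
        "Value": {"Ref": "MyVpc"},
        "Export": {"Name": {"Fn::Sub": "${AWS::StackName}-VpcId"}},
    }
}
IMPORTING_PROPERTY = {"VpcId": {"Fn::ImportValue": "network-VpcId"}}

class RegionExports:
    """Toy model of one Region's export/import bookkeeping."""

    def __init__(self) -> None:
        self.exports: Dict[str, str] = {}  # export name -> exporting stack
        self.importers: Dict[str, Set[str]] = defaultdict(set)  # export name -> importing stacks

    def export(self, stack: str, name: str) -> None:
        owner = self.exports.get(name)
        if owner is not None and owner != stack:
            raise ValueError(f"export name {name!r} already used in this Region")
        self.exports[name] = stack

    def import_value(self, stack: str, name: str) -> None:
        if name not in self.exports:
            raise KeyError(f"no export named {name!r}")
        self.importers[name].add(stack)

    def delete_stack(self, stack: str) -> None:
        for name, owner in self.exports.items():
            if owner == stack and self.importers[name]:
                raise RuntimeError(f"{name!r} is still imported by {sorted(self.importers[name])}")
        self.exports = {n: o for n, o in self.exports.items() if o != stack}
        for users in self.importers.values():
            users.discard(stack)
```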

CloudFormation Conditions

  • Conditions are used to control the creation of resources or outputs based on a condition
    • Common conditions: environment, AWS Region, parameter value
  • Conditions can be associated with Resources, Conditions, and Outputs
  • Dependencies is not a valid section of a CloudFormation template


CloudFormation Intrinsic Functions

  • Ref (short form !Ref): Reference Parameters or Resources (it can only reference a Parameter or a Resource)
  • Fn::GetAtt: Returns the value of an attribute from a resource in the template
  • Fn::FindInMap: Return a named value from a specific key (note: exactly 3 arguments: MapName, TopLevelKey, SecondLevelKey)
  • Fn::ImportValue: Import values that are exported in other stacks
  • Fn::Base64: Convert a String to its Base64 representation
  • Condition Functions: Fn::And/Equals/If/Not/Or


CloudFormation Rollbacks

  • Both stack creation failures and stack update failures trigger a rollback, but the rollback itself can also fail
  • After a failed stack creation has rolled back, the stack cannot be updated; delete it and create a new stack


CloudFormation Service Role

  • IAM role that allows CloudFormation to create/update/delete stack resources
    • Use a Stack Service Role to let developers manage resources through CloudFormation without giving them direct permissions (important)
  • To achieve the least privilege principle, the user must have the iam:PassRole permission


CloudFormation DeletionPolicy

  • DeletionPolicy = Delete (default)
    • Control what happens when the CloudFormation template is deleted or when a resource is removed from a CloudFormation template
  • Delete won’t work on S3 if bucket is not empty

  • DeletionPolicy = Retain (declares which resources should be preserved)
    • Specify on resources to preserve in case of CloudFormation deletes

  • DeletionPolicy = Snapshot (creates a backup before deletion)
    • Create one final snapshot before deleting the resource


CloudFormation Stack Policy

  • A Stack Policy is a JSON document that defines the update actions that are allowed on specific resources during Stack updates
  • Protect resources from unintentional updates (important)
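A stack policy uses the documented Statement / Effect / Action ("Update:*", "Update:Modify", "Update:Replace", "Update:Delete") / Resource ("LogicalResourceId/...") format. A minimal sketch of such a policy plus a simplified evaluator (hypothetical; the real one also supports NotAction, NotResource, and Condition): an explicit Deny wins, otherwise an Allow is required. ProductionDatabase is a made-up logical ID.

```python
import fnmatch
from typing import Any, Dict, List

STACK_POLICY = {
    "Statement": [
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"},
        {"Effect": "Deny", "Action": "Update:*", "Principal": "*",
         "Resource": "LogicalResourceId/ProductionDatabase"},
    ]
}

def _as_list(value: Any) -> List[str]:
    return value if isinstance(value, list) else [value]

def update_allowed(policy: Dict[str, Any], logical_id: str, action: str) -> bool:
    """Simplified check: explicit Deny wins, otherwise an Allow is required."""
    resource = f"LogicalResourceId/{logical_id}"
    allowed = False
    for stmt in policy["Statement"]:
        action_match = any(fnmatch.fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
        resource_match = any(fnmatch.fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if action_match and resource_match:
            if stmt["Effect"] == "Deny":
                return False
            allowed = True
    return allowed
```

Once a stack has a stack policy, anything not explicitly allowed is protected, which is why the example starts with a broad Allow.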


CloudFormation Termination Protection

  • To prevent accidental deletes of CloudFormation Stacks, use TerminationProtection


CloudFormation Custom Resources

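  • Used to define resources that are not supported by CloudFormation or are outside CloudFormation
  • Defined in the template with Custom::MyCustomResourceTypeName
  • Can solve the problem mentioned above where a non-empty S3 bucket cannot be deleted (empty it before deletion)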


CloudFormation StackSets

  • Create, update, delete stacks across multiple accounts and regions
    • Note: StackSets create CloudFormation stacks across multiple accounts and/or multiple Regions
    • Only an administrator account can create StackSets
  • When you update a stack set, all associated stack instances are updated (important)


17. SQS, SNS, Kinesis

AWS SQS

  • Used to decouple applications (e.g., video processing; a many-to-many model)
  • SQS scales automatically (unlimited throughput, unlimited number of messages, can retry)
  • SQS retains a message for at most 14 days, and a single message is at most 256 KB (important)
  • SQS can retrieve at most 10 messages per request (important)
  • A Standard SQS queue can be an S3 event notification destination; an SQS FIFO queue cannot
  • For parallel processing, choose SQS rather than SNS
  • For decoupling microservices (with no 3rd party involved), choose SQS
  • For a high-throughput request-response messaging pattern, choose a Temporary Queue
  • For workflows that take a long time to complete, choose a dedicated worker environment

  • SQS - Producing Messages
    • Message is persisted in SQS until a consumer deletes it (important)

  • SQS - Consuming Messages
    • Many consumers can process messages in parallel at the same time (important)
    • Consumers delete messages after processing them

  • When SQS needs message priority: create two Amazon SQS standard queues and set up the Amazon EC2 instances to poll the high-priority queue first

  • 2 points about the SQS CreateQueue API
    • The visibility timeout value of the queue is in seconds, default is 30 seconds
    • You can’t change the queue type after creating it
  • If a question involves scaling on ApproximateNumberOfMessagesVisible, use a backlog-per-instance metric with a target tracking scaling policy

SQS Message Visibility Timeout

  • Visibility Timeout too high: if a consumer crashes, re-processing takes a long time
  • Visibility Timeout too low: you may get duplicates (to avoid duplicate reads, increase the timeout, important)
  • Use the ChangeMessageVisibility API call to increase the visibility timeout (important)
  • Most exam questions target the case where the Visibility Timeout is too low
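A toy in-memory model of the visibility timeout (times in seconds, the clock is passed in so the behavior is deterministic). The FakeQueue class is hypothetical; its methods only mimic the meaning of ReceiveMessage, ChangeMessageVisibility, and DeleteMessage.

```python
import itertools
from typing import Dict, List, Tuple

class FakeQueue:
    """In-memory model of SQS visibility timeout."""

    def __init__(self, visibility_timeout: int = 30) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: Dict[int, Tuple[str, float]] = {}  # id -> (body, invisible_until)
        self._ids = itertools.count(1)

    def send(self, body: str) -> int:
        msg_id = next(self._ids)
        self._messages[msg_id] = (body, 0.0)
        return msg_id

    def receive(self, now: float, max_messages: int = 10) -> List[Tuple[int, str]]:
        batch: List[Tuple[int, str]] = []
        for msg_id, (body, invisible_until) in list(self._messages.items()):
            if len(batch) == max_messages:
                break
            if invisible_until <= now:
                # A received message is hidden from other consumers until the timeout ends
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                batch.append((msg_id, body))
        return batch

    def change_visibility(self, msg_id: int, timeout: int, now: float) -> None:
        # Like ChangeMessageVisibility: extend the hiding window for slow processing
        body, _ = self._messages[msg_id]
        self._messages[msg_id] = (body, now + timeout)

    def delete(self, msg_id: int) -> None:
        del self._messages[msg_id]  # consumer finished processing
```

If processing takes longer than the timeout and nobody extends it, the message is received again, which is the "timeout too low" duplicate case.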


SQS Dead Letter Queues (DLQ)

  • If a message fails to be processed within the Visibility Timeout, it goes back to the queue
    • The problem is that the message itself may be impossible to process
    • For SQS message processing failures, choose a DLQ (important)
  • Set a threshold of how many times a message can go back to the queue
    • After the MaximumReceives threshold is exceeded, the message goes to the DLQ
    • DLQs are very useful for debugging
  • The DLQ of a FIFO queue must also be a FIFO queue, and likewise for Standard (same type)

  • DLQ - Redrive to Source (move messages back to the original queue after the fix)
    • Feature to help consume messages in the DLQ to understand what is wrong
    • When the code is fixed, we can redrive the messages from the DLQ to the source queue
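A toy simulation of the threshold: a message that keeps failing is received maxReceiveCount times and then moved to the DLQ, while good messages are processed and deleted. redrive_policy builds the JSON string for the source queue's RedrivePolicy attribute (deadLetterTargetArn / maxReceiveCount are the documented keys); consume_with_dlq is a hypothetical helper and the ARN is a placeholder.

```python
import json
from collections import deque
from typing import Callable, Iterable, List, Tuple

def redrive_policy(dlq_arn: str, max_receive_count: int) -> str:
    # Value for the source queue's RedrivePolicy attribute (a JSON string)
    return json.dumps({"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receive_count)})

def consume_with_dlq(bodies: Iterable[str], handler: Callable[[str], None],
                     max_receive_count: int) -> Tuple[List[str], List[str]]:
    source = deque((body, 0) for body in bodies)  # (body, receive count)
    processed: List[str] = []
    dlq: List[str] = []
    while source:
        body, receives = source.popleft()
        if receives >= max_receive_count:
            dlq.append(body)  # threshold exceeded: moved to the DLQ
            continue
        receives += 1
        try:
            handler(body)
            processed.append(body)  # success: the consumer deletes the message
        except Exception:
            source.append((body, receives))  # timeout expires, message returns to the queue
    return processed, dlq
```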


SQS Delay Queues

  • Delay a message up to 15 minutes (using the DelaySeconds parameter)
  • If SQS needs to postpone the delivery of new messages, use a Delay Queue


SQS Long Polling

  • Decreases latency and the number of API calls (fewer API requests, better performance, important)
    • Minimizes the cost of using SQS
    • Long Polling does not solve SQS duplicates; use the Visibility Timeout for that


SQS Extended Client

  • Handles sending messages larger than 256 KB (important)
    • This is needed because SQS messages have a size limit


SQS FIFO Queues

  • Delivers messages in order (SNS FIFO topics can also do this)
    • By default, FIFO queues support up to 300 messages per second, or 3,000 with batching (important)
  • With a single MessageGroupID there can be only 1 consumer; with multiple MessageGroupIDs there can be multiple consumers (important)

  • Converting an SQS queue to a FIFO queue
    • Delete the existing SQS queue and recreate it as a FIFO queue
    • Make sure the name of the FIFO queue ends with the .fifo suffix
    • Make sure the throughput for the FIFO queue does not exceed 3000 messages / second

SQS FIFO Queues Advanced

  • De-duplication (using MessageDeduplicationID)
    • If the same message is sent twice within a 5-minute interval, the second one is refused

  • Message Grouping (using MessageGroupID)
    • Same MessageGroupID in the entire FIFO queue -> only 1 consumer
    • Different MessageGroupIDs for subsets of messages -> different consumers
    • For FIFO queues, MessageGroupID guarantees ordering within the same message group
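A toy model of FIFO deduplication and grouping. With content-based deduplication SQS uses a SHA-256 hash of the message body as the deduplication ID, which this sketch mirrors; the FifoQueueModel class and its send signature are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional

DEDUP_WINDOW_SECONDS = 300  # 5-minute deduplication interval

class FifoQueueModel:
    def __init__(self, content_based_dedup: bool = False) -> None:
        self.content_based_dedup = content_based_dedup
        self._first_seen: Dict[str, float] = {}
        self.groups: Dict[str, List[str]] = defaultdict(list)

    def send(self, body: str, group_id: str, now: float,
             dedup_id: Optional[str] = None) -> bool:
        """Returns True if the message is enqueued, False if dropped as a duplicate."""
        if dedup_id is None:
            if not self.content_based_dedup:
                raise ValueError("MessageDeduplicationId required unless content-based dedup is on")
            dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
        first = self._first_seen.get(dedup_id)
        if first is not None and now - first < DEDUP_WINDOW_SECONDS:
            return False  # real SQS still returns success but does not deliver it again
        self._first_seen[dedup_id] = now
        self.groups[group_id].append(body)  # order is kept per MessageGroupId
        return True
```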


AWS SNS

  • Send one message to many receivers (can also email users)
    • Publisher & Subscriber model (one-to-many)
    • Subscriber: SQS, Lambda, Kinesis Data Firehose, HTTPS endpoints, Email
  • SNS Message Filtering: each subscription can use a JSON filter policy to receive only the matching messages published to the topic (important)


Fan Out Pattern

  • Push once to SNS, receive in all SQS queues (SNS delivers the message to every subscribed SQS queue, important)
    • Fully decoupled, no data loss (important, because it decouples)
    • Can be used for message filtering (each subscription with its own filter policy)
  • Kinesis can also use a Fan Out pattern (using Shards)
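A toy model of fan-out with filter policies, limited to exact string matching on message attributes (real SNS filter policies also support prefix, numeric, anything-but, and more). A subscription without a policy receives everything. The function names, queue names, and the event_type attribute are hypothetical.

```python
from typing import Dict, List, Mapping

FilterPolicy = Mapping[str, List[str]]

def matches(policy: FilterPolicy, attributes: Mapping[str, str]) -> bool:
    # AND across attribute names, OR across the allowed values of one name
    return all(attributes.get(name) in allowed for name, allowed in policy.items())

def fan_out(attributes: Mapping[str, str], subscriptions: Dict[str, FilterPolicy]) -> List[str]:
    """Returns the SQS queues that receive one message published to the topic."""
    return sorted(queue for queue, policy in subscriptions.items()
                  if not policy or matches(policy, attributes))

subscriptions = {
    "orders-queue": {"event_type": ["order_placed"]},
    "refunds-queue": {"event_type": ["order_refunded"]},
    "audit-queue": {},  # no filter policy: receives every message
}
```

Publishing once with event_type=order_placed reaches audit-queue and orders-queue; the producer never needs to know which queues exist.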


AWS Kinesis

  • Process, ingest, buffer streaming data in real time
  • Kinesis Data Streams: Capture, process, store data streams
  • Kinesis Data Firehose: Load data streams into AWS data stores (S3, Redshift)
  • Kinesis Data Analytics: Analyze data streams with SQL or Apache Flink

Kinesis Agent cannot write to Amazon Kinesis Firehose for which the delivery stream source is already set as Amazon Kinesis Data Streams

  • Kinesis Agent is a stand-alone Java software application that offers an easy way to collect and send data to Kinesis Data Streams

Kinesis Data Streams

  • Ability to reprocess & replay stream data (important)
    • For ingesting data, Kinesis Data Firehose is better than Kinesis Data Streams
    • Processes real-time data streams such as clickstreams, transactions, media (e.g., financial data)
    • Consumers include Lambda, Kinesis Firehose, Kinesis Data Analytics
    • Kinesis Data Streams helps continuously collect gigabytes of data per second from many sources
    • Kinesis Data Streams retains data for up to 1 year (365 days)
  • Once data is inserted into Kinesis, it can’t be deleted
  • If a question combines Kinesis Data Streams with SQL, the answer includes Kinesis Data Analytics

  • Data that shares the same partition key goes to the same shard (ordered)

  • Capacity Mode (2 in total)
    • Provisioned mode: Choose the number of shards provisioned (sets the capacity limit)
    • On-demand mode: No need to provision or manage capacity (scales automatically)


Kinesis Producers

  • Put data records into data streams
  • Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent

  • Kinesis applies an MD5 hash to the partition key to determine which shard each record goes to

  • If you get a ProvisionedThroughputExceededException
    • Over-producing to a shard causes ProvisionedThroughputExceededException
    • Solution 1: Use a highly distributed partition key
    • Solution 2: Use error retries with an exponential backoff mechanism
    • Solution 3: Increase the number of shards
    • Solution 4: Decrease the frequency or size of the requests
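How a partition key maps to a shard: Kinesis turns the key into a 128-bit integer with MD5, and each shard owns a contiguous hash key range. A minimal sketch assuming evenly split ranges (true for a freshly created stream; after splits and merges the ranges come from the stream itself); the helper names are hypothetical.

```python
import hashlib
from typing import List, Tuple

MAX_HASH_KEY = 2**128 - 1  # MD5 yields a 128-bit value

def hash_key(partition_key: str) -> int:
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")

def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        end = MAX_HASH_KEY if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges

def shard_for(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key not covered by any shard")
```

The same key always lands on the same shard (ordering), and a single very hot key always hits one shard, which is why a highly distributed partition key is solution 1 for ProvisionedThroughputExceededException.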


Kinesis Consumers

  • Get data records from data streams and process them
  • Consumers: Lambda, Kinesis Data Analytics, Kinesis Data Firehose, KCL, etc.

  • Kinesis Consumers - Custom Consumer (important)
    • Shared Fan-out Consumer: pull, low number of consuming applications, cheaper
    • Enhanced Fan-out Consumer: push, multiple consuming applications, more expensive
    • Enhanced Fan-out Consumer can increase read throughput (important)

  • Kinesis Consumers - AWS Lambda (important)
    • Supports Classic & Enhanced fan-out consumers
    • Read records in batches, can configure batch size and window


Kinesis Client Library (KCL)

  • A Java library that helps read records from a Kinesis Data Stream
  • With the Kinesis Client Library, each shard can be read by only one KCL instance at a time
  • Progress is checkpointed into DynamoDB (need IAM access)


Kinesis Operations

  • Shard Splitting: Deal with hot shard
    • Used to increase the Stream capacity, divide a hot shard

  • Merging Shards: Group two shards with low traffic
    • Decrease the Stream capacity and save costs


Kinesis Data Firehose

  • Kinesis Data Firehose loads streaming data into data stores and analytics tools
    • For ingesting data, Kinesis Data Firehose is better than Kinesis Data Streams
    • Managed service, auto scaling, serverless, supports data transformation
  • Near real time, and each Firehose delivery stream has only one destination (dumps data into a single data repository)
    • But don't be fooled by "near real time"; choose Data Streams or Data Firehose based on the question

  • Firehose destinations are stores like S3 and Redshift (serverless, commonly used for log data)
    • Firehose does not support DynamoDB as a destination

  • Kinesis Data Streams vs Firehose
    • Data Streams: Write custom code, real-time, have data storage, have replay
    • Firehose: Fully managed, near real-time, no data storage, no replay


Kinesis Data Analytics

  • Real-time analytics on Kinesis Data Streams & Firehose using SQL (important)
    • Fully managed, auto-scaling, serverless

  • Amazon Managed Service for Apache Flink
    • Use Flink to process and analyze streaming data


Data ordering for Kinesis

  • Ordering data into Kinesis
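    • The same partition key always goes to the same shard (ordered)
    • Suitable when a large amount of data needs to be processed

  • Ordering data into SQS
    • In comparison, SQS only offers ordering with FIFO (GroupID); otherwise there is no ordering
    • Suitable when you need a dynamic number of consumers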


18. AWS Monitoring & Audit

CloudWatch Metrics

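  • CloudWatch Metrics collect, store, and analyze performance metrics for AWS resources and applications
  • Use PutMetricData to push custom metric data to CloudWatch
  • Remember: when dealing with third-party APIs, the answer is always CloudWatch custom metrics (important)

  • CloudWatch - EC2 Detailed Monitoring (important)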
    • With detailed monitoring, you get data every 1 minute


CloudWatch Custom Metrics

  • Define your own custom metrics to CloudWatch
  • High-Resolution Custom Metrics can have a minimum resolution of 1 second (important)
  • Accepts metric data points two weeks in the past and two hours in the future
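A custom metric data point is a small structure; the keys below (MetricName, Value, Unit, Timestamp, StorageResolution) are those of a PutMetricData MetricData entry. A minimal sketch with a hypothetical helper that also enforces the accepted timestamp window; it produces a JSON-friendly dict and does not call CloudWatch (an SDK would send it inside the MetricData list along with a Namespace).

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

def metric_datum(name: str, value: float, unit: str = "Count", high_resolution: bool = False,
                 timestamp: Optional[datetime] = None,
                 now: Optional[datetime] = None) -> Dict[str, Any]:
    now = now or datetime.now(timezone.utc)
    ts = timestamp or now
    # CloudWatch accepts data points from two weeks in the past to two hours in the future
    if ts < now - timedelta(weeks=2) or ts > now + timedelta(hours=2):
        raise ValueError("timestamp outside the accepted window")
    return {
        "MetricName": name,
        "Value": value,
        "Unit": unit,
        "Timestamp": ts.isoformat(),
        # 1 = high resolution (1-second granularity), 60 = standard resolution
        "StorageResolution": 1 if high_resolution else 60,
    }
```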


CloudWatch Logs

  • A place to store application logs in AWS; logs can be sent on to S3, Kinesis, Lambda, OpenSearch
    • You can define when CloudWatch Logs expire; by default they never expire
    • Log Retention Policy is defined at the Log Group level (important)

  • CloudWatch Logs Insights (query engine)
    • Search and analyze log data stored in CloudWatch Logs

  • CloudWatch Logs Subscriptions
    • Get a real-time feed of log events from CloudWatch Logs for processing and analysis
    • Subscription Filter: filters which log events are delivered to the destination (important)
    • Cross-Account Subscription: Send log events to resources in different AWS account


CloudWatch Agent

  • CloudWatch Logs Agent: old version, can only send to CloudWatch Logs
  • CloudWatch Unified Agent: can send more information, e.g., CPU, RAM metrics
    • The CloudWatch Unified Agent can send EC2 logs to CloudWatch (important)
  • The CloudWatch Agent can also be installed on on-premises servers to collect data


CloudWatch Logs Metric Filter

  • A CloudWatch Logs Metric Filter extracts metrics from log data to monitor specific events and patterns
  • Filters only publish the metric data points for events that happen after the filter was created (earlier events are not counted)


CloudWatch Alarms

  • CloudWatch Alarms are used to trigger notifications for any metric
  • CloudWatch Alarm targets are mainly EC2, ASG, SNS (important)
  • With high-resolution custom metrics, an alarm can evaluate every 10 seconds at the fastest (important)

  • Composite Alarms (monitor the states of multiple alarms together)
    • Composite Alarms monitor the states of multiple other alarms

  • Use the set-alarm-state CLI command to test CloudWatch Alarms (temporarily change the alarm state); this is cost-effective

CloudWatch Synthetics

  • Configurable scripts that monitor your APIs, URLs, Websites
    • Can run once or on a regular schedule


AWS EventBridge

  • Schedule cron jobs (CRON means jobs that run on a repeating schedule)
  • React to events from AWS services and SaaS applications
    • If a 3rd party application is mentioned, consider EventBridge

  • Event Bus
    • Can archive events and replay archived events, good for debugging

  • Schema Registry: Generate code in advance for how data is structured in event bus
    • Analyze events in Event Bus and infer schema

  • Resource-based Policy
    • Manage permissions (allow / deny) for a specific Event Bus


AWS X-Ray

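  • Automated Trace Analysis & Central Service Map Visualization
    • AWS X-Ray is a distributed tracing system for analyzing and debugging performance issues in production and distributed applications
    • X-Ray supports cross-account tracing and visualization (important: tracing and visualization)
  • If a question asks about checking 5XX errors across microservices, choose the X-Ray service, not CloudTrail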

  • How to enable X-Ray (两种方法)
    • Import the AWS X-Ray SDK in your code
    • Install the X-Ray daemon or enable X-Ray AWS Integration

  • X-Ray Troubleshooting (EC2 要在 Instance 上跑 X-Ray Daemon)
    • EC2: IAM Role has proper permission 和 Instace running X-Ray Daemon (重要)
    • Lambda: IAM Role RayWriteOnlyAccess 和 Enable Lambda X-Ray Active Tracing

  • 如果要在 Farget deployed 的 Docker 上跑 X-Ray daemon
    • Deploy the X-Ray daemon agent as a sidecar container
    • Provide the correct IAM task role to the X-Ray container
  • 如果要验证 X-Ray daemon 在 ECS 上跑, 用 AWS_XRAY_DAEMON_ADDRESS

X-Ray Advanced

  • Instrumentation: measuring a product's performance and writing trace information
    • Instrumenting an application means sending trace data for incoming and outgoing requests and other events within the application

  • X-Ray Concepts (remember these 2)
    • Sampling: Decrease the number of requests sent to X-Ray to reduce cost (saves money)
    • Annotations: Index traces and use them with filters (important: filtering)

  • X-Ray Sampling Rules (control the amount of recorded data, important)
    • With sampling rules, you control the amount of data that you record
    • By default, X-Ray records the first request each second, and five percent of any additional requests
    • Key point: first request each second, plus 5 percent of additional requests
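The default rule above (first request each second, then 5% of the rest) can be illustrated with a small simulation. This is a sketch, not the X-Ray SDK: the function name sample_requests and its inputs (request timestamps in seconds, an optional random generator) are assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional


def sample_requests(timestamps: List[float],
                    reservoir_per_sec: int = 1,
                    fixed_rate: float = 0.05,
                    rng: Optional[random.Random] = None) -> List[bool]:
    """Return True for each request that would be traced.

    Hypothetical simulation of a rule shaped like X-Ray's default:
    the first `reservoir_per_sec` requests in each one-second window are
    always recorded, and every additional request is recorded with
    probability `fixed_rate`.
    """
    rng = rng or random.Random()
    used: Dict[int, int] = {}  # whole second -> reservoir slots consumed
    decisions: List[bool] = []
    for ts in timestamps:
        second = int(ts)
        if used.get(second, 0) < reservoir_per_sec:
            used[second] = used.get(second, 0) + 1
            decisions.append(True)                       # reservoir: always traced
        else:
            decisions.append(rng.random() < fixed_rate)  # 5% of the rest


    return decisions


# 1 request at t=0.0 plus 1,000 more in the same second:
# the first is always traced, roughly 50 of the others are.
d = sample_requests([0.0] * 1001, rng=random.Random(7))
print(d[0], sum(d[1:]))
```

The reservoir guarantees at least one trace per second even at low traffic, while the fixed rate keeps cost proportional at high traffic.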


X-Ray APIs

  • X-Ray Write APIs
    • PutTraceSegments: Upload segment documents to AWS X-Ray
    • PutTelemetryRecords: Used by AWS X-Ray daemon to upload telemetry
    • GetSamplingRules: Retrieve all sampling rules

  • X-Ray Read APIs: all contain Get
    • GetServiceGraph: Main graph
    • BatchGetTraces: Retrieve a list of traces specified by ID
    • GetTraceSummaries: Retrieve IDs and annotations for traces in a time frame
    • GetTraceGraph: Retrieve a service graph for one or more trace IDs


X-Ray with Beanstalk

  • Elastic Beanstalk can run the X-Ray daemon (enable it in the console or with a config file)
    • The config file is .ebextensions/xray-daemon.config


AWS Distro for OpenTelemetry

  • AWS-supported distribution of the open-source OpenTelemetry project


AWS CloudTrail

  • AWS CloudTrail is a logging service that records and monitors API calls and related activity in your AWS account
  • CloudTrail is a global service; if something is deleted by mistake, check CloudTrail first
  • CloudTrail can be used with EventBridge to intercept API calls

  • CloudTrail Events (3 types)
    • Management Events: Operations performed on resources in your AWS account
    • Data Events: S3 object-level activity
    • CloudTrail Insights Events: Detect unusual activity (security)

  • CloudTrail Insights (important, monitors unusual activity)
    • Use CloudTrail Insights to detect unusual activity in your account

  • CloudTrail Event Retention (important, involves S3 and Athena)
    • Events are kept for 90 days; to keep them longer, send them to S3 and use Athena to analyze them


19. Lambda

AWS Lambda

  • Virtual functions, serverless, limited by time (short executions, up to 15 min)
    • Run on-demand, scaling is automated
  • Can be event-driven, and can handle CRON jobs
    • Use EventBridge to trigger Lambda every hour
  • Lambda has account quotas; contact AWS to raise them
  • Lambda environment variables are limited to 4 KB in total size; there is no limit on the number of variables

  • Lambda can connect to private subnets in a VPC in your account (important)
  • Most questions improve CPU-bound Lambda function performance by increasing memory (important)
  • API caching reduces calls; for faster initialization, choose provisioned concurrency

Lambda Synchronous Invocations

  • Synchronous: CLI, SDK, API Gateway, Application Load Balancer
    • Results are returned right away
    • Error handling must happen client side


Lambda with ALB

  • To expose a Lambda function as an HTTP(S) endpoint, use an ALB or API Gateway
    • The Lambda function must be registered in a Target Group (重要)

  • ALB Multi-Header Values
    • HTTP headers and query string parameters are shown as arrays


Lambda Asynchronous Invocations

  • Asynchronous: S3, SNS, CloudWatch, EventBridge
    • The events are placed in an internal Event Queue
    • Lambda attempts to retry on errors (3 tries total)


Lambda Event Source Mapping

  • Works with Kinesis Data Streams, SQS & SQS FIFO queues, DynamoDB Streams (important)
    • The Lambda function is invoked synchronously; SNS is not an event source mapping source (SNS invokes Lambda asynchronously)

  • Streams & Lambda (Kinesis & DynamoDB)
    • An event source mapping creates an iterator for each shard and processes items in order
    • Low traffic: Use a batch window
    • High traffic: Process multiple batches in parallel

  • Streams & Lambda Error Handling
    • By default, if the function returns an error, the entire batch is reprocessed until the function succeeds or the items in the batch expire

  • Queue & Lambda (SQS & FIFO)
    • The event source mapping polls SQS (long polling)
    • With FIFO queues, Lambda scales up to the number of active message groups


Lambda Event and Context Objects

  • Simply put, these are the event and context parameters of lambda_handler
  • event: JSON document contains data for the function to process
  • context: Provides information about the invocation, function, runtime environment
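A minimal Python handler showing both parameters. FakeContext is a hypothetical stand-in for the object the Lambda runtime passes in, used only for local testing; aws_request_id, function_name and get_remaining_time_in_millis() are attributes of the real Python context object.

```python
import json


def lambda_handler(event, context):
    # event: the JSON document sent by the invoker, already parsed into a dict
    name = event.get("name", "world")
    # context: information about the invocation, function and runtime
    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": f"Hello, {name}!",
            "request_id": context.aws_request_id,
            "function_name": context.function_name,
            "remaining_ms": context.get_remaining_time_in_millis(),
        }),
    }


class FakeContext:
    """Hypothetical stand-in for the runtime's context object (local testing only)."""
    aws_request_id = "test-request-id"
    function_name = "my-function"

    def get_remaining_time_in_millis(self) -> int:
        return 3000


print(lambda_handler({"name": "Alice"}, FakeContext()))
```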


Lambda Destinations

  • Can be configured to send results to a destination (important)
    • Asynchronous invocations: Define destinations for successful and failed events
    • Event Source Mapping: For discarded event batches
  • Note: a Lambda destination can serve as a DLQ, e.g. by sending failed events directly to SQS (important)


Lambda Permissions

  • Lambda Execution Role (important, most Lambda questions involve the execution role)
    • Grants the Lambda function permissions to AWS services (IAM Role)
    • Best practice: Create one Lambda Execution Role per function

  • Lambda Resource Based Policies
    • Use resource-based policies to give other accounts and AWS services permission to use your Lambda resources (resource-based policies are generally how cross-account access is granted)


Lambda Environment Variables

  • Used to store secrets and API keys
  • Can be used to inject dynamic variables into a Lambda function (important)
  • Note: values such as tokens should not go in environment variables; put them in the deployment package .zip


Lambda Logging & Monitoring

  • Lambda execution logs are stored in AWS CloudWatch Logs
  • Lambda metrics are displayed in AWS CloudWatch Metrics

  • Lambda Tracing with X-Ray
    • Enable in Lambda configuration (Active Tracing)


CloudFront Functions & Lambda@Edge

  • Execute logic at the edge (Edge Function, serverless)
  • Two options: CloudFront Functions and Lambda@Edge (they support different languages)

  • CloudFront Functions
    • For high-scale, latency-sensitive CDN customizations (millions of requests per second)
    • Used to change viewer requests and responses, short execution time

  • Lambda@Edge
    • Scales to 1000s of requests/second
    • Used to change viewer and origin requests and responses, longer execution time


Lambda in VPC

  • By default, Lambda cannot access resources in your VPC (hence Lambda in VPC)
  • Requires the VPC ID, subnets, and security groups; Lambda creates an ENI

  • Lambda in VPC - Internet Access (2 options)
    • A Lambda function in your VPC does not have internet access
    • Deploying a Lambda function in a public subnet does not give it internet access
  • Deploy the Lambda function in a private subnet and route outbound traffic through a NAT Gateway / NAT Instance
  • You can also use VPC endpoints to privately access AWS services without NAT


Lambda Function Performance

  • Lambda Function Configuration
    • RAM: If application is CPU-bound (computation heavy), increase RAM
    • Timeout: Default 3 seconds, maximum is 900 seconds (15 minutes)

  • Lambda Execution Context
    • The execution context is a temporary runtime environment that initializes any external dependencies of your lambda code
    • The execution context includes the /tmp directory (重要)
    • The /tmp directory can be up to 10,240 MB; the default is 512 MB

  • Initialize outside the handler (important)
    • Prevents the DB connection from being re-established on every invocation
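A sketch of initializing outside the handler, assuming an in-memory sqlite3 database as a stand-in for a real DB client. Module-level code runs once per execution environment, so warm invocations reuse the same connection; CONNECT_COUNT is a hypothetical counter added only to make that visible.

```python
import sqlite3

CONNECT_COUNT = 0  # hypothetical counter, only to show how often we connect


def _connect() -> sqlite3.Connection:
    global CONNECT_COUNT
    CONNECT_COUNT += 1
    # sqlite3 in-memory DB stands in for a real database client (assumption)
    return sqlite3.connect(":memory:")


# Module level: runs once per execution environment (cold start),
# then is reused by every warm invocation of the handler.
DB = _connect()


def lambda_handler(event, context):
    # Reuse the existing connection instead of opening a new one per call
    (value,) = DB.execute("SELECT ?", (event["x"],)).fetchone()
    return {"x": value, "connections_opened": CONNECT_COUNT}
```

Calling lambda_handler several times in the same process keeps connections_opened at 1, which is the behavior a warm Lambda environment gives you.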


Lambda Layers

  • Create Custom Runtime & Externalize Dependencies to re-use them


Lambda File Systems Mounting

  • Lambda functions can access EFS file systems if they are running in a VPC
    • Leverage EFS Access Points


Lambda Concurrency

  • Can set a reserved concurrency to limit the number of concurrent executions
  • Each invocation over the concurrency limit will trigger a Throttle
    • If you need a higher limit, open a support ticket
  • For asynchronous invocations, Lambda returns throttled events to the event queue and retries them
  • You can configure Auto Scaling to manage Lambda provisioned concurrency on a schedule
    • E.g. to prepare for a Christmas traffic peak
  • To deal with a Lambda function exceeding concurrency limits, choose reserved concurrency

  • Cold Starts & Provisioned Concurrency
    • Cold Start: The first request served by a new instance has higher latency than the rest
    • Provisioned Concurrency: Allocate concurrency before the function is invoked
    • Provisioned Concurrency solves the cold start problem (important)


Lambda External Dependencies

  • If a Lambda function depends on external libraries
    • You need to install the packages alongside your code and zip them together
  • Note: zip the function and its dependencies together, not separately (important)


Lambda Container Image

  • Deploy Lambda functions as container images (i.e. Lambda and Docker can be used together)
  • Package complex or large dependencies in a container
  • To deploy a container image to Lambda, the container image must implement the Lambda Runtime API
  • The AWS Lambda service does not support Lambda functions that use multi-architecture container images


Lambda Versions and Aliases

  • Lambda Versions (important)
    • When you work on a Lambda function, you work on $LATEST
    • When you publish a Lambda function, you create a version (versions are immutable)

  • Lambda Aliases (important)
    • Aliases are pointers to Lambda function versions
    • Aliases enable Canary deployment by assigning weights to Lambda function versions
    • Aliases cannot reference aliases


Lambda with CodeDeploy

  • CodeDeploy can help automate traffic shift for Lambda aliases
    • Linear: Grow traffic every N minutes until 100%
    • Canary: Try X percent then 100%
    • AllAtOnce: Immediate
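A sketch of how the three traffic-shifting styles play out over time. The function names are hypothetical; the percentages mirror CodeDeploy configurations such as CodeDeployDefault.LambdaLinear10PercentEvery1Minute and CodeDeployDefault.LambdaCanary10Percent5Minutes, but the exact timing CodeDeploy applies is an assumption here.

```python
from typing import List, Tuple

Schedule = List[Tuple[int, int]]  # (minute after start, % of traffic on new version)


def linear(step_percent: int, every_min: int) -> Schedule:
    """Linear: add step_percent every every_min minutes until 100%."""
    if step_percent <= 0:
        raise ValueError("step_percent must be positive")
    schedule: Schedule = []
    minute, pct = 0, 0
    while pct < 100:
        pct = min(100, pct + step_percent)
        schedule.append((minute, pct))
        minute += every_min
    return schedule


def canary(canary_percent: int, wait_min: int) -> Schedule:
    """Canary: canary_percent first, then the rest all at once after wait_min."""
    return [(0, canary_percent), (wait_min, 100)]


def all_at_once() -> Schedule:
    """AllAtOnce: shift everything immediately."""
    return [(0, 100)]


print(linear(10, 1))   # ten steps of 10%, one per minute
print(canary(10, 5))   # 10% for 5 minutes, then 100%
```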


Lambda Function URL

  • Dedicated HTTPS endpoint for a Lambda function (a way to expose it without API Gateway)
    • Format: https://<url-id>.lambda-url.<region>.on.aws


Lambda with CodeGuru

  • Gain insights into runtime performance of your Lambda function
  • CodeGuru creates a Profiler Group for your Lambda function


20. DynamoDB

AWS DynamoDB

  • NoSQL database, with replication across multiple AZs, auto scaling, no servers to provision
  • Two table classes: Standard & Infrequent Access (IA)
  • If a DynamoDB question involves sending emails (e.g. when items change), the answer is DynamoDB Streams
  • If a DynamoDB question involves unpredictable workloads, choose an On-Demand table
  • By default, DynamoDB tables are encrypted with an AWS owned key (important)
  • DynamoDB historically did not support resource policies, so cross-account access uses an IAM role + AssumeRole (important)

  • Each table has a Primary Key, which must be decided at creation time (2 options)
    • Partition Key (HASH): Partition Key must be unique for each item
    • Partition Key + Sort Key (HASH + RANGE): The combination must be unique
  • The maximum item size is 400 KB

  • DynamoDB has 2 backup methods, on-demand backups and point-in-time recovery (PITR); the backup data is stored in S3, but you cannot access those S3 buckets directly
  • Don't store images in DynamoDB, because the maximum item size is 400 KB
  • To grant read-only access to DynamoDB, use the AmazonDynamoDBReadOnlyAccess managed policy
  • To reduce DynamoDB latency, consider the following 2 points
    • Consider using Global tables if your application is accessed by globally distributed users
    • Use eventually consistent reads in place of strongly consistent reads whenever possible

DynamoDB WCU & RCU

  • Read / Write Capacity Modes (2 modes)
    • Provisioned Mode: You define how many RCUs and WCUs you need (default)
    • On-Demand Mode: RCUs and WCUs scale automatically (more expensive)
  • RCUs and WCUs are independent: you can add RCUs without adding WCUs

  • R/W Capacity Modes - Provisioned
    • Read Capacity Units (RCU): throughput for reads
    • Write Capacity Units (WCU): throughput for writes
    • Throughput can be exceeded temporarily using Burst Capacity

  • Write Capacity Units (WCU)
    • One Write Capacity Unit represents one write per second for an item up to 1 KB
    • Very important: 1 WCU covers 1 KB; round the item size up to the next whole KB
    • Example: 6 writes per second with item size 4.5 KB: 6 * 5 / 1 = 30 WCUs

  • Strongly Consistent Read vs. Eventually Consistent Read (important)
    • Eventually Consistent Read: might get stale data because of replication (default)
    • Strongly Consistent Read: always gets the latest data (more expensive)

  • Read Capacity Units (RCU)
    • One Read Capacity Unit represents one Strongly Consistent Read per second, or two Eventually Consistent Reads per second, for an item up to 4 KB in size (important)
    • Very important: for Strongly Consistent Reads, 1 RCU covers 4 KB; round the item size up to a multiple of 4 KB
    • Very important: for Eventually Consistent Reads, 1 RCU covers 2 reads of 4 KB; round the item size up to a multiple of 4 KB
    • Example: 10 Strongly Consistent Reads per second with item size 4 KB: 10 * 4 / 4 = 10 RCUs
    • Example: 16 Eventually Consistent Reads per second with item size 12 KB: 16 / 2 * 12 / 4 = 24 RCUs
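The WCU/RCU rules above as a small calculator. The function names are hypothetical; the printed values reproduce the worked examples in these notes.

```python
import math


def wcu(writes_per_sec: int, item_kb: float) -> int:
    """1 WCU = 1 write/sec for up to 1 KB; item size rounds up to whole KB."""
    return writes_per_sec * math.ceil(item_kb / 1)


def rcu(reads_per_sec: int, item_kb: float, strongly_consistent: bool = True) -> int:
    """1 RCU = 1 strongly consistent read/sec (or 2 eventually consistent) up to 4 KB."""
    units_per_read = math.ceil(item_kb / 4)  # round size up to a multiple of 4 KB
    total = reads_per_sec * units_per_read
    # Eventually consistent reads cost half; round up to whole RCUs
    return total if strongly_consistent else math.ceil(total / 2)


print(wcu(6, 4.5))                             # 30
print(rcu(10, 4))                              # 10
print(rcu(16, 12, strongly_consistent=False))  # 24
```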

  • DynamoDB - Partitions Internal
    • Data is stored in partitions; the Partition Key (hashed) determines which partition an item goes to
    • WCUs and RCUs are spread evenly across partitions

  • DynamoDB - Throttling (important)
    • If you exceed provisioned RCUs or WCUs, you get a ProvisionedThroughputExceededException
    • Causes: hot keys, hot partitions, very large items (important)
    • Solutions: exponential backoff, better-distributed partition keys, DynamoDB DAX (important)
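Exponential backoff, the first solution above, can be sketched as a generic retry loop with "full jitter". ThrottledError is a hypothetical stand-in for ProvisionedThroughputExceededException, and the delays and attempt counts are assumptions; the AWS SDKs already retry throttled calls with backoff, so this only shows the idea.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def with_backoff(call: Callable[[], T],
                 max_attempts: int = 5,
                 base_delay: float = 0.05,
                 max_delay: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Optional[random.Random] = None) -> T:
    """Retry `call` on throttling, waiting longer (with jitter) each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # "Full jitter": random wait in [0, min(max_delay, base * 2^attempt)]
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")
```

The random jitter keeps many throttled clients from retrying in lockstep and hitting the same hot partition again at the same moment.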

  • R/W Capacity Modes - On-Demand
    • Reads / writes automatically scale up / down with workloads
    • Read Request Units (RRU): throughput for reads (same as RCU)
    • Write Request Units (WRU): throughput for writes (same as WCU)


DynamoDB Basic Operations

  • Note: DynamoDB UpdateItem updates attributes, or creates a new item if it doesn't exist
  • Writing Data
    • PutItem: Create a new item or fully replace an old item
    • UpdateItem: Edit an existing item’s attributes, or add a new item if it doesn't exist (note)
    • Conditional Writes: Accept a write/update/delete only if condition met

  • Reading Data
    • GetItem: Read based on Primary key
    • Query: Return items based on KeyConditionExpression, FilterExpression
    • Scan: Scan the entire table and then filter out data (can run in parallel, important)

  • Deleting Data
    • DeleteItem: Delete an individual item
    • DeleteTable: Delete the whole table and all its items

  • DynamoDB Batch Operations
    • Allow you to save latency by reducing the number of API calls
    • BatchWriteItem, BatchGetItem

  • Table Cleanup (2 options)
    • Scan + DeleteItem, or Drop Table + Recreate Table
  • Copying a DynamoDB Table (3 options)
    • AWS Data Pipeline, or Backup + Restore, or Scan + PutItem / BatchWriteItem


DynamoDB Conditional Writes

  • For PutItem, UpdateItem, DeleteItem, BatchWriteItem
  • Specify a conditional expression to determine which item should be modified


DynamoDB Indexes

  • Local Secondary Index (LSI, important)
    • Alternative Sort Key for your table (note: it keeps the same Partition Key as the base table)
    • Must be defined at table creation time (important)
    • Attribute Projections: Contain some or all attributes of the base table

  • Global Secondary Index (GSI, important)
    • Alternative Primary Key from the base table (note: a different Primary Key, HASH or HASH + RANGE)
    • Must provision RCUs & WCUs for the index
    • Can be added or modified after table creation

  • Indexes and Throttling
    • GSI: If writes are throttled on the GSI, then the main table will be throttled
    • LSI: Uses the WCUs and RCUs of the main table, no special throttling considerations
    • Important: GSIs can cause throttling; LSIs need no throttling consideration


DynamoDB PartiQL

  • Use a SQL-like syntax to manipulate DynamoDB tables (important)


DynamoDB Optimistic Locking

  • A strategy to ensure an item hasn’t changed before you update or delete it (important)
  • DynamoDB Optimistic Locking is a concurrency control model, and it uses Conditional Writes (important)
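A sketch of optimistic locking with a version number, using an in-memory dict instead of DynamoDB. A write succeeds only if the stored version still equals the version the client read, which is what a conditional write with a ConditionExpression such as version = :expected does. InMemoryTable and the version attribute name are assumptions; ConditionalCheckFailed mimics DynamoDB's ConditionalCheckFailedException.

```python
from typing import Any, Dict


class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class InMemoryTable:
    """Hypothetical in-memory table; each item carries a `version` attribute."""

    def __init__(self, items: Dict[str, Dict[str, Any]]) -> None:
        self.items = items

    def get(self, key: str) -> Dict[str, Any]:
        return dict(self.items[key])  # a copy, like a fresh GetItem

    def put_if_version(self, key: str, new_item: Dict[str, Any],
                       expected_version: int) -> None:
        # Same idea as PutItem with ConditionExpression "version = :expected"
        current = self.items.get(key)
        if current is None or current["version"] != expected_version:
            raise ConditionalCheckFailed(key)
        self.items[key] = {**new_item, "version": expected_version + 1}


table = InMemoryTable({"acct-1": {"balance": 100, "version": 1}})
a = table.get("acct-1")   # client A reads version 1
b = table.get("acct-1")   # client B also reads version 1
table.put_if_version("acct-1", {"balance": a["balance"] - 10}, a["version"])
try:
    table.put_if_version("acct-1", {"balance": b["balance"] - 20}, b["version"])
except ConditionalCheckFailed:
    b = table.get("acct-1")  # B re-reads the new version and retries
    table.put_if_version("acct-1", {"balance": b["balance"] - 20}, b["version"])
print(table.items["acct-1"])  # {'balance': 70, 'version': 3}
```

Without the version check, B's write would silently overwrite A's and the balance would end at 80 instead of 70.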


DynamoDB DAX

  • DynamoDB Accelerator (DAX, handles caching, but DAX is not relational)
    • Helps solve read congestion by caching (microsecond latency)
    • DAX does not support SQL query caching
    • Improves DynamoDB performance for reads, not writes (important)


DynamoDB Stream

  • DynamoDB Streams: targets are Lambda and Kinesis Data Streams, not SQS (note)
    • Ordered stream of item-level modifications in a table (important)
    • Used to process the stream; can invoke a Lambda function (e.g. to send emails)

  • DynamoDB Streams are made of shards, like Kinesis Data Streams
  • Records are not retroactively populated in a stream after enabling it


DynamoDB TTL

  • Automatically deletes items after an expiry timestamp (scheduled deletion of DynamoDB items)
  • Expired items are deleted within 48 hours of expiration
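TTL reads a Number attribute holding a Unix epoch timestamp in seconds. A sketch of computing and checking that value; the attribute name expires_at and the helper names are assumptions. Because deletion is not instant, expired items can still show up in reads, so applications often filter them out themselves.

```python
import time
from typing import Any, Dict, Optional


def ttl_in(seconds_from_now: int, now: Optional[float] = None) -> int:
    """Value for a TTL attribute: Unix epoch time in whole seconds."""
    base = time.time() if now is None else now
    return int(base) + seconds_from_now


def is_expired(item: Dict[str, Any], ttl_attr: str = "expires_at",
               now: Optional[float] = None) -> bool:
    """Client-side check, since expired items may linger until deleted."""
    base = time.time() if now is None else now
    return ttl_attr in item and item[ttl_attr] < base


session = {"session_id": "abc", "expires_at": ttl_in(24 * 3600)}  # expires in 1 day
print(is_expired(session))
```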


DynamoDB CLI

  • projection-expression: Select a subset of attributes to retrieve (subset of attributes)
  • filter-expression: Filter items to retrieve a subset of the items (subset of items)
  • Limit the items returned by the CLI (pagination): max-items & starting-token
  • Note: if the question is about attributes, the answer is projection-expression


DynamoDB Transactions

  • Coordinated all-or-nothing operations on multiple items across one or more tables (finance, gaming)
  • Provides Atomicity, Consistency, Isolation, and Durability (ACID)
  • Consumes 2x WCUs & RCUs (note)
  • Two operations: TransactGetItems, TransactWriteItems

  • DynamoDB Transactions Capacity Computation
    • 3 transactional writes per second with item size 5 KB: 3 * 5 / 1 * 2 = 30 WCUs
    • 5 transactional reads per second with item size 5 KB (rounded up to 8 KB): 5 * 8 / 4 * 2 = 20 RCUs
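The transactional examples above as a calculator, assuming the same rounding rules as normal writes and strongly consistent reads, then doubling. The function names are hypothetical.

```python
import math


def transactional_wcu(writes_per_sec: int, item_kb: float) -> int:
    # Round size up to whole KB (like normal writes), then double for transactions
    return writes_per_sec * math.ceil(item_kb / 1) * 2


def transactional_rcu(reads_per_sec: int, item_kb: float) -> int:
    # Round size up to a multiple of 4 KB (like strongly consistent reads), then double
    return reads_per_sec * math.ceil(item_kb / 4) * 2


print(transactional_wcu(3, 5))  # 30
print(transactional_rcu(5, 5))  # 20
```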


DynamoDB Session State

  • DynamoDB can be used to store session state
  • ElastiCache is in-memory, DynamoDB is serverless (both are key/value stores)


DynamoDB Write Sharding

  • A strategy that allows better distribution of items evenly across partitions
    • Solves the Hot Partition issue
  • Add a suffix to the Partition Key value
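Two ways to pick the suffix, sketched in Python. A random suffix spreads writes evenly but forces reads to query every shard, while a calculated suffix (here a hash of another attribute) lets a reader recompute the shard. The # separator, the shard count, and the function names are assumptions.

```python
import hashlib
import random
from typing import List, Optional


def random_suffix_key(base_key: str, shards: int,
                      rng: Optional[random.Random] = None) -> str:
    """Random suffix: even write spread, but reads must hit every shard."""
    rng = rng or random.Random()
    return f"{base_key}#{rng.randrange(shards)}"


def calculated_suffix_key(base_key: str, item_id: str, shards: int) -> str:
    """Calculated suffix: derived from item_id, so readers can recompute it."""
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    return f"{base_key}#{int(digest, 16) % shards}"


def all_shard_keys(base_key: str, shards: int) -> List[str]:
    """Partition key values to query when reading back random-suffix data."""
    return [f"{base_key}#{i}" for i in range(shards)]


# A date like "2024-01-01" is a classic hot partition key
print(calculated_suffix_key("2024-01-01", "order-42", shards=10))
```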


DynamoDB Other Features

  • Backup and Restore: Point-in-time recovery (PITR) like RDS
  • Global Tables: Multi-region, fully replicated, high performance
  • DynamoDB Local: Develop and test apps locally without accessing the internet

  • DynamoDB Fine-Grained Access Control
    • Assign an IAM Role to users with a Condition to limit their API access to DynamoDB


21. API Gateway

AWS API Gateway

  • Invoke Lambda function, expose REST API (stateless client-server communication)
  • Lambda + API Gateway = No infrastructure to manage
  • API Gateway can prevent the API from being overwhelmed by too many requests (throttling)
  • API Gateway Caching (improves latency, important)
    • With caching, you can reduce the number of calls made to your endpoint and also improve the latency of requests to your API
  • Note: API Gateway does not support STS for authorization; use Cognito User Pools, IAM permissions with SigV4, or a Lambda Authorizer

  • API Gateway Endpoint Types
    • Edge-Optimized (default): For global clients; requests are routed through CloudFront edge locations, while the API Gateway lives in one region
    • Regional: For clients in the same region
    • Private: Can only be accessed from a VPC using an interface VPC endpoint (ENI)

  • API Gateway can authenticate with Cognito User Pools, AWS IAM roles and policies, and Lambda Authorizers

API Gateway Stages

  • Making changes in API Gateway does not mean they’re effective (important)
    • You need to make a “deployment” for them to be in effect

  • API Gateway - Stage Variables
    • Stage variables are like environment variables for API Gateway
    • Use case: Create a stage variable to indicate the Lambda alias (important)
    • To promote test to prod when a prod stage already exists, just update the stage variable


API Gateway Canary Deployment

  • Choose the % of traffic the canary channel receives
    • Canary deployment: a strategy that releases a new version gradually, testing it on a small share of traffic before full rollout


API Gateway Integration Types

  • MOCK
    • API Gateway returns a response without sending the request to backend
  • HTTP / AWS
    • Configure both the integration request and integration response
    • Set up data mapping using mapping templates for request & response

  • AWS_PROXY (Lambda Proxy)
    • Incoming request from the client is the input to Lambda
    • No mapping template, headers, query string parameters are passed as arguments

  • HTTP_PROXY
    • No mapping template, the HTTP request is passed to the backend


API Gateway Mapping Templates

  • Mapping templates can be used to modify request / responses
    • Can be used to hide certain fields in the output data
  • Modify query string parameters, modify body content, add headers


API Gateway OpenAPI

  • Common way to define REST APIs, using the API definition as code
  • Request Validation: reduce unnecessary calls to the backend


API Gateway Caching

  • Caching reduces the number of calls made to the backend
  • Caches are defined per stage, default TTL is 300 sec, max TTL is 3600 sec (important)
  • Clients can invalidate the cache with the header Cache-Control: max-age=0 (important)


API Gateway Usage Plans & API Keys

  • Make an API available as an offering to your customers (important)
  • Usage Plan: who can access, how much and how fast they can access; uses API keys to identify clients
  • API Keys: use with usage plans to control access

  • To configure a usage plan
    • Create one or more APIs, configure the methods to require an API key, and deploy the APIs to stages
    • Generate or import API keys to distribute to application developers
    • Create the usage plan with desired throttle and quota limits
    • Associate API stages and API keys with the usage plan


API Gateway Monitoring

  • CacheHitCount & CacheMissCount: efficiency of the cache
  • IntegrationLatency & Latency: used to diagnose timeout issues
  • 4XX means client errors, 5XX means server errors, 429 means too many requests (throttling)


API Gateway CORS

  • CORS must be enabled when you receive API calls from another domain
  • CORS can also be used to restrict which domains can access the API


API Gateway Security

  • IAM Permissions (for within your AWS account)
    • Authentication is IAM, Authorization is IAM Policy
    • Create an IAM policy authorization and attach it to a User or Role
    • If you need Cross Account Access, use Resource Policies

  • Cognito User Pools (for your own user pool, important)
    • Authentication is Cognito User Pools, Authorization is API Gateway Methods
    • API Gateway verifies identity automatically from AWS Cognito

  • Lambda Authorizer (use 3rd party tokens, important)
    • Authentication is External, Authorization is Lambda function
    • Token-based authorizer, JWT or OAuth
    • A Lambda Authorizer can also implement an authorization scheme that uses request parameters to determine the caller’s identity (important)


API Gateway WebSocket API

  • WebSocket APIs are often used in real time applications such as chat or trading
    • WebSocket is two-way communication
  • Format: wss://[uniqueid].execute-api.[region].amazonaws.com/[stage-name]

  • WebSocket API - Routing
    • Uses a route selection expression to select the JSON field to route on (e.g. $request.body.action)
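A sketch of what a route selection expression like $request.body.action does: read one field from the JSON message and dispatch to the matching route, falling back to the $default route. select_route and the sendMessage route are hypothetical; in practice API Gateway performs this routing itself.

```python
import json
from typing import Any, Callable, Dict

RouteHandler = Callable[[Dict[str, Any]], str]


def select_route(message: str, routes: Dict[str, RouteHandler],
                 field: str = "action") -> str:
    """Mimic a route selection expression such as $request.body.action."""
    try:
        body = json.loads(message)
    except json.JSONDecodeError:
        body = None
    if not isinstance(body, dict):
        body = {}
    key = body.get(field)
    handler = routes.get(key) if isinstance(key, str) else None
    return (handler or routes["$default"])(body)  # unknown or missing -> $default


routes: Dict[str, RouteHandler] = {
    "sendMessage": lambda b: f"broadcast: {b.get('text', '')}",
    "$default": lambda b: "unrecognized message",
}
print(select_route('{"action": "sendMessage", "text": "hi"}', routes))
```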


22. AWS CICD

AWS CodeCommit

  • A version control tool that helps to understand the changes happened to the code
  • CodeCommit is a private Git repository (like GitHub, but private)
  • If asked about migrating a repository to CodeCommit over HTTPS, use Git credentials generated from IAM (important)

  • CodeCommit supports Git credentials, SSH keys, and AWS access keys
  • Data in AWS CodeCommit repositories is encrypted in transit and at rest

AWS CodePipeline

  • CodePipeline is an automated CICD service for building, testing, and deploying applications and infrastructure
    • If you see "orchestrate CICD", the answer is CodePipeline
  • You can create one CodePipeline for the entire flow and add a manual approval step

  • Each pipeline stage can create artifacts
  • Artifacts are stored in an S3 bucket and passed on to the next stage


AWS CodeBuild

  • CodeBuild is a fully managed continuous integration service that automates building, testing, and producing software packages
  • CodeBuild is an alternative to Jenkins
  • Build instructions: Code file buildspec.yml (important) or insert manually in the Console
  • Automated tests on the application before the deployment process (if a question needs tests, it's CodeBuild, important)
  • CodeBuild scales automatically, so scaling and running builds in parallel are not a concern
  • To troubleshoot CodeBuild, you can run CodeBuild locally using the CodeBuild Agent

  • The buildspec.yml file must be at the root of your code (important)
  • Remember: CodeBuild can use a KMS key to encrypt build artifacts (important)
  • Use CodeBuild timeouts to prevent overly long build processes
  • We bundle dependencies in the source code during the build stage of CodeBuild


AWS CodeDeploy

  • CodeDeploy is an automated deployment service that deploys application code to various compute environments
  • Automated rollback in case of a failed deployment
    • If a rollback happens, CodeDeploy redeploys the last good revision
  • A file named appspec.yml defines how the deployment happens
  • For CodeDeploy with EC2, the only options are in-place and blue/green
    • Blue/green re-routes traffic from the original environment to the new environment
  • To control the number of archived application revisions, configure the CodeDeploy Agent
  • If a CodeDeploy deployment fails and rolls back, a new deployment of the last known working version of the application is deployed with a new deployment ID

  • CodeDeploy - EC2 / On-premises Platform (requires the Agent)
    • Perform in-place deployments or blue/green deployments (very important)
    • Must run the CodeDeploy Agent on the target instances (important)
    • Order of Lifecycle Events: ApplicationStop, DownloadBundle, BeforeInstall, Install, AfterInstall, ApplicationStart, ValidateService
    • Four deployment speeds: AllAtOnce, HalfAtATime, OneAtATime, Custom

  • CodeDeploy - Lambda Platform
    • Automate traffic shift for Lambda aliases

  • CodeDeploy - ECS Platform
    • Automate the deployment of a new ECS Task Definition

  • To deploy an application to different EC2 Instances at different times, use CodeDeploy Deployment Groups

AWS CodeStar (CodeCatalyst)

  • Quickly create CICD-ready projects for EC2, Lambda, Elastic Beanstalk
  • One dashboard to view all your components (important: dashboard-related)


AWS CodeArtifact

  • CodeArtifact is a secure and cost-effective artifact management service for software development (similar to an npm registry)
  • Developers and CodeBuild can retrieve dependencies straight from CodeArtifact

  • CodeArtifact Resource Policy (a resource policy lets others access it)
    • Authorize another account to access CodeArtifact


AWS CodeGuru

  • An ML-powered service for code reviews and performance recommendations (important)
  • CodeGuru Reviewer: Automated code reviews for static code analysis (development)
  • CodeGuru Profiler: Recommendations about application performance (production)


AWS Cloud9

  • Cloud-based IDE, similar to VS Code


23. AWS Serverless Application Model (SAM)

AWS SAM

  • SAM = Serverless Application Model
  • A framework for developing and deploying serverless applications
  • AWS SAM templates can be deployed directly through AWS CloudFormation
  • Use the AWS Serverless Application Repository (SAR) to share SAM applications with other AWS accounts (important)
    • To find pre-built serverless applications, use SAR
  • Develop the SAM template locally => upload the template to S3 => deploy your application to the cloud (note)

  • SAM Recipe
    • Transform Header for SAM template: AWS::Serverless-2016-10-31
    • Note: the Transform header is mandatory for SAM
    • SAM supports the following resource types: Function, Api, SimpleTable (important)
    • Use sam deploy to upload SAM applications to AWS

  • SAM Accelerate (sam sync)
    • A set of features to reduce latency while deploying resources to AWS


SAM Policy Templates

  • List of templates to apply permissions to Lambda Functions
    • S3ReadPolicy: Gives read-only permissions to objects in S3
    • SQSPollerPolicy: Allows polling an SQS queue
    • DynamoDBCrudPolicy: create, read, update, delete


SAM Local Capabilities

  • SAM Local is used to locally emulate, test, and debug serverless applications; it supports Lambda and API Gateway
  • Locally start AWS Lambda (remember: all of these use the SAM CLI + AWS Toolkits)
  • Locally invoke Lambda Function

  • Locally start an API Gateway Endpoint
  • Generate AWS Events for Lambda Functions


24. Cloud Development Kit (CDK)

AWS Cloud Development Kit

  • Define your cloud infrastructure using a familiar language (Java, Python)
  • Can deploy infrastructure and application runtime code together

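  • Differences between CDK and SAM (although both use CloudFormation)
    • SAM focuses on serverless and Lambda, and templates are written only in JSON or YAML
    • CDK covers all AWS services and supports several programming languages (e.g. TypeScript, Python, Java, C#, Go)

  • Note: CDK, not CloudFormation, provides app templates (important)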

25. Cognito

AWS Cognito

  • AWS Cognito is a user identity management service for securely adding user sign-up, sign-in, and access control
    • Cognito User Pools: Sign-in for app users, integrates with API Gateway & ALB
    • Cognito Identity Pools: Can integrate with Cognito User Pools as an identity provider
  • CUP + CIP = authentication + authorization

  • Cognito Sync: Enable cross-device syncing of application related user data

Cognito User Pools (CUP)

  • Integrates with API Gateway and Application Load Balancer (important)
    • Note: it integrates with ALB, not CloudFront (important)
  • After a successful login using Cognito User Pools, it sends a JWT token (important)
  • Adaptive Authentication
    • Block sign-ins or require MFA if the login appears suspicious (important)


Cognito Identity Pools (CIP)

  • Get identities for users so they obtain temporary AWS credentials
  • To integrate a user-specific file upload and download feature, use an IAM policy with the Cognito identity ID as a prefix to restrict users to their own folders in S3


26. Step Functions & AppSync

AWS Step Functions

  • AWS Step Functions is a visual workflow service for coordinating and managing distributed applications and microservices
  • There are two workflow types: Standard and Express (Express can be synchronous or asynchronous)
    • Standard suits long-running, durable, and auditable workflows
    • Express suits high event rates and short-duration workflows
  • Task States: Do some work in the state machine (e.g. invoke a Lambda function)

  • Parallel State: Begin parallel branches of execution

  • A Task state ("Type": "Task") represents a single unit of work performed by a state machine (important)

Step Functions Error Handling

  • Any state can encounter runtime errors for various reasons
  • Use Retry and Catch in the State Machine to handle errors instead of in application code


Step Functions Wait for Task Token

  • Allows you to pause Step Functions during a Task until a Task Token is returned
  • Append .waitForTaskToken to the Resource field


Step Functions Activity Tasks

  • Task performed by an Activity Worker (Activity Workers can run on EC2, Lambda, etc.)
  • After the Activity Worker completes its work, it sends a response back to Step Functions


AWS AppSync

  • AppSync is a managed service that uses GraphQL (important)
  • Retrieve data in real-time with WebSocket or MQTT on WebSocket (important)


AWS Amplify

  • A set of tools to create mobile and web applications (similar to Firebase, important)
    • Authentication: leverage AWS Cognito
    • Datastore: leverage AWS AppSync and DynamoDB


27. Advanced Identity

AWS STS

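  • AWS STS (Security Token Service) is a service for generating temporary credentials to access AWS resources
    • Allows granting limited and temporary access to AWS resources (up to 1 hour)
  • Note: decode-authorization-message is an STS operation
  • STS credentials are valid from 15 minutes to 1 hour; after they expire they must be renewed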


Advanced IAM

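  • Evaluation of Policies (important)
    • First check for an explicit Deny; if found, deny. Otherwise check for an Allow; if found, allow. Otherwise, deny

  • IAM Policies & S3 Bucket Policies
    • The IAM policy and the S3 bucket policy are evaluated together (as a union): access requires an Allow in either one and no explicit Deny in either

  • Dynamic Policies with IAM (important)
    • Give IAM policy to a specific user, using ${aws:username} (dynamic variable)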
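The evaluation order above as a simplified evaluator. It only models Effect and Action (with * and ? wildcards) and treats several policies as one union, which matches the same-account IAM policy + bucket policy case; conditions, principals, resources, and cross-account rules are left out. The Statement shape and function name are assumptions.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple

Statement = Tuple[str, str]  # (Effect, Action pattern), hypothetical simplified shape


def evaluate(action: str, *policies: Iterable[Statement]) -> str:
    """Explicit Deny wins, then any Allow, otherwise implicit Deny.

    Several policies (e.g. an IAM policy and an S3 bucket policy in the
    same account) are evaluated together as one union of statements.
    """
    statements: List[Statement] = [s for policy in policies for s in policy]
    # IAM action names are case-insensitive and support * and ? wildcards
    effects = [effect for effect, pattern in statements
               if fnmatchcase(action.lower(), pattern.lower())]
    if "Deny" in effects:
        return "Deny"   # 1. explicit Deny
    if "Allow" in effects:
        return "Allow"  # 2. explicit Allow
    return "Deny"       # 3. implicit Deny (default)


iam_policy = [("Allow", "s3:GetObject")]
bucket_policy = [("Deny", "s3:*")]
print(evaluate("s3:GetObject", iam_policy))                 # Allow
print(evaluate("s3:GetObject", iam_policy, bucket_policy))  # Deny
```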


IAM Access Analyzer

  • IAM Access Analyzer simplifies inspecting unused access to guide you toward least privilege (important, helps you remove unneeded IAM roles)
  • IAM Access Analyzer also lets you identify unintended access to your resources and data, which is a security risk (reduces security issues)


AWS Directory Services (AD)

  • AWS Directory Services (AD) is a managed service for running Microsoft Active Directory in the AWS cloud, simplifying authentication and resource management
    • AWS Managed Microsoft AD: AD in AWS, supports MFA (important: AD lives in AWS)
    • AD Connector: Redirects to on-premises AD, supports MFA (important: AD lives on-premises)
    • Simple AD: AD-compatible managed directory on AWS


28. AWS Security & Encryption

Encryption 101

  • Encryption in flight
    • Data is encrypted before sending and decrypted after receiving (important)
  • Server-side encryption
    • Both encryption and decryption happen on the server (important)
  • Client-side encryption
    • Data is encrypted by the client and never decrypted by the server (important)

To enforce SSL (HTTPS) requests to objects stored in S3, use the aws:SecureTransport condition key in the policy


AWS KMS

  • Anytime you hear encryption for an AWS service, it’s most likely KMS
  • AWS manages encryption keys for us, e.g. for EBS, S3, RDS, SSM (no need to create them yourself)
  • KMS Keys are scoped per Region; automatic key rotation happens every year (default)
  • Note: KMS is not meant for storing secrets; encrypted data is not necessarily a secret

  • Types of KMS Keys (3 types): AWS Owned, AWS Managed, CMK (Customer Managed)
    • KMS stores the CMK and receives data from the client, which it encrypts and sends back
  • Two key forms: Symmetric (single key) & Asymmetric (public & private key pair)

  • KMS Key Policies
    • KMS Key Policies control access to KMS CMKs (important)
    • Custom KMS Key Policy: Define who can access the key (Cross Account Access)

Deleting an AWS KMS key in AWS Key Management Service (AWS KMS) is destructive and potentially dangerous. Therefore, AWS KMS enforces a waiting period of 7 to 30 days, during which the key is in the Pending deletion state.


KMS Envelope Encryption

  • KMS Encrypt API call has a limit of 4 KB (important)
  • To encrypt > 4 KB, use Envelope Encryption (GenerateDataKey API)
  • With Envelope Encryption, the data is encrypted locally with a data key, and only the data key itself is encrypted by KMS

  • KMS Symmetric API Summary
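    • Encrypt: encrypts up to 4 KB of data through KMS
    • GenerateDataKey: generates a unique symmetric data key (DEK), returned both in plaintext and encrypted under the KMS key
    • GenerateDataKeyWithoutPlaintext: generates a DEK returned only in encrypted form, to be decrypted and used later
    • Decrypt: decrypts up to 4 KB of data (including encrypted DEKs)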
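The envelope encryption flow can be sketched in pure Python. FakeKms is a hypothetical in-memory stand-in for KMS (generate_data_key plays GenerateDataKey, decrypt plays Decrypt), and _keystream_xor is a TOY cipher (SHA-256 in counter mode, no integrity check) used only so the example runs without third-party libraries; real code would use the AWS Encryption SDK or AES-GCM.

```python
import hashlib
import os


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY cipher, NOT secure: XOR data with SHA-256(key || nonce || counter) blocks."""
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ k for b, k in zip(data[start:start + 32], block))
    return bytes(out)


class FakeKms:
    """Hypothetical KMS stand-in: the master key never leaves this object."""

    def __init__(self) -> None:
        self._master_key = os.urandom(32)

    def generate_data_key(self) -> tuple[bytes, bytes]:
        # Like GenerateDataKey: the DEK in plaintext AND encrypted under the master key
        dek = os.urandom(32)
        nonce = os.urandom(16)
        return dek, nonce + _keystream_xor(self._master_key, nonce, dek)

    def decrypt(self, encrypted_dek: bytes) -> bytes:
        # Like Decrypt: only the small encrypted DEK (far below 4 KB) goes to KMS
        nonce, body = encrypted_dek[:16], encrypted_dek[16:]
        return _keystream_xor(self._master_key, nonce, body)


def envelope_encrypt(kms: FakeKms, plaintext: bytes) -> tuple[bytes, bytes]:
    dek, encrypted_dek = kms.generate_data_key()
    nonce = os.urandom(16)
    ciphertext = nonce + _keystream_xor(dek, nonce, plaintext)  # bulk data encrypted locally
    del dek  # drop our reference to the plaintext DEK; store only the encrypted copy
    return encrypted_dek, ciphertext


def envelope_decrypt(kms: FakeKms, encrypted_dek: bytes, ciphertext: bytes) -> bytes:
    dek = kms.decrypt(encrypted_dek)  # one small KMS call recovers the DEK
    nonce, body = ciphertext[:16], ciphertext[16:]
    return _keystream_xor(dek, nonce, body)


kms = FakeKms()
data = os.urandom(10 * 1024)  # 10 KB: too large for a single KMS Encrypt call
enc_dek, ct = envelope_encrypt(kms, data)
print(envelope_decrypt(kms, enc_dek, ct) == data)  # True
```

The encrypted DEK is stored next to the ciphertext, so KMS only ever handles 32-byte keys no matter how large the data is.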


KMS Limits

  • KMS request quotas: if you exceed a request quota, you get a ThrottlingException
  • Request a quota increase through the API or AWS Support
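Besides raising the quota, the usual client-side answer to a ThrottlingException is to retry with exponential backoff and jitter (the AWS SDKs already do this through their retry settings). A minimal sketch, where ThrottlingException and flaky() are hypothetical stand-ins for the real SDK error and KMS call:

```python
import random
import time


class ThrottlingException(Exception):
    """Hypothetical stand-in for the error KMS returns when a request quota is exceeded."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call fn(), retrying on ThrottlingException with exponential backoff + full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottlingException:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # Full jitter: wait a random time in [0, min(max_delay, base * 2^attempt)]
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))


calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottlingException("Rate exceeded")
    return "ok"


print(call_with_backoff(flaky, sleep=lambda s: None))  # prints "ok" after 2 retries
```

Jitter spreads retries from many clients over time, so they do not all hit the quota again at the same instant.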


AWS CloudHSM

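  • AWS CloudHSM is a managed hardware security module (HSM) service for securely storing and managing encryption keys
    • We have to manage the encryption keys entirely ourselves (AWS only manages the hardware)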


SSM Parameter Store

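  • SSM Parameter Store is a secure storage service for configuration data and secrets, such as passwords and database connection strings
    • Broader uses than Secrets Manager, e.g. URLs, AMI IDs, license keys, etc.
    • Parameter Store has no automatic rotation (very important)
  • Has built-in version tracking (every edit to a parameter is recorded, important)

  • Secrets can be stored as SecureString parameters in SSM Parameter Store (important)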
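A sketch of reading a whole hierarchy of parameters, including SecureString values. It assumes ssm is a boto3 SSM client (boto3.client("ssm")) and a hypothetical /my-app/prod path; WithDecryption=True asks Parameter Store to decrypt SecureString values with KMS, and NextToken handles pagination.

```python
def load_parameters(ssm, path: str) -> dict:
    """Read every parameter under `path` into a dict keyed by the relative name.

    `ssm` is expected to behave like boto3.client("ssm").
    """
    params = {}
    kwargs = {"Path": path, "Recursive": True, "WithDecryption": True}
    while True:
        resp = ssm.get_parameters_by_path(**kwargs)
        for p in resp["Parameters"]:
            # "/my-app/prod/db-password" -> "db-password"
            params[p["Name"][len(path):].lstrip("/")] = p["Value"]
        token = resp.get("NextToken")
        if not token:
            return params
        kwargs["NextToken"] = token  # fetch the next page


# Usage (requires AWS credentials):
# config = load_parameters(boto3.client("ssm"), "/my-app/prod")
```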

AWS Secrets Manager

  • Stores secrets, integrated with RDS & Aurora (very important: this is where database secrets go)
    • Meant for confidential information (like database credentials, API keys)
    • Unlike SSM Parameter Store, Secrets Manager supports automatic secret rotation (e.g. every 90 days)
    • Note the difference: Secrets Manager rotates secrets on the schedule you set (e.g. 90 days), while KMS rotates keys yearly by default
  • Compared with KMS, Secrets Manager is better suited for storing secrets such as database credentials, and it also has automatic rotation

  • Multi-Region Secrets (similar to Multi-Region Keys)
    • Replicate secrets across multiple AWS Regions (disaster recovery)


CloudWatch Logs Encryption

  • Encrypt CloudWatch Logs with KMS keys
    • Encryption is enabled at the log group level by associating a CMK
  • associate-kms-key: use if the log group already exists (important)
  • create-log-group: if the log group doesn’t exist
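A sketch of choosing between the two calls, assuming logs is a boto3 CloudWatch Logs client (boto3.client("logs")) and a hypothetical key ARN. The KMS key policy must also allow the CloudWatch Logs service to use the key (not shown), and pagination of describe_log_groups is ignored for brevity.

```python
def encrypt_log_group(logs, log_group_name: str, kms_key_arn: str) -> str:
    """Encrypt a log group with a KMS key, picking the API that fits its current state.

    `logs` is expected to behave like boto3.client("logs").
    Returns the name of the call that was made.
    """
    resp = logs.describe_log_groups(logGroupNamePrefix=log_group_name)
    # The prefix search can return longer names too, so compare exactly
    exists = any(g["logGroupName"] == log_group_name for g in resp.get("logGroups", []))
    if exists:
        # Log group already exists: attach the key to it
        logs.associate_kms_key(logGroupName=log_group_name, kmsKeyId=kms_key_arn)
        return "associate_kms_key"
    # Log group does not exist yet: create it already encrypted
    logs.create_log_group(logGroupName=log_group_name, kmsKeyId=kms_key_arn)
    return "create_log_group"


# Usage (requires AWS credentials):
# encrypt_log_group(boto3.client("logs"), "/app/prod", "arn:aws:kms:us-east-1:111122223333:key/example")
```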


AWS Nitro Enclaves

  • Process highly sensitive data in an isolated compute environment


29. Other AWS Services

AWS SES (Simple Email Service)

  • Managed service for sending email securely


AWS OpenSearch

  • With OpenSearch, you can search any field, even on partial matches
    • Formerly called Elasticsearch (used for search)


AWS Athena

  • Athena is a serverless query service that analyzes data in Amazon S3 using standard SQL
    • Athena cannot be used to analyze data in real time (important)
  • Use Athena to process logs, perform ad-hoc analysis, and run interactive queries (important)
  • Use columnar formats (e.g. Parquet, ORC) to save cost (less data scanned)

  • Federated Query
    • Allows running SQL queries across data stored in AWS or on-premises


AWS MSK

  • Managed Apache Kafka on AWS (a serverless option is available)
    • Alternative to Kinesis (also processes streaming data), but based on Apache Kafka


AWS Certificate Manager (ACM)

  • AWS Certificate Manager (ACM) manages SSL/TLS certificates, used to secure the communication of websites and applications
    • Third-party (imported) SSL certificates cannot be renewed automatically
  • EventBridge can be used to detect ACM certificates that are about to expire (and invoke SNS)
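A sketch of the EventBridge rule pattern for this, assuming ACM's "ACM Certificate Approaching Expiration" event. The matches() helper is a hypothetical, heavily simplified version of EventBridge matching (top-level keys only, no content filters) so the pattern can be tried locally. In AWS, the pattern goes on a rule whose target is an SNS topic.

```python
import json

# Rule pattern: ACM events announcing that a certificate is close to expiring
ACM_EXPIRY_PATTERN = {
    "source": ["aws.acm"],
    "detail-type": ["ACM Certificate Approaching Expiration"],
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matching: every top-level pattern key must appear in the
    event with one of the listed values (real EventBridge supports much more)."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


# String form of the pattern, as a rule's event pattern is supplied as JSON
print(json.dumps(ACM_EXPIRY_PATTERN))
```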


AWS Macie

  • Uses ML to discover and protect sensitive data (PII) in AWS


AWS AppConfig

  • Configure, validate, and deploy dynamic configurations


Appendix

Please do not modify without permission. Thank you.