Preface

Some material I put together:


Key points

1. AWS Cloud Overview

  • Global Services: IAM, Route 53, CloudFront, WAF, etc.

AWS Regions

  • A Region is a cluster of data centers
  • When choosing a Region, consider Compliance, Proximity (closeness to users), Available Services, and Pricing


AWS Availability Zone (AZ)

  • A Region usually has 3 to 6 AZs (each AZ is isolated from the others)
  • To coordinate Availability Zones across accounts, you must use the AZ ID (note)
    • us-west-2a in account A and us-west-2a in account B may not be the same physical AZ


AWS Edge Locations

  • Deliver content to users with low latency (the closer to the user, the faster the delivery)


2. IAM

AWS IAM

  • Root Account: Created by default, never share it with anyone
  • Users: People within the organization
  • Groups: Only contain Users, not other Groups (a User can belong to multiple Groups)


IAM Permissions

  • Define the permissions of a User or Group
  • Remember to follow the Least Privilege Principle


IAM Policies

IAM policies define permissions for an action regardless of the method that you use to perform the operation.

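  • In an identity-based policy, a Statement contains Effect, Action, and Resource; Principal appears only in resource-based policies
    • An Inline Policy is embedded directly in a single identity (user, group, or role)
  • Note: the only resource-based policy type the IAM service itself supports is the role trust policy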
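
To make the statement elements concrete, here is a minimal sketch of an identity-based policy built as a Python dict and serialized to JSON. The bucket name `example-bucket`, the `Sid`, and the helper `missing_elements` are illustrative assumptions, not something from these notes or an AWS API.

```python
import json

# Hypothetical identity-based policy: read-only access to one bucket.
# "example-bucket" is a placeholder name chosen for illustration.
identity_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadOnlyExampleBucket",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}


def missing_elements(statement):
    """Return the elements an identity-based statement is missing."""
    required = ["Effect", "Action", "Resource"]
    return [name for name in required if name not in statement]


policy_json = json.dumps(identity_policy, indent=2)
print(policy_json)
print(missing_elements(identity_policy["Statement"][0]))
```

Note that the statement has no Principal: for an identity-based policy the principal is implicitly the user, group, or role the policy is attached to.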


IAM MFA

  • Protect Root Accounts and IAM Users
  • MFA options: Virtual MFA Device, U2F Security Key (USB)


IAM CLI & SDK

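  • AWS can be accessed through the AWS Management Console, the AWS CLI, or the AWS SDK
  • The AWS CLI and AWS SDK are protected by Access Keys
    • Set the DeleteOnTermination attribute to False using the command line (for example, to keep the root volume of a running instance)
    • Exam tip: if you are unsure, pick the CLI option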


IAM Roles

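  • Grant AWS services permission to perform actions on your behalf (for example, Elastic Beanstalk needs an EC2 instance role and a service role)
  • Common Roles: EC2 Instance Role, Lambda Function Role, etc.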


IAM Security Tools

  • IAM Credentials Report: lists the account's users and the status of their credentials
  • IAM Access Advisor: shows the services a user is permitted to use and when they were last accessed


IAM Extra

  • Trust Policy (the only resource-based policy in IAM)
    • Trust policies define which principal entities can assume the role.
  • AWS Organizations Service Control Policies (SCP, apply to an Organization)
    • SCPs are JSON policies that specify the maximum permissions for an organization or organizational unit (OU)
  • Access Control List (ACL, controls access from other accounts)
    • Access control lists (ACLs) are service policies that allow you to control which principals in another account can access a resource
  • Permissions Boundary (caps the maximum permissions an IAM entity can have)
    • A permissions boundary is an advanced feature for using a managed policy to set the maximum permissions that an identity-based policy can grant to an IAM entity.
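
One way to picture "maximum permissions" is as a set intersection: an action is effectively allowed only if every layer allows it. The sketch below is a deliberately simplified model. It ignores explicit Deny statements, resource-based policies, and session policies, and the action sets are made up for illustration.

```python
# Simplified model: each layer is the set of actions it allows.
# All three sets are hypothetical examples.
identity_policy_allows = {"s3:GetObject", "s3:PutObject", "ec2:StartInstances"}
permissions_boundary = {"s3:GetObject", "s3:PutObject", "sqs:SendMessage"}
scp_allows = {"s3:GetObject", "ec2:StartInstances", "sqs:SendMessage"}


def effective_permissions(*layers):
    """An action is effective only if every layer allows it."""
    return set.intersection(*layers)


effective = effective_permissions(
    identity_policy_allows, permissions_boundary, scp_allows
)
print(sorted(effective))  # ['s3:GetObject']
```

The boundary and the SCP never grant anything on their own in this model; they only narrow what the identity-based policy grants.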

3. EC2 Fundamentals

AWS EC2

  • Infrastructure as a Service (IaaS), bound to an AZ
  • User Data: a script that runs when the EC2 instance launches (for example, installing Apache on the instance)
    • By default, user data runs only during the boot cycle when the instance is first launched
    • By default, scripts entered as user data are executed with root user privileges



EC2 Instance Type

  • There are 7 EC2 Instance Types, but 4 are used the most
    • General Purpose, Compute Optimized, Memory Optimized, Storage Optimized
  • General Purpose: Balanced

  • Compute Optimized: High performance (batch processing, video, and HPC)

  • Memory Optimized: Process data in memory (in-memory databases, caches)

  • Storage Optimized: Read and write data on local storage (OLTP)


EC2 Security Groups

  • Control how traffic is allowed into or out of EC2 instances (works like a firewall)
    • Can be attached to multiple instances (reusable)
    • Locked down to a Region / VPC combination
  • The source of a Security Group Inbound Rule can be
    • IP address, CIDR block, Security Group
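
As a rough sketch of how a CIDR-based inbound rule is evaluated, the snippet below checks a source IP against a small rule list using Python's standard `ipaddress` module. The rules and addresses are hypothetical, and real Security Groups also support port ranges and other Security Groups as sources, which this sketch leaves out.

```python
import ipaddress

# Hypothetical inbound rules: (protocol, port, source CIDR).
inbound_rules = [
    ("tcp", 22, "203.0.113.0/24"),  # SSH only from one office range
    ("tcp", 443, "0.0.0.0/0"),      # HTTPS from anywhere
]


def is_allowed(protocol, port, source_ip, rules=inbound_rules):
    """Return True if any rule allows the traffic (rules are allow-only)."""
    ip = ipaddress.ip_address(source_ip)
    for rule_protocol, rule_port, cidr in rules:
        if (
            rule_protocol == protocol
            and rule_port == port
            and ip in ipaddress.ip_network(cidr)
        ):
            return True
    return False  # no matching allow rule, so the traffic is not admitted


print(is_allowed("tcp", 22, "203.0.113.10"))
print(is_allowed("tcp", 22, "198.51.100.7"))
```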



EC2 Instance Purchasing Options

  • On-Demand Instances: Short workloads, billed per second (for short-term or urgent needs)
  • Reserved: 1 or 3 years (up to 72% discount, can reserve capacity in an AZ)
    • Reserved Instance: Long workloads (1 or 3 years, the discount grows with the term)
    • Convertible Reserved Instance: Long workloads with flexible instances (up to 66% discount, but the Instance Type can be changed)
  • Savings Plans: 1 or 3 years, commitment to an amount of usage (up to 72% discount; you commit to a usage level, and anything beyond it is billed at On-Demand rates)
    • There is also the Compute Savings Plan: up to 66% discount, works with Lambda and Fargate
  • Spot Instance: Short workloads, the instance can be lost at any time (cheapest, up to 90% discount)
  • Dedicated Host: The physical server is yours (most expensive, you control how Instances are placed)
  • Dedicated Instance: Runs on hardware dedicated to your account, but you do not control Instance placement
  • Capacity Reservation: Reserve capacity in an AZ (for example, when EC2 is needed only a few hours a day)


EC2 Spot Instance

  • A Spot request can be cancelled only while it is in the open, active, or disabled state
  • Cancel the Spot request first, then terminate the instances (cancelling a request does not terminate its instances)
  • Spot requests come in two types: one-time and persistent

  • Spot Fleets: Automatically request Spot Instances at the lowest price (Spot + optional On-Demand)
    • Gets you the cheapest Spot Instances
  • Spot Fleets are set to maintain target capacity (guarantees a certain number of Instances)


4. EC2 SAA Level

Elastic IP

  • Public IP: Unique across the whole web
  • Private IP: Unique across the private network
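  • Elastic IP: Fixed Public IP for an EC2 instance (does not change)
    • Why Elastic IP: the Public IP of an EC2 instance changes when the instance is stopped and started
    • Cost note: inside the private network, traffic over Private IPs is generally cheaper than traffic over Public or Elastic IPs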


EC2 Placement Groups

  • Cluster: Low latency, single AZ, fast network but everything fails if the AZ fails (for HPC)
  • Spread: High availability for critical applications, low risk but limited to 7 running instances per AZ
    • For a small number of critical instances that need to be kept separate from each other
  • Partition: Spreads instances across partitions within an AZ, up to 7 partitions per AZ, supports hundreds of instances, can span multiple AZs in the same Region (Big Data, low risk, fewer limits)


Elastic Network Interfaces (ENI)

  • Virtual network card in a VPC
  • Useful for EC2 Instance failover, bound to an AZ (cannot be attached to an instance in another AZ)


EC2 Hibernate

  • EC2 Hibernate: The in-memory (RAM) state is preserved, so the EC2 instance boots faster (the instance is hibernated, not stopped)
  • EC2 Instance Root Volume type must be an EBS volume


5. EC2 Instance Storage

AWS EBS

  • Block-level storage, works like a network drive, bound to an AZ, provisioned capacity
    • By default, the Root Volume is deleted on termination (important)
    • By default, other EBS volumes are not deleted on termination (important)
  • Can be detached from one EC2 Instance and attached to another
  • EBS is bound to an AZ (important)


EBS Snapshots

  • Make a backup of an EBS volume
  • The snapshot can be copied and used in another AZ or Region

  • EBS Snapshot Archive: 75% cheaper
  • Recycle Bin: Set up rules to retain deleted snapshots (so they can be recovered)


AWS AMI

  • AMI: Amazon Machine Image (Customization of an EC2 Instance)
    • Makes it easier to manage and launch EC2 Instances
  • Built for a specific Region and can be copied across Regions

When the new AMI is copied from Region A into Region B, it automatically creates a snapshot in Region B (note: copying an AMI across Regions creates a snapshot)



EC2 Instance Store

  • Block-level storage on a physical drive (temporary storage)
  • High random I/O performance, good for cache & buffer at low cost (important)
  • Instance Store data is lost when the instance is stopped (ephemeral)

  • You can’t detach an Instance Store volume from one instance and attach it to a different instance
  • If you create an Amazon Machine Image (AMI) from an instance, the data on its instance store volumes isn’t preserved

EBS Volume Types

  • General Purpose (gp2 / gp3): Balanced, cost-effective
  • Provisioned IOPS (io1 / io2): High-performance, support Multi-Attach
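    • Remember: io volumes cost more than gp volumes, but also deliver higher IOPS
  • Hard Disk Drives (HDD): Data-intensive (st1) or less frequently accessed (sc1) workloads
    • HDD volumes cannot be used as an EC2 boot volume
  • None of the above exceeds 300,000 IOPS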


EBS Multi-Attach

  • Attach the same EBS volume to multiple EC2 instances in the same AZ (remember: same AZ)
  • Only for the io1 / io2 family (Provisioned IOPS SSD, important)


EBS Encryption

  • EBS Encryption (encrypted at every stage)
    • Data at rest is encrypted & Data in flight is encrypted & Snapshot is encrypted

  • To encrypt an unencrypted volume: create a snapshot, copy the snapshot with encryption enabled, create a new EBS Volume from the encrypted snapshot, then attach that volume to the original Instance


EBS RAID

  • RAID 0 (favors I/O performance)
    • Use RAID 0 when I/O performance is more important than fault tolerance
  • RAID 1 (favors fault tolerance)
    • Use RAID 1 when fault tolerance is more important than I/O performance


AWS EFS

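  • NFS (Network File System), can be mounted on many EC2 Instances
    • Scales automatically, no capacity planning
  • EC2 Instances can access files on an EFS file system across AZs, Regions and VPCs
    • Important: it can be accessed from different AZs, Regions, and VPCs
  • Access to EFS can be controlled with VPC Security Groups or IAM Policies
  • EFS Infrequent Access (consider it when a question mentions POSIX-compliant file storage for rarely accessed data)

  • EFS has 2 Performance Modes
    • General Purpose: Default (web servers)
    • Max I/O: Higher latency, higher throughput, highly parallel (Big Data, media processing, important)
  • EFS has 3 Throughput Modes (important)
    • Bursting: Throughput scales with the amount of storage, with bursts above the baseline
    • Provisioned: Set a fixed throughput regardless of storage size (high throughput, use it when migrating large numbers of files)
    • Elastic: Scales throughput with the workload

EFS suits storing large files. On price, for storing a 100 GB file the ordering is EBS > EFS > S3 (most to least expensive).
EFS is not always the most expensive option, but S3 is always the cheapest.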


6. ELB & ASG

Scalability & High Availability

  • Vertical Scaling: Increase the size of the Instance (but there is a hardware limit)
  • Horizontal Scaling: Increase the number of Instances
  • High Availability: Running the application in at least 2 AZs (Disaster Recovery, DR, important)


AWS ELB

  • Forward traffic to multiple EC2 Instances (traffic distribution)
    • High Availability across AZs, handle failures of downstream instances
  • Health Check: tells the load balancer whether an EC2 Instance is healthy
    • An ASG uses EC2-based health checks by default, while an ALB uses its own ALB-based health checks
    • The ELB should sit in a public subnet, and the ASG instances in private subnets (important)
  • ELB cannot distribute traffic for targets deployed in different Regions

  • Application Load Balancer (HTTP / HTTPS, Layer 7)
  • Network Load Balancer (TCP / UDP, Layer 4)
    • NLB provides a static DNS name and static IPs, ALB only provides a static DNS name
  • Gateway Load Balancer (Security, Layer 3, pay special attention)


Application Load Balancer (ALB)

  • Deal with Layer 7 (HTTP / HTTPS)
  • Route based on Path (/Home) or Query (?Platform=Mobile)
  • ALB adds an additional header called “X-Forwarded-For” that contains the client’s IP
  • ALB can be configured to redirect HTTP to HTTPS (important)
  • An ALB cannot be assigned an Elastic IP

  • Routing tables to different target groups
    • Route based on path in URL
    • Route based on hostname in URL
    • Route based on Query String, Headers, Client IP

  • ALB Target Groups:
    • EC2 Instance, ECS, Lambda, IP Address

To forward all requests to the website so that the requests will use HTTPS, a solutions architect can create a listener rule on the ALB that redirects HTTP traffic to HTTPS.


Network Load Balancer (NLB)

  • Deals with Layer 4 (TCP / UDP), very fast, handles very large numbers of requests
  • One static IP per AZ, or assign an Elastic IP
  • NLB supports HTTP health checks as well as TCP and HTTPS
  • Note: NLB is best suited for use cases involving low latency and high throughput workloads, such as banking and gaming

If you specify targets using an instance ID, traffic is routed to instances using the primary private IP address specified in the primary network interface for the instance.


Gateway Load Balancer (GWLB)

  • Deal with Layer 3 (Security, 3rd Party)
  • GWLB lets you perform inline inspection of traffic from multiple spoke VPCs in a simplified and scalable fashion
  • Use GENEVE protocol on port 6081


Sticky Sessions (Session Affinity)

  • The same client is always redirected to the same instance behind a load balancer
  • Make sure the user doesn’t lose session data (for example, a user login)

  • Application-based Cookies: Custom cookie names must not be AWSALB, AWSALBAPP, or AWSALBTG (reserved names)
  • Duration-based Cookies: AWSALB for ALB, AWSELB for CLB


Cross Zone Load Balancing

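  • With cross-zone load balancing, the ELB distributes traffic evenly across all registered EC2 instances in all AZs
    • Every Instance receives the same share of traffic
    • If it is not enabled, traffic is split evenly between the AZs and then evenly among the instances inside each AZ
  • ALB enabled by default, NLB disabled by default (important)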
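
The difference is easiest to see with numbers. The sketch below works through a hypothetical setup with 2 instances in one AZ and 8 in another, assuming client traffic reaches each AZ's load balancer node in equal shares; the AZ names and counts are made up.

```python
def traffic_share_per_instance(instances_per_az, cross_zone):
    """Return {az: fraction of total traffic each instance in that AZ gets}.

    Assumes every AZ has at least one instance and that each AZ's load
    balancer node receives an equal share of the incoming traffic.
    """
    total_instances = sum(instances_per_az.values())
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every instance is treated the same, wherever it lives.
            shares[az] = 1 / total_instances
        else:
            # Each AZ gets an equal slice, divided among its own instances.
            shares[az] = (1 / len(instances_per_az)) / count
    return shares


azs = {"az-a": 2, "az-b": 8}
print(traffic_share_per_instance(azs, cross_zone=True))   # {'az-a': 0.1, 'az-b': 0.1}
print(traffic_share_per_instance(azs, cross_zone=False))  # {'az-a': 0.25, 'az-b': 0.0625}
```

Without cross-zone balancing, the two instances in the small AZ each carry 25% of the traffic, four times the load of an instance in the large AZ.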


SSL Certificates

  • Allow traffic between the client and the load balancer to be encrypted in transit (HTTPS)
    • Manage SSL certificates using ACM (AWS Certificate Manager)
  • If a question involves managing many SSL or TLS certificates, pick SNI (important)

  • Can be used with ALB & NLB, which rely on SNI to serve multiple certificates
  • SNI: Load multiple SSL certificates onto one web server (Server Name Indication)


Connection Draining

  • Waiting for existing connections to complete

Connection Draining enables the load balancer to complete in-flight requests made to instances that are de-registering or unhealthy


AWS ASG

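  • An ASG works together with an ELB to scale out or in based on server load (the ASG itself is free)
  • An ASG guarantees a minimum and maximum number of running EC2 Instances (min, max)
    • An ASG scales within one Region, across the available AZs
  • When an Instance is unhealthy, the ASG terminates it
    • On scale-in, the default termination policy picks the AZ with the most instances, then the instance with the oldest launch template or launch configuration
    • When the AZs are unbalanced, the ASG launches the new instance first and then terminates the old one
    • For an unhealthy Instance, it terminates the unhealthy Instance first and then launches a replacement
  • If a question asks about scaling ahead of a holiday, the answer is an ASG scheduled action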

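  • Launch Template: a template for launching EC2 Instances
    • A Launch Template controls what kind of Instance is launched (e.g. Spot, On-Demand)
    • A Launch Configuration (legacy) holds the details of the launched Instance (e.g. AMI, security group)
  • A Launch Configuration cannot be modified once created: create a new one and switch the ASG to it (note); Launch Templates support versioning

  • Works with CloudWatch Alarms to scale out or scale in based on server load

  • Cases where the ASG does not terminate an unhealthy Instance
    • The health check grace period for the instance has not expired
    • The instance may be in Impaired status
    • The instance has failed the Elastic Load Balancing (ELB) health check status (while the ASG only uses EC2-based health checks)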

Scaling Policies

  • Dynamic Scaling
    • Target Tracking Scaling: scale to hold a metric at a target, such as CPU utilization or SQS queue length
    • Simple / Step Scaling: triggered by CloudWatch Alarms
    • Scheduled Actions: scale at a known time, for example on Thanksgiving

  • Predictive Scaling
    • Forecast load and schedule scaling ahead


7. AWS Fundamentals

AWS RDS

  • Benefits of RDS (compared with running the database on EC2)
    • Automated provisioning, auto-scaling, backup
    • Read Replicas (up to 15) and Multi-AZ
  • RDS RIs: similar to EC2 RIs
  • RDS Read Replica: If the master database is encrypted, the replica is also encrypted
    • Note: a Read Replica scales reads, not the database storage (and replicas cost extra); reporting and analytics workloads point to a Read Replica
    • Read Replicas do not provide High Availability, Multi-AZ does
    • Cross-Region Read Replica: can serve as a DB backup when a disaster strikes
  • RDS Multi-AZ: RDS creates a primary DB Instance and synchronously replicates the data to a standby instance in a different AZ (High Availability)
    • With Multi-AZ, a database engine upgrade is applied to the standby and the primary at the same time (this causes downtime)
  • Supports MySQL, PostgreSQL, MariaDB, Oracle, MS SQL Server, and Amazon Aurora

  • Storage Auto Scaling (scales storage automatically)
    • For applications with unpredictable workloads

  • Disaster Recovery
    • Create a Read Replica in a different Region and enable Multi-AZ on the Read Replica
  • IAM Database Authentication: can be used to access RDS
  • You can migrate from RDS to Aurora through an Aurora Read Replica, which keeps changes minimal

Read Replicas vs. Multi AZ

  • Read Replicas: Scalability, Async Replication
    • Within AZ, Cross AZ or Cross Region
  • Replication to Read Replicas in the same Region is free (even across AZs), Cross Region is not free

  • Use the production database for day-to-day traffic, and a Read Replica as a copy for data analysis

  • Multi AZ: High Availability, Disaster Recovery, Sync Replication
    • With Multi-AZ, when the RDS database goes down, the CNAME record is updated to point to the standby



RDS Custom

  • Could access the underlying database and OS
    • Only available for Oracle and Microsoft SQL Server


AWS Aurora

  • Supports PostgreSQL and MySQL (no need to manage storage)
  • Aurora has no standby database, an Aurora Replica acts as the failover target
  • In general, Aurora is faster than RDS

  • High Availability & Read Scaling (Aurora Read Replica)
    • One Master for writes, multiple Read Replicas (up to 15 Replicas), 16 instances in total
    • If a question is about Aurora and handling a surge in requests, pick Aurora Replicas
    • On failover, the Replica with the highest priority is promoted (tier 0 is the highest priority)
  • If a question is about Auto-Scaling for Aurora, the answer is Aurora Serverless
  • If a question is about Aurora and a test database, pick Aurora Database Cloning



Aurora Advanced

  • Aurora Replicas Auto Scaling: similar to EC2 Auto Scaling

  • Aurora Custom Endpoints: can be used to run analytics on a chosen set of instances

  • Aurora Serverless: Automated database instantiation and auto scaling (important)

  • Aurora Global: Cross Region replication takes less than 1 second
    • Also ensures at least one Replica stays available (important)
    • Designed for globally distributed applications, with fast local reads at low latency

  • Aurora Machine Learning: ML prediction

  • Aurora Database Cloning
    • Quickly creates a copy of an Aurora DB, faster than snapshot & restore
    • Pick Aurora Database Cloning when you need to create a test database


Backup & Monitoring

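  • RDS Backups (two kinds)
    • Automated backups: a backup is taken automatically every day
    • Manual DB Snapshots: backups triggered manually
  • Note: to save money, take a snapshot of the database when you are done using it, delete the database, and restore the snapshot when you need it again

  • Aurora Backups (same as above)
    • Automated backups: a backup is taken automatically every day
    • Manual DB Snapshots: backups triggered manually

  • RDS & Aurora Restore options
    • The most basic option is restoring from a snapshot
    • From On-Premise, create a backup, store it in S3, and restore from S3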


RDS Security

  • At-rest encryption: AWS KMS (Key Management Service)
  • In-flight encryption: AWS TLS Certificates
  • IAM Authentication: IAM roles to connect to database


RDS Proxy

  • A proxy in front of RDS or Aurora that relieves database pressure and minimizes open connections
    • Allow apps to pool and share DB connections (improves DB efficiency)
    • Reduces RDS & Aurora failover time by up to 66%
    • No code change (important)
  • Serverless, auto scaling, Highly Available (Multi-AZ)
  • Not publicly accessible (must be accessed from within the VPC)


AWS ElastiCache

  • Managed Redis or Memcached (ElastiCache solves the problem of very frequent database reads, important)
    • ElastiCache is an in-memory data store and can cache SQL query results
  • Caches are in-memory databases with high performance and low latency (compute-intensive)
  • A cache helps reduce load on the database for read-intensive workloads (read heavy)
  • Helps make the application stateless (AWS takes care of most of the work)
  • For S3, the cache is CloudFront, not ElastiCache (remember)
  • ElastiCache for Memcached is the one that supports multi-threading

  • DB Cache: Get data from ElastiCache
    • If not available, get it from RDS and store it in ElastiCache

  • User Session: Write Session data into ElastiCache
    • A user routed to another instance can still be logged in

  • Redis vs Memcached
    • Redis: Read Replicas, High Availability, Backup and restore (very important)
    • Memcached: Not Highly Available, No Backup, Risk of Losing Data, Multi-threaded


ElastiCache SAA

  • ElastiCache supports IAM Authentication for Redis
  • Memcached supports SASL-based authentication
  • IAM Database Authentication does not support Oracle

  • Patterns for ElastiCache
    • Lazy Loading, Write Through, Session Store

  • ElastiCache Use Case
    • Gaming Leaderboards (use Redis Sorted Sets, important)
    • Manage and store session data


8. Route 53

What’s DNS


AWS Route 53

  • Users can update the DNS records (highly available, scalable)
  • Route 53 is a Domain Registrar

  • Route 53 Records:
    • Domain Name, Record Type, Value, Routing Policy, TTL

  • Route 53 Record Types
    • A: IPv4
    • AAAA: IPv6
    • CNAME: Map hostname to another hostname
    • NS: Name Servers for the Hosted Zone

  • Route 53: Hosted Zones
    • Public Hosted Zones: Public domain name
    • Private Hosted Zones: Private domain name (VPC)

Note: the VPC settings DNS hostnames and DNS resolution must be enabled for private hosted zones

  • GoDaddy + Route 53 (As DNS Service Provider)
    • Create a Public Hosted Zone and update the 3rd party Registrar NS records
  • Route 53 Health Check
    • If a question involves the ELB having problems, look to Route 53 Health Checks

Route 53 TTL

  • TTL: Time To Live
    • How long the value should be cached (updating the backend does not mean caches have updated)


CNAME vs Alias

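  • CNAME: Point a hostname to another hostname
    • For example, from acme.example.com to zenith.example.org; a CNAME only works for non-root domains, not for the zone apex such as example.com
  • Alias: Point a hostname to an AWS Resource (targets such as CloudFront or S3); works for root and non-root domains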



Health Checks

  • Health Checks are only for public resources
  • Health Checks can monitor: Endpoints, other Health Checks, CloudWatch Alarms


Routing Policy (Simple)

  • Define how Route 53 responds to DNS queries
  • There are 7 Routing Policies in total

  • Simple: Route traffic to a single resource
    • If multiple values returned, a random one is chosen by client


Routing Policy (Weighted)

  • Weighted: Control the % of requests that go to each resource
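
With weighted routing, each record receives a share of queries equal to its weight divided by the sum of all weights. The sketch below computes that share and simulates a single resolution; the IP addresses and weights are made-up examples, and `resolve` is only a local simulation, not a Route 53 API.

```python
import random

# Hypothetical records and weights (weights do not need to sum to 100).
weights = {"10.0.0.1": 70, "10.0.0.2": 20, "10.0.0.3": 10}


def expected_share(weights):
    """Fraction of DNS queries each record is expected to answer."""
    total = sum(weights.values())
    return {target: weight / total for target, weight in weights.items()}


def resolve(weights, rng=random):
    """Simulate one DNS answer picked in proportion to the weights."""
    targets = list(weights)
    return rng.choices(targets, weights=[weights[t] for t in targets], k=1)[0]


print(expected_share(weights))
print(resolve(weights))
```

Setting a record's weight to 0 stops sending it traffic, which is a common way to drain a resource before removing it.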


Routing Policy (Latency)

  • Latency: Redirect to the resource that has the lowest latency for the user
  • Latency is based on traffic between users and AWS Regions


Routing Policy (Failover)

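  • Failover: Have a Primary Instance and a Secondary Instance (important)
    • On failover, switch to the Secondary Instance
  • Note: when a question asks about failover, the Failover routing policy is active-passive
    • The Failover policy itself has no active-active mode; active-active setups use the other routing policies together with health checks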


Routing Policy (Geolocation)

  • Geolocation: Routing based on user location
    • Need to create a “Default” record
    • Use cases: website localization, restricting content distribution, etc.


Routing Policy (Geoproximity)

  • Geoproximity: Route traffic to resources based on the geographic location of users and resources
    • But with the ability to shift more traffic to a resource based on a bias (important)
    • Raising or lowering the bias routes more or less traffic to that resource


Routing Policy (IP-Based)

  • IP-Based: Routing based on client’s IP address
    • Provide a list of CIDRs for your client


Routing Policy (Multi-Value)

  • Multi-Value: Route traffic to multiple resources
    • Not a substitute for ELB


9. Beanstalk

Instantiate App

  • EC2 Instance
    • Golden AMI: Install all applications beforehand and launch EC2 Instance
    • User Data: Dynamic configuration (remember: dynamic)
  • RDS Database: Restore from a snapshot (Data & Schema ready)
  • EBS Volume: Restore from a snapshot


Elastic Beanstalk

  • You are only responsible for the code, AWS handles everything else

  • Web Server Tier vs. Worker Tier
    • One serves web requests, the other handles background processing (SQS, SNS, etc.)


10. AWS S3

AWS S3

  • S3 Buckets (S3 looks Global, but Buckets are Regional)
    • Store objects (files) in “buckets”
    • Buckets must have a globally unique name
    • Buckets are defined at the Region level

  • Objects (files) have a key, max object size is 5TB
  • The key is the FULL path (prefix + object name)
  • Uploads larger than 5 GB must use Multi-Part Upload (S3 Transfer Acceleration can help too)

  • S3 sync command: Uses the CopyObject APIs to copy objects between S3 buckets
  • S3 always returns the latest version of the object (important)
  • S3 does not encrypt object metadata
  • S3 is serverless
  • If a question involves serving static content, the answer is S3 + CloudFront (important)
  • If a question is about S3 and image uploads, pick S3 Event Notification, not EventBridge (important)

  • If S3 request throughput is the bottleneck, spread the objects across more prefixes (request-rate limits apply per prefix)
  • Note: S3 has nothing to do with databases, S3 is not a database

By default, an Amazon S3 object is owned by the AWS account that uploaded it. This is true even when the bucket is owned by another account (important)


S3 Bucket Policy

  • User-Based Security
    • IAM Policies: Which API calls should be allowed for a user from IAM (important)
  • Resource-Based Security
    • Bucket Policies: Bucket-wide rules (for example, making Objects public, important)
    • Object Access Control List (ACL)
    • Bucket Access Control List (ACL)
  • Encryption: Encrypt objects in S3 with encryption keys

  • S3 Bucket Policies (JSON based policies)
    • Resources: buckets and objects
    • Effect: Allow / Deny
    • Actions: API to Allow or Deny
    • Principal: The account or user to apply the policy to
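
Putting the four elements together, a bucket policy that makes every object publicly readable could look like the sketch below. The bucket name is a placeholder and the `Sid` is arbitrary; in practice the bucket's Block Public Access settings must also permit public policies for this to take effect.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",                         # Allow / Deny
            "Principal": "*",                          # anyone
            "Action": "s3:GetObject",                  # API to allow
            "Resource": f"arn:aws:s3:::{BUCKET}/*",    # all objects in the bucket
        }
    ],
}

print(json.dumps(public_read_policy, indent=2))
```

Unlike an identity-based IAM policy, this resource-based policy names a Principal, because it is attached to the bucket rather than to a user.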


S3 Versioning

  • Enabled at the bucket level (objects stored before versioning was enabled have version ID null)
  • Best practice to version the buckets (allows rolling back to an earlier version, important)
    • Protects against accidental deletion of objects
  • Once a bucket is version-enabled, it can never return to an unversioned state (important)


S3 Replication (CRR & SRR)

  • CRR: Cross Region Replication (compliance, lower latency access)
  • SRR: Same Region Replication (create test environment)
    • Must enable Versioning
    • The Copying is asynchronous
    • Must give IAM permissions to S3

  • Only new objects are replicated, existing objects need S3 Batch Replication
  • No “chaining” in replication
    • For example, to copy A to both B and C, set up A to B and A to C separately


S3 Storage Classes

  • 7 Storage Classes in total: General Purpose + 2 IA + 3 Glacier + Intelligent-Tiering

  • S3 Standard - General Purpose
    • Used for frequently accessed data (most common)

  • S3 Infrequent Access (Standard IA & One Zone IA)
    • For data that is accessed less frequently but requires rapid access when needed (important)
    • Lower cost than Standard
    • Objects must stay in Standard for at least 30 days before transitioning to One Zone IA
    • One Zone IA is not a High Availability choice (note: its single AZ can go down)
    • If a question says access is always required, do not pick One Zone IA, because the AZ may go down

  • S3 Glacier Storage Classes (for archiving)
    • Low-cost object storage for archiving / backup
    • Price: storage + retrieval
    • S3 Glacier Instant Retrieval: the fastest but also the most expensive Glacier class
    • S3 Glacier Flexible Retrieval: three retrieval options (Expedited, Standard, Bulk)
    • S3 Glacier Deep Archive: for the longest-term storage, two retrieval options (Standard within 12 hours, Bulk within 48 hours)

  • S3 Intelligent Tiering
    • Move objects automatically between Access Tiers based on usage
    • In other words, it moves each Object to the right access tier for you
    • In Lifecycle transitions, Intelligent-Tiering sits below Standard and Standard IA, and objects cannot be transitioned from a lower class back up


11. AWS S3 Advance

S3 Lifecycle Rules

  • Transition objects between storage classes (for example, from IA to Glacier)
    • Transitions require Lifecycle Rules (important)
  • For example, moving Snowball-imported data on to Glacier requires a Lifecycle Rule

  • Transition Actions: Configure objects to transition to another storage class (move)
  • Expiration Actions: Configure objects to expire after some time (delete)

  • S3 Analytics: Help decide when to transition objects to the right storage class
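
A Lifecycle Rule combining transition and expiration actions can be sketched as the dictionary below. The field names follow the S3 lifecycle configuration format as I understand it; the rule ID, the `logs/` prefix, and the day counts are made-up values, and `storage_class_on_day` is a hypothetical helper that only illustrates how the rule plays out over an object's age.

```python
# Hypothetical rule: logs move to Standard-IA after 30 days,
# to Glacier after 90 days, and are deleted after 365 days.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_on_day(rule, age_days):
    """Return the storage class at a given object age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None  # expiration action: the object has been deleted
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for day in (10, 45, 120, 400):
    print(day, storage_class_on_day(rule, day))
```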


S3 Requester Pays

  • The requester pays the cost of the request instead of the bucket owner
    • The requester pays, but must be an AWS user


S3 Event Notifications

  • Automatically react to certain events in S3 (for example, an image upload)
    • Need to have IAM Permissions
    • S3 Event Notification destinations are SQS, SNS, and Lambda (remember)
  • Most questions about reacting to S3 events are answered by S3 Event Notification, not EventBridge (important)


S3 Performance

  • For Upload (important)
    • Multi-Part Upload: Recommended above 100 MB and required above 5 GB, uploads the parts in parallel
    • S3 Transfer Acceleration: Sends the file to an AWS edge location first, can be combined with Multi-Part Upload
    • S3 Transfer Acceleration (S3TA) can speed up content transfers to and from S3 (both uploads and downloads)
    • Note: S3 Transfer Acceleration cannot be used to copy objects between buckets

  • For Download
    • S3 Byte-Range Fetches: Parallelize GETs, retrieve partial data (fetch only part of an object, important)
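
Both Multi-Part Upload and Byte-Range Fetches rest on the same idea: split an object into byte ranges that can be transferred in parallel. The sketch below computes those ranges and the matching HTTP `Range` header values; the sizes are toy numbers and the helper names are hypothetical, not part of any AWS SDK.

```python
def byte_ranges(object_size, part_size):
    """Split [0, object_size) into inclusive (start, end) byte ranges."""
    if object_size <= 0 or part_size <= 0:
        raise ValueError("sizes must be positive")
    return [
        (start, min(start + part_size, object_size) - 1)
        for start in range(0, object_size, part_size)
    ]


def range_header(start, end):
    """Format an HTTP Range header value for one byte range."""
    return f"bytes={start}-{end}"


ranges = byte_ranges(object_size=25, part_size=10)
print(ranges)  # [(0, 9), (10, 19), (20, 24)]
print([range_header(start, end) for start, end in ranges])
```

Each range can then be requested by a separate worker, and a failed range can be retried without re-transferring the whole object.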


S3 Select & Glacier Select

  • S3 Select uses SQL for server-side filtering, which reduces the amount of data transferred
  • Up to 400% faster, up to 80% cheaper (performance optimization)


S3 Batch Operations

  • Perform bulk operations on existing S3 objects
    • For example, encrypt all un-encrypted objects
  • Before running a Batch job, use S3 Inventory to get the object list and S3 Select to filter the objects


12. AWS S3 Security

S3 Encryption

  • There are 4 methods (SSE-S3 is free, SSE-KMS is charged)
    • Server-Side Encryption (SSE-S3, SSE-KMS, SSE-C)
    • Client-Side Encryption (if the user already has an encryption method)

  • Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
    • Enabled by default for new buckets & objects
    • Header for SSE-S3: "x-amz-server-side-encryption": "AES256"
    • With SSE-S3, each Object is encrypted with a unique key
    • SSE-S3 uses 256-bit Advanced Encryption Standard (AES-256)
    • SSE-S3 gives you no control over key rotation, use SSE-KMS if you need to manage it (important)

  • Server-Side Encryption with KMS Keys stored in AWS KMS (SSE-KMS)
    • User control + audit key usage in CloudTrail (lets you manage key rotation)
    • Header for SSE-KMS: "x-amz-server-side-encryption": "aws:kms"

  • Server-Side Encryption with Customer-Provided Keys (SSE-C)
    • The user supplies the keys, AWS will not store the key, HTTPS must be used
    • Pick it when the question says the user wants to use their own key but have the encryption done on the AWS side

  • Client-Side Encryption
    • User fully manages the keys and encryption cycle
    • The data is already encrypted before it is sent to AWS

  • Encryption in Transit (SSL/TLS)
    • Can be enforced with the aws:SecureTransport condition key (with SSL/TLS the request is effectively HTTPS)
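
Enforcing encryption in transit is usually done with a bucket policy that denies any request where `aws:SecureTransport` is false. The sketch below builds such a policy as a Python dict; the bucket name and `Sid` are placeholders chosen for illustration.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

deny_insecure_transport = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # Matches requests that did NOT arrive over SSL/TLS.
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

print(json.dumps(deny_insecure_transport, indent=2))
```

Because an explicit Deny overrides any Allow, plain HTTP requests are rejected regardless of what other policies grant.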


S3 CORS

  • CORS: Cross-Origin Resource Sharing

  • S3 CORS (important)
    • If a client makes a cross-origin request on an S3 bucket, the bucket needs to return the correct CORS headers
    • Can allow a specific origin or *


S3 MFA Delete

  • MFA Delete: Protects against accidental deletion, the user must authenticate with MFA first (important)
    • To use MFA Delete: Versioning must be enabled
  • Only the bucket owner (root user) can enable/disable MFA Delete (important)


S3 Access Logs

  • Any request made to S3 can be logged to another S3 bucket
    • The logs can then be used for data analysis


S3 Pre-signed URLs

  • The URL has an expiration
    • Used with a Private Bucket, the URL stops working after the set time


Glacier Vault Lock & S3 Object Lock

  • S3 Glacier Vault Lock
    • Adopt a WORM (Write Once Read Many) model: once written, data cannot be modified
    • The object can’t be deleted
  • Glacier Vault Lock can store sensitive data, managed through a Vault Lock Policy

  • S3 Object Lock (Must have Versioning, important)
    • Block an object version deletion for an amount of time (time-limited)
    • Two Retention modes (important)
      • Compliance: Nobody can delete the object version, not even the Root user
      • Governance: Users with special permissions can still delete it
    • Legal Hold: Protects the Object Version from changes until the Legal Hold is removed (important)


S3 Access Points

  • Access Points simplify security management for S3 Buckets (each one can be scoped to a group of users)
  • Each Access Point: DNS name + Access Point Policy


S3 Object Lambda

  • Use AWS Lambda to change the object before it is retrieved by the caller
    • Use case: watermarking images
  • Requires an S3 Access Point and an S3 Object Lambda Access Point


13. CloudFront & AWS Global Accelerator

AWS CloudFront

  • Content Delivery Network (CDN), can serve static & dynamic content
    • Improve read performance, content is cached at the edge
  • Provides DDoS protection (together with AWS Shield), can route by content type, and supports primary & secondary origins for High Availability & Failover
  • CloudFront Origins: S3 Bucket (OAC) or Custom Origin (HTTP)
    • OAC: Origin Access Control (important for questions involving S3)
  • If a question is about CloudFront and needs encryption, pick field-level encryption (not KMS)
    • Field-level encryption allows you to enable your users to securely upload sensitive information to your web servers.

  • CloudFront vs S3 Cross Region Replication
    • CloudFront: Uses the global edge network, has a TTL, good for static content
    • S3 Cross Region Replication (CRR): Must be set up for each Region, has no TTL, good for dynamic content (updated in near real-time, read only)

  • Price Classes: Prices differ between edge regions
    • Can be used for cost reduction
  • CloudFront lets proxy methods and dynamic content skip the regional edge cache
  • Besides blocking IPs with WAF, you can use an OAI (origin access identity, the legacy predecessor of OAC, important)
    • OAI also secures communication between CloudFront & S3
  • Use CloudFront signed URLs and CloudFront signed cookies to restrict access to documents (for example, for subscriptions)
  • Can configure CloudFront to require HTTPS from clients

CloudFront Geo Restriction

  • Allowlist & Blocklist (restrict access by country, determined from the client IP)
    • CloudFront Geo Restriction cannot be used together with a VPC


CloudFront Cache Invalidation

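  • Makes CloudFront bypass the TTL and refresh immediately
    • For example, after you update the backend, CloudFront keeps serving cached content until the TTL expires; a CloudFront Invalidation forces an immediate refresh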


AWS Global Accelerator

  • Uses the AWS internal network to route traffic to the application
    • Provide 2 global static anycast IPs (important)
    • Improve availability and performance of the applications (globally)
  • Global Accelerator improves performance for applications over TCP or UDP
  • Global Accelerator has automatic failover
  • Global Accelerator is more expensive as it adds an extra layer of infrastructure (not a cost-effective choice compared with CloudFront)

  • AWS Global Accelerator vs CloudFront
    • CloudFront: Content is served at the edge, for example images and videos (cacheable), plus dynamic content such as API acceleration & dynamic site delivery
    • Global Accelerator: Suited to TCP or UDP, for example gaming and IoT (static IP)

AWS Global Accelerator is a network service that can provide a global traffic management solution. By creating a standard accelerator in AWS Global Accelerator, you can guide user traffic to the endpoint closest to them, thereby improving the performance and availability of the application.


14. AWS Storage Extra

AWS Snow Family

  • The whole Snow Family consists of physical devices
  • Data migration (in & out AWS) & Edge Computing (process data at edge)
    • Snowcone (small, suited to harsh environments, less storage than Snowball)
    • Snowball Edge (Storage Optimized and Compute Optimized, recommended for 10 - 100 TB)
      • Compute Optimized supports storage clustering
      • Terabytes, low costs, limited time = AWS Snowball devices
    • Snowmobile (not part of Edge Computing, largest transfer capacity, recommended above 10 PB)

  • Edge Location: Places with no network, for example at sea or in a mine
    • This is why Edge Computing and the Snow Family are needed
  • AWS OpsHub: Used to manage AWS Snow Family devices
  • Snowball to Glacier: Use an S3 lifecycle policy (Snowball can import into S3 but not directly into Glacier, important)
    • Note: Glacier here means the whole Glacier family (e.g. Glacier Deep Archive)


AWS FSx

  • Third party high-performance file systems on AWS
    • FSx for Windows: Windows file system, supports SMB & Windows NTFS
    • FSx for Lustre (Linux): Can maximize IOPS, suited to HPC, integrates with S3
    • FSx for NetApp ONTAP: File system for NFS, SMB, iSCSI
    • FSx for OpenZFS: OpenZFS file system (as the name says)

FSx for Lustre provides the ability to both process the ‘hot data’ in a parallel and distributed fashion as well as easily store the ‘cold data’ on Amazon S3.

  • Two Deployment Options
    • Scratch File System: Temporary storage, data not replicated
    • Persistent File System: Long term storage, data replicated (Same AZ, important)


AWS Storage Gateway

  • Bridge between On-Premise data and cloud data (Hybrid cloud, important)
    • The bridge between On-Premise and AWS Cloud (keeps the ability to access AWS data from On-Premise)
    • Key point, Hybrid Cloud, give On-Premise ability to access cloud storage
  • If NFS is mentioned it is File Gateway, and most S3 questions are File Gateway too
  • DataSync is for moving data, File Gateway is for maintaining access (important)
  • File Gateway is file storage, Volume Gateway is block storage (important)

  • There are 4 types of Gateway (note, Storage Gateway refers to these 4 Gateways collectively)
    • S3 File Gateway: Access S3 Buckets from On-Premise (NFS, SMB), file storage
    • FSx File Gateway: Access FSx from On-Premise (SMB, NTFS)
    • Volume Gateway: Access S3 and EBS from On-Premise (iSCSI), block storage
      • Cached volumes: Access most recent data (eg. logs), the choice when the question mentions S3
      • Stored volumes: Entire dataset is On-Premise, backup at S3
    • Tape Gateway: Access archived tapes stored in AWS Glacier from On-Premise
      • If the question mentions Tape, it is Tape Gateway

AWS Transfer Family

  • Use FTP-family protocols to transfer files into or out of S3 or EFS
    • Three Protocols: FTP, FTPS, SFTP (important, for any question about FTP or SFTP)


AWS DataSync

  • Move large amounts of data to & from On-Premise or AWS
    • On-Premise to AWS S3, EFS, FSx, etc. (need agent)
    • AWS to AWS (no agent)
  • Scheduled replication tasks (eg. replicate once every Friday)
  • File permissions and metadata are preserved (important)
  • DataSync is for moving data, File Gateway is for maintaining access

AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data between on-premises storage systems and AWS Storage services, as well as between AWS Storage services.


15. SQS, SNS, Kinesis, Active MQ

AWS SQS

  • Used to decouple applications (eg. video processing, a many-to-many model)
  • SQS scale automatically (Unlimited throughput, unlimited message, can retry)
    • A question about decoupling microservices (with no 3rd party involved) means SQS
  • S3 event notifications can target a Standard SQS queue, but not an SQS FIFO queue
  • For SQS message processing failures, choose a DLQ (important)
  • For a high-throughput request-response message pattern, choose temporary queues (important)
  • To have SQS postpone the delivery of new messages, use a delay queue
  • For parallel processing choose SQS rather than SNS

  • Many consumers can process messages in parallel at the same time
    • Consumer delete messages after processing them

  • SQS Message Visibility Timeout (important)
    • Visibility Timeout is high: If the consumer crashes, re-processing takes a long time
    • Visibility Timeout is low: May get duplicates (to prevent duplicate reads, increase the Timeout, important)
    • Use the ChangeMessageVisibility API call to increase the visibility timeout
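
The visibility timeout trade-off can be illustrated with a small in-memory model. This is only a sketch and not the SQS API: `ToyQueue`, `receive`, `delete` and `count_deliveries` are hypothetical names, and time is passed in explicitly as whole seconds.

```python
class ToyQueue:
    """In-memory stand-in for one SQS queue (hypothetical, not the SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}  # message id -> time at which it becomes visible again

    def send(self, msg_id, now=0):
        self.messages[msg_id] = now

    def receive(self, now):
        """Return one visible message and hide it for visibility_timeout seconds."""
        for msg_id, visible_at in self.messages.items():
            if visible_at <= now:
                self.messages[msg_id] = now + self.visibility_timeout
                return msg_id
        return None

    def delete(self, msg_id):
        """Consumers delete a message once processing succeeds."""
        self.messages.pop(msg_id, None)


def count_deliveries(visibility_timeout, processing_time):
    """Count deliveries of one message before the first consumer deletes it."""
    queue = ToyQueue(visibility_timeout)
    queue.send("m1")
    deliveries = 0
    for now in range(processing_time):  # some consumer polls every second
        if queue.receive(now) is not None:
            deliveries += 1
    queue.delete("m1")  # the first consumer finishes at t = processing_time
    return deliveries
```

With a timeout longer than the processing time the message is delivered once; with a timeout shorter than the processing time the same message is handed out again while the first consumer is still working, which is the duplicate case described in the list.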

  • When SQS needs priority handling: Create two Amazon SQS standard queues, and set up the Amazon EC2 instances to poll the high-priority queue first

  • Converting an SQS queue to a FIFO queue
    • Delete the existing SQS and recreate it as FIFO queue
    • Make sure the name of the FIFO queue ends with .fifo suffix
    • Make sure the throughput for the FIFO queue does not exceed 3000 messages / second (with batching)

SQS Long Polling

  • Decrease latency & decrease the number of API calls (better performance)
    • Minimize the cost of using SQS
    • Long Polling does not handle SQS duplicates, Visibility Timeout is still needed


SQS FIFO Queue

  • Deliver messages in order (SNS FIFO can do this too)
    • By default, FIFO queues support up to 300 messages per second (3000 with batching)
  • Without distinct GroupIDs only 1 consumer can be used, with multiple GroupIDs there can be multiple consumers


AWS SNS

  • Send one message to many receivers (eg. send email to users)
    • Publisher & Subscriber model (a one-to-many model)
    • Subscriber: SQS, Lambda, Kinesis Data Firehose, HTTPS endpoints, Email


Fan Out Pattern

  • Push once in SNS, receive in all SQS (SNS delivers the message to every subscribed SQS queue)
    • Fully decoupled, no data loss
    • Can be used for message filtering (based on each subscription's filter policy)
  • Kinesis can also use the Fan Out Pattern (using Shards)
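
A rough sketch of fan-out with message filtering, in plain Python rather than the SNS API. It models only exact string matching: a message passes a filter policy when every key in the policy is present in the message attributes with one of the allowed values. The queue names and the `state` attribute are made up for illustration.

```python
def matches(filter_policy, attributes):
    """True when every policy key is present with one of its allowed values."""
    return all(attributes.get(key) in allowed for key, allowed in filter_policy.items())


def fan_out(subscriptions, attributes):
    """Return the names of the queues that would receive the message."""
    return [name for name, policy in subscriptions.items() if matches(policy, attributes)]


# Hypothetical subscriptions: queue name -> filter policy
subscriptions = {
    "all-orders-queue": {},                      # empty policy: receives everything
    "placed-queue": {"state": ["placed"]},
    "cancelled-queue": {"state": ["cancelled"]},
}
```

Calling `fan_out(subscriptions, {"state": "placed"})` selects the unfiltered queue and the queue whose policy accepts `placed`; the publisher still pushes the message only once.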


AWS Kinesis

  • Process, ingest, buffer streaming data in real-time

  • Kinesis Data Streams: Capture, process, store data streams
  • Kinesis Data Firehose: Load data streams into AWS data stores (S3, Redshift)
  • No agent is needed when Kinesis Data Streams is the source writing data to Kinesis Data Firehose

Kinesis Agent cannot write to Amazon Kinesis Firehose for which the delivery stream source is already set as Amazon Kinesis Data Streams


Kinesis Data Streams

  • Have the ability to reprocess & replay stream data (replay, important)
    • Process real-time data streams, eg. clickstreams, transactions (financial data), media
    • Consumers include Lambda, Kinesis Firehose, Kinesis Data Analytics
    • Kinesis Data Streams helps collect gigabytes of data per second continuously from many sources
  • Once data is inserted in Kinesis, it can’t be deleted (immutability)
  • If a question combines Kinesis Data Streams with SQL, the answer includes Kinesis Data Analytics

  • Data sharing the same partition key goes to the same shard (ordered)

  • Capacity Mode (Provisioned sets a ceiling, On-Demand scales automatically)
    • Provisioned mode: Choose a number of shards provisioned, scale manually
    • On-demand mode: No need to provision or manage capacity (automatic)

For a ProvisionedThroughputExceededException, batch the messages


Kinesis Data Firehose

  • Kinesis Data Firehose load streaming data into data stores and analytics tools
    • Managed service, auto scaling, serverless, support data transformation
    • In short, Kinesis Data Firehose delivers data streams into data stores or analytics tools
  • Firehose targets destinations like S3 and Redshift (Serverless, well suited to log data)
    • Firehose does not support DynamoDB
  • Near Real Time, and Firehose supports only one consumer (dump data in a single data repo)
    • Don't be misled by Near Real Time, pick Data Streams or Data Firehose based on the question


  • Kinesis Data Streams vs Firehose
    • Data Streams: Write custom code, real-time, have data storage, have replay
    • Firehose: Fully managed, near real-time, no data storage, no replay


Data ordering for Kinesis

  • Same key (Partition) will always go to the same shard (ordered)
  • By comparison, SQS only has FIFO for ordering (aimed at cases that need many consumers)
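
Why the same key always lands on the same shard can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit integer and each shard owns a range of that hash space. The even split of the range across `shard_count` shards is an assumption for illustration; a real stream reports its actual ranges per shard. `shard_for` is a hypothetical helper name.

```python
import hashlib


def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")   # 128-bit integer
    range_size = 2 ** 128 // shard_count         # width of each shard's range
    # Clamp so the few values past the last full range fall into the last shard
    return min(hash_value // range_size, shard_count - 1)
```

Because the mapping depends only on the key, every record for one key (for example one device ID) goes to one shard and keeps its order there, while different keys spread across shards.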


AWS MQ

  • Manages third-party message broker software
    • RabbitMQ and ActiveMQ


16. Containers on AWS

AWS ECS

  • Elastic Container Service (Manage Docker containers on AWS)
  • ECS has two Launch Types: EC2 and Fargate (Fargate is serverless)
  • EC2 Launch Type: Need Provision, requires the ECS Agent (key point)

  • Fargate Launch Type: Serverless, No Provision (key point)

  • IAM Roles for ECS (important, these are IAM)
    • EC2 Instance Profile: Only for the EC2 Launch Type
    • ECS Task Role: Each Task has its own Role

  • ECS can be used with a Load Balancer
  • ECS can be used with EFS (Fargate + EFS = Serverless, for managing files)

Amazon ECS with EC2 launch type is charged based on EC2 instances and EBS volumes used. Amazon ECS with Fargate launch type is charged based on vCPU and memory resources that the containerized application requests


ECS Auto Scaling

  • Automatically increase / decrease the number of ECS tasks
    • Target Tracking Scaling: Scale based on target value for CloudWatch metric
    • Step Scaling: Scale based on CloudWatch Alarm
    • Scheduled Scaling: Scale based on specific time (important)


AWS ECR

  • Elastic Container Registry (ECR, manages Docker Images, not Docker Containers)
    • Store and manage Docker images on AWS


AWS EKS

  • Elastic Kubernetes Service (Manage Kubernetes clusters on AWS)
    • Support 2 deployment modes: EC2 & Fargate (same as ECS)

  • EKS Data Volumes (requires a StorageClass)
    • Leverages a Container Storage Interface (CSI) compliant driver
    • Supports EBS, EFS, FSx

  • EKS Node Type (three in total)
    • Managed Node Groups (for EC2 Instances)
    • Self-Managed Nodes (created by yourself)
    • AWS Fargate (nothing to manage)


AWS App Runner

  • Deploy web applications and APIs (Build and Deploy)
  • No infrastructure experience required (no AWS experience needed)


17. Serverless

AWS Lambda

  • Virtual functions, Serverless, limited by time (short executions, 15 mins)
    • Run on-demand, scaling is automated
  • Could be Event-Driven, could handle CRON job (Jobs on a repeating schedule)
    • Use EventBridge to trigger Lambda every hour
  • Lambda has account quotas, contact AWS to raise the limit

  • Lambda Limits
    • Execution: Memory, Execution Time
    • Deployment: Size

By default, AWS Lambda functions always operate from an AWS-owned VPC and hence have access to any public internet address or public AWS APIs. Once an AWS Lambda function is VPC-enabled, it will need a route through a Network Address Translation gateway (NAT gateway) in a public subnet to access public resources


Lambda SnapStart

  • For Java 11 and above, Function is invoked from a pre-initialized state


CloudFront Functions & Lambda@Edge

  • Execute logic at the edge (Edge Function, Serverless)
  • Two options: CloudFront Functions, Lambda@Edge (they support different languages)

  • CloudFront Functions (JavaScript)
    • Native feature of CloudFront, short execution time

  • Lambda@Edge (Node.js or Python)
    • Longer execution time

  • CloudFront Functions vs Lambda@Edge


Lambda in VPC

  • By default, Lambda cannot access resources inside a VPC (hence Lambda in VPC)


RDS with Lambda

  • Invoke Lambda functions from DB instance
    • RDS for PostgreSQL, Aurora for MySQL, requires permissions


AWS DynamoDB

  • NoSQL database, with replication across multiple AZs, auto-scaling, no provision
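  • Two Table Classes: Standard & Infrequent Access (IA)
  • If a question involves DynamoDB and sending Email, it is DynamoDB Streams
    • For any question about Streams, also consider DynamoDB Streams
  • If a question involves DynamoDB with an unpredictable workload, choose an On-Demand table
  • By default, DynamoDB tables are encrypted with AWS owned key
  • Don't store images in the database (a DynamoDB item is limited to 400 KB, keep images in S3)

  • Read / Write Capacity Mode
    • RCU and WCU are independent (RCU can be increased without increasing WCU)
    • Provisioned Mode: Define yourself how many RCU and WCU are needed
    • On-Demand Mode: Automatically scales the RCU and WCU needed (more expensive)

  • Point-In-Time Recovery (PITR, important)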
    • When enable PITR, DynamoDB backs up table data automatically with per-second granularity so that you can restore to any given second in the preceding 35 days.

DynamoDB Advance

  • DynamoDB Accelerator (DAX, provides caching, but DAX is not relational)
    • Help solve read congestion by caching (microseconds latency)
    • DAX does not support SQL query caching
    • Improves DynamoDB performance (for reads, not writes, important)

  • DynamoDB Stream Processing
    • Ordered stream of item-level modifications in table
    • Used to process the Stream, can Invoke a Lambda function (eg. to send email)

  • DynamoDB Global Tables (important)
    • Make DynamoDB table accessible with low latency in multiple regions
    • Requires DynamoDB Streams to be enabled first

  • DynamoDB Time To Live (TTL, important)
    • Automatically delete items after an expiry timestamp
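
As a small illustration of TTL, the expiry timestamp is stored on each item as a number holding Unix epoch time in seconds. The sketch below only builds such an item in plain Python; the attribute name `expires_at` and the helper names are assumptions (the TTL attribute name is whatever is configured on the table).

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def with_ttl(item, days, now=None):
    """Return a copy of item carrying an `expires_at` epoch-seconds attribute."""
    now = int(time.time()) if now is None else int(now)
    expiring = dict(item)
    expiring["expires_at"] = now + days * SECONDS_PER_DAY
    return expiring


def is_expired(item, now):
    """True once the current time has reached the item's expiry timestamp."""
    return item["expires_at"] <= now
```

A session record written with `with_ttl({"id": "s1"}, days=7)` becomes eligible for deletion a week later without any cleanup job in the application.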


AWS API Gateway

  • Invoke Lambda function, expose REST API (Stateless client-server communication)
  • Lambda + API Gateway = No infrastructure to manage
  • API Gateway can protect an API from being overwhelmed by too many requests (throttling)
  • API Gateway Caching (can improve latency)
    • With caching, you can reduce the number of calls made to your endpoint and also improve the latency of requests to your API
    • Remember, a Read Replica costs extra

  • API Gateway Endpoint Types
    • Edge-Optimized (default): For global clients, API Gateway live in one region
    • Regional: For client in same region
    • Private: For access from within a VPC


AWS Step Functions

  • Build serverless visual workflow to orchestrate Lambda functions
    • Build a Serverless visual Workflow (important)


AWS Cognito

  • Give users identity to interact with web or mobile application
    • Cognito User Pools: Handle user sign-in, have built-in user management
    • Cognito Identity Pools: Provide temporary AWS credentials to access AWS services
  • Note, ALB + Cognito User Pool is what handles Auth, not CloudFront


18. Database Summary

AWS RDS (SM)

  • Manage PostgreSQL, MySQL, Oracle, SQL Server, etc.
  • Provisioned, Auto Scaling, Backup
  • Have Read Replica & Multi-AZ (see Chapter 7)


AWS Aurora (SM)

  • Manage PostgreSQL & MySQL (only these two, see Chapter 7)
  • Data stored in 6 replicas, across 3 AZ, highly available, auto-scaling
  • Aurora Serverless: For unpredictable workloads (Auto Scaling)
  • Aurora Global: Less than 1 second storage replication (ensures at least one Replica is available)
  • Aurora Database Cloning: Quickly create a copy of an Aurora DB, suited to generating test databases


AWS ElastiCache (SM)

  • Manage Redis / Memcached (In-memory data store, see Chapter 7)
  • Support Clustering (Redis), Multi-AZ and Read Replicas (Shard)
  • Require some application code change to be leveraged (key point)


AWS DynamoDB (SM)

  • Serverless, NoSQL database (For rapidly evolving schemas, see Chapter 17)
  • Capacity modes: Provisioned capacity or On-Demand capacity
  • DAX cluster for read cache, microsecond read latency
  • Event Processing: DynamoDB Stream with Lambda or Kinesis Data Streams
  • Export to S3 without using RCU, Import from S3 without WCU


AWS S3 (SM)

  • S3 is a key / value store for objects (suited to large files, see Chapter 10)
  • Tiers: Standard, IA, Glacier, etc.
  • Features: Versioning, Encryption, Replication, etc.
  • Encryption: SSE-S3, SSE-KMS, SSE-C


AWS DocumentDB

  • DocumentDB is for MongoDB (Also a NoSQL database, but only for MongoDB)


AWS Neptune

  • Graph database (For social network)


AWS Keyspaces

  • Manage Apache Cassandra (also NoSQL) database
  • Use cases: Store IoT devices info, time-series data


AWS QLDB

  • QLDB: Quantum Ledger Database (Immutable, entries cannot be deleted)
    • For recording financial transactions (Serverless, eg. crypto assets)


AWS Timestream

  • Serverless Time series database
  • Use cases: IoT apps, real-time analytics


19. Data & Analytics

AWS Athena

  • Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL (Athena is Serverless)
    • Athena runs SQL queries against data in S3
    • Athena cannot be used to analyze data in real time (important)
  • Use Athena to process logs, perform ad-hoc analysis, and run interactive queries (important)

  • Performance Improvement
    • Use columnar data for cost-saving (less data scanned)

  • Federated Query
    • Allow to run SQL queries across data stored on AWS or On-Premise


AWS Redshift

  • Based on PostgreSQL, but it’s OLAP (Online analytical processing)
  • Handles BI data, used together with AWS QuickSight or Tableau
    • Petabyte scale - Redshift (important)
  • If a question involves Redshift together with S3, it is Redshift Spectrum (important)

  • Snapshots & DR
    • Redshift has “Multi-AZ” mode for clusters
    • Enable Automated Snapshots, copy the Redshift cluster to another AWS region

  • Load data into Redshift
    • Kinesis Data Firehose, S3 with Enhanced VPC Routing (note), EC2 Instance

  • Redshift Spectrum (related to S3)
    • Query data that is already in S3 without loading it (important)


AWS OpenSearch

  • With OpenSearch, you can search any field, even partial matches
    • Formerly called Elasticsearch (built for search)
  • OpenSearch can process and search Logs in real time


AWS EMR

  • EMR: Elastic MapReduce (for big data processing)
    • EMR helps to create Hadoop clusters (Big Data)
    • Anything with Hadoop or Apache Spark points to EMR

  • Node types & purchasing
    • Node: Master, Core, Task
    • Purchasing options: On-Demand, Reserved, Spot Instance


AWS QuickSight

  • Serverless machine learning business intelligence service to create dashboard
    • In short, it uses ML to analyze business data and build Dashboards
  • In-memory computation using SPICE engine

  • Dashboard & Analysis
    • Share analysis or dashboard with Users or Groups (data can be shared selectively)


AWS Glue

  • Managed extract, transform and load (ETL) service (Serverless)
    • For customers to prepare and load their data for analytics
    • Anything about ETL is Glue, which uses Apache Spark
    • Use AWS Glue to process the raw data in Amazon S3 (important)

AWS Glue ETL jobs can use Amazon S3, data stores in a VPC, or on-premises JDBC data stores as a source. AWS Glue jobs extract data, transform it, and load the resulting data back to S3, data stores in a VPC, or on-premises JDBC data stores as a target.

  • Convert data into Apache Parquet format (data format conversion, odd but it is tested)
    • Seeing Apache Parquet should bring Glue to mind

  • Glue Job Bookmarks: Prevent re-processing old data (important)
  • Glue Elastic Views: Combine and replicate data across multiple data stores with SQL
  • Glue DataBrew: Clean and normalize data

AWS Lake Formation

  • Managed service to set up a Data Lake
    • Central place to have all data for analytics purpose
    • Note, if a question mentions Fine-grained Access Control for applications, think of Lake Formation


Kinesis Data Analytics

  • Real-time analytics on Kinesis Data Streams & Firehose using SQL (important)
    • Fully managed, auto-scaling, serverless
    • Runs SQL analytics against Kinesis Data Streams & Firehose

  • Amazon Managed Service for Apache Flink


AWS MSK

  • Managed Apache Kafka on AWS (Have Serverless)
    • An alternative to Kinesis (also handles Stream data), but for Apache Kafka

  • Kinesis Data Streams (Chapter 15) vs AWS MSK


20. Machine Learning

AWS Rekognition

  • Find object, people, text, scenes in images and videos using ML (object recognition)
  • Facial analysis and facial search
  • Content Moderation: Detect content that is inappropriate (important)


AWS Transcribe

  • Convert speech to text
  • Automatically remove Personally Identifiable Information (PII, important)


AWS Polly

  • Convert text to speech
  • Pronunciation Lexicons: Customize the pronunciation of words
  • Speech Synthesis Markup Language (SSML): Emphasize words (eg. whispering)


AWS Translate

  • Language translation


AWS Lex & Connect

  • AWS Lex: Build chatbot (anything chatbot-related is Lex)
  • AWS Connect: Build cloud contact center


AWS Comprehend

  • Natural Language Processing (NLP)
  • Comprehend Medical: for clinical text (medical, very important)


AWS SageMaker

  • Build ML models


AWS Forecast

  • Use ML to deliver highly accurate forecasts


AWS Kendra

  • Managed document search service


AWS Personalize

  • ML with real-time personalized recommendations


AWS Textract

  • ML to extract text (important)


21. AWS Monitoring & Audit

CloudWatch Metrics

  • CloudWatch Metrics helps to monitor every service in AWS
  • CloudWatch Metrics can be combined with CloudTrail logs to watch for anomalies

  • CloudWatch Metric Streams
    • Stream CloudWatch Metrics with near-real-time delivery (note)


CloudWatch Logs

  • A place to store application logs in AWS (S3, Kinesis, Lambda)
    • Logs are encrypted by default

  • CloudWatch Logs Insights
    • Search and analyze log data stored in CloudWatch Logs

  • CloudWatch Logs Subscriptions (can Export to S3)
    • Get real-time log events from CloudWatch Logs for processing and analysis

You can configure a CloudWatch Logs log group to stream data it receives to your Amazon OpenSearch Service cluster in NEAR REAL-TIME through a CloudWatch Logs subscription


CloudWatch Agent

  • Can send EC2 logs to CloudWatch (By default, EC2 does not send logs, important)
  • CloudWatch Logs Agent: Sends logs to CloudWatch
  • CloudWatch Unified Agent: Can send more information, eg. CPU, RAM, etc.


CloudWatch Alarms

  • CloudWatch Alarms are used to trigger notifications for any metric (notify via SNS)
    • eg. an EC2 Health check fails
    • Target: EC2, ASG, SNS


  • Composite Alarms (watch the state of multiple Alarms together)
    • Composite Alarms monitor the states of multiple other alarms


AWS EventBridge

  • Schedule CRON jobs (Jobs on a repeating schedule, event-driven)
  • React to events from SaaS application (AWS services)
    • If a 3rd party application is mentioned, consider EventBridge

  • Nearly every event goes through EventBridge

  • Event Bus (Replay archived events, note)
    • There is a Default Event Bus and Partner Event Bus (eg. Datadog)
    • Can Archive Events and Replay them (Good for debug)

  • Schema Registry
    • Analyze events in Event Bus and infer schema

  • Resource-based Policy
    • Manage permissions (allow / deny) for a specific Event Bus


CloudWatch Insights

  • CloudWatch Insights here covers several different Insights (4 types)
    • Container, Lambda, Contributor, Application
  • CloudWatch Container Insights (for containers)
    • Collect, aggregate, summarize metrics and logs from container
    • ECS, EKS, Kubernetes, Fargate, etc.

  • CloudWatch Lambda Insights
    • Monitoring solutions for serverless application running on Lambda

  • CloudWatch Contributor Insights (note, can look up IPs)
    • Analyze log data and display contributor data (for system performance)

  • CloudWatch Application Insights
    • Provide dashboard to show the potential application issue related to AWS service


AWS CloudTrail

  • Provide governance, compliance and audit for AWS account (audits events and API calls)
    • eg. History of events or API calls
  • CloudTrail is a Global Service, if something was deleted by mistake, check CloudTrail first

  • CloudTrail Events
    • Management Events: Performed on resources in AWS account
    • Data Events: S3 object-level activity (GetObject, DeleteObject)
    • CloudTrail Insights Events: Detect unusual activity (security)

  • CloudTrail Insights (important)
    • CloudTrail Insights to detect unusual activity in account

  • CloudTrail Event Retention
    • Events are stored for 90 days, to keep them longer, log them to S3


AWS Config

  • AWS Config provides a detailed view of the configuration of AWS resources (important)
    • Record configurations and changes over time (tracks whether config was modified)
  • Config can check whether an ACM certificate is close to expiring

  • Config Rules: Custom Rules can be defined for checks (AWS managed rules also exist, eg. the two below)

  • Config Remediations
    • Automate remediation of non-compliant resource using SSM Automation Documents (eg. unrestricted SSH access)

  • Config Notifications
    • Use EventBridge to trigger notifications when AWS resources are noncompliant
    • ie. Notifications can be Triggered whenever a non-compliant Resource appears


22. IAM Advanced

AWS Organizations

  • Manage multiple AWS accounts, global service
  • Consolidated Billing across all accounts (important)
    • If a question is about unexpected Shield Advanced costs, Consolidated Billing was not set up
  • Shared reserved instance and Savings Plans discounts across accounts (important)
  • To move an Account to another Organization, first remove it from the original Organization

  • Service Control Policies (SCP, restrict users)
    • IAM policies applied to OU (Organization Unit) or Accounts to restrict Users and Roles (Blocklist and Allowlist)
    • IAM Roles are aimed at Services, SCPs at the Organization

  • IAM Permission Boundary (see Chapter 2)

    • A permissions boundary is an advanced feature for using a managed policy to set the maximum permissions that an identity-based policy can grant to an IAM entity.
  • Service Control Policies (SCPs)

    • If a user or role has an IAM permission policy that grants access to an action that is either not allowed or explicitly denied by the applicable service control policy (SCP), the user or role can’t perform that action
    • Service control policy (SCP) affects all users and roles in the member accounts, including root user of the member accounts
    • Service control policy (SCP) does not affect service-linked role

IAM Conditions

  • Distinguish bucket level permissions from object level permissions (/*)
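
To make the bucket level vs object level distinction concrete, here is a sketch of an S3 policy built as a Python dictionary. Bucket-level actions such as `s3:ListBucket` target the bucket ARN itself, while object-level actions such as `s3:GetObject` target the ARN followed by `/*`. The bucket name `example-bucket` is a placeholder.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Bucket level: the Resource is the bucket itself
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            # Object level: the Resource ends with /*
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
```

Putting `s3:GetObject` on the bucket ARN without `/*` is a common reason an otherwise valid-looking policy grants nothing.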


IAM Roles vs Resource Based Policies

  • Role: Give up original permissions and take the permissions assigned to the Role
  • Resource Based Policy: The principal doesn’t have to give up permissions

  • EventBridge example
    • Resource based policy: Lambda, SNS, SQS, etc.
    • IAM Roles: Kinesis stream, ECS, etc.


Policy Evaluation Logic

  • IAM Permission Boundaries
    • A feature to set the maximum permissions an IAM entity can get

  • Answer: No, No, No (an explicit Deny always wins regardless of order, so an Allow elsewhere has no effect)
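
A simplified sketch of this evaluation order for a single action: an explicit Deny wins, then any SCP and permissions boundary must permit the action, then an Allow is required, otherwise the request is implicitly denied. It deliberately ignores wildcards, conditions, resource-based policies and session policies, and `evaluate` is a hypothetical helper, not an AWS API.

```python
def evaluate(action, identity_statements, boundary_actions=None, scp_actions=None):
    """Simplified decision for one action inside one account.

    identity_statements: list of (effect, action) tuples in any order.
    boundary_actions / scp_actions: optional sets of actions permitted by the
    permissions boundary / SCP; None means no such policy applies.
    """
    effects = {effect for effect, stmt_action in identity_statements if stmt_action == action}
    if "Deny" in effects:
        return "Deny (explicit)"
    if scp_actions is not None and action not in scp_actions:
        return "Deny (implicit, SCP)"
    if boundary_actions is not None and action not in boundary_actions:
        return "Deny (implicit, boundary)"
    if "Allow" in effects:
        return "Allow"
    return "Deny (implicit)"
```

The order of statements does not matter: `[("Allow", "s3:GetObject"), ("Deny", "s3:GetObject")]` and the reversed list both produce an explicit deny.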


IAM Identity Center

  • One login for all (SSO)
    • AWS accounts in AWS Organizations
    • Business cloud applications (Salesforce)

  • Fine-grained Permissions and Assignments
    • Multi-Account Permissions: Manage access across AWS accounts
    • Application Assignments: SSO access to business applications
    • Attribute-Based Access Control (ABAC): Permission based on user’s attribute (tags)


AWS Directory Services

  • Create Active Directory (AD) in AWS (important)
    • AWS Managed Microsoft AD: Create own AD in AWS
    • AD Connector: Proxy to redirect to On-Premise AD
    • Simple AD: AD-compatible managed directory on AWS


AWS Control Tower

  • Set up and govern a secure and compliant multi-account AWS environment (important)
    • Manages a multi-account AWS environment following Best Practice
  • Control Tower uses AWS Organizations to create accounts

  • Control Tower - Guardrails (govern the Control Tower environment)
    • Preventive Guardrail: Use SCPs (Restrict Regions across all accounts)
    • Detective Guardrail: Use AWS Config (identify untagged resources)


23. AWS Security & Encryption

AWS KMS

  • Anytime you hear “encryption” for an AWS service, it’s most likely KMS
  • AWS managed encryption keys for us (Integrated with most AWS services)
    • eg. EBS, S3, RDS, SSM (AWS managed keys)
  • KMS Keys are scoped per Region
  • Note, KMS is not suited to storing secrets, and encrypted data is not necessarily a secret

  • Types of KMS Keys (3 types): AWS Owned, AWS Managed, CMK (Customer Managed)
  • Two forms of Key: Symmetric (Single) & Asymmetric (Public & Private)
  • Automatic Key Rotation: 1 year

  • KMS Key Policies (Control access to KMS keys)
    • Default KMS Key Policy: Entire AWS account
    • Custom KMS Key Policy: Define who can access the key (Cross Account Access)

Deleting an AWS KMS key in AWS Key Management Service (AWS KMS) is destructive and potentially dangerous. Therefore, AWS KMS enforces a waiting period. (Pending state)


KMS Multi-Region Keys

  • Identical KMS keys in different AWS Regions (the same KMS Key can be used in other Regions)
    • Encrypt in one Region and decrypt in other Regions
  • Use cases: DynamoDB Global Tables & Global Aurora


S3 Replication with Encryption

  • Unencrypted objects and objects encrypted with SSE-S3 are replicated by default
  • For objects encrypted with SSE-KMS, you need more options
    • ie. SSE-S3 encryption is preserved during replication, but SSE-KMS needs extra configuration


Encrypted AMI Sharing Process

  • Requires Launch Permission, sharing the KMS Keys, and Permission to decrypt
    • The key takeaway is that an AMI can also be Encrypted and Shared


SSM Parameter Store

  • Secure storage for configuration and secrets
    • Broader use than Secrets Manager, eg. URLs, AMI IDs, License keys, etc.
  • Have built-in version tracking (every edit to a secret is recorded)
  • SSM Parameter Store has no Automatic key rotation (important)


AWS Secrets Manager

  • Store secrets, integrated with RDS & Aurora (very important, stores database secrets)
    • Meant for confidential information (like database credentials, API keys)
    • Unlike SSM Parameter Store, Secrets Manager supports Key rotation (90 days)
    • Note, Secrets Manager Key Rotation is 90 days, KMS Key Rotation is one year
  • Compared with KMS, Secrets Manager is better suited to storing secrets, eg. database credentials, and Secrets Manager also has Automatic key rotation

  • Multi-Region Secrets (similar to Multi-Region Keys)
    • Replicate Secrets across multiple AWS Regions (disaster recovery)


AWS Certificate Manager (ACM)

  • Easily provision, manage, and deploy TLS Certificates (HTTPS)
    • Putting HTTPS on EB requires ACM
    • With a third party SSL certificate, automatic certificate rotation is not available

  • EventBridge can be used to check whether ACM Certificates are expiring (Invoke SNS on expiry)


AWS WAF

  • WAF: Web Application Firewall
  • Protect web application in Layer 7 (HTTP / HTTPS)
    • Layer 4 is TCP / UDP, Layer 3 covers things like VPC networking
  • To block countries, use WAF Geo Match or a WAF IP Set Statement (important)
  • WAF supports rate-based rules, Shield does not
  • To use WAF across different accounts or regions, consider Firewall Manager (important)

  • Web ACL: Decides who can Access based on IP, HTTP Headers, Size
    • ie. WAF can block access from certain countries

If you want to use AWS WAF across accounts, accelerate WAF configuration, automate the protection of new resources, use Firewall Manager with AWS WAF


AWS Shield

  • Protect from DDoS attack (WAF can help too, but on the exam choose Shield for DDoS)
  • Shield Standard (free) and Shield Advanced (paid)


AWS Firewall Manager

  • Manage security rules in all accounts of an AWS Organization
    • eg. WAF rules, AWS Shield Advanced, Security Group for EC2 & VPC, etc.
  • Rules are applied to new resources as they are created
  • To use WAF across different accounts or regions, consider Firewall Manager (important)

  • WAF vs Firewall Manager vs Shield
    • Used together for comprehensive protection (WAF + Firewall Manager + Shield)
    • For everyday protection, use WAF
    • For DDoS attacks, consider Shield


AWS GuardDuty

  • Intelligent threat discovery (ML) to protect AWS account (uses ML to guard against cryptocurrency attacks)
    • Input data: CloudTrail logs, VPC Flow Logs, DNS logs (important)
  • Use case: CryptoCurrency attacks


AWS Inspector

  • Automated Security Assessments
    • EC2 Instance & Container Images (ECR) & Lambda Functions


AWS Macie

  • Use ML to protect sensitive data (PII) in AWS


24. Networking VPC

  • CIDR = Base IP + Subnet Mask
    • Example: CIDR 10.0.4.0/28, /28 leaves 32 - 28 = 4 host bits, so 2^4 = 16 IPs; only the last octet varies, giving 10.0.4.0 to 10.0.4.15 (0-15 is 16 addresses)
  • CIDR should not overlap, max CIDR size in AWS is /16, min CIDR size in AWS is /28
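
The CIDR arithmetic can be checked with Python's standard `ipaddress` module. This is a small sketch; the `overlaps` helper is a hypothetical convenience wrapper for the "CIDR should not overlap" rule.

```python
import ipaddress

network = ipaddress.ip_network("10.0.4.0/28")

total_addresses = network.num_addresses   # 2 ** (32 - 28)
first_ip = network[0]                     # lowest address in the block
last_ip = network[-1]                     # highest address in the block


def overlaps(cidr_a, cidr_b):
    """True when the two CIDR blocks share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))
```

For example, `overlaps("10.0.0.0/16", "10.0.4.0/28")` is true because the /28 sits inside the /16, so those two blocks could not be used for two VPCs that need to be peered.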


AWS VPC

  • VPC: Virtual Private Cloud
    • All new AWS accounts have a default VPC
  • CIDR should not overlap, max CIDR size in AWS is /16, min CIDR size in AWS is /28
  • For a VPC to use a custom domain, enableDnsHostnames and enableDnsSupport are required
  • A Shared Service VPC can be created so that every VPC can access the services it needs (important)

  • Each Amazon EC2 instance that you launch into a VPC has a tenancy attribute (important)
    • Tenancy can be switched between dedicated and host


VPC Subnet

  • Can have Public Subnet and Private Subnet
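  • AWS reserves 5 IP addresses in each subnet (first 4 & last 1, important)
    • The first four are counted from 0, ie. 0 - 3
    • eg. to host 28 instances, 28 + 5 = 33 exceeds the 32 addresses of a /27, so a /26 with 64 addresses is needed
  • Subnet is always associated with Route Table (important)
    • Subnet questions usually come back to the Route Table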
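
The reserved-address rule can be turned into a quick sizing calculation. The sketch below assumes IPv4 subnets between /28 and /16, as stated in these notes; `usable_ips` and `smallest_prefix_for` are hypothetical helper names.

```python
AWS_RESERVED = 5  # first 4 addresses and the last 1 in every subnet


def usable_ips(prefix_length):
    """Addresses left for resources in a subnet of the given prefix length."""
    return 2 ** (32 - prefix_length) - AWS_RESERVED


def smallest_prefix_for(required_ips):
    """Smallest subnet (largest prefix length) from /28 to /16 that fits."""
    for prefix_length in range(28, 15, -1):
        if usable_ips(prefix_length) >= required_ips:
            return prefix_length
    raise ValueError("does not fit in a /16")
```

A /27 has 32 addresses but only 27 usable ones, so 28 instances need a /26, which has 64 addresses and 59 usable.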


Internet Gateway (IGW) & Route Table

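  • IGW allows resources (eg. EC2 Instance) in a VPC to connect to the internet
    • ie. lets resources in the VPC get online
    • IGW needs a Route Table (important)
  • If the IGW has problems
    • Need to also have Route Table (check the Route Table first, since the IGW depends on it)
    • Check whether the Security Group allows the traffic
  • The Internet Gateway also performs Network Address Translation (for instances with a public IPv4 address)
  • An Internet Gateway cannot be used directly from a private subnet (important)
  • NAT works at the Subnet level, IGW at the VPC level (important)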



Bastion Hosts

  • Use Bastion Host to SSH into private EC2 instance
  • Bastion Host is in the public subnet and is connected to the private subnet
    • In short, from the Public Internet into the Private Subnet
  • Bastion Host security group must allow inbound access on port 22, restricted to trusted source IPs

Create a public Network Load Balancer that links to Amazon EC2 instances that are bastion hosts managed by an Auto Scaling Group


NAT Instances (Outdated)

  • NAT: Network Address Translation
  • Allow EC2 Instance in Private Subnet to connect to the Internet
    • NAT Gateway is now used instead (NAT Gateway is the preferred solution now)

  • NAT instance can be used as a bastion server
  • Security Groups can be associated with a NAT instance
  • NAT instance supports port forwarding

NAT Gateway (NATGW)

  • AWS managed NAT, AZ specific, use Elastic IP (for IPv4, IPv6 uses an Egress-only Internet Gateway)
    • Allow EC2 Instance in Private Subnet to connect to the Internet
    • Requires an IGW, the NATGW sits in a public subnet (note)
  • NAT works at the Subnet level, IGW at the VPC level

  • Resilient within single AZ, Multiple-AZ need multiple NATGW
    • One NATGW per AZ, for fault-tolerance
  • Highly available within AZ (create in another AZ)


NACL & Security Groups

  • The request has to pass the NACL before reaching the Subnet (Subnet level)
    • NACL is stateless (inbound and outbound are both evaluated)
  • The request has to pass the Security Group before reaching the EC2 Instance (Instance level)
    • Security Group (SG) is stateful (inbound accepted = outbound accepted)
  • Remember, NACL is Subnet level (stateless), Security Group is Instance level (stateful)

  • Network Access Control List (NACL)
    • NACL control traffic from and to subnets (eg. block IP)
    • One NACL per subnet, each subnet has a default NACL (note)
      • Default NACL accept every inbound and outbound
    • NACL Rules have number, higher precedence with lower number (the lower the number, the higher the priority)
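
The rule-number precedence can be sketched in a few lines: rules are checked in ascending number order and the first match decides, with everything unmatched denied by the final catch-all rule. This is a simplified model that only matches on source IP (real rules also match protocol and port), and the rule values are made up for illustration.

```python
import ipaddress


def evaluate_nacl(rules, source_ip):
    """Return 'ALLOW' or 'DENY'; the lowest-numbered matching rule wins."""
    address = ipaddress.ip_address(source_ip)
    for number, cidr, action in sorted(rules):
        if address in ipaddress.ip_network(cidr):
            return action
    return "DENY"  # the final catch-all '*' rule denies anything unmatched


# Hypothetical inbound rules: (rule number, source CIDR, action)
rules = [
    (200, "0.0.0.0/0", "ALLOW"),
    (100, "203.0.113.0/24", "DENY"),   # evaluated first because 100 < 200
]
```

Here an address in 203.0.113.0/24 is denied by rule 100 even though rule 200 would allow it, which is how a NACL is used to block a specific IP range.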


VPC Peering & Sharing

  • VPC Peering: Privately connect two VPCs using the AWS network (only suitable for a small number of VPCs, important)
    • Every pair of VPCs needs its own peering connection (peering is not transitive), similar to S3 Replication
  • Can create VPC Peering between VPCs in different AWS accounts / regions
  • Need to update the route tables of each VPC's subnets so that they can communicate
    • If something goes wrong, check the route tables
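
To see why peering only suits a small number of VPCs, here is a small sketch that counts the connections a full mesh needs when every pair of VPCs gets its own peering connection. The function name is made up for this example.

```python
def peering_connections(vpc_count: int) -> int:
    """Full mesh: every unordered pair of VPCs needs its own peering connection."""
    return vpc_count * (vpc_count - 1) // 2


for n in (2, 3, 10, 50):
    print(n, "VPCs ->", peering_connections(n), "peering connections")
```

The count grows quadratically, which is why a hub such as Transit Gateway is usually the better fit once there are many VPCs.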

  • VPC Sharing: Allows multiple AWS accounts to create resources (e.g. EC2, RDS) in a shared, centrally managed AWS VPC (important)
    • "Centrally managed" in a question points to VPC Sharing
    • The owner account shares one or more subnets (note)

  • VPC Endpoints allow private access to AWS services within a VPC (important)
    • Connects to AWS services over the private network; unrelated to On-Premise
    • Interface Endpoints support most AWS services (paid)
    • Gateway Endpoints support only S3 and DynamoDB (free, frequently tested)
  • Note: for S3 and DynamoDB, pick the VPC Gateway Endpoint (not NAT or IGW)

  • Two types of Endpoints (Interface / Gateway)
    • Interface Endpoints: Support most AWS services (paid)
    • Gateway Endpoints: Must be used as a target in a route table (free, important)
      • Only support S3 and DynamoDB


VPC Flow Logs

  • Capture information about IP traffic going to and from network interfaces
    • Monitor & troubleshoot connectivity issues


Site-to-Site VPN (VPN Connection)

  • Site-to-Site VPN: Connect AWS to a corporate data center over the public internet (key point: public internet, On-Premise to AWS)
    • Needs a Virtual Private Gateway (VGW) on the VPC (AWS side)
    • Needs a Customer Gateway (CGW) on the DC (On-Premise side)
  • Compared with DX, Site-to-Site VPN does not provide a low-latency, high-throughput connection
  • Site-to-Site VPN provides encrypted network connectivity between On-Premise and the VPC

  • Both a VPN Gateway and a Customer Gateway are required to connect the VPC and On-Premise


VPN CloudHub

  • Provide secure communication between multiple VPN connections (Site-to-Site VPN)
    • Use it when you need to manage multiple Site-to-Site VPNs
    • Can serve as a backup connection over the public internet for failover

If you have multiple AWS Site-to-Site VPN connections, you can provide secure communication between sites using the AWS VPN CloudHub.


Direct Connect (DX)

  • Provide a dedicated private connection from a remote network to the VPC (key point: private network, On-Premise to AWS)
    • Needs a VGW; if DX needs a backup, choose Site-to-Site VPN (the two services are similar)
    • Compared with Site-to-Site VPN, DX provides a low-latency, high-throughput connection
    • DX does not encrypt traffic by itself, while Site-to-Site VPN does; to encrypt a DX connection, use it together with AWS VPN (note)
    • DX takes a long time to set up (so DX is not a quick solution)
  • All private, no public network involved (note: "all private" in a question points to DX)
    • Direct Connect does not involve the Internet
  • Can access public resources (S3) and private resources (EC2)

  • Two connection types (bandwidth options, Dedicated is the fastest)
    • Dedicated Connections: 1 Gbps to 100 Gbps
    • Hosted Connections: 50 Mbps to 10 Gbps

  • Direct Connect Gateway (connects many VPCs in different Regions)
    • Direct Connect to one or more VPCs in many different Regions

  • PrivateLink vs DX

    • AWS PrivateLink provides a connection between VPCs (Virtual Private Clouds) and AWS services while bypassing the public Internet. It is a private network connection that securely transfers data without leaving the AWS network

    • AWS Direct Connect is a dedicated, private connection between the customer’s on-premises infrastructure at a data center and an AWS location. The main features of the connection are ultra-fast data transfer rates, low latency, and improved security since it bypasses the public Internet


Transit Gateway

  • Provides transitive peering between thousands of VPCs, On-Premise networks, etc. (important)
    • Regional resource, can work cross-Region (a Transit Gateway connects a large number of VPCs)
    • Can use ECMP to maximize VPN throughput (important)
  • Supports IP Multicast, which sends IP packets to a group of interested receivers in a single transmission
  • Questions about a star network or hub-and-spoke point to Transit Gateway

  • Site-to-Site VPN ECMP (important)
    • ECMP = Equal-cost multi-path routing (increases bandwidth)
    • A routing strategy that forwards packets to a single destination over multiple best paths of equal routing priority
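
A rough sketch of the ECMP idea: each flow is hashed onto one of several equal-cost paths, so different flows share the combined bandwidth while a single flow stays on one path. The tunnel names, the 5-tuple layout, and the use of SHA-256 are assumptions for illustration; real routers use their own hash functions.

```python
import hashlib


def pick_path(flow, paths):
    """Map a flow 5-tuple to one of the equal-cost paths, deterministically."""
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    return paths[int.from_bytes(digest[:4], "big") % len(paths)]


tunnels = ["tunnel-1", "tunnel-2", "tunnel-3", "tunnel-4"]  # hypothetical names
flow = ("10.0.1.5", 44321, "192.168.0.9", 443, "tcp")

# The same flow always lands on the same tunnel, so its packets stay in order.
same_path = pick_path(flow, tunnels) == pick_path(flow, tunnels)

# Many flows (here: different source ports) spread across the tunnels.
used = {
    pick_path(("10.0.1.5", port, "192.168.0.9", 443, "tcp"), tunnels)
    for port in range(40000, 40200)
}
print(same_path, sorted(used))
```

Because one flow never spans tunnels, ECMP raises aggregate throughput across many flows rather than the speed of any single flow.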


VPC Traffic Mirroring

  • Capture and inspect network traffic in a VPC
  • Use cases: content inspection, threat monitoring, etc.


IPv6 in VPC

  • IPv4 cannot be disabled for VPC and subnets
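  • E.g. an EC2 instance gets at least one private IPv4 address and a public IPv6 address
    • So if an EC2 instance cannot be launched in the subnet, the likely cause is that no IPv4 addresses are left
    • Solution: create a new IPv4 CIDR in the subnet (note)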
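
To judge how quickly a subnet can run out of IPv4 addresses, the sketch below counts the usable addresses in a CIDR block with Python's standard `ipaddress` module. It relies on AWS reserving 5 addresses in every subnet (the first four and the last one); the CIDR blocks are arbitrary examples.

```python
from ipaddress import ip_network

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one


def usable_ipv4(cidr: str) -> int:
    """Addresses left for instances after the AWS-reserved ones."""
    return ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


print(usable_ipv4("10.0.0.0/28"))  # 16 - 5 = 11
print(usable_ipv4("10.0.1.0/24"))  # 256 - 5 = 251
```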


Egress-only Internet Gateway

  • Used for IPv6 only (similar to NAT Gateway, but NATGW is for IPv4 only)
    • Only allows outbound connections from the VPC to the Internet, not inbound from the Internet to the VPC
    • The route table also needs to be updated


AWS Network Firewall

  • Protects the entire AWS VPC (effectively the VPC's firewall, be sure to remember this)
    • Layer 3 to Layer 7 protection (full coverage, important)

  • Traffic filtering: allow, drop, or alert on traffic that matches the rules

  • AWS Network Firewall is a managed firewall service that provides filtering for both inbound and outbound network traffic. It allows you to create rules for traffic inspection and filtering, which can help protect your production VPC


25. Disaster Recovery & Migrations

Disaster Recovery

  • RPO: Recovery Point Objective (how much recent data you can afford to lose)
  • RTO: Recovery Time Objective (how long the app can be down)
  • For DR questions, think of Multi-Region solutions (note)

  • Disaster Recovery Strategies
    • Backup and Restore: High RPO & RTO, back up On-Premise data to the Cloud
    • Pilot Light: Part of the app runs in the Cloud (only the critical infrastructure)
      • Note: a DR (Disaster Recovery) question that mentions "minimum" points to Pilot Light
    • Warm Standby: A full but scaled-down copy of the app runs in the Cloud (scales up quickly)
      • Note: a DR (Disaster Recovery) question that mentions "scaled-down" points to Warm Standby
    • Multi-Site: Very low RTO & RPO, the full app runs in the Cloud (most expensive)
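
A small worked example of the two objectives: data loss is measured back to the last backup (compare with RPO), and downtime is measured from the failure to the restore (compare with RTO). All timestamps and targets below are hypothetical values chosen for the illustration.

```python
from datetime import datetime, timedelta

# Hypothetical incident timeline
last_backup = datetime(2024, 1, 1, 2, 0)
failure = datetime(2024, 1, 1, 9, 30)
restored = datetime(2024, 1, 1, 11, 0)

# Hypothetical targets
rpo = timedelta(hours=4)  # at most 4 hours of data may be lost
rto = timedelta(hours=2)  # at most 2 hours of downtime

data_loss = failure - last_backup  # 7 h 30 min of writes are gone
downtime = restored - failure      # 1 h 30 min offline

meets_rpo = data_loss <= rpo
meets_rto = downtime <= rto
print(data_loss, meets_rpo)
print(downtime, meets_rto)
```

Here the RTO is met but the RPO is not, which suggests more frequent backups or a replication-based strategy such as Pilot Light or Warm Standby.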


Database Migration Service (DMS)

  • Migrate databases to AWS (very important)
  • Supports homogeneous and heterogeneous migrations
    • Homogeneous means the same database engine; heterogeneous means different engines
  • DMS runs on a replication instance (an EC2 instance) that you must create to perform the migration

  • AWS Schema Conversion Tool (SCT, important)
    • Converts a database schema from one engine to another (needed when the source and target engines differ)
    • If the DB engine is the same, SCT is not needed (e.g. PostgreSQL to AWS RDS PostgreSQL)

  • DMS Multi-AZ Deployment
    • DMS provisions and maintains a synchronous standby replica in a different AZ
    • In effect, the migration has a standby in another AZ while it runs
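
The SCT rule of thumb above can be written as a tiny decision helper. This is only a study aid under simplifying assumptions: engines are compared by plain name, and the function and step labels are made up for this sketch, not part of any AWS SDK.

```python
from typing import List


def migration_steps(source_engine: str, target_engine: str) -> List[str]:
    """List the tools needed; SCT is only added when the engines differ."""
    steps = []
    if source_engine.lower() != target_engine.lower():
        steps.append("SCT: convert schema")  # heterogeneous migration
    steps.append("DMS: migrate data")        # always needed
    return steps


print(migration_steps("PostgreSQL", "postgresql"))  # homogeneous
print(migration_steps("Oracle", "PostgreSQL"))      # heterogeneous
```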


RDS & Aurora Migration

  • RDS MySQL to Aurora MySQL (the same applies to PostgreSQL)
    • Op1. DB Snapshots from RDS MySQL restored as Aurora MySQL
    • Op2. Create Aurora Read Replica from RDS MySQL
  • External MySQL to Aurora MySQL
    • Op1. Create backup file in S3, and create Aurora MySQL from S3


On-Premise Strategy with AWS

  • AWS Application Discovery Service (DS)
  • AWS Database Migration Service (DMS)
  • AWS Server Migration Service (SMS)
  • VM Import & Export (all of the above work against On-Premise)


AWS Backup

  • Fully managed service to automate backups across AWS services (important)
    • Supports EC2, EBS, S3, RDS, EFS, etc.
  • Supports Cross-Region backups and Cross-Account backups

  • AWS Backup Vault Lock
    • Enforces a WORM (Write Once Read Many) state for all backups (backups cannot be deleted)


Application Migration Service (MGN)

  • AWS Application Discovery Service (the planning step before MGN)
    • Plan migration projects by gathering information about On-Premise data centers
    • Offers Agentless Discovery and Agent-based Discovery
  • Results are viewed in AWS Migration Hub

  • AWS Application Migration Service (MGN)
    • Simplifies migrating applications to AWS


VMware Cloud on AWS

  • Use VMware Cloud to manage and extend On-Premise Data Center


26. More SAA

Event Processing

  • Fan Out Pattern: Put SNS in the middle and subscribe multiple SQS queues to it (important)

  • S3 Event Notifications: Trigger Lambda or SNS when an S3 event happens (see Chapter 11)
    • If the question requires setting up a DLQ (Dead Letter Queue), choose Lambda
    • A DLQ holds messages that could not be processed (consumed) successfully
  • Can also be combined with EventBridge, which offers advanced filtering
    • EventBridge can be used with CloudTrail to intercept API calls
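
The fan-out pattern can be modelled in memory: one topic, several queues, and every published message delivered to each queue. The `Topic` class and the queue names are stand-ins for SNS and SQS invented for this sketch; no AWS calls are made.

```python
from collections import deque


class Topic:
    """In-memory stand-in for an SNS topic."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        # Each subscribed queue receives its own copy of the message.
        for queue in self.subscribers:
            queue.append(dict(message))


orders = Topic()
billing_queue, shipping_queue = deque(), deque()  # stand-ins for SQS queues
orders.subscribe(billing_queue)
orders.subscribe(shipping_queue)

orders.publish({"order_id": 1})
print(len(billing_queue), len(shipping_queue))
```

The publisher sends once and does not need to know how many consumers exist, which is the point of the pattern.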


Caching Strategies

  • Cache: The closer to the user, the faster, but data may be outdated, so a TTL (Time to Live) is needed
  • API Gateway: Regional caching (because API Gateway is Regional)
  • App Logic: Avoid hitting the database too often
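
A minimal sketch of TTL-based caching in application logic: a value is served from the cache until it expires, then loaded again from the backing store. The `TTLCache` class, the fake clock, and `load_from_db` are illustrative names, not an AWS or library API.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # key -> (value, expires_at)

    def get(self, key, loader):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]      # fresh hit: no database call
        value = loader(key)      # miss or expired: reload
        self.store[key] = (value, now + self.ttl)
        return value


fake_now = [0.0]  # controllable clock so the example runs instantly
db_calls = []


def load_from_db(key):
    db_calls.append(key)
    return "row-for-" + key


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("user:1", load_from_db)  # miss, loads from the database
cache.get("user:1", load_from_db)  # hit, served from cache
fake_now[0] = 61.0
cache.get("user:1", load_from_db)  # expired, loads again
print(len(db_calls))  # 2
```

A shorter TTL gives fresher data at the cost of more database calls; a longer TTL does the opposite.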


Block IP Address

  • NACL: Deny the client IP
  • WAF: IP address filtering
  • In short, either NACL or WAF can block an IP (pick one; WAF is usually the choice)


HPC in AWS

  • Cloud is the perfect place to perform HPC (High Performance Computing)
  • Data Management & Transfer
    • AWS Direct Connect, Snowball, AWS DataSync
  • HPC should bring to mind EFA, Cluster Placement Group, FSx for Lustre

  • Compute & Networking
    • EC2 Instance + EC2 Placement Group (Cluster, because the network is fast)

  • Compute & Networking (Cont.)
    • EC2 Enhanced Networking (SR-IOV): Requires Elastic Network Adapter (ENA)
    • Elastic Fabric Adapter (EFA): Improved ENA for HPC, only works on Linux

  • Storage
    • Instance-attached storage: EBS, Instance Store
    • Network storage: S3, EFS, FSx

  • Automation & Orchestration
    • AWS Batch: Run multi-node parallel jobs across multiple EC2 instances
    • AWS ParallelCluster: Deploy HPC on AWS, can enable EFA on cluster


EC2 Instance High Availability

  • Elastic IP + Standby EC2 Instance + CloudWatch + Lambda
    • On failure, fail over to the standby EC2 instance

  • Elastic IP + Auto Scaling Group (min, max)
    • On failure, a replacement EC2 instance is launched (CloudWatch is no longer needed)


27. Other AWS Services

AWS CloudFormation

  • A declarative way of outlining AWS infrastructure (a template that generates what you ask for)
    • Whatever you declare (e.g. EC2 Instance, S3), CloudFormation creates it
    • CloudFormation is IaC (Infrastructure as Code)

  • Infrastructure as Code (IaC, no need to create resources manually)
  • Good if we need to repeat an architecture in different environments
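
To make "declarative" concrete, here is a minimal sketch that builds a CloudFormation template as a Python dictionary and prints it as JSON. The logical ID `NotesBucket` and the description are made up for this example; the script only prints the template and deploys nothing.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal sketch: a single S3 bucket",
    "Resources": {
        "NotesBucket": {  # logical ID, hypothetical name
            "Type": "AWS::S3::Bucket",
        }
    },
}

# The JSON text is what would be handed to CloudFormation as the template body.
template_body = json.dumps(template, indent=2)
print(template_body)
```

The template states what should exist, not the steps to create it, so the same file can be reused across environments.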


CloudFormation StackSets

  • Use StackSets when CloudFormation must work across accounts or Regions
  • AWS CloudFormation StackSet extends the functionality of stacks by enabling you to create, update, or delete stacks across multiple accounts and regions with a single operation


CloudFormation Service Role

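  • Service Role: An IAM role that allows CloudFormation to create / update / delete stack resources
  • Lets users create / update / delete stack resources through CloudFormation without holding direct permissions on those resources (security)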


AWS SES (Simple Email Service)

  • Managed service to send email securely


AWS Pinpoint

  • Scalable 2-way (outbound / inbound) marketing communication service (important)
    • Used to send messages to customers


AWS SSM Session Manager

  • SSM Session Manager: Systems Manager Session Manager
  • Allows starting a secure shell on EC2 and On-Premise servers (a better way to access EC2 from On-Premise)
    • No SSH access needed (better security)


AWS Cost Explorer

  • Visualize & manage AWS costs and usage over time (can generate spending reports)
  • Can help choose an optimal Savings Plan
  • Money-related questions very often point to Cost Explorer (important)


AWS Compute Optimizer

  • In short, Cost Explorer + Compute Optimizer together are for saving money

AWS Compute Optimizer recommends optimal AWS Compute resources for your workloads to reduce costs and improve performance by using machine learning to analyze historical utilization metrics.


AWS Batch

  • Managed batch processing at any scale
    • Batch can dynamically launch EC2 Instances or Spot Instances
  • Batch jobs are defined as Docker images and run on ECS

  • Batch vs Lambda
    • Lambda: Time limit (15 minutes), Serverless
    • Batch: No time limit, Not Serverless
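
The Batch vs Lambda choice can be reduced to a one-line rule based on Lambda's 15-minute maximum run time. The helper below is a study aid with a made-up name; it looks only at run time and ignores other factors such as memory or packaging.

```python
LAMBDA_MAX_SECONDS = 15 * 60  # Lambda's maximum timeout


def pick_compute(expected_runtime_seconds: int) -> str:
    """Choose Lambda when the job fits its time limit, otherwise Batch."""
    if expected_runtime_seconds <= LAMBDA_MAX_SECONDS:
        return "Lambda"
    return "Batch"


print(pick_compute(30))    # short job
print(pick_compute(3600))  # one-hour job
```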


AWS AppFlow

  • Transfer data between Software-as-a-Service (SaaS) applications and AWS (important)
    • SaaS in a question should bring AppFlow to mind
    • Sources: Salesforce, Slack, etc.


AWS Amplify

  • A set of tools to develop and deploy full-stack web and mobile applications
    • Used to build full-stack projects


AWS Resource Access Manager (RAM)

  • RAM is a service that lets you easily and securely share AWS resources with any AWS account or within an AWS Organization (e.g. Transit Gateway, Subnets)
    • The main reason to choose RAM is that it is cheap


AWS Systems Manager

  • AWS Systems Manager is an AWS service that you can use to view and control your infrastructure on AWS (e.g. grouping resources such as EC2, S3, etc.)
  • Use AWS Systems Manager Run Command to run a custom command that applies the patch to all EC2 instances


28. WhitePapers

  • Well-Architected Framework: 6 Pillars to remember
  • Operational Excellence + Security + Reliability + Performance Efficiency + Cost Optimization + Sustainability
  • Maximum resilience is achieved by separate connections terminating on separate devices in more than one location

  • AWS Well-Architected Tool
    • Review architectures against 6 pillars (best practice)


AWS Trusted Advisor

  • High level AWS account assessment (AWS 账户评估)
    • Learn the best practices on cost optimization, performance, and security
  • Assessment is based on the 6 Pillars above (remember this)


29. Extra

  • DNS queries

    • To resolve any DNS queries for resources in the AWS VPC from the on-premises network, you can create an inbound endpoint on Amazon Route 53 Resolver and then DNS resolvers on the on-premises network can forward DNS queries to Amazon Route 53 Resolver via this endpoint.
    • To resolve DNS queries for any resources in the on-premises network from the AWS VPC, you can create an outbound endpoint on Amazon Route 53 Resolver and then Amazon Route 53 Resolver can conditionally forward queries to resolvers on the on-premises network via this endpoint.
  • EC2 Instance recovery from an impaired state

    • A recovered instance is identical to the original instance, including the instance ID, private IP addresses, Elastic IP addresses, and all instance metadata
    • If your instance has a public IPv4 address, it retains the public IPv4 address after recovery
  • Spot Instance & Spot Fleet

    • If a spot request is persistent, then it is opened again after your Spot Instance is interrupted
    • Spot Fleets can maintain target capacity by launching replacement instances after Spot Instances in the fleet are terminated
    • When you cancel an active spot request, it does not terminate the associated instance
  • Improve the security at the authentication level by leveraging short-lived credentials

    • Use IAM authentication from AWS Lambda to Amazon RDS PostgreSQL
    • Attach an AWS Identity and Access Management (IAM) role to AWS Lambda
  • Tasks only the root user account can perform

    • Some of the AWS tasks that only a root account user can do are as follows: change the account name, root password, or root email address; change the AWS support plan; close the AWS account; enable MFA Delete on an S3 bucket; create a CloudFront key pair; register for GovCloud.

Appendix

Please do not modify without permission. Thank you.