Step 1 - Understand the problem and establish design scope
Candidate: Is this a mobile app? Or a web app? Or both?
Interviewer: Both
Candidate: What are the important features?
Interview: A user can publish a post and see her friends’ posts on the news feed page.
Candidate: Is the news feed sorted by reverse chronological order or any particular order
such as topic scores? For instance, posts from your close friends have higher scores.
Interviewer: To keep things simple, let us assume the feed is sorted by reverse chronological
order.
Candidate: How many friends can a user have?
Interviewer: 5000
Candidate: What is the traffic volume?
Interviewer: 10 million DAU
Candidate: Can feed contain images, videos, or just text?
Interviewer: It can contain media files, including both images and videos.
Now you have gathered the requirements, we focus on designing the system.
Step 2 - Propose high-level design and get buy-in
Feed publishing
Newsfeed building
Newsfeed APIs
publishing API and news feed retrieval API.
Feed publishing API
POST /v1/me/feed
Params:
• content: content is the text of the post.
• auth_token: it is used to authenticate API requests.
Newsfeed retrieval API
GET /v1/me/feed
Params:
• auth_token: it is used to authenticate API requests.
Feed publishing
Newsfeed building
Step 3 - Design deep dive
Feed publishing deep dive
Fanout service
Pros:
• The news feed is generated in real-time and can be pushed to friends immediately.
• Fetching news feed is fast because the news feed is pre-computed during write time.
Cons:
• If a user has many friends, fetching the friend list and generating news feeds for all of them are slow and time consuming. It is called hotkey problem.
• For inactive users or those rarely log in, pre-computing news feeds waste computing resources.
Fanout on read.
Pros:
• For inactive users or those who rarely log in, fanout on read works better because it will not waste computing resources on them.
• Data is not pushed to friends so there is no hotkey problem.
Cons:
• Fetching the news feed is slow as the news feed is not pre-computed.
The memory consumption can become very large if we store the entire user
and post objects in the cache. Thus, only IDs are stored.
Most users are only interested in the latest content, so the cache miss
rate is low.
Cache architecture
Step 4 - Wrap up
Scaling the database:
• Vertical scaling vs Horizontal scaling
• SQL vs NoSQL
• Master-slave replication
• Read replicas
• Consistency models
• Database sharding
Other talking points:
• Keep web tier stateless
• Cache data as much as you can
• Support multiple data centers
• Lose couple components with message queues
• Monitor key metrics. For instance, QPS during peak hours and latency while users
refreshing their news feed are interesting to monitor.