Twitter design

📆 Sun Feb 20 2022
⏳ 7
📖 623
💗 3
๐Ÿ‘ 4

Being the huge platform that it is, Twitter processes enormous amounts of information and provides millions of users with their precious dopamine 😄. That being said, it is very worthwhile to look into how one would build such a platform.
Thinking about the basic premise of the system, even before going into requirements and details, we can make a few assumptions:

  • Designing the system as a monolith, even a modular one, wouldn't suffice, as we would probably run into many pitfalls with scaling and thus performance
  • Likely, not all parts of the system would have the same architecture
  • Using a synchronous approach to data fetching will result in severe bottlenecks as the platform scales

With this in mind, we can go ahead and think about requirements and design:

  • Definitions
    • User, a person who has registered and logged in to the platform
    • Follower, a User who has chosen to Follow another User's posts
    • Tweet, a post/message shared on the platform, visible to Followers
    • Retweet, sharing a Tweet within the platform
    • Topic, Tweet topic defined by content and/or hashtags
    • Timeline, a feed for Tweets
  • Functional requirements
    • Users must be logged in to use the platform
    • Users can create and delete accounts
    • Users can create, share and search for Tweets
    • Users must be able to reach millions of followers within several seconds of tweeting
    • Timelines:
      • Home timeline, a landing page timeline, main feed
      • User timeline, profile page feed of Tweets
      • Results timeline, shown when any filter is applied to any timeline
    • Tweets can have text, images and/or videos
    • Recommendations for users based on Tweets and Topics
    • Notifications when a Followed User has taken an action
    • Notifications when a related Trending Topic (Re)Tweet has been created
    • Analytics for how Users interact with the platform
  • Non-functional requirements
    • System must not fail completely when something goes wrong
    • As the platform grows, it must be able to support new features as easily as possible
    • Assuming the platform becomes popular, it must be fairly easy to integrate with
    • Constant updates mean the system changes frequently, so working with the system must be seamless
    • System must adapt to frequent changes in load and in the number of concurrent users
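
To make the definitions above a bit more concrete, here is a minimal sketch of how they could map to data types. Every name in it (`User`, `Tweet`, `retweet_of`, `user_timeline`, `results_timeline`) is my own assumption for illustration, not a real Twitter schema, and the in-memory lists only stand in for actual storage.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class User:
    user_id: int
    handle: str


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author_id: int
    text: str
    created_at: datetime
    media_urls: tuple[str, ...] = ()          # images and/or videos
    hashtags: frozenset[str] = frozenset()    # one possible input for Topics
    retweet_of: Optional[int] = None          # set when this Tweet is a Retweet


def user_timeline(tweets: list[Tweet], user_id: int) -> list[Tweet]:
    """User timeline: the profile page feed, newest Tweets first."""
    own = [t for t in tweets if t.author_id == user_id]
    return sorted(own, key=lambda t: t.created_at, reverse=True)


def results_timeline(timeline: list[Tweet], hashtag: str) -> list[Tweet]:
    """Results timeline: any timeline with a filter applied (here, a hashtag)."""
    return [t for t in timeline if hashtag in t.hashtags]
```

A Retweet is modelled here as a Tweet that points at another Tweet, which avoids copying the original content. That is just one possible choice.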

Characteristics

Considering that Twitter has more than 200 million users, with larger influencers followed by millions of them, the platform is clearly far more read-focused than write-focused: hundreds of thousands of queries are made every second to serve Timelines to users. Writing is still a challenge, since a single Tweet must reach a large audience, so Tweet creation should be handled asynchronously. Applying eventual consistency seems reasonable, since it is acceptable for a Follower to see a Tweet after a small delay.
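
One way to picture asynchronous Tweet creation with eventual consistency is a "fan-out on write" sketch like the one below. This is my assumption of how it could look, not how Twitter actually does it: the `FanOutService` class and its method names are hypothetical, and a plain `deque` stands in for a real message queue and its background workers.

```python
from collections import defaultdict, deque
from itertools import islice


class FanOutService:
    def __init__(self) -> None:
        self.followers = defaultdict(set)         # author_id -> follower ids
        self.home_timelines = defaultdict(deque)  # user_id -> tweet ids, newest first
        self.pending = deque()                    # stand-in for a message queue

    def follow(self, follower_id: int, author_id: int) -> None:
        self.followers[author_id].add(follower_id)

    def post_tweet(self, author_id: int, tweet_id: int) -> None:
        """Write path: enqueue the Tweet and return without touching timelines."""
        self.pending.append((author_id, tweet_id))

    def run_worker(self) -> int:
        """Background step: push queued Tweets into every Follower's Home timeline."""
        processed = 0
        while self.pending:
            author_id, tweet_id = self.pending.popleft()
            for follower_id in self.followers[author_id]:
                self.home_timelines[follower_id].appendleft(tweet_id)
            processed += 1
        return processed

    def home_timeline(self, user_id: int, limit: int = 20) -> list:
        """Read path: a cheap lookup of the precomputed timeline."""
        return list(islice(self.home_timelines[user_id], limit))
```

Between `post_tweet` and `run_worker`, a Follower's Home timeline does not yet contain the new Tweet. That gap is the small delay that eventual consistency accepts in exchange for a fast write path and cheap reads.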

With the outline defined, the next thing to consider is data. Storing large amounts of data can certainly be a challenge. Although Tweets are small, we must take care not to duplicate data needlessly or create unnecessary couplings. Additionally, due to the heavy read requirements, caching must be considered, so we need a system capable of fast reads and horizontal scaling. This is where Redis comes in as a fitting solution. It does mean that we will need to store copies of Tweets both in Redis and in our main storage (whatever that ends up being). Note that I did say we should not duplicate data needlessly; however, I feel this duplication is very much needed, as the performance of the system takes precedence over storage in this context.
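
The idea of keeping Tweet copies in both a cache and the main storage can be sketched roughly as follows. To keep the example self-contained, an `OrderedDict` with least-recently-used eviction stands in for Redis and a plain `dict` stands in for the main storage; the `TweetStore` class, its methods and the capacity value are hypothetical names of mine, and no real Redis client API is used here.

```python
from collections import OrderedDict
from typing import Optional


class TweetStore:
    def __init__(self, cache_capacity: int = 1000) -> None:
        self.main_storage = {}        # stand-in for the durable main storage
        self.cache = OrderedDict()    # stand-in for Redis, oldest entry first
        self.cache_capacity = cache_capacity
        self.cache_hits = 0
        self.cache_misses = 0

    def save(self, tweet_id: int, body: str) -> None:
        """Write the Tweet to main storage and keep a copy in the cache."""
        self.main_storage[tweet_id] = body
        self._cache_put(tweet_id, body)

    def get(self, tweet_id: int) -> Optional[str]:
        """Read from the cache first; fall back to main storage on a miss."""
        if tweet_id in self.cache:
            self.cache.move_to_end(tweet_id)  # mark as recently used
            self.cache_hits += 1
            return self.cache[tweet_id]
        self.cache_misses += 1
        body = self.main_storage.get(tweet_id)
        if body is not None:
            self._cache_put(tweet_id, body)   # warm the cache for later reads
        return body

    def _cache_put(self, tweet_id: int, body: str) -> None:
        self.cache[tweet_id] = body
        self.cache.move_to_end(tweet_id)
        while len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)    # evict the least recently used copy
```

The duplication is deliberate: the cache only ever holds a bounded set of hot Tweets, while the main storage remains the source of truth.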

Now we can outline a high-level solution as an initial version that we can iterate upon.

[Diagram: high-level solution, initial version]