I’ve always wanted to build a search engine from scratch, but why? Because I love to learn, and at least for now, I learn the most by “just doing things”. I’ve been on a quest to learn SEO and marketing, and what better way to understand search engines than to build one, learning and improving along the way?
Google has done more than a good job of indexing the world’s data on the internet for the last few decades. Everywhere I asked how to make a search engine, I got the same answer: it’s highly resource-intensive. One thing I’ve learned as myself (obviously I can’t put labels on my learnings, but for those who care: entrepreneur, CEO and developer) is that the first step is often the most difficult, and 90% of the time that step is deciding that you’re gonna do it.
So I started with the backend. I’ve been a Python enthusiast since my first year of college, and I mostly attribute that interest to Django and Tkinter. I’ve also always wanted my own framework, one I could extend with the features I care about most, or the ones I feel every framework I use should have. One of them is making apps cross-platform: running natively on macOS, Windows, Linux and Android, but also on the web. Another feature I badly wanted was automatic SEO tag management: a framework that auto-generates the meta tags for every page on a website.
For the backend, I thought: what could I lose by using MongoDB? They even provide a web interface and a generous free tier. Now, 90+ hours later, I’m here writing this post. If you’re reading it, everything has gone well: I was able to debug all the errors and fix some very serious mishaps (like infinite recursive website scraping, which could have used up my local ISP’s bandwidth allowance for the month). Bandwidth was definitely the most important issue, followed by compute, then power consumption.
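Infinite recursion in a scraper usually comes from link cycles: page A links to B, and B links back to A (or to A#section, which is the same page). Below is a minimal sketch of the usual guards in Python, assuming a breadth-first crawler: a visited set, a depth limit and a page cap, with only same-host links followed. The `crawl` function, its `fetch_links` callback (assumed to download a page and return the raw href values on it) and the `max_pages`/`max_depth` parameters are hypothetical names for illustration, not the crawler behind this project.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],  # hypothetical: page URL -> hrefs on it
    max_pages: int = 1000,
    max_depth: int = 5,
) -> list[str]:
    """Breadth-first crawl that cannot loop forever; returns URLs in crawl order."""
    host = urlparse(start_url).netloc
    start, _ = urldefrag(start_url)
    visited: set[str] = {start}          # every URL is queued at most once
    queue = deque([(start, 0)])          # (url, distance from start_url)
    crawled: list[str] = []

    while queue and len(crawled) < max_pages:  # hard cap on total pages
        url, depth = queue.popleft()
        crawled.append(url)
        if depth >= max_depth:
            continue  # keep the page, but do not follow its links any deeper
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments (same page, new URL).
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc != host or link in visited:
                continue  # off-site link, or already seen: this breaks cycles
            visited.add(link)
            queue.append((link, depth + 1))
    return crawled
```

Injecting `fetch_links` keeps the traversal logic testable offline; a real version would plug an HTTP client and an HTML parser in there, and `max_pages` doubles as a hard ceiling on bandwidth per run.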
Each problem I faced taught me more about SEO and HTTP in general:
1. Content Resource Limitation
2. Compute Capacity Limits
3. Recursive Bandwidth
4. Search Results Tuning
5. MongoDB Queries
I learnt a lot of things while building MVIII that can’t be summed up in one post, so I’ll share the most amazing thing I came across, and it’s *drumroll* MongoDB search. After 90+ hours of web scraping (thank you Apple, Facebook, GitHub and Amazon for having the largest and most confusing static sites), I ended up with 60k HTML links in total. Yep, 60k. During this process I went from loving coding to hating it and then back to loving it.
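To give an idea of what searching scraped pages in MongoDB can look like, here is a sketch using a classic text index and the aggregation pipeline from Python with pymongo. It is an assumption that this is the feature meant above; MongoDB Atlas also offers a separate Atlas Search feature with its own `$search` stage. The field names `url`, `title` and `content`, the weights, and the helper functions are all hypothetical.

```python
from __future__ import annotations

from typing import Any


def ensure_text_index(collection) -> str:
    # A collection can have at most one text index, so every searchable
    # field goes into this single index. Hypothetical weights rank title
    # matches above body matches (one knob for search results tuning).
    return collection.create_index(
        [("title", "text"), ("content", "text")],
        weights={"title": 10, "content": 1},
    )


def text_search_pipeline(query: str, limit: int = 10) -> list[dict[str, Any]]:
    return [
        # A $text query must sit in the first $match stage of the pipeline.
        {"$match": {"$text": {"$search": query}}},
        # Keep what a results page needs, plus MongoDB's relevance score.
        {"$project": {"_id": 0, "url": 1, "title": 1,
                      "score": {"$meta": "textScore"}}},
        {"$sort": {"score": -1}},  # most relevant first
        {"$limit": limit},
    ]


def search(collection, query: str, limit: int = 10) -> list[dict[str, Any]]:
    # collection is assumed to be a pymongo Collection holding scraped pages.
    return list(collection.aggregate(text_search_pipeline(query, limit)))
```

Keeping the pipeline as a plain list of dicts makes it easy to inspect, and to paste into the shell or the web interface while tuning results.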
I’ve also built a Discord community for people who love to learn about and discuss search engines; there’s a link on the home page under “Pay Anything to Support and Join Discord Chat”.
I learnt a lot about MongoDB’s query engine and aggregation pipeline; they are awesome tools for a NoSQL database. Coming from mostly programming in the SQL paradigm and with Firebase’s DB SDK, they gave me a real sense of control over the database. The queries are also jargon-free: you can formulate whatever you need without requiring a lot of storage. For example, this filter matches every document that has a field called column:
{column : { $exists : true }}
(29 characters)
In SQL, a rough equivalent would be
SELECT COUNT(column) FROM table_name WHERE column IS NOT NULL
(61 characters)
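This is just a next-level developer experience. One nuance: strictly speaking, $exists : true also matches documents where the field is present but set to null, so { column : { $ne : null } } is the closer match to IS NOT NULL. In the future, I hope to write more about my research into developing a search engine, about SEO, and about the other engineering and development work I like to do.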
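For comparison, here is how that counting query might look from Python with pymongo, as a sketch. The helper names are hypothetical, and `collection` is assumed to be a pymongo Collection (`count_documents` exists in pymongo 3.7 and later). It shows both the plain filter and the same count through the aggregation pipeline.

```python
from __future__ import annotations

from typing import Any


def exists_filter(field: str) -> dict[str, Any]:
    # Matches documents that HAVE the field, even when its value is null.
    return {field: {"$exists": True}}


def not_null_filter(field: str) -> dict[str, Any]:
    # In MongoDB {field: None} matches null AND missing values, so $ne: None
    # excludes both: the closer match to SQL's "column IS NOT NULL".
    return {field: {"$ne": None}}


def count_with_field(collection, field: str) -> int:
    # Roughly: SELECT COUNT(*) FROM table_name WHERE field IS NOT NULL
    return collection.count_documents(not_null_filter(field))


def count_pipeline(field: str) -> list[dict[str, Any]]:
    # Same count through the aggregation pipeline: filter, then $count.
    return [{"$match": not_null_filter(field)}, {"$count": "total"}]


def count_via_aggregate(collection, field: str) -> int:
    # $count emits no document at all when nothing matches, hence the default.
    result = next(iter(collection.aggregate(count_pipeline(field))), {"total": 0})
    return result["total"]
```

Swap in `exists_filter` if documents with an explicit null should be counted too.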
Overall, I now understand why a company like Google had to take different approaches and build new tools and technologies like gRPC and NoSQL engines.
Maybe this time I’ll try to avoid the mistakes Google made in terms of engineering and its monetization model.
Thank you for reading this far. Here’s one more lesson for you: sometimes things that seem infinite are actually finite. Who knows, even the vast-looking space of the universe might be finite, but you won’t know until you start.
PS: MVIIII is inspired by the Romans’ thirst for knowledge.
Fun fact: did you know that Alexa’s site ranking service was retired on May 1, 2022, and that its site is now a homepage for Amazon Alexa?