Cathedrals Fall, Markets Endure

As someone who works on database software in China, I've recently had many friends on WeChat ask me about news that a certain vendor in the industry is "restructuring its team." I'd rather not comment on that particular matter. What I do firmly believe is that for foundational software, the only way forward is open source: if a product isn't open source, or its core isn't, its vitality is limited. So I'd like to share some personal views on open source versus closed source, in the hope that this article prompts some thoughts of your own.

Readers familiar with the open source movement will probably smile at the title. Yes, as a follower of ESR (Eric S. Raymond), I've never hidden my fondness for "The Cathedral and the Bazaar." And as an entrepreneur working in open source, I've found that our practice over the past few years has deepened our understanding of the book. In this article I'll try to answer some questions we're frequently asked, and in the last part I'll venture to revise ESR's theory for today's cloud era. Note that the software discussed here is limited to foundational software (databases, compilers, operating systems, etc.).

Is Code the Core Competitiveness?

I've talked with the authors of a number of closed-source software projects, and their reasons for staying closed source mostly fall into the following categories:

They think their core algorithms are very powerful and don't want competitors to imitate them
They worry that once users get the code, they won't pay
They haven't found or built their own moat
The code is too ugly to open source
They're afraid people will find bugs

The first three answers are the most common. I fully understand them, and they're all legitimate reasons, but in this article let's analyze them objectively, one by one. I won't say much about the fourth and fifth reasons; perhaps we can discuss them another time. Let's focus on the first three, leaving the second for later.

Let's examine the first reason more closely. Generally, there are two possible situations:

My core code is short, perhaps a clever algorithm or a set of clever parameters
My engineering design and implementation are excellent, and my system architecture is ahead of the field

On the first situation, my view has always been this: in a fully competitive industry, unless you've achieved a complete monopoly on talent, any short "core algorithm" you can come up with for a high-value problem, others can come up with too. There's no silver bullet. Computer science is the art of finding balance among countless trade-offs and imperfections (Turing Award-level ideas or quantum computers are exceptions, of course, but such opportunities are rare). Even if staying closed source creates a short-term monopoly advantage, that balance will inevitably be broken by another competitor, and eventually a quality open-source alternative will emerge and take over everything (and this open-source de facto standard may not even be better in the short term).

Most product advantages actually lie in the engineering implementation, the second situation above: a group of excellent engineers, guided by the right design, build quality software. In this situation, open source or not, competitors can't easily imitate the product. It's like a top student who scores 100 on an exam: showing the answer sheet to a struggling student won't instantly make them a top student, because the code is only the result. The thinking and choices that led to that result can't be opened up; a reader learns the "what" but not the "why." And even if you are very capable yourself, and your group of excellent engineers quickly builds a good product, that changes nothing. The outcome is the same as in the first situation: as long as you're closed source, and the problem is common and high-value enough, an open-source solution will inevitably take over everything in the long run. The reason has nothing to do with code, because here the code is not the core competitiveness.

The third reason is similar to the first: the author may recognize that code isn't necessarily the core competitiveness, but without a good moat of their own, they can only fall back on using the code as the moat.

If Code Isn't the Core Competitiveness, What Is?

Before discussing the true core competitiveness, let's talk about the limitations of closed-source software.

Consider the life cycle of a typical closed-source product. The project might start from a company's or an individual's insight into a market opportunity: a high-value scenario where software can significantly improve efficiency or create value. It might even start from a client contract. Either way, the company recruits programmers, designers, and product managers and begins development. If all goes well, they meet the client's needs and the client happily pays. Then the company discovers that, with some modifications (or even none), the software can be sold to another client in the same industry. This is great; it feels like they've found a path to wealth. But the good times don't last. Clients' scenarios and needs keep changing, and the original software may not meet the new requirements. The development team is only a handful of people, and if they misjudge the direction, they may miss the window of opportunity. This places very high demands on the project leader, who must keep leading the direction of the industry. The alternative is to pick a relatively narrow or slow-moving field, which can extend the product's survival. Clients have a hard time too: their needs always seem to be met half a step late. Even clients with their own R&D capabilities can only watch helplessly, because without the source code they can't make the improvements they know are needed.

The essence of the problem is this: closed-source vendors may be technical experts, but they are not necessarily experts in the business or its scenarios. The software can evolve only as fast as the understanding and knowledge of its developers and product managers evolve. Unless the vendor is strong enough to keep leading the direction of the entire industry, there's no way out.

Chairman Mao actually answered this question long ago: "All correct leadership is necessarily 'from the masses, to the masses.' This means: take the ideas of the masses (scattered and unsystematic ideas) and concentrate them (through study turn them into concentrated and systematic ideas), then go to the masses and propagate and explain these ideas until the masses embrace them as their own, hold fast to them and translate them into action, and test the correctness of these ideas in such action. Then once again concentrate ideas from the masses and once again go to the masses so that the ideas are persevered in and carried through. And so on, over and over again in an endless spiral, with the ideas becoming more correct, more vital and richer each time." (Some Questions Concerning Methods of Leadership)

If I may say so, had Chairman Mao lived in our time, he would have been a master-level programmer even if programming were all he did. His words contain two key points that perfectly explain where the vitality of open-source software comes from. Let me elaborate below.

The Vitality of Open-Source Software Comes from Scenario Monopoly, and Behind That Lies a More Fundamental Monopoly: Talent

Why emphasize "from the masses"? Recall the key point from our discussion of closed-source software: although a piece of software may start from the insight of a few people, sustaining that insight is hard. This is why so many technical and product teams become "self-indulgent"; once they lose touch with users, this problem is very likely to appear. Closed-source vendors reach users only through traditional marketing and sales, so the barrier between a user's interest and actual adoption is high, and implementation cycles are long. In addition, sales usually stand between the product team and clients, using information asymmetry to earn excess profits, and the biggest source of that asymmetry is the closed source code itself, or customization. As a result, compared with popular open-source software, closed-source software cannot efficiently gather, absorb, and understand a wide range of scenarios. For general-purpose foundational software, this is usually fatal. If you haven't seen enough scenarios, you can't judge which requirements are universal and should be built, and which are false needs that should be firmly rejected. I think this judgment is the "feel" of product development.

Popular open-source software doesn't have these problems. Because it has enough users, you'll see enough scenarios and plenty of strange usage patterns. Every piece of user feedback, every bug fix, and every suggestion produces a "compound interest" effect: your software grows stronger and covers broader scenarios, which lets it reach a larger user base, which in turn makes it more powerful, and so on in a cycle. In effect, open-source software trades some of the potential profit from information asymmetry for extremely efficient distribution and scenario coverage. Interestingly, that potential profit may not really be sacrificed: first, these users' ability to pay may be limited anyway; second, they may give back to the project through promotion, advocacy, word of mouth, or code contributions.

This process produces an even more powerful effect: a monopoly on talent. As the saying goes, people make things happen; every technical decision and practice behind the scenario monopoly described above is carried out by people. On its way to becoming a de facto standard, a popular open-source project inevitably cultivates a large number of engineers familiar with the product, along with users, cheering fans, code contributors, and even critics. Traditionally, the open-source community is understood narrowly as the developer community, where only contributing code counts as participation. But I think anyone with a connection to the product is part of the community, and "making the best use of everyone's talents" is the ultimate goal of building an open-source community. This advantage accumulates over time, and it's easy to see why. For example: an engineer at Company A learns to use TiDB at work and solves problems well with it. Then this engineer, now a database expert, moves to Company B. Facing the same problem there, what do you think they'll choose? 😊

Iteration, Iteration, Iteration: Only Rapid Iteration Keeps You Invincible

The other key point in Chairman Mao's words is the positive cycle, in other words, iteration. This principle applies to software development too. Software is never static; as markets and competition change, today's competitive advantage may be gone tomorrow. Many people look at problems statically, eagerly comparing solutions side by side while ignoring how fast each one evolves. I tend to give more weight to comparing a product with its own past. For example, suppose there are three solutions, A, B, and C, which today differ by perhaps no more than 50%. If one of them is open source and, driven by its community, becomes twice as good as it was six months earlier every six months, while the closed-source solutions' progress is limited by team size and resources, then unless the decision is truly life-or-death right now, you should choose the solution that iterates faster, grows faster, and better represents the future. Why people miss this is easy to understand: by habit, we think linearly, so we routinely underestimate anything that grows non-linearly.
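
To make the linear-versus-non-linear point concrete, here is a toy calculation. It is only a sketch under assumed numbers, not data about any real product: the closed-source option starts 50% ahead (1.5 versus 1.0) and gains a fixed 0.5 every six months, while the open-source option doubles every six months. The helpers `exponential`, `linear`, and `first_period_ahead` are hypothetical names made up for this illustration.

```python
# Toy model with hypothetical numbers: doubling growth vs. fixed-step growth.
from typing import List, Optional


def exponential(start: float, periods: int, factor: float = 2.0) -> List[float]:
    """Capability at each six-month period when it multiplies by `factor` each period."""
    return [start * factor**n for n in range(periods + 1)]


def linear(start: float, periods: int, step: float) -> List[float]:
    """Capability at each six-month period when it grows by a fixed `step` each period."""
    return [start + step * n for n in range(periods + 1)]


def first_period_ahead(a: List[float], b: List[float]) -> Optional[int]:
    """Index of the first period where series `a` strictly exceeds series `b`, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x > y:
            return i
    return None


if __name__ == "__main__":
    # Assumption: closed source starts 50% ahead and adds 0.5 per period;
    # open source starts at 1.0 and doubles every period.
    open_source = exponential(start=1.0, periods=6)
    closed_source = linear(start=1.5, periods=6, step=0.5)
    for n, (o, c) in enumerate(zip(open_source, closed_source)):
        print(f"month {6 * n:>2}: open source = {o:5.1f}, closed source = {c:4.1f}")
    print("open source first ahead at period", first_period_ahead(open_source, closed_source))
```

Under these assumptions the two options are tied at month 6, the open-source option pulls ahead at month 12 (period 2), and the gap widens every period after that. Different growth factors or step sizes move the crossover point, but a steady doubling eventually outruns any fixed increment.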

Here is a more striking example. By my rough count, from 2018 to now, in just over a year, the TiDB SQL layer project alone has had more than 30,000 commits, and close to 60% of its source code has been modified. This means each year's TiDB differs from the previous year's: it is better adapted to the present and more advanced. And as the community keeps growing, iteration will only get faster. I simply cannot imagine how TiDB, had it been closed source, could have gone from its first line of code to its current maturity in just five years. All of this is thanks to the acceleration and repeated iteration that the open-source community brings.

How to Make Money? The Future Is in the Cloud

We've talked a lot about product philosophy. Now let's turn to business and the position of open-source software in the cloud era, returning to the second reason from the beginning: the worry that users will get the code and then won't pay. This view implies that users pay for the code, so once they have it, they have no other reason to pay. That conclusion doesn't hold. Customers pay to solve problems and create value, not for code. If getting your code and doing it themselves costs more than what they would pay you (provided you honestly deliver value), users have no reason not to pay. Those costs include obvious ones, such as labor and machines, and often-overlooked ones: the sunk cost of missed market opportunities, business transformation and migration costs, learning costs, and the risk of having no one who knows how to fix problems when they occur in production. These hidden costs are often much higher than the obvious ones.

This implies that the value of software depends on what problems it solves and what value it creates, not on whether it's open source. For example, a distributed relational database almost certainly has more commercial value than a distributed cache. That difference is determined by the scenarios the database serves, the data it stores, and the capabilities it provides, not by whether it's open source. This is why we chose to build a general-purpose database: its value ceiling is higher.

Another point worth emphasizing: open source is not a business model but a better model for developing and distributing software. I also think business models, like software itself, need to be designed. The design depends on the product's characteristics and the company's attributes, so a business model that suits Product A may not suit Product B, and even for the same product, different companies may find different business models suitable.

Let me use Huawei, a company I greatly respect, as an example. Huawei is a very capable communications equipment manufacturer, a very successful maker of mobile devices, and a very successful hardware manufacturer. Communications equipment, mobile phones, servers: do you see what they have in common? Huawei is very good at selling hardware, that is, boxes. Huge commercial success brings great inertia. In the hardware and communications equipment market, different vendors' products don't differ much in capability (at least not by a whole generation), so competition comes down to meeting customers' other needs (such as service, faster response, and full customization) and to low prices. It's therefore not hard to see why Huawei's software approach would be to enter customer scenarios with cheap or even free software and then make its profit on hardware. The problem is that this model can't support too many customers. Once the front line is stretched too long, and project budgets and hardware margins can no longer cover the R&D and support costs of customized software, the model breaks down.

I think that if you want software to generate scalable, sustainable profits, you need two things:

Ecosystem: the software can form an ecosystem of its own or integrate organically with existing ecosystems. The ecosystem complements the capabilities of any single product and turns them into complete solutions.
Channel: efficient distribution and support channels ensure that, as the user base scales, the vendor's sales and after-sales costs don't grow in step with the number of customers (or at least grow along a gentler slope).

Both are indispensable. On the first point, ecosystems form naturally around open-source software: developers and solution providers cover whole solutions by combining different open-source components, an efficiency that closed-source, customized software struggles to match. I won't elaborate further here.

On the second point, the ideal channel is the cloud, especially the public cloud. It standardizes hardware, computing power, and even the way computing power is delivered. Once everything is standardized, it can be automated, and that is the real value for software vendors.

So here is the open source + cloud model: open source wins developer mindshare and forms solutions, while the cloud provides extremely efficient distribution and value delivery. Looks beautiful, doesn't it? In theory there's no problem, but some friends will certainly challenge me: in this model, where do open-source software vendors like you fit? Why wouldn't cloud providers offer services for open-source software themselves? The high-profile "AWS bloodsucking" incidents of recent years, which pushed a number of open-source companies and projects to change their licenses, are a case in point.

Regarding this question, my view might be a bit different from mainstream opinion:

Is the cloud eating open source? No, open source is eating the cloud.

Cloud vendors are like the telecom carriers of the past: they own the first point of contact with customers, so naturally they put their own products on the critical path. But we all saw what later happened to Mobile Dream Network and Fetion. Take Fetion as an example. Remember that Fetion, as China Mobile's product, couldn't message China Unicom or China Telecom numbers? Only when WeChat arrived were users across all carriers truly connected, and that marked a clear watershed in the market. Which carrier you use doesn't matter, as long as the network is connected and the signal is good. The same goes for the cloud. AWS will certainly never offer smooth migration to or interconnection with GCP, and vice versa. For customers, this is like being forced to choose between China Mobile's Fetion and China Unicom's WoYou (I bet you've never even heard of WoYou 😊). Users will surely say: sorry, I want neither; I'll choose WeChat. On the other hand, when it comes to offering an open-source project as a service on the cloud, the cloud vendor may well invest less than the company behind the project. A good example is Databricks, the company founded by Spark's original team, which also provides its Spark service entirely on AWS. Compared with AWS's own EMR, Databricks is at no disadvantage at all; in both customers and product, it has even surpassed EMR. Likewise, Fetion's development team was surely no match for WeChat's.

Because it is neutral, open-source software is almost the only way for users to keep a unified experience and unified services across multiple cloud vendors. With open-source software and open-source service providers in the picture, I believe the market will reach a balance. Cloud vendors will keep optimizing what they're good at, turning cloud infrastructure into a utility-scale business like water, electricity, and gas. Open-source software vendors will build services on the cloud's standard infrastructure and deliver business value. Open-source projects and communities, with the vendors' continued support, will keep flourishing and win over more users. Together, the three form a closed value-chain loop. Don't rush; let the bullets fly for a while.

I've written thousands of words about open source, so let me close with a quote from "The Cathedral and the Bazaar" that I really like:

"Often, the most striking and innovative solutions come from realizing that your concept of the problem was wrong."

Article republished from: The Cathedral Will Fall, But the Bazaar Will Endure