Is a fair comparison a must in A/B testing?

I wonder whether the elements in an A/B test need to be exactly the same across variants to allow a fair comparison.

In my experience, it depends on the scenario.

For example, take a smaller-scale A/B test (e.g. a module that recommends similar products on the product details page) with a concrete goal (e.g. testing which layout drives a higher AOV, i.e. average order value) and 1 control plus 2 variants of layout and placement. Each product card in the module contains the price (sale price and original price), product name and image. In this case I tend to make sure the same elements are used across all variants (e.g. if there is a discount price, every variant should show it).

On the other hand, when a test is about a more conceptual idea or hypothesis (e.g. Airbnb testing multiple variants of its homepage simultaneously), I've found that the variants can be executed quite differently (e.g. one concept focuses on room search while another is inspiration-driven). It's technically hard to keep every element or variable the same and comparable, and even when there is a winning variant, it takes further drill-down to understand why it won.

Can anyone share their experiences and thoughts on this?