This is certainly a thought-provoking question!
Unfortunately, Civ5 has trained many veterans to want big, developed cities that "pull their weight"; cities with only a few population were to be avoided because of the mechanics of Civ5's happiness system. In Civ6, however, there is no per-city penalty, and amenities effectively become a "maximum population limit." You even get the first 2 pop in a city amenity-free.
The trade-off of all this is that you have to sink more hammers and population into settlers, which is not always easy to do, even with the card.
Currently, with everything being flat bonuses, every instance of a district is what matters for several yields (gold, faith, science, somewhat production) and so more cities=more instances of those districts. Not to mention trade routes.
With most yields, the aggregate sum is what matters: science, culture, faith, tourism, gold, great person points, and so on. This naturally favors wide empires, but not necessarily packed ones.
Production is a little different in that there is a time-based aspect to it; you want to be able to produce certain things fast enough to be competitive. But this usually comes into play for building wonders; being terrain-bound, this often has the secondary effect of making a select few cities the ones that want high production, as opposed to all of them. A group of "non-core" cities all producing "commercial hub investment" is unbelievably powerful. Getting that density in constrained territory incentivizes packing. So does the Colosseum, which is so good that you should almost always put as many cities around it as you can. (Note that you cannot swap a city's first ring of tiles to another city, excepting already built districts; you don't want to accidentally mess up a powerhouse city!) As a side note, any city within range of the Colosseum can have 8 population for free. Denser cities also mean more value from other regional effects.
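To make the amenity arithmetic concrete, here is a rough sketch. It assumes the rule as I understand it: the first 2 citizens are free, and every 2 citizens after that cost 1 amenity. The +3 amenities for the Colosseum is an assumption inferred from the "8 population for free" figure above; the function and constant names are made up for illustration.

```python
def amenities_needed(population: int) -> int:
    """Amenities a city needs to stay content.

    Assumes the first 2 citizens are free and each further pair
    of citizens (3-4, 5-6, 7-8, ...) costs 1 amenity.
    """
    return max(0, (population - 1) // 2)


def max_content_population(amenities: int) -> int:
    """Largest population a city can support on a given amenity count."""
    return 2 + 2 * amenities


# Assumption: inferred from the "8 population for free" figure in the post.
COLOSSEUM_AMENITIES = 3

print(amenities_needed(2))                          # 0
print(amenities_needed(8))                          # 3
print(max_content_population(COLOSSEUM_AMENITIES))  # 8
```

So a city that has nothing but the Colosseum's regional bonus can still grow to 8 before it needs a single luxury.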
Further, city state envoy bonuses also put many districts on steroids.
With all that in mind, I think whether or not to pack 'em in also depends hugely on your choice of civilization. Australia, because of Outback Station adjacency, does better with some breathing room between cities; there's less 'loss' of tiles being eaten up by districts.
On the flip side, there are 4 civs which I think have phenomenal bonuses towards packing:
Japan
Their district adjacency ability is much stronger when they have more districts close together, which is exactly why you would pack 'em in.
England
The ability to have 2 trade routes per city is huge. A very strong strategy for marginal coastal cities is to make a triangle of city center, commercial hub, and harbor; this assures 3 gold for the harbor, which via the harbor adjacency card guarantees 6 production from a shipyard. (This works for any civ; Japan gets extra yield out of it. Plus you get nice gold from the harbor.) England's two trade routes are important because you can send them anywhere; you can actually support your core with your "colonies," unlike other bonuses that stay local.
Aztecs
Builder-rushing districts means even snow outposts can get a district up in 5 turns once you have serfdom. Monty's leader ability means you get extra amenities from luxuries, helping support all this settling.
These other civs are really not in the same league as Germany, though.
Germany
IMO, they have the best dense city game out there. This stems from 2 factors:
Their Civ ability to support an extra district means they can place a Hansa+CH the turn they settle, and lock in the price on both.
The Hansa's adjacency bonus with CH applies to each CH. This means you can get really crazy. See illustration below for classic templates:
I will assume the craftsmen (+100% IZ adj. bonus) card is being run.
The diamond is the classic 2 city configuration. Each Hansa will produce at least 10 production, thanks to both CH's plus being adjacent to two districts. You can almost always pull this off, even if cities are not very close.
The triangle also guarantees 10 production per Hansa, but requires only 3 cities instead of 4 (for two diamonds.)
The ring is handy in case you have a resource you can put in the center. In this case, that would imply 12 production per Hansa.
The crescent is a variant on the ring. This allows one Hansa to get an extra CH adj. (+4 prod,) while the others are unchanged. This is useful if there's a resource in the way.
The 4 city cluster is much more efficient than 2 diamonds. The three core Hansas yield 14 production minimum; this bumps up to 16 if you place another district next to one of the core Hansas (the city center works for this too). The fourth Hansa can go wherever the resources make the most sense. (If you're really good, you can line up another set of cities so that it drops into the center of a Ring layout, granting a nice +18.) You can also protect all those Hansas from saboteurs with just one spy in the middle, which is nice.
Obviously, these shapes can be rotated and translated to fit the terrain and fit where resources are. Usually you can configure one or two resources for each Hansa, and often another point of district adjacencies; this can push yields up to 20+, PER CITY. Since these clusters go between cities, the land around is leftover for working normally.
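If you want to sanity-check a layout before committing districts, the numbers above can be reproduced with a quick sketch. The adjacency values here are assumptions that I have back-fitted to the figures in this post: +2 per adjacent Commercial Hub, +1 per adjacent resource, +1 per two adjacent districts (with the Commercial Hubs themselves also counting as districts), all doubled by the craftsmen card. They may differ between patches, and `hansa_production` is just an illustrative name.

```python
def hansa_production(commercial_hubs: int, resources: int = 0,
                     other_districts: int = 0, craftsmen: bool = True) -> int:
    """Estimate a Hansa's adjacency production.

    Assumed rules (inferred from the figures in the post):
      +2 per adjacent Commercial Hub
      +1 per adjacent resource
      +1 per two adjacent districts, Commercial Hubs included
      x2 if the craftsmen (+100% IZ adjacency) card is slotted
    """
    districts = commercial_hubs + other_districts
    base = 2 * commercial_hubs + resources + districts // 2
    return base * 2 if craftsmen else base


print(hansa_production(commercial_hubs=2))                     # 10 (diamond)
print(hansa_production(commercial_hubs=2, resources=1))        # 12 (two CHs plus a resource)
print(hansa_production(commercial_hubs=3))                     # 14 (cluster core)
print(hansa_production(commercial_hubs=3, other_districts=1))  # 16 (core plus one more district)
```

Plug in your own counts of adjacent Commercial Hubs, resources, and other districts to see what a candidate tile would give.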
Consider that a normal IZ is considered "good" if it can achieve 6 production.
Six. Those are rookie numbers!
Contrast with 10-20 production; that's several mined hills worth of production, without having to work those tiles at all. You also aren't constrained by tiles that can be mined, so it's even easier to set up Ruhr valley in a good spot. There's no need to have a dense blanket of packed in cities either, just clusters of them around your empire.
And, of course, the Hansa is half price. Now that's
German Efficiency!