The trick to get around this is to move smoothly up and down the gradient of social interaction intensity, never dropping below a basic floor of presence: the sense that there are other people in the same place as you.
Instead of having two modes, “in a call” and “on my own,” we need to think about multiple ways of being together which, minimally, could be:
In a video call
In an anteroom to a video call, hearing the sound of others
In a doc together
On my desktop but with the sense that colleagues are around
And the job of the designer is to make sure their software provides these different contexts, instead of the binary on-a-call/not-on-a-call, and to design the transitions between them.
What keeps me busy in my classes is trying to help my students learn how to think. They say, "Rob holds his hands like this...," and they don't know that the reason I hold my hands like this is not to make myself look that way. The end result is not to hold the gun that way; holding the gun that way is the end result of doing something else.
…The more general issue is that a person who doesn't understand the thing they're trying to copy will end up copying unimportant superficial aspects of what somebody else is doing and missing the fundamentals that drive them. This happens even when there are very detailed instructions. Although watching what other people do can accelerate learning, especially for beginners who have no idea what to do, there is no shortcut, summed up in simple rules like "omit needless words", to understanding something deeply enough to do it well.