It's safe to assume he's talking about the chip in both cases. You can connect either of them to a battery charger, camera, or nuclear reactor but all of those 'peripherals' are completely irrelevant to this comparison.
FYI: in the embedded space, "peripherals" refers to the chip's built-in capabilities. Typically there is a peripheral that handles SPI, a peripheral that handles I2C, and a peripheral that handles USART. There might be a peripheral that handles USB, or I2S, or any number of other functions. These peripherals are implemented in dedicated transistors, or in microcode on more sophisticated MCUs, and not every pin is connected to every peripheral, which can make chip selection even trickier. It really sucks to start designing around a chip that looks like it does everything you need with enough pins, only to find out once you start doing pin layouts that you can't use both USART2 and I2C because they use the same pins. Worse is when the datasheet makes this difficult to discern, and you only find out when doing some firmware work on a devboard.
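To make the pin-sharing problem concrete, here is a rough sketch of the kind of check you can run before committing to a chip. The pin map is invented for an imaginary MCU, not copied from any datasheet, and `PIN_MAP` and `find_conflicts` are hypothetical names. It also assumes each peripheral has exactly one fixed set of pins, whereas many real parts offer alternate pin mappings.

```python
from itertools import combinations

# Hypothetical pin map for an imaginary MCU: peripheral -> pins it needs.
# These assignments are made up for illustration, not taken from a datasheet.
PIN_MAP = {
    "USART1": {"PA9", "PA10"},
    "USART2": {"PA2", "PA3"},
    "I2C1": {"PA2", "PA3"},
    "SPI1": {"PA5", "PA6", "PA7"},
}


def find_conflicts(wanted, pin_map=PIN_MAP):
    """Return (peripheral_a, peripheral_b, shared_pins) for each clashing pair."""
    conflicts = []
    # Compare every pair of requested peripherals and look for shared pins.
    for a, b in combinations(sorted(wanted), 2):
        shared = pin_map[a] & pin_map[b]
        if shared:
            conflicts.append((a, b, sorted(shared)))
    return conflicts


if __name__ == "__main__":
    for a, b, pins in find_conflicts(["USART2", "I2C1", "SPI1"]):
        print(f"{a} and {b} both need {', '.join(pins)}")
```

With real data you would fill the map from the datasheet's alternate-function table, and a chip with remappable pins would need a list of candidate pin sets per peripheral rather than a single set.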
The ESP32 modules are available with or without the antenna built in, and the chips have an on-chip ethernet MAC peripheral, display support via on-chip LCD, I2C, or SPI peripherals, and camera via the on-chip CAM peripheral.
Not all versions have all these features. You can pick one based on the features you need.
FYI the RP2040 has peripherals too, a lot of the same ones.
It's very clear that GP has an odd misconception that an ESP32 is some kind of development board complete with "connectors" and possibly a battery charger, while an RP2040 is just the SoC itself.
GP is clearly talking about ESP32 boards, not the chip; they mentioned "LI Battery Controller", and "Display or Camera Connector". People sell RP2040 boards with those too, for example:
Do they sell chip-like modules with camera connectors and battery controllers, though? I don't deny that ESP32 modules look like and are soldered as though they were chips, but to me it's clear that GP was mixing up chips (and chip-like modules) with boards.
Espressif sell bare chips, modules with castellations for soldering to a PCB, and complete boards that typically contain a module. Some 3rd-party boards use a bare chip. Some 3rd-party boards also have castellations, as do Picos sold without pre-soldered headers. 3rd-party modules are also a thing.
Here's a pic of a module and several boards that demonstrates all of that.
No, both are "just the chip"; the chips simply have different peripherals. They both have the usual SPI, I2C, UART, ADC, etc., and they have their differences too.
I could just as well say "For an RP2040 to do USB you wire the RP2040 to the connector. For an ESP32 to do USB you wire the ESP32 to another chip, and that other chip to a connector."
It's clear to me GP has the impression that ESP32 is something more (a development board with an ESP32?) than the ESP32 "chip" itself.
Maybe look at the ESP32 chip's datasheet? [1] This is the first sentence in it: "ESP32 is a single 2.4 GHz Wi-Fi-and-Bluetooth combo chip designed with the TSMC low-power 40 nm technology." That's the chip. Not the development board.
Modules are "just" the chip + all the hardware needed to get started + an antenna connector or PCB antenna. Those modules don't add any functionality to the chip.
The main benefit of those modules is that they're already certified by the FCC and other regulators, so you don't have to re-certify your design for radio communication. Since the RP2040 does not have a radio, this is unnecessary.
If you do not need basic WiFi connectivity, the ESP32 is an overpriced, underpowered chip for which dozens of cheaper, more feature-rich alternatives exist.