State of the art-ful AI: VR Hub

From AI to AI and back again

This is the latest entry in our ‘State of the art-ful AI’ series, where we upload a photo to ChatGPT 4 and ask it to describe the photo, then ask DALL-E 3 to generate an image from that description. How similar will they be?

Why this workflow?

Less than a year after ChatGPT's launch, both it and DALL-E have already seen multiple iterations. In October 2023, ChatGPT 4 gained the capability to describe uploaded photos. To track the progress of generative AI, we pass these AI descriptions to the DALL-E 3 image generation model to see how faithful its output is to the original photo. We have selected a variety of photos, which we’ll bring to you throughout November and December.
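For readers who would rather script this round trip than click through the chat interface, the two steps can be sketched roughly as below. This is a minimal sketch, not the method used for this series: the names round_trip, build_prompt, describe and generate are hypothetical, the prompt wording is an assumption (the series only specifies a 150-word description), and the two model calls are left as functions you supply yourself, so no particular API is assumed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoundTrip:
    """Artifacts from one photo -> description -> image pass."""
    photo: str          # path or identifier of the original photo
    description: str    # text returned by the describing model
    generated: str      # path or identifier of the regenerated image


def build_prompt(word_limit: int = 150) -> str:
    """Build the description request. The wording is an assumption."""
    return f"Describe this photo in {word_limit} words."


def round_trip(
    photo: str,
    describe: Callable[[str, str], str],
    generate: Callable[[str], str],
    word_limit: int = 150,
) -> RoundTrip:
    """Run the two-step workflow with caller-supplied model wrappers.

    describe(photo, prompt) -> description text    (hypothetical wrapper)
    generate(description)   -> generated image id  (hypothetical wrapper)
    """
    description = describe(photo, build_prompt(word_limit))
    generated = generate(description)
    return RoundTrip(photo=photo, description=description, generated=generated)


# Stand-in functions so the sketch runs without access to any model.
def fake_describe(photo: str, prompt: str) -> str:
    return f"[description of {photo}]"


def fake_generate(description: str) -> str:
    return f"image generated from: {description}"


if __name__ == "__main__":
    result = round_trip("vr_hub.jpg", fake_describe, fake_generate)
    print(result.description)
    print(result.generated)
```

Keeping the model calls as plain function arguments means the same loop can be rerun as the models change, which is the point of tracking progress over time.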

Our original photo

This image shows a visit by the BBC to a local university in England in 2017.

The photo as described by ChatGPT 4 in 150 words

This image depicts an indoor promotional event setup by the BBC, as indicated by the large BBC logo. The setup includes a variety of interactive stations and promotional material, designed with a modern and playful aesthetic using wooden panels and bright, contrasting colors like yellow and purple. There is a “VR BOX” station, suggesting demonstrations or experiences related to virtual reality, and signs with words like “REALITY?”, “EXPLORE”, and “#FOR REAL”, hinting at the theme of exploring the boundaries between reality and virtual worlds. A microphone stand in the foreground suggests there may be live presentations or talks as part of the event.

The venue appears to be a public space, possibly a mall or a specialized event area, with people engaged in various activities: operating equipment, walking by, and overseeing the event. Bright lighting and the open design invite passersby to participate and engage with the displayed content.

Photos generated by DALL-E 3 using the AI description

Spot the difference

Using a separate chat, we asked ChatGPT what differences it could find between the original photo and the FIRST image generated above:

  1. Location: The first image appears to be set in an interior space with a staircase in the background, suggesting it could be inside a building such as a mall or an exhibition hall. The second image looks to be in a more open and brightly lit venue, possibly suggesting a different location or time of day.
  2. Exhibit Design: The design of the BBC exhibit in the first image is more rustic with what looks like wooden pallets and an industrial look. The second image shows a sleeker design with clean lines and a more modern aesthetic.
  3. Activity: In the first image, there are fewer people, and the setting appears to be in a state of preparation or breakdown with equipment cases and cables visible. In contrast, the second image shows more visitors engaging with the exhibit, suggesting it is fully operational.
  4. Lighting: The lighting in the first image is dimmer with a focus on the exhibit itself, whereas the second image has overall brighter lighting, contributing to a more inviting atmosphere.
  5. Branding: The placement of the BBC logo is different in each image. In the first, it is placed on a standalone structure; in the second, it is suspended above the exhibit. Additionally, the color scheme and presentation of the logo differ.
  6. Displays and Content: The content displayed on the screens is not visible in the first image, while the second image shows vibrant displays with graphical content.
  7. Audience: The first image captures adults within the scene, with one person on a higher level observing the setting. The second image shows a diverse group of visitors, including what appears to be younger individuals or children.
  8. Layout: The layout of the exhibit changes; the first image has a closed-off section with a large question mark, which is not present in the second image.
  9. Staging Area: The first image includes a microphone stand in the foreground, suggesting a presentation or announcement area, which is not present in the second image.
  10. VR Equipment: The second image has VR headsets on pedestals for people to try, implying interactive VR experiences.

These observations reflect the apparent differences in the setting, design, and usage of the space between the two images.

Example workflow video

Here’s our workflow in action, using an ice cream truck image. This is not an explainer video; it shows the main steps of the above ChatGPT 4 workflow as it stood in November 2023.