Raven's Matrices by Analogical Reasoning

Mohga Emam
7 min read · Dec 8, 2020


Introduction

How smart are you? Is there a way to measure your intelligence? Is there a truly scientific way to measure intelligence in the first place? How do children understand bedtime stories? How do people solve puzzles? Raven's matrices touch on all these questions and more!

Raven's matrices are a nonverbal group test typically used in educational settings; in other words, it is a very effective test of general human intelligence.

Consider the following figure: we want to find the image that best fits in the blank portion on top.

Figure 1: Example problem like those on the Raven’s Progressive Matrices tests (Kunda, McGreggor, and Goel 2013).

This may be easy for you if you use a top-down approach: we observe that the top two elements are reflections of each other across the horizontal axis, and we then reflect the bottom element in the same way. It is more difficult to use a bottom-up approach, in which you simply see the answer emerging in the empty space; this is called a Gestalt, or figural, approach.

Depending on the person, one approach may seem easier than the other; however, both involve abstract reasoning. Abstract reasoning is considered a hallmark of human intelligence, and it remains very difficult for computers.

So why bother solving Raven's matrix problems? The way computational models learn to solve them, and how their performance and approaches compare to those of humans, give researchers insight into the organization of intelligence itself, both in humans and in artificial systems, and therefore hold significant value for research.

Background

In this post, we want to explore not only the world of Raven's matrices but also the latent space. The Raven's matrices test is known and used worldwide, yet to this day researchers and engineers have not managed to build this intuition into a machine. How is the human brain able to solve such problems without prior exposure to them, and how can we implement that same logic in a computer? In this project, we explored one way to achieve this. The main idea is to use a latent space learned from a relatively large dataset, and to use vectors in that space to tell the machine what the correct image should look like.

Let’s take a look at this Raven's problem:

Figure 2: A simple 3x3 Raven's problem.

For humans, the answer is quite clear, right? But can a machine that has never practiced Raven's problems before solve it? No!

Now imagine that those images form a dataset and we use them to define specific vectors that tell us which image to choose. Now it becomes much easier for the machine to identify the answer as well, since we are literally specifying it, right?

Another example is shown below:

Figure 3: Another simple 3x3 Raven's problem.

Using the latent space, we can find the right vectors to tell the machine which image to choose, which allows it to solve such problems without ever having seen them before.
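
To make this concrete, here is a minimal sketch (in Python/NumPy, not the project's actual code) of how the final choice could work once every image has been mapped to a latent vector: predict a vector for the missing cell, then pick the candidate answer whose vector is closest to it. The `choose_answer` helper and the toy vectors are purely illustrative.

```python
import numpy as np

def choose_answer(predicted_z, candidate_zs):
    """Return the index of the candidate whose latent vector is closest
    to the predicted latent vector (smallest Euclidean distance)."""
    dists = [np.linalg.norm(predicted_z - z) for z in candidate_zs]
    return int(np.argmin(dists))

# Toy example with 4-dimensional latent vectors: the last candidate is
# almost identical to the prediction, so it should be selected.
rng = np.random.default_rng(0)
predicted = np.array([0.9, -0.1, 0.4, 0.0])
candidates = [rng.standard_normal(4) for _ in range(7)] + [predicted + 0.01]
print(choose_answer(predicted, candidates))  # prints 7
```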

Methods

  1. First, we will generate our Raven’s Progressive Matrices dataset.
  2. Second, we will create several autoencoders: a simple autoencoder, a convolutional autoencoder, a variational autoencoder, and a variational autoencoder with convolutional layers.
  3. Then, we will compare those autoencoders.
  4. Finally, we will generate the images from the latent space.

Generating the Dataset

First, we generate the dataset that we want to work with. In this step, we chose simple geometric shapes and lines. Why? Because we learn these simple figures at a very young age, so solving problems built from them should be relatively easy for the machine as well.
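
As an illustration, here is a minimal sketch of how such panels might be rendered, assuming Pillow and NumPy are available; the shapes, panel size, and `draw_panel` helper are illustrative, not our exact generator.

```python
import numpy as np
from PIL import Image, ImageDraw

def draw_panel(shape, size=64):
    """Render a single grayscale panel containing one geometric shape."""
    img = Image.new("L", (size, size), color=255)      # white background
    d = ImageDraw.Draw(img)
    box = [12, 12, size - 12, size - 12]
    if shape == "circle":
        d.ellipse(box, outline=0, width=2)
    elif shape == "square":
        d.rectangle(box, outline=0, width=2)
    elif shape == "triangle":
        d.polygon([(size // 2, 12), (12, size - 12), (size - 12, size - 12)],
                  outline=0)
    return np.asarray(img, dtype=np.float32) / 255.0   # scale to [0, 1]

# A tiny dataset: one panel per shape.
dataset = np.stack([draw_panel(s) for s in ["circle", "square", "triangle"]])
print(dataset.shape)  # (3, 64, 64)
```

A real generator would also vary attributes such as position, size, count, and line style, so that row-wise rules can be composed into full 3x3 problems.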

Creating the autoencoders

We needed to create the autoencoders so that we could train a neural network to learn latent features of the Raven's problems. Then, every time we need to solve a problem and choose an image, we can encode the problem images, derive the relevant vectors, and pick the answer.

Enough talking about autoencoders; what is an autoencoder, anyway?

Autoencoders are surprisingly simple neural architectures. They are basically a form of compression, similar to the way an audio file is compressed using MP3, or an image file is compressed using JPEG.

Figure 4: Schematic of a simple autoencoder.
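
For concreteness, here is what a minimal dense autoencoder could look like in Keras; the layer sizes and the 32-dimensional bottleneck are illustrative choices, not our exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 32  # size of the compressed representation (the bottleneck)

# Encoder: flatten a 64x64 panel and compress it into latent_dim numbers.
encoder = models.Sequential([
    layers.Input(shape=(64, 64)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(latent_dim, activation="relu"),
])

# Decoder: expand the latent vector back into a 64x64 image.
decoder = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(64 * 64, activation="sigmoid"),
    layers.Reshape((64, 64)),
])

autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# autoencoder.fit(dataset, dataset, epochs=50, batch_size=32)
```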

In our project, we used four different autoencoders: a simple autoencoder, a convolutional autoencoder, a variational autoencoder, and a variational autoencoder with convolutional layers.

Let’s compare the four of them:

  1. The simple autoencoder: easy to train, fast, and stable, but not well suited to images and poor at encoding/decoding the data.
  2. The convolutional autoencoder: good for working with images, but slower to train, and it lacks the features of variational autoencoders, such as the ability to produce disentangled latent representations and to generate new data.
  3. The variational autoencoder: like the simple autoencoder, it is easy to train, fast, and stable; it is also known to produce more disentangled latent representations and can generate new data. However, its main disadvantage is the absence of convolutional layers, which are useful for working with images.
  4. The variational autoencoder with convolutional layers: it has the advantages of both the convolutional autoencoder and the variational autoencoder. However, it is really slow to train, and also less stable and predictable. This was especially noticeable when we added one dense layer and the accuracy dropped significantly.

So why did we settle on the variational autoencoder with convolutional layers?

We chose it because it is the most suitable for working with images: since it has properties of both the CAE and the VAE, it should in theory produce less entangled representations while being able to generate new data from the latent space. Also, in our experiments, this network produced the best results when we applied transformation vectors to the original images.
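
As a sketch of what this could look like in Keras (illustrative layer sizes, not our exact architecture): the encoder predicts a mean and log-variance for each latent dimension, a sampling layer draws from that Gaussian using the reparameterization trick, and a convolutional decoder maps the sample back to an image.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 32

class Sampling(layers.Layer):
    """Reparameterization trick: sample z ~ N(mean, exp(log_var))."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        eps = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

# Convolutional encoder: 64x64x1 panel -> parameters of the latent Gaussian.
inputs = layers.Input(shape=(64, 64, 1))
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Flatten()(x)
z_mean = layers.Dense(latent_dim)(x)
z_log_var = layers.Dense(latent_dim)(x)
z = Sampling()([z_mean, z_log_var])
encoder = models.Model(inputs, [z_mean, z_log_var, z])

# Convolutional decoder: latent vector -> 64x64x1 image.
latent_in = layers.Input(shape=(latent_dim,))
x = layers.Dense(16 * 16 * 64, activation="relu")(latent_in)
x = layers.Reshape((16, 16, 64))(x)
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
outputs = layers.Conv2DTranspose(1, 3, padding="same", activation="sigmoid")(x)
decoder = models.Model(latent_in, outputs)

# Training minimizes reconstruction error plus a KL-divergence term that
# keeps the latent space smooth; the custom training loop is omitted here.
```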

Generating the images

Finally, using our autoencoder, we obtain the latent vectors from which we can generate the correct compressed images, and then decompress them to get the final images that solve the Raven's problem.

Figure 5: The original answer.

The picture above shows the correct answer to the Raven's problem, i.e., the target that our autoencoder should predict accurately.

Figure 6: The predicted image.

The picture above shows our prediction of the correct image. It was obtained by applying the transformation vector to the encoding of a given image (usually the one in the third row, first column) and then using the decoder to reconstruct the figure.
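
Exactly how the transformation vector is estimated depends on the problem; the sketch below assumes the simplest A : B :: C : ? analogy and reuses the convolutional VAE's `encoder`/`decoder` pair and the `draw_panel` helper from earlier. It is illustrative, not our exact pipeline.

```python
import numpy as np

def encode(panel):
    """Map a (64, 64) panel to its latent mean with the trained encoder."""
    z_mean, _, _ = encoder.predict(panel[np.newaxis, ..., np.newaxis], verbose=0)
    return z_mean[0]

# Example panels standing in for cells of the matrix.
a = draw_panel("circle")   # row 1, column 1
b = draw_panel("square")   # row 1, column 3
c = draw_panel("circle")   # row 3, column 1

# The transformation vector is the latent difference between B and A;
# adding it to C's encoding predicts the missing panel, and the decoder
# turns that latent vector back into an image.
transform = encode(b) - encode(a)
predicted_panel = decoder.predict((encode(c) + transform)[np.newaxis, :],
                                  verbose=0)[0]
```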

Figure 7: The generated image after decompression.

In the image above, we see the final image, compressed and then decompressed, predicted by our variational autoencoder with convolutional layers. It is indeed the correct image that solves the Raven's problem!

Results

We were able to solve Raven's matrices successfully: we generated our own dataset, created autoencoders to produce vectors in the latent space, and then generated and decompressed the images to obtain the correct answers.

Hence, we were able to teach the machine a new way to mimic human intelligence and to solve problems such as Raven's matrices.

Project Challenges

In this project, we have faced many challenges such as:

  1. Using 3x3 problems made the process of deriving the transformation vectors incredibly hard.
  2. The GANs were a nightmare to develop. Since we wanted to make our project as authentic as possible, we tried to create our own GAN, and we failed. Why? GANs are known to be highly unstable, and we could not train ours properly because it always predicted one class 100% correctly and the other 100% incorrectly.
  3. The 3x3 problems offer only a few regular patterns to learn from.
  4. Another challenge was that we initially made the bottleneck too narrow, and as a result the latent representations were highly entangled.
  5. Finally, the encoder/decoder architecture was not ideal, since the decompressed images did not look like the originals.

Contribution

Mohga: Researching GANs, latent space, Raven's matrices, autoencoders, and Raven datasets; implementation of GANs; presentation; blog content.

Daria: Researching autoencoders, datasets, Raven's matrices, and GANs; implementation of autoencoders; presentation; blog content.

References

  1. Mikołaj Małkiński, Jacek Mańdziuk. Multi-Label Contrastive Learning for Abstract Visual Reasoning. https://arxiv.org/pdf/2012.01944.pdf
  2. Tianyu Hua, Maithilee Kunda. Modeling Gestalt Visual Reasoning on the Raven's Progressive Matrices Intelligence Test Using Generative Image Inpainting Techniques. https://arxiv.org/pdf/1911.07736.pdf
  3. Chi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu. RAVEN: A Dataset for Relational and Analogical Visual rEasoNing. https://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_RAVEN_A_Dataset_for_Relational_and_Analogical_Visual_REasoNing_CVPR_2019_paper.pdf
  4. Maithilee Kunda. AI, visual imagery, and a case study on the challenges posed by human intelligence tests. https://www.pnas.org/content/pnas/117/47/29390.full.pdf
  5. Matthew Stewart. Comprehensive Introduction to Autoencoders. https://towardsdatascience.com/generating-images-with-autoencoders-77fd3a8dd368


Mohga Emam

Hey, I’m Mohga Emam, a computer science master’s student at the University of Tartu, Estonia, and a data analyst project reviewer and mentor at Udacity.