AI Creativity Flashcards

AI Creativity Case Study

1
Q

What are the three types of art described in the taxonomy of computer-generated art?

A

C-art, G-art, and CG-art

2
Q

What is C-art?

A

Art that uses computers as part of the art-making process

3
Q

What is G-art?

A

Art that is generated, at least in part, by some process not under the artist’s direct control

4
Q

What is CG-art?

A

Art produced by leaving a computer program to run by itself, with minimal or zero human interference

5
Q

What are passive tools in the context of generative systems?

A

Tools that make no attempt to alter the user’s input, e.g., Microsoft Paint

6
Q

What are active tools in the context of generative systems?

A

Tools that actively interpret and process the user’s input, adding to it, e.g., the sketch pad mentioned in the lecture

7
Q

What is a first-order Markov model in text generation?

A

A model where the next state depends only on the current state (word)

8
Q

What is a second-order Markov model in text generation?

A

A model where the next state depends on the two previous states (words)

9
Q

What are the three parts of the example AI system for generating pop music?

A
1. Lyric generator using the GPT-2 transformer model
2. Music generator using the Music-VAE autoencoder model
3. Singing voice synthesis using the DiffSinger model

10
Q

What is the basic process for generating text using a Markov model?

A
1. Pick a random initial state
2. Select from the possible next states
3. If there is no possible next state, go back to step 1
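
A minimal Python sketch of this loop as a first-order Markov chain (the corpus, names, and output length are illustrative, not from the lecture):

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the dog sat on the rug".split()

# First-order transition table: current word -> list of observed next words.
transitions = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current].append(nxt)

def generate(n_words=12):
    state = random.choice(corpus)                      # 1. pick a random initial state
    output = [state]
    while len(output) < n_words:
        if state not in transitions:                   # 3. no possible next state: restart
            state = random.choice(corpus)
        else:
            state = random.choice(transitions[state])  # 2. select a possible next state
        output.append(state)
    return " ".join(output)

print(generate())
```

A second-order model would key the transition table on pairs of consecutive words instead of single words.
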
11
Q

Name three examples of more complex generative text models.

A

Variable order Markov model, Long short-term memory network (LSTM), Transformer network

12
Q

What are some features to consider when describing a generative system?

A

System architecture, Number of agents, Roles, Environment, Corpus, Input, Output, Communication, Human interactive modality, Task, Evaluation

13
Q

How does a second-order Markov model differ from a first-order model?

A

It considers the two previous states (words) instead of just the current state

14
Q

What is the main difference between more complex generative models and simple Markov models?

A

They have a more complex method for picking the next state/output, including more complex state representations

15
Q

In the context of generative systems, what is a corpus?

A

The collection of data (e.g., text, images, music) that the system uses to learn and generate new content

16
Q

What is the main advantage that transformers add to language models?

A

They add context awareness to embeddings, allowing the model to combine contextual and sequential information

17
Q

How does self-attention work in transformer networks?

A

It creates a new type of embedding that incorporates information about other words in the context, not just the current word
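
A toy numpy sketch of scaled dot-product self-attention; the random matrices and dimensions are stand-ins for learned weights and real embeddings:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project each word's embedding into a query, key, and value vector.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Score every word against every other word, scaled by the key dimension.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
    # Each output row is a weighted mix of all the value vectors, so the new
    # embedding carries information about the other words in the context.
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                        # 5 words, 8-dimensional embeddings
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]
print(self_attention(X, Wq, Wk, Wv).shape)         # (5, 8): one context-aware vector per word
```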

18
Q

What are the two main ways of encoding input in recurrent neural networks?

A
1. One-hot encoded vectors
2. Embeddings
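
A small sketch contrasting the two encodings (the vocabulary and dimensions are invented for illustration):

```python
import numpy as np

vocab = ["the", "cat", "sat"]
index = {word: i for i, word in enumerate(vocab)}

# One-hot: a sparse vector the size of the vocabulary with a single 1.
one_hot = np.eye(len(vocab))[index["cat"]]                    # [0., 1., 0.]

# Embedding: a dense, learned vector looked up from a (vocab x dim) table.
embedding_table = np.random.default_rng(0).normal(size=(len(vocab), 4))
embedding = embedding_table[index["cat"]]                     # 4 learned values

print(one_hot, embedding)
```
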
19
Q

What is the ‘bag of words’ approach good for, and what is its limitation?

A

It works well for tasks like sentiment analysis, but poorly for generating text because it ignores word order
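
A quick illustration with scikit-learn's CountVectorizer (used here purely as an example tool, not one named in the lecture), showing how word order is discarded:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the dog bit the man", "the man bit the dog"]
counts = CountVectorizer().fit_transform(docs)

# Both sentences yield identical count vectors: the word order that
# distinguishes them is discarded, which is why the approach suits
# classification tasks better than text generation.
print(counts.toarray())
```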

20
Q

When was the transformer architecture first reported?

A

2017

21
Q

What are the two main components that transformers model?

A

Sequence and context via self-attention

22
Q

How many parameters does GPT-2 have?

A

1.5 billion parameters

23
Q

How many layers does GPT-2 have?

A

48 layers

24
Q

What is meant by ‘zero-shot’ concept in relation to GPT-2?

A

GPT-2 can perform tasks it wasn’t specifically trained for, outperforming some specialized models

25
Q

What is Huggingface?

A

A platform on a mission to democratize good machine learning, providing tools and models for NLP

26
Q

Why was GPT-2 initially considered ‘too dangerous to release’?

A

Due to concerns about potential malicious applications of the technology

27
Q

What is the ‘auto-regressive mode’ of GPT-2?

A

It can generate an endless stream of words based on previous output

28
Q

How much text data was GPT-2 trained on?

A

40 GB of text

29
Q

What are ‘attention heads’ in the context of transformers?

A

Multiple projections of attention, allowing the model to focus on different aspects of the input simultaneously

30
Q

What is the main advantage of using pre-trained models like those from Huggingface?

A

They allow for quick implementation and fine-tuning of state-of-the-art language models without training from scratch
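
A minimal sketch using the Huggingface transformers pipeline; the 'gpt2' checkpoint name, prompt, and sampling settings are illustrative:

```python
from transformers import pipeline

# Downloads the pre-trained GPT-2 weights on first use; no training from scratch.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Under neon skies we dance,",   # prompt text (purely illustrative)
    max_length=60,
    do_sample=True,
    temperature=0.9,
)
print(result[0]["generated_text"])
```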

31
Q

What is a latent space in the context of machine learning?

A

A compact way of describing a dataset, representing the learned statistical structure through a reversible dimension reduction technique

32
Q

How does a variational autoencoder (VAE) differ from a standard autoencoder in terms of latent space representation?

A

VAEs encode to parameters of a statistical distribution (mean and variance), while standard autoencoders encode to a vector
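
A sketch of the difference at the encoder output using the standard reparameterization trick; the dimensions and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 256                        # e.g. a Music-VAE-sized latent space

# Standard autoencoder: the encoder outputs a single latent vector directly.
z_plain = rng.normal(size=LATENT_DIM)   # stands in for the encoder's output

# VAE: the encoder outputs the parameters of a distribution (mean and
# log-variance) and the latent vector is *sampled* from it.
mu = rng.normal(size=LATENT_DIM)
log_var = rng.normal(size=LATENT_DIM)
eps = rng.normal(size=LATENT_DIM)
z_vae = mu + np.exp(0.5 * log_var) * eps    # z ~ N(mu, sigma^2)
```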

33
Q

What is the main advantage of using a VAE over a standard autoencoder for generative tasks?

A

VAEs force all areas of the latent space to be meaningfully decodable, making them more useful for generative exploration

34
Q

What does the ‘recurrent’ part in Music-VAE refer to?

A

It uses recurrent neural network components, specifically LSTM (Long Short-Term Memory) units

35
Q

How much training data was used for Music-VAE?

A

1.5 million MIDI files from the web

36
Q

What is the dimension of the latent vector in Music-VAE?

A

256 or 512 dimensions

37
Q

What are the three main components of Music-VAE’s architecture?

A

Encoder (Bidirectional LSTM), Latent Space, and Hierarchical Decoder

38
Q

How does Music-VAE represent musical sequences?

A

It encodes 2- or 16-bar musical sequences, including pitches, durations, and timing, potentially with multiple parts (e.g., bass, melody, drums)

39
Q

What is the ‘unrelated fragments problem’ in music generation?

A

When sampling from latent space, generated sequences may not have coherent long-term structure

40
Q

How can one explore the latent space of a trained Music-VAE model?

A

By sampling random vectors, permuting existing vectors, or making subtle changes to vectors in the latent space
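
A hedged sketch of these manipulations on a generic decoder; `decode` is a placeholder for whatever turns a latent vector back into music (e.g., a trained Music-VAE):

```python
import numpy as np

rng = np.random.default_rng(42)
LATENT_DIM = 256                          # Music-VAE latents are 256 or 512 dimensions

def decode(z):
    """Placeholder for the trained model's decoder."""
    ...

z = rng.normal(size=LATENT_DIM)           # sample a completely random latent vector
decode(z)

z_nearby = z + 0.05 * rng.normal(size=LATENT_DIM)   # make a subtle change to it
decode(z_nearby)

z_other = rng.normal(size=LATENT_DIM)     # or walk smoothly between two points
for t in np.linspace(0.0, 1.0, 5):
    decode((1 - t) * z + t * z_other)
```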

41
Q

What is self-supervised learning in the context of Music-VAE?

A

The model is trained to reproduce its inputs via encoding and decoding, without requiring separate labeled data

42
Q

Why are VAEs particularly useful for creative applications?

A

They allow for smooth interpolation between points in the latent space, enabling the generation of new, coherent samples

43
Q

What is the process of ‘permuting the sample’ in latent space exploration?

A

Generating a latent vector, saving it, reading it back, and then making alterations to explore variations

44
Q

How does Music-VAE handle multiple instrument tracks?

A

It can encode and decode sequences with multiple parts, such as bass, melody, and drums

45
Q

What is the advantage of using a hierarchical decoder in Music-VAE?

A

It ensures better use of the latent representation, allowing for more coherent long-term structure in generated sequences

46
Q

What are the three primary challenges in singing voice synthesis?

A
1. Timing
2. Expressiveness
3. Holding long notes

47
Q

What was the basis for HAL 9000’s singing voice in ‘2001: A Space Odyssey’?

A

A tube model of the vocal tract, arranged by Joan Miller and Max Mathews

48
Q

What is ‘Pink Trombone’ in the context of voice synthesis?

A

A physical model of the vocal tract for speech synthesis

49
Q

What technique was used in the ‘Speak and Spell’ toy from the 1970s?

A

LPC (Linear Predictive Coding)

50
Q

What was a significant development in voice synthesis during the 1990s?

A

Realtime synthesis and the use of HMM (Hidden Markov Model) systems

51
Q

Name three recent deep learning-based singing voice synthesis systems.

A
1. XiaoiceSing (2020)
2. HiFiSinger (2020)
3. DiffSinger (2021)

52
Q

What are the main components of the DiffSinger system?

A

Language Model, Speech Model, and Vocoder Model

53
Q

What is the purpose of the Language Model in DiffSinger?

A

To convert text input into phonetic representations

54
Q

What does the Speech Model in DiffSinger produce?

A

Mel (spectral) features

55
Q

What is the role of the Vocoder in DiffSinger?

A

To convert the Mel features into actual sound waves

56
Q

What dataset was used in the LJSpeech model mentioned in the notes?

A

A public domain speech dataset of 13,100 short audio clips from a single speaker reading non-fiction books

57
Q

What are Mel features in the context of speech synthesis?

A

Spectral frames: short slices of audio transformed into a spectral (mel-scaled) representation, conceptually similar to an embedding

58
Q

Name three key terms used in modern speech synthesis systems.

A

Embedding, Convolutional, Transformer

59
Q

What was a significant development in voice synthesis during the 2000s?

A

Vocaloid and concatenative spectral synthesis

60
Q

How do deep learning models in the 2010s-20s differ from earlier voice synthesis approaches?

A

They model sound directly at the audio level rather than using symbolic models

61
Q

What are the main steps in putting together a complete AI music generation system according to the lecture notes?

A
1. Generate lyrics
2. Generate music
3. Extract melody
4. Sing
5. Mix the singing with the backing

62
Q

What tool is mentioned for generating lyrics in the AI music system?

A

GPT-2

63
Q

What Python library is used to extract melody from a MIDI file?

A

mido

64
Q

What is a challenge in extracting melody from a MIDI file?

A

There is no standard MIDI channel for melody, unlike percussion which is always channel 10

65
Q

How is timing handled when extracting notes from a MIDI file?

A

By accumulating the time values of each message and printing the cumulative time with each note
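
A short mido sketch of that accumulation step; the file path is a placeholder and melody-channel filtering is left out:

```python
import mido

mid = mido.MidiFile("song.mid")           # placeholder path

elapsed = 0.0
for msg in mid:                           # iterating a MidiFile yields delta times in seconds
    elapsed += msg.time                   # accumulate each message's time value
    if msg.type == "note_on" and msg.velocity > 0:
        print(f"{elapsed:.3f}s  note={msg.note}  channel={msg.channel}")
```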

66
Q

What format does the singing synthesis program expect for note input?

A

A list of notes with repetitions indicating duration, e.g. ‘c,g,eee’
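
A tiny hypothetical helper (not from the lecture) that expands (pitch, beats) pairs into that repeated-letter format:

```python
def to_note_list(notes):
    """Expand (pitch, beats) pairs, e.g. [("c", 1), ("g", 1), ("e", 3)] -> "c,g,eee"."""
    return ",".join(pitch * beats for pitch, beats in notes)

print(to_note_list([("c", 1), ("g", 1), ("e", 3)]))   # c,g,eee
```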

67
Q

Name three tools mentioned for mixing the singing with the backing track.

A

Reaper, Audacity, and librosa

68
Q

Who created Reaper, and what other famous software did they create?

A

Justin Frankel, who also created Winamp

69
Q

What is Audacity?

A

A free and open-source audio editor

70
Q

What are the basic steps for mixing audio using librosa?

A
1. Load the files
2. Normalize/set levels
3. Pad the shorter track with zeros
4. Add the tracks together
5. Write to disk
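
A minimal sketch of these steps with librosa and soundfile; the file names and gain levels are placeholders:

```python
import numpy as np
import librosa
import soundfile as sf

# 1. Load the files at a common sample rate.
vocals, sr = librosa.load("singing.wav", sr=22050)
backing, _ = librosa.load("backing.wav", sr=22050)

# 2. Normalize / set levels.
vocals = librosa.util.normalize(vocals) * 0.8
backing = librosa.util.normalize(backing) * 0.6

# 3. Pad the shorter track with zeros so both have the same length.
length = max(len(vocals), len(backing))
vocals = np.pad(vocals, (0, length - len(vocals)))
backing = np.pad(backing, (0, length - len(backing)))

# 4. Add the tracks together, then 5. write the mix to disk.
sf.write("mix.wav", vocals + backing, sr)
```
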
71
Q

What file format is mentioned for the final mixed output?

A

WAV (mix.wav)

72
Q

What popular chord progression is mentioned as an example for music generation?

A

D, A, Bm, G (a popular progression, featured in ‘4 Chords’ by Axis of Awesome)

73
Q

How is the quality of the generated melody assessed in the described system?

A

By listening to it and deciding whether to proceed or keep sampling (note: the lecture mentions this could potentially be automated)

74
Q

What Python script is mentioned for converting MIDI to a note list?

A

midi_to_note_list.py