AI Creativity Flashcards

AI Creativity Case Study

1
Q

What are the three types of art described in the taxonomy of computer-generated art?

A

C-art, G-art, and CG-art

2
Q

What is C-art?

A

Art that uses computers as part of the art-making process

3
Q

What is G-art?

A

Art that is generated, at least in part, by some process not under the artist’s direct control

4
Q

What is CG-art?

A

Art produced by leaving a computer program to run by itself, with minimal or zero human interference

5
Q

What are passive tools in the context of generative systems?

A

Tools that make no attempt to alter the user’s input, e.g., Microsoft Paint

6
Q

What are active tools in the context of generative systems?

A

Tools that actively interpret and process the user’s input, adding to it, e.g., the sketch pad mentioned in the lecture

7
Q

What is a first-order Markov model in text generation?

A

A model where the next state depends only on the current state (word)

8
Q

What is a second-order Markov model in text generation?

A

A model where the next state depends on the two previous states (words)

9
Q

What are the three parts of the example AI system for generating pop music?

A
1. Lyric generator using the GPT-2 transformer model
2. Music generator using the Music-VAE autoencoder model
3. Singing voice synthesis using the DiffSinger model

10
Q

What is the basic process for generating text using a Markov model?

A
1. Pick a random initial state
2. Select from the possible next states
3. If there is no possible next state, go back to step 1
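
A minimal Python sketch of this loop as a first-order Markov chain (the corpus, names, and output length are illustrative, not from the lecture):

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the dog sat on the rug".split()

# First-order transition table: current word -> list of observed next words.
transitions = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current].append(nxt)

def generate(n_words=12):
    state = random.choice(corpus)                      # 1. pick a random initial state
    output = [state]
    while len(output) < n_words:
        if state not in transitions:                   # 3. no possible next state: restart
            state = random.choice(corpus)
        else:
            state = random.choice(transitions[state])  # 2. select a possible next state
        output.append(state)
    return " ".join(output)

print(generate())
```

A second-order model would key the transition table on pairs of consecutive words instead of single words.
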
11
Q

Name three examples of more complex generative text models.

A

Variable order Markov model, Long short-term memory network (LSTM), Transformer network

12
Q

What are some features to consider when describing a generative system?

A

System architecture, Number of agents, Roles, Environment, Corpus, Input, Output, Communication, Human interactive modality, Task, Evaluation

13
Q

How does a second-order Markov model differ from a first-order model?

A

It considers the two previous states (words) instead of just the current state

14
Q

What is the main difference between more complex generative models and simple Markov models?

A

They have a more complex method for picking the next state/output, including more complex state representations

15
Q

In the context of generative systems, what is a corpus?

A

The collection of data (e.g., text, images, music) that the system uses to learn and generate new content

16
Q

What is the main advantage that transformers add to language models?

A

They add context awareness to embeddings, allowing the model to combine contextual and sequential information

17
Q

How does self-attention work in transformer networks?

A

It creates a new type of embedding that incorporates information about other words in the context, not just the current word
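
A toy numpy sketch of scaled dot-product self-attention; the random matrices and dimensions are stand-ins for learned weights and real embeddings:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project each word's embedding into a query, key, and value vector.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Score every word against every other word, scaled by the key dimension.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
    # Each output row is a weighted mix of all the value vectors, so the new
    # embedding carries information about the other words in the context.
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                        # 5 words, 8-dimensional embeddings
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]
print(self_attention(X, Wq, Wk, Wv).shape)         # (5, 8): one context-aware vector per word
```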

18
Q

What are the two main ways of encoding input in recurrent neural networks?

A
1. One-hot encoded vectors
2. Embeddings
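
A small sketch contrasting the two encodings (the vocabulary and dimensions are invented for illustration):

```python
import numpy as np

vocab = ["the", "cat", "sat"]
index = {word: i for i, word in enumerate(vocab)}

# One-hot: a sparse vector the size of the vocabulary with a single 1.
one_hot = np.eye(len(vocab))[index["cat"]]                    # [0., 1., 0.]

# Embedding: a dense, learned vector looked up from a (vocab x dim) table.
embedding_table = np.random.default_rng(0).normal(size=(len(vocab), 4))
embedding = embedding_table[index["cat"]]                     # 4 learned values

print(one_hot, embedding)
```
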
19
Q

What is the ‘bag of words’ approach good for, and what is its limitation?

A

It works well for tasks like sentiment analysis, but poorly for generating text because it ignores word order
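
A quick illustration with scikit-learn's CountVectorizer (used here purely as an example tool, not one named in the lecture), showing how word order is discarded:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the dog bit the man", "the man bit the dog"]
counts = CountVectorizer().fit_transform(docs)

# Both sentences yield identical count vectors: the word order that
# distinguishes them is discarded, which is why the approach suits
# classification tasks better than text generation.
print(counts.toarray())
```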

20
Q

When was the transformer architecture first reported?

A

2017

21
Q

What are the two main components that transformers model?

A

Sequence and context via self-attention

22
Q

How many parameters does GPT-2 have?

A

1.5 billion parameters

23
Q

How many layers does GPT-2 have?

A

48 layers

24
Q

What is meant by ‘zero-shot’ concept in relation to GPT-2?

A

GPT-2 can perform tasks it wasn’t specifically trained for, outperforming some specialized models

25
Q

What is Huggingface?

A

A platform on a mission to democratize good machine learning, providing tools and models for NLP

26
Q

Why was GPT-2 initially considered ‘too dangerous to release’?

A

Due to concerns about potential malicious applications of the technology

27
Q

What is the ‘auto-regressive mode’ of GPT-2?

A

It can generate an endless stream of words based on previous output

28
Q

How much text data was GPT-2 trained on?

A

40 GB of text

29
Q

What are ‘attention heads’ in the context of transformers?

A

Multiple projections of attention, allowing the model to focus on different aspects of the input simultaneously

30
Q

What is the main advantage of using pre-trained models like those from Huggingface?

A

They allow for quick implementation and fine-tuning of state-of-the-art language models without training from scratch
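
A minimal sketch using the Huggingface transformers pipeline; the 'gpt2' checkpoint name, prompt, and sampling settings are illustrative:

```python
from transformers import pipeline

# Downloads the pre-trained GPT-2 weights on first use; no training from scratch.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Under neon skies we dance,",   # prompt text (purely illustrative)
    max_length=60,
    do_sample=True,
    temperature=0.9,
)
print(result[0]["generated_text"])
```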

31
Q

What is a latent space in the context of machine learning?

A

A compact way of describing a dataset, representing the learned statistical structure through a reversible dimension reduction technique

32
Q

How does a variational autoencoder (VAE) differ from a standard autoencoder in terms of latent space representation?

A

VAEs encode to parameters of a statistical distribution (mean and variance), while standard autoencoders encode to a vector
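
A sketch of the difference at the encoder output using the standard reparameterization trick; the dimensions and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 256                        # e.g. a Music-VAE-sized latent space

# Standard autoencoder: the encoder outputs a single latent vector directly.
z_plain = rng.normal(size=LATENT_DIM)   # stands in for the encoder's output

# VAE: the encoder outputs the parameters of a distribution (mean and
# log-variance) and the latent vector is *sampled* from it.
mu = rng.normal(size=LATENT_DIM)
log_var = rng.normal(size=LATENT_DIM)
eps = rng.normal(size=LATENT_DIM)
z_vae = mu + np.exp(0.5 * log_var) * eps    # z ~ N(mu, sigma^2)
```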

33
Q

What is the main advantage of using a VAE over a standard autoencoder for generative tasks?

A

VAEs force all areas of the latent space to be meaningfully decodable, making them more useful for generative exploration

34
Q

What does the ‘recurrent’ part in Music-VAE refer to?

A

It uses recurrent neural network components, specifically LSTM (Long Short-Term Memory) units

35
Q

How much training data was used for Music-VAE?

A

1.5 million MIDI files from the web

36
Q

What is the dimension of the latent vector in Music-VAE?

A

256 or 512 dimensions

37
Q

What are the three main components of Music-VAE’s architecture?

A

Encoder (Bidirectional LSTM), Latent Space, and Hierarchical Decoder

38
Q

How does Music-VAE represent musical sequences?

A

It encodes 2- or 16-bar musical sequences, including pitches, durations, and timing, potentially with multiple parts (e.g., bass, melody, drums)

39
Q

What is the ‘unrelated fragments problem’ in music generation?

A

When sampling from latent space, generated sequences may not have coherent long-term structure

40
Q

How can one explore the latent space of a trained Music-VAE model?

A

By sampling random vectors, permuting existing vectors, or making subtle changes to vectors in the latent space
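
A hedged sketch of these manipulations on a generic decoder; `decode` is a placeholder for whatever turns a latent vector back into music (e.g., a trained Music-VAE):

```python
import numpy as np

rng = np.random.default_rng(42)
LATENT_DIM = 256                          # Music-VAE latents are 256 or 512 dimensions

def decode(z):
    """Placeholder for the trained model's decoder."""
    ...

z = rng.normal(size=LATENT_DIM)           # sample a completely random latent vector
decode(z)

z_nearby = z + 0.05 * rng.normal(size=LATENT_DIM)   # make a subtle change to it
decode(z_nearby)

z_other = rng.normal(size=LATENT_DIM)     # or walk smoothly between two points
for t in np.linspace(0.0, 1.0, 5):
    decode((1 - t) * z + t * z_other)
```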

41
Q

What is self-supervised learning in the context of Music-VAE?

A

The model is trained to reproduce its inputs via encoding and decoding, without requiring separate labeled data

42
Q

Why are VAEs particularly useful for creative applications?

A

They allow for smooth interpolation between points in the latent space, enabling the generation of new, coherent samples

43
Q

What is the process of ‘permuting the sample’ in latent space exploration?

A

Generating a latent vector, saving it, reading it back, and then making alterations to explore variations

44
Q

How does Music-VAE handle multiple instrument tracks?

A

It can encode and decode sequences with multiple parts, such as bass, melody, and drums

45
Q

What is the advantage of using a hierarchical decoder in Music-VAE?

A

It ensures better use of the latent representation, allowing for more coherent long-term structure in generated sequences

46
Q

What are the three primary challenges in singing voice synthesis?

A
1. Timing
2. Expressiveness
3. Holding long notes

47
Q

What was the basis for HAL 9000’s singing voice in ‘2001: A Space Odyssey’?

A

A tube model of the vocal tract, arranged by Joan Miller and Max Mathews

48
Q

What is ‘Pink Trombone’ in the context of voice synthesis?

A

A physical model of the vocal tract for speech synthesis

49
Q

What technique was used in the ‘Speak and Spell’ toy from the 1970s?

A

LPC (Linear Predictive Coding)

50
Q

What was a significant development in voice synthesis during the 1990s?

A

Realtime synthesis and the use of HMM (Hidden Markov Model) systems

51
Q

Name three recent deep learning-based singing voice synthesis systems.

A
1. XiaoiceSing (2020)
2. HiFiSinger (2020)
3. DiffSinger (2021)

52
Q

What are the main components of the DiffSinger system?

A

Language Model, Speech Model, and Vocoder Model

53
Q

What is the purpose of the Language Model in DiffSinger?

A

To convert text input into phonetic representations

54
Q

What does the Speech Model in DiffSinger produce?

A

Mel (spectral) features

55
Q

What is the role of the Vocoder in DiffSinger?

A

To convert the Mel features into actual sound waves

56
Q

What dataset was used in the LJSpeech model mentioned in the notes?

A

A public domain speech dataset of 13,100 short audio clips from a single speaker reading non-fiction books

57
Q

What are Mel features in the context of speech synthesis?

A

Spectral frames: short slices of audio transformed into a spectral (mel-scaled) representation, conceptually similar to an embedding

58
Q

Name three key terms used in modern speech synthesis systems.

A

Embedding, Convolutional, Transformer

59
Q

What was a significant development in voice synthesis during the 2000s?

A

Vocaloid and concatenative spectral synthesis

60
Q

How do deep learning models in the 2010s-20s differ from earlier voice synthesis approaches?

A

They model sound directly at the audio level rather than using symbolic models

61
Q

What are the main steps in putting together a complete AI music generation system according to the lecture notes?

A
1. Generate lyrics
2. Generate music
3. Extract melody
4. Sing
5. Mix the singing with the backing

62
Q

What tool is mentioned for generating lyrics in the AI music system?

A

GPT-2

63
Q

What Python library is used to extract melody from a MIDI file?

A

mido

64
Q

What is a challenge in extracting melody from a MIDI file?

A

There is no standard MIDI channel for melody, unlike percussion which is always channel 10

65
Q

How is timing handled when extracting notes from a MIDI file?

A

By accumulating the time values of each message and printing the cumulative time with each note
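
A short mido sketch of that accumulation step; the file path is a placeholder and melody-channel filtering is left out:

```python
import mido

mid = mido.MidiFile("song.mid")           # placeholder path

elapsed = 0.0
for msg in mid:                           # iterating a MidiFile yields delta times in seconds
    elapsed += msg.time                   # accumulate each message's time value
    if msg.type == "note_on" and msg.velocity > 0:
        print(f"{elapsed:.3f}s  note={msg.note}  channel={msg.channel}")
```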

66
Q

What format does the singing synthesis program expect for note input?

A

A list of notes with repetitions indicating duration, e.g. ‘c,g,eee’
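
A tiny hypothetical helper (not from the lecture) that expands (pitch, beats) pairs into that repeated-letter format:

```python
def to_note_list(notes):
    """Expand (pitch, beats) pairs, e.g. [("c", 1), ("g", 1), ("e", 3)] -> "c,g,eee"."""
    return ",".join(pitch * beats for pitch, beats in notes)

print(to_note_list([("c", 1), ("g", 1), ("e", 3)]))   # c,g,eee
```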

67
Q

Name three tools mentioned for mixing the singing with the backing track.

A

Reaper, Audacity, and librosa

68
Q

Who created Reaper, and what other famous software did they create?

A

Justin Frankel, who also created Winamp

69
Q

What is Audacity?

A

A free and open-source audio editor

70
Q

What are the basic steps for mixing audio using librosa?

A
1. Load the files
2. Normalize/set levels
3. Pad the shorter track with zeros
4. Add the tracks together
5. Write to disk
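
A minimal sketch of these steps with librosa and soundfile; the file names and gain levels are placeholders:

```python
import numpy as np
import librosa
import soundfile as sf

# 1. Load the files at a common sample rate.
vocals, sr = librosa.load("singing.wav", sr=22050)
backing, _ = librosa.load("backing.wav", sr=22050)

# 2. Normalize / set levels.
vocals = librosa.util.normalize(vocals) * 0.8
backing = librosa.util.normalize(backing) * 0.6

# 3. Pad the shorter track with zeros so both have the same length.
length = max(len(vocals), len(backing))
vocals = np.pad(vocals, (0, length - len(vocals)))
backing = np.pad(backing, (0, length - len(backing)))

# 4. Add the tracks together, then 5. write the mix to disk.
sf.write("mix.wav", vocals + backing, sr)
```
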
71
Q

What file format is mentioned for the final mixed output?

A

WAV (mix.wav)

72
Q

What popular chord progression is mentioned as an example for music generation?

A

D, A, Bm, G (a popular progression, featured in ‘4 Chords’ by Axis of Awesome)

73
Q

How is the quality of the generated melody assessed in the described system?

A

By listening to it and deciding whether to proceed or keep sampling (note: the lecture mentions this could potentially be automated)

74
Q

What Python script is mentioned for converting MIDI to a note list?

A

midi_to_note_list.py