Text reproduction with machine learning


In the era of data, intelligence and computing, the authenticity of any digital content is no longer guaranteed. With machine learning technology, a human voice can be imitated, a moving image can be manipulated in real time, and texts can be phrased from raw data, all to make up something “real”. To get a glimpse of what’s going on, we built our own deep learning network! In this workshop, we trained a given neural network on original text to reproduce it, remix it and produce more of it. Sometimes the output was complete rubbish, sometimes the algorithm repeated passages from the original. But it certainly invented or rehashed content based on the given input, so who is faking whom?

This workshop was fun for beginners and pros!

For this workshop we needed:

(1) For beginners, finding out which version is installed, or updating to version 3, is quite a heavy task.

Basic preparation

This workshop requires a few preparations. Please follow the instructions to get started. You can also find this page on hd18.moritzebeling.com.

Install Python 3.6

$ python -V

If that returns a version between 3.3 and 3.6, everything is fine and you don’t need to continue reading this page.
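The same check can be run from inside Python. This is a minimal sketch: the function name is_supported and the two constants are made up for illustration, and the bounds are simply the 3.3 to 3.6 range stated above.

```python
import sys

# Assumed bounds, taken from the text above: 3.3 up to and including 3.6.
MIN_VERSION = (3, 3)
MAX_VERSION = (3, 6)


def is_supported(version_info):
    """Return True if the (major, minor) part lies within 3.3 to 3.6."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = sys.version_info
    label = "{}.{}.{}".format(current[0], current[1], current[2])
    if is_supported(current):
        print("Python " + label + " is fine for this workshop.")
    else:
        print("Python " + label + " is outside the range 3.3 to 3.6.")
```

Run it with both python and python3 to see which interpreter each command starts.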

However, it is possible that it returns 2.x even if you have the desired version installed. To be sure, type

$ python3 -V

You will first have to uninstall any version higher than 3.6.x. If you installed Python from the installer package (I’m sorry!), find Python 3.x in your applications folder, move it to the trash and then carefully type

$ sudo rm -rf /Applications/Python\ {version.number}/

You can find the (now correct) installer on the official website. Confirm by checking for the version again. If everything is fine, you might want to continue with installing Tensorflow.

If you want the command python to run python3 instead of some old version, type

$ alias python=python3

However, an alias typed into the terminal only lasts for the current session. It is gone once you close the window, unless you add the line to your shell profile.

Install Tensorflow 1.8 or 1.9

$ python3 -c 'import tensorflow as tf; print(tf.__version__)'

If that throws an error mentioning invalid syntax, please check your Python version and downgrade.
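For a slightly friendlier check than the one-liner, the following sketch reports whether Tensorflow is missing or has a version other than 1.8 or 1.9. Both function names are hypothetical helpers, not part of Tensorflow. The only Tensorflow attribute used is tf.__version__, as in the command above.

```python
def is_workshop_tensorflow(version_string):
    """Return True for Tensorflow 1.8.x or 1.9.x, the versions named above."""
    parts = version_string.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Not a dotted version number at all.
        return False
    return (major, minor) in ((1, 8), (1, 9))


def installed_tensorflow_version():
    """Return the installed Tensorflow version string, or None if missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__


if __name__ == "__main__":
    version = installed_tensorflow_version()
    if version is None:
        print("Tensorflow is not installed for this Python.")
    elif is_workshop_tensorflow(version):
        print("Tensorflow " + version + " matches the workshop setup.")
    else:
        print("Tensorflow " + version + " is installed, but not 1.8 or 1.9.")
```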

Pip is a Python package manager that lets you install Tensorflow and other software. You will need pip3 with a version >10. Please check your version with:

$ pip3 -V
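The output of that command begins with the word pip followed by the version number. As a small sketch, the major version can be read from such a line like this. The helper names are invented for illustration, the path in the sample line is hypothetical, and treating ">10" as "major version 10 or newer" is an assumption.

```python
import re


def pip_major_version(version_output):
    """Extract the major version from a `pip3 -V` output line, or None."""
    match = re.match(r"pip (\d+)", version_output.strip())
    return int(match.group(1)) if match else None


def pip_is_recent_enough(version_output, minimum_major=10):
    """Assumption: 'version >10' is read as major version 10 or newer."""
    major = pip_major_version(version_output)
    return major is not None and major >= minimum_major


if __name__ == "__main__":
    # Hypothetical sample line; the install path will differ on your machine.
    sample = "pip 18.0 from /some/path/site-packages/pip (python 3.6)"
    print(pip_major_version(sample), pip_is_recent_enough(sample))
```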

The current version is 18, so you might (or will have to) upgrade. Please try one of these:

$ pip3 install --upgrade pip

$ sudo pip3 install --upgrade pip

Then check whether the upgrade was successful by checking for the version again (see above). Then try installing Tensorflow again.

$ pip3 install tensorflow

If that seemed to be successful, confirm the installation by checking for the version (see above). If not, continue with step 2 from this installation guide.

Error "Could not find ..."

This error seems to be quite common. If you run into it, try

$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.9.0-py3-none-any.whl

This command can also be used to upgrade your version of Tensorflow.

Check for the version again to make sure that Python and Tensorflow are working nicely together.

Here you find the official install guides for various platforms.

Some official Tensorflow tutorials to get started with.

Archive of tested Tensorflow models on GitHub.

Try these commands

$ pip uninstall tensorflow

$ pip3 uninstall tensorflow

Install the neural network

Create a working directory

~/my-folder/
   •   other stuff that you may have here
   •   tensorflow/
       •   _model/
           A folder with the machine learning model inside. You don’t have to do anything here.
       •   rnn.py
           The program that trains and plays the neural network. You can open it to adjust parameters, but you can do that later.
       •   my-new-project/
           •   input/
               •   your-input-data.txt
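The project part of this layout can also be created with a few lines of Python. This is only a sketch: create_project is a hypothetical helper, and a temporary directory stands in for the tensorflow/ folder so that nothing on your disk is touched.

```python
import os
import tempfile


def create_project(base_dir, project_name="my-new-project"):
    """Create <base_dir>/<project_name>/input/ and return the input path."""
    input_dir = os.path.join(base_dir, project_name, "input")
    os.makedirs(input_dir, exist_ok=True)
    return input_dir


if __name__ == "__main__":
    # A temporary directory stands in for ~/my-folder/tensorflow/ here.
    with tempfile.TemporaryDirectory() as workdir:
        input_dir = create_project(workdir)
        sample = os.path.join(input_dir, "your-input-data.txt")
        with open(sample, "w", encoding="utf-8") as handle:
            handle.write("Hello")
        print(sorted(os.listdir(input_dir)))
```

To use it for real, pass the path of your own tensorflow/ folder as base_dir and copy your text file into the returned input folder.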

Run your first neural network, let your computer do the work

$ python3 rnn.py




The preview sequence, the loss and accuracy calculations, and the regularly generated text blocks give you an impression of how the training progresses.





Regarding training

sequence_length: 30

   The string length of a training sequence
   If you are training on poetry, where rhyme and the length of lines are really important, increase it a little, e.g. to 40-50.

batch_size: 200

   Training sequences inside one batch (200)
   The size of one batch is then sequence_length*batch_size, which has to be notably lower than the amount of text input that you provide. In other words, bring more text or decrease batch_size.

validation: True

   Whether validation is switched on. Slows down the training process.

epochs: 500

   Number of training epochs.
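To get a feeling for the relation between sequence_length, batch_size and the amount of input text, here is a back-of-the-envelope sketch. The function names are made up, and how rnn.py actually slices the text is not shown on this page, so the batch count is only a rough estimate.

```python
def batch_characters(sequence_length=30, batch_size=200):
    """Characters consumed by one batch: sequence_length * batch_size."""
    return sequence_length * batch_size


def full_batches(text_length, sequence_length=30, batch_size=200):
    """Rough number of complete batches that fit into the input text."""
    return text_length // batch_characters(sequence_length, batch_size)


if __name__ == "__main__":
    # With the default values one batch covers 30 * 200 = 6000 characters.
    print(batch_characters())
    # A text of 100000 characters gives 16 full batches.
    print(full_batches(100000))
    # A text of 5000 characters is too short for even one batch.
    print(full_batches(5000))
```

If the last number is 0 for your text, bring more text or decrease batch_size.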

Regarding play

output_length: 10000

   Length of text to be produced when playing

top_n: 3

   Number of possibilities that are involved in the prediction.
   1 = only the highest scoring possibility makes it, danger of repeating input
   2 or 3 = allows for some variation
   10 = might become rubbish or non-language again
   This value is used for text generation during training and play
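The effect of top_n can be illustrated without Tensorflow. The sketch below is not the code from rnn.py, which is not shown here. It is one common way to do it, assuming the network outputs one probability per character: keep the top_n most probable characters and draw among them in proportion to their probabilities. The name sample_top_n is hypothetical.

```python
import random


def sample_top_n(probabilities, top_n=3, rng=random):
    """Pick an index among the top_n most probable entries."""
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    # Indices sorted from most to least probable.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    candidates = ranked[:top_n]
    weights = [probabilities[i] for i in candidates]
    # Draw proportionally to the remaining probabilities.
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in zip(candidates, weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return candidates[-1]


if __name__ == "__main__":
    alphabet = ["a", "e", "x", "o"]
    probabilities = [0.10, 0.60, 0.05, 0.25]
    # top_n=1 always yields the most probable character.
    print(alphabet[sample_top_n(probabilities, top_n=1)])
    # top_n=2 varies between the two most probable characters.
    print([alphabet[sample_top_n(probabilities, top_n=2)] for _ in range(10)])
```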


What is happening?

Machine learning


Recurrent neural networks

H He Hel Hell Hello
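One way to read the line above, and this reading is an assumption, is that the network sees a text character by character and learns to predict the next character from what came before. A small sketch with made-up function names:

```python
def prefixes(text):
    """All growing prefixes of text, one character at a time."""
    return [text[:i] for i in range(1, len(text) + 1)]


def next_character_pairs(text):
    """Pairs of (what was seen so far, the character that follows)."""
    return [(text[:i], text[i]) for i in range(1, len(text))]


if __name__ == "__main__":
    print(" ".join(prefixes("Hello")))
    for seen, target in next_character_pairs("Hello"):
        print(repr(seen), "->", repr(target))
```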

Other resources

Theory on recurrent neural networks

Video introduction to recurrent neural networks

Some excerpts from generated texts:

Neural Aaron

"Instead of a money, I was pro-Castro to a couple months, why now good at some sense of the process of their evonds and the topic to the stove of the basiness on the street. Theyre so rare. If you want to have a business problem. This is a stable talented was they are. If you want to go to get studies. And if were actually working on and started an argument. Instead of a monthly, whenever this was a group of the doctors who supposed. This sensifil was the top"


"When misfortune confounds us in an instant we are saved by the humblest actions of memory or attention:

the taste of fruit, the taste of water, that face returned to us in dream, the first jasmine flowers of November, the infinite yearning of the compass, a book we thought forever lost,

the pulsing of a hexameter, the little key that opens a house, the smell of sandalwood or library, the ancient name of a street, the colourations of a map,

an unforeseen etymology, the smoothness of a filed fingernail, the date that we were searching for, counting the twelve dark bell-strokes, a sudden physical pain."


"There is a commodity, is with the value of the coat is the same as the coat and the labour of the producers, with the same as they are exchangeable in the same proportion. In the first place, the linen as the circulating medium, and contequently at the same time the price of the commodities therefore the products of the labour of the individual producer is a commodity. He thenes a commodity in its sterial character of labour bestowed in the production of commodities. It becomes value is a commodity, as being actually compared with a commodity as a commodity, and therefore the sum of the prices to be realised as the production of a commodity becomes doubled, the labour time necessary in which they are exchangeable with a definite quantity of has or Bailey to be a use in accordance with the social division of labour, he must always been taked by the some propertion in which the value of a commodity is an exchange-value, and therefore this equivalent"

Neural Donna


"This is a common longuage, like any other time, we are not innocence is a suptoid tritical aptrociated by machines, and thinging a new developmental competition is a network and ethnography, and their intimate, uncture, and monstrous is a major form of contention. But these each of the social relations on science and technology proveses; which we have alsocindicated in the social relations of science and technology provide fresh moniters the mochice of the most primitive, and its competent, potent sistems, cultural revolutionary subjects might be anoun the definition of the self, the intersise from without realistically intersived in the face feainist sensitivity, a dimage of the oppositional intorsection of feminism account be a view of papsidely is notestate."

The Correspondent headlines:

"This is the voice of the safety syndrome Why we still stand in the way of our elections The city of the future of the basic income This weekend: the fight against the year How a government opens a political debate about who is willing Why the media is expelled as a good conversation Our own elections are going to change the world. What I learned about the difference between games for power The problem (and 9 more stories to catch up to) An ode to Jonistori"

Neural Queering the Map: