How to Develop a Character-Based Neural Language Model in Keras

By Jason Brownlee on February 23, 2021 in Deep Learning for Natural Language Processing

A language model predicts the next word in the sequence based on the specific words that have come before it in the sequence.

It is also possible to develop language models at the character level using neural networks. The benefit of character-based language models is their small vocabulary and flexibility in handling any words, punctuation, and other document structure. This comes at the cost of requiring larger models that are slower to train.

Nevertheless, in the field of neural language models, character-based models offer a lot of promise for a general, flexible and powerful approach to language modeling.

In this tutorial, you will discover how to develop a character-based neural language model.

After completing this tutorial, you will know:

How to prepare text for character-based language modeling.
How to develop a character-based language model using LSTMs.
How to use a trained character-based language model to generate text.

Kick-start your project with my new book Deep Learning for Natural Language Processing, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Update Feb/2018: Minor update to generation for API change in Keras 2.1.3.
Update Feb/2021: Updated final code example to remove redundant line.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

Sing a Song of Sixpence
Data Preparation
Train Language Model
Generate Text

Sing a Song of Sixpence

The nursery rhyme “Sing a Song of Sixpence” is well known in the West.

The first verse is common, but there is also a four-verse version that we will use to develop our character-based language model.

It is short, so fitting the model will be fast, but not so short that we won’t see anything interesting.

The complete four-verse version we will use as source text is listed below.

Sing a song of sixpence,
A pocket full of rye.
Four and twenty blackbirds,
Baked in a pie.

When the pie was opened
The birds began to sing;
Wasn't that a dainty dish,
To set before the king.

The king was in his counting house,
Counting out his money;
The queen was in the parlour,
Eating bread and honey.

The maid was in the garden,
Hanging out the clothes,
When down came a blackbird
And pecked off her nose.

Copy the text and save it in a new file in your current working directory with the file name ‘rhyme.txt‘.

Data Preparation

The first step is to prepare the text data.

We will start by defining the type of language model.

Language Model Design

A language model must be trained on the text, and in the case of a character-based language model, the input and output sequences must be characters.

The number of characters used as input will also define the number of characters that will need to be provided to the model in order to elicit the first predicted character.

After the first character has been generated, it can be appended to the input sequence and used as input for the model to generate the next character.
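
The seed-then-extend loop described above can be sketched without any neural network. In the sketch below, next_char() is a hypothetical stand-in for a trained model: it simply looks up the most recent characters in a table built from a toy string. The function names, the toy text and the context length of 4 are all assumptions made for illustration.

```python
# Toy illustration of seeding a model and extending the sequence.
# next_char() is a hypothetical stand-in for a trained model.

def build_table(text, length):
    # map each window of `length` characters to the character that follows it
    table = dict()
    for i in range(length, len(text)):
        table[text[i-length:i]] = text[i]
    return table

def next_char(table, context):
    # return the character seen after this context, or a space if unseen
    return table.get(context, ' ')

def generate(table, seed, length, n_chars):
    out = seed
    for _ in range(n_chars):
        # only the most recent `length` characters are used as input
        context = out[-length:]
        out += next_char(table, context)
    return out

text = 'Sing a song of sixpence,'
table = build_table(text, 4)
result = generate(table, 'Sing', 4, 7)
print(result)
```

The seed must be at least as long as the context, which is the "burden on seeding" that grows with longer input sequences.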

Longer sequences offer more context for the model to learn what character to output next but take longer to train and impose more burden on seeding the model when generating text.

We will use an arbitrary length of 10 characters for this model.

There is not a lot of text, and 10 characters is a few words.

We can now transform the raw text into a form that our model can learn; specifically, input and output sequences of characters.

Load Text

We must load the text into memory so that we can work with it.

Below is a function named load_doc() that will load a text file given a filename and return the loaded text.

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

We can call this function with the filename of the nursery rhyme ‘rhyme.txt‘ to load the text into memory. The contents of the file are then printed to screen as a sanity check.

# load text
raw_text = load_doc('rhyme.txt')
print(raw_text)
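
As a side note, the same loading step can be written with a with-statement so that the file is closed automatically, even if reading fails. The sketch below is one way to do this; the function name load_doc_with(), the explicit utf-8 encoding and the temporary sample file are assumptions made so the snippet runs on its own.

```python
import os
import tempfile

# variant of load_doc() that closes the file automatically
def load_doc_with(filename, encoding='utf-8'):
    # the with-block closes the file even if read() raises an error
    with open(filename, 'r', encoding=encoding) as file:
        return file.read()

# write a small temporary file so the sketch is self-contained
tmp_dir = tempfile.mkdtemp()
tmp_path = os.path.join(tmp_dir, 'sample.txt')
with open(tmp_path, 'w', encoding='utf-8') as file:
    file.write('Sing a song of sixpence,\nA pocket full of rye.\n')

sample_text = load_doc_with(tmp_path)
print(sample_text)
```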

Clean Text

Next, we need to clean the loaded text.

We will not do much to it here. Specifically, we will strip all of the new line characters so that we have one long sequence of characters separated only by white space.

# clean
tokens = raw_text.split()
raw_text = ' '.join(tokens)

You may want to explore other methods for data cleaning, such as normalizing the case to lowercase or removing punctuation in an effort to reduce the final vocabulary size and develop a smaller and leaner model.
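
As a rough sketch of that extra cleaning, the snippet below lowercases the text and drops ASCII punctuation using only the standard library. The two-line sample and the variable names are assumptions for illustration; note that removing punctuation would also turn "Wasn't" into "wasnt", which may or may not be what you want.

```python
import string

raw = 'Sing a song of sixpence,\nA pocket full of rye.'
# collapse all white space, as in the tutorial
tokens = raw.split()
clean = ' '.join(tokens)
# optional extra cleaning: lowercase and drop ASCII punctuation
table = str.maketrans('', '', string.punctuation)
normalized = clean.lower().translate(table)
print(normalized)
# compare the number of unique characters before and after
print(len(set(clean)), len(set(normalized)))
```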

Create Sequences

Now that we have a long list of characters, we can create our input-output sequences used to train the model.

Each input sequence will be 10 characters with one output character, making each sequence 11 characters long.

We can create the sequences by enumerating the characters in the text, starting at the 11th character at index 10.

# organize into sequences of characters
length = 10
sequences = list()
for i in range(length, len(raw_text)):
	# select sequence of characters
	seq = raw_text[i-length:i+1]
	# store
	sequences.append(seq)
print('Total Sequences: %d' % len(sequences))

Running this snippet, we can see that we end up with just under 400 sequences of characters for training our language model.

Total Sequences: 399
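
The count follows directly from the loop: every index from 10 to the end of the text yields one sequence, so the number of sequences is the text length minus the input length. By the same arithmetic, 399 sequences with an input length of 10 implies the cleaned rhyme is 409 characters long. A small sketch on just the first line of the rhyme (the helper name make_sequences() is an assumption) shows the same relationship:

```python
# each position from index `length` to the end yields one sequence
def make_sequences(text, length):
    return [text[i-length:i+1] for i in range(length, len(text))]

text = 'Sing a song of sixpence,'
length = 10
seqs = make_sequences(text, length)
# number of sequences equals len(text) - length
print(len(text), len(seqs))
# first and last sequences, each length + 1 characters long
print(repr(seqs[0]), repr(seqs[-1]))
```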

Save Sequences

Finally, we can save the prepared data to file so that we can load it later when we develop our model.

Below is a function save_doc() that, given a list of strings and a filename, will save the strings to file, one per line.

# save sequences to file, one sequence per line
def save_doc(lines, filename):
	data = '\n'.join(lines)
	file = open(filename, 'w')
	file.write(data)
	file.close()

We can call this function and save our prepared sequences to the filename ‘char_sequences.txt‘ in our current working directory.

# save sequences to file
out_filename = 'char_sequences.txt'
save_doc(sequences, out_filename)
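
It can be worth checking that the saved file loads back into exactly the same list, because many sequences begin or end with a space. The sketch below does a round trip on three sample sequences; the helper names save_lines() and load_lines() and the temporary file name are assumptions for illustration. Splitting on the new line character is safe here because the cleaning step removed all new lines from the text.

```python
import os
import tempfile

def save_lines(lines, filename):
    # join with new lines and write in one call
    with open(filename, 'w') as file:
        file.write('\n'.join(lines))

def load_lines(filename):
    # read everything and split back into one sequence per line
    with open(filename, 'r') as file:
        return file.read().split('\n')

seqs = ['Sing a song', 'ing a song ', 'ng a song o']
path = os.path.join(tempfile.mkdtemp(), 'char_sequences_demo.txt')
save_lines(seqs, path)
restored = load_lines(path)
# True when leading and trailing spaces survive the round trip
print(restored == seqs)
```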

Complete Example

Tying all of this together, the complete code listing is provided below.

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

# save sequences to file, one sequence per line
def save_doc(lines, filename):
	data = '\n'.join(lines)
	file = open(filename, 'w')
	file.write(data)
	file.close()

# load text
raw_text = load_doc('rhyme.txt')
print(raw_text)

# clean
tokens = raw_text.split()
raw_text = ' '.join(tokens)

# organize into sequences of characters
length = 10
sequences = list()
for i in range(length, len(raw_text)):
	# select sequence of characters
	seq = raw_text[i-length:i+1]
	# store
	sequences.append(seq)
print('Total Sequences: %d' % len(sequences))

# save sequences to file
out_filename = 'char_sequences.txt'
save_doc(sequences, out_filename)

Run the example to create the ‘char_sequences.txt‘ file.

Take a look inside; you should see something like the following:

Sing a song
ing a song
ng a song o
g a song of
 a song of
a song of s
 song of si
song of six
ong of sixp
ng of sixpe
...

We are now ready to train our character-based neural language model.

Train Language Model

In this section, we will develop a neural language model for the prepared sequence data.

The model will read encoded characters and predict the next character in the sequence. A Long Short-Term Memory recurrent neural network hidden layer will be used to learn the context from the input sequence in order to make the predictions.

Load Data

The first step is to load the prepared character sequence data from ‘char_sequences.txt‘.

We can use the same load_doc() function developed in the previous section. Once loaded, we split the text by new line to give a list of sequences ready to be encoded.

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

# load
in_filename = 'char_sequences.txt'
raw_text = load_doc(in_filename)
lines = raw_text.split('\n')

Encode Sequences

The sequences of characters must be encoded as integers.

This means that each unique character will be assigned a specific integer value and each sequence of characters will be encoded as a sequence of integers.

We can create the mapping given a sorted set of unique characters in the raw input data. The mapping is a dictionary of character values to integer values.

chars = sorted(list(set(raw_text)))
mapping = dict((c, i) for i, c in enumerate(chars))

Next, we can process each sequence of characters one at a time and use the dictionary mapping to look up the integer value for each character.

sequences = list()
for line in lines:
	# integer encode line
	encoded_seq = [mapping[char] for char in line]
	# store
	sequences.append(encoded_seq)

The result is a list of integer lists.
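
To make the encoding concrete, the small sketch below builds the mapping for a single 11-character sequence and encodes it. The sample text is an assumption; on the full data the integers will differ because the vocabulary is larger.

```python
text = 'Sing a song'
# sorted unique characters give a stable character-to-integer mapping
chars = sorted(set(text))
mapping = dict((c, i) for i, c in enumerate(chars))
# look up the integer for each character in turn
encoded = [mapping[c] for c in text]
print(mapping)
print(encoded)
```

Sorting the characters before enumerating them means the same text always produces the same mapping, which matters when the mapping is saved and reused later.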

We need to know the size of the vocabulary later. We can retrieve this as the size of the dictionary mapping.

# vocabulary size
vocab_size = len(mapping)
print('Vocabulary Size: %d' % vocab_size)

Running this piece, we can see that there are 38 unique characters in the input sequence data.

Vocabulary Size: 38

Split Inputs and Output

Now that the sequences have been integer encoded, we can separate the columns into input and output sequences of characters.

We can do this using a simple array slice.

sequences = array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]

Next, we need to one hot encode each character. That is, each character becomes a vector as long as the vocabulary (38 elements) with a 1 marked for the specific character. This provides a more precise input representation for the network. It also provides a clear objective for the network to predict, where a probability distribution over characters can be output by the model and compared to the ideal case of all 0 values with a 1 for the actual next character.

We can use the to_categorical() function in the Keras API to one hot encode the input and output sequences.

sequences = [to_categorical(x, num_classes=vocab_size) for x in X]
X = array(sequences)
y = to_categorical(y, num_classes=vocab_size)
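
For intuition, the plain-Python sketch below shows what the split and the one hot encoding produce for a single short sequence. The one_hot() helper, the vocabulary size of 8 and the sample integers are assumptions for illustration; it is not how to_categorical() is implemented, and Keras returns NumPy arrays rather than lists.

```python
def one_hot(index, num_classes):
    # a vector of zeros with a single 1 at the given index
    vector = [0] * num_classes
    vector[index] = 1
    return vector

vocab_size = 8
encoded_seq = [1, 4, 5, 3]
# split into input characters and the single output character
x_ints, y_int = encoded_seq[:-1], encoded_seq[-1]
X_row = [one_hot(i, vocab_size) for i in x_ints]
y_row = one_hot(y_int, vocab_size)
print(X_row)
print(y_row)
```

Applied to the tutorial data, the same idea gives X a shape of 399 sequences by 10 time steps by 38 features, and y a shape of 399 by 38.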

We are now ready to fit the model.

Fit Model

The model is defined with an input layer that takes sequences that have 10 time steps and 38 features for the one hot encoded input sequences.

Rather than specify these numbers, we use the second and third dimensions on the X input data. This is so that if we change the length of the sequences or size of the vocabulary, we do not need to change the model definition.

The model has a single LSTM hidden layer with 75 memory cells, chosen with a little trial and error.

The model has a fully connected output layer that outputs one vector with a probability distribution across all characters in the vocabulary. A softmax activation function is used on the output layer to ensure the output has the properties of a probability distribution.

# define model
model = Sequential()
model.add(LSTM(75, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())

Running this prints a summary of the defined network as a sanity check.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 75)                34200
_________________________________________________________________
dense_1 (Dense)              (None, 38)                2888
=================================================================
Total params: 37,088
Trainable params: 37,088
Non-trainable params: 0
_________________________________________________________________
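
The parameter counts in the summary can be reproduced by hand. An LSTM layer has four sets of weights (three gates plus the candidate cell state), each with input weights, recurrent weights and a bias; the dense layer has one weight per input-output pair plus one bias per output. The helper names below are assumptions for illustration.

```python
def lstm_params(units, features):
    # four weight sets, each with input weights, recurrent weights and a bias
    return 4 * (units * (features + units) + units)

def dense_params(inputs, outputs):
    # one weight per input-output pair plus one bias per output
    return inputs * outputs + outputs

lstm = lstm_params(75, 38)
dense = dense_params(75, 38)
print(lstm, dense, lstm + dense)
```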

The model is learning a multi-class classification problem, therefore we use the categorical log loss (categorical cross-entropy) intended for this type of problem. The efficient Adam implementation of gradient descent is used to optimize the model, and accuracy is reported at the end of each training epoch.

The model is fit for 100 training epochs, again found with a little trial and error.

# compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
model.fit(X, y, epochs=100, verbose=2)
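
To give a feel for the loss being minimized, the sketch below computes categorical cross-entropy for one one hot encoded target and two made-up predictions. The function is a simplified stand-in written for illustration (no averaging over a batch and no clipping of probabilities), and the example probabilities are assumptions.

```python
from math import log

def categorical_crossentropy(y_true, y_pred):
    # with one hot targets only the term for the true class is non-zero
    return -sum(t * log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_true = [0, 0, 1, 0]
confident = [0.05, 0.05, 0.85, 0.05]
unsure = [0.25, 0.25, 0.25, 0.25]
loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
# the confident, correct prediction gets the lower loss
print(round(loss_confident, 4), round(loss_unsure, 4))
```

Read the other way around, an average loss of 0.19 corresponds to the model assigning a probability of about 0.83 (a geometric mean) to the correct next character.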

Save Model

After the model is fit, we save it to file for later use.

The Keras model API provides the save() function that we can use to save the model to a single file, including weights and topology information.

# save the model to file
model.save('model.h5')

We also save the mapping from characters to integers that we will need to encode any input when using the model and decode any output from the model.

# save the mapping
dump(mapping, open('mapping.pkl', 'wb'))
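
A quick round trip confirms that the pickled mapping loads back unchanged. The sketch below uses with-statements so both file handles are closed; the toy mapping and the temporary file name are assumptions for illustration.

```python
import os
import tempfile
from pickle import dump, load

mapping = {' ': 0, 'S': 1, 'a': 2}
path = os.path.join(tempfile.mkdtemp(), 'mapping_demo.pkl')
# write the dictionary in binary mode
with open(path, 'wb') as file:
    dump(mapping, file)
# read it back in binary mode
with open(path, 'rb') as file:
    restored = load(file)
print(restored == mapping)
```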

Complete Example

Tying all of this together, the complete code listing for fitting the character-based neural language model is listed below.

from numpy import array
from pickle import dump
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# load doc into memory
def load_doc(filename):
	# open the file as read only
	file = open(filename, 'r')
	# read all text
	text = file.read()
	# close the file
	file.close()
	return text

# load
in_filename = 'char_sequences.txt'
raw_text = load_doc(in_filename)
lines = raw_text.split('\n')

# integer encode sequences of characters
chars = sorted(list(set(raw_text)))
mapping = dict((c, i) for i, c in enumerate(chars))
sequences = list()
for line in lines:
	# integer encode line
	encoded_seq = [mapping[char] for char in line]
	# store
	sequences.append(encoded_seq)

# vocabulary size
vocab_size = len(mapping)
print('Vocabulary Size: %d' % vocab_size)

# separate into input and output
sequences = array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]
sequences = [to_categorical(x, num_classes=vocab_size) for x in X]
X = array(sequences)
y = to_categorical(y, num_classes=vocab_size)

# define model
model = Sequential()
model.add(LSTM(75, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())
# compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
model.fit(X, y, epochs=100, verbose=2)

# save the model to file
model.save('model.h5')
# save the mapping
dump(mapping, open('mapping.pkl', 'wb'))

Running the example might take one minute.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

You will see that the model learns the problem well, perhaps too well for generating surprising sequences of characters.

...
Epoch 96/100
0s - loss: 0.2193 - acc: 0.9950
Epoch 97/100
0s - loss: 0.2124 - acc: 0.9950
Epoch 98/100
0s - loss: 0.2054 - acc: 0.9950
Epoch 99/100
0s - loss: 0.1982 - acc: 0.9950
Epoch 100/100
0s - loss: 0.1910 - acc: 0.9950

At the end of the run, you will have two files saved to the current working directory, specifically model.h5 and mapping.pkl.

Next, we can look at using the learned model.

Generate Text

We will use the learned language model to generate new sequences of text that have the same statistical properties as the source text.

Load Model

The first step is to load the model saved to the file ‘model.h5‘.

We can use the load_model() function from the Keras API.

# load the model
model = load_model('model.h5')

We also need to load the pickled dictionary for mapping characters to integers from the file ‘mapping.pkl‘. We will use the Pickle API to load the object.

# load the mapping
mapping = load(open('mapping.pkl', 'rb'))

We are now ready to use the loaded model.

Generate Characters

We must provide sequences of 10 characters as input to the model in order to start the generation process. We will pick these manually.

A given input sequence will need to be prepared in the same way as preparing the training data for the model.

First, the sequence of characters must be integer encoded using the loaded mapping.

# encode the characters as integers
encoded = [mapping[char] for char in in_text]

Next, the sequences need to be one hot encoded using the to_categorical() Keras function.

# one hot encode
encoded = to_categorical(encoded, num_classes=len(mapping))

We can then use the model to predict the next character in the sequence.

We use predict_classes() instead of predict() to directly select the integer for the character with the highest probability instead of getting the full probability distribution across the entire set of characters.

# predict character
yhat = model.predict_classes(encoded, verbose=0)
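
Selecting the class with the highest probability is an argmax over the predicted distribution. Some newer Keras and TensorFlow releases no longer provide predict_classes(); if yours does not, the same result can be obtained by calling predict() and taking the index of the largest probability. The plain-Python argmax() helper below is a hypothetical stand-in (NumPy's argmax() does the same job on arrays), and the probabilities are made up for illustration.

```python
def argmax(probabilities):
    # index of the largest value; ties resolve to the first occurrence
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

# hypothetical model output for a 5-character vocabulary
probs = [0.02, 0.10, 0.70, 0.08, 0.10]
yhat = argmax(probs)
print(yhat)
```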

We can then decode this integer by looking up the mapping to see the character to which it maps.

out_char = ''
for char, index in mapping.items():
	if index == yhat:
		out_char = char
		break
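
An alternative to looping over the dictionary for every prediction is to invert it once and look characters up directly. This is a sketch under the assumption that yhat is a plain integer; if your prediction comes back as an array, take its first element before the lookup. The toy mapping and the name reverse_mapping are assumptions for illustration.

```python
mapping = {' ': 0, 'S': 1, 'a': 2, 'g': 3}
# invert once: integer -> character
reverse_mapping = dict((i, c) for c, i in mapping.items())
yhat = 3
# fall back to an empty string for an unknown integer
out_char = reverse_mapping.get(yhat, '')
print(repr(out_char))
```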

This character can then be added to the input sequence. We then need to make sure that the input sequence is 10 characters by truncating the first character from the input sequence text.

We can use the pad_sequences() function from the Keras API that can perform this truncation operation.
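
The effect of truncating from the front can be sketched in plain Python. The fit_length() helper below is a hypothetical illustration that mimics the behaviour relied on here for a single sequence: keep the most recent items, and left-pad with zeros if the sequence is too short. It assumes maxlen is greater than zero.

```python
def fit_length(encoded, maxlen, pad_value=0):
    # keep only the last maxlen items (drops the oldest characters)
    trimmed = encoded[-maxlen:]
    # left-pad with pad_value if the sequence is too short
    return [pad_value] * (maxlen - len(trimmed)) + trimmed

long_seq = fit_length([1, 2, 3, 4, 5, 6], 4)
short_seq = fit_length([7, 8], 4)
print(long_seq)
print(short_seq)
```

One thing to keep in mind: index 0 is also a real character in our mapping, so a seed shorter than 10 characters would effectively be padded with that character rather than with a neutral value.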

Putting all of this together, we can define a new function named generate_seq() for using the loaded model to generate new sequences of text.

# generate a sequence of characters with a language model
def generate_seq(model, mapping, seq_length, seed_text, n_chars):
	in_text = seed_text
	# generate a fixed number of characters
	for _ in range(n_chars):
		# encode the characters as integers
		encoded = [mapping[char] for char in in_text]
		# truncate sequences to a fixed length
		encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')
		# one hot encode
		encoded = to_categorical(encoded, num_classes=len(mapping))
		# predict character
		yhat = model.predict_classes(encoded, verbose=0)
		# reverse map integer to character
		out_char = ''
		for char, index in mapping.items():
			if index == yhat:
				out_char = char
				break
		# append to input
		in_text += out_char
	return in_text

Complete Example

Tying all of this together, the complete example for generating text using the fit neural language model is listed below.

from pickle import load
from keras.models import load_model
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences

# generate a sequence of characters with a language model
def generate_seq(model, mapping, seq_length, seed_text, n_chars):
	in_text = seed_text
	# generate a fixed number of characters
	for _ in range(n_chars):
		# encode the characters as integers
		encoded = [mapping[char] for char in in_text]
		# truncate sequences to a fixed length
		encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')
		# one hot encode
		encoded = to_categorical(encoded, num_classes=len(mapping))
		# predict character
		yhat = model.predict_classes(encoded, verbose=0)
		# reverse map integer to character
		out_char = ''
		for char, index in mapping.items():
			if index == yhat:
				out_char = char
				break
		# append to input
		in_text += out_char
	return in_text

# load the model
model = load_model('model.h5')
# load the mapping
mapping = load(open('mapping.pkl', 'rb'))

# test start of rhyme
print(generate_seq(model, mapping, 10, 'Sing a son', 20))
# test mid-line
print(generate_seq(model, mapping, 10, 'king was i', 20))
# test not in original
print(generate_seq(model, mapping, 10, 'hello worl', 20))

Running the example generates three sequences of text.

The first is a test to see how the model does at starting from the beginning of the rhyme. The second is a test to see how well it does at beginning in the middle of a line. The final example is a test to see how well it does with a sequence of characters never seen before.

Sing a song of sixpence, A poc
king was in his counting house
hello worls e pake wofey. The

We can see that the model did very well with the first two examples, as we would expect. We can also see that the model still generated something for the new text, but it is nonsense.
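
The "hello worl" seed only works because every one of its characters happens to appear in the rhyme. A seed containing a character that is missing from the mapping would raise a KeyError during integer encoding, so it can help to check seeds first. The unknown_chars() helper and the toy mapping below are assumptions for illustration; in practice the mapping would be the one loaded from mapping.pkl.

```python
def unknown_chars(seed_text, mapping):
    # characters in the seed that are missing from the mapping
    return sorted(set(c for c in seed_text if c not in mapping))

# toy mapping built from one line of the rhyme
vocab = sorted(set('Sing a song of sixpence'))
mapping = dict((c, i) for i, c in enumerate(vocab))
ok_seed = unknown_chars('Sing a son', mapping)
bad_seed = unknown_chars('Hello, Zed', mapping)
print(ok_seed)
print(bad_seed)
```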

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Padding. Update the example to provide sequences line by line only and use padding to fill out each sequence to the maximum line length.
Sequence Length. Experiment with different sequence lengths and see how they impact the behavior of the model.
Tune Model. Experiment with different model configurations, such as the number of memory cells and epochs, and try to develop a better model for fewer resources.

Summary

In this tutorial, you discovered how to develop a character-based neural language model.

Specifically, you learned:

How to prepare text for character-based language modeling.
How to develop a character-based language model using LSTMs.
How to use a trained character-based language model to generate text.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

86 Responses to How to Develop a Character-Based Neural Language Model in Keras

Prakash November 7, 2017 at 12:33 am #

Hi Jason – Thanks for sharing this article. I am in learning phase and when I try to run your program (defining the load_doc function), I am getting error. Is there any package that I need to install before I run the code ?

Running the first set of lines for loading the doc into memory gives me the following error

> return text
Error: unexpected symbol in ” return text”

- Jason Brownlee November 7, 2017 at 9:51 am #
  
  It looks like a copy-paste error, ensure you maintain the indenting of the Python code.
  
Klaas November 8, 2017 at 6:54 am #

Thanks a lot Jason. One general question. On your blog I read a lot about one hot encoding. From the mnist dataset I get it that it is easy to compare probabilities (e.g if the Number is 2 I want my network to output a 1 on the 3rd row). But when it comes to language huge vocabularies is a one hot encoding not completely inefficient? I mean e.g. 1 Million vocab size and each word a vector with one 1 and 999.999 zeros? I do not really get that.

- Jason Brownlee November 8, 2017 at 9:31 am #
  
  Yep, in general we try to reduce the size of the vocab to ensure the model trains in a reasonable time.
  
  But what is the alternative? Perhaps less crisp word predictions and worse skill?
  
  - Thomas Shorts February 17, 2021 at 3:28 pm #
    
    For the “token by word” issue that Klass mentions (having a very wide X data because tokenizing by word yields many more classes than if tokenized by character), would it work to simply feed the model the sequence of embeddings as opposed to a one-hot-encoded sequence where only one of the rows contains the embedding for each word / token?
    
    You would have to change the input layer around from the example, but would that approach work and alleviate the issues associated with high dimensionality?
    
    - Jason Brownlee February 18, 2021 at 5:12 am #
      
      Yes, you ordinal encode your input, feed the integers to the embedding, then feed the sequences of embedding vectors to the LSTM or whatever model you like.
      
      I have tons of examples of this on the blog, e.g.:
      https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/
      
      Reply
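
The approach in the reply above (integer encode, then embed) can be illustrated without a deep learning library. The sketch below assumes a toy vocabulary and a random, untrained embedding matrix; `vocab_size`, `embed_dim`, and `embedding` are illustrative names, not part of the tutorial. It shows that indexing rows by integer gives the same result as multiplying a one-hot matrix by the embedding matrix, so the wide one-hot input never has to be built.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 1000   # hypothetical vocabulary size
embed_dim = 8       # hypothetical embedding width

# Stand-in for a trainable embedding matrix (random here, learned in practice).
embedding = rng.normal(size=(vocab_size, embed_dim))

# An integer (ordinal) encoded sequence of five tokens.
token_ids = np.array([3, 17, 999, 0, 17])

# Embedding lookup: one row per token, shape (5, embed_dim).
looked_up = embedding[token_ids]

# Equivalent one-hot route: (5, vocab_size) times (vocab_size, embed_dim).
one_hot = np.eye(vocab_size)[token_ids]
via_one_hot = one_hot @ embedding

print(looked_up.shape)                      # (5, 8)
print(np.allclose(looked_up, via_one_hot))  # True
```

In Keras, the lookup step corresponds to an `Embedding` layer placed before the LSTM, as described in the linked post.
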
Ravi Annaswamy November 10, 2017 at 6:47 am #

Yet another excellent tutorial, Dr.Jason.

Reply
- Jason Brownlee November 10, 2017 at 10:42 am #
  
  Thanks Ravi!
  
  Reply
srihari November 10, 2017 at 3:09 pm #

Hi,

Can we use NLTK as a helper library alongside Keras to transform the text?

Srihari

Reply
- Jason Brownlee November 11, 2017 at 9:15 am #
  
  Yes, here is an example:
  https://machinelearningmastery.com/clean-text-machine-learning-python/
  
  Reply
Stuart November 11, 2017 at 6:24 am #

Awesome article. Really appreciate the level of line-by-line detail.

I think there are some mistakes around this part of the article:

“Next, the integers need to be one hot encoded using the pad_sequences() Keras function.”

I think you mean truncated instead of one hot encoded? Also it’s missing the accompanying code snippet.

Reply
- Jason Brownlee November 11, 2017 at 9:28 am #
  
  Thanks, fixed. I meant the to_categorical() function for one hot encoding.
  
  Reply
Antonio November 24, 2017 at 7:40 am #

Cool, thanks for sharing!

Reply
- Jason Brownlee November 24, 2017 at 9:52 am #
  
  You’re welcome.
  
  Reply
  - Antonio November 24, 2017 at 7:45 pm #
    
    Quick question, if I may: if we want to characterize the sequence with some extra input features, how do we prepare the data? To illustrate, sticking to the above example, I may want to associate with each sequence the name of the Person writing it. This feature may slightly change the prediction of the next character. I thought one option would be to create and train a different model for each Person, but I think this would be quite suboptimal, since the majority of the rules learned will be common to every Person and the data set will be reduced for each model/Person. Another option would be to encode the Person in the input, within each time step. But in this case there is a little bit of redundancy, since the Person input feature will be the same across all the time steps. So, is there a way to provide as input to the model a feature which is independent and unchanged across all the time steps, but which characterizes the entire input sequence? Thanks very much
    
    Reply
    - Jason Brownlee November 25, 2017 at 10:17 am #
      
      Great question, perhaps a multiple-input model:
      https://machinelearningmastery.com/keras-functional-api-deep-learning/
      
      Reply
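
For comparison, the second option described in the question (repeating the Person feature at every time step) can be sketched in NumPy. All sizes and the `person_ids` values below are made up for illustration. A multiple-input model, as suggested in the reply, would instead pass `person_onehot` once through its own input and merge it with the LSTM output, avoiding the repetition.

```python
import numpy as np

n_samples, n_steps, vocab_size = 4, 10, 38   # toy sizes, chosen for illustration
n_people = 3                                  # hypothetical number of authors

rng = np.random.default_rng(1)

# One-hot encoded character sequences: (samples, timesteps, vocab).
char_ids = rng.integers(0, vocab_size, size=(n_samples, n_steps))
X_chars = np.eye(vocab_size)[char_ids]

# One author id per sequence, one-hot encoded: (samples, n_people).
person_ids = np.array([0, 2, 1, 0])
person_onehot = np.eye(n_people)[person_ids]

# Repeat the static feature at every timestep: (samples, timesteps, n_people).
person_steps = np.repeat(person_onehot[:, np.newaxis, :], n_steps, axis=1)

# Concatenate along the feature axis.
X = np.concatenate([X_chars, person_steps], axis=2)
print(X.shape)  # (4, 10, 41)
```
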
Ethan B January 10, 2018 at 5:05 pm #

Hi Jason, thanks for the great article!

I have one question regarding the training phase. I was thinking about using character embeddings, for example fitting a word2vec model on characters which I would then use to train the LSTM, rather than using the one hot encoded characters. Do you think this would give any sort of performance gain? I was going to test this idea myself, but I was curious if you had tried this yourself first, or if you think it is a worthwhile approach.

Reply
- Jason Brownlee January 11, 2018 at 5:49 am #
  
  I have not tried an embedding of chars, sorry.
  
  Reply
- shm June 15, 2019 at 5:03 am #
  
  hi Ethan B, have you fit word2vec model on characters??
  
  Reply
  - Efstathios Chatzikyriakidis May 15, 2020 at 1:55 am #
    
    There is no point in doing that. Characters have no semantics.
    
    Reply

Al February 2, 2018 at 10:52 am #

Hi, It’s really a nice tutorial!!
I have one trouble. When I try to predict using generate_seq, I got this error ValueError: Error when checking : expected lstm_1_input to have shape (408, 37) but got array with shape (10, 37)
why this would happen? Thanks!!!

Riya John February 28, 2018 at 1:54 am #

Hi, I got a similar error: ValueError: cannot reshape array of size 380 into shape (1,1,10)

Code worked for me when I commented line: encoded = encoded.reshape(1, encoded.shape[0], encoded.shape[1]) in generate_seq()

Franco Arda June 5, 2018 at 4:23 pm #

@Riya, thanks for pointing that out.
@Jason, Al and Riya are right. The code doesn’t run (blog or book version). We need to comment out this line:

#encoded = encoded.reshape(1, encoded.shape[0], encoded.shape[1])

to make prediction work.

Jason Brownlee June 6, 2018 at 6:36 am #

Are you able to confirm that you have the latest version of Keras and other libraries installed?

python: 3.6.5
scipy: 1.1.0
numpy: 1.14.3
matplotlib: 2.2.2
pandas: 0.23.0
statsmodels: 0.9.0
sklearn: 0.19.1
nltk: 3.3
gensim: 3.4.0
xgboost 0.6
tensorflow: 1.8.0
theano: 1.0.1
keras: 2.1.6

Franco Arda June 7, 2018 at 3:32 pm #

Updated now, but the issue remains.
Don’t answer for me – I’m happy that it runs.

Kay-Michael Würzner June 21, 2018 at 5:36 pm #

There is definitely a problem with that reshape line:

ValueError: cannot reshape array of size 780 into shape (1,1,10)

Checked the versions of installed libraries. As far as I can see, keras, numpy and tensorflow are needed here.

numpy==1.14.5 Keras==2.2.0 Keras-Applications==1.0.2 Keras-Preprocessing==1.0.1 tensorflow==1.8.0

Commenting out the reshape line does let the code run, but given the output it looks like it breaks the actual prediction… Any ideas?

Jason Brownlee June 22, 2018 at 6:02 am #

I wonder if it is an issue with the data file. Are you able to confirm that the raw data for the text matches the post?

Franco Arda June 5, 2018 at 4:20 pm #

indeed, there’s an error ….see below

Reply
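
The error reported in this thread can be reproduced with plain NumPy. The sketch below assumes that one hot encoding a 2D integer array of shape (1, 10) yields a 3D array of shape (1, 10, 38), which is what the size of 380 in the error message suggests and is consistent with the Keras 2.1.3 API change noted at the top of the post. The `one_hot()` helper is a stand-in written for this example, not the Keras function.

```python
import numpy as np

vocab_size = 38   # consistent with 380 = 1 * 10 * 38 in the error message
seq_length = 10

def one_hot(ids, num_classes):
    """NumPy stand-in for one hot encoding an integer array of any shape."""
    return np.eye(num_classes)[ids]

# A padded, integer-encoded seed sequence of shape (1, seq_length).
padded = np.arange(seq_length).reshape(1, seq_length)

encoded = one_hot(padded, vocab_size)
print(encoded.shape)  # (1, 10, 38): already (samples, timesteps, features)

# This reshape assumes a 2D (timesteps, features) array, so it fails on 3D input.
try:
    encoded.reshape(1, encoded.shape[0], encoded.shape[1])
    reshape_error = None
except ValueError as err:
    reshape_error = str(err)
print(reshape_error)

# Reshaping with -1 and the last two dimensions leaves the 3D shape unchanged.
fixed = encoded.reshape(-1, encoded.shape[1], encoded.shape[2])
print(fixed.shape)  # (1, 10, 38)
```

Under that assumption, removing the reshape does not change what the model receives, because the array already has the (samples, timesteps, features) layout the LSTM expects.
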

Neeraj April 7, 2018 at 2:26 pm #

Hi Jason,

Can you please help me with an error. I am new at python so many times I dont know how to resolve an error.

Error 1 :

mapping = load(open(‘mapping.pkl’, ‘rb’))
Traceback (most recent call last):

File “”, line 1, in
mapping = load(open(‘mapping.pkl’, ‘rb’))

NameError: name ‘load’ is not defined

Error 2 :
from pickle import load

mapping = load(open(‘mapping.pkl’, ‘rb’))
Traceback (most recent call last):

File “”, line 1, in
mapping = load(open(‘mapping.pkl’, ‘rb’))

EOFError: Ran out of input

Thanks,
Neeraj

Reply
- Jason Brownlee April 8, 2018 at 6:10 am #
  
  Perhaps double check that you have copied all of the code from the example?
  
  Reply
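
Of the two errors above, the first means `load` was not imported from `pickle`. The second, `EOFError: Ran out of input`, typically means the file being read is empty, for example because the script that writes `mapping.pkl` did not run to completion. A minimal round trip, using a made-up mapping and a temporary directory rather than the tutorial's files, might look like this:

```python
import os
import tempfile
from pickle import dump, load

# Hypothetical character-to-integer mapping, standing in for the tutorial's.
mapping = {'a': 0, 'b': 1, ' ': 2}

path = os.path.join(tempfile.mkdtemp(), 'mapping.pkl')

# The with-block closes the file, so the data is flushed to disk.
with open(path, 'wb') as f:
    dump(mapping, f)

# An empty file at this point would raise EOFError on load.
if os.path.getsize(path) == 0:
    raise RuntimeError('mapping.pkl is empty; re-run the script that saves it')

with open(path, 'rb') as f:
    restored = load(f)

print(restored == mapping)  # True
```
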
sagar June 2, 2018 at 1:23 am #

Hi Jason, this is very helpful and nicely detailed. Thank you for sharing.

I am working on a problem where I have some 32000 rows of jumbled characters “wewlsfnskfddsl…eredsda” and each row is of length 406. These are hashed, probably. And I need to predict to which class do they belong to? Here class is 1-12 names of books.

Any suggestions on how I could modify your code above. Would my problem still need text generation? As this is a multi-class classification problem.

Thank you very much. Looking forward to your advice.

Reply
- Jason Brownlee June 2, 2018 at 6:38 am #
  
  Sounds like classification. A language model/text generation would not be helpful.
  
  I would recommend testing an MLP, CNN and LSTM on the problem. Also look at some of the tutorials on the blog for sentiment classification, they will provide a template. No need for a word embedding either.
  
  Reply
- Koushik June 4, 2018 at 7:52 pm #
  
  I think many to one RNN model should work for your problem
  
  Reply
SM June 4, 2018 at 6:53 pm #

Hi Jason, thank you for the above suggestions. I am trying to implement an LSTM model and following this post on how to set it up. I have a question about input_shape for the LSTM and for the dense layer.

My xtrain is a sequence of numbers as a NumPy array. To give you a background: I have 32514 rows of jumbled characters “wewlsfnskfddsl…” , which I reshaped: X = X.reshape((1,32514,1)) into 3D for in LSTM’s input_shape, taking inspiration from your post “https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/”.

However, here my y is of 12 classes (0 to 11). It is a multiclass classification problem.

y = to_categorical(y, num_classes=12)

How should I define my dense layer? According to keras document, here’s what the input_shape is for dense and LSTM layers:

Dense: (batch_size, input_dim)
LSTM: (batch_size, timesteps, input_dim)

# define model
model = Sequential()
model.add(LSTM(75, input_shape=(32514,1)))
model.add(Dense(input(1), activation=’softmax’))
print(model.summary())
# compile model
model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])
# fit model
model.fit(X, y, epochs=100, verbose=2)

# save the model to file
model.save(‘model.h5’)
# save the mapping
dump(mapping, open(‘mapping.pkl’, ‘wb’))

When I run the above cell, the program seems to be asking to for another input.

Is this correct? Thank you very much. I love your blog.

Reply
- Jason Brownlee June 5, 2018 at 6:37 am #
  
  You can learn more about the shape of data for LSTMS here:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-prepare-my-data-for-an-lstm
  
  Reply
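
Two details in the question above are worth noting. `input(1)` most likely calls Python's built-in `input()` function, which waits for keyboard input and would explain why the program seems to ask for input; a 12-class output layer would normally be written as `Dense(12, activation='softmax')`. The reshape to `(1, 32514, 1)` also describes a single sample with 32514 time steps rather than 32514 samples. The NumPy sketch below shows the shapes one would usually aim for, assuming each row has 406 characters as in the earlier question and using 6 random rows for illustration.

```python
import numpy as np

n_rows, row_len, n_classes = 6, 406, 12   # 6 rows here; 32514 in the question

rng = np.random.default_rng(2)

# Integer-encoded characters, one row per sample: (samples, timesteps).
X = rng.integers(0, 26, size=(n_rows, row_len))

# LSTM input is (samples, timesteps, features); here one feature per step.
X = X.reshape(n_rows, row_len, 1)

# Integer class labels 0..11, one hot encoded to (samples, n_classes).
labels = rng.integers(0, n_classes, size=n_rows)
y = np.eye(n_classes)[labels]

print(X.shape, y.shape)  # (6, 406, 1) (6, 12)
```

With data in this layout, the LSTM would take `input_shape=(406, 1)` and `X` and `y` would have the same number of rows.
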
Ahmed Sahlol June 11, 2018 at 12:53 am #

Many thanks Jason for the topic and the detailed explanation, really looks awesome.
I have a question: why did you set the LSTM hidden layer to 75 memory cells?

Reply
- Jason Brownlee June 11, 2018 at 6:09 am #
  
  The model was configured with a little trial and error.
  
  Reply
Christian June 27, 2018 at 12:14 am #

Hi Jason,

Do you know if I can use this technique (character-based) to cluster rare words, like DriverId with Driver, Name, etc.? Or Vehicle with vehicleId, location, latitude, longitude?

Thanks.

Reply
- Jason Brownlee June 27, 2018 at 8:19 am #
  
  Perhaps you can use an embedding for these features?
  
  Reply
Ayden M. June 28, 2018 at 10:11 pm #

Thank you so much for this tutorial. I am getting this error when training my LM, and I can’t figure out how to overcome this:

IndexError: too many indices for array

It occurs in line 39: X, y = sequences[:,:-1], sequences[:,-1]

From what I understand, it doesn’t want to create a 2D vector from a 1D vector. I tried to reshape with numpy but I keep getting errors of similar nature. Do you have any idea how to solve this?

Reply
- Jason Brownlee June 29, 2018 at 6:06 am #
  
  Perhaps this post on NumPy arrays will help you inspect your code and identify the issue:
  https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
  
  Reply
- littleflow3r November 21, 2018 at 4:23 pm #
  
  check your char_sequences.txt
  probably it has an empty line in the last row
  so when the code tried to index the last row (the empty line), it gives too many indices error (since it only has one index)
  
  Reply
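
The empty-line explanation above can be checked with a small example. The sketch uses a three-line string in place of `char_sequences.txt`; the variable names mirror the tutorial loosely but the data is made up. Filtering out empty lines before encoding keeps every row the same length, so the array stays 2D and the slice works.

```python
import numpy as np

# Stand-in for the contents of char_sequences.txt, with a trailing newline.
raw_text = "Sing a song\ning a song \nng a song o\n"

# split('\n') leaves an empty string after the final newline.
lines = raw_text.split('\n')
print(len(lines), repr(lines[-1]))  # 4 ''

# Drop empty lines so every sequence has the same length.
lines = [line for line in lines if line]

chars = sorted(set(''.join(lines)))
mapping = {c: i for i, c in enumerate(chars)}

sequences = np.array([[mapping[c] for c in line] for line in lines])
print(sequences.shape)  # (3, 11)

X, y = sequences[:, :-1], sequences[:, -1]
print(X.shape, y.shape)  # (3, 10) (3,)
```
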
Tuomas V. September 25, 2018 at 12:33 am #

Hello!

Thanks for the incredibly easy-to-follow tutorial.

I’m wondering whether there’s a more memory-friendly way to one-hot encode than using to_categorical?

The sequences text file I’m working with is nearly 1 Gb in size, my vocabulary size is quite large and thus there are MemoryErrors at:

sequences = [to_categorical(x, num_classes=vocab_size) for x in X]

Any help would be appreciated!

Reply
- Jason Brownlee September 25, 2018 at 6:25 am #
  
  Hmmm. Some thoughts:
  
  Don’t use one hot encoding, use integer encoding and a word embedding and process the file in chunks via progressive loading.
  
  Reply
  - Tuomas V. October 10, 2018 at 10:41 pm #
    
    Dropped one-hot encoding, switched to fastText word vectors and it has been smooth sailing since. Thanks!
    
    Reply
    - Jason Brownlee October 11, 2018 at 7:56 am #
      
      Nice work!
      
      Reply
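
Where one hot encoding has to stay, the "process in chunks" part of the advice above can be sketched as a generator that encodes one batch at a time instead of the whole dataset. This is an illustration under assumed toy sizes; `one_hot()` and `one_hot_batches()` are hypothetical helpers written for this example, not Keras functions.

```python
import numpy as np

def one_hot(ids, vocab_size):
    """One hot encode an integer array of any shape without building an identity matrix."""
    out = np.zeros(ids.shape + (vocab_size,), dtype=np.float32)
    np.put_along_axis(out, ids[..., np.newaxis], 1.0, axis=-1)
    return out

def one_hot_batches(sequences, vocab_size, batch_size):
    """Yield (X, y) batches, encoding only one batch at a time."""
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        X = one_hot(batch[:, :-1], vocab_size)   # (batch, timesteps, vocab_size)
        y = one_hot(batch[:, -1], vocab_size)    # (batch, vocab_size)
        yield X, y

# Toy integer-encoded data: 10 sequences of 11 tokens, vocabulary of 50.
rng = np.random.default_rng(3)
sequences = rng.integers(0, 50, size=(10, 11))

shapes = [(X.shape, y.shape) for X, y in one_hot_batches(sequences, 50, 4)]
print(shapes)
# [((4, 10, 50), (4, 50)), ((4, 10, 50), (4, 50)), ((2, 10, 50), (2, 50))]
```

Peak memory then scales with the batch size rather than the number of sequences, although a very large vocabulary still makes each batch wide, which is why the reply recommends an embedding instead.
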
Ayan January 10, 2019 at 4:41 pm #

Hi Jason..I want to view the embedded representation for each character(embedded using LSTM)..Can you please suggest the lines needed to do so?

Reply
- Jason Brownlee January 11, 2019 at 7:40 am #
  
  Sorry, I don’t have any examples of character based embeddings – if there is such a thing.
  
  Reply
Fengtao Wu January 19, 2019 at 9:21 am #

There is an error in the final complete example. Line 17:

“encoded = encoded.reshape(1, encoded.shape[0], encoded.shape[1])”

should be:

“encoded = encoded.reshape(-1, encoded.shape[1], encoded.shape[2])”

Could you please check it again?

Reply
- Jason Brownlee January 20, 2019 at 5:38 am #
  
  Why is that exactly?
  
  Reply
- Dude July 16, 2019 at 4:40 am #
  
  I’m not sure why, but this change causes the code to work for me. I have been having the same error as others.
  
  Reply
  - Jason Brownlee July 16, 2019 at 8:23 am #
    
    Interesting.
    
    Are all libs up to date? Python 3? Keras? TensorFlow?
    
    Reply
Mahnaz February 28, 2019 at 10:08 pm #

I am using the char-level language model to predict the next character given a sequence of previous characters (say 20 characters). My train sequences are almost 1088637. I am using the following model for my training:

model = Sequential()
model.add(LSTM(1000, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(vocab_size, activation=’softmax’, kernel_initializer=’normal’))

But I get only 50% of the text correct. It always goes wrong from the middle of the text. Any suggestion? I appreciate your help.

Reply
- Jason Brownlee March 1, 2019 at 6:19 am #
  
  Perhaps the model configuration or learning configuration requires tuning, a good place to start is here:
  https://machinelearningmastery.com/start-here/#better
  
  Reply
Nauman January 26, 2020 at 11:31 pm #

Dear is there any tutorial for character-based machine translation?

Reply
- Jason Brownlee January 27, 2020 at 7:05 am #
  
  I don’t think I have one, sorry.
  
  Reply
Efstathios Chatzikyriakidis May 15, 2020 at 2:05 am #

Hi Jason,

In your word-based language model examples you presented us:

1. One-Word-In -> One-Word-Out framing
2. N-Words-In -> One-Word-Out framing
3. Line-by-Line framing

Also for the character-based language model you use here:

1. N-Char-In -> One-Char-Out
2. Also suggest the usage of Line-by-line as an experimentation

However, you can also use a different approach for learning a language model which is:

Sequence2Sequence:

Example of input and output:

Hello Jason!-E-
-S-Hello Jason!

This can be done using an LSTM with return_sequences=True where input sequence length equals output sequence length.

Inputs can be encoded as word embeddings and outputs can be encoded as one-hot.

Right?

Reply
- Jason Brownlee May 15, 2020 at 6:04 am #
  
  Yes, but seq2seq requires an encoder-decoder model, not simply a return sequences from an LSTM.
  
  Reply
Shankar May 20, 2020 at 6:13 pm #

A very basic doubt, sorry since I’m a beginner in this stuff… On what basis do you declare X and y to be sequences[:,:-1] and sequences[:,-1]?

Reply
- Jason Brownlee May 21, 2020 at 6:13 am #
  
  What do you mean? Perhaps you can elaborate on your doubt?
  
  Reply
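
For readers with the same question, the slicing can be seen on a tiny array. In the tutorial each row of `sequences` is one integer-encoded sequence whose last element is the character to predict; the values below are made up. `sequences[:, :-1]` keeps every column except the last as input, and `sequences[:, -1]` keeps only the last column as the target.

```python
import numpy as np

# Three integer-encoded sequences of length 4 (toy values).
sequences = np.array([
    [5, 1, 7, 2],
    [1, 7, 2, 9],
    [7, 2, 9, 4],
])

X = sequences[:, :-1]   # all rows, every column except the last
y = sequences[:, -1]    # all rows, last column only

print(X.tolist())  # [[5, 1, 7], [1, 7, 2], [7, 2, 9]]
print(y.tolist())  # [2, 9, 4]
```
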
Pablo June 11, 2020 at 4:55 am #

Hi!

Thanks for this! I’m trying to run the example but there seems to be an error in this line:
X, y = sequences[:,:-1], sequences[:,-1]

It turns out, sequences is first a list of values later converted to an array. This array is not multidimensional and hence the error in the above quoted line. I tried to fix this by replacing that line with

X, y = sequences[:-1], sequences[-1]

which makes sense in a single-dimension array, but now throws another error in the following line:

model.add(LSTM(75, input_shape=(X.shape[1], X.shape[2])))

I understand X must be of a different shape and that’s why this last quoted line is intended to provide two integers from the two dimensions.

Any idea how to fix this?
Thanks!

Reply
- Jason Brownlee June 11, 2020 at 6:05 am #
  
  Thanks. Are you sure, as I believe the example runs as expected without change:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
  - Pablo June 11, 2020 at 5:08 pm #
    
    Hi, thanks for such a quick answer!
    
    You are right, I hadn’t copied the code correctly.
    I have now, but I got this error:
    
    Traceback (most recent call last):
    File “generate.py”, line 37, in
    print(generate_seq(model, mapping, 10, ‘Sing a son’, 20))
    File “generate.py”, line 18, in generate_seq
    encoded = encoded.reshape(1, encoded.shape[0], encoded.shape[1])
    ValueError: cannot reshape array of size 380 into shape (1,1,10)
    
    while generating text. Commenting line 18 makes it work, but I’m not sure it does as expected. Could you confirm it?
    
    Reply
    - Jason Brownlee June 12, 2020 at 6:09 am #
      
      Are you able to confirm that your version of Keras and TensorFlow are up to date?
      
      Reply
      - Pablo June 13, 2020 at 2:21 am #
        
        Hi, thanks for answering back.
        
        I can confirm I have my system up to date. That means tensorflow package at version 2.2.0, which also provides Keras. Is that ok?
      - Jason Brownlee June 13, 2020 at 6:09 am #
        
        Your versions look good. I will investigate.
        
        Update: The example works as-is with TensorFlow 2.2 and Keras 2.3.
        
        Ensure you copied the complete code example.
Pablo June 15, 2020 at 7:26 pm #

Hello again Jason,

First of all thanks for all the trouble you are taking with this.

I just copied the three complete code examples into empty files again and still get the same error posted above:

Traceback (most recent call last):
File “generate_.py”, line 36, in
print(generate_seq(model, mapping, 10, ‘Sing a son’, 20))
File “generate_.py”, line 17, in generate_seq
encoded = encoded.reshape(1, encoded.shape[0], encoded.shape[1])
ValueError: cannot reshape array of size 380 into shape (1,1,10)

Which I fix just commenting that line.

My python is version 3.8.3, tensorflow 2.2.0, keras 2.3.0-tf. I have everything installed from official Archlinux repos (packages {,python}tensorflow-opt).

Maybe there is something OS-related messing with this. To be honest, I don’t think this is worth much trouble. I’m able to create the sequences, train the model and generate the text just by modifying that line. There is our discussion here, so anyone can check this comments and find a fix in case they need it.

Thank you again,
Pablo.

Reply
- Jason Brownlee June 16, 2020 at 5:37 am #
  
  Perhaps.
  
  Hang in there.
  
  Reply
Firas Obeid September 1, 2020 at 3:45 am #

Is it necessary to one-hot encode my features if my features (characters), let’s say, are much more numerous and expand to 88, for example? Because that will take a toll on my training time and memory…

Reply
- Jason Brownlee September 1, 2020 at 6:38 am #
  
  Perhaps try it and see, compare to other encodings like an ordinal encoding and an embedding.
  
  Reply
  - Firas Obeid September 1, 2020 at 7:47 am #
    
    Yes, most definitely, thanks! I am also trying converting to ‘utf-8’ bytes to simplify the embedding look-up and then inverting back to get readable text after prediction.
    
    Reply
    - Jason Brownlee September 1, 2020 at 7:54 am #
      
      Let me know how you go.
      
      Reply
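
The byte-level idea mentioned above can be sketched with the standard library alone. The sample text is made up. Encoding to UTF-8 caps the vocabulary at 256 integer values, and decoding the integers restores the original string.

```python
text = "Sing a song of sixpence, naïve café"

# Encode to UTF-8 bytes; each byte is an integer in the range 0..255.
byte_ids = list(text.encode('utf-8'))

# Non-ASCII characters take more than one byte, so the sequence gets longer.
print(len(byte_ids) > len(text))  # True

# Invert: integers back to bytes, then decode to text.
restored = bytes(byte_ids).decode('utf-8')
print(restored == text)  # True
```

One caveat: a model that predicts one byte at a time can emit an incomplete multi-byte character, in which case `decode('utf-8')` raises `UnicodeDecodeError`. Passing `errors='replace'` to `decode()` is one way to tolerate that.
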
Stanislav September 13, 2020 at 9:33 am #

Hi Jason.
I want to ask about the metrics in this part of the code.

model.compile(loss=’categorical_crossentropy’, optimizer=’adam’, metrics=[‘accuracy’])

As far as I understand, these metrics work on sequence comparisons.

Reply
- Stanislav September 13, 2020 at 9:44 am #
  
  model.fit(X, y, epochs=30, verbose=2, callbacks=[csv_logger])
  
  As a result, the final dataset is divided into 2 X and Y. In this case, is it learning without a teacher? And “y” is used only for calculating metrics?
  
  If I have misunderstood, please explain why the data is separated into “X” and “y”.
  
  And what type of training this is in the end, I do not quite understand?
  
  Thanks in advance.
  
  PS. Sorry for my English
  
  Reply
  - Jason Brownlee September 14, 2020 at 6:43 am #
    
    In the above it is working on each character output, not on sequences.
    
    If you are interested in sequences, perhaps a seq2seq model would be more appropriate:
    https://machinelearningmastery.com/develop-neural-machine-translation-system-keras/
    
    Reply
- Jason Brownlee September 14, 2020 at 6:41 am #
  
  Cross-entropy is minimized so the model predicts the desired output.
  
  Accuracy is used to evaluate the output categorical variables, e.g. the expected characters. It may or may not be the most appropriate metric in this case, e.g. if we got 100%, the model has memorised the input, which might not be desirable.
  
  Reply
  - Stanislav September 14, 2020 at 7:19 pm #
    
    One more question. I don’t quite understand why there are 2 input arrays “X” and “y” ?
    
    Could you explain this in detail?
    
    Thanks.
    
    Reply
    - Jason Brownlee September 15, 2020 at 5:19 am #
      
      Good question, see this:
      https://machinelearningmastery.com/faq/single-faq/what-are-x-and-y-in-machine-learning
      
      Reply
kaur November 30, 2020 at 7:22 pm #

One epoch out of 100 taking around 180 seconds with loss 1.3362 and accuracy of 0.3520 for biological sequence file having size 1.59 MB.

How to reduce it ?

Reply
- Jason Brownlee December 1, 2020 at 6:19 am #
  
  Here are some suggestions:
  https://machinelearningmastery.com/faq/single-faq/how-do-i-speed-up-the-training-of-my-model
  
  Reply
Himanshu December 29, 2020 at 2:20 am #

Hi Jason,

I am getting this error on execution of the code:

ValueError: cannot reshape array of size 380 into shape (1,1,10)

Could you please advise. Thanks

Reply
- Jason Brownlee December 29, 2020 at 5:16 am #
  
  Perhaps these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
cnsn8 March 3, 2021 at 6:32 am #

hello, thanks for good information. I would be very happy if you could look. I take the code exactly and run it as it is. However, I get the following error in the model.fit section.

ValueError: Input 0 of layer sequential_15 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 398, 11, 38]

How can I fix this ?
thank you.

Reply
- Jason Brownlee March 3, 2021 at 8:06 am #
  
  You’re welcome.
  
  Sorry to hear that, these tips will help:
  https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
  
  Reply
cnsn8 April 5, 2021 at 8:31 am #

Thanks again, I fixed it. I have an another question. Is there any source you can recommend that explains the subject of simple controllable text generation with code? Or would you consider making a tutorial on this topic.

Reply
- Jason Brownlee April 6, 2021 at 5:14 am #
  
  Well done!
  
  Yes, you can find many examples of language models on the blog:
  https://machinelearningmastery.com/?s=language+models&post_type=post&submit=Search
  
  Reply
