Indentation in Python

Arun Thiagarajan
4 min read · Jun 5, 2022


Do you think this block of code will be parsed successfully by Python?

def foo():
    print ("Foo using 4 space for indent")
def bar():
  print ("Bar using 2 spaces for indent")
if __name__ == "__main__":
      foo()
      bar()
      print ("Main uses 6 spaces for indent")

Won't the Python parser raise an indentation error here? Surprisingly, it parses the above code block without complaint. So how does the parser handle indentation? In this post, I will share how it works.
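One quick way to check this claim is to hand the snippet to the built-in compile() function. The sketch below writes the indentation out explicitly inside a string; the file name "<indent-demo>" and the variable names are only illustrative.

```python
# The snippet above, with its 4-, 2- and 6-space indents spelled out.
source = (
    "def foo():\n"
    "    print('Foo using 4 spaces for indent')\n"
    "def bar():\n"
    "  print('Bar using 2 spaces for indent')\n"
    "if __name__ == '__main__':\n"
    "      foo()\n"
    "      bar()\n"
    "      print('Main uses 6 spaces for indent')\n"
)

# compile() raises IndentationError (a SyntaxError subclass) when the
# indentation is rejected; here it is expected to return a code object.
code = compile(source, "<indent-demo>", "exec")

# Run the compiled code as if it were the main module.
namespace = {"__name__": "__main__"}
exec(code, namespace)
```

Each block only has to be consistent with itself, so the three different indent widths do not conflict.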

Handling Indentation in Python

Before going into details, let's take a brief look at logical lines and physical lines. A physical line is an actual line that we type in a text editor. A logical line is a line that is logically complete, that is, a full statement. Multiple physical lines can combine to make one logical line. A quick example shows the difference.

# This example has 3 logical lines, 3 physical lines
a = [1, 2, 3]
b = [2, 3, 4]
c = [3, 5, 6]

# This example has 1 logical line, 3 physical lines
a = [1, 2, 3,
     4, 5, 6,
     7, 8, 9]

# This example has 1 logical line, 3 physical lines
a = 1 + \
    2 + \
    3

In more concrete terms, the end of a logical line is marked by a NEWLINE token, whereas a physical line is terminated by an end-of-line sequence (for example ASCII linefeed \n, carriage return \r, or the two together as \r\n). In the rest of this section, we will describe how the parser determines the indentation level of a line.
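The logical/physical distinction can be checked with the standard tokenize module, which emits a NEWLINE token at the end of every logical line. This is a small sketch; the helper name count_lines is made up for illustration.

```python
import io
import tokenize


def count_lines(source):
    """Return (logical, physical) line counts for a source string."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # One NEWLINE token is produced per logical line.
    logical = sum(1 for tok in tokens if tok.type == tokenize.NEWLINE)
    physical = len(source.splitlines())
    return logical, physical


three_statements = "a = [1, 2, 3]\nb = [2, 3, 4]\nc = [3, 5, 6]\n"
bracketed = "a = [1, 2, 3,\n     4, 5, 6,\n     7, 8, 9]\n"
backslashes = "a = 1 + \\\n    2 + \\\n    3\n"

print(count_lines(three_statements))  # (3, 3)
print(count_lines(bracketed))         # (1, 3)
print(count_lines(backslashes))       # (1, 3)
```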

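A Python program is read by a parser, which accepts a stream of tokens as input. The tokens are produced by the lexical analyzer (the tokenizer), and that is the stage that actually tracks indentation; for simplicity, this post refers to the two stages together as the parser.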

The parser uses a stack to keep track of indentation levels. When the parser begins to parse the input file, before reading the first line, a single zero is pushed onto the indentation stack. This zero will never be popped off again.

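The first step in determining the indentation level of a line is to convert tabs to spaces. The parser reads a line character by character. When it encounters a tab, the tab is replaced by one to eight spaces, just enough that the total number of characters up to and including the replacement is a multiple of eight. A \t at the very start of a line therefore counts as 8 spaces, while a \t that follows 3 spaces adds only 5.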
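The Python language reference describes tab replacement as padding up to the next multiple of eight columns. A minimal sketch of that rule follows; indent_column is a hypothetical helper, not the tokenizer's own code, and it ignores form feeds.

```python
def indent_column(line, tabsize=8):
    """Return the indentation column of a line after tab expansion."""
    col = 0
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            # Jump to the next multiple of tabsize.
            col = (col // tabsize + 1) * tabsize
        else:
            break  # first non-blank character ends the indentation
    return col


print(indent_column("\tx"))          # 8
print(indent_column("   \tx"))       # 8  (3 spaces, then the tab adds 5)
print(indent_column("        \tx"))  # 16 (8 spaces, then a full tab)
```

The built-in str.expandtabs(8) applies the same padding rule, so len("   \t".expandtabs(8)) is also 8.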

The parser considers only the leading whitespace at the beginning of a logical line to determine its indentation level. At the beginning of each logical line, the indentation level is the total number of spaces (after tab replacement) preceding the first non-blank character. If the level of indentation is equal to the top of the stack, nothing happens (remember that the stack initially contains one element, zero). If the level of indentation is greater than the top of the stack, it is pushed onto the stack. If the level of indentation is less than the top of the stack, elements are popped off the stack until a matching level of indentation is found.
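The pushes and pops can be observed through the standard tokenize module, which reports an INDENT token for each push and a DEDENT token for each pop. The sketch below runs it on the snippet from the start of this post and collects the width of each indent; the variable names are illustrative.

```python
import io
import tokenize

# The opening snippet, with 4-, 2- and 6-space indents.
source = (
    "def foo():\n"
    "    print('Foo using 4 spaces for indent')\n"
    "def bar():\n"
    "  print('Bar using 2 spaces for indent')\n"
    "if __name__ == '__main__':\n"
    "      foo()\n"
    "      bar()\n"
    "      print('Main uses 6 spaces for indent')\n"
)

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# An INDENT token's string is the leading whitespace of its line.
indent_widths = [len(tok.string) for tok in tokens if tok.type == tokenize.INDENT]
dedent_count = sum(1 for tok in tokens if tok.type == tokenize.DEDENT)

print(indent_widths, dedent_count)
```

Every indent is compared only with the value currently on top of the stack, which is why widths of 4, 2 and 6 can coexist in one file.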

While the parser is popping elements off the stack to find the matching outer block, if none of the elements matches the current indentation level, it raises an IndentationError stating that the unindent does not match any outer indentation level. Indentation is also rejected as inconsistent if a source file mixes tabs and spaces in a way that makes the meaning depend on how many spaces a tab is worth; in that case a TabError is raised.
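Both failure modes can be reproduced with compile(). In this sketch, indentation_error is a made-up helper that returns the exception name and message instead of letting the error propagate.

```python
def indentation_error(source):
    """Return (exception name, message) if source is rejected, else None."""
    try:
        compile(source, "<indent-demo>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__, exc.msg
    return None


# Indents of 8 and then 4: popping 8 leaves only 0 on the stack,
# and 4 matches nothing.
bad_dedent = "if True:\n        x = 1\n    y = 2\n"

# 8 spaces on one line and a single tab on the next: equal only if a
# tab is worth 8 spaces, so the indentation is ambiguous.
mixed = "if True:\n        x = 1\n\ty = 2\n"

print(indentation_error(bad_dedent))
print(indentation_error(mixed))
```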

Let’s go through an example (Note: This example uses spaces and not tabs for indentation levels):

  1. Before parsing begins, the indentation stack is initialized with only one element, zero. The state of the stack will be: [0].
  2. The first line has no indent, i.e. its level of indent is 0. The parser will read that line and move to the next line. Stack state: [0]
  3. The level of indent in the second line is 2. The parser will compare the top of the stack to the current indentation level. Since the current indentation level is greater than the top of the stack, the parser will push 2 onto the stack and generate an INDENT token. The stack will now be [0, 2].
  4. The third line has the same indentation level as line 2. Hence, no change happens to the indentation stack. Stack state: [0, 2]
  5. The fourth line has 6 spaces of indent. The parser will generate an INDENT token and push 6 onto the stack. New stack state: [0, 2, 6]
  6. In the fifth line, the parser will compare the top of the stack with the current indentation level, 2. Since the current indentation level is less than the top of the stack, the parser will pop the stack until the current indentation level matches the top of the stack. In this case, popping one element is enough. The parser will generate a DEDENT token and update the stack state to [0, 2].
  7. Now, the parser has reached the end of the file. It generates a DEDENT token for each number greater than zero remaining on the stack. The final state of the stack will be the start state: [0]
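The steps above can be replayed with a small simulation of the indentation stack. This is a sketch, not the tokenizer's actual code: trace_indentation is a hypothetical helper that assumes space-only indentation and one logical line per physical line, and the five-line sample is invented to match the indents described in the steps (0, 2, 2, 6, 2).

```python
def trace_indentation(lines):
    """Return a list of (line number, tokens emitted, stack after the line)."""
    stack = [0]  # the initial zero is never popped
    trace = []
    for number, line in enumerate(lines, start=1):
        indent = len(line) - len(line.lstrip(" "))
        tokens = []
        if indent > stack[-1]:
            stack.append(indent)
            tokens.append("INDENT")
        else:
            while indent < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
            if indent != stack[-1]:
                raise IndentationError(
                    "unindent does not match any outer indentation level"
                )
        trace.append((number, tokens, list(stack)))
    # At end of file, emit one DEDENT per remaining non-zero entry.
    tokens = []
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    trace.append(("EOF", tokens, list(stack)))
    return trace


# Hypothetical sample with indents 0, 2, 2, 6, 2.
sample = [
    "if True:",
    "  x = 1",
    "  if x:",
    "      y = 2",
    "  z = 3",
]

for step in trace_indentation(sample):
    print(step)
```

The printed stack states run [0], [0, 2], [0, 2], [0, 2, 6], [0, 2] and finally [0] at end of file, matching the walkthrough.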

Maximum Indentation Level

So, how many levels of indentation will the Python parser accept? A level of indentation corresponds to one nested block of code. For example, the code sample below has zero levels of indentation on line 1, one level on lines 2 and 3, and two levels on line 4.

def foo():
    print ("foo")
    if True:
        print ("True")

CPython allows up to roughly 100 levels of indentation: the indentation stack has room for 100 entries, and one of them is the initial zero. If a program nests deeper than that, Python raises IndentationError: too many levels of indentation. Why does Python stop there? It’s because of this one line of CPython source code in the file Parser/tokenizer.h: #define MAXINDENT 100 /* Max indentation level */.

Try editing this line, say changing the value to 150, and compile Python again to raise the indentation level limit.
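Whether a given interpreter accepts a particular nesting depth can be checked from Python itself. The sketch below generates nested if statements with one space per level; nested_ifs and compiles are illustrative names, and the exact cut-off is assumed to depend on the CPython build.

```python
def nested_ifs(depth):
    """Build source with `depth` nested if-blocks, one space per level."""
    lines = [" " * level + "if True:" for level in range(depth)]
    lines.append(" " * depth + "pass")
    return "\n".join(lines) + "\n"


def compiles(depth):
    """Return True if source nested `depth` levels deep is accepted."""
    try:
        compile(nested_ifs(depth), "<depth-demo>", "exec")
    except IndentationError:
        return False
    return True


print(compiles(50))   # well under the limit
print(compiles(200))  # well over a stock MAXINDENT of 100
```

On a stock CPython build the first call is expected to succeed and the second to fail with "too many levels of indentation".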
