Python Code Translation: From Source to Bytecode

Let's walk through how CPython, the standard Python interpreter, translates your code from plain text to bytecode and then runs it, step by step. We'll use this example:

a = 6
b = 4
print(a + b)

1. Source Code (Text)

This is the code you write in a .py file: plain text that humans can read. Python decodes it as UTF-8 unless the file declares a different encoding.

2. Tokenization

Python's tokenizer first breaks your code into tokens, the smallest meaningful elements such as names, numbers, operators, and line endings.

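Example tokens (simplified: the tokenize module also emits an ENDMARKER token at the end, and an ENCODING token at the start when it reads bytes):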

Token Type	Token Value
NAME	'a'
OP	'='
NUMBER	'6'
NEWLINE	'\n'
NAME	'b'
OP	'='
NUMBER	'4'
NEWLINE	'\n'
NAME	'print'
OP	'('
NAME	'a'
OP	'+'
NAME	'b'
OP	')'
NEWLINE	'\n'
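You can reproduce a table like the one above with the standard library's tokenize module. This is a minimal sketch; the variable name source is just for illustration. Because generate_tokens() reads str lines, it emits no ENCODING token, but it does end with ENDMARKER.

```python
import io
import token
import tokenize

source = "a = 6\nb = 4\nprint(a + b)\n"

# generate_tokens() takes a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # token.tok_name maps the numeric token type to its name, e.g. NAME or OP.
    print(f"{token.tok_name[tok.type]:<10} {tok.string!r}")
```

The last row printed is ENDMARKER with an empty string, which the simplified table omits.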

3. Parsing and AST (Abstract Syntax Tree)

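Next, the parser organizes the tokens into a tree that represents the structure of your code: which statements it contains and how expressions nest inside them. This is called the AST. (Since Python 3.9, CPython uses a PEG parser that builds the AST directly.)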

AST Outline:

Module
- Assign (a = 6)
- Assign (b = 4)
- Expr (print(a + b))
  - Call
    - func: print
    - args:
      - BinOp (a + b)

This tree shows how each part of your code relates to the others.
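The ast module exposes this tree directly. The sketch below parses the example and inspects the nodes from the outline; the indent argument to ast.dump() assumes Python 3.9 or newer, and the exact dump text varies slightly between versions.

```python
import ast

source = "a = 6\nb = 4\nprint(a + b)\n"

tree = ast.parse(source)  # default mode "exec" returns an ast.Module

# Pretty-print the nested node structure (indent= needs Python 3.9+).
print(ast.dump(tree, indent=2))

# The third statement is an Expr whose value is the Call to print.
call = tree.body[2].value
binop = call.args[0]  # the a + b expression passed as the only argument
```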

4. Compilation to Bytecode

Python then compiles the AST into bytecode, a sequence of low-level instructions for the Python Virtual Machine (PVM), stored in a code object.

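Example bytecode, as displayed by the dis module. The exact output depends on the Python version: this simplified listing resembles Python 3.11, whose real output also begins with a RESUME instruction and pushes a NULL (PUSH_NULL) before loading print, while Python 3.12 drops PRECALL.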

  1           0 LOAD_CONST               0 (6)
              2 STORE_NAME               0 (a)
  2           4 LOAD_CONST               1 (4)
              6 STORE_NAME               1 (b)
  3           8 LOAD_NAME                2 (print)
             10 LOAD_NAME                0 (a)
             12 LOAD_NAME                1 (b)
             14 BINARY_OP                0 (+)
             16 PRECALL                  1
             18 CALL                     1
             20 POP_TOP
             22 LOAD_CONST               2 (None)
             24 RETURN_VALUE

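Each row is one instruction: the source line number (printed only where it changes), the instruction's byte offset, the opcode name, its numeric argument, and, in parentheses, the constant or name that argument refers to.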
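To see the listing for your own interpreter, compile the source and pass the resulting code object to dis. This sketch compiles the AST from the previous step, since compile() accepts an AST as well as a string; "<example>" is an arbitrary placeholder filename. The opcodes you get will differ between Python versions.

```python
import ast
import dis

source = "a = 6\nb = 4\nprint(a + b)\n"

tree = ast.parse(source)
# compile() turns the AST into a code object holding the bytecode.
code = compile(tree, "<example>", "exec")

dis.dis(code)  # prints a version-specific listing like the one above

# get_instructions() yields Instruction objects instead of printing text.
opnames = [ins.opname for ins in dis.get_instructions(code)]
```

The names referenced by STORE_NAME and LOAD_NAME live in code.co_names, and the constants used by LOAD_CONST live in code.co_consts.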

5. Execution

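The Python Virtual Machine, CPython's evaluation loop, executes the bytecode one instruction at a time. It is stack-based: LOAD_CONST and LOAD_NAME push values onto a stack, BINARY_OP pops two operands and pushes the result, and CALL pops the function and its argument and pushes the return value. For the example, this prints 10.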
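The stack discipline can be illustrated with a toy interpreter. This is a hypothetical, heavily simplified model, not CPython's implementation: the (opname, arg) tuples are an invented format, version-specific opcodes such as RESUME, PUSH_NULL, and PRECALL are left out, and name lookup checks only one namespace plus a builtins dict.

```python
import operator

# Hypothetical mini-VM: the (opname, arg) tuples are NOT CPython's real bytecode encoding.
BINARY_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def run(instructions, builtins):
    """Execute (opname, arg) pairs and return (return_value, namespace)."""
    stack = []      # operand stack shared by all instructions
    namespace = {}  # module-level names written by STORE_NAME
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "STORE_NAME":
            namespace[arg] = stack.pop()
        elif opname == "LOAD_NAME":
            # Simplified lookup: the namespace first, then builtins.
            stack.append(namespace[arg] if arg in namespace else builtins[arg])
        elif opname == "BINARY_OP":
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            stack.append(BINARY_OPS[arg](left, right))
        elif opname == "CALL":
            # Pop `arg` arguments (last one on top), then the callable beneath them.
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(func(*args))
        elif opname == "POP_TOP":
            stack.pop()  # discard the unused result of print(...)
        elif opname == "RETURN_VALUE":
            return stack.pop(), namespace
        else:
            raise ValueError(f"unknown instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")


# The example program, following the dis listing without version-specific opcodes.
PROGRAM = [
    ("LOAD_CONST", 6), ("STORE_NAME", "a"),
    ("LOAD_CONST", 4), ("STORE_NAME", "b"),
    ("LOAD_NAME", "print"), ("LOAD_NAME", "a"), ("LOAD_NAME", "b"),
    ("BINARY_OP", "+"),
    ("CALL", 1), ("POP_TOP", None),
    ("LOAD_CONST", None), ("RETURN_VALUE", None),
]

result, namespace = run(PROGRAM, {"print": print})  # prints 10
```

Passing the builtins in as a dict keeps the sketch testable: swapping print for a list's append method records the output instead of writing it.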

Summary Table

Step	What Happens	Example Output/Structure
Source Code	Human-readable text	`a = 6`
Tokenization	Break into tokens	`NAME`, `OP`, `NUMBER`, ...
AST	Build tree of code structure	Assign, BinOp, Call, ...
Bytecode	Compile to VM instructions	`LOAD_CONST`, `STORE_NAME`, ...
Execution	Python VM runs the bytecode	Output: `10`

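In short: Python takes your code through tokenization, parsing into an AST, compilation to bytecode, and finally execution by the virtual machine. Because the source is compiled once into a compact instruction format, the VM does not have to re-analyze the text while running it, and for imported modules the bytecode is cached as .pyc files in __pycache__ so later imports can skip the earlier steps when the source has not changed.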
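The whole pipeline can be driven from Python itself. This sketch runs each stage on the example with standard-library calls; note that ast.parse() uses CPython's own internal tokenizer, so the tokens list here is only for inspection and is not fed to the parser. Output from print is captured so it can be checked.

```python
import ast
import contextlib
import dis
import io
import tokenize

source = "a = 6\nb = 4\nprint(a + b)\n"

# Tokenization (for inspection only; ast.parse tokenizes internally)
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
# Parsing: source text -> AST
tree = ast.parse(source)
# Compilation: AST -> code object holding the bytecode
code = compile(tree, "<example>", "exec")
# Execution: run the code object in a fresh namespace, capturing stdout
namespace = {}
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code, namespace)

print(len(tokens), "tokens")
print(len(tree.body), "top-level statements")
print(len(list(dis.get_instructions(code))), "bytecode instructions")
print("output:", buffer.getvalue().strip())
```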
