Advanced Python Flashcards
How is memory managed in Python?
- The OS carries out (or denies) requests to read and write memory. It also creates a virtual memory layer that applications (including Python) can access
- The default Python implementation, CPython, handles memory management for Python code
- Each Python object (everything in Python is an object) is represented in C by a PyObject struct, which holds a reference count, used for garbage collection, and a pointer to the object’s type
- Python’s Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so only one thread at a time can touch shared interpreter state such as reference counts
- An object’s reference count is incremented when a new reference is created (e.g. it is assigned to another variable, passed to a function, or placed in a container) and decremented when a reference is removed. If it drops to 0, the object’s type-specific deallocation function is called, which “frees” the memory so that other objects can use it (garbage collection, though the memory is usually returned to Python’s allocator rather than to the OS). A minimal reference-counting sketch follows this list.
- Python uses a portion of the memory for internal use and a portion as a private heap space for object storage
- Python’s memory manager requests memory from the system in large chunks called arenas, divides each arena into fixed-size pools, and carves each pool into blocks that all serve a single size class
- A usedpools list tracks, for each size class, the pools that still have space available. When a block of a given size is requested, the allocator checks usedpools for a pool serving that size class and takes a block from it
- Each pool keeps a pointer to its “free” (available for reuse) blocks in a freeblock list. As the memory manager frees blocks, they are added to the front of this list, and freed blocks are handed out again before untouched ones
- Arenas are organized into a list called usable_arenas, sorted by the number of free pools they contain. The fewer free pools an arena has, the closer it sits to the front of the list, so the arenas that are already most full of data are selected first for new data
- Arenas are the only units that can truly be freed back to the OS (instead of just overwritten). Keeping new data out of nearly empty arenas lets them become completely empty, reducing the overall memory footprint of the Python program
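A minimal sketch of reference counting in action, using sys.getrefcount (which itself adds a temporary reference, so the printed counts are one higher than the “real” count):
import sys
a = []                     # one reference: the name 'a'
b = a                      # a second reference: the name 'b'
print(sys.getrefcount(a))  # typically 3: a, b, plus the temporary argument reference
del b                      # remove one reference
print(sys.getrefcount(a))  # typically 2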
How does garbage collection work in Python?
- Reference Counting: Python primarily uses reference counting for garbage collection. When an object’s reference count drops to zero, the object is no longer accessible and its memory can be reclaimed. For example, variables declared inside a function have local scope: they are created when the function is called and destroyed when the function exits, freeing up the memory
- Cycle Detector: Python’s garbage collector includes a cycle detector to identify and clean up reference cycles (groups of objects that reference each other, creating a cycle) that wouldn’t be collected by reference counting alone.
- Generational Garbage Collection: Python uses a generational approach to garbage collection, dividing objects into three generations (young, middle-aged, and old). New objects start in the youngest generation, and objects that survive multiple garbage collection rounds are promoted to older generations.
- Thresholds and Tuning: The garbage collector is triggered when the number of objects in a generation exceeds a certain threshold. These thresholds can be manually tuned to optimize garbage collection performance for specific applications.
- Manual Control: Python provides functions like gc.collect() to manually trigger garbage collection, and gc.set_debug() to debug garbage collection behavior. However, in most cases, the automatic garbage collection process is sufficient.
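A short illustration of manual control using the gc module (the exact threshold values and collection counts depend on the interpreter):
import gc
print(gc.get_threshold())  # default thresholds, e.g. (700, 10, 10)
class Node:
    def __init__(self):
        self.other = None
# Build a reference cycle that reference counting alone cannot reclaim
a, b = Node(), Node()
a.other, b.other = b, a
del a, b
print(gc.collect())        # manually run a full collection; prints the number of unreachable objects found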
What are Python namespaces? Why are they used?
A namespace in Python ensures that object names in a program are unique and can be used without any conflict. Python implements these namespaces as dictionaries with ‘name as key’ mapped to a corresponding ‘object as value’. This allows for multiple namespaces to use the same name and map it to a separate object. A few examples of namespaces are as follows:
- Local Namespace includes the local names inside a function. The namespace is created temporarily for a function call and is cleared when the function returns.
- Global Namespace includes names defined at the top level of a module or script, including the names of imported packages/modules. This namespace is created when the module is loaded and lasts for the duration of the script’s execution.
- Built-in Namespace includes built-in functions of core Python and built-in names for various types of exceptions.
The lifecycle of a namespace depends upon the scope of objects they are mapped to. If the scope of an object ends, the lifecycle of that namespace comes to an end. Hence, it isn’t possible to access inner namespace objects from an outer namespace.
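A quick way to inspect these namespaces is with the built-in globals() and locals() functions (the area() function below is just an illustrative example):
import builtins
import math                     # adds the name 'math' to the global namespace
def area(r):
    result = math.pi * r ** 2   # 'r' and 'result' live in the local namespace
    print(sorted(locals()))     # output => ['r', 'result']
    return result
area(2)
print('math' in globals())      # output => True
print(len(dir(builtins)))       # number of names in the built-in namespace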
What is Scope Resolution in Python?
Sometimes the same name can refer to different objects depending on where it is looked up; scope resolution is how Python automatically decides which object a name refers to. For unqualified names, Python searches the Local, Enclosing, Global, and Built-in scopes, in that order (the LEGB rule). For example:
The Python modules ‘math’ and ‘cmath’ have a lot of functions in common - log10(), acos(), exp() etc. To resolve this ambiguity, it is necessary to prefix them with their respective module, like math.exp() and cmath.exp().
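A short sketch of both ideas - qualified module names and the LEGB lookup order:
import math
import cmath
print(math.exp(1))    # output => 2.718281828459045 (a float)
print(cmath.exp(1))   # output => (2.718281828459045+0j) (a complex number)
x = 'global'
def outer():
    x = 'enclosing'
    def inner():
        x = 'local'
        print(x)      # 'local' wins: Local -> Enclosing -> Global -> Built-in
    inner()
outer()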
What are decorators?
Decorators in Python are essentially functions that add functionality to an existing function without changing the structure of the function itself. They are applied with the @decorator_name syntax and, when several decorators are stacked, they are applied in a bottom-up fashion (the one closest to the function is applied first).
The beauty of decorators lies in the fact that, besides adding functionality to the output of the method, they can even accept the arguments intended for the function and modify them before passing them on. The inner nested ‘wrapper’ function plays a significant role here: it closes over the decorated function, receives each call’s arguments, and stays hidden from the global scope.
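A minimal example (functools.wraps keeps the decorated function’s original name and docstring):
import functools
def log_calls(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__} with {args}")
        return func(*args, **kwargs)
    return wrapper
@log_calls
def add(a, b):
    return a + b
print(add(2, 3))   # prints "calling add with (2, 3)" and then 5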
What are Dict and List comprehensions?
Python comprehensions are syntactic-sugar constructs that build new, transformed, or filtered lists, dictionaries, or sets from any iterable. Using comprehensions saves time and avoids considerably more verbose loop-based code. Examples:
- Performing mathematical operations on the entire list
- Performing conditional filtering operations on the entire list
- Combining multiple lists into one
- Flattening a multi-dimensional list
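One short example per use case above (the dict comprehension maps each name to its length, purely for illustration):
nums = [1, 2, 3, 4]
squares = [x ** 2 for x in nums]                      # mathematical operation on every element
evens = [x for x in nums if x % 2 == 0]               # conditional filtering
sums = [a + b for a, b in zip([1, 2], [10, 20])]      # combining two lists into one
lengths = {name: len(name) for name in ['Ann', 'Bo']} # dict comprehension
matrix = [[1, 2], [3, 4]]
flat = [x for row in matrix for x in row]             # flattening a 2-D list
print(squares, evens, sums, lengths, flat)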
What is lambda in Python? Why is it used?
A lambda is an anonymous function in Python that can accept any number of arguments but may contain only a single expression. It is generally used when an anonymous function is needed for a short period of time. Lambda functions can be used in either of two ways:
- Assigning lambda functions to a variable e.g.
mul = lambda a, b : a * b
print(mul(2, 5)) # output => 10
- Wrapping lambda functions inside another function:
def myWrapper(n):
    return lambda a: a * n
mulFive = myWrapper(5)
print(mulFive(2)) # output => 10
How do you copy an object in Python?
In Python, the assignment statement (= operator) does not copy objects. Instead, it creates a binding between the existing object and the target variable name. To create copies of an object in Python, we need to use the copy module. Moreover, there are two ways of creating copies for the given object using the copy module -
Shallow Copy creates a new object and inserts into it references to the objects found in the original. The copy has its own top-level container with the same values, but if any of those values are themselves references to other objects, only the reference addresses are copied, so the nested objects are shared with the original.
Deep Copy copies all values recursively from source to target object, i.e. it even duplicates the objects referenced by the source object.
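An example of the difference (the nested lists are shared by the shallow copy but duplicated by the deep copy):
import copy
original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)
original[0].append(99)
print(shallow[0])   # output => [1, 2, 99] (inner list is shared)
print(deep[0])      # output => [1, 2] (inner list was duplicated)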
How are arguments passed in Python - by value or by reference?
In Python, arguments are passed by object reference (sometimes called “pass by assignment”): the function receives a reference to the same object the caller used, so mutating a mutable argument is visible to the caller, but rebinding the parameter name inside the function is not (see the sketch after this list). The two classical models, for comparison:
- Pass by value: Copy of the actual object is passed. Changing the value of the copy of the object will not change the value of the original object.
- Pass by reference: Reference to the actual object is passed. Changing the value of the new object will change the value of the original object.
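A sketch showing the practical consequence:
def mutate(lst):
    lst.append(4)   # mutates the caller's list
def rebind(lst):
    lst = [99]      # only rebinds the local name 'lst'
nums = [1, 2, 3]
mutate(nums)
print(nums)         # output => [1, 2, 3, 4]
rebind(nums)
print(nums)         # output => [1, 2, 3, 4] (unchanged)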
What is pickling and unpickling?
Python’s standard library offers serialization out of the box. Serializing an object means transforming it into a format that can be stored, so that it can be deserialized later to reconstruct the original object. This is where the pickle module comes into play.
Pickling:
Pickling is the name of the serialization process in Python. Most Python objects can be serialized into a byte stream and written to a file (or any file-like object). Pickled data is reasonably compact, though it can be compressed further. Moreover, pickle keeps track of the objects it has already serialized (so shared and recursive references are preserved), and pickled data is portable across Python versions (newer pickle protocols require newer interpreters).
The function used for the above process is pickle.dump().
Unpickling:
Unpickling is the complete inverse of pickling. It deserializes the byte stream to recreate the objects stored in the file and loads the object to memory.
The function used for the above process is pickle.load().
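A round-trip sketch (data.pkl is an arbitrary example file name):
import pickle
data = {'name': 'Alice', 'scores': [90, 85]}
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)       # pickling: object -> byte stream on disk
with open('data.pkl', 'rb') as f:
    restored = pickle.load(f)  # unpickling: byte stream -> object
print(restored == data)        # output => True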
Note: Python has another, more primitive, serialization module called marshal, which exists primarily to support .pyc files and differs significantly from pickle.
What are generators in Python?
Generators are functions that produce a sequence of items one at a time, on demand. A generator function uses the yield keyword instead of return; calling it does not run the function body but returns a generator object, which is an iterator.
Instead of computing all the values upfront and storing them in memory, it generates them on-the-fly using a function and the yield keyword.
- Memory Efficiency: Since generators generate values on-the-fly, they can represent large sequences of data without consuming memory for all the items at once. This is particularly useful when working with large datasets or streams of data.
- Lazy Evaluation: Generators compute values lazily, meaning they produce values one at a time and only when requested. This can lead to performance improvements in scenarios where not all values are needed.
- Infinite Sequences: Generators can represent infinite sequences
# generate Fibonacci numbers up to n
def fib(n):
    p, q = 0, 1
    while p < n:
        yield p
        p, q = q, p + q
x = fib(10) # create generator object
x.__next__() # output => 0
x.__next__() # output => 1
x.__next__() # output => 1
x.__next__() # output => 2
x.__next__() # output => 3
x.__next__() # output => 5
x.__next__() # output => 8
x.__next__() # raises StopIteration
for i in fib(10):
    print(i) # output => 0 1 1 2 3 5 8
What are the benefits of generator functions?
- Memory Efficiency: Generators yield one item at a time, making them more memory-efficient than functions that return a list with all the output values, especially for large data sets.
- Lazy Evaluation: Generators produce values only when requested, allowing you to start using the results immediately without waiting for the entire result set to be generated, leading to better performance.
- Simplicity and Modularity: Generators can simplify code by eliminating the need for temporary variables and complex loops, and they can be easily chained together to create modular data processing pipelines.
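A sketch of chaining generator expressions into a simple pipeline (only one value is held in memory at any time):
# Sum the squares of the even numbers below one million without building any list
evens = (n for n in range(1_000_000) if n % 2 == 0)
squares = (n * n for n in evens)
print(sum(squares))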
What is PYTHONPATH in Python?
PYTHONPATH is an environment variable which you can set to add additional directories where Python will look for modules and packages. This is especially useful in maintaining Python libraries that you do not wish to install in the global default location.
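Directories listed in PYTHONPATH are added to sys.path, the list of locations Python searches when importing. A hedged sketch (the directory /home/user/mylibs is just an example):
# In the shell, before starting Python:
#   export PYTHONPATH=/home/user/mylibs
import sys
print(sys.path)   # the PYTHONPATH directories appear near the front of this list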
What is the use of help() and dir() functions?
help() function in Python is used to display the documentation of modules, classes, functions, keywords, etc. If no parameter is passed to the help() function, then an interactive help utility is launched on the console.
dir() function tries to return a valid list of attributes and methods of the object it is called upon. It behaves differently with different objects, as it aims to produce the most relevant data, rather than the complete information.
- For Modules/Library objects, it returns a list of all attributes, contained in that module.
- For Class Objects, it returns a list of all valid attributes and base attributes.
- With no arguments passed, it returns a list of attributes in the current scope.
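For example:
help(len)         # prints the documentation for the built-in len()
print(dir(str))   # attributes and methods of the str class
print(dir())      # names in the current scope
import math
print(dir(math))  # all names defined in the math module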
What is the difference between .py and .pyc files?
.py files contain the source code of a program, whereas .pyc files contain its bytecode, produced when the interpreter compiles the .py source. .pyc files are not created for every file you run; they are written (into a __pycache__ directory) only for the modules you import.
Before executing an imported module, the Python interpreter checks for an up-to-date compiled file. If a matching .pyc is present, the virtual machine executes it directly; if not, the interpreter compiles the .py file to a .pyc and then executes it.
Having the .pyc file saves compilation time on subsequent imports.
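You can also compile a module ahead of time with the standard py_compile module (my_module.py is a hypothetical file name used for illustration):
import py_compile
# Compiles my_module.py and writes the bytecode into __pycache__/
py_compile.compile('my_module.py')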