Python Interview QA DataScience GenAI
1. What are Python's key features that make it suitable for data science and Gen AI development?
Answer: Python has simple syntax, a large ecosystem of libraries (like Pandas, NumPy,
Scikit-learn, TensorFlow, PyTorch, Transformers), strong community support, and flexibility for rapid
prototyping.
2. Explain the difference between a list, tuple, set, and dictionary in Python.
Answer: Lists are ordered and mutable; tuples are ordered and immutable; sets are unordered
and contain unique elements; dictionaries store key-value pairs and are mutable.
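A minimal sketch illustrating the four built-in collections (variable names are illustrative):

```python
# Lists: ordered and mutable
nums = [1, 2, 3]
nums.append(4)

# Tuples: ordered and immutable
point = (1.0, 2.0)

# Sets: unordered, duplicates collapse to one element
tags = {"ai", "ml", "ai"}

# Dictionaries: mutable key-value pairs
scores = {"alice": 90}
scores["bob"] = 85

print(nums, point, tags, scores)
```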
3. What is list comprehension in Python?
Answer: List comprehension provides a concise way to create lists. Example: [x**2 for x in
range(10)] builds a list of the first ten squares.
4. How does Python manage memory?
Answer: Python uses reference counting and garbage collection to manage memory. The 'gc'
module lets you inspect and control the garbage collector.
5. What are generators in Python?
Answer: Generators allow you to iterate through data without storing everything in memory, using
the 'yield' keyword to produce values lazily.
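A short sketch of a generator (the squares() helper is made up for illustration):

```python
def squares(n):
    """Yield squares one at a time instead of building a full list in memory."""
    for x in range(n):
        yield x ** 2

gen = squares(5)
print(next(gen))   # first value: 0
print(list(gen))   # remaining values: [1, 4, 9, 16]
```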
6. What are decorators in Python?
Answer: Decorators modify the behavior of functions or methods. They're often used for logging,
timing, caching, or access control.
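A minimal logging decorator as a sketch (log_call is a hypothetical name, not a library function):

```python
import functools

def log_call(func):
    """Hypothetical decorator: prints a log line before running func."""
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__} with {args}")
        return func(*args, **kwargs)
    return wrapper

@log_call
def add(a, b):
    return a + b

print(add(2, 3))  # logs the call, then prints 5
```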
8. What libraries do you use in Python for data manipulation and analysis?
Answer: Pandas, NumPy, SciPy for analysis; Matplotlib, Seaborn, Plotly for visualization;
9. How do you handle missing data in a Pandas DataFrame?
Answer: Use 'isnull()', 'dropna()', or 'fillna()' to detect, remove, or fill missing values respectively.
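A small sketch on a toy DataFrame (the column names and values are made up):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"age": [25, np.nan, 31], "city": ["NY", "LA", None]})

print(df.isnull())    # boolean mask marking missing cells
print(df.dropna())    # drop rows containing any missing value

# fill each column differently: mean for numeric, a sentinel for text
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})
print(filled)
```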
10. What are groupby operations in Pandas and when do you use them?
Answer: 'groupby()' splits the data into groups, applies a function (like mean or sum) to each
group, and combines the results (the split-apply-combine pattern).
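A sketch of split-apply-combine on a toy DataFrame (column names are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [10, 20, 30, 50],
})

# split by team, apply mean to each group, combine into one value per team
means = df.groupby("team")["score"].mean()
print(means)
```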
11. Explain the difference between NumPy arrays and Python lists.
Answer: NumPy arrays are faster, support vectorized operations, and use less memory compared
to Python lists.
12. What is vectorization in NumPy?
Answer: You can perform operations directly on arrays without loops. Example: 'arr * 2' multiplies
each element by 2.
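A quick sketch of vectorized operations on a small array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

doubled = arr * 2   # elementwise multiply, no explicit Python loop
total = arr.sum()   # reduction implemented in optimized C code

print(doubled, total)  # [2 4 6 8] 10
```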
13. How do you create visualizations in Python?
Answer: Use 'plt.plot()', 'plt.hist()', 'sns.barplot()', etc. Matplotlib is low-level; Seaborn is built on top
of Matplotlib and provides higher-level statistical plots.
15. How do you use Python to interact with transformer-based models like GPT or BERT?
Answer: Use libraries like Hugging Face Transformers to load models, tokenize input, and
run inference to generate predictions or text.
16. What is the Hugging Face Transformers library?
Answer: It's a Python library to work with pretrained NLP models. I've used it to build
applications such as text classification, summarization, and question answering.
17. How do you fine-tune a pretrained transformer model?
Answer: Use the Hugging Face Trainer API with a custom dataset, define training arguments,
and call 'trainer.train()'.
18. What is tokenization and how is it handled in Transformers?
Answer: Tokenization splits text into tokens. In Transformers, use model-specific tokenizers like
'AutoTokenizer.from_pretrained(...)'.
19. Explain the concept of embeddings and how you generate them using Python.
Answer: Embeddings are vector representations of text. Generate them using models like BERT
or Sentence Transformers, taking the model's hidden states or pooled output as the vector.
20. How do you build a retrieval-augmented generation (RAG) pipeline in Python?
Answer: Use vector stores (like FAISS) for retrieval, retrieve relevant documents, then pass them
as context in the prompt to the language model.
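A toy sketch of only the retrieval step, using cosine similarity over made-up 3-dimensional "embeddings" (in practice a real embedding model and a vector store like FAISS would replace both):

```python
import numpy as np

docs = ["doc about cats", "doc about dogs", "doc about python"]
doc_vecs = np.array([[1.0, 0.1, 0.0],
                     [0.9, 0.2, 0.1],
                     [0.0, 0.1, 1.0]])
query_vec = np.array([0.05, 0.1, 0.95])  # pretend embedding of the user query

# cosine similarity between the query and every document vector
sims = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
best = docs[int(sims.argmax())]
print(best)  # this retrieved text would be passed as context to the LLM
```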
22. Write a Python script to read a CSV file and calculate summary statistics.
import pandas as pd

df = pd.read_csv('data.csv')
print(df.describe())
23. How do you handle exceptions in Python?
try:
    result = 10 / 0
except ZeroDivisionError:
    print('Cannot divide by zero')
24. Explain the difference between synchronous and asynchronous programming in Python.
Answer: Synchronous code runs line-by-line; asynchronous code uses 'async/await' to run
concurrent I/O-bound tasks without blocking.
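A small asyncio sketch; fetch() is a made-up stand-in for a real network call:

```python
import asyncio

async def fetch(name, delay):
    """Simulated I/O-bound task (e.g. an HTTP request)."""
    await asyncio.sleep(delay)
    return name

async def main():
    # both tasks wait concurrently, so total time is ~max(delays), not their sum
    return await asyncio.gather(fetch("a", 0.01), fetch("b", 0.02))

print(asyncio.run(main()))  # ['a', 'b']
```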
25. What are Python's data classes and where would you use them?
Answer: Data classes reduce boilerplate for classes that primarily store data. Use the '@dataclass'
decorator to auto-generate methods like '__init__', '__repr__', and '__eq__'.
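A minimal data class sketch (Point is an illustrative example, not from the text):

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
print(p)                      # auto-generated __repr__: Point(x=1.0, y=2.0)
print(p == Point(1.0, 2.0))   # auto-generated __eq__: True
```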