1. How big is the data?
df.shape
is a pandas attribute that returns the dimensions of a DataFrame as a (rows, columns) tuple.
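A minimal sketch of reading the dimensions, assuming a small made-up DataFrame (the column names and values below are hypothetical):

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({"Age": [22, 38, 26], "Fare": [7.25, 71.28, 7.92]})

# shape is an attribute, not a method, so there are no parentheses
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
```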
2. What does the data look like?
df.sample(5)
in pandas returns a random sample of rows from a DataFrame (five rows here); pass axis=1 to sample columns instead.
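As a rough illustration, the sketch below samples rows and then columns from a hypothetical toy DataFrame. The `random_state` value is an arbitrary choice that only makes the draw repeatable.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({"Age": [22, 38, 26, 35, 54], "Survived": [0, 1, 1, 1, 0]})

# Three random rows; random_state makes the draw reproducible
rows = df.sample(n=3, random_state=42)

# One random column; axis=1 switches from sampling rows to sampling columns
cols = df.sample(n=1, axis=1, random_state=42)

print(rows)
print(cols)
```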
3. What are the data types of the columns?
df.info()
is a pandas method that prints a concise summary of a DataFrame: each column's name, non-null count, and dtype, plus memory usage.
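A small sketch of inspecting column types, assuming a hypothetical toy DataFrame. Because the summary is printed rather than returned, the example captures it in a string buffer through the `buf` parameter, and also shows `df.dtypes` for the types alone.

```python
import io

import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({"Age": [22.0, None, 26.0], "Name": ["A", "B", "C"]})

# info() prints its summary and returns None; buf redirects the text
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
print(summary)

# dtypes gives just the per-column types as a Series
print(df.dtypes)
```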
4. Are there any missing values?
df.isnull().sum()
in pandas is used to identify and count the number of missing (null or NaN) values in each column of a DataFrame.
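For example, on a hypothetical toy DataFrame the per-column counts could be computed as below. The percentage line is an optional extra and relies on the fact that the mean of a boolean mask is the fraction of True values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some gaps, used only for illustration
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": ["C85", None, None, None],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# isnull() gives a True/False mask; sum() counts the True values per column
missing = df.isnull().sum()

# Share of missing values per column, as a percentage
missing_pct = df.isnull().mean() * 100

print(missing)
print(missing_pct)
```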
5. How does the data look mathematically?
df.describe()
in pandas is used to generate descriptive statistics for the numerical columns (by default) in a DataFrame.
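A brief sketch of the default behaviour, assuming a hypothetical toy DataFrame with one numeric and one text column. Passing `include="all"` is shown as one way to bring non-numeric columns into the summary.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({"Age": [20, 30, 40], "Sex": ["male", "female", "male"]})

# By default only numeric columns are summarized (count, mean, std, min, quartiles, max)
stats = df.describe()

# include="all" adds non-numeric columns (count, unique, top, freq)
all_stats = df.describe(include="all")

print(stats)
print(all_stats)
```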
6. Are there duplicate values?
df.duplicated().sum()
in pandas is used to identify and count duplicate rows in a DataFrame, that is, rows identical to an earlier row in every column.
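To illustrate, the sketch below counts duplicates in a hypothetical toy DataFrame and then drops them with `drop_duplicates()`, which is a common follow-up step rather than something the check itself requires.

```python
import pandas as pd

# Hypothetical toy data: row 2 repeats row 0 exactly
df = pd.DataFrame({"Name": ["A", "B", "A", "A"], "Age": [22, 38, 22, 30]})

# duplicated() marks every repeat of an earlier row as True
n_dupes = df.duplicated().sum()
print(n_dupes)  # 1

# drop_duplicates() keeps the first occurrence of each row
deduped = df.drop_duplicates()
print(deduped)
```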
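7. How are the columns correlated?
df.corr(numeric_only=True)['Survived']
computes the Pearson correlation coefficient between the Survived column
and every numeric column in the DataFrame. In pandas 2.0 and later, numeric_only=True is needed when the DataFrame also contains non-numeric columns.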
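A minimal sketch of the correlation check, assuming pandas 1.5 or later (where `numeric_only` is available) and a hypothetical toy DataFrame. Apart from `Survived`, the column names and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Fare": [7.0, 71.0, 53.0, 8.0, 30.0],
    "Pclass": [3, 1, 1, 3, 2],
    "Name": ["A", "B", "C", "D", "E"],
})

# numeric_only=True skips the text column; sorting puts the strongest
# positive correlations with Survived first
corr = df.corr(numeric_only=True)["Survived"].sort_values(ascending=False)
print(corr)
```

The result includes `Survived` itself with a coefficient of 1.0, so that entry is usually ignored when reading the output.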