How to Understand Any Dataset: 7 Essential Questions

1. How big is the data?

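df.shape is a pandas attribute (not a method, so it takes no parentheses) that returns the dimensions of a DataFrame as a tuple: (number of rows, number of columns).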
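As a minimal sketch, the snippet below unpacks the shape of a small DataFrame. The column names and values are made up for illustration and are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate df.shape.
df = pd.DataFrame({
    "Age": [22, 38, 26, 35],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Survived": [0, 1, 1, 1],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")
```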

2. What does the data look like?

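df.sample() in pandas returns a random selection of rows from a DataFrame (or of columns, when called with axis=1). Compared with df.head(), a random sample is less likely to be skewed by how the file happens to be sorted.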
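A short sketch of the two most common ways to call it, on made-up data. The random_state argument is optional; it is set here only so the same rows come back on every run.

```python
import pandas as pd

# Hypothetical toy data with 10 rows.
df = pd.DataFrame({
    "Age": list(range(20, 30)),
    "Survived": [0, 1] * 5,
})

# Pick 3 random rows; random_state makes the draw reproducible.
rows = df.sample(n=3, random_state=0)
print(rows)

# Alternatively, sample a fraction of the rows (here 50%).
half = df.sample(frac=0.5, random_state=0)
print(len(half))
```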

3. What are the data types of the columns?

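df.info() is a pandas method that prints a concise summary of a DataFrame: the index range, each column's name, non-null count, and data type, and the approximate memory usage.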
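The sketch below runs it on a small made-up DataFrame. Note that df.info() prints its summary and returns None, so when the types are needed as a value, df.dtypes is the usual alternative.

```python
import pandas as pd

# Hypothetical toy data mixing text, float, and integer columns.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Age": [22.0, None, 35.0],
    "Survived": [0, 1, 1],
})

# Prints the summary to stdout; nothing useful is returned.
df.info()

# dtypes is a Series mapping each column name to its data type.
print(df.dtypes)
```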

4. Are there any missing values?

df.isnull().sum() in pandas is used to identify and count the number of missing (null or NaN) values in each column of a DataFrame.
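A minimal sketch on made-up data: isnull() produces a True/False table, and summing it counts the True values per column. Taking the mean instead gives the share of missing values, which is often easier to compare across columns.

```python
import pandas as pd

# Hypothetical toy data with some missing entries.
df = pd.DataFrame({
    "Age": [22.0, None, 35.0, None],
    "Cabin": ["C85", None, None, None],
    "Survived": [0, 1, 1, 0],
})

# Number of missing values in each column.
missing = df.isnull().sum()
print(missing)

# Percentage of missing values in each column.
missing_pct = df.isnull().mean() * 100
print(missing_pct)
```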

5. What does the data look like statistically?

df.describe() in pandas generates descriptive statistics for the numerical columns (by default) in a DataFrame: count, mean, standard deviation, minimum, the 25th, 50th, and 75th percentiles, and maximum.
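As an illustration on made-up data, the sketch below shows the default behaviour and the include="all" option, which also summarises non-numeric columns.

```python
import pandas as pd

# Hypothetical toy data with one numeric and one text column.
df = pd.DataFrame({
    "Age": [20, 30, 40, 50],
    "Sex": ["m", "f", "f", "m"],
})

# Default: numeric columns only.
stats = df.describe()
print(stats)

# include="all" adds the non-numeric columns as well.
all_stats = df.describe(include="all")
print(all_stats)
```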

6. Are there duplicate values?

df.duplicated().sum() in pandas counts the duplicate rows in a DataFrame. By default, duplicated() flags every row that exactly repeats an earlier row, so the first occurrence is not counted.
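A small sketch on made-up data, followed by df.drop_duplicates(), a common next step once duplicates have been found.

```python
import pandas as pd

# Hypothetical toy data: the third row repeats the first.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cy"],
    "Age": [22, 38, 22, 35],
})

# Count rows that repeat an earlier row.
n_dupes = int(df.duplicated().sum())
print(n_dupes)

# Keep only the first occurrence of each row.
deduped = df.drop_duplicates()
print(len(deduped))
```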

7. How are the columns correlated?

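df.corr()['Survived'] computes the correlation coefficients (Pearson by default) between the column Survived, the target column in this example, and all other numeric columns in the DataFrame. In pandas 2.0 and later, df.corr() raises an error if the DataFrame contains non-numeric columns, so use df.corr(numeric_only=True)['Survived'] in that case.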
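A minimal sketch, assuming a DataFrame with a Survived column alongside other numeric and text columns. The values are invented for illustration, and sorting the result is an optional extra that makes the strongest relationships easier to spot.

```python
import pandas as pd

# Hypothetical toy data; "Name" is text and is skipped by numeric_only=True.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Di"],
    "Fare": [10.0, 20.0, 30.0, 40.0],
    "Pclass": [3, 3, 1, 1],
    "Survived": [0, 0, 1, 1],
})

# Correlation of every numeric column with Survived, strongest first.
corr = df.corr(numeric_only=True)["Survived"].sort_values(ascending=False)
print(corr)
```

A column always correlates perfectly with itself, so the Survived entry in the result carries no information and can be ignored.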