Exploratory Data Analysis Questions
Practice questions for the Exploratory Data Analysis domain of the AWS Certified Machine Learning - Specialty exam (48 questions).
A data science team is building a demand forecasting model using a dataset with daily sales records. They want to capture weekly seasonality and recen...
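A common way to expose weekly seasonality in daily data is a lag feature that holds the value observed seven days earlier. The sketch below is a minimal, library-free illustration under stated assumptions: the helper name `lag_feature`, the sample sales figures, and the use of `None` for days without enough history are all hypothetical.

```python
def lag_feature(series, lag=7):
    """Shift a daily series by `lag` steps so each day sees the value from `lag` days earlier.

    The first `lag` positions have no history and are filled with None.
    """
    n = len(series)
    if lag <= 0:
        return list(series)
    k = min(lag, n)  # never emit more placeholders than there are rows
    return [None] * k + list(series[: n - k])


daily_sales = [120, 95, 101, 110, 130, 180, 210, 125, 98, 104]  # hypothetical values
sales_lag_7 = lag_feature(daily_sales)
print(sales_lag_7)  # [None, None, None, None, None, None, None, 120, 95, 101]
```

In pandas the same column is usually built with `Series.shift(7)`, which fills the missing history with NaN instead of `None`.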
A binary classification model for fraud detection is trained on a dataset where 99% of records are non-fraud and 1% are fraud. The model achieves 99% ...
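On data this imbalanced, 99% accuracy is also what a model that never predicts fraud would score. A minimal sketch with hypothetical labels in the same 99%/1% proportion makes the arithmetic concrete: a majority-class baseline reaches 0.99 accuracy while catching none of the fraud cases.

```python
# Hypothetical labels: 990 non-fraud (0) and 10 fraud (1), a 99% / 1% split.
y_true = [0] * 990 + [1] * 10

# A trivial baseline that always predicts the majority class (non-fraud).
y_pred = [0] * len(y_true)

# Fraction of all predictions that match the label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Number of actual fraud cases the baseline flags as fraud.
fraud_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(accuracy)      # 0.99
print(fraud_caught)  # 0
```

Metrics that focus on the minority class, such as recall, precision, or the area under the precision-recall curve, expose this gap where accuracy hides it.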
A training dataset for a regression model contains some records where a continuous feature has values that are more than 4 standard deviations from th...
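Distance from the mean measured in standard deviations is the z-score, z = (x - mean) / std. A minimal sketch that flags values whose absolute z-score exceeds 4 is shown below; it assumes the population standard deviation, and the helper name and data are hypothetical. Whether flagged rows are then removed, capped, or transformed is a separate modeling decision.

```python
import statistics


def flag_outliers(values, threshold=4.0):
    """Return the indices of values whose absolute z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


readings = [10.0] * 29 + [1000.0]  # hypothetical feature values with one extreme point
print(flag_outliers(readings))  # [29]
```

The extreme value also inflates the mean and standard deviation it is measured against, so in very small samples a single outlier may never reach the threshold.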
A data scientist wants to apply a series of transformations to a dataset imported from Amazon Redshift, visualize feature correlations, detect data qu...
A data scientist is building a model to predict whether a customer will cancel their subscription next month. The dataset includes a column cancellati...
A dataset for ML training contains 500 features, many of which are correlated. Training is slow and model performance is poor due to the curse of dime...
A data scientist preprocesses features using a standardization scaler and then runs 5-fold cross-validation. The scaler parameters (mean and standard ...
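If the scaler's mean and standard deviation are computed on the full dataset before the folds are created, each validation fold has already influenced its own preprocessing, which is a form of data leakage. A minimal, library-free sketch of the leak-free pattern refits the standardization statistics on each training fold and only applies them to the matching validation fold. The helper names are hypothetical, and the folds are contiguous rather than shuffled to keep the example short. In scikit-learn the same effect usually comes from placing `StandardScaler` inside a `Pipeline` that is passed to `cross_val_score`.

```python
import statistics


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def standardize_per_fold(x, k=5):
    """Yield (scaled_train, scaled_val) pairs, fitting mean/std on the training fold only."""
    for val_idx in kfold_indices(len(x), k):
        held_out = set(val_idx)
        train = [v for i, v in enumerate(x) if i not in held_out]
        val = [x[i] for i in val_idx]
        mean = statistics.fmean(train)         # training-fold statistics only
        std = statistics.pstdev(train) or 1.0  # avoid dividing by zero
        yield ([(v - mean) / std for v in train],
               [(v - mean) / std for v in val])


feature = [float(v) for v in range(10)]  # hypothetical single feature
for scaled_train, scaled_val in standardize_per_fold(feature):
    # Each training fold is centred at zero; the validation fold generally is not.
    print(round(statistics.fmean(scaled_train), 6), [round(v, 3) for v in scaled_val])
```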
A data scientist is building a binary classification model and wants to identify which features have the highest individual predictive power relative ...
A dataset contains features with very different scales: age (0 to 100) and annual salary (0 to 200,000). A gradient descent-based algorithm is conver...
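When features differ by orders of magnitude, the loss surface tends to be elongated and gradient descent zig-zags toward the minimum; rescaling the features to comparable ranges typically helps. Min-max scaling to [0, 1] is sketched below as one common option (standardization to zero mean and unit variance is the other usual choice). The helper name and sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


ages = [0, 25, 50, 100]                    # hypothetical sample values
salaries = [0, 50_000, 120_000, 200_000]
print(min_max_scale(ages))      # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale(salaries))  # [0.0, 0.25, 0.6, 1.0]
```

In practice the minimum and maximum are taken from the training data only and reused to transform validation and test data.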
A machine learning model requires numeric input features. A dataset contains a categorical feature named color with values Red, Green, and Blue. Which...
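Because Red, Green, and Blue have no natural order, one common encoding is one-hot: one 0/1 indicator column per category, rather than integer codes such as Red=0, Green=1, Blue=2 that would imply an ordering. A minimal sketch follows; the helper name and the category order are arbitrary choices for illustration.

```python
def one_hot(values, categories):
    """Map each categorical value to a 0/1 indicator vector over `categories`.

    Raises KeyError for a value that is not in `categories`.
    """
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # switch on the column for this category
        rows.append(row)
    return rows


categories = ["Red", "Green", "Blue"]
print(one_hot(["Green", "Red", "Blue"], categories))
# [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

pandas offers the same transformation through `get_dummies`.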
A data scientist builds a linear regression model and finds that the overall R-squared is high, but several individual feature coefficients have unexp...
A dataset for ML training has a numeric feature with 8% missing values that are Missing Completely at Random (MCAR). Which is the simplest and most co...
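For a modest share of values missing completely at random, mean imputation (or median imputation for skewed features) is the usual simple baseline. A minimal sketch, assuming missing entries are represented as `None`; the helper name is hypothetical. In a real pipeline the fill value should be computed on the training split only and reused for validation and test data.

```python
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)  # raises StatisticsError if nothing is observed
    return [fill if v is None else v for v in values]


print(mean_impute([2.0, None, 4.0, 6.0]))  # [2.0, 4.0, 4.0, 6.0]
```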
A data scientist is analyzing whether two numeric features in a dataset are linearly correlated. The features are continuous and appear to be normally...
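Pearson's correlation coefficient r is the standard measure of linear association between two continuous variables; it ranges from -1 to 1 and is defined as r = sum((x_i - mean_x)(y_i - mean_y)) / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2)). A minimal sketch of that formula is below; the helper name and data are hypothetical, and Python 3.10+ also ships `statistics.correlation` for the same purpose.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical feature values
y = [2.1, 3.9, 6.2, 8.0, 9.8]
print(pearson_r(x, y))
```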
A large tabular dataset has 40% missing values in an important continuous feature. Simple mean imputation is considered but the data scientist suspect...
A data scientist wants to identify which features in a dataset have very low variance and contribute minimal information to the model. Removing these ...
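A variance threshold is one simple filter for near-constant features: compute each column's variance and drop columns at or below a cutoff. A minimal sketch follows; the helper name, the column names, and the 0.01 threshold are hypothetical. Variance depends on a feature's units, so the cutoff is usually chosen after scaling or per feature. scikit-learn provides the same filter as `VarianceThreshold` in `sklearn.feature_selection`.

```python
import statistics


def low_variance_features(columns, threshold=0.01):
    """Return names of columns whose population variance is at or below `threshold`."""
    return [name for name, values in columns.items()
            if statistics.pvariance(values) <= threshold]


data = {
    "constant_flag": [1.0, 1.0, 1.0, 1.0, 1.0],    # hypothetical zero-variance column
    "almost_constant": [5.0, 5.0, 5.0, 5.0, 5.1],  # hypothetical near-constant column
    "age": [23.0, 35.0, 47.0, 52.0, 61.0],
}
print(low_variance_features(data))  # ['constant_flag', 'almost_constant']
```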
A data scientist trains a model and evaluates it with 5-fold cross-validation, achieving a mean validation accuracy of 92%. When the model is deployed...
A dataset has a continuous feature with values ranging from 10 to 10,000. A decision tree-based algorithm is being used. Which preprocessing step is N...
A data scientist is performing feature engineering on a time-stamped customer transaction dataset. They want to capture whether a transaction occurred...
A feature engineering pipeline for a natural language processing task converts raw text reviews into numeric representations. A review column contains...
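One of the simplest numeric representations for text is a bag-of-words count vector: build a vocabulary, then count how often each vocabulary word appears in each review. A minimal sketch with naive lowercase whitespace tokenization and hypothetical reviews; production pipelines typically add punctuation handling, stop-word removal, or TF-IDF weighting (for example with scikit-learn's `CountVectorizer` or `TfidfVectorizer`).

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count_matrix) for a list of text documents."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for words absent from this document.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


reviews = ["Great product great price", "Poor quality", "Great quality"]  # hypothetical
vocab, counts = bag_of_words(reviews)
print(vocab)   # ['great', 'poor', 'price', 'product', 'quality']
print(counts)  # [[2, 0, 1, 1, 0], [0, 1, 0, 0, 1], [1, 0, 0, 0, 1]]
```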
A classification model for predicting customer churn is being evaluated. The business team cares most about ensuring that customers who are truly at r...
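When the priority is identifying as many of the truly positive (here, churning) customers as possible, the metric usually emphasized is recall, TP / (TP + FN); precision, TP / (TP + FP), instead measures how many flagged customers actually churn. A minimal sketch with hypothetical churn labels (1 = churned); the helper name is illustrative.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical actual churn
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]  # hypothetical model predictions
print(precision_recall(y_true, y_pred))  # (0.6, 0.75)
```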