Welcome to pyDataset

Renero
Dataset is mainly for educational purposes. It aims to help people approaching Data Science in Python for the first time, who must deal with common (and time-consuming) data preparation tasks.

This package takes a deliberately simple approach, bringing together the common tasks normally performed on pandas DataFrames, such as:

  • load data
  • set the target variable
  • describe the health status of the dataset
  • drop/keep columns or sample from simple lists
  • split the dataset
  • count categorical and numerical features
  • fix NA’s
  • find correlations
  • detect skewness
  • scale numeric values
  • detect outliers
  • apply one-hot encoding
  • find underrepresented categorical features
  • perform stepwise feature selection
  • compute information gain
  • plot some useful charts
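For readers new to pandas, several of the tasks above correspond to boilerplate like the following. This is a minimal sketch written in plain pandas and NumPy, not pyDataset's own API (whose method names are not shown here). The toy DataFrame and its columns, the median imputation, the 1.5 × IQR outlier rule, the 20% rarity threshold and the 50/50 split are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from pyDataset); "target" plays the role of the label.
df = pd.DataFrame({
    "age": [22.0, 35.0, np.nan, 58.0, 41.0, 120.0],
    "income": [1200.0, 2500.0, 3100.0, np.nan, 2800.0, 3000.0],
    "city": ["Madrid", "Madrid", "Lisbon", "Madrid", "Lisbon", "Oslo"],
    "target": [0, 1, 1, 0, 1, 0],
})

# Set the target variable: separate the features (X) from the label (y).
X, y = df.drop(columns="target"), df["target"]

# Count categorical and numerical features by dtype.
numerical = X.select_dtypes(include="number").columns.tolist()
categorical = X.select_dtypes(exclude="number").columns.tolist()

# Fix NA's: fill numeric gaps with each column's median (one choice among many).
X[numerical] = X[numerical].fillna(X[numerical].median())

# Detect skewness: sample skewness of each numeric column.
skewness = X[numerical].skew()

# Detect outliers with the 1.5 * IQR rule.
q1, q3 = X[numerical].quantile(0.25), X[numerical].quantile(0.75)
iqr = q3 - q1
outliers = (X[numerical] < q1 - 1.5 * iqr) | (X[numerical] > q3 + 1.5 * iqr)

# Find underrepresented categories: values seen in fewer than 20% of rows.
freq = X["city"].value_counts(normalize=True)
rare = freq[freq < 0.2].index.tolist()

# Scale numeric values to zero mean and unit (sample) standard deviation.
X[numerical] = (X[numerical] - X[numerical].mean()) / X[numerical].std()

# One-hot encode the categorical columns (dummies are appended at the end).
X = pd.get_dummies(X, columns=categorical)

# Find correlations between the numeric features.
corr = X[numerical].corr()

# Split the dataset into train/test halves with a fixed seed.
train_idx = X.sample(frac=0.5, random_state=0).index
X_train, X_test = X.loc[train_idx], X.drop(index=train_idx)
y_train, y_test = y.loc[train_idx], y.drop(index=train_idx)
```

In this toy data the age of 120 and the income of 1200 are the only values flagged as outliers, and "Oslo" is the only underrepresented city. Skewness and outliers are measured before scaling, since standardizing does not change either but makes the thresholds harder to read.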


To install this package, install it with pip directly from its Git repository:

$ pip install git+https://github.com/renero/dataset

The API Documentation

If you are looking for information on a specific function or method, this part of the documentation is for you.

