Outliers

  1. Upload the clean survey data to the DB (either through phpMyAdmin or at https://www.lrs.org/data/advanced/tools).
  2. Create and calculate the dynamic stats in the survey data at https://www.lrs.org/public/data/tools/create-dynamic-columns/
    1. NOTE: Some outlier stats are dynamic, so all dynamic stats must be calculated before outliers are processed!
  3. Calculate outliers at LINK

Outliers are processed as follows.

  1. A list of outlier stats is created.
    1. E.g. books, books_per, ebooks, libcomp
  2. Outlier columns for each outlier stat (xxxx_r) are created in the clean survey data in the DB.
    1. These columns are always of type INT (integer) or DOUBLE (decimal).
    2. Each column is given a default value of “-1”.
  3. All outlier data are selected and inserted into their respective outlier columns.
    1. E.g. UPDATE table SET books_r = books, books_per_r = books_per, ebooks_r = ebooks, libcomp_r = libcomp …
  4. All outlier data are grouped by enrollment group, with cdeschlcode as the primary key.
    1. E.g. [stat][enrollment code][school ID] = value
  5. Percentiles are calculated on a PER ENROLLMENT GROUP basis.
  6. Each stat/enrollment group combination is iterated over, and any value that falls in the 95th percentile is marked as an outlier.
  7. All outliers are given a value of “-1”, which ensures they are not included in aggregate calculations.
    1. E.g. UPDATE table SET books_r = -1 WHERE cdeschlcode = 1
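
Steps 4 through 7 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code the tool actually runs: the function names, the enrollment_code field name, and the sample rows are hypothetical; the percentile uses the nearest-rank method; and a value counts as "in the 95th percentile" only when it is strictly above that cutoff. The production tool may define the cutoff differently.

```python
from collections import defaultdict

OUTLIER_FLAG = -1  # value the steps above assign to outliers


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (assumed method)."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding float rounding issues.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def mark_outliers(rows, stats, pct=95):
    """Return {stat_r: {school ID: value or OUTLIER_FLAG}}."""
    # Step 4: group as [stat][enrollment code][school ID] = value
    grouped = defaultdict(lambda: defaultdict(dict))
    for row in rows:
        for stat in stats:
            grouped[stat][row["enrollment_code"]][row["cdeschlcode"]] = row[stat]

    result = {}
    for stat, groups in grouped.items():
        column = result.setdefault(f"{stat}_r", {})
        for schools in groups.values():
            # Step 5: the percentile is computed per enrollment group.
            cutoff = nearest_rank_percentile(list(schools.values()), pct)
            # Steps 6 and 7: values above the cutoff are replaced by -1.
            for school_id, value in schools.items():
                column[school_id] = OUTLIER_FLAG if value > cutoff else value
    return result


# Hypothetical data: 20 schools in group "A", one school in group "B".
rows = [{"cdeschlcode": i, "enrollment_code": "A", "books": 100 + i}
        for i in range(1, 20)]
rows.append({"cdeschlcode": 20, "enrollment_code": "A", "books": 50000})
rows.append({"cdeschlcode": 21, "enrollment_code": "B", "books": 50000})

marked = mark_outliers(rows, ["books"])
print(marked["books_r"][20])  # flagged within group "A"
print(marked["books_r"][21])  # same value, but alone in group "B"
```

Because the cutoff is computed per enrollment group, the same books value is flagged in group "A" but kept in group "B", where it is the only value. Each entry in the returned mapping corresponds to one UPDATE of an xxxx_r column for one cdeschlcode.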