- Upload the clean survey data to the database (either in phpMyAdmin or at https://www.lrs.org/data/advanced/tools).
- Create and calculate dynamic stats in survey data at https://www.lrs.org/public/data/tools/create-dynamic-columns/
- NOTE: Some outlier stats are dynamic, so all dynamic stat calculations must be completed before outliers are processed!
- Calculate outliers at LINK
Outliers are processed as follows.
- A list of outlier stats is created.
- E.g. books, books_per, ebooks, libcomp
- Outlier columns for each outlier stat (xxxx_r) are created in the clean survey data in the DB.
- These columns are always of type INT (integer) or DOUBLE (decimal).
- Each column is given a default value of -1.
- All outlier data are selected and inserted into their respective outlier columns.
- E.g. UPDATE table SET books_r = books, books_per_r = books_per, ebooks_r = ebooks, libcomp_r = libcomp …
- All outlier data are grouped by enrollment group, with cdeschlcode as the primary key.
- E.g. [stat][enrollment code][school ID] = value
- Percentiles are calculated on a PER ENROLLMENT GROUP basis.
- Each stat/enrollment group combination is iterated, and any value that is in the 95th percentile is marked as an outlier.
- All outliers are given a value of -1, which ensures they are excluded from aggregate calculations.
- E.g. UPDATE table SET books_r = -1 WHERE cdeschlcode = 1
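
The steps above can be sketched end to end as follows. This is only an illustration under several assumptions, not the actual script behind the tools page: it uses Python's built-in sqlite3 as a stand-in for the MySQL database, the table name `survey` and the enrollment column `enroll_group` are hypothetical, the percentile uses the nearest-rank method, and "in the 95th percentile" is read as "strictly above the 95th percentile cutoff of its enrollment group".

```python
import sqlite3
from collections import defaultdict

OUTLIER_STATS = ["books", "ebooks"]  # a subset of the outlier stats listed above


def percentile_cutoff(values, pct=95):
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100)."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # integer ceiling division
    return ordered[max(rank, 1) - 1]


def process_outliers(conn, table="survey", group_col="enroll_group"):
    # NOTE: table and group_col are hypothetical names, trusted here for brevity.
    cur = conn.cursor()
    for stat in OUTLIER_STATS:
        # Create the xxxx_r column with a default of -1, then copy the stat into it.
        cur.execute(f"ALTER TABLE {table} ADD COLUMN {stat}_r INTEGER DEFAULT -1")
        cur.execute(f"UPDATE {table} SET {stat}_r = {stat}")

        # Group the data as [enrollment code][school ID] = value.
        grouped = defaultdict(dict)
        cur.execute(f"SELECT {group_col}, cdeschlcode, {stat}_r FROM {table}")
        for group, school_id, value in cur.fetchall():
            grouped[group][school_id] = value

        # Per enrollment group, overwrite values above the cutoff with -1.
        for by_school in grouped.values():
            cutoff = percentile_cutoff(by_school.values())
            for school_id, value in by_school.items():
                if value > cutoff:  # assumed reading of "in the 95th percentile"
                    cur.execute(
                        f"UPDATE {table} SET {stat}_r = -1 WHERE cdeschlcode = ?",
                        (school_id,),
                    )
    conn.commit()


# Made-up demo data: 20 schools in one enrollment group, one extreme books value.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey (cdeschlcode INTEGER PRIMARY KEY, "
    "enroll_group INTEGER, books INTEGER, ebooks INTEGER)"
)
rows = [(i, 1, 1000 + i, 10) for i in range(1, 20)] + [(20, 1, 90000, 10)]
conn.executemany("INSERT INTO survey VALUES (?, ?, ?, ?)", rows)
process_outliers(conn)
```

In the demo, only school 20 ends up with `books_r = -1`. No `ebooks_r` value is changed, because every school in the group reports the same ebooks count. With the nearest-rank method, a group needs roughly 20 or more schools before any value can fall strictly above its cutoff, so the choice of percentile method matters for small enrollment groups.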