Test Score Analytics Using APEX

What is the value to a school district of having all of its student performance data in one database?

The answer is quick and accurate multi-year analytics. After the initial report is created, each year’s data is uploaded and populates all of the reports.


Here is an example from a school district that is looking forward to this year’s Beginning of Grade 3 (BOG) test data and making comparisons to other test data. Using a brief SQL statement, the mean scores and standard deviations can be computed and reported for each school for each year in the APEX data system.
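The per-school, per-year statistics that the SQL report produces can be sketched in plain Python. The school codes and scale scores below are made-up illustrations, not district data:

```python
from statistics import mean, stdev

# Hypothetical BOG records: (school_code, year, scale_score) -- made-up values
scores = [
    ("010", 2020, 438), ("010", 2020, 445), ("010", 2020, 451),
    ("020", 2020, 430), ("020", 2020, 442), ("020", 2020, 449),
    ("010", 2019, 440), ("010", 2019, 447), ("010", 2019, 455),
]

# Group the scores by (school, year), mirroring the SQL GROUP BY
groups = {}
for sch, year, score in scores:
    groups.setdefault((sch, year), []).append(score)

# Report the mean and standard deviation for each school and year
for (sch, year), vals in sorted(groups.items()):
    print(f"School {sch}, {year}: mean={mean(vals):.1f}, sd={stdev(vals):.1f}")
```

In the database this grouping is a single GROUP BY; the sketch just makes the computation visible.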

Then the distribution of scores can be viewed as a histogram for each of the years. Using a simple drop-down filter, a year and an individual school can be selected and the histograms overlaid for comparison.

The question then becomes: what is the relationship between the BOG scores and the i-Ready reading scores when both are converted to Normal Curve Equivalent (NCE) scores? Here is sample SQL code for that report:

   -- select list assumed for illustration; the original excerpt begins at the where clause
   select BOG.SCH_CODE, corr(BOG.NCE, I_READY_RD.NCE) as BOG_IREADY_R
     from BOG, I_READY_RD
    where I_READY_RD.SID = BOG.SID and BOG.YEAR = 2020
      and I_READY_RD.TEST_DATE = 'BOY'
    group by BOG.SCH_CODE
    order by BOG.SCH_CODE
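The NCE conversion that puts both tests on a common scale is a linear transform of a z-score: NCE = 50 + 21.06 × z, where 21.06 is chosen so that NCEs of 1, 50, and 99 match the corresponding percentile ranks. A minimal sketch; the norm mean and SD used here are illustrative, not published BOG or i-Ready norms:

```python
def to_nce(score, norm_mean, norm_sd):
    """Convert a raw/scale score to a Normal Curve Equivalent.

    NCE = 50 + 21.06 * z, so the scale runs from about 1 to 99 and
    is an equal-interval scale, unlike percentile ranks.
    """
    z = (score - norm_mean) / norm_sd
    return 50 + 21.06 * z

# Illustrative norm parameters (not actual BOG or i-Ready norms)
print(to_nce(450, norm_mean=450, norm_sd=10))  # a score at the mean maps to NCE 50
print(to_nce(460, norm_mean=450, norm_sd=10))  # one SD above the mean maps to about 71
```

Once both tests are expressed as NCEs, the scores can be averaged, differenced, and correlated in ways that percentile ranks do not allow.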

The next question is: what does this data look like as a scatter plot? In the scatter plot, an entire district can be shown for a testing year, or the plot can be filtered to show one school in one color while a second color shows all of the remaining schools. In conclusion, when a district has the BOG and i-Ready (or Istation) data for this year, it will be better able to understand the significance of the information and perhaps the impact that being out of school for months has had on its students.

Assessment Literacy: Item Statistics

At the 2018 Connecting Communities of Education Stakeholders Conference in Greensboro, I was pleased to see several sessions on using Schoolnet data to identify skill weaknesses and to form remediation student groups. The presenter began by listing the reasons teachers struggle with data analysis, including:

  • Not enough time
  • Not knowing how to interpret the data
  • Too many numbers, and
  • Can’t get past the negative data (results).

The presenter then outlined the steps and strategies that are essential for making the most of the Schoolnet student scores. These steps included looking at the overall average score on the assessment and the percentage of students who correctly answered each item. I completely endorse the approach the presenter recommended for examining the assessment data and identifying students in need. I also recommend that someone at the district level take a look at the item statistics and communicate information about “weak” items to the teachers, to avoid misinterpretation of the student assessment results.

Digging Deeper Into the Scores

I sat next to an instructional coach who was following along with her own district’s benchmark data and shared her reports with me for discussion. As we looked at the student scores, I asked if she had examined the item discrimination values. She was not aware of this term. I explained that item discrimination is the ability of an item to differentiate high performing students from low performing students. For example, let’s say an item had a p-value of .50, which means that 50% of the students got the answer correct. We could conclude it was an item of medium difficulty. However, if the item has a very low or negative discrimination value, it means that many of the higher performing students got the item incorrect, while lower performing students got the answer correct.

Here is the sample data for the item:
  • Answer A (incorrect) – selected by 40% of the high performing students.
  • Answer B (correct) – selected by 42% of the lower performing students.
  • Answer C (incorrect) – selected by 10% of the lower performing students.
  • Answer D (incorrect) – selected by 8% of the lower performing students.

As you can see in this oversimplified example, the item was probably answered correctly by chance and the item did not add to our understanding of student performance on that skill. The question to be asked is: What led the high performing students to select answer A?

  1. Was the item worded in a way that led high performing students to mistakenly select answer A?
  2. Was there some gap between what was taught and what was learned that resulted in the high performing students’ error?
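A distractor breakdown like the one above can be computed directly from raw responses. This sketch assumes hypothetical students and uses a simple median split on the total score to form the high and low groups:

```python
# Hypothetical item responses: (total_test_percent_correct, answer_choice)
responses = [
    (92, "A"), (88, "A"), (85, "A"), (81, "A"),  # strong students drawn to foil A
    (35, "B"), (41, "B"), (28, "B"), (44, "B"),  # weaker students chose the key
    (38, "C"), (30, "D"),
]

KEY = "B"  # the correct answer

# Median split of total scores into high and low performing groups
totals = sorted(t for t, _ in responses)
median = totals[len(totals) // 2]
high = [choice for t, choice in responses if t >= median]
low = [choice for t, choice in responses if t < median]

# Percent of each group selecting each choice -- a basic distractor analysis
for choice in "ABCD":
    hi_pct = 100 * high.count(choice) / len(high)
    lo_pct = 100 * low.count(choice) / len(low)
    print(f"Answer {choice}: high group {hi_pct:.0f}%, low group {lo_pct:.0f}%")
```

When the table shows a distractor attracting mostly high performers, that item deserves the kind of wording review described above.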
Item Discrimination Statistics

The statistical procedure for computing item discrimination is called a point biserial correlation. This procedure recodes the responses to the item from A-D to a 0 or a 1 for incorrect and correct. A correlation is then computed between the 0/1 item score and the total percent correct score. For the example shown in the graph, the point biserial correlation is -0.511.
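A point biserial is simply a Pearson correlation in which one of the variables is the 0/1 item score. A minimal sketch; the student scores below are fabricated to echo the negative-discrimination pattern above, not real benchmark data:

```python
from statistics import mean, stdev

def point_biserial(item_correct, total_scores):
    """Pearson correlation between a 0/1 item score and the total score.

    item_correct: 0 (incorrect) or 1 (correct) for each student
    total_scores: each student's total percent-correct score
    """
    n = len(item_correct)
    mx, my = mean(item_correct), mean(total_scores)
    sx, sy = stdev(item_correct), stdev(total_scores)
    cov = sum((x - mx) * (y - my)
              for x, y in zip(item_correct, total_scores)) / (n - 1)
    return cov / (sx * sy)

# Fabricated data: the strongest students missed the item while the
# weaker students answered it correctly.
totals = [95, 90, 88, 84, 45, 40, 35, 30]
correct = [0, 0, 0, 0, 1, 1, 1, 1]

r = point_biserial(correct, totals)
print(round(r, 3))  # negative: the item discriminates in the wrong direction
```

A healthy item produces a clearly positive value; a value near zero or below signals an item that is adding noise rather than information.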

When more students in the lower performing group than in the upper performing group select the right answer to an item, the item actually has negative validity. Assuming that the criterion itself has validity, the item is not only useless but is actually serving to decrease the validity of the test.
See http://ericae.net/ft/tamu/Espy.htm for an excellent discussion of this topic.

My Recommendations:
  1. Design your assessments so that each answer choice will provide meaningful information about the students’ understanding of the underlying skill.
  2. Include enough items on the assessment so that the skills are adequately sampled. The instructional coach showed me an assessment of 15 items on which the average score was 45% correct. With so few items, decisions were being made about students who got only 5-7 items correct.
  3. Field test each benchmark with a few students to identify problems before administering the test to hundreds of students.
  4. Always look at the p-value and item discrimination values for each item.
  5. Ask students why they selected an answer, especially if the item has a low discrimination value.
  6. Train the teachers in how to interpret the data.
    1. Provide protocols to guide the teachers in the interpretation process.
    2. Provide item analysis information, such as: if a student selected incorrect foil A, it probably means there is a misunderstanding of a particular concept. That way, all students who got that item incorrect by answering A can be remediated on that misunderstanding.
    3. Provide the data in a way that it is easy to view and manipulate.
    4. Have teachers COLLABORATIVELY examine their data and have a shared discussion so one teacher can scaffold other teachers’ understanding of the test results.
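The foil-based remediation grouping described above can be sketched as a simple binning step: students who missed the item are grouped by the answer they chose, so each group shares the same likely misunderstanding. The student names and answer key here are hypothetical:

```python
# Hypothetical responses on one item: student -> answer choice selected
item_responses = {"Ana": "A", "Ben": "B", "Cal": "A", "Dee": "C", "Eli": "B"}
KEY = "B"  # the correct answer (assumed for this example)

# Bin the students who missed the item by the foil they chose; each
# bin becomes a remediation group targeting one misunderstanding.
remediation = {}
for student, answer in item_responses.items():
    if answer != KEY:
        remediation.setdefault(answer, []).append(student)

print(remediation)  # e.g. everyone who chose foil A is grouped together
```

Run per item across a benchmark, this produces ready-made small groups for the reteaching protocols in recommendation 6.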

Dr. Lewis Johnson
Lead Consultant
Data Smart LLC