The revised Standards for RTOs have ditched the "statistically valid" sample size requirement, and that's a risk.
- Specialised VET Services
Why we should not ditch the "statistically valid" sample size from validation of assessment judgements under the revised Standards for RTOs.

The revised Standards for Registered Training Organisations (RTOs) have introduced a number of additions and changes that RTOs must demonstrate they meet for ongoing registration. They have also dropped a number of terms, practices and requirements compared to those under the Standards for RTOs 2015.
One dropped requirement stands out in particular: the calculation of a statistically valid random sample for post-use validation purposes.
This article explains why we think this is a poor move and why RTOs should retain the "statistically valid" component of their random samples for validation of assessment judgements.
First of all, let's break down a few key understandings:
1) The 2015 requirements
In the Standards for RTOs 2015, Clauses 1.9 - 1.11 laid out the requirements for RTOs to conduct validation and stipulated:
There had to be a 5-year plan in place, based on the relative risk of the training products, outlining:
- What on the RTO's scope of registration is to be validated, and when
- Who will lead/participate, and what credentials they must hold
- What will happen with the results from the activity
This requirement was supported by the definitions of validation and statistically valid included in the Glossary, where validation meant:
"Validation is the quality review of the assessment process. Validation involves checking that the assessment tool/s produce/s valid, reliable, sufficient, current and authentic evidence to enable reasonable judgements to be made as to whether the requirements of the training package or VET accredited courses are met. It includes reviewing a statistically valid sample of the assessments and making recommendations for future improvements to the assessment tool, process and/or outcomes and acting upon such recommendations."
And statistically valid meant:
"...for the purposes of these Standards, a random sample of appropriate size is selected to enable confidence that the result is sufficiently accurate to be accepted as representative of the total population of assessments being validated."
This meant that, to perform validation of assessment judgements as part of the 5-year, risk-informed validation schedule, RTOs needed to calculate a statistically valid random sample.
What's the point of a statistically valid random sample?
To answer this question, let's first break down the components.
A random sampling technique means the evidence selected for analysis will be drawn from across the range of assessment judgements - no deliberately picking the best, picking the worst, or taking only the results from this cohort or that trainer. The 'random' is meant to ensure that whatever is being reviewed is likely to give a good cross-section of actual RTO practice. In essence, random sampling is about creating a smaller sub-set from the total group, and creating that sub-set in such a way that every item, every person, from the larger group has an equal opportunity of being selected for inclusion.

(Image from Skills Education short course How to Create a Random Sample)
Random sampling is meant to avoid bias - like picking only the best to review, or looking only at assessment judgements made by the most experienced trainers. Neither of these circumstances gives a true indication of what's happening across the board, and basing any operational decisions on information that does not give the fullest picture is a risk. So, using a random sampling technique to create the subset - the sample - is a way to reduce that risk.
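As a minimal sketch of the idea, here is how an equal-chance selection could be drawn in Python (the assessment IDs and sample size here are hypothetical, purely for illustration):

```python
import random

# Hypothetical pool: identifiers for 1,000 completed assessments
all_assessments = [f"ASSESS-{i:04d}" for i in range(1, 1001)]

# random.sample draws without replacement, giving every assessment
# an equal chance of selection - no cherry-picking by trainer,
# cohort or location
sample = random.sample(all_assessments, k=400)

print(len(sample))   # 400
print(sample[:3])    # e.g. ['ASSESS-0712', 'ASSESS-0034', 'ASSESS-0991']
```

The point is that the selection mechanism itself is blind to who assessed, where and when - exactly the bias-avoidance described above.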
The other critical element to be considered is the size of the sample.
Sure, the sample may have been selected in a way that ensures an equal chance of inclusion, but if the sample group is small, there's a chance the following might happen:
- The sample is less likely to be representative of the entire group
- Some members (people/items) of the original group may be missed altogether
Both of these scenarios increase the risk that the results from reviewing the sample will not be a true indication of what's happening across the board and, therefore, increase the possibility that real issues will be missed.

(Image from Skills Education short course How to Create a Random Sample)
So how do you figure out how big is big enough?
That's where the notion of 'statistically valid' comes in.
A statistically valid sample size is one that is large enough that you can be reasonably confident that the results from the sub-set sample group will give an accurate and reliable picture of what's going on in the entire group.
Statistically valid findings are considered:
- Trustworthy
- A genuine reflection
So, when you cannot validate, in every single one of your assessment tools, every single assessment judgement made by every single assessor in every single course delivered with every single cohort across every single location, creating a random sample of statistically valid size gives a degree of assurance that that smaller cross-section will paint a reasonably accurate picture of actual practice and results across all those variables.
Yes, there is some complex mathematics underpinning the calculation of a statistically valid sample size, but numerous online 'sample size calculators' make it easy. A user simply plugs in a set of known values and the calculator reports how many need to be in the sub-set for it to be - within set margins - representative of the entire group.
The set margins relate to the confidence level and the margin of error: basically, how confident you want to be that the results from the sample will truly reflect the entire group's results, and how much error in the prediction of those results you are willing to accept.
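Under the hood, most such calculators use some variant of Cochran's formula with a finite population correction. Here's a minimal sketch in Python, assuming the conservative default of maximum variability (p = 0.5) that calculators typically apply; treat it as an illustration of the principle, not a substitute for a properly validated tool:

```python
from math import ceil

# Two-tailed z-scores for common confidence levels
Z_SCORES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def sample_size(population: int, confidence: float, margin_of_error: float) -> int:
    """Cochran's formula with finite population correction,
    assuming maximum variability (p = 0.5)."""
    z = Z_SCORES[confidence]
    p = 0.5
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    # Adjust for the finite pool of completed assessments
    return ceil(n0 / (1 + (n0 - 1) / population))
```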
Here's an example:
The RTO has 1,000 assessments completed in the last 6 months for a brand new course they are delivering - the unit is new, the assessments are new, the trainers are new.
1,000 is the entire group for that particular course - everything across all of its assessors and all of its locations.
Because the course is new, the RTO wants to ensure everything across the board is going well and as it should be; they want to catch any issues quickly so they can be fixed before moving too much further down the track with the next cohort.
The RTO is only willing to accept a 5% margin of error, and they want to be confident that the results from the sample analysis will reflect the entire group's results 99% of the time. The calculator determines that a statistically valid sample size of 400 is needed, so 400 assessments must be randomly chosen from the 1,000.
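Running those settings through the sketch above reproduces that figure:

```python
print(sample_size(population=1000, confidence=0.99, margin_of_error=0.05))  # 400
```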

For a different, well-established course that has been run many times with trainers experienced in the content, the RTO is less worried that there are unknown issues. They simply want to confirm that everything is tracking along nicely - as it should be to deliver quality training and assessment.
So this time, they set different margins in the calculator, staying with its default values of a 15% margin of error and a 95% confidence level - saying they want to be confident that the results from the sample analysis will reflect the entire group's results 95% of the time.
The number required for the sample this time is drastically smaller than the previous one:
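Plugging the relaxed settings into the same sketch shows why - under the assumed formula, the required sample drops from 400 to roughly 41 of the 1,000 assessments (real calculators may differ slightly in their rounding and conventions):

```python
print(sample_size(population=1000, confidence=0.95, margin_of_error=0.15))  # 41
```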

The difference in the level of risk the RTO considered each course to present guided their calculations of how big their random sample needed to be, so that they could be assured the results would be:
- Trustworthy
- A genuine reflection of the entire group of 1,000 judgements made
The revisions to the Standards for RTOs 2015 have, however, produced a different set of requirements.
Here's what the Outcome Standards, in full regulatory effect from 1 July 2025, have to say about validation:
2) The 2025 requirements
In the National Vocational Education and Training Regulator (Outcome Standards for NVR Registered Training Organisations) Instrument 2025 (Outcome Standards 2025), validation is defined as:
"validation means the review of the assessment system to ensure that:
(a) assessment tools are consistent with the training product and the requirements set out in this instrument; and
(b) assessments and assessment judgements are producing consistent outcomes."
You'll note the differences between this new definition and the last. There is no mention of reviewing a sample, of ensuring that sample is statistically valid, nor of using the results to make recommendations for improvements.
The Outcome Standards make no mention of 'statistically valid' and, therefore, no mention of 'random'.
They do, however, make stipulations about validation under Standard 1.5.
The RTO must demonstrate:
- Validation is used to ensure compliance with the Training Package and Standards requirements
- A risk-based 5-year validation plan is in place for everything on its scope of registration
- How it uses a risk-based approach to determine which parts of its assessment system will be validated and when, and what the sample size for those training products will be
- It gets TAE training products independently validated after use with the first cohort of learners
- People with the requisite credentials are involved in validation
- Validation outcomes are not solely determined by the same person who designed and/or delivered the course
- Validation outcomes inform continuous improvement
(NB: the above is paraphrasing Standard 1.5)
Drilling down further, Standard 1.5.2(c)(ii) specifically states:
"An NVR registered training organisation demonstrates it utilises a risk-based approach – informed by any risks to training outcomes, any changes to the training product or any feedback from VET students, trainers, assessors, and industry – to determine the sample size of assessments that are to be validated in respect of a particular training product"
This tells the RTO which determinants must contribute to the selection of the sample size for validation - and yes, it must be risk-informed.
But (in our opinion) it falls short of accounting for the biggest risk: it does not require a random sample, drawn to give each item/person an equal chance of selection for analysis, nor a sample of a size large enough to be considered an accurate and reliable representation of the entire group. It does not require a sample that will give results that are less likely to be biased and are truly representative of the assessment judgements given by all assessors to all students in all locations.
That is, the Outcome Standards no longer mandate that results from the sample be a trustworthy, genuine reflection of the wider, full scope of actual practice.
We see this as an incredible oversight. The law now effectively says an RTO can pull together a sample irrespective of its size, or of any assurance of how representative it is, as long as the sample is chosen because there is a risk to the training product - because of the outcomes being achieved, changes to the unit/s and/or feedback from stakeholders. These things are all important, for sure, but wouldn't an RTO (and regulator) want to be certain of the results? To be sure they're not just the results from the course delivered by the most experienced trainer, or the least experienced trainer, or the course that attracted the most feedback, or the course that changed recently?
With the stipulation simply being "a sample", who's to say that a sample of 10 out of 1,000 is not appropriate?
By the way, a sample of 10 could mean the RTO is prepared to accept a 15% margin of error with results that would probably apply to the whole only 65% of the time. Or, with a tighter margin of error (10%), a sample of 10 will give results that are only around 45% likely to mirror the whole 1,000. Can you see the inherent risk of not using a statistically valid sample size?
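Those implied confidence figures can be sanity-checked by inverting the same Cochran-style sketch from earlier (again assuming p = 0.5; the small differences from the 65% and 45% figures above come down to rounding and calculator conventions):

```python
from math import erf, sqrt

def implied_confidence(population: int, n: int, margin_of_error: float) -> float:
    """Approximate confidence level implied by a sample of size n,
    inverting Cochran's formula with finite population correction
    (assumes maximum variability, p = 0.5)."""
    # Undo the finite population correction to recover the unadjusted size
    n0 = n * (population - 1) / (population - n)
    # Solve n0 = z^2 * 0.25 / e^2 for the z-score
    z = sqrt(n0 * margin_of_error ** 2 / 0.25)
    # Two-tailed confidence level for that z-score
    return erf(z / sqrt(2))

print(round(implied_confidence(1000, 10, 0.15), 2))  # ~0.66
print(round(implied_confidence(1000, 10, 0.10), 2))  # ~0.47
```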
True risk-based determination must account for all the variables across the whole operating environment - which means the sample must be broad enough to capture the factors likely to be present across everything under investigation!

One glimmer of hope
The main difference between the overall 2015 requirements and the 2025 revisions is that the Outcome Standards are not prescriptive: the RTO must determine how it will meet the requirements contained in the legislative instrument.
In both cases, the legislative instrument sets out the MINIMUM actions for quality and compliance.
There's nothing to say RTOs cannot - and, we'd argue, should - go above and beyond the minimum requirements for the delivery of quality training and assessment in Australia. So, there is scope for RTOs to maintain the logical use of a statistically valid sample size when pulling together their random sample for validation purposes.
For all of the reasons outlined above, we encourage RTOs not to roll the dice on how valid their sample selection will be. The whole point of validation is to confirm judgements made and to pinpoint areas where improvements may be required.
Before applying any changes to systems, tools and processes across the entire RTO's operation, take the necessary precautions to ensure any inferences from, or conclusions about, results from the analysed sample are truly representative of everything. That is, that the sample is big enough and representative enough that results can reliably apply to the whole. Anything less is a risk - and potentially a costly risk for RTOs.
We encourage RTOs to maintain the practice of random sampling.
We encourage RTOs not to ditch the 'statistically valid' component of their random samples and, instead, to ensure the effort they put into conducting validation of assessment produces results that can reliably be counted on to paint a true picture of all assessment practice.
This article is AITA Scale Rating 1 - 100% Human Intelligence
What is the AITA Scale?