AI scores Chinese candidates much higher than human graders do

Many studies have shown that AI essay-scoring systems, including the GRE's e-rater, have obvious flaws.
Yet over the years, AI has not been abandoned by essay exams; on the contrary, it has become more and more popular.

GRE: machines favor Chinese candidates more than humans do

As early as 1999, the Educational Testing Service (ETS), which administers the GRE, began scoring essays with e-rater.
According to official information, the natural language processing (NLP) model scores essays on the following eight criteria (a rough sketch of two of them follows the list):
*  Content analysis based on vocabulary measures (Content Analysis Based on Vocabulary Measures)
*  Lexical complexity / diction (Lexical Complexity / Diction)
*  Proportion of grammar errors (Proportion of Grammar Errors)
*  Proportion of usage errors (Proportion of Usage Errors)
*  Proportion of mechanics errors (Proportion of Mechanics Errors)
This refers to technical problems such as misspellings, incorrect capitalization, and faulty punctuation.
*  Proportion of style comments (Proportion of Style Comments)
For example, a phrase that appears too often, or sentences that are too short or too long.
*  Organization and development scores (Organization and Development Scores)
*  Features rewarding idiomatic phraseology (Features Rewarding Idiomatic Phraseology)
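To give a concrete feel for what such features look like, here is a minimal sketch of two of the simpler ones in Python. It is an illustrative approximation, not ETS's actual implementation; the word matching and the complexity proxy are my own assumptions.

```python
import re

def mechanics_error_proportion(text: str, misspelled: set) -> float:
    """Rough stand-in for 'proportion of mechanics errors':
    the share of words flagged as misspelled. A real system
    would also check capitalization and punctuation."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    errors = sum(1 for w in words if w.lower() in misspelled)
    return errors / len(words)

def lexical_complexity(text: str) -> float:
    """Rough stand-in for 'lexical complexity / diction':
    average word length weighted by type-token ratio,
    one crude proxy among many possible ones."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    if not words:
        return 0.0
    avg_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    return avg_len * type_token_ratio

essay = "The quintesential arguement is sound, and the arguement is clear."
print(mechanics_error_proportion(essay, {"quintesential", "arguement"}))
print(lexical_complexity(essay))
```

Features like these reward surface properties of a text (long, rare words; clean spelling) rather than its meaning, which matters for what follows.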
Of course, the GRE is not the only exam this AI serves; the TOEFL, also an ETS exam, uses it as well.
As for where the algorithm's defects lie, ETS itself has done a great deal of research, and it has never shied away from its findings.
Studies from 1999, 2004, 2007, 2008, 2012, and 2018 all found that the AI's essay scores for candidates from mainland China were generally higher than the scores humans gave.
In contrast, for African-American test takers, the AI often gave lower scores than humans did. Candidates whose mother tongue is Arabic, Spanish, or Hindi showed a similar pattern. Even after the team improved the algorithm, the problem was not eliminated.
A senior ETS researcher said:
If we make the algorithm friendlier to one minority group, it is likely to become harmful to other groups.
Going further, one can look at the AI's individual sub-scores.
Across all candidates, e-rater's sub-scores for grammar (Grammar) and mechanics (Mechanics) were generally low for mainland Chinese candidates;
but on essay length and word complexity, the AI's scores for mainland Chinese candidates were above average. In the end, the AI's overall scores for mainland candidates came out higher than the humans'. On the GRE essay's 6-point scale, the AI scored an average of 1.3 points higher than human graders.
In contrast, for African-American test takers, the AI scored an average of 0.81 points lower than humans. And these are only averages; for many individual candidates the gap was even wider.
Whether 1.3 or 0.81, on an exam with a 6-point scale these are not small numbers, and they could seriously affect a candidate's results.
Beyond that, researchers at MIT built an algorithm called BABEL that collages complex words together; the resulting essay carries no real meaning, yet ScoreItNow!, the GRE's online scoring tool, gave it a respectable 4 points.
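The trick can be illustrated with a minimal sketch in the same spirit (the word pools and sentence template below are my own invention, not the actual BABEL generator):

```python
import random

# Invented word pools; the real BABEL lexicon is far larger.
ADJECTIVES = ["quintessential", "venerated", "probabilistic", "inexorable"]
NOUNS = ["epistemology", "admonishment", "paradigm", "conflagration"]
VERBS = ["promulgates", "countermands", "elucidates", "exacerbates"]

def gibberish_sentence() -> str:
    """Build a grammatical-looking but meaningless sentence
    out of rare, complex words."""
    return (f"The {random.choice(ADJECTIVES)} {random.choice(NOUNS)} "
            f"{random.choice(VERBS)} the {random.choice(ADJECTIVES)} "
            f"{random.choice(NOUNS)}.")

def gibberish_essay(sentences: int = 10) -> str:
    return " ".join(gibberish_sentence() for _ in range(sentences))

print(gibberish_essay())
```

Because the output is long, lexically rich, and free of spelling errors, it scores well on exactly the surface features listed above, despite meaning nothing.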
However, ETS says the AI is not the sole grader: every essay scored by the AI is scored by a human at the same time. When the human and machine scores differ too much, the essay is handed to a second human to adjudicate, and the final score is derived from there.
So ETS believes candidates will not be adversely affected by the AI's defects.
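The described workflow amounts to a simple adjudication rule. Here is one way it might look; the one-point threshold and the averaging policy are assumptions for illustration, not ETS's published procedure:

```python
from typing import Callable

def final_essay_score(ai_score: float, human_score: float,
                      request_second_human: Callable[[], float],
                      threshold: float = 1.0) -> float:
    """If AI and human agree within `threshold`, average them;
    otherwise escalate to a second human rater. The threshold
    and the averaging policy are illustrative assumptions."""
    if abs(ai_score - human_score) <= threshold:
        return (ai_score + human_score) / 2
    # Large disagreement: a second human adjudicates, and the
    # machine score is discarded (one possible policy).
    second = request_second_human()
    return (human_score + second) / 2

# Example: AI gives 5.0, human gives 3.0, so a second human (3.5) decides.
print(final_essay_score(5.0, 3.0, request_second_human=lambda: 3.5))
```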
But compare this with the traditional method, in which two humans score each essay simultaneously; when the AI replaces one of them, that person's role effectively shrinks to reviewing disagreements.
Costs probably drop a great deal; how much the outcome is affected is hard to say, but at the very least the scores differ from what they were before the AI mechanism was involved.
Fortunately, the GRE still pairs human graders with the AI.
There are many exams where the AI grades essays entirely on its own:

The GRE is not the only algorithm with problems

For example, a VICE investigation found that Utah has used AI as the primary essay-scoring tool for some years.
A state official explained why:
Manual scoring is not only time-consuming but also a major item of state spending.
So, with AI scoring the essays and cutting costs, can the results still be fair and impartial?
The American Institutes for Research (AIR), a non-profit organization, is Utah's most important exam provider.
Every year, AIR produces a report assessing the fairness of a batch of new test questions.
One focus of the assessment is whether girls and minority students perform worse on a particular question than boys and white students do. The indicator for this is called differential item functioning (DIF).
The report shows that in the 2017-2018 school year, among the writing questions for grades three through eight, 348 questions were judged to show slight DIF against girls and minority students; by contrast, only 40 questions showed slight DIF against boys and white students.
In addition, three questions were judged to show serious DIF against girls and minority students. Such questions are referred to a special review committee.
There are several possible causes of DIF, and algorithmic bias is the one of greatest concern.
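As a rough illustration of what a DIF check involves (a simplified sketch; real analyses, including AIR's, use formal statistics such as Mantel-Haenszel): candidates are first matched on overall ability, and then, within each ability stratum, the two groups' success rates on a single question are compared.

```python
from collections import defaultdict

def crude_dif_gap(records, item, group_a, group_b):
    """Within each total-score stratum, compare the share of each
    group answering `item` correctly; return the average gap
    (group_a minus group_b). A value far from 0 hints at DIF.
    Simplified sketch: not AIR's actual methodology.

    records: iterable of dicts like
        {"group": "A", "total": 17, "items": {"q1": 1}}
    """
    strata = defaultdict(lambda: {group_a: [], group_b: []})
    for r in records:
        if r["group"] in (group_a, group_b):
            strata[r["total"]][r["group"]].append(r["items"][item])

    gaps = []
    for scores in strata.values():
        a, b = scores[group_a], scores[group_b]
        if a and b:  # need both groups in the stratum to compare
            gaps.append(sum(a) / len(a) - sum(b) / len(b))
    return sum(gaps) / len(gaps) if gaps else 0.0
```

Matching on total score first is what separates "this question behaves differently for equally able students" from ordinary differences in overall performance.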
A parent from Utah (@dahart) wrote the top comment in the Hacker News discussion.
He did not like hearing officials talk about "cost." In his view, education has always been time-consuming; it cannot be made fast and cheap.
He said his child's essays were graded by the machine, the whole family was unhappy with the scores the AI gave, and his wife and child were reduced to tears.

Source: www.cnblogs.com/shangke0975/p/11770774.html