[Competition Introduction] Feedback Prize - Predicting Effective Arguments

Competition link

Competition Introduction

The goal of this competition is to classify argumentative elements in students' writing as "Effective", "Adequate", or "Ineffective". You'll create a model trained on data representative of U.S. students in grades 6-12 in order to minimize bias. Models derived from this competition will help pave the way for students to receive more feedback on their argumentative writing. With automated guidance, students can complete more assignments and ultimately become more confident, proficient writers.

Writing is key to success. Argumentative writing in particular develops critical thinking and civic engagement skills, and it can be strengthened through practice. Yet only 13 percent of eighth-grade teachers ask their students to write persuasively each week. In addition, resource constraints disproportionately affect Black and Hispanic students, who are more likely than their white peers to write at a "below basic" level. Automated feedback tools are one way to make it easier for teachers to grade the writing tasks they assign, which in turn helps students improve their writing skills.

There are many automated writing feedback tools available, but they all have limitations, especially for argumentative writing. Existing tools often fail to assess the quality of argumentative elements such as organization, evidence, and idea development. On top of that, many of these tools are out of reach for educators because of their high cost, which disproportionately affects already underserved schools.

Georgia State University (GSU) is an urban public research institution in Atlanta serving undergraduate and graduate students. U.S. News & World Report ranks GSU as one of the nation's most innovative universities. GSU awards more bachelor's degrees to African Americans than any other nonprofit college or university in the country. GSU and The Learning Agency Lab, an independent nonprofit based in Arizona, focus on developing science-of-learning-based tools and programs for social good.

To best prepare all students, GSU and The Learning Agency Lab have teamed up to encourage data scientists to improve automated writing assessment. This public effort could also encourage higher-quality and more accessible automated writing tools. If successful, students will receive more feedback on the argumentative elements of their writing and will apply this skill across many disciplines.

Assessment Method

The first track of this competition focuses on classification accuracy. Submissions to this track are evaluated using multiclass log loss. Each row in the dataset is labeled with a ground-truth effectiveness label. For each row, you must submit the predicted probability that the discourse element belongs to each quality label. The formula is:
$$\text{log loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log\left(p_{ij}\right)$$

where $N$ is the number of rows in the test set, $M$ is the number of class labels, $\log$ is the natural logarithm, $y_{ij}$ is 1 if observation $i$ belongs to class $j$ and 0 otherwise, and $p_{ij}$ is the predicted probability that observation $i$ belongs to class $j$.

The submitted probabilities for a given discourse element are not required to sum to one: they are rescaled before scoring by dividing each row by its row sum. To avoid the extremes of the log function, each predicted probability $p$ is replaced by:

$$\max\left(\min\left(p,\, 1-10^{-15}\right),\, 10^{-15}\right)$$
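As a concrete illustration, here is a minimal NumPy sketch of the metric described above, including the row rescaling and probability clipping. The function name and array layout are assumptions for illustration, not the official scoring code.

```python
import numpy as np

def multiclass_log_loss(y_true, y_pred, eps=1e-15):
    """Multiclass log loss with row rescaling and probability clipping.

    y_true: shape (N,), integer class indices in 0..M-1
    y_pred: shape (N, M), predicted probabilities, one column per class
    """
    y_pred = np.asarray(y_pred, dtype=float)
    # Rescale each row so its probabilities sum to one.
    y_pred = y_pred / y_pred.sum(axis=1, keepdims=True)
    # Clip to avoid the extremes of the log function.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    n = len(y_true)
    # Pick the predicted probability assigned to the true class of each row.
    true_probs = y_pred[np.arange(n), y_true]
    return -np.mean(np.log(true_probs))

# Toy example: 3 rows, 3 classes (Ineffective, Adequate, Effective)
y_true = np.array([1, 2, 0])
y_pred = np.array([[0.2, 0.6, 0.2],
                   [0.1, 0.3, 0.6],
                   [0.7, 0.2, 0.1]])
print(multiclass_log_loss(y_true, y_pred))
```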
You must submit a CSV file containing the discourse_id of each discourse element and the predicted probability for each of the three effectiveness ratings. The order of the rows does not matter. The file must have a header and should follow the format below:

discourse_id,Ineffective,Adequate,Effective
a261b6e14276,0.2,0.6,0.4
5a88900e7dc1,3.0,6.0,1.0
9790d835736b,1.0,2.0,3.0
75ce6d68b67b,0.33,0.34,0.33
93578d946723,0.01,0.24,0.47
2e214524dbe3,0.2,0.6,0.4
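For example, a file in this format can be assembled with pandas along the following lines. The uniform probabilities are only a placeholder, and the file paths are assumed for illustration.

```python
import pandas as pd

# Hypothetical path; in a Kaggle notebook the test data lives under the input directory.
test = pd.read_csv("test.csv")

# One row per discourse element, with a column per effectiveness rating.
submission = pd.DataFrame({
    "discourse_id": test["discourse_id"],
    "Ineffective": 1 / 3,
    "Adequate": 1 / 3,
    "Effective": 1 / 3,
})
submission.to_csv("submission.csv", index=False)
```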

Data Description

The dataset provided by Kaggle contains argumentative essays written by U.S. students in grades 6-12. The essays were annotated by expert raters for discourse elements commonly found in argumentative writing:

  • Lead - An introduction that begins with a statistic, a quotation, a description, or some other device to grab the reader's attention and point toward the thesis
  • Position - An opinion or conclusion on the main question
  • Claim - A claim that supports the position
  • Counterclaim - A claim that refutes another claim or gives an opposing reason to the position
  • Rebuttal - A claim that refutes a counterclaim
  • Evidence - Ideas or examples that support claims, counterclaims, or rebuttals
  • Concluding Statement - A concluding statement that restates the claims

The contestant's task is to predict the quality rating of each discourse element. Human readers rated each rhetorical or argumentative element as one of the following, in order of increasing quality:

  • Ineffective
  • Adequate
  • Effective

Training Data
The training set consists of a .csv file containing the annotated discourse elements of each essay, including their quality ratings, and .txt files containing the full text of each essay. Note that some parts of the essays are unannotated (i.e., they do not fall into one of the categories above) and therefore have no quality rating; these unannotated parts are not included in train.csv. A short loading sketch follows the field list below.

  • train.csv - Contains the annotated discourse elements for all essays in the training set.
    • discourse_id - ID code of the discourse element.
    • essay_id - ID code of the essay response. This ID code corresponds to the name of the full-text file in the train/ folder.
    • discourse_text - The text of the discourse element.
    • discourse_type - The class label of the discourse element.
    • discourse_type_num - The enumerated class label of the discourse element.
    • discourse_effectiveness - The quality rating of the discourse element; this is the target.
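As a rough sketch of how this data might be loaded (the column names are those listed above; the train/ folder layout is assumed from the description), the annotations can be read and joined to the essay full texts like this:

```python
from pathlib import Path
import pandas as pd

train = pd.read_csv("train.csv")
print(train["discourse_type"].value_counts())
print(train["discourse_effectiveness"].value_counts())

def read_essay(essay_id, folder="train"):
    # Each essay's full text sits in train/<essay_id>.txt per the description.
    return Path(folder, f"{essay_id}.txt").read_text()

# Attach the full essay text to every annotated discourse element.
# (Essay IDs repeat across rows, so caching the reads would speed this up.)
train["essay_text"] = train["essay_id"].apply(read_essay)
```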

To help contestants write their submission code, Kaggle provides a few example instances selected from the test set. When a notebook is submitted for scoring, this sample data is replaced with the actual test data, including sample_submission.csv.

  • test/ - A folder containing example essays from the test set. The actual test set contains about 3,000 essays in a format similar to the training set essays. The test set essays do not overlap with the training set essays.
    • test.csv - Annotations for the test set essays, containing all fields of train.csv except the target, discourse_effectiveness.
    • sample_submission.csv - A correctly formatted example submission file.
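Putting the pieces together, a simple baseline sketch might look like the following. The TF-IDF plus logistic regression model is only a placeholder, not the competition's reference approach, and the file paths are assumed for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

LABELS = ["Ineffective", "Adequate", "Effective"]

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Placeholder model: bag-of-words TF-IDF features over the discourse text
# followed by a multinomial logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(min_df=2, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train["discourse_text"], train["discourse_effectiveness"])

# predict_proba columns follow model.classes_; reorder them to the submission header.
proba = pd.DataFrame(model.predict_proba(test["discourse_text"]),
                     columns=model.classes_)
submission = pd.concat([test[["discourse_id"]], proba[LABELS]], axis=1)
submission.to_csv("submission.csv", index=False)
```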


Origin blog.csdn.net/cjw838982809/article/details/132100284