<h2>Content:</h2>
<ul>  
   <li><a href="#1"> What is ASYST? </a> </li>
   <li><a href="#2"> Which languages are supported by ASYST? </a></li> 
   <li><a href="#3"> How do I use ASYST? </a></li> 
   <ol>
      <li><a href="#4"> What does the input look like?</a> </li>
      <li><a href="#5">  How do I run ASYST on Windows 11? </a></li>
      
   </ol>
   <li><a href="#6"> How do I continue with the output from ASYST? </a></li> 
   <li><a href="#7"> How do I run ASYST if I don't use Windows 11? </a> 
         <ul><li><a href="#8"> Running ASYST in the Pycharm development environment</a> </li></ul></li>
</ul>


<h2 id=1>What is ASYST?</h2>

ASYST is a program designed to support teachers as they grade free-text answers in tests: With the help of Artificial Intelligence,
ASYST makes grade suggestions that can be reviewed and, if necessary, modified by the teachers.

ASYST is intended for the evaluation of short answers that are one to three sentences long. It is not intended to be used for longer responses.

ASYST helps the teacher by suggesting a grade. This assessment may well be incorrect in individual cases; the teacher can check and correct it. 
This saves time in two ways compared to completely manual grading: First, reviewing grades is generally faster than grading from scratch; 
and second, we recommend reviewing mostly those ASYST grades that are most prone to errors (see Section <a href="#6"> How do I continue with the output from ASYST? </a>).

The program is written in Python; the source code is publicly available. To make ASYST easier to use, the Python scripts have been 
converted into an executable that is usable in Windows 11.

The technical background and development history of ASYST are described in <a href="https://rdcu.be/dxPLg"> Pado, Eryilmaz and Kirschner, IJAIED 2023</a> along with a worked example for German data. For English data, a similar example is available in <a href="https://nlpado.de/~ulrike/papers/Pado22.pdf">Pado, AIED 2022</a>.

<h2 id=2>Which languages are supported by ASYST?</h2>

ASYST has been tested for <a href="https://rdcu.be/dxPLg">German</a> and <a href="https://nlpado.de/~ulrike/papers/Pado22.pdf">English</a>. 

The language model that covers German can in principle handle other languages as well. You could therefore select "German" as the language setting and upload data in one of the other languages covered by the model. If you try this, please check the results carefully, as this is untested! (<a href="https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models">According to the model developers,</a> the covered languages are: ar, bg, ca, cs, da, de, el, en, es, et, fa, fi, fr, fr-ca, gl, gu, he, hi, hr, hu, hy, id, it, ja, ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, pt-br, ro, ru, sk, sl, sq, sr, sv, th, tr, uk, ur, vi, zh-cn, zh-tw.)


<h2 id=3>How do I use ASYST?</h2>
We developed ASYST to be as user-friendly as possible.

<h3 id=4>What does the input look like?</h3>

ASYST reads your data in Excel's .xlsx format (which can also be generated by LibreOffice Calc and other programs). This is an example table:

![table_input.png](images%2Ftable_input.png)

The following information needs to be included in the **correct order** and with the **correct column headings**:
1) **question**: The question that was asked
2) **referenceAnswer**: A correct answer /  reference answer
3) **studentAnswer**: The student answer that is to be evaluated
4) (optional) **observed grade**: The grade given by the teacher can be entered here in order to evaluate the accuracy of the ASYST predictions.

The example table can be found at <a href="https://transfer.hft-stuttgart.de/gitlab/ulrike.pado/ASYST/-/blob/main/DE_Demo_Daten.xlsx">DE_Demo_Daten.xlsx</a>. It contains some questions and answers from the CSSAG corpus (Computer Science Short Answers in German) of HFT Stuttgart. The corpus is licensed as CC-BY-NC.
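
Before uploading a table, it can help to check that its header row matches the headings listed above. The following is a minimal sketch, not part of ASYST: the function names `check_headers` and `read_header_row` are hypothetical, and it assumes that the headings are case-sensitive and that the data sits in the first worksheet. Reading the file relies on the third-party package `openpyxl`, which is only imported when `read_header_row` is called.

```python
# Column headings as listed in this README, in the required order.
REQUIRED_COLUMNS = ["question", "referenceAnswer", "studentAnswer"]
OPTIONAL_COLUMN = "observed grade"


def check_headers(headers):
    """Return a list of problems found in a header row (empty list = OK).

    Assumption: headings must match exactly, including upper/lower case.
    """
    # Treat empty cells as "" and ignore empty cells at the end of the row.
    cells = ["" if h is None else str(h).strip() for h in headers]
    while cells and cells[-1] == "":
        cells.pop()

    problems = []
    for position, expected in enumerate(REQUIRED_COLUMNS + [OPTIONAL_COLUMN]):
        found = cells[position] if position < len(cells) else ""
        if found == expected:
            continue
        if expected == OPTIONAL_COLUMN and found == "":
            continue  # the grade column may be left out
        problems.append(
            f"column {position + 1}: expected '{expected}', found '{found}'"
        )
    return problems


def read_header_row(path):
    """Read the first row of the first worksheet of an .xlsx file."""
    from openpyxl import load_workbook  # third-party package, imported lazily

    workbook = load_workbook(path, read_only=True)
    sheet = workbook.worksheets[0]  # assumption: data is in the first sheet
    first_row = next(sheet.iter_rows(min_row=1, max_row=1, values_only=True), ())
    workbook.close()
    return list(first_row)
```

A call such as `check_headers(read_header_row("DE_Demo_Daten.xlsx"))` would then return an empty list if the headings are in the expected order.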

<h3 id=5>How do I run ASYST on Windows 11? </h3>

First, download <a href="https://transfer.hft-stuttgart.de/gitlab/ulrike.pado/ASYST/-/blob/main/ASYST.exe">_ASYST.exe_ </a>.
The program can be started by double-clicking its icon.

The program will take a while to start (approx. 1 minute). During this time the system is initialized.

**Note**: Windows Defender may warn against running the application because the program does not have a security certificate.
Select _more information_ and then _Run anyway_ to dismiss the warning and run ASYST. ASYST is open source, so you can verify that the code is not malicious.


<img src="images/win_def_de_1.JPG" width="450">
<img src="images/win_def_de_2.JPG" width="450">

After the program has been started, a window appears. First, select the language of the answers to be evaluated.
The table to be processed can then be selected by clicking on the “Input File” field. The data should be arranged as described above.
After clicking on the “Start” field, ASYST begins processing the data. Again, this can take a while (1-2 minutes, relatively independent of the amount of data being processed).

Once the program has processed all answers and made predictions, a table view headed "Results" opens.

The results table contains all columns of the input table, as well as the grades suggested by ASYST -- see the "predicted grade" column. The grades for answers classified as incorrect are highlighted in red. You cannot make changes in this table. Instead, save the data by clicking on “Save as” and then open the resulting .xlsx file with a spreadsheet program.

![table_results.png](images%2Ftable_results.png)

As soon as the result table is displayed, ASYST can read and process the next input table.

**ATTENTION: The results table is not saved automatically.** 
To keep the results, click the **"Save as"** button and save the output table at the desired location.


<h2 id=6>How do I continue with the output from ASYST?</h2>

We recommend the following **process** when using ASYST:

(Further information and a concrete example of the procedure can be found in <a href="https://nlpado.de/~ulrike/papers/Pado22.pdf">_Assessing the Practical Benefit of Automated Short-Answer Graders_</a>.)

1) **Define requirements**: How accurate does the grading need to be in my current use case?
    <ul>
    <li>When evaluating free text questions in low-stakes test situations (not in a final exam or similar), <b>human grades</b> differ in around 15% of cases! </li>
    <li>In such situations, we therefore recommend aiming for a grading accuracy of at least 85% after using ASYST plus human review. </li>
    <li>The distribution of grading errors (excessive strictness/leniency) should also be taken into account. </li>
    <li>Ultimately, using the tool should save users time: set the available time budget or a minimum requirement for time saved. </li>
    </ul>

2) **Collect** manually evaluated **test data:**

    To get an idea of the accuracy and reliability of the automated grader, annotated test data is needed.
    This is input data for which a grade has already been determined. These can be answers from previous tests, for example.
    To make the data set as robust as possible against random fluctuations, it should ideally contain a few hundred responses -- but smaller data sets can of course also be used.

3) **Analyze** the performance of the automated grading

    The manually graded test data can be used to measure how reliable and accurate the automated grader is for the test data. This will give you an idea of how well the grade prediction works for your own data set.

    For this purpose, process the questions and answers from the test data set using ASYST and compare the grade predictions with the manual assessment (e.g. in a spreadsheet program such as Excel or LibreOffice Calc).

    The proportion of correctly classified answers in the entire data set gives you the system accuracy. It should be around 85% or higher, which means that the manual and machine grades disagree in 15% of cases or fewer.

    You can also calculate the precision for each grade ("correct"/"incorrect"). Precision measures how many predictions of a given grade were actually correct, i.e. how trustworthy ASYST's predictions are for a given label. A precision of 75% for the grade "correct" means that three quarters of all predictions of "correct" were in fact right, but in a quarter of the cases the answer was actually wrong according to the manual grades.

    _(We plan to add functionality to automatically generate these reliability parameters in the future.)_

4) **Decide** how to use ASYST's predictions.
   
    A usage decision can now be made based on the reliability parameters collected for the criteria mentioned above.
    <ul>
    <li> How large is the proportion of correctly predicted grades? If it is >85%, you can even adopt the ASYST predictions unchanged if you wish. </li>
    <li> What is the precision of the grade labels ("correct"/"incorrect")? If one of the grade labels is significantly more reliable than the other, you can decide to accept its predictions without review and <b>only</b> check the predictions for the less reliable label. In practice, this leads to significant time savings. </li>
    <li>How is the grading error distributed? Are correct answers frequently predicted to be incorrect, or vice versa? Is this acceptable for your situation? </li>
    <li> How much evaluation effort would you have saved for the example data set, e.g. by accepting the more reliable grade label without review? </li>
    </ul>
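
The reliability parameters from the **Analyze** and **Decide** steps above can also be computed with a few lines of Python instead of a spreadsheet. The following is a minimal sketch, not part of ASYST: the function names `evaluate` and `review_share` are hypothetical, and it assumes that the manual and predicted grades have been read into two lists that use the same labels (here the strings "correct" and "incorrect"; the values in your own tables may be encoded differently).

```python
from collections import Counter


def evaluate(observed, predicted, labels=("correct", "incorrect")):
    """Compare manual grades (observed) with suggested grades (predicted).

    Returns the accuracy, the precision per label, and a Counter of
    (observed, predicted) pairs that shows how the errors are distributed.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of equal length")

    pairs = Counter(zip(observed, predicted))

    # Accuracy: share of answers where both grades agree.
    agreeing = sum(n for (obs, pred), n in pairs.items() if obs == pred)
    accuracy = agreeing / len(observed)

    # Precision: of all predictions of a label, how many were right?
    precision = {}
    for label in labels:
        predicted_as_label = sum(
            n for (_, pred), n in pairs.items() if pred == label
        )
        # Precision is undefined (None) if the label was never predicted.
        precision[label] = (
            pairs[(label, label)] / predicted_as_label
            if predicted_as_label
            else None
        )
    return accuracy, precision, pairs


def review_share(predicted, trusted_label):
    """Share of answers left to review if trusted_label is accepted as is."""
    return sum(1 for p in predicted if p != trusted_label) / len(predicted)


# Made-up example data with eight answers.
observed = ["correct", "correct", "correct", "incorrect",
            "incorrect", "incorrect", "correct", "incorrect"]
predicted = ["correct", "correct", "incorrect", "incorrect",
             "incorrect", "correct", "correct", "incorrect"]

accuracy, precision, pairs = evaluate(observed, predicted)
print(accuracy)                             # 0.75
print(precision["correct"])                 # 0.75
print(pairs[("correct", "incorrect")])      # 1 correct answer graded too strictly
print(review_share(predicted, "correct"))   # 0.5
```

In this made-up example, accepting all predictions of "correct" without review would leave half of the answers to be checked by hand.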



<h2 id=7>How can I run ASYST if I don't use Windows 11?</h2>
The clickable application “ASYST.exe” is only suitable for running on Windows 11.

On other operating systems, ASYST can be run from a development environment.
The ASYST source code is written in Python and can therefore be robustly executed in various development environments.
For users who are not familiar with running Python programs, one option is described in more detail below.
<h3 id=8>Running ASYST in the PyCharm development environment </h3>
<ol>
<li>If you haven't already done so, <a href="https://www.jetbrains.com/pycharm/download/?section=mac">download the PyCharm development environment</a> and install it.
     For more information and troubleshooting see the
     <a href="https://www.jetbrains.com/help/pycharm/installation-guide.html">PyCharm installation guide</a>.</li>
<li>Install Python

ASYST was developed in Python 3.10, so this version is recommended for running it.
The version that matches the operating system can be selected and installed at https://www.python.org/downloads.
</li>

<li> Download the source code from GitLab to the development environment:

Select _Get from VCS_:


<img src="images/get_from_vcs.png" width="450">


Enter the following path in the _url_ field: git@transfer.hft-stuttgart.de:ulrike.pado/ASYST.git


<img src="images/svn_url.png" width="450">


Then click on _Clone_ and wait for the download to finish.
</li>

<li>Configure development environment

**Configure Python interpreter:**

Navigate to _Settings >> Project ASYST >> Python Interpreter >> Add Interpreter >> Add local Interpreter_

![add_interpreter.png](images%2Fadd_interpreter.png)


![create_venv.png](images%2Fcreate_venv.png)


_Location_: [Project Path]/[Project Name]/Source


_Base interpreter_: Path to the installed Python version


**Install required packages:**
If PyCharm does not offer to install the packages listed in requirements.txt,
run the following commands manually in the PyCharm terminal:

```
cd Source
pip install -r requirements.txt
```

</li>
<li>Run ASYST

![run_button.png](images%2Frun_button.png)


Select the file _main.py_ in the project directory on the left. The greyed-out _Start_ button at the top right of the window then turns green, and a single click on it runs the ASYST code.

</li>
</ol>