The Evaluating Director

By C. Alexander


A report of an evaluation research study of

Accent ELT summer school at Bradfield College (2003)



Accent is a British Council-approved ‘seasonal’ summer EFL school. It runs summer residential courses at a number of centres in England mainly for Italian students.


This paper presents a Bristol University EdD evaluation of Accent school.





1.0       For whom was this evaluation report written?

2.0       Evaluation background

2.1       Goals

2.2       Who evaluates?

2.3       When and why?

2.4       Who and what were evaluated?

2.5       How the evaluation was undertaken

3.0       Validity and reliability

4.0       Data analysis

4.1       General impressions summative feedback from teachers

4.2       The Italian and Accent questionnaire feedback

4.3       Formative anecdotal feedback during the course

5.0       Data interpretation and recommendations

5.1       Blind spots and other relevant problems

6.0       Conclusion

7.0       References

Six Appendices



1.0       For whom was this evaluation report written? Who were the audiences?



This report was written for an EdD audience as part of an EdD assignment. It was also presented to the two Accent senior staff. An attempt has therefore been made to find a balance between critiquing, problematising and engaging with the literature, and providing a lucid, not over-theorised account for the ‘non-evaluation expert’ Accent senior staff. Some areas of this report, e.g. the evaluation background, had to be explained to the EdD audience, whilst others, such as some of the evaluation meta-language, were described for the non-EdD audience. As there were two discrete audiences and a 4000-word assignment word limit, some of the text appears as appendices.


Permission was sought and obtained from Accent to undertake an evaluation of the type described in this paper, though for practical reasons not enough time was set aside to discuss the evaluation in detail with Accent senior staff. Stufflebeam (1990, 95) notes that persons evaluating should be trustworthy and that disclosure should be full and frank. Harlen and Elliot (1982), with regard to questions for reviewing evaluations, hold that evaluations should be communicated effectively to the intended audience. Rea-Dickins (2003, EdD lecture) maintained that evaluation findings should be presented ethically to stakeholders and should be transparent for the intended audience(s). NB Weir and Roberts (1994, 13) define stakeholders as people who have a stake in the evaluation, i.e. people whom it may affect to varying degrees. Murphy and Rea-Dickins (1999, 90), however, argue that ‘there are many ways of defining the concept of stakeholders and many of these refer to individuals or members of groups involved in or affected by a project or evaluation’.


Even though the Accent staff did not initially enquire who else would read the evaluation report, I felt it would be unethical not to inform them that it was also a Bristol University EdD assignment. A serious weakness (i.e. an error on my part) in this assignment is that I did not consider the following two factors from the beginning of the evaluation (i.e. I did not realise they might have a profound effect on the project and my relationship with Accent):


(1)   When should Accent have been informed that the evaluation was also an EdD assignment? They were informed unintentionally towards the end of the project.

(2)   Did Accent have any objections to having an evaluation report about their British Council school sent to Bristol University? Did they feel threatened? This will be discussed in section 5.1.


Weir and Roberts (1994, 213-5), writing about the political and personal dynamics of evaluation, discuss the need for intelligible reporting to audience(s) capable of indicating remedies. Were the EdD audience stakeholders, given that it was they, not Accent, who asked me to undertake the evaluation as an EdD assignment? Was I (or why was I) undertaking an evaluation for an audience that had no well-defined stake in the evaluation findings? Murphy and Rea-Dickins (1999, 91) state ‘that when stakeholders are defined by their working role within a programme, or by their contribution to the programme, the definition is usually unclear about whether the definition is specifically to do with their place in the project or whether this classification refers only to their association with the evaluation’ (please note that Rossi and Freeman 1993, 408 and Weiss 1986, 151 provide detailed lists of stakeholders; this was noted in Murphy and Rea-Dickins 1999, 91). The EdD audience, however, had no clear working role in this programme, nor could it make a contribution to the programme.


Could the EdD audience be defined as ‘regulators’ (i.e. any agency which directly or indirectly regulates the project) in the Aspinwall et al. (1992, 84-85) sense (noted in Murphy and Rea-Dickins 1999, 92)? Were they, in Hopkins’ (1989) sense (noted in Murphy and Rea-Dickins 1999, 92), the ‘profession’, i.e. the evaluating community? Rossi and Freeman (1993, 409), noted in Murphy and Rea-Dickins (1999, 92), assert that ‘evaluators may be unsure whose perspective they should take in designing an evaluation’. I held, as EdD student evaluator, that the EdD audience, as educators and experts in the field of evaluation, had a stake in ensuring that the evaluator might experience and understand what skills were necessary for good evaluation. Yet another weakness in this research is that I did not ask the Bristol University EdD audience why they had asked me to undertake this evaluation, i.e. what were they actually interested in finding out? Lynch (1996, 3) states that identification of the evaluation audience leads to determining the evaluation goals and purposes. Lynch argues (ibid) that, depending on the evaluation audience, the answers to questions such as ‘why is the evaluation being conducted?’ and ‘what information is being requested and why?’ vary.


Giddens (1989, 72), noted in Murphy and Rea-Dickins (1999, 94), maintains that the amount of power an individual or group is able to achieve governs how far they are able to put their wishes into practice. Murphy and Rea-Dickins (1999, 93) note that power is dependent on knowledge and that evaluation is about generating knowledge and therefore has its own power. In this evaluation both the EdD audience and Accent held power, yet to what degree would they (or would they in fact) resist the process of the evaluation activity (see section 5.1)?


In sum, I feel that, mainly for reasons of time, I did not clarify a number of issues with Accent and the EdD audience, e.g. stakeholder perspectives, expertise (i.e. learning or the integrity of the evaluation?), control, status, and implications for managing the evaluation (these issues are mentioned briefly in Murphy and Rea-Dickins 1999, 93-98).


2.0       Evaluation background


For the EdD audience readers only, please refer to appendix one for information about the evaluation background.


2.1       Goals


This evaluation took place in order to marshal both formal and informal data and arguments that might enable Accent senior staff to participate in a critical debate about the Accent programme (Kemmis 1986, noted in Rea-Dickins 2003). The purpose of this evaluation was to assess whether the aims of the school were being, or could be, met using the data elicitation instruments available. I felt this was important for two reasons: (1) all Accent teachers in all the Accent centres were expected to use Accent teaching materials in such a way as to achieve the aims of the school; (2) all Accent course directors were supposed to help their teachers to do this.


Rea-Dickins and Germaine (1993, 55) state that there are three principal reasons to evaluate. The first is assessment and accountability, ‘where information obtained can be used primarily for administrative purposes’. The second and third are developmental: the ‘evaluation can serve a developmental function where it can be used for purposes of curriculum development and teacher self development’. Even though the purpose of this evaluation might suggest primarily an accountability focus, the focus was also developmental, i.e. it was hoped that the evaluation might lead to some changes in (1) the way the aims of the school were formulated; (2) the way school teachers/directors try to achieve these goals; (3) teaching materials used; (4) activities provided to students; (5) the data elicitation provided. Weir and Roberts (1994, 5) state that a summative focus/evaluation examines the effects of a programme at significant end points and is usually conducted for the benefit of an external audience. A formative evaluation watches the programme as it develops (i.e. raises awareness); formative evaluators try to ensure that the programme is implemented as effectively as possible. This evaluation was both summative and formative (this is explained in detail in appendix 3 and section 2.5). The summative research procedures used were end-of-course questionnaires and some informal anecdotal teacher feedback. The formative research procedures used were mainly director-led anecdotal feedback and student diaries.


The stakeholders that possibly had an immediate interest in the findings of this evaluation were Accent senior staff, though the ‘evaluation contracting’ EdD audience may also have been interested in these findings. The evaluation could also have been broadened further to include any party potentially interested in the summative and formative findings of this evaluation, e.g. the tour operator, the INPDAP, other INPDAP tour operators, or ELT British Council summer schools; a peripheral audience might comprise programme administrators, curriculum developers, teachers, and researchers from other research settings. In this assignment I did not attempt to assess the degree to which the evaluation was developmental, accountability-orientated or awareness-raising.


2.2       Who evaluates?


‘Inside’ evaluators work within an organisation and are thought to be more biased than ‘outside’ evaluators (noted in Weir and Roberts 1994, 23). It is also held (ibid) that what is required is an acceptable mix of outsider and insider perspectives and contributions. The Accent summer school director, who was also an EdD student, undertook the evaluation. It was, however, pertinent to ask where the director’s loyalties lay, i.e. would his evaluation be objective and/or totally transparent to Accent, who were paying him a salary to be course director and fulfil the Accent job description (NB this job description did not include undertaking an evaluation), or would he be more interested in presenting the type of evaluation report he thought his ‘evaluating and contracting’ EdD audience might want to read? The director was employed for the duration of the course, i.e. a six-week period. He was both an insider and an outsider.


The evaluator had years of experience as a teacher trainer and project manager and also possessed some evaluation expertise. Two important questions therefore pertained to whether the evaluator had a vested interest in showing Accent that the objectives of the school were being met, and whether the evaluator might put the need to present evaluation findings of ‘supposed’ integrity and quality for the EdD audience before the need to address why the evaluation took place (discussed in 2.1). Consequently the credibility of this evaluation may be questioned: the course director might have had a stake in showing that the school goals had been successfully achieved, i.e. the Accent senior staff might then have felt he was doing an ‘exceptional’ job as director. There were, however, a number of advantages in having an ‘evaluating’ director, e.g. as the director post was short-term, it was thought more likely that any insider ‘blind-spots’ or ‘uncomfortable’ issues would be brought out into the open (see section 5.1). Collaboration with insider staff was ‘thought’ to be more likely as there were regular staff meetings. An external evaluator might have been more objective, though in this context using one was not practicable, i.e. too many external parties would have had to be approached for permission.



2.3       When and why?



The evaluation, for practical reasons, took place during the second two-week cohort session, though the data elicited during this ‘optimal’ second fortnight may not have been representative of the whole period. For example, firstly, teachers and Italian group leaders might have been under more stress during the first two-week cohort session because time was required to get used to the residential college facilities. Secondly, because of the intensive nature of such fully residential summer EFL courses, teachers and Italian group leaders during the third two-week cohort session might have been more tired or possibly less enthusiastic. Even though it would have been interesting to assess whether such a mismatch existed, analysing this was beyond the scope of the evaluation. By the second cohort session, on the other hand, the director-evaluator had had more time to familiarise himself with the setting and so was better able to undertake the evaluation effectively without the possible drawbacks associated with tiredness or initial stress.


Weir and Roberts (1994, 14) hold that evaluating a project at the start (appraisal), during its life (monitoring) and as it ends (summative) are all important parts of an evaluation. This evaluation was undertaken at the beginning of, during, and at the end of the second two-week cohort session; it was thought that all parts of the evaluation process were important. One advantage of undertaking an evaluation at different intervals during the two-week cohort session was that the evaluation was both summative and formative. The evaluation was formative because the evaluating director observed lessons formally and informally and teachers received immediate feedback on their lessons. Teachers were regularly reminded to consider the three goals of the school, to use school materials and to follow induction-lesson advice; formative evaluation during the project allowed staff to take the necessary steps in terms of in-progress re-adjustment. See appendix two for another example of the formative nature of this evaluation.


The evaluation was also summative because students had to fill in two end-of-course questionnaires. It was held that the summative end-of-programme questionnaires might not give much useful process information. Lynch (1996, 32) holds that formative evaluation looks at a programme while it is developing in order to make improvements, i.e. the concern is for what is happening within the programme rather than focusing exclusively on programme outcomes. It was thought that undertaking the evaluation might help to make the programme more responsive to the needs of the students within the ‘objectives’ framework set by the school.





2.4       Who and what were evaluated?


Italian students aged between 13 and 16, of mixed sex and socio-economic status (i.e. from the north and south of Italy) and with varying experience of learning English as a foreign language, were admitted to the school and allotted to classes on the basis of placement test results. There was no comparison group, nor was it possible in this context to have a control group. This project was not funded by any agency; there was therefore no need to demonstrate efficiency in terms of value for money.


The scope of evaluations can vary significantly (noted in Weir and Roberts 1994, 18) and Sanders (1992, 5-6) presents a number of foci. With regard to what was evaluated in this evaluation (i.e. the Accent objectives), the question of how the Accent school objectives were to be conceived was fundamental. Stern (1983), noted in Lawrence (1995, 76), advises that curriculum goals and content that have been selected should be sound and educationally justifiable. To this end, the Accent principal and DOS were asked to explain how the objectives were to be interpreted for the purposes of the evaluation. In the first objective (see appendix one), the words ‘fulfilling’ and ‘enjoyable’ were understood to mean ‘did they like/enjoy their stay in England?’. The words ‘cultural experience’, in objective one, were thought to be a vague concept; England is, after all, a cosmopolitan country. In the second objective, ‘improve students’ spoken production of English’ denoted practising speaking (i.e. speaking and being corrected by a native speaker). In the third objective, ‘consolidate’ meant ‘practise/improve’. Therefore, in order to proceed, the evaluation goals had to be reformulated/clarified. The reformulated goals were:


1.   to provide students with an enjoyable stay/course in England (Britain).

2.   to practise/improve students’ spoken production of English.

3.   to practise/improve grammar and functions (particularly excursion-related functions) and learn new vocabulary.


These reformulated objectives were discussed in detail with Accent teachers before they started teaching and regularly during the course. The evaluation therefore led to a reformulation/clarification of the initial objectives.


2.5       How the evaluation was undertaken


The evaluation comprised two parts: first, a formative part and second, a summative accountability-orientated section (i.e. the student end-of-course questionnaires and teachers’ general impressions feedback). With regard to objectives 2 and 3, one way of possibly determining whether there had been any improvement could have been to set an achievement test. It was not possible, however, to have an end-of-course achievement-based test, as there was no fixed syllabus for the two-week period on which to base such a test. It was also felt that the placement test was not a reliable way of initially measuring student competence as: (1) there were several ambiguous items in the test; (2) no cut-off points were given; (3) the test rubrics were not written in Italian. The risks of misplacement were, however, compensated for by using first-impressions oral test sheets, i.e. teachers determined whether there had been any disparities between students’ test scores and oral abilities, and subsequent changes, if necessary, were made to student groups. It was not possible to issue the same placement test at the end of the course to assess whether there had been any improvement, which might in turn have suggested that the Accent course language practice had helped the students to improve. Even if the same placement test had been used as an end-of-course test, only a very tentative connection at best could have been made about possible language improvement. Some literature suggests, with regard to test-retest reliability, that some students may do better the second time, when they are accustomed to the test method, or worse, when they are suffering from exhaustion or irritation (e.g. Alderson et al. 2001, 294). Harrison (1993, 26) also notes, with regard to placement tests, that students may be more nervous when they start the course, which might affect test results.
Devising a test based on a fixed syllabus was also impracticable, because having a formative element in the evaluation that was responsive to student needs presupposed changes would be made to any syllabus.


Teachers were asked at the end of the course to comment summatively on (i.e. give a general impression of): (1) students’ speaking/pronunciation (i.e. had it improved?); (2) grammar and functional language practice (i.e. what had they noticed? Were students more fluent/accurate?); (3) whether students’ vocabulary had expanded. The two evaluation stages (i.e. formative and summative), with a full explanation of the flow diagram, are presented in appendix 3. Jacobs’ (2000, 261-279) flow diagram, which pertains to the evaluation of educational innovation and incorporates formative, summative and also illuminative goals within the context and policy framework of its operation, differs from mine. It focuses on the goals of respective audiences and the best methods of meeting their disparate needs. Even though my flow diagram introduces some minor innovations, in Jacobs’ (ibid) diagram there is dialogue with stakeholders and the evaluator regularly revisits goals. The Accent teachers were not aware during the two-week evaluation that I was undertaking an evaluation, and so could not be considered stakeholders. Also, the stakeholder status of the ‘contracting’ EdD audience was unclear (these issues are also discussed in section 5.1). Accent senior staff were not aware of the complexity of the evaluation I was undertaking, and had little time during the ‘frantic’ summer period to discuss it in any detail.


3.0       Validity and reliability


Mainly for the EdD audience readers, please refer to appendix four for information about validity and reliability.


4.0       Data analysis


4.1       General impressions summative feedback from teachers


All the teachers felt that there had been improvement regarding reformulated objectives 2 and 3. It was thought that students’ spoken English had improved and that students were more confident. With regard to grammar/functional English, teachers held that fluency and accuracy had improved. The teachers also stated that student vocabulary had expanded as a result of the course. Teachers however did not provide any concrete evidence about how they thought students’ vocabulary had improved and there was very little time in this context to assess this. This suggested (very tentatively) that all Accent goals had been achieved, though the degree to which they had been achieved could not be established.



4.2       The Italian and Accent questionnaire feedback


Please see appendix 5.


4.3       Formative anecdotal feedback during the course


Feedback regarding course satisfaction elicited by the director from students and group leaders was, on the whole, very positive. However, some feedback regarding some of the excursions (e.g. Brasenose College, the rural life museum in Reading) possibly led to some changes in pre/post-excursion lesson plans and excursion itineraries (see point 2 below). The students and Italian team leaders made no complaints about the quality of teaching. Teachers used general feedback from students and student diaries, and some feedback was thought to have led to a number of changes (NB examples of such changes are not described in this paper).


5.0       Data interpretation and recommendations


With regard to the summative/accountability aspect of the evaluation, there was evidence that directly and indirectly suggested (i.e. the questionnaire referred directly to objective 1 but only indirectly to objectives 2 and 3) that the school objectives had been achieved. However, it is stressed that the findings are tentative because of the inherent weaknesses of the data elicitation instruments (discussed in appendices 3-4). Both students and teachers were in agreement that the students’ English had improved. Overall course satisfaction (aim one) was high (i.e. average 4.1). However, it was not possible to assess whether improvement had really taken place or, if so, to what degree.


The following findings are more awareness-raising/developmental (some additional findings are presented in appendix 6):


(1)        The first day was stressful for many students; it is recommended that students be introduced to each other, both in class (icebreaker lessons) and out of class, as soon as they arrive. This information was elicited from the Accent questionnaire and could have ramifications for objective one.


(2)        With regard to the ‘negotiated’ changes brought about by the formative evaluation (discussed in 2.3 and appendix 2), the fact that there was very little resistance from teachers to changing lesson plans did not mean ‘change’ was taking place in the classroom. Therefore more observation is required to assess whether teachers really do what they say they are doing. A formative evaluation of the type described in this paper might be more effective if teachers had some contractual one-to-one (i.e. director-teacher) time per week devoted to discussing student feedback and lesson alterations. Even though much of the change literature suggests that change is a long-term process (e.g. Palmer 1993, 166; Irmsher 1997, 2; Fullan 1985 in Guskey 1989, 446), I hold that even over a short period of time teachers could become more aware and critical of their perceptions and beliefs, more responsive to student and director-led feedback, and also more aware of the need to implement and assess change. This finding is developmental/awareness-raising in nature and has ramifications for all three school objectives, because improving the quality of the formative process may make the course more responsive to student needs.


5.1       Blind spots and other relevant problems


Even though most of the findings in section 5.1 did not come about during the evaluation itself (i.e. during the second two-week cohort), I have included them because I feel they are pertinent to both evaluation audiences. It was thought that undertaking the evaluation would not distract the director from his normal duties and possibly would even enable the director to focus more carefully on his contractual duties. Even though the summative and formative feedback was positive, six of the teachers complained in the Accent end-of-course teachers’ satisfaction questionnaires that they had not received enough practical help from the director with regard to preparing practical lesson materials. Five of these teachers were newly qualified and three of them did not have a first degree. I chose not to tell the teachers about my evaluation because: (1) they themselves were not the object of the evaluation (NB the aims of the evaluation were discussed in section 2.1 and what was evaluated was discussed in section 2.4); (2) I did not want to worry the newly qualified teachers unduly; (3) no time had been allotted by Accent to do this. It was made known to the teachers in week 6 that I had been undertaking an EdD evaluation assignment. After the course had been completed I received an email from one of the teachers who had not complained, mentioning that because I had been working on an EdD assignment, many of the other teachers 'felt' I had not been fulfilling my responsibilities as Course Director properly. The director also had to do a lot of non-contractual work for the Italians, who simply could not get much done because only four or five of the 25 Italian team leaders could speak English to any degree of proficiency (i.e. it is held that this was what actually distracted the course director from his contractual duties).
It is therefore felt that in future, Accent should not employ so many newly qualified teachers who require so much basic practical assistance (i.e. they should be experienced enough to be able to plan lessons, choose textbooks themselves and, above all, be more open to quality-control evaluations of the type described in this paper). Also, a full teacher training session/workshop on the aims of such an evaluation should be introduced so that teachers can understand and appreciate the possible advantages of such an evaluation.


Another problem that developed, particularly in weeks five and six, and led at times to serious discipline problems was conflict between some of the teachers and the director.

Where ‘such’ temporary TESOL teachers come from, what emotional baggage they bring with them, and whether they are able to adjust to the extremely stressful and tiring environment of such ‘seemingly idyllic’ summer ELT residential courses need addressing. It is suggested that aptitude tests be introduced for all Accent staff.


It was also felt that Accent did not appreciate or feel happy about the fact that the evaluation was an EdD assignment, i.e. it may have felt that working on the assignment distracted its employee director from his ‘contractual’ duties, or, as a British Council approved summer school, it may have found this evaluation ‘threatening’. All the questionnaires were the property of Accent. Permission has been sought from Accent (though, as yet, has not been granted) for a copy of the student end-of-course questionnaire. The diaries were the property of the students; the permission of the student, the Italian tour operator director, Accent senior staff, and the Italian government INPDAP inspector would be required in order to submit them as part of an EdD assignment. I therefore did not pursue this matter formally and no diary entries have been submitted in this paper.


I would also like to suggest that, in light of what happened to me at Bradfield College, EdD students should in future be given the option to critique other people’s evaluations rather than undertake their own. Also, it is recommended that those students who decide to undertake an evaluation in their place of work be given more discussion with, and guidance from, the ‘contracting’ EdD audience.


6.0       Conclusion


In this evaluation I have attempted to disclose/communicate all pertinent evaluation findings openly and honestly, drawing attention to the limitations of the study. I have noted that there may be a conflict of interest between being an evaluator, course director and EdD student. The degree to which the formative evaluation led (or whether it in fact led) to the ‘apparent’ positive summative findings is not clear, and further research may provide some useful insights into whether such a formative evaluation process is, or can be, effective. With regard to both evaluation audiences, it is hoped that this report will provoke a positive reaction.





7.0       References



Alderson, J. C., C. Clapham and D. Wall (2001) Language test construction and evaluation. Cambridge: Cambridge University Press


Aspinwall, K., Simkins, T., Wilkinson, J., and McAuley, M. (1992). Managing evaluation in education in the 1990’s (Vol 27). Singapore: RELC


Fullan, M. (1985) Change processes and strategies at the local level. Elementary School Journal, 85, 391-421


Giddens (1989) details not given in EdD handout


Guskey, Thomas. (1989) ‘Attitude and Perceptual Change in Teachers’ International Journal of Education 13:7, 439-453


Harlen, W., and Elliot, J. (1982). A checklist for planning or reviewing an evaluation. In McCormick, R. et al. (eds) 1982: 296-304


Harrison (1993). Language Testing Handbook. London: ELTS


Hopkins (1989). Evaluation for School Development. Milton Keynes: Open University Press.


Irmsher, Karen (1997) Educational Reform and Students at Risk (on-line). Available:



Jacobs, C. (2000) The evaluation of Education innovation. Evaluation, 6 (3), 261-280.


Kemmis, S. (1986). Seven principles for programme evaluation in curriculum

development and innovation. In E.R House (Ed), New Directions in Educational Evaluation: Falmer Press


Lawrence, L. (1995). Using evaluation to improve teacher education programmes. In

a. L. Rea-Dickins, P., A. F (Ed.), Evaluation for Development in English Language Teaching :Modern English Publications / British Council.


Lynch, B. (1996). Evaluation Program Evaluation. Theory and Practice. Cambridge:



Murphy, G. F., and Rea-Dickins, P. (1999) Identifying Stakeholders. In V. McKay &

C. Treffgarne Evaluationg Impact.  Serial Number 35 Department for International Development pp89-98


Palmer, Christopher (1993) ‘Innovation and the experienced teacher’ English

Language Teaching Journal  47:2, 166-171


Rea-Dickins, Pauline., and Germaine, Kevin. (1993). Evaluation. Oxford: OUP


Rea-Dickins, P., and Germaine, Kevin. (1998). Managing Evaluation and Innovation

in Language Teaching: Building Bridges. London: Longman


Rea Dickins (June 2003) 'Evaluation of innovation and programmes in English

language teaching' , Bristol University EdD lecture notes)


Rossi and Freeman 1993 –details not given in EdD handout


Sanders, J. R. (1992). Evaluating School Programs: an Educator’s Guide. Newbury

Part, California: Corwin Press.


Stern, H. H. (1983) Fundamentals in Language Teaching. Oxford: OUP


Stufflebeam, D, L. (1990). Professional Standards for Educational Evaluation. In

Walberg, H. J. and Haertel, G. D. (eds) 1990: 94-106


Weir, Cyril., and Roberts, Jon. (1994). Evaluation in ELT. Oxford: Blackwell


Weiss 1986 –details not given in EdD handout.



Appendix One




Three consecutive two-week cohorts, each of approximately 200 Italian students aged between 13 and 16, attended full residential summer courses run at Bradfield College. The students were accompanied by Italian group leaders; the student study holidays were subsidised by the Italian government (INPDAP: Istituto Nazionale di Previdenza per i Dipendenti dell’Amministrazione Pubblica). INPDAP takes part of the pay-related social security fund to provide study holidays for dependants of state employees; thousands of Italian students come to England each year through INPDAP-approved tour operators. Accent was responsible for the teaching and partly responsible for the excursion programme. The Accent course director (Chris Alexander) was responsible for the programme administration and teacher training, and for creating a relationship of co-operation with the Italian co-ordinator and the on-site INPDAP inspector; the Italian co-ordinator was the on-site representative of the Accent client (i.e. the Italian tour operator). The language of communication for the sports and evening activities was Italian; these activities were organised by the Italian group leaders.


Students received intensive three-hour lessons most weekdays during the two-week study holiday; they also went on several excursions and took part in evening activities organised by the Italian group leaders. The nine Accent EFL teachers were fully qualified, with varying teaching experience, and Accent adhered strictly to the documentation requirements stipulated by the British Council. The three-hour lessons were sub-divided into four forty-five-minute periods. The first two periods were pre- and post-excursion related (NB students were told to keep diaries of the places they had visited and the things they had liked while on excursions); the third and fourth periods were general English classes based on Accent teaching materials. Teachers initially received an induction course and teaching materials; the induction course comprised a discussion on suitable teaching techniques, professionalism, rapport, what constituted positive and negative attitudes, and general advice. Teachers also attended regular director-led staff-cum-teacher training meetings during the course. Accent school had three main course objectives:


1.   to provide students with a fulfilling and enjoyable cultural experience;

2.   to improve students’ spoken production of English through work on phonology (i.e. pronunciation, intonation, stress);

3.   to consolidate students’ existing knowledge of grammar and functions, and also expand student vocabulary.






Appendix two: An example of how the evaluation was formative





Another formative element in the evaluation concerned the regular informal feedback sought by the director from students, teachers and the Italian group leaders during the two-week period. This informal data was used for ongoing modification of lesson plans and, where possible, excursions. For example, if students complained that excursions were boring, which might have led, or partly led, to negative attitudes towards Britain or to the course in general, teachers were encouraged to devise different pre- and post-excursion tasks and some changes were made to the excursion itineraries. If Italian group leaders or teachers mentioned that students felt unhappy about aspects of the taught course, alternative lesson plans were first discussed with all the teachers and changes were subsequently made to the programme. These changes were then assessed as a group, or discussed individually, for effectiveness in subsequent teacher meetings. In this way change possibly came about during, and as a result of, the evaluation. For example, an Accent teacher noted that her pupils said they felt unhappy/de-motivated learning so much about British culture and as a result felt a little homesick (this data was elicited informally from students during and after lessons). As a consequence, lesson plans were given a different slant, i.e. teachers also discussed the role Italy had played in shaping British history and modern British culture.



Appendix three



Once the objectives had been reformulated, lessons were planned within the framework of the school’s objectives and based on the Accent course materials. These lessons were quality controlled in three ways: (1) lessons were observed (using British Council observation sheets) and compared with daily and weekly lesson plans, as it was necessary to assess whether teachers were doing what they said they had been doing; there were also informal ‘drop-in’ observations; (2) teachers were interviewed informally to discuss any problems related to methodology or classroom management; (3) staff meetings/teacher training sessions were held regularly to maintain motivation levels, encourage staff input, and discuss teaching issues.


The director sought informal formative feedback from teachers, Italian leaders and partly from students regarding the following:


(1)   how students felt about being in Britain (i.e. objective one);

(2)   whether students had any problems/worries concerning grammar, functional English, vocabulary or phonology (i.e. objectives two and three);

(3)   how students felt about the College facilities in general, e.g. food and accommodation.


Student diaries were summaries of excursions and places students had enjoyed visiting. Even though diary entries were on average no longer than a short paragraph, the diaries were a valuable source of qualitative data for teachers regarding student satisfaction. This data was also compared with informal group leader feedback regarding excursions, i.e. if students wrote in their diaries that they had enjoyed, or not enjoyed, an excursion, this information was compared with what group leaders had heard from students. It was not possible within this context to introduce more reliable research instruments such as structured interviews with students or detailed questionnaires. This formative data was used as the basis for making changes/adjustments, where necessary, to lesson plans and excursion itineraries. This part of the evaluation was possibly responsive to the ‘apparent’ real needs of the students; it also attempted to address needs/concerns as they arose. At the end of the course, teachers were also asked to comment summatively (i.e. give a general impression) on: (1) students’ speaking/pronunciation (i.e. had it improved?); (2) grammar and functional language practice (i.e. what had they noticed? Were students more fluent/accurate?); (3) whether students’ vocabulary had expanded.


The end-of-course questionnaires were used to assess student/customer satisfaction levels summatively; there were two questionnaires, an Accent questionnaire (in English) and the Italian group-leader end-of-course questionnaire (in Italian). Neither the Accent nor the Italian questionnaire elicited data regarding whether students felt they had expanded their vocabulary or practised grammar, functional English and speaking. The questions were more general. The Italian questionnaire used a ranked scale from 1 (min) to 4 (max) for a number of programme-related questions. Two close-ended questions pertained to the teaching programme, i.e. (1) ‘course content (rate from 1-4)’ and (2) ‘rate your teacher from 1-4’ (NB there was also an ‘open-ended’ opportunity to comment on aspects of the teaching programme). The Accent questionnaire asked students several programme-related questions; one question related to the teaching programme. Students were asked to choose one of four responses, i.e. the programme was: (1) ‘☹ Not very useful. I feel my English has not improved during the course’; (2) ‘😐 OK, I feel that my English has improved a little’; (3) ‘☺ Useful, I have made progress with my English and feel more confident speaking the language’; (4) ‘☺☺ Very useful. My English has improved a lot during the course’.



Appendix four: Validity and reliability



It was seen as important to explain what was going to be measured and to ensure that data collection procedures provided the data necessary for this purpose. However, a fundamental weakness of the evaluation design described above was the anecdotal/informal nature of the formative data elicited from students by teachers, group leaders and the director. Would students really say what they thought? Would students say the same things to different people? Would students of this age group be a reliable source of data? Would the place in which the students filled in the questionnaires affect the reliability of the data (i.e. the Accent questionnaires were given to students during class time, whereas the Italian group leader questionnaires were given to students on their buses during excursions)? In this context, it was not possible to research the above questions.


The 1:12 supervision ratio of Italian group leaders to students meant the group leaders had a lot of contact with the students. Italian group leaders were seen as a good source of anecdotal/informal data, as it was thought that students were more likely to mention lesson-related issues they were unhappy about to them, rather than to their teachers. The director regularly asked Accent teachers/group leaders informally what kind of feedback they had been getting from students. Even though student diaries provided teachers with some qualitative feedback regarding excursions, there was no way of being sure, within this context, whether students had expressed what they really thought. The end-of-course (summative) general impressions feedback from teachers appertaining to the school objectives (discussed in appendix three) might also have been unreliable, because negative feedback regarding student improvement might have reflected badly on the teachers themselves.


The Accent end-of-course questionnaire did not elicit data regarding school objectives 2 and 3, i.e. students were only asked whether they thought they had improved or not. This was felt to be a subjective question; after all, feeling you have improved does not necessarily mean you have improved (and vice versa). The Accent questionnaire used happy and sad faces (☹ ☺) instead of a ranked scale, which may have been easier to interpret. The Accent questionnaire vocabulary, however, was quite advanced and some of it had to be pre-taught. A possible weakness of the Accent questionnaire was that it was not anonymous, i.e. would student feedback (especially feedback regarding the teaching programme) really be reliable? The Italian group-leader questionnaire was in Italian; it was partly anonymous (i.e. students had to write in a contact address but no name) and it used a ranked scale (NB it is possible that the ranked scale may have been misunderstood by some of the students).


All the non-lesson activities were conducted in Italian in a closed environment supervised by the Italian group leaders. Students therefore had practically no other contact with native speakers of English (NB students did not have TVs or radios in their rooms, though there were some opportunities to use English when the students went shopping during excursions). It was therefore likely that any improvement in English during this two-week period might have mainly been due to the Accent programme. The sample comprised the entire cohort. There was no time in this setting to carry out formal structured interviews with staff/Italian group leaders (NB very few of the Italian group leaders could communicate fluently in English).



Appendix five: Data analyses



Accent questionnaire findings



179 completed questionnaires were received and analysed. 166 students responded to question one, i.e. how they felt about the usefulness of their lessons (see appendix three). There were four possible responses, from response one (i.e. not very useful) to response four (very useful, my English has improved a lot). The average response was 3.3, i.e. between ‘useful’ and ‘very useful’. With regard to other feedback on this questionnaire, e.g. question 8 (‘When you take everything into consideration, the college, the course, the excursion, the staff, the social programme: did you enjoy yourself? Rate 1 (not at all) to 5 (enormously)’), the average for the 170 student responses was 4.1. Question 4 pertained to whether students had enjoyed their excursions (i.e. rated from 1 ‘not enjoyable’ to 4 ‘excellent’). The average for the 178 student responses was 3.48. However, 24% of the students stated that the Oxford excursion was boring, 12% noted that the Museum of Archaeology in Reading was the worst part of their visit, and 15% disliked visiting Winchester. 83% of the students enjoyed visiting London most. All the students were asked to rate the food from 1 (poor) to 4 (excellent); the average for 179 students was 1.75, i.e. between poor and OK. Approximately 15% of students stated that the arrival day was very stressful because they did not know anyone.



Italian questionnaire findings



The Italian course co-ordinator made all 181 questionnaires available for analysis. All 181 students rated course content on a scale from 1 (min) to 4 (max); the average was 3.12. Accent teachers were also rated from 1 to 4; all 181 students responded to this question and the average was 3.5. Excursions were also ranked from 1 to 4; all 181 students responded to this question and the average was 3.27. 160 students rated the food between 1 and 2 (i.e. out of 4). None of the students wrote anything for the open-ended question. There were no significant disparities between the questionnaires, i.e. things that were ranked high or low on the Italian questionnaire were similarly ranked on the Accent questionnaire.




Appendix six: Additional recommendations based on overall course observations



Most of the findings below are developmental in nature.



(1)   Even though the welfare of the students was of primary importance to the Italian tour operator, it is argued that students might use more English if all non-lesson activities took place in a partly or fully English-speaking environment. It is therefore suggested that group leaders speak English during such activities or that some native speakers take part in sports/excursion-related activities. This finding came about as a result of analysing validity and reliability (NB see appendix four).


(2)   It is recommended that the placement test be changed, as there were several ambiguous items (e.g. Nelson Tests). Cut-off points should be given, and if possible the test rubrics should be written in Italian and English. With regard to misplacement, it is recommended that group sizes be smaller (e.g. max 12), thus making it easier to move students to different groups. This issue was discussed in section 2.5.


(3)   As most students complained in their questionnaires about the type of food that was served in the College canteen (NB this could affect overall course satisfaction), it might be worth finding out what Italian students would prefer to eat. The College was willing to make some changes to its canteen menu and was keen to find out what the students wanted to eat. This finding was discussed in appendix four (i.e. the Accent and Italian questionnaires); it may indirectly appertain to Accent objective one.


(4)   The Accent questionnaire rubrics could be accompanied by an Italian translation for lower-level learners, and the questionnaire should be anonymous. It is also suggested that some questions should refer to Accent objectives 2 and 3, e.g. ‘do you think your speaking, grammar, or vocabulary has improved?’. In this way more reliable data might be elicited regarding school objectives. This finding was discussed in appendix four (i.e. validity and reliability).


(5)   Some excursions were not seen as ‘enjoyable’: (a) the Brasenose College/Oxford excursion may need re-planning, i.e. possibly a full day in Oxford with more sightseeing; (b) the Winchester Cathedral and Reading Museums excursions (especially the Museum of Greek Archaeology) might be made more relevant to student interests. Also, it might be worth considering other locations, e.g. Brighton, Cambridge, Windsor, Cotswold Wildlife Park, or half-day London excursions. It is also recommended that a more flexible coach-hire arrangement be put in place that can guarantee ‘more’ air-conditioned coaches in hot weather. These observations were elicited from both questionnaires and are pertinent mainly to Accent school objective one.



Total word count (7545)