Validity and Reliability of the Two-tier Diagnostic Test to Identify Students’ Alternative Conceptions of Intermolecular Forces

Abstract: Alternative conceptions, or misconceptions, are believed to be the main barrier for most students learning chemistry. Students who hold alternative conceptions struggle to understand more advanced concepts, so identifying those conceptions is important. In this study, a diagnostic test was designed and developed to identify students’ alternative conceptions about intermolecular forces (IMFs). The test consists of 19 multiple-choice questions, each with an open-ended reason. It was administered to 88 university students studying chemistry, and the data collected were analyzed using SPSS. The validity and reliability of the test were 0.526 (mean Pearson R-value) and 0.878 (Cronbach's Alpha), respectively, indicating that the test is valid and reliable. The mean difficulty index and discriminatory index are 0.623 and 0.45, respectively, showing that the test is of moderate difficulty and that its discriminatory power is in the good category. The average distractor effectiveness is 3 functioning options (out of 4), which places the distractors in the good category. These findings indicate that the two-tier diagnostic test can be used to identify students’ alternative conceptions and understanding of IMFs.


Introduction
Alternative conceptions, better known as misconceptions, are believed to be a main issue in the teaching and learning of chemistry. Alternative conceptions are also known by other terms, such as misconceptions, alternate conceptions, alternative frameworks, and preconceptions, but in this study the authors prefer the term alternative conceptions to describe students' understanding. According to Cho et al. (1985), an alternative conception is "any conceptual idea whose meaning deviates from the one commonly accepted by scientific consensus". By its nature, chemistry relies on a number of models built on abstract concepts (e.g. models of atomic and molecular structure). These models are fundamental to understanding the more complex topics that students encounter in the later stages of their education. A means of identifying and resolving misconceptions in fundamental topics like these is therefore an important consideration when designing a chemistry curriculum. In addition to the difficulty of interpreting representations, Keig & Rubba (1993) found that the majority of students struggled to translate among formulae, electron configurations, and ball-and-stick models. Nakhleh (1992) proposed a possible explanation: many students do not construct an adequate understanding of fundamental chemical concepts from the beginning of their studies, and thus cannot fully understand the more advanced concepts that build upon them. This is also supported by the work of Taber (2009). A number of studies have shown that misconceptions occur in the teaching of many fundamental topics in chemistry (Nicoll, 2001; Taber, 2003). In addition, Taber (2009) suggested that students' prior educational experience is the primary contributory factor to misconceptions.
Understanding the physical basis of intermolecular forces (IMFs), the principles required to describe them, and their consequences is an essential element of core chemistry education (Cooper et al., 2015; Kind, 2004; Tarhan et al., 2008). An understanding of intermolecular forces helps students predict a number of physical properties of substances, such as relative boiling points and changes in states of matter (Schmidt et al., 2009), as well as whether a given solute will be soluble in a particular type of solvent. However, many students find these concepts difficult to understand (Birk & Kurtz, 1999; Ma'rufah et al., 2022; Tan & Chan, 2003; Widarti et al., 2019). As a result, students develop a wide range of alternative conceptions (Coll & Taylor, 2001). Tan & Chan (2003) found that some students at grade 5 and 6 (age 16-17 years and 17-18 years) had difficulty understanding the nature of hydrogen bonding and dipole-dipole interactions, and that some found it hard to describe the intermolecular forces involved. Students also found it hard to differentiate between intramolecular and intermolecular forces (Vladušić et al., 2016).
Identifying students' alternative conceptions is crucial to improving the teaching and learning process (Sadler & Sonnert, 2016). When students have misconceptions, they may have difficulty understanding new information or applying what they have learned to solve problems. Addressing students' alternative conceptions helps teachers create more effective learning experiences that develop an accurate and complete understanding of the concepts.
In this study, the diagnostic test developed is a two-tier multiple-choice test. The multiple-choice questions in a two-tier test are divided into items with two sub-questions (tiers) (Ivanjek et al., 2021). Moreover, Ivanjek et al. (2021) noted that two-tier instruments have the benefits of being simple to use and of giving insight into students' thought processes.
The first tier of the diagnostic test is a multiple-choice question, while the second tier is an open-ended question in which students state the reason for their first-tier answer. Thus, students were asked to choose an answer in the first tier and give their reason in the second tier. In this way, students' alternative conceptions are revealed (Treagust, 1988), and it can be determined whether the misconceptions that students hold relate to their previous alternative conceptions (Loh et al., 2014; Mann & Treagust, 1998; Uyulgan et al., 2014). The explanations provided by students are crucial for teaching scientific concepts. In two-tier assessments, students have the chance to choose an answer and its justification, and teachers can discover the causes of the students' misconceptions (Cengiz, 2009). In this article, we present the development of a two-tier diagnostic test on intermolecular forces and assess the quality of the test by item analysis, consisting of validity, reliability, item difficulty, discriminatory index, and distractor information.

Participants
A total of 88 students from 3 different cohorts of the Chemistry Education Study Program, Department of Science and Mathematics Education, Universitas Tanjungpura, voluntarily participated in this study. The number of participants in each cohort is shown in Table 1.

Methods and Procedures
This study employed a quantitative method to answer the research questions. All data collected were analyzed quantitatively using IBM SPSS 27 to measure the quality of the diagnostic test developed. The procedure for developing the diagnostic test was adapted, with some modifications, from the procedure proposed by Peterson, Treagust, & Garnett (1989), who proposed 3 steps: initial testing, a paper-and-pencil test, and test development validation. In this study, the researchers added two further steps, so the procedure consists of:
1) Reviewing the concepts. The chemistry curriculum and relevant studies on the topic of intermolecular forces are reviewed to extract the important concepts to be tested. This is the first step in preparing the initial test.
2) Initial testing. A set of questions based on the first step is administered. The questions may be open questions or multiple-choice questions with an open reason, and are used to gather students' initial understanding.
3) Developing the prototype of the diagnostic test. Based on students' answers in the previous step, the prototype questions are designed and developed to address all the possible alternative conceptions found in the second step.
4) Validating the test. The prototype is content validated by experts and empirically validated with participants, to analyze its quality in terms of validity, reliability, difficulty index, discriminatory index, and distractor effectiveness.
5) Finalizing the diagnostic test. The instrument is revised based on the results of the empirical validation in the previous step, giving the final product: a two-tier test instrument that is feasible to use to identify students' alternative conceptions.

Development Process of the Diagnostic Test

Validity
Validation of the diagnostic test developed in this study was of two types: content validity, assessed by experts (experienced lecturers who teach chemistry), and empirical validity, assessed by administering the test to students who had already learned the topic. Both are intended to measure the quality of the diagnostic test. The results of the content validation were analyzed using the experts' judgement score suggested by Gregory (2015). The score ranges from 0 to 1 (see Table 3). It is obtained by cross-tabulating the ratings of the two experts in a contingency table, in which the ratings "not relevant" and "less relevant" are merged into a weak relevance category and the ratings "quite relevant" and "very relevant" are merged into a strong relevance category. The experts' judgement score for content validity is the ratio of the number of items rated as strongly relevant by both experts to the total number of items (Gregory, 2015). The results of the relevancy tabulation (contingency table) are presented in Table 2, and the validity coefficient is presented in Formula 1. For the empirical validity, students' scores were analyzed using SPSS. The R-value of the Pearson correlation obtained from the SPSS analysis was compared with the R-table value to conclude whether each item is valid or not.
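The content validity coefficient described above can be illustrated with a short sketch. It assumes the usual reading of Gregory's 2x2 cross-tabulation, in which the coefficient is the number of items that both experts rate as strongly relevant divided by the total number of items. The function name, the 1 to 4 coding of the relevance ratings, and the sample ratings are hypothetical and are not taken from the study.

```python
def gregory_content_validity(expert1, expert2, strong_from=3):
    """Content validity coefficient from two experts' relevance ratings.

    Ratings are assumed to be coded 1-4 (1 = not relevant, 2 = less relevant,
    3 = quite relevant, 4 = very relevant). Ratings >= strong_from count as
    "strong relevance"; everything else counts as "weak relevance".
    """
    if len(expert1) != len(expert2) or not expert1:
        raise ValueError("both experts must rate the same, non-empty item set")

    both_weak = only_first_strong = only_second_strong = both_strong = 0
    for r1, r2 in zip(expert1, expert2):
        s1, s2 = r1 >= strong_from, r2 >= strong_from
        if s1 and s2:
            both_strong += 1
        elif s1:
            only_first_strong += 1
        elif s2:
            only_second_strong += 1
        else:
            both_weak += 1

    total = both_weak + only_first_strong + only_second_strong + both_strong
    # Coefficient = items rated strongly relevant by BOTH experts / all items.
    return both_strong / total


# Hypothetical ratings for a 4-item instrument (not the study's data).
print(gregory_content_validity([4, 3, 2, 4], [3, 4, 4, 1]))
```

A coefficient of 1 is obtained only when every item is rated as strongly relevant by both experts.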

Reliability
The internal consistency method was used in the reliability study of the diagnostic test. In this method, each item included in the test is analyzed after the diagnostic test has been administered. The Cronbach's Alpha coefficient, which indicates the extent to which the items are consistent with each other, is then calculated. If the coefficient is higher than 0.60, the test is reliable (Ghozali, 2016). The item analysis was carried out using IBM SPSS 27. In the item analysis, the difficulty index, discrimination index, and distractor effectiveness were analyzed separately, in addition to the reliability of the diagnostic test.
The categories of the difficulty index are presented in Table 4, and the categories of the discrimination index and distractor effectiveness are presented in Table 5 (Arifin, 2016; Sudijono, 2020). The analysis was done using Microsoft Excel. The findings were then compared with the categories in Tables 4 and 5.
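As an illustration of the internal consistency calculation, the sketch below computes Cronbach's Alpha from a students-by-items score matrix using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It is not the SPSS procedure used in the study; the function name and the small score matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's Alpha for a list of rows (one row per student, one column per item)."""
    n_students = len(scores)
    n_items = len(scores[0])
    if n_students < 2 or n_items < 2:
        raise ValueError("need at least 2 students and 2 items")

    # Sample variance of each item across students.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of the students' total scores.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical dichotomous scores for 4 students on 3 items (not the study's data).
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(example), 3))  # 0.75
```

The resulting coefficient would then be compared with the 0.60 threshold cited from Ghozali (2016).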
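The two item indexes can be sketched as follows. The difficulty index is taken here as the proportion of students answering an item correctly, and the discrimination index as the difference between the numbers of correct answers in the upper and lower scoring groups divided by the group size. The 27% group size is a common convention and is an assumption, since the text does not state how the groups were formed; the function names and data are hypothetical.

```python
def difficulty_index(item_scores):
    """Proportion of students who answered the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower group discrimination index.

    fraction is the share of students placed in each of the upper and lower
    groups (0.27 is an assumed, commonly used convention).
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Student indices ordered from highest to lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:group_size], order[-group_size:]
    correct_upper = sum(item_scores[i] for i in upper)
    correct_lower = sum(item_scores[i] for i in lower)
    return (correct_upper - correct_lower) / group_size


# Hypothetical data for one item answered by 10 students (not the study's data).
item = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(difficulty_index(item))                        # 0.5
print(round(discrimination_index(item, totals), 2))  # 0.67
```

Each value would then be classified using the categories in Tables 4 and 5.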

Result and Discussion
The quality of the diagnostic test
Based on the review of the chemistry curriculum and relevant published papers, the IMF concepts covered by the diagnostic test are intermolecular forces, London forces, dipole-dipole forces, and hydrogen bonds. Nine question indicators were included based on those concepts. Details of the concepts covered in the diagnostic test, along with the indicators, are presented in Table 6. Note: item no. 14 (marked * in Table 6) is not valid.

The validity

Content validity
Based on the analysis of the expert judgements, the content validity value is 1, which is categorized as very good (see Table 3). The result indicates that the content validity of the diagnostic test is high. Therefore, the test could proceed to further analysis, namely empirical validity and reliability.

Empirical validity
The validity of the diagnostic test was calculated using IBM SPSS 27. The average R-value is 0.526, while the R-table value is 0.209, which means that the test is categorized as valid. The R-value of each item is tabulated in Table 7. Among the items, item no. 14 has the lowest R-value, -0.080, which is below the R-table value, meaning that the item is not valid. Because of its lack of distinctiveness and usability, item no. 14 was removed from the diagnostic test.
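The item-level decision rule described above can be sketched as follows: each item's scores are correlated with the students' total scores, and the item is flagged as valid when its Pearson R-value exceeds the R-table value. This is a minimal illustration, assuming an uncorrected item-total correlation; it is not the SPSS output of the study, and the function names, the score matrix, and the threshold used in the example are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def item_validity(scores, r_table):
    """Return (item number, R-value, is_valid) for each item.

    scores: one row per student, one column per item.
    r_table: critical R-value, e.g. 0.209 as reported in the text for 88 students.
    """
    totals = [sum(row) for row in scores]
    results = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        r = pearson_r(item, totals)
        results.append((j + 1, r, r > r_table))
    return results


# Hypothetical scores for 4 students on 3 items; 0.8 is an arbitrary demo threshold.
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
for number, r, valid in item_validity(example, r_table=0.8):
    print(number, round(r, 3), valid)
```

With the study's data, an item such as no. 14, whose R-value of -0.080 is below 0.209, would be flagged as not valid by this rule.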

The reliability
Based on the data analysis from SPSS, the Cronbach's Alpha coefficient is 0.878 (see Table 8). According to Ghozali (2016), a Cronbach's Alpha coefficient > 0.60 is considered very reliable. The result indicates that the diagnostic test is internally consistent (Eisingerich & Rubera, 2010). In addition, this result means that the diagnostic test can be used to measure students' understanding and alternative conceptions about IMFs.
Based on the item analysis of students' scores on the diagnostic test, the difficulty and discriminatory indexes are tabulated in Table 9. As can be seen in Table 9, the difficulty index ranges from 0.39 to 0.82, that is, from moderate to easy (see Table 4). Easy and moderate items make up 21% and 79% of the test, respectively (see Figure 1). Meanwhile, the discriminatory index ranges from 0.14 to 0.68, covering the less, satisfactory, and good categories. Details of each category are presented in Figure 2. The average difficulty and discriminatory indexes are 0.623 and 0.45, respectively. Both indexes show that the test is of moderate difficulty and that its discriminatory power is in the good category.
The last analysis is the distractor effectiveness. The percentages of items in each category are 42% (excellent), 16% (good), 37% (acceptable), and 5% (remediable) (see Figure 3). The "excellent" category means that all 4 options in the item function well, "good" means that 3 options (out of 4) function well, "acceptable" means that only 2 options function well, and "remediable" means that only 1 option functions well. Although 1 item (out of 19) is in the remediable category, the average number of well-functioning options for the remaining items (18 out of 19) is 3, which means the distractors are in the good category. The number of items in each category is presented in Table 10.
These findings imply that the diagnostic test is feasible to use to identify students' understanding and alternative conceptions of IMFs.
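The distractor categories reported above can be illustrated with a short sketch that counts how many of an item's 4 options function well and maps that count to a category. The text does not state the criterion for a well-functioning option, so the rule used here (an option functions if at least 5% of students choose it) is an assumption, as are the function names and the response data.

```python
from collections import Counter

# Category labels as described in the text: number of functioning options -> label.
CATEGORY = {4: "excellent", 3: "good", 2: "acceptable", 1: "remediable"}


def functioning_options(responses, options=("A", "B", "C", "D"), min_share=0.05):
    """Options chosen by at least min_share of students (assumed criterion)."""
    if not responses:
        raise ValueError("no responses given")
    counts = Counter(responses)
    n = len(responses)
    return [opt for opt in options if counts[opt] / n >= min_share]


def distractor_category(responses, options=("A", "B", "C", "D"), min_share=0.05):
    """Map the number of functioning options to the category used in the text."""
    k = len(functioning_options(responses, options, min_share))
    return CATEGORY.get(k, "uncategorized")


# Hypothetical responses of 100 students to one item (not the study's data).
answers = ["A"] * 50 + ["B"] * 30 + ["C"] * 17 + ["D"] * 3
print(functioning_options(answers))  # ['A', 'B', 'C']
print(distractor_category(answers))  # good
```

Changing min_share changes which options count as functioning, so any real analysis should use the criterion defined in Table 5.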

Conclusion
The feasibility of the diagnostic test was measured by item analysis, consisting of validity, reliability, difficulty index, discriminatory index, and distractor effectiveness. The validity and reliability of the test were 0.526 and 0.878, respectively, which implies that the test is valid and reliable. The mean difficulty index and discriminatory index were 0.623 and 0.45, respectively. The item analysis showed that the test is of moderate difficulty and that its discriminatory power is in the good category. The average distractor effectiveness for the diagnostic test is 3 out of 4 options, which means the distractors are in the good category and function well. Thus, it can be concluded that the two-tier diagnostic test can be used to identify students' alternative conceptions and understanding of intermolecular forces.

Acknowledgments
We would like to thank Universitas Tanjungpura for funding the research and all the students who voluntarily participated in this study.

Author Contributions
The main author contributed to designing the research, conducting the research, and writing the article. The second author guided the research through to the writing of the article. The third author assisted in the implementation of the research and in preparing the research instruments used in data collection. The fourth author assisted in the data collection process. All authors have read and agreed to the published version of the manuscript.