



  • Research article
  • Open Access

Identification of research hypotheses and new knowledge from scientific literature


Abstract

Background

Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author's intended knowledge gain) and New Knowledge (an author's findings). The method incorporates various features, including a combination of simple MK dimensions.

Methods

We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated.

Results

We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method is able to improve upon the previously reported state of the art performance for an existing dimension, i.e., Knowledge Type. Secondly, we also demonstrate high F1-scores in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR: 0.802) and New Knowledge (GENIA: 0.829, EU-ADR: 0.836).

Conclusion

We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications.


Background

The goal of information extraction (IE) is to automatically distil and structure associations from unstructured text, with the aim of making it easier to locate information of interest in huge volumes of text. Within biomedical research articles, the textual context of a particular piece of knowledge often provides clues as to its current status along the 'research journey' timeline. Sentences (1)–(3) below exemplify a number of different points along the research timeline regarding the establishment of an association between Interleukin-17 (IL-17) and psoriasis. The association is firstly introduced in (1) as a hypothesis to be investigated. In (2), which is taken from the same paper [1], the putative association is backed up by initial experimental evidence. Sentence (3) comes from a paper published ten years later [2], by which time the association is presented as widely accepted knowledge, presumably on the basis of many further positive experimental results.

(1) 'To investigate the role of Interleukin-17 (IL-17) in the pathogenesis of psoriasis...'

(2) 'These findings indicate that up-regulated expression of IL-17 might be involved in the pathogenesis of psoriasis.'

(3) 'IL-17 is a critical factor in the pathogenesis of psoriasis and other inflammatory diseases.'

There is a strong need to identify different types of emerging knowledge, such as those shown in sentences (1–2), in a number of different scenarios. It has been shown elsewhere that incorporating this type of information improves the automatic curation of biomedical networks and models [3].

In processing sentences (1)–(3) above, a typical IE system would firstly detect that Interleukin-17 and IL-17 are phrases that describe the same gene concept and that psoriasis represents a disease concept. Subsequently, the system would recognise that a specific association exists between these concepts. These associations may be binary relations between concepts, which encode that a specific type of association exists, or they may be events, which encode complex n-ary relations between a trigger word and multiple concepts or other events. Figure 1 shows the specific characteristics of both a relation and an event using the visualisation of the brat rapid annotation tool [4]. The output of the IE system would allow the location of all sentences within a large document collection, regardless of their varied phrasing, that explicitly mention the same association, or those mentioning other related types of associations, e.g., to find different genes that have an association with psoriasis. The structured associations that are extracted may subsequently be used as input to further stages of reasoning or data mining. Many IE systems would consider that sentences (1)–(3) each convey exactly the same information, since most such systems only take into account the key information and not the wider context. Recently, however, there has been a trend towards detecting diverse aspects of contextual/interpretative information (such as negation or speculation) automatically [5–8].

Fig. 1

An example of two sentences, one containing events and the other containing one relation. The first sentence shows two events. The first event in the sentence concerns the term 'activation', which is a type of positive regulation. The theme of this event is 'NF-kappaB', indicating that this protein is being activated. The next event in the sentence is centred around 'dependent', which is a type of positive regulation. This event has the cause 'oxidative stress' and its theme is the first event in the sentence. The example of a relation between two entities is, in contrast to the event, clearly much more simple. The relation indicates that NPTN is related to Schizophrenia in a relation that can be categorised as 'Target-Disorder'


In this work, we focus on the automatic assignment of two interpretative dimensions to relations and events extracted by text mining tools. Specifically, we aim to determine whether or not each relation and event corresponds to a Research Hypothesis, as in sentence (1), or to New Knowledge, as in sentence (2). To the best of our knowledge, this work represents the first effort to apply a supervised approach to detect this type of information at such a fine-grained level.

We envisage that the recognition of these two interpretative dimensions is valuable in tasks where the discovery of emerging knowledge is important. To demonstrate the utility and portability of our method, we show that it can be used to enrich instances of both events and relations.

Related work

The task of automatically classifying knowledge contained within scientific literature according to its intended interpretation has long been recognised as an important step towards helping researchers to make sense of the information reported, and to allow important details to be located in an efficient manner. Previous work, focussing either on general scientific text or biomedical text, has aimed to assign interpretative information to continuous textual units, varying in granularity from segments of sentences to complete paragraphs, but most frequently concerning complete sentences. Specific aspects of interpretation addressed have included negation [5], speculation [6–8], general information content/rhetorical intent, e.g., background, methods, results, insights, etc. [9–12] and the distinction between novel information and background knowledge [13, 14].

Despite the demonstrated utility of approaches such as the above, performing such classifications at the level of continuous text spans is not straightforward. For example, a single sentence or clause can introduce multiple types of information (e.g., several interactions or associations), each of which may have a different interpretation, in terms of speculation, negation, research novelty, etc. As can be seen from Fig. 1, events and relations can structure and categorise the potentially complex information that is described in a continuous text span. Following on from the successful development of IE systems that are able to extract both gene-disease relations [15–17] and biomolecular events [18, 19], there has been a growing interest in the task of assigning interpretative information to relations and events. However, given that a single sentence may contain multiple events or relations, the challenge is to determine whether and how the interpretation of each of these structures is affected by the presence of particular words or phrases in the sentence that denote negation or speculation, etc.

IE systems are typically developed by applying supervised or semi-supervised methods to annotated corpora marked up with relations and events. There have been several efforts to manually enrich corpora with interpretative information, such that it is possible to train models to determine automatically how particular types of contextual information in a sentence affect the interpretation of different events and relations. Most work on enriching relations and events has been focussed on one or two specific aspects of interpretation (e.g., negation [20, 21] and/or speculation [22, 23]). Subsequent work has shown that these types of information can be detected automatically [24, 25].

In contrast, work on Meta-Knowledge (MK) captures a wider range of contextual information, integrating and building upon diverse aspects of the above-mentioned schemes to create a number of separate 'dimensions' of information, which are aimed at capturing subtle differences in the interpretation of relations and events. Domain-specific versions of the MK scheme have been created to enrich complex event structures in two different domain corpora, i.e., the ACE-MK corpus [26], which enriches the general domain news-related events of the ACE2005 corpus [27], and the GENIA-MK corpus [28], which adds MK to the biomolecular interactions captured as events in the GENIA event corpus [22]. Recent work has focussed on the detection of uncertainty around events in the GENIA-MK corpus. Uncertainty was detected using a hybrid approach of rules and machine learning. The authors were able to show that incorporating uncertainty into a pathway modelling task led to an improvement in curator performance [3].

The GENIA-MK annotation scheme defines five distinct core dimensions of MK for events, each of which has a number of possible values, as shown in Fig. 2:

  1. Knowledge Type, which categorises the knowledge that the author wishes to express into one of: Observation, Investigation, Analysis, Method, Fact or Other.

    Fig. 2

    The GENIA-MK annotation scheme. There are five Meta-Knowledge dimensions introduced by Thompson et al., as well as two further hyperdimensions

  2. Knowledge Source, which encodes whether the author presents the knowledge as part of their own work (Current), or whether it is referring to previous work (Other).

  3. Polarity, which is set to Positive if the event took place, and to Negative if it is negated, i.e., it did not take place.

  4. Manner, which denotes the event's intensity, i.e., High, Low or Neutral.

  5. Certainty Level or Uncertainty, which indicates how certain an event is. It may be certain (L3), probable (L2) or possible (L1).

These five dimensions are considered to be independent of one another, in that the value of one dimension does not affect the value of any other dimension. There may, however, be emergent correlations between the dimensions (i.e., an event with the MK value 'Knowledge Source=Other' is more frequently negated), which occur due to the characteristics of the events. Previous work using the GENIA-MK corpus has demonstrated the feasibility of automatically recognising one or more of the MK dimensions [29–31]. In addition to the five core dimensions, Thompson et al. [28] introduced the notion of hyperdimensions (i.e., New Knowledge and Hypothesis), which represent higher level dimensions of information whose values are determined according to specific combinations of values that are assigned to different core MK dimensions. These hyperdimensions are also represented in Fig. 2. We build upon these approaches in our own work to develop novel techniques for the recognition of New Knowledge and Hypothesis, which take into account several of the core MK dimensions described above, as well as other features pertaining to the structure of the event and sentence.
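
To make the structure of these annotations concrete, the sketch below shows one possible in-memory representation of an event carrying the five core MK dimensions together with the two labels studied in this paper. It is purely illustrative: the class and field names are our own choices and are not part of the GENIA-MK distribution.

```python
# Illustrative representation of an MK-enriched event (names are our own).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MetaKnowledgeEvent:
    trigger: str                       # e.g. "activation"
    themes: List[str]                  # participating entities or sub-events
    cause: Optional[str] = None
    knowledge_type: str = "Other"      # Observation | Investigation | Analysis | Method | Fact | Other
    knowledge_source: str = "Current"  # Current | Other
    polarity: str = "Positive"         # Positive | Negative
    manner: str = "Neutral"            # High | Low | Neutral
    certainty: str = "L3"              # L3 (certain) | L2 (probable) | L1 (possible)
    research_hypothesis: bool = False  # new, independently annotated dimension
    new_knowledge: bool = False        # new, independently annotated dimension

event = MetaKnowledgeEvent(trigger="involved", themes=["IL-17", "psoriasis"],
                           knowledge_type="Analysis", certainty="L2")
print(event)
```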

Methods

Our work took as its starting point the MK hyperdimensions defined by Thompson et al. [28], since we are also interested in identifying relations and events that describe hypotheses or new knowledge. However, we found a number of issues with the original work on these hyperdimensions. Firstly, Thompson et al. [28] did not provide clear definitions of 'Hypothesis' and 'New Knowledge'. In response, we have formulated concise definitions for each of them, as shown below. Secondly, by performing an analysis of events that takes into account these definitions, we found that it was not possible to reliably and consistently identify events that describe new knowledge or hypotheses based only on the values of the core MK dimensions. As such, we decided to carry out a new annotation effort to mark up both 'Research Hypothesis' and 'New Knowledge' as independent MK dimensions (i.e., their values do not necessarily have any dependence on the values of other core MK dimensions), and to explore supervised, rather than rule-based, methods to facilitate their automatic recognition.

Annotation guidelines

The starting point for our novel annotation effort was our tightened definitions of Research Hypothesis and New Knowledge; our initial definitions were refined throughout the process of annotation. As the definitions and guidelines evolved, we asked the annotators to revisit previously annotated documents in each new round. Our final definitions are presented below:

Research Hypothesis: A relation or event is considered as a Research Hypothesis if it encompasses a statement of the authors' anticipated knowledge gain. This is shown in examples (1) and (2) in Table 1.

Table 1 Examples of sentences containing research hypotheses and new knowledge


New Knowledge: A relation or event is considered as New Knowledge if it corresponds to a novel research outcome resulting from the work the author is describing, as per examples (3) and (4) in Table 1.

Whereas the value assigned to each of the core MK dimensions of Thompson et al. is completely independent of the values assigned to the other core dimensions, our newly introduced dimensions do not maintain this independence. Rather, Research Hypothesis and New Knowledge possess the property of mutual exclusivity, as an event or relation cannot be simultaneously both a Research Hypothesis and New Knowledge. We chose to enrich two different corpora with attributes encoding Research Hypothesis and New Knowledge, i.e., a subset of the biomolecular interactions annotated as events in the GENIA-MK corpus [28], and the biomarker-relevant relations involving genes, diseases and treatments in the EU-ADR corpus [23]. Leveraging the previously-added core MK annotations in the GENIA-MK corpus, we explored how these can contribute to the accurate recognition of New Knowledge and Research Hypothesis. Specifically, we have introduced new approaches for predicting the values of the core Knowledge Type and Knowledge Source dimensions, demonstrating an improvement over the former state of the art for Knowledge Type. We subsequently use supervised methods to automatically detect New Knowledge and Research Hypothesis, incorporating the values of Knowledge Type, Knowledge Source and Uncertainty as features into the trained models.

Corpora

The GENIA-MK corpus consists of 1000 MEDLINE abstracts on the subject of transcription factors in human blood cells, which have been annotated with a range of entities and events that provide detailed, structured information about various types of biomolecular interactions that are described in text. In the GENIA-MK corpus, values for all five core MK dimensions are already manually annotated for all of the 36,000 events. The MK annotation effort also involved the identification of 'clue words', i.e., words or phrases that provide evidence for the assignment of values for particular MK dimensions. For example, the word 'propose' would be annotated as a clue both for Uncertainty and Knowledge Type, as it indicates that the information encoded in the event is stated based on a speculative analysis of results.

The EU-ADR corpus consists of three sets of 100 MEDLINE abstracts, each obtained using different PubMed queries aimed at retrieving abstracts that are likely to contain three specific types of relations (i.e., gene-disease, gene-drug and drug-disease), the former two of which can be important in discovering how different types of genetic information influence disease susceptibility and treatment response. The original annotation task involved identifying three types of entities, i.e., targets (proteins, genes and variants), diseases and drugs, together with relationships between these entity types, where these are present. In contrast to the richness of the event representations in the GENIA-MK corpus, each relation annotation in the EU-ADR corpus consists only of links between entities of two specific types. Relations were annotated in 159 of the 300 abstracts selected for inclusion in the corpus.

Annotation of new knowledge and research hypothesis

As an initial step of our work, subsets of GENIA-MK and EU-ADR were manually enriched with additional annotations, which identify those events or relations corresponding to Research Hypotheses or New Knowledge. Since high quality annotations are key to ensuring that accurate supervised models can be trained, we engaged with a number of experts and carried out an exploratory annotation exercise prior to the final annotation effort, in order to ensure the highest possible inter-annotator agreement (IAA).

Initially, we worked with two domain experts, a text mining researcher and a medical professional. They added the novel MK annotations to events that had been automatically detected in sentences from full-text papers. We found, however, that there were some issues with this annotation set-up. Firstly, we found that events denoting Research Hypotheses and New Knowledge were very sparse in full papers. Secondly, we found that isolated sentences often provided insufficient context for annotators to decide accurately whether or not the event described new knowledge or a hypothesis. Finally, we found that errors in the automatically detected events were detracting the annotators' attention from the task at hand. Based on these findings, we decided not to pursue this approach, and instead focussed our annotation efforts on annotating Research Hypotheses and New Knowledge in abstracts containing gold-standard, expert-annotated events and relations, whose quality had previously been verified. Since abstracts also generally contain denser and more consolidated statements of New Knowledge and Research Hypotheses than full papers [32], we also expected that this approach would produce more useful training data.

We then employed two PhD students (both working in disciplines related to biological sciences) to carry out the next round of annotation work. We held regular meetings to discuss new annotations and provided feedback as necessary. A subset of the abstracts was doubly annotated by both annotators, allowing us to evaluate the annotation quality by computing IAA using Cohen's Kappa [33].
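
As an illustration, Cohen's Kappa over the doubly annotated subset can be computed as in the minimal sketch below, assuming the two annotators' labels have been aligned into parallel lists; the labels shown are invented examples, and scikit-learn is used here purely for illustration rather than being the tool the annotation team necessarily used.

```python
# Minimal sketch: Cohen's Kappa over aligned annotation decisions (invented labels).
from sklearn.metrics import cohen_kappa_score

annotator_a = ["NewKnowledge", "Other", "Hypothesis", "Other", "NewKnowledge"]
annotator_b = ["NewKnowledge", "Other", "Other", "Other", "NewKnowledge"]

# Cohen's Kappa corrects observed agreement p_o for chance agreement p_e:
# kappa = (p_o - p_e) / (1 - p_e)
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's Kappa: {kappa:.3f}")
```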

Table 2, which shows IAA at three different points during the annotation process, illustrates a steady increase in IAA as time progressed and as more discussions were held, demonstrating a convergence towards a common understanding of the guidelines by the two annotators. We obtain a final agreement of above 0.8 on most dimensions, indicating a strong level of agreement [34]. Annotation of Research Hypothesis in the EU-ADR corpus achieved a slightly lower agreement of 0.761, indicating moderate agreement between the annotators [34]. At the end of the annotation process, the annotators were asked to revisit their earlier annotations to make revisions based on their enhanced understanding of the guidelines. Remaining discrepancies were resolved by the lead author after consultation with both annotators.

Table 2 Inter-annotator agreement across several rounds of corpus annotation as measured by Cohen's Kappa


Each annotator marked up 112 abstracts from the EU-ADR corpus (70 of which were doubly annotated), and 100 abstracts from the GENIA-MK corpus (50 of which were doubly annotated). This resulted in a total of 150 GENIA-MK abstracts and 159 EU-ADR abstracts annotated with New Knowledge and Research Hypothesis. Statistics on the final corpus are shown in Table 3.

Table 3 Statistics comparing our versions of the GENIA-MK and EU-ADR corpora, both annotated with new knowledge and research hypothesis labels


Baseline method for new knowledge and research hypothesis

Thompson et al. [28] suggest a method for detecting new knowledge and hypothesis based on automated inferences from core MK values. Their inferences state that an event will be an instance of new knowledge if the Knowledge Source dimension is equal to 'Current', the Uncertainty dimension is equal to 'L3' (equivalent to 'Certain' in our work, see below) and the Knowledge Type dimension is equal to either 'Observation' or 'Analysis'. Similarly, according to their inferences, an event will be an instance of Hypothesis if the Knowledge Type dimension is equal to 'Analysis' and Uncertainty is equal to either 'L2' or 'L1' (which are both equivalent to 'Uncertain' in our work, see below).

We employ these automatic inferences as a baseline for our techniques. To best reflect the work of Thompson et al. [28], we apply their manually annotated values of Knowledge Type, Uncertainty and Knowledge Source for the GENIA-MK corpus. This allows us to compare our own work with previous efforts, as well as providing a lower bound for the performance of a rule-based system, which we contrast with our supervised learning system, as introduced in the next section.
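
A minimal sketch of this baseline is given below; it encodes the inference rules of Thompson et al. [28] exactly as described above, with the core MK values read from the gold annotations. The dictionary keys are illustrative rather than taken from any released code.

```python
# Rule-based baseline: map core MK values onto the two hyperdimensions.
def baseline_labels(event):
    kt = event["knowledge_type"]     # e.g. "Observation", "Analysis", ...
    ks = event["knowledge_source"]   # "Current" or "Other"
    unc = event["uncertainty"]       # "L1", "L2" or "L3"

    new_knowledge = (ks == "Current" and unc == "L3"
                     and kt in ("Observation", "Analysis"))
    hypothesis = (kt == "Analysis" and unc in ("L1", "L2"))
    return new_knowledge, hypothesis

print(baseline_labels({"knowledge_type": "Analysis",
                       "knowledge_source": "Current",
                       "uncertainty": "L2"}))   # -> (False, True)
```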

A supervised method for extracting new knowledge and research hypothesis

We took a supervised approach to annotating events with instances of our target dimensions of New Knowledge and Research Hypothesis. Given the previously mentioned intrinsic links to the core MK dimensions of Knowledge Source, Knowledge Type and Uncertainty, we incorporated the values of these dimensions as features that are used by our classifiers.

Uncertainty

For the Uncertainty dimension, we used an existing system [3]. Adopting their treatment of Uncertainty, we differ from Thompson et al. [28] in that we use only two levels (certain and uncertain), as opposed to their three levels (L3 = certain, L2 = probable and L1 = possible). Since our development of the original MK scheme, we have experimented with and discussed different levels of granularity for this dimension with domain experts, and have concluded that the differences between the two different levels of uncertainty in our original scheme (i.e., L1 and L2) are often too subtle to be of benefit in practical scenarios. Therefore, it was decided to focus instead on the binary distinction between certainty and uncertainty.

Knowledge source

The Knowledge Source dimension distinguishes events that encode information originating from an author's own work (Knowledge Source = Current), from those describing work from an alternative source (Knowledge Source = Other). Such information is relevant to the identification of New Knowledge, as a relation or event that corresponds to information reported in background literature definitely cannot be classed as New Knowledge. Attribution by citation is a well-established practice in the scientific literature. Citations can be expressed heterogeneously between documents, but are typically expressed homogeneously within a single document, or a collection of similarly-sourced documents. We used regular expressions to identify citations following the work of Miwa et al. [35], in conjunction with a set of clue expressions that aim to detect background knowledge in cases where no citation is given. These include statements such as 'we previously showed…' or 'as seen in our former work'. Whereas Miwa et al. use a supervised learning method to detect Knowledge Source, we found that supervised learning approaches overfitted to the overwhelming majority class (Source = Current) in the GENIA-MK dataset. This meant that we suffered poor performance on unseen data, such as the EU-ADR corpus. To alleviate this, we simply used the regular expression feature as described above as an indicator of Knowledge Source being 'Other'. A list of our regular expressions and clue expressions is made available as part of the Additional files.
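
The sketch below illustrates the general shape of such a rule-based Knowledge Source detector. The regular expressions and clue expressions shown are simplified stand-ins for the full lists released in the Additional files.

```python
# Sketch of citation/clue-based Knowledge Source detection (simplified patterns).
import re

CITATION_PATTERNS = [
    re.compile(r"\[\s*\d+(?:\s*[,\-–]\s*\d+)*\s*\]"),                   # numeric, e.g. [12] or [3-5]
    re.compile(r"\(\s*[A-Z][A-Za-z\-]+(?: et al\.)?,? \d{4}\s*\)"),     # author-year, e.g. (Smith et al., 2004)
]
BACKGROUND_CLUES = [
    re.compile(r"\bwe previously (?:showed|reported|demonstrated)\b", re.I),
    re.compile(r"\bas seen in our (?:former|previous) work\b", re.I),
]

def knowledge_source(sentence: str) -> str:
    """Return 'Other' if the sentence attributes the information to prior work."""
    for pattern in CITATION_PATTERNS + BACKGROUND_CLUES:
        if pattern.search(sentence):
            return "Other"
    return "Current"

print(knowledge_source("IL-17 promotes inflammation [12]."))           # Other
print(knowledge_source("We found that IL-17 promotes inflammation.")sp if False else knowledge_source("We found that IL-17 promotes inflammation."))  # Current
```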

Knowledge type

For Knowledge Type, we used an implementation of the random forest algorithm [36] from the WEKA library [37]. We used the standard parameters of the random forest in the WEKA implementation. We used ten-fold cross validation for all experiments, and results are reported as the macro-average across the ten folds. We treat the identification of Knowledge Type as a multi-class classification problem and we took a supervised approach to categorising relations and events in the two corpora according to the values of the Knowledge Type dimension. To facilitate this, we used the following seven types of features to generate data about each event from GENIA-MK and relation from EU-ADR:

  1. Sentence features, describing the sentence containing the relation or event.

  2. Structural features, inspired by the structural differences of events.

  3. Participant features, representing the participants in the relation or event.

  4. Lexical features, capturing the presence of clue words.

  5. Constituency features, corresponding to relationships between a clue and the relation or event, based on the output of a parser.

  6. Dependency features, which capture relationships between a clue and the relation or event based on the dependency parse tree.

  7. Parse tree features, which pertain to the structure of the dependency parse tree.

These features are further described in Table 4. To generate these features, we made use of the GENIA Tagger [38] to obtain part-of-speech (POS) tags, and the Enju parser [39] to compute syntactic parse trees.
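
The following sketch approximates the resulting classification set-up. The authors use the WEKA RandomForest with default parameters and ten-fold cross validation; here we illustrate an equivalent pipeline with scikit-learn, and the feature matrix is a random placeholder standing in for the feature types listed above.

```python
# Sketch of multi-class Knowledge Type classification with 10-fold cross validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.random((500, 120))                       # placeholder feature vectors
y = rng.choice(["Observation", "Investigation", "Analysis",
                "Method", "Fact", "Other"], size=500)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_validate(clf, X, y, cv=10,
                        scoring=["precision_macro", "recall_macro", "f1_macro"])
print("macro-P %.3f  macro-R %.3f  macro-F1 %.3f" % (
    scores["test_precision_macro"].mean(),
    scores["test_recall_macro"].mean(),
    scores["test_f1_macro"].mean()))
```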

Table 4 Types of features used in training the Knowledge Type classification model


Research hypotheses and new knowledge

We followed a similar approach to predicting Research Hypothesis and New Knowledge values to that described above for the recognition of Knowledge Type. We used the same features and also a random forest classifier. We incorporated additional features encoding the Knowledge Source, Knowledge Type and Uncertainty of each relation and event.

Clue lists, developed by the authors, were used for the detection of Knowledge Type, Knowledge Source and Uncertainty. For the detection of New Knowledge and Hypothesis, a combination of clues for Knowledge Type, Knowledge Source and Uncertainty was used. The exact clue lists are available in the Additional files.
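
A minimal sketch of how the core MK values can be appended to the base feature vector for Research Hypothesis / New Knowledge classification is shown below; the encodings and names are illustrative and not taken from the released feature lists.

```python
# Sketch: augment a base feature vector with one-hot Knowledge Type plus binary
# Knowledge Source and Uncertainty indicators.
KT_VALUES = ["Observation", "Investigation", "Analysis", "Method", "Fact", "Other"]

def mk_features(knowledge_type, knowledge_source, uncertain):
    features = [1.0 if knowledge_type == v else 0.0 for v in KT_VALUES]
    features.append(1.0 if knowledge_source == "Other" else 0.0)
    features.append(1.0 if uncertain else 0.0)
    return features

base_vector = [0.2, 0.0, 1.0]   # stand-in for the lexical/structural features
full_vector = base_vector + mk_features("Analysis", "Current", uncertain=True)
print(full_vector)
```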

Results

In this section, we present our experiments to detect the core Knowledge Type dimension, in which we determine the most appropriate feature subset to use, and also compare our approach to previous work. We then extend this approach to recognise New Knowledge and Research Hypothesis, and evaluate our results in terms of precision (Footnote 1), recall (Footnote 2) and F1-score (Footnote 3).
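
In terms of true positives (TP), false positives (FP) and false negatives (FN), these standard measures are defined as:

P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot P \cdot R}{P + R}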

Our experiments to predict the correct values for the Knowledge Type dimension were carried out only using the events in the GENIA-MK corpus, given that Knowledge Type is only annotated in this corpus and not in EU-ADR. We performed an analysis of each feature subset to assess its impact on classifier performance, as shown in Table 5. It was established that removing each of the participant, dependency and parse tree features individually leads to a small increase in F1-score. However, in subsequent experiments, we found that removing all three features does not lead to an additional increase in performance. We therefore used all feature subsets except for the participant features in subsequent experiments, as this gave us the best overall score. By observing the isolated performance of each feature subset, we also determined that the lexical and structural features are both significant individual contributors to the final classification score. In Table 6, we compare the performance of our classifier in predicting each Knowledge Type value with the results obtained by the state-of-the-art method developed by Miwa et al. [31]. The results reveal that our approach achieves an increase in F1-score over Miwa et al. [31] by a minimum of 0.063 for the Other value, and a maximum of 0.113 for Method. We also see corresponding performance boosts in terms of precision and recall. Although we observe a small drop in recall for Fact and Method, this is offset by an increase in precision of 0.210 and 0.299, respectively.

Table 5 Effects of each feature subset on the final classification performance for Knowledge Type


Table 6 A comparison of the Knowledge Type results produced by our classifier against the results of the most directly comparable work


To further investigate our improvement over Miwa et al., we swapped our classifier for an SVM, but used all the same features. The results of this are shown in Table 6. This experiment allowed us to compare the performance of our features with the same classification algorithm (SVM) as used by Miwa et al. We note that using the SVM with our features leads to a similar, but slightly worse performance in terms of F1-score than Miwa et al. on all categories except for Analysis. However, we do note an increase in Precision for certain categories (Method, Investigation, Analysis) and Recall for others (Observation, Analysis). As our features are tuned for performance with a Random Forest, this experiment demonstrates that different types of classifiers may require different feature sets to attain optimal performance.

To further understand the impact of our feature categories, we analysed the correlation of each feature with each Knowledge Type value. This allowed us to determine the most informative features for each Knowledge Type value, as displayed in Table 7. In addition to this, we calculated the average rank of each feature across all Knowledge Type values. This measure shows us the most globally useful features. The top features according to average rank are displayed in Table 8.
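
The sketch below illustrates one way of carrying out this kind of analysis: correlating each feature with a one-vs-rest indicator for every Knowledge Type value, then averaging the per-class ranks to surface globally useful features. The data shapes are placeholders and Pearson correlation is used purely for illustration; the paper does not prescribe a specific correlation measure here.

```python
# Sketch: per-class feature correlation and average rank across classes.
import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 40)).astype(float)    # 40 binary placeholder features
y = rng.choice(["Observation", "Analysis", "Method"], size=500)

ranks = []
for label in np.unique(y):
    indicator = (y == label).astype(float)
    # Correlation between each feature column and the one-vs-rest class indicator
    corr = np.array([abs(np.corrcoef(X[:, j], indicator)[0, 1]) for j in range(X.shape[1])])
    order = corr.argsort()[::-1]                         # most correlated first
    rank_of_feature = np.empty_like(order)
    rank_of_feature[order] = np.arange(1, len(order) + 1)
    ranks.append(rank_of_feature)
    print(label, "top features:", order[:10])

average_rank = np.mean(ranks, axis=0)
print("globally top features:", average_rank.argsort()[:10])
```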

Table 7 The top-ten most informative features for each Knowledge Type value


Table 8 The 10 top ranked features, averaged across all classes for Knowledge Type


For the identification of New Knowledge and Research Hypothesis, we firstly performed ten-fold cross validation on each corpus (GENIA-MK and EU-ADR) and for each dimension of interest, yielding the results in Table 9. In our presentation of results, we term the negative class for New Knowledge as "Other Knowledge", as it covers a number of categories that we wish to exclude (e.g., background knowledge, irrelevant knowledge, supporting knowledge, etc.). We were able to classify Knowledge Type for relations in the EU-ADR corpus by setting the event and participant features to sensible static values, e.g., the number of participants in a relation is always 2.
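
As an illustration of the static defaults mentioned above, the sketch below shows how a binary EU-ADR relation might be mapped into the event feature space; the field names are hypothetical.

```python
# Sketch: fixed defaults for event-specific features when classifying relations.
def relation_to_feature_defaults(relation):
    return {
        "n_participants": 2,           # a binary relation always links two entities
        "n_nested_events": 0,          # relations never contain sub-events
        "trigger_is_verb": 0,          # EU-ADR relations have no trigger word
        "relation_type": relation["type"],   # e.g. "Target-Disorder"
    }

print(relation_to_feature_defaults({"type": "Target-Disorder"}))
```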

Table 9 Results of ten-fold cross validation on both datasets for Research Hypothesis and New Knowledge


Discussion

In Table 5, we observed the effects of each feature subset on the overall classification score for Knowledge Type. We found that the structural, lexical and sentence features had particularly strong contributions. The structural features encoded information about the structure of the event and were particularly useful for identifying events that participate in other events. The lexical features depended on the identification of clue words that appeared in the context of relations and events, which provided important evidence to decide the most appropriate MK values to assign. However, the usefulness of this feature is directly tied to the comprehensiveness of the list of clues associated with each MK value.

In addition to the feature analysis in Table 5, we also provided additional analysis of each specific feature in Tables 7 and 8. In line with the results from Table 5, these tables demonstrate that the structural features were particularly informative for most classes, as well as the lexical, dependency and constituency features. It is interesting to note from Table 7 that no individual feature is particularly strongly correlated with each class label. This supports our ensemble approach and indicates that multiple feature sources are needed to reach a high classification accuracy. In addition, we can see that the correlations drop fairly quickly for all classes, indicating that not all features are used for every class. Finally, we can see that different features occur in each column (with some repetition), indicating that certain features were more useful for specific classes.

For the classification of New Knowledge and Hypothesis, we incorporated features denoting the existing meta-knowledge values of the event for Knowledge Source, Knowledge Type and Uncertainty. Knowledge Source indicates whether an event is current to the research in question, or whether it describes background work. This may be especially helpful for the detection of new knowledge, since it is clear that any background work cannot be classified as new knowledge. Knowledge Type classifies events as falling into one of six categories, i.e., Fact, Method, Analysis, Investigation, Observation or Other. The Investigation category may have contributed to the classification of Hypothetical events, whereas Observation and Analysis may have helped to contribute to the detection of New Knowledge events. The Fact, Method and Other categories could have helped the system to decide that events did not convey either hyperdimension. Finally, Uncertainty describes whether an author presented their results with confidence in their accuracy, or with some hedging (e.g., use of the words may, possibly, perhaps, etc.). This dimension could have helped to contribute to the classification of hypotheses (where an author states that an event may occur) and new knowledge, where we expect an author to be certain about their results.

We compared our results to those of Miwa et al. (2012) in Table 6, where we showed a consistent improvement of precision, recall and F1-score across all categories. Their system used support vector machines (SVMs) for classification, with a set of features similar to our lexical and structural features. However, our work used an enhanced set of features as well as a random forest classifier, which is typically robust in high dimensional classification problems [36]. These two factors contributed to our system's improved performance. Our system yielded an average increase in precision of 0.156, but only yielded an average increase in recall of 0.04. This implies that the use of a random forest and additional features mainly helped to ensure that the system returned results which are consistently correct. For both the 'Fact' and 'Method' Knowledge Type values, our system yielded a slight dip in recall compared to previous work. However, this was coupled with an increase in precision of 0.210 and 0.298, respectively.

To understand the relative contributions made by our switches in both feature set and type of classifier, compared to previous work, we analysed the performance of our system when using an SVM with our features instead of a Random Forest. We attained a similar performance to Miwa et al. using our feature set and SVM, although some values were lower than those reported by Miwa et al. This implies that our decision to use a different type of classifier to Miwa et al. (i.e., Random Forest instead of SVM) was the primary reason behind our improved performance. Different feature sets are better suited to different types of classifiers, and our feature set was carefully selected (as documented in Table 5) to be performant with a Random Forest. Miwa et al.'s features were equally selected to perform well with an SVM. We have shown similar results in prior work for a task on detecting meta-knowledge for negated bio-events [29], where we showed that tree-based methods, including the Random Forest, outperformed other techniques such as the SVM for detecting the negation dimension of meta-knowledge.

We illustrated our results for the identification of the novel dimensions New Knowledge and Research Hypothesis in Table 9. These showed strong performance across both corpora and association types (events and relations). The results for the GENIA-MK corpus (events) outperformed those for the EU-ADR corpus (relations). This was most likely due to the difference in size between the corpora. There are over 10 times more annotated events in the subset of GENIA-MK that we annotated than relations in the subset of EU-ADR (6899 events vs. 622 relations). The fact that we annotated all of the 159 abstracts available in the EU-ADR corpus and only 150 abstracts from GENIA-MK indicates that event structures are more densely packed in GENIA-MK than relations in EU-ADR.

In particular, the EU-ADR corpus yielded a poor recall value for Research Hypotheses. There were only 38 examples of relations annotated as Research Hypothesis in the EU-ADR corpus. Our annotators reported that several relations occurring in hypothetical contexts appeared to have been missed by the original annotators of the EU-ADR corpus, which may be the cause of this sparsity. However, adding additional relations to the corpus was beyond the scope of the current work. The precision for the prediction of Research Hypothesis in the EU-ADR corpus was 1.00, indicating that of those relations automatically classified as Research Hypothesis, all were indeed Research Hypotheses (i.e., there were no false positives). It is usually the case in minority class situations that a classifier will tend towards classifying instances as the majority class (i.e., favouring false negatives over false positives), so this result is expected. We chose not to perform subsampling of the majority class, as the density of Research Hypotheses or New Knowledge in our training data is reflective of the density we would expect in other biomedical abstracts.

Our corpus has focussed on identifying Research Hypotheses and New Knowledge in biomedical abstracts. However, it has been shown elsewhere that full texts contain more information than abstracts alone [40]. Whilst our future goal is to additionally facilitate the recognition of New Knowledge and Research Hypothesis in full papers, our decision to focus initially on abstracts was motivated by the findings of our earlier rounds of annotation. These initial annotation efforts revealed that the density of the types of MK that form the focus of the current paper is very low in full papers, and they are consequently difficult for annotators to reliably identify. Therefore we chose to use abstracts, where the density was higher, since the availability of as many examples as possible of relevant MK was important for the development of our methods. We noted that abstracts fairly consistently mention the main Research Hypotheses and New Knowledge outcomes from a paper. However, further information may be available in the full paper that has not been mentioned in the abstract. To access this information we will need to further adapt our techniques and develop annotated corpora of full papers; this is left for future work.

Error analysis

Finally, we present an analysis of some common errors that our system makes and strategies for overcoming these in future work. In the following sentence, the event centred on "regulation" was marked as Non-Hypothetical by the annotators, but our system recognised it as a Hypothetical event.

To continue our investigation of the cellular events that occur following human CMV (HCMV) infection, we focused on the regulation of cellular activation following viral binding to human monocytes.

Event ID: E1
Trigger: regulation
Theme: activation following viral binding
Cause: N/A
Clue: focused

It is likely that this event was marked as a hypothesis by the system because of the words 'investigation' and 'focused' that occur before it. However, in this example, the main hypothesis that the annotators have marked is on the event centred on 'occur', preceding the event centred around 'focused'. To overcome this in future work, we could implement a classification strategy that takes into account MK information that has already been assigned to other events that occur in the context of the focussed event. A conditional random field or deep learning model could be used for sequence labelling to achieve this.

The second error, which concerns the event centred on "effects" in the following sentence, was marked as Hypothetical by our annotators, but was classified as Non-Hypothetical by our system.

MATERIAL AND METHODS: In the present study, we analyzed the effects of CyA, aspirin, and indomethacin …

Event ID: E2
Trigger: effects
Theme: CyA, aspirin, and indomethacin
Cause: N/A
Clue: present study

This event is clearly stating the subject of the authors' investigation, and so should be marked as a hypothesis. It is likely that our system was confused by the preceding section heading, which led it to believe that this was part of the background or methods, and not a statement of the authors' intended research goals. To overcome this, we could identify these section headings automatically and either exclude them from the text to be analysed, or use them as extra features in our classification scheme.

In our third example error, the event in the sentence below is centred on the phrase "result in decreased". The event was marked as new knowledge by the annotators, but the system was not able to recognise it as such.

Down-regulation of MCP-1 expression by aspirin may result in decreased recruitment of monocytes into the arterial intima under stressed EC.

Event ID: E3
Trigger: result in decreased
Theme: recruitment of monocytes
Cause: Down-regulation of MCP-1 expression
by aspirin
Clue: N/A

We believe that the cause of this classification error is the unusual event trigger; the majority of events only have a single verb as their trigger. To help the system to better determine cases in which such events denote new knowledge, it would be necessary to further increase our corpus size, such that the training set includes a wider variety of trigger types. A further factor affecting the inability of the system to determine the new knowledge classification may have been the lack of an appropriate new knowledge clue. In this case, the annotators most probably determined this as an example of new knowledge due to information from the wider context of the discourse. We could improve our classifier by looking for clues in a wider window, or by looking for discourse clues that might indicate that the author is drawing their conclusions.

The last example below concerns an event (centred on the verb "enhanced"), which was marked as 'other knowledge' by the annotators, but which the system determined to be an example of new knowledge.

Taken together, these data indicate that the unexpected expression of megakaryocytic genes is a specific property of immortalized cells that cannot be explained simply by enhanced expression of Spi-1 and/or Fli-1 genes

Event ID: E4
Trigger: expression
Theme: megakaryocytic genes
Cause: N/A
Clue: indicate
Event ID: E5
Trigger: enhanced
Theme: expression of Spi-1 and…
Cause: E4
Clue: N/A

In this example, the event is somewhat problematic as regards the assignment of MK. Although it is clear both that the sentence is a concluding statement, and that there is some new knowledge contained within it, the annotators chose not to mark the event with the trigger "enhanced" as new knowledge, indicating that they did not consider it to convey the main aspect of new knowledge in this sentence. Interestingly, however, both annotators agreed with the system that the event centred on the first instance of "expression" should be marked as an instance of new knowledge. The presence of the clue 'indicate' may be affecting the system's classification decision in both cases. A human annotator can distinguish that 'indicate' is most relevant to 'expression', rather than 'enhanced', whereas our system was unable to make this distinction.

Conclusions

We have presented a novel application of text mining techniques for the discovery of Research Hypotheses and New Knowledge at the level of events and relations. This constitutes the first study into the application of supervised methods to assign these interpretative aspects at such a fine-grained level. We firstly showed that by applying a Random Forest classifier using a new feature set, we were able to reach a better performance than previous efforts in detecting Knowledge Type. We subsequently showed that the core MK dimensions of Knowledge Type, Knowledge Source and Uncertainty could feed into the training of classifiers that can predict whether events and relations represent Research Hypotheses and New Knowledge, with a high degree of accuracy. Our techniques can be incorporated into a system that allows researchers to quickly filter information contained within the abstracts of research articles, as shown in previous literature [3]. Our methods generally favour precision on the positive class (i.e., Research Hypothesis or New Knowledge). Specifically, we reach a precision of between 0.863 and 1.00 on all of the corpus experiments. This demonstrates that our approach is successful in avoiding the identification of false positives, thus allowing researchers to be confident that instances of Research Hypothesis or New Knowledge identified by our method will usually be correct.

Notes

  1. the proportion of results returned by the system which are correct.

  2. the proportion of correct results returned by the system as a fraction of all the correct results that should have been found.

  3. the balanced harmonic mean between precision and recall, providing a single overall measure of performance.

Abbreviations

ADR:

Adverse Drug Reaction

F1:

F1 Score (The harmonic mean between Precision and Recall)

IE:

Information Extraction

IAA:

Inter-Annotator Agreement

MK:

Meta-Knowledge

P:

Precision

R:

Recall

SVM:

Support Vector Machine

TM:

Text Mining

References

  1. Jiawen L, Dongsheng L, Zhijian T. The expression of interleukin-17, interferon-gamma, and macrophage inflammatory protein-3 alpha mRNA in patients with psoriasis vulgaris. J Huazhong Univ Sci Technol [Med Sci]. 2004; 24(3):294–6. https://doi.org/10.1007/BF02832018.

  2. Scharffetter-Kochanek K, Singh G, Tasdogan A, Wlaschek M, Gatzka M, Hainzl A, Peters T. Reduction of CD18 promotes expansion of inflammatory γδ T cells collaborating with CD4 T cells in chronic murine psoriasiform dermatitis. J Immunol. 2013; 191:5477–88. https://doi.org/10.4049/jimmunol.1300976.

  3. Zerva C, Batista-Navarro R, Day P, Ananiadou S. Using uncertainty to link and rank evidence from biomedical literature for model curation. Bioinformatics. btx466. https://doi.org/10.1093/bioinformatics/btx466.

  4. Stenetorp P, Pyysalo S, Topić G, Ohta T, Ananiadou S, Tsujii J. BRAT: a web-based tool for NLP-assisted text annotation. In: Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics: 2012. p. 102–107.

  5. Agarwal S, Yu H, Kohane I. BioNØT: A searchable database of biomedical negated sentences. BMC Bioinformatics. 2011; 12(1):420. https://doi.org/10.1186/1471-2105-12-420.

  6. Medlock B, Briscoe T. Weakly supervised learning for hedge classification in scientific literature. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. Prague, Czech Republic: Association for Computational Linguistics: 2007. p. 992–9. http://www.aclweb.org/anthology/P07-1125.

  7. Vincze V, Szarvas G, Farkas R, Móra G, Csirik J. The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics. 2008; 9(11):1–9.

  8. Malhotra A, Younesi E, Gurulingappa H, Hofmann-Apitius M. 'HypothesisFinder:' a strategy for the detection of speculative statements in scientific text. PLOS Comput Biol. 2013; 9(7):1–10. https://doi.org/10.1371/journal.pcbi.1003117.

  9. Ruch P, Boyer C, Chichester C, Tbahriti I, Geissbühler A, Fabry P, Gobeill J, Pillet V, Rebholz-Schuhmann D, Lovis C, et al. Using argumentation to extract key sentences from biomedical abstracts. Int J Med Inform. 2007; 76(2):195–200.

  10. Teufel S, Carletta J, Moens M. An annotation scheme for discourse-level argumentation in research articles. In: Proceedings of the 9th Conference on European Chapter of the Association for Computational Linguistics. EACL '99. Stroudsburg: Association for Computational Linguistics: 1999. p. 110–7. https://doi.org/10.3115/977035.977051.

  11. Mizuta Y, Collier N. Zone identification in biology articles as a basis for information extraction. In: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications. JNLPBA '04. Stroudsburg: Association for Computational Linguistics: 2004. p. 29–35. http://dl.acm.org/citation.cfm?id=1567594.1567600.

  12. Burns G, Dasigi P, de Waard A, Hovy EH. Automated detection of discourse segment and experimental types from the text of cancer pathway results sections. Database. 2016; 2016:baw122. https://doi.org/10.1093/database/baw122.

  13. Liakata M, Saha S, Dobnik S, Batchelor C, Rebholz-Schuhmann D. Automatic recognition of conceptualization zones in scientific articles and two life science applications. Bioinformatics. 2012; 28(7):991. https://doi.org/10.1093/bioinformatics/bts071.

  14. Simsek D, Buckingham Shum S, Sandor A, De Liddo A, Ferguson R. Xip dashboard: visual analytics from automated rhetorical parsing of scientific metadiscourse. In: 1st International Workshop on Discourse-Centric Learning Analytics. Leuven: 2013.

  15. Bundschus M, Dejori M, Stetter M, Tresp V, Kriegel HP. Extraction of semantic biomedical relations from text using conditional random fields. BMC Bioinformatics. 2008; 9(1):207.

  16. Bravo A, Piñero J, Queralt-Rosinach N, Rautschka M, Furlong LI. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research. BMC Bioinformatics. 2015; 16(1):55.

  17. Verspoor KM, Heo EG, Kang KY, Song M. Establishing a baseline for literature mining human genetic variants and their relationships to disease cohorts. BMC Med Inf Decis Mak. 2016; 16(1):68.

  18. Nedellec C. Learning language in logic - genic interaction extraction challenge. In: Proceedings of the ICML-2005 Workshop on Learning Language in Logic (LLL05): 2005. p. 31–7.

  19. Kim JD, Pyysalo S, Ohta T, Bossy R, Nguyen N, Tsujii J. Overview of BioNLP shared task 2011. In: Proceedings of the BioNLP Shared Task 2011 Workshop. Portland: Association for Computational Linguistics: 2011. p. 1–6.

  20. Pyysalo S, Ginter F, Heimonen J, Björne J, Boberg J, Järvinen J, Salakoski T. BioInfer: a corpus for information extraction in the biomedical domain. BMC Bioinformatics. 2007; 8(1):50.

  21. Sanchez-Graillet O, Poesio M. Negation of protein-protein interactions: analysis and extraction. Bioinformatics. 2007; 23(13):424. https://doi.org/10.1093/bioinformatics/btm184.

  22. Kim JD, Ohta T, Tsujii J. Corpus annotation for mining biomedical events from literature. BMC Bioinformatics. 2008; 9(1):1–25.

  23. Van Mulligen EM, Fourrier-Reglat A, Gurwitz D, Molokhia M, Nieto A, Trifiro G, Kors JA, Furlong LI. The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. J Biomed Inform. 2012; 45(5):879–84.

  24. Björne J, Ginter F, Salakoski T. University of Turku in the BioNLP'11 shared task. BMC Bioinformatics. 2012; 13(11):4.

  25. Kilicoglu H, Bergler S. Biological event composition. BMC Bioinformatics. 2012; 13(11):7.

  26. Thompson P, Nawaz R, McNaught J, Ananiadou S. Enriching news events with meta-knowledge information. Lang Resour Eval. 2016:1–30. https://doi.org/10.1007/s10579-016-9344-9.

  27. Walker C, Strassel S, Medero J, Maeda K. ACE 2005 multilingual training corpus. Philadelphia: Linguistic Data Consortium; 2006.

  28. Thompson P, Nawaz R, McNaught J, Ananiadou S. Enriching a biomedical event corpus with meta-knowledge annotation. BMC Bioinformatics. 2011; 12(1):1–18.

  29. Nawaz R, Thompson P, Ananiadou S. Negated BioEvents: Analysis and identification. BMC Bioinformatics. 2013; 14(1):14. https://doi.org/10.1186/1471-2105-14-14.

  30. Nawaz R, Thompson P, Ananiadou S. Something old, something new: identifying knowledge source in bio-events. Int J Comput Linguist Appl. 2013; 4(1):129–44.

  31. Miwa M, Thompson P, McNaught J, Kell DB, Ananiadou S. Extracting semantically enriched events from biomedical literature. BMC Bioinformatics. 2012; 13:108. https://doi.org/10.1186/1471-2105-13-108.

  32. Nawaz R, Thompson P, Ananiadou S. Meta-knowledge annotation at the event level: Comparison between abstracts and full papers. In: Proceedings of the 3rd Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012): 2012. p. 24–31.

  33. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960; 20(1):37–46. https://doi.org/10.1177/001316446002000104.

  34. McHugh ML. Interrater reliability: the kappa statistic. Biochemia Medica. 2012; 22(3):276–82.

  35. Miwa M, Sætre R, Kim JD, Tsujii J. Event extraction with complex event classification using rich features. J Bioinform Comput Biol. 2010; 8(01):131–46.

  36. Breiman L. Random forests. Machine Learning. 2001; 45(1):5–32.

  37. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: An update. SIGKDD Explor Newsl. 2009; 11(1):10–18. https://doi.org/10.1145/1656274.1656278.

  38. Tsuruoka Y, Tateishi Y, Kim JD, Ohta T, McNaught J, Ananiadou S, Tsujii J. Developing a robust part-of-speech tagger for biomedical text. Berlin, Heidelberg: Springer; 2005, pp. 382–92. Advances in Informatics: 10th Panhellenic Conference on Informatics, PCI 2005, Volos, Greece, November 11-13, 2005.

  39. Miyao Y, Tsujii J. Feature forest models for probabilistic HPSG parsing. Comput Linguist. 2008; 34(1):35–80. https://doi.org/10.1162/coli.2008.34.1.35.

  40. Schuemie MJ, Weeber M, Schijvenaars BJA, van Mulligen EM, van der Eijk CC, Jelier R, Mons B, Kors JA. Distribution of information in biomedical abstracts and full-text publications. Bioinformatics. 2004; 20(16):2597–604. https://doi.org/10.1093/bioinformatics/bth291.


Acknowledgements

The authors wish to thank the annotators involved in creating the dataset for this paper, without whom this research would not have been possible. Our thanks also go to the reviewers for their considered feedback on our research.

Funding

The authors of this work were funded by the European Commission (an Open Mining Infrastructure for Text and Data. OpenMinTeD. Grant: 654021), the Medical Research Council (Manchester Molecular Pathology Innovation Centre. MMPathIC. Grant: MR/N00583X/1) and the Biotechnology and Biological Sciences Research Council (Enriching Metabolic PATHwaY models with evidence from the literature. EMPATHY. Grant: BB/M006891/1). The funders played no role in either the design of the study or the collection, analysis, and interpretation of data, or in writing the manuscript.

Availability of data and materials

The datasets generated and analysed during the current study are available as Additional files to this paper.

Author information

Affiliations

Contributions

MS ran the main experiments, performed the analysis of the results and participated in authoring the paper. RB helped with the design of the experiments and authoring the paper. PT contributed work on the preparation of the EU-ADR corpus as well as participating in the authorship of the paper. RN contributed to the experimental design, guidelines for the annotators and participated in the authorship of the paper. JM and SA jointly supervised the research and participated in authoring the paper. All authors read and approved the final version of this manuscript prior to publication.

Corresponding author

Correspondence to Sophia Ananiadou.

Ethics declarations

Ethics approval and consent to participate

No ethics approval was required for any element of this study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1

The annotation guidelines that were given to annotators for reference. (PDF 830 kb)

Additional file 2

A table providing an in-depth description of each feature. (PDF 32 kb)

Additional file 3

Readme documentation explaining the structure of the clue files. (TXT 4 kb)

Additional file 4

The clues used to detect the Analysis component of the Knowledge Type meta-knowledge dimension. (FILE 3 kb)

Additional file 5

The clues used to detect the Fact component of the Knowledge Type meta-knowledge dimension. (FILE 4 kb)

Additional file 6

The clues used to detect the Investigation component of the Knowledge Type meta-knowledge dimension. (FILE 2 kb)

Additional file 7

The clues used to detect the Method component of the Knowledge Type meta-knowledge dimension. (FILE 4 kb)

Additional file 8

The clues used to detect the Observation component of the Knowledge Type meta-knowledge dimension. (FILE 4 kb)

Additional file 9

The clues used to detect the Other component of the Knowledge Source meta-knowledge dimension. (FILE 1 kb)

Additional file 10

The clues used to detect the Uncertain component of the Certainty Level meta-knowledge dimension. (FILE 4 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Shardlow, M., Batista-Navarro, R., Thompson, P. et al. Identification of research hypotheses and new knowledge from scientific literature. BMC Med Inform Decis Mak 18, 46 (2018). https://doi.org/10.1186/s12911-018-0639-1



  • DOI: https://doi.org/10.1186/s12911-018-0639-1

Keywords

  • Text mining
  • Events
  • Meta-knowledge
  • Hypothesis
  • New knowledge
