'x' must be an array of at least two dimensions: Error while running textmineR package for document clustering in R
I am running the textmineR library to perform document clustering and generate word clouds, but I get the following error message:
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
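The message itself comes from base R: colSums() checks that its argument has at least two dimensions. The short sketch below needs no textmineR and uses made-up values. It reproduces the error with a plain named vector and shows that the same values stored as a one-row matrix are accepted.

```r
# Reproduces the message without textmineR: base::colSums() rejects any
# object that lacks at least two dimensions, e.g. a plain named vector.
v <- c(sars = 2, mxa = 0, ifn = 1)
msg <- tryCatch(base::colSums(v), error = function(e) conditionMessage(e))
print(msg)

# The same values stored as a 1 x 3 matrix are accepted.
m <- matrix(v, nrow = 1, dimnames = list(NULL, names(v)))
print(base::colSums(m))
```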
Here is the data structure:
dput(my_entrez_df_v1)
structure(list(PMID = c(32646047L, 32641214L, 32370561L, 32661206L,
30089512L, 26694452L, 26602089L, 25542463L, 20462354L, 16824203L,
16571117L, 16227300L, 16390004L, 15766558L, 15777647L, 15651759L,
15135736L), Title = c("Protein Coding and Long Noncoding RNA (lncRNA) Transcriptional Landscape in SARS-CoV-2 Infected Bronchial Epithelial Cells Highlight a Role for Interferon and Inflammatory Response.",
"SARS-CoV-2 infection risk assessment in the endometrium: viral infection-related gene expression across the menstrual cycle.",
"Multi-tiered screening and diagnosis strategy for COVID-19: a model for sustainable testing capacity in response to pandemic.",
"Assessment of risk conferred by coding and regulatory variations of TMPRSS2 and CD26 in susceptibility to SARS-CoV-2 infection in human.",
"Human _-defensin 2 plays a regulatory role in innate antiviral immunity and is capable of potentiating the induction of antigen-specific immunity.",
"Glucose-6-Phosphate Dehydrogenase Enhances Antiviral Response through Downregulation of NADPH Sensor HSCARG and Upregulation of NF-_B Signaling.",
"Middle East respiratory syndrome coronavirus shows poor replication but significant induction of antiviral responses in human monocyte-derived macrophages and dendritic cells.",
"Blood MxA protein as a marker for respiratory virus infections in young children.",
"Significance of the myxovirus resistance A (MxA) gene -123C>a single-nucleotide polymorphism in suppressed interferon beta induction of severe acute respiratory syndrome coronavirus infection.",
"Association of SARS susceptibility with single nucleic acid polymorphisms of OAS1 and MxA genes: a case-control study.",
"Inhibition of cytokine gene expression and induction of chemokine genes in non-lymphatic cells infected with SARS coronavirus.",
"Severe acute respiratory syndrome coronavirus fails to activate cytokine-mediated innate immune responses in cultured human monocyte-derived dendritic cells.",
"A case-control study on the mxA polymorphisms and susceptibility to severe acute respiratory syndromes",
"Polymorphisms of interferon-inducible genes OAS-1 and MxA associated with SARS in the Vietnamese population.",
"Microarray and real-time RT-PCR analyses of differential human gene expression patterns induced by severe acute respiratory syndrome (SARS) coronavirus infection of Vero cells.",
"Increased sensitivity of SARS-coronavirus to a combination of human type I and type II interferons.",
"The antiviral effect of interferon-beta against SARS-coronavirus is not mediated by MxA protein."
), Abstract_v1 = c("The global spread of COVID-19 caused by pathogenic severe acute respiratory syndrome coronavirus 2 SARS-CoV-2 underscores the need for an imminent response from medical research communities to better understand this rapidly spreading infection Employing multiple bioinformatics and computational pipelines on transcriptome data from primary normal human bronchial epithelial cells NHBE during SARS-CoV-2 infection revealed activation of several mechanistic networks including those involved in immunoglobulin G IgG and interferon lambda IFNL in host cells Induction of acute inflammatory response and activation of tumor necrosis factor TNF was prominent in SARS-CoV-2 infected NHBE cells Additionally disease and functional analysis employing ingenuity pathway analysis IPA revealed activation of functional categories related to cell death while those associated with viral infection and replication were suppressed Several interferon IFN responsive gene targets IRF9 IFIT1 IFIT2 IFIT3 IFITM1 MX1 OAS2 OAS3 IFI44 and IFI44L were highly upregulated in SARS-CoV-2 infected NBHE cell implying activation of antiviral IFN innate response Gene ontology and functional annotation of differently expressed genes in patient lung tissues with COVID-19 revealed activation of antiviral response as the hallmark Mechanistic network analysis in IPA identified 14 common activated and 9 common suppressed networks in patient tissue as well as in the NHBE cell model suggesting a plausible role for these upstream regulator networks in the pathogenesis of COVID-19 Our data revealed expression of several viral proteins in vitro and in patient-derived tissue while several host-derived long noncoding RNAs lncRNAs were identified Our data highlights activation of IFN response as the main hallmark associated with SARS-CoV-2 infection in vitro and in human and identified several differentially expressed lncRNAs during the course of infection which could serve as disease biomarkers while 
their precise role in the host response to SARS-CoV-2 remains to be investigated",
"To determine the susceptibility of the endometrium to infection by-and thereby potential damage from-SARS-CoV-2 Analysis of SARS-Cov-2 infection-related gene expression from endometrial transcriptomic data sets Infertility research department affiliated with a public hospital Gene expression data from five studies in 112 patients with normal endometrium collected throughout the menstrual cycle None Gene expression and correlation between viral infectivity genes and age throughout the menstrual cycle Gene expression was high for TMPRSS4 CTSL CTSB FURIN MX1 and BSG medium for TMPRSS2 and low for ACE2 ACE2 TMPRSS4 CTSB CTSL and MX1 expression increased toward the window of implantation TMPRSS4 expression was positively correlated with ACE2 CTSB CTSL MX1 and FURIN during several cycle phases TMPRSS2 was not statistically significantly altered across the cycle ACE2 TMPRSS4 CTSB CTSL BSG and MX1 expression increased with age especially in early phases of the cycle Endometrial tissue is likely safe from SARS-CoV-2 cell entry based on ACE2 and TMPRSS2 expression but susceptibility increases with age Further TMPRSS4 along with BSG-mediated viral entry into cells could imply a susceptible environment for SARS-CoV-2 entry via different mechanisms Additional studies are warranted to determine the true risk of endometrial infection by SARS-CoV-2 and implications for fertility treatments",
"Coronavirus disease 2019 COVID-19 caused by novel enveloped single stranded RNA coronavirus SARS-CoV-2 is responsible for an ongoing global pandemic While other countries deployed widespread testing as an early mitigation strategy the US experienced delays in development and deployment of organism identification assays As such there is uncertainty surrounding disease burden and community spread severely hampering containment efforts COVID-19 illuminates the need for a tiered diagnostic approach to rapidly identify clinically significant infections and reduce disease spread Without the ability to efficiently screen patients hospitals are overwhelmed potentially delaying treatment for other emergencies A multi-tiered diagnostic strategy incorporating a rapid host immune response assay as a screening test molecular confirmatory testing and rapid IgM/IgG testing to assess benefit from quarantine/further testing and provide information on population exposure/herd immunity would efficiently evaluate potential COVID-19 patients Triaging patients within minutes with a fingerstick rather than hours/days after an invasive swab is critical to pandemic response as reliance on the existing strategy is limited by assay accuracy time to results and testing capacity Early screening and triage is achievable from the outset of a pandemic with point-of-care host immune response testing which will improve response time to clinical and public health actionsKey messagesDelayed testing deployment has led to uncertainty surrounding overall disease burden and community spread severely hampering public health containment and healthcare system preparation effortsA multi-tiered testing strategy incorporating rapid host immune point-of-care tests can be used now and for future pandemic planning by effectively identifying patients at risk of disease thereby facilitating quarantine earlier in the progression of the outbreak during the weeks and months it can take for pathogen specific 
confirmatory tests to be developed validated and manufactured in sufficient quantitiesThe ability to triage patients at the point of care and support the guidance of medical and therapeutic decisions for viral isolation or confirmatory testing or for appropriate treatment of COVID-19 and/or bacterial infections is a critical component to our national pandemic response and there is an urgent need to implement the proposed strategy to combat the current outbreak",
"At present more than 200 countries and territories are directly affected by the coronavirus disease-19 COVID-19 pandemic Incidence and case fatality rate are significantly higher among elderly individuals age60 years type 2 diabetes and hypertension patients Cellular receptor ACE2 serine protease TMPRSS2 and exopeptidase CD26 also known as DPP4 are the three membrane bound proteins potentially implicated in SARS-CoV-2 infection We hypothesised that common variants from TMPRSS2 and CD26 may play critical role in infection susceptibility of predisposed population or group of individuals Coding missense and regulatory variants from TMPRSS2 and CD26 were studied across 26 global populations Two missense and five regulatory SNPs were identified to have differential allelic frequency Significant linkage disequilibrium LD signature was observed in different populations Modelled protein-protein interaction PPI predicted strong molecular interaction between these two receptors and SARS-CoV-2 spike protein S1 domain However two missense SNPs rs12329760 TMPRSS2 and rs1129599 CD26 were not found to be involved physically in the said interaction Four regulatory variants rs112657409 rs11910678 rs77675406 and rs713400 from TMPRSS2 were found to influence the expression of TMPRSS2 and pathologically relevant MX1 rs13015258 a 50 UTR variant from CD26 have significant role in regulation of expression of key regulatory genes that could be involved in SARS-CoV-2 internalization Overexpression of CD26 through epigenetic modification at rs13015258-C allele was found critical and could explain the higher SARS-CoV-2 infected fatality rate among type 2 diabetes",
"Antimicrobial peptides AMPs are primarily known for their innate immune defense against invading microorganisms including viruses In addition recent research has suggested their modulatory activity in immune induction Given that most subunit vaccines require an adjuvant to achieve effective immune induction through the activation of innate immunity AMPs are plausible candidate molecules for stimulating not only innate immune but also adaptive immune responses In this study we investigated the ability of human -defensin HBD 2 to promote antiviral immunity in vitro and in vivo using a receptor-binding domain RBD of Middle East respiratory syndrome-coronavirus MERS-CoV spike protein S RBD as a model antigen Ag When HBD 2-conjugated S RBD was used to treat THP-1 human monocytic cells the expression levels of antiviral IFN- IFN- MxA PKR and RNaseL and primary immune-inducing NOD2 TNF- IL-1 and IL-6 molecules were enhanced compared to those expressed after treatment with S RBD only The expression of chemokines capable of recruiting leukocytes including monocytes/macrophages natural killer cells granulocytes T cells and dendritic cells was also increased following HBD 2-conjugated S RBD treatment More important immunization of mice with HBD 2-conjugated S RBD enhanced the immunogenicity of the S RBD and elicited a higher S RBD-specific neutralizing antibody response than S RBD alone We conclude that HBD 2 activates the primary antiviral innate immune response and may also mediate the induction of an effective adaptive immune response against a conjugated Ag",
"Glucose-6-phosphate dehydrogenase G6PD-deficient cells are highly susceptible to viral infection This study examined the mechanism underlying this phenomenon by measuring the expression of antiviral genes-tumor necrosis factor alpha TNF- and GTPase myxovirus resistance 1 MX1-in G6PD-knockdown cells upon human coronavirus 229E HCoV-229E and enterovirus 71 EV71 infection Molecular analysis revealed that the promoter activities of TNF- and MX1 were downregulated in G6PD-knockdown cells and that the IB degradation and DNA binding activity of NF-B were decreased The HSCARG protein a nicotinamide adenine dinucleotide phosphate NADPH sensor and negative regulator of NF-B was upregulated in G6PD-knockdown cells with decreased NADPH/NADP ratio Treatment of G6PD-knockdown cells with siRNA against HSCARG enhanced the DNA binding activity of NF-B and the expression of TNF- and MX1 but suppressed the expression of viral genes however the overexpression of HSCARG inhibited the antiviral response Exogenous G6PD or IDH1 expression inhibited the expression of HSCARG resulting in increased expression of TNF- and MX1 and reduced viral gene expression upon virus infection Our findings suggest that the increased susceptibility of the G6PD-knockdown cells to viral infection was due to impaired NF-B signaling and antiviral response mediated by HSCARG",
"In this study we assessed the ability of Middle East respiratory syndrome coronavirus MERS-CoV to replicate and induce innate immunity in human monocyte-derived macrophages and dendritic cells MDDCs and compared it with severe acute respiratory syndrome coronavirus SARS-CoV Assessments of viral protein and RNA levels in infected cells showed that both viruses were impaired in their ability to replicate in these cells Some induction of IFN-1 CXCL10 and MxA mRNAs in both macrophages and MDDCs was seen in response to MERS-CoV infection but almost no such induction was observed in response to SARS-CoV infection ELISA and Western blot assays showed clear production of CXCL10 and MxA in MERS-CoV-infected macrophages and MDDCs Our data suggest that SARS-CoV and MERS-CoV replicate poorly in human macrophages and MDDCs but MERS-CoV is nonetheless capable of inducing a readily detectable host innate immune response Our results highlight a clear difference between the viruses in activating host innate immune responses in macrophages and MDDCs which may contribute to the pathogenesis of infection",
"Type I interferon induced MxA response can differentiate viral from bacterial infections but MxA responses in rhinovirus or asymptomatic virus infections are not known To study MxA protein levels in healthy state and during respiratory virus infection of young children in an observational prospective cohort Blood samples and nasal swabs were collected from 153 and 77 children with and without symptoms of respiratory infections respectively Blood MxA protein levels were measured by an enzyme immunoassay and PCR methods were used for the detection of respiratory viruses in nasal swabs Respiratory viruses were detected in 81 of symptomatic children They had higher blood MxA protein levels median interquartile range than asymptomatic virus-negative children 695 345-1370 g/L vs 110 55-170 g/L p 0001 Within asymptomatic children no significant difference was observed in MxA responses between virus-positive and virus-negative groups A cut-off level of 175 g/L had 92 sensitivity and 77 specificity for a symptomatic respiratory virus infection Rhinovirus respiratory syncytial virus parainfluenza virus influenza virus coronavirus and human metapneumovirus infections were associated with elevated MxA responses Asymptomatic virus-negative children vaccinated with a live virus vaccine had elevated MxA protein levels 240 120-540 g/L but significantly lower than children with an acute respiratory infection who had not received vaccinations 740 350-1425 g/L p0001 Blood MxA protein levels are increased in young children with symptomatic respiratory virus infections including rhinovirus infections MxA is an informative general marker for the most common acute virus infections",
"Myxovirus resistance A MxA is an antiviral protein induced by interferon alpha and beta IFN-alpha IFN-beta that can inhibit viral replication The minor alleles of the -88GT and -123CA MxA promoter single-nucleotide polymorphisms SNPs are associated with increased promoter activity and altered response to IFN-alpha and IFN-beta treatment Here we demonstrate that the -123A minor allele provided stronger binding affinity to nuclear proteins extracted from IFN-beta-untreated cells than did the wild-type allele whereas the -88T allele showed preferential binding after IFN-beta stimulation Endogenous IFN-alpha and IFN-beta induction can be suppressed in severe acute respiratory syndrome SARS coronavirus infection In support of our in vitro findings a large case-control genetic-association study for SARS coronavirus infection confirmed that the -123A minor-allele carriers were significantly associated with lower risk of SARS coronavirus infection whereas the -88T minor-allele carriers were insignificant after adjustment for confounding effects This suggests that -123CA plays a more important role in modulating basal MxA expression thus contributing more significantly to innate immune response against viral infections that suppress endogenous IFN-alpha and IFN-beta induction such as SARS coronavirus",
"Host genetic factors may play a role in susceptibility and resistance to SARS associated coronavirus SARS-CoV infection The study was carried out to investigate the association between the genetic polymorphisms of 25-oligoadenylate synthetase 1 OAS1 gene as well as myxovirus resistance 1 MxA gene and susceptibility to SARS in Chinese Han population A hospital-based case-control study was conducted A collective of 66 SARS cases and 64 close contact uninfected controls were enrolled in this study End point real time polymerase chain reaction PCR and PCR-based Restriction Fragment Length Polymorphism RFLP analysis were used to detect the single nucleic polymorphisms SNPs in OAS1 and MxA genes Information on other factors associated with SARS infection was collected using a pre-tested questionnaire Univariate and multivariate logistic analyses were conducted One polymorphism in the 3-untranslated region 3-UTR of the OAS1 gene was associated with SARS infection Compared to AA genotype AG and GG genotypes were found associated with a protective effect on SARS infection with ORs 95 CI of 042 020-089 and 030 009-097 respectively Also a GT genotype at position 88 in the MxA gene promoter was associated with increased susceptibility to SARS infection compared to a GG genotype OR 306 95 CI 125-750 The associations of AG genotype in OAS1 and GT genotype in MxA remained significant in multivariate analyses after adjusting for SARS protective measures OR 038 95 CI 014-098 and OR 322 95 CI 113-918 respectively SNPs in the OAS1 3-UTR and MxA promoter region appear associated with host susceptibility to SARS in Chinese Han population",
"SARS coronavirus SARS-CoV is the etiologic agent of the severe acute respiratory syndrome SARS-CoV mainly infects tissues of non-lymphatic origin and the cytokine profile of those cells can determine the course of disease Here we investigated the cytokine response of two human non-lymphatic cell lines Caco-2 and HEK 293 which are fully permissive for SARS-CoV A comparison with established cytokine-inducing viruses revealed that SARS-CoV only weakly triggered a cytokine response In particular SARS-CoV did not activate significant transcription of the interferons IFN-alpha IFN-beta IFN-lambda1 IFN-lambda2/3 as well as of the interferon-induced antiviral genes ISG56 and MxA the chemokine RANTES and the interleukine IL-6 Interestingly however SARS-CoV strongly induced the chemokines IP-10 and IL-8 in the colon carcinoma cell line Caco-2 but not in the embryonic kidney cell line 293 Our data indicate that SARS-CoV suppresses the antiviral cytokine system of non-immune cells to a large extent thus buying time for dissemination in the host However synthesis of IP-10 and IL-8 which are established markers for acute-stage SARS escapes the virus-induced silencing at least in some cell types Therefore the progressive infiltration of immune cells into the infected lungs observed in SARS patients could be due to the production of these chemokines by the infected tissue cells",
"Activation of host innate immune responses was studied in severe acute respiratory syndrome coronavirus SCV-infected human A549 lung epithelial cells macrophages and dendritic cells DCs In all cell types SCV-specific subgenomic mRNAs were seen whereas no expression of SCV proteins was found No induction of cytokine genes alpha interferon IFN-alpha IFN-beta interleukin-28A/B IL-28A/B IL-29 tumor necrosis factor alpha CCL5 or CXCL10 or IFN-alpha/beta-induced MxA gene was seen in SCV-infected A549 cells macrophages or DCs SCV also failed to induce DC maturation CD86 expression or enhance major histocompatibility complex class II expression Our data strongly suggest that SCV fails to activate host cell cytokine gene expression in human macrophages and DCs",
"To investigate the association between the genetic polymorphisms of myxovirus resistance 1 MxA gene and susceptibility to severe acute respiratory syndromes SARS A case-control study was conducted and polymerase chain reaction-restriction fragment length polymorphism PCR-RFLP was used to detect the T/G polymorphism at position-88 in the mxA gene promoter Information on related factors of SARS was collected using a pre-testing questionnaire Univariate and multivariate logistic analyses were conducted with SPSS software package Sixty-six cases and sixty-four controls were selected for the study Comparing with GG genotype the proportion of GT genotype were significantly higher in the case group 813 than that in the control group 625 with an OR 95 CI of 2700 1208-6037 Multivariate logistic regression analysis revealed that the significant association remained after factors as wearing masks protection gowns and eye-protection when contacting with SARS patient etc were adjusted with an OR 95 CI of 2911 1027-8250 mxA promoter-88G/T SNP might be confered to host genetic susceptibility to SARS in Chinese Han population",
"We hypothesized that host antiviral genes induced by type I interferons might affect the natural course of severe acute respiratory syndrome SARS We analyzed single nucleotide polymorphisms SNPs of 25-oligoadenylate synthetase 1 OAS-1 myxovirus resistance-A MxA and double-stranded RNA-dependent protein kinase in 44 Vietnamese SARS patients with 103 controls The G-allele of non-synonymous A/G SNP in exon 3 of OAS-1 gene showed association with SARS p00090 The G-allele in exon 3 of OAS-1 and the one in exon 6 were in strong linkage disequilibrium and both of them were associated with SARS infection The GG genotype and G-allele of G/T SNP at position -88 in the MxA gene promoter were found more frequently in hypoxemic group than in non-hypoxemic group of SARS p00195 Our findings suggest that polymorphisms of two IFN-inducible genes OAS-1 and MxA might affect susceptibility to the disease and progression of SARS at each level",
"Vero E6 African green monkey kidney cells are highly susceptible to infection with the newly emerging severe acute respiratory syndrome coronavirus SARS-CoV and they are permissive for rapid viral replication with resultant cytopathic effects We employed cDNA microarray analysis to characterize the cellular transcriptional responses of homologous human genes at 12 h post-infection Seventy mRNA transcripts belonging to various functional classes exhibited significant alterations in gene expression There was considerable induction of heat shock proteins that are crucial to the immune response mechanism Modified levels of several transcripts involved in pro-inflammatory and anti-inflammatory processes exemplified the balance between opposing forces during SARS pathogenesis Other genes displaying altered transcription included those associated with host translation cellular metabolism cell cycle signal transduction transcriptional regulation protein trafficking protein modulators and cytoskeletal proteins Alterations in the levels of several novel transcripts encoding hypothetical proteins and expressed sequence tags were also identified In addition transcription of apoptosis-related genes DENN and hIAP1 was upregulated in contrast to FAIM Elevated Mx1 expression signified a strong host response to mediate antiviral resistance Also expressed in infected cells was the C-terminal alternative splice variant of the p53 tumor suppressor gene encoding a modified truncated protein that can influence the activity of wild-type p53 We observed the interplay between various mechanisms to favor virus multiplication before full-blown apoptosis and the triggering of several pathways in host cells in an attempt to eliminate the pathogen Microarray analysis identifies the critical host-pathogen interactions during SARS-CoV infection and provides new insights into the pathophysiology of SARS",
"There is currently an urgent need to identify effective antiviral agents that will prevent and treat severe acute respiratory syndrome coronavirus SARS-CoV infection In this study we have investigated and compared the antiviral effect of different interferons IFNs on SARS-CoV replication in the epithelial kidney monkey Vero cell line The results showed that SARS-CoV grown in Vero cells is moderately sensitive to IFN-beta and only weakly sensitive to IFN-alpha and IFN-gamma in comparison to other IFN-sensitive viruses such as those for encephalomyocarditis vesicular stomatitis and Newcastle disease Simultaneous incubation of Vero cells with IFN-beta and IFN-gamma indicated that they may act synergistically against SARS-CoV replication The IFN-induced MxA protein was detected in the IFN-treated Vero cells The data however suggest that the antiviral activity of IFN against SARS-CoV virus is independent of MxA expression",
"Severe acute respiratory syndrome SARS is caused by a novel coronavirus termed SARS-CoV No antiviral treatment has been established so far Interferons are cytokines which induce the synthesis of several antivirally active proteins in the cell In this study we demonstrated that multiplication of SARS-CoV in cell culture can be strongly inhibited by pretreatment with interferon-beta Interferon-alpha and interferon-gamma by contrast were less effective The human MxA protein is one of the most prominent proteins induced by interferon-beta Nevertheless no interference with SARS-CoV replication was observed in Vero cells stably expressing MxA Therefore other interferon-induced proteins must be responsible for the strong inhibitory effect of interferon-beta against SARS-CoV"
)), class = "data.frame", row.names = c(NA, -17L))
library(textmineR)
library(dplyr) # pipes
library(stringi) # for stri_enc_isutf8
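# Clean the abstracts with a regular expression. The bracket expression decides
# which characters are kept, so adjust it to suit your text.
# (my_entrez_df holds the raw abstracts; the dput() above shows the cleaned result.)
my_entrez_df_v1 <- my_entrez_df %>%
# keep letters, digits, blanks and ? & / -; the hyphen goes last in the bracket
# so it is matched literally ("\-" is not a valid escape in an R string)
mutate(Abstract = gsub("[^[:alnum:][:blank:]?&/-]", "", Abstract)) %>%
rename(Abstract_v1 = Abstract)
# check that every cleaned abstract is valid UTF-8
all(stri_enc_isutf8(my_entrez_df_v1$Abstract_v1))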
# create a document term matrix
dtm <- CreateDtm(doc_vec = my_entrez_df_v1$Abstract_v1, # character vector of documents
doc_names = my_entrez_df_v1$PMID, # document names
ngram_window = c(1, 2), # minimum and maximum n-gram length
stopword_vec = c(stopwords::stopwords("en"), # English stopwords from the stopwords package
stopwords::stopwords(source = "smart")), # this is the default value
lower = TRUE, # lowercase - this is the default value
remove_punctuation = TRUE, # punctuation - this is the default
remove_numbers = TRUE, # numbers - this is the default
verbose = FALSE, # Turn off status bar for this demo
cpus = 2) # default is all available cpus on the system
# term and document frequencies for each term, including the IDF vector
tf_mat <- TermDocFreq(dtm)
# TF-IDF and cosine similarity
tfidf <- t(dtm[ , tf_mat$term ]) * tf_mat$idf
tfidf <- t(tfidf)
csim <- tfidf / sqrt(rowSums(tfidf * tfidf))
csim <- csim %*% t(csim)
cdist <- as.dist(1 - csim)
hc <- hclust(cdist, "ward.D")
clustering <- cutree(hc, 10)
p_words <- colSums(dtm) / sum(dtm)
cluster_words <- lapply(unique(clustering), function(x){
rows <- dtm[ clustering == x , ]
# for memory's sake, drop all words that don't appear in the cluster
rows <- rows[ , colSums(rows) > 0 ]
colSums(rows) / sum(rows) - p_words[ colnames(rows) ]
})
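dtm is a sparse matrix. Your filter rows <- dtm[ clustering == x , ] returns a new sparse matrix for x = 1 or 2, but in your example, for x = 3 it becomes a plain vector. That is what triggers the error in colSums(), which cannot take a vector as input.
This happens because cluster 3 contains only one document, i.e. a single row of the matrix, and R's [ operator drops a one-row result to a vector by default. Adding drop = FALSE keeps it a matrix: rows <- dtm[ clustering == x , , drop = FALSE ]. The second filter, rows[ , colSums(rows) > 0 ], needs drop = FALSE for the same reason whenever only one term survives.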
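Here is a minimal sketch of that fix. It assumes the Matrix package, which provides the sparse dgCMatrix class used for dtm. The four-document matrix and the cluster labels are hypothetical toy data, chosen so that cluster 2 holds a single document. The loop mirrors the question's cluster_words code, with drop = FALSE added to both subsetting steps.

```r
library(Matrix)

# Hypothetical toy data standing in for the real dtm: 4 documents, 3 terms,
# stored as a sparse dgCMatrix.
dtm <- sparseMatrix(i = c(1, 1, 2, 3, 4, 4),
                    j = c(1, 2, 2, 3, 1, 3),
                    x = c(2, 1, 3, 1, 1, 4),
                    dims = c(4, 3),
                    dimnames = list(paste0("doc", 1:4), c("sars", "mxa", "ifn")))

# Hypothetical cluster labels: cluster 2 holds a single document (doc3).
clustering <- c(doc1 = 1, doc2 = 1, doc3 = 2, doc4 = 3)

# With the default drop = TRUE, a one-row selection loses its dimensions.
single <- dtm[clustering == 2, ]
print(is.null(dim(single)))

p_words <- colSums(dtm) / sum(dtm)

cluster_words <- lapply(unique(clustering), function(x) {
  # drop = FALSE keeps a one-document cluster as a 1 x n matrix
  rows <- dtm[clustering == x, , drop = FALSE]
  # keep only terms that occur in this cluster; drop = FALSE again in case
  # a single term survives
  rows <- rows[, colSums(rows) > 0, drop = FALSE]
  colSums(rows) / sum(rows) - p_words[colnames(rows)]
})

print(sapply(cluster_words, length))
```

Without drop = FALSE, the single-document cluster would come back as a plain vector, and colSums() would raise the same error as in the question.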
confirmatory tests to be developed validated and manufactured in sufficient quantitiesThe ability to triage patients at the point of care and support the guidance of medical and therapeutic decisions for viral isolation or confirmatory testing or for appropriate treatment of COVID-19 and/or bacterial infections is a critical component to our national pandemic response and there is an urgent need to implement the proposed strategy to combat the current outbreak",
"At present more than 200 countries and territories are directly affected by the coronavirus disease-19 COVID-19 pandemic Incidence and case fatality rate are significantly higher among elderly individuals age60 years type 2 diabetes and hypertension patients Cellular receptor ACE2 serine protease TMPRSS2 and exopeptidase CD26 also known as DPP4 are the three membrane bound proteins potentially implicated in SARS-CoV-2 infection We hypothesised that common variants from TMPRSS2 and CD26 may play critical role in infection susceptibility of predisposed population or group of individuals Coding missense and regulatory variants from TMPRSS2 and CD26 were studied across 26 global populations Two missense and five regulatory SNPs were identified to have differential allelic frequency Significant linkage disequilibrium LD signature was observed in different populations Modelled protein-protein interaction PPI predicted strong molecular interaction between these two receptors and SARS-CoV-2 spike protein S1 domain However two missense SNPs rs12329760 TMPRSS2 and rs1129599 CD26 were not found to be involved physically in the said interaction Four regulatory variants rs112657409 rs11910678 rs77675406 and rs713400 from TMPRSS2 were found to influence the expression of TMPRSS2 and pathologically relevant MX1 rs13015258 a 50 UTR variant from CD26 have significant role in regulation of expression of key regulatory genes that could be involved in SARS-CoV-2 internalization Overexpression of CD26 through epigenetic modification at rs13015258-C allele was found critical and could explain the higher SARS-CoV-2 infected fatality rate among type 2 diabetes",
"Antimicrobial peptides AMPs are primarily known for their innate immune defense against invading microorganisms including viruses In addition recent research has suggested their modulatory activity in immune induction Given that most subunit vaccines require an adjuvant to achieve effective immune induction through the activation of innate immunity AMPs are plausible candidate molecules for stimulating not only innate immune but also adaptive immune responses In this study we investigated the ability of human -defensin HBD 2 to promote antiviral immunity in vitro and in vivo using a receptor-binding domain RBD of Middle East respiratory syndrome-coronavirus MERS-CoV spike protein S RBD as a model antigen Ag When HBD 2-conjugated S RBD was used to treat THP-1 human monocytic cells the expression levels of antiviral IFN- IFN- MxA PKR and RNaseL and primary immune-inducing NOD2 TNF- IL-1 and IL-6 molecules were enhanced compared to those expressed after treatment with S RBD only The expression of chemokines capable of recruiting leukocytes including monocytes/macrophages natural killer cells granulocytes T cells and dendritic cells was also increased following HBD 2-conjugated S RBD treatment More important immunization of mice with HBD 2-conjugated S RBD enhanced the immunogenicity of the S RBD and elicited a higher S RBD-specific neutralizing antibody response than S RBD alone We conclude that HBD 2 activates the primary antiviral innate immune response and may also mediate the induction of an effective adaptive immune response against a conjugated Ag",
"Glucose-6-phosphate dehydrogenase G6PD-deficient cells are highly susceptible to viral infection This study examined the mechanism underlying this phenomenon by measuring the expression of antiviral genes-tumor necrosis factor alpha TNF- and GTPase myxovirus resistance 1 MX1-in G6PD-knockdown cells upon human coronavirus 229E HCoV-229E and enterovirus 71 EV71 infection Molecular analysis revealed that the promoter activities of TNF- and MX1 were downregulated in G6PD-knockdown cells and that the IB degradation and DNA binding activity of NF-B were decreased The HSCARG protein a nicotinamide adenine dinucleotide phosphate NADPH sensor and negative regulator of NF-B was upregulated in G6PD-knockdown cells with decreased NADPH/NADP ratio Treatment of G6PD-knockdown cells with siRNA against HSCARG enhanced the DNA binding activity of NF-B and the expression of TNF- and MX1 but suppressed the expression of viral genes however the overexpression of HSCARG inhibited the antiviral response Exogenous G6PD or IDH1 expression inhibited the expression of HSCARG resulting in increased expression of TNF- and MX1 and reduced viral gene expression upon virus infection Our findings suggest that the increased susceptibility of the G6PD-knockdown cells to viral infection was due to impaired NF-B signaling and antiviral response mediated by HSCARG",
"In this study we assessed the ability of Middle East respiratory syndrome coronavirus MERS-CoV to replicate and induce innate immunity in human monocyte-derived macrophages and dendritic cells MDDCs and compared it with severe acute respiratory syndrome coronavirus SARS-CoV Assessments of viral protein and RNA levels in infected cells showed that both viruses were impaired in their ability to replicate in these cells Some induction of IFN-1 CXCL10 and MxA mRNAs in both macrophages and MDDCs was seen in response to MERS-CoV infection but almost no such induction was observed in response to SARS-CoV infection ELISA and Western blot assays showed clear production of CXCL10 and MxA in MERS-CoV-infected macrophages and MDDCs Our data suggest that SARS-CoV and MERS-CoV replicate poorly in human macrophages and MDDCs but MERS-CoV is nonetheless capable of inducing a readily detectable host innate immune response Our results highlight a clear difference between the viruses in activating host innate immune responses in macrophages and MDDCs which may contribute to the pathogenesis of infection",
"Type I interferon induced MxA response can differentiate viral from bacterial infections but MxA responses in rhinovirus or asymptomatic virus infections are not known To study MxA protein levels in healthy state and during respiratory virus infection of young children in an observational prospective cohort Blood samples and nasal swabs were collected from 153 and 77 children with and without symptoms of respiratory infections respectively Blood MxA protein levels were measured by an enzyme immunoassay and PCR methods were used for the detection of respiratory viruses in nasal swabs Respiratory viruses were detected in 81 of symptomatic children They had higher blood MxA protein levels median interquartile range than asymptomatic virus-negative children 695 345-1370 g/L vs 110 55-170 g/L p 0001 Within asymptomatic children no significant difference was observed in MxA responses between virus-positive and virus-negative groups A cut-off level of 175 g/L had 92 sensitivity and 77 specificity for a symptomatic respiratory virus infection Rhinovirus respiratory syncytial virus parainfluenza virus influenza virus coronavirus and human metapneumovirus infections were associated with elevated MxA responses Asymptomatic virus-negative children vaccinated with a live virus vaccine had elevated MxA protein levels 240 120-540 g/L but significantly lower than children with an acute respiratory infection who had not received vaccinations 740 350-1425 g/L p0001 Blood MxA protein levels are increased in young children with symptomatic respiratory virus infections including rhinovirus infections MxA is an informative general marker for the most common acute virus infections",
"Myxovirus resistance A MxA is an antiviral protein induced by interferon alpha and beta IFN-alpha IFN-beta that can inhibit viral replication The minor alleles of the -88GT and -123CA MxA promoter single-nucleotide polymorphisms SNPs are associated with increased promoter activity and altered response to IFN-alpha and IFN-beta treatment Here we demonstrate that the -123A minor allele provided stronger binding affinity to nuclear proteins extracted from IFN-beta-untreated cells than did the wild-type allele whereas the -88T allele showed preferential binding after IFN-beta stimulation Endogenous IFN-alpha and IFN-beta induction can be suppressed in severe acute respiratory syndrome SARS coronavirus infection In support of our in vitro findings a large case-control genetic-association study for SARS coronavirus infection confirmed that the -123A minor-allele carriers were significantly associated with lower risk of SARS coronavirus infection whereas the -88T minor-allele carriers were insignificant after adjustment for confounding effects This suggests that -123CA plays a more important role in modulating basal MxA expression thus contributing more significantly to innate immune response against viral infections that suppress endogenous IFN-alpha and IFN-beta induction such as SARS coronavirus",
"Host genetic factors may play a role in susceptibility and resistance to SARS associated coronavirus SARS-CoV infection The study was carried out to investigate the association between the genetic polymorphisms of 25-oligoadenylate synthetase 1 OAS1 gene as well as myxovirus resistance 1 MxA gene and susceptibility to SARS in Chinese Han population A hospital-based case-control study was conducted A collective of 66 SARS cases and 64 close contact uninfected controls were enrolled in this study End point real time polymerase chain reaction PCR and PCR-based Restriction Fragment Length Polymorphism RFLP analysis were used to detect the single nucleic polymorphisms SNPs in OAS1 and MxA genes Information on other factors associated with SARS infection was collected using a pre-tested questionnaire Univariate and multivariate logistic analyses were conducted One polymorphism in the 3-untranslated region 3-UTR of the OAS1 gene was associated with SARS infection Compared to AA genotype AG and GG genotypes were found associated with a protective effect on SARS infection with ORs 95 CI of 042 020-089 and 030 009-097 respectively Also a GT genotype at position 88 in the MxA gene promoter was associated with increased susceptibility to SARS infection compared to a GG genotype OR 306 95 CI 125-750 The associations of AG genotype in OAS1 and GT genotype in MxA remained significant in multivariate analyses after adjusting for SARS protective measures OR 038 95 CI 014-098 and OR 322 95 CI 113-918 respectively SNPs in the OAS1 3-UTR and MxA promoter region appear associated with host susceptibility to SARS in Chinese Han population",
"SARS coronavirus SARS-CoV is the etiologic agent of the severe acute respiratory syndrome SARS-CoV mainly infects tissues of non-lymphatic origin and the cytokine profile of those cells can determine the course of disease Here we investigated the cytokine response of two human non-lymphatic cell lines Caco-2 and HEK 293 which are fully permissive for SARS-CoV A comparison with established cytokine-inducing viruses revealed that SARS-CoV only weakly triggered a cytokine response In particular SARS-CoV did not activate significant transcription of the interferons IFN-alpha IFN-beta IFN-lambda1 IFN-lambda2/3 as well as of the interferon-induced antiviral genes ISG56 and MxA the chemokine RANTES and the interleukine IL-6 Interestingly however SARS-CoV strongly induced the chemokines IP-10 and IL-8 in the colon carcinoma cell line Caco-2 but not in the embryonic kidney cell line 293 Our data indicate that SARS-CoV suppresses the antiviral cytokine system of non-immune cells to a large extent thus buying time for dissemination in the host However synthesis of IP-10 and IL-8 which are established markers for acute-stage SARS escapes the virus-induced silencing at least in some cell types Therefore the progressive infiltration of immune cells into the infected lungs observed in SARS patients could be due to the production of these chemokines by the infected tissue cells",
"Activation of host innate immune responses was studied in severe acute respiratory syndrome coronavirus SCV-infected human A549 lung epithelial cells macrophages and dendritic cells DCs In all cell types SCV-specific subgenomic mRNAs were seen whereas no expression of SCV proteins was found No induction of cytokine genes alpha interferon IFN-alpha IFN-beta interleukin-28A/B IL-28A/B IL-29 tumor necrosis factor alpha CCL5 or CXCL10 or IFN-alpha/beta-induced MxA gene was seen in SCV-infected A549 cells macrophages or DCs SCV also failed to induce DC maturation CD86 expression or enhance major histocompatibility complex class II expression Our data strongly suggest that SCV fails to activate host cell cytokine gene expression in human macrophages and DCs",
"To investigate the association between the genetic polymorphisms of myxovirus resistance 1 MxA gene and susceptibility to severe acute respiratory syndromes SARS A case-control study was conducted and polymerase chain reaction-restriction fragment length polymorphism PCR-RFLP was used to detect the T/G polymorphism at position-88 in the mxA gene promoter Information on related factors of SARS was collected using a pre-testing questionnaire Univariate and multivariate logistic analyses were conducted with SPSS software package Sixty-six cases and sixty-four controls were selected for the study Comparing with GG genotype the proportion of GT genotype were significantly higher in the case group 813 than that in the control group 625 with an OR 95 CI of 2700 1208-6037 Multivariate logistic regression analysis revealed that the significant association remained after factors as wearing masks protection gowns and eye-protection when contacting with SARS patient etc were adjusted with an OR 95 CI of 2911 1027-8250 mxA promoter-88G/T SNP might be confered to host genetic susceptibility to SARS in Chinese Han population",
"We hypothesized that host antiviral genes induced by type I interferons might affect the natural course of severe acute respiratory syndrome SARS We analyzed single nucleotide polymorphisms SNPs of 25-oligoadenylate synthetase 1 OAS-1 myxovirus resistance-A MxA and double-stranded RNA-dependent protein kinase in 44 Vietnamese SARS patients with 103 controls The G-allele of non-synonymous A/G SNP in exon 3 of OAS-1 gene showed association with SARS p00090 The G-allele in exon 3 of OAS-1 and the one in exon 6 were in strong linkage disequilibrium and both of them were associated with SARS infection The GG genotype and G-allele of G/T SNP at position -88 in the MxA gene promoter were found more frequently in hypoxemic group than in non-hypoxemic group of SARS p00195 Our findings suggest that polymorphisms of two IFN-inducible genes OAS-1 and MxA might affect susceptibility to the disease and progression of SARS at each level",
"Vero E6 African green monkey kidney cells are highly susceptible to infection with the newly emerging severe acute respiratory syndrome coronavirus SARS-CoV and they are permissive for rapid viral replication with resultant cytopathic effects We employed cDNA microarray analysis to characterize the cellular transcriptional responses of homologous human genes at 12 h post-infection Seventy mRNA transcripts belonging to various functional classes exhibited significant alterations in gene expression There was considerable induction of heat shock proteins that are crucial to the immune response mechanism Modified levels of several transcripts involved in pro-inflammatory and anti-inflammatory processes exemplified the balance between opposing forces during SARS pathogenesis Other genes displaying altered transcription included those associated with host translation cellular metabolism cell cycle signal transduction transcriptional regulation protein trafficking protein modulators and cytoskeletal proteins Alterations in the levels of several novel transcripts encoding hypothetical proteins and expressed sequence tags were also identified In addition transcription of apoptosis-related genes DENN and hIAP1 was upregulated in contrast to FAIM Elevated Mx1 expression signified a strong host response to mediate antiviral resistance Also expressed in infected cells was the C-terminal alternative splice variant of the p53 tumor suppressor gene encoding a modified truncated protein that can influence the activity of wild-type p53 We observed the interplay between various mechanisms to favor virus multiplication before full-blown apoptosis and the triggering of several pathways in host cells in an attempt to eliminate the pathogen Microarray analysis identifies the critical host-pathogen interactions during SARS-CoV infection and provides new insights into the pathophysiology of SARS",
"There is currently an urgent need to identify effective antiviral agents that will prevent and treat severe acute respiratory syndrome coronavirus SARS-CoV infection In this study we have investigated and compared the antiviral effect of different interferons IFNs on SARS-CoV replication in the epithelial kidney monkey Vero cell line The results showed that SARS-CoV grown in Vero cells is moderately sensitive to IFN-beta and only weakly sensitive to IFN-alpha and IFN-gamma in comparison to other IFN-sensitive viruses such as those for encephalomyocarditis vesicular stomatitis and Newcastle disease Simultaneous incubation of Vero cells with IFN-beta and IFN-gamma indicated that they may act synergistically against SARS-CoV replication The IFN-induced MxA protein was detected in the IFN-treated Vero cells The data however suggest that the antiviral activity of IFN against SARS-CoV virus is independent of MxA expression",
"Severe acute respiratory syndrome SARS is caused by a novel coronavirus termed SARS-CoV No antiviral treatment has been established so far Interferons are cytokines which induce the synthesis of several antivirally active proteins in the cell In this study we demonstrated that multiplication of SARS-CoV in cell culture can be strongly inhibited by pretreatment with interferon-beta Interferon-alpha and interferon-gamma by contrast were less effective The human MxA protein is one of the most prominent proteins induced by interferon-beta Nevertheless no interference with SARS-CoV replication was observed in Vero cells stably expressing MxA Therefore other interferon-induced proteins must be responsible for the strong inhibitory effect of interferon-beta against SARS-CoV"
)), class = "data.frame", row.names = c(NA, -17L))
library(textmineR)
library(dplyr) # pipes
library(stringi) # for stri_enc_isutf8
# The code below uses a regular expression to clean the abstracts. You may need to adjust
# the last part of the character class, which selects the punctuation to keep.
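my_entrez_df_v1 <- my_entrez_df %>%
  # keep letters, digits, blanks and ? & / -; a '-' placed last in the class is literal
  # ("\-" is not a valid escape in an R string and fails to parse)
  mutate(Abstract = gsub("[^[:alnum:][:blank:]?&/-]", "", Abstract)) %>%
  rename(Abstract_v1 = Abstract)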
# check that every cleaned abstract is valid UTF-8
all(stri_enc_isutf8(my_entrez_df_v1$Abstract_v1))
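To see what the cleaning regex keeps, here is a minimal sketch on a made-up sentence (the string is hypothetical, not taken from the data). It also shows why values such as 0.001 appear as 0001 in the cleaned abstracts: the decimal point is stripped along with the other punctuation, while the hyphen survives because it sits last in the bracket expression.

```r
# Hypothetical input sentence containing punctuation the regex should remove
abstract <- "Induction of IL-6 (and TNF) was seen; p < 0.001 [n=12]."

# Keep letters, digits, blanks and the characters ? & / -; drop everything else.
# A '-' at the end of a bracket expression is a literal hyphen, so no escaping is needed.
clean <- gsub("[^[:alnum:][:blank:]?&/-]", "", abstract)
print(clean)
# [1] "Induction of IL-6 and TNF was seen p  0001 n12"
```

Note the double space left where `<` was removed; textmineR's tokenizer splits on whitespace, so this does not create empty terms.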
# create a document-term matrix
dtm <- CreateDtm(doc_vec = my_entrez_df_v1$Abstract_v1, # character vector of documents
                 doc_names = my_entrez_df_v1$PMID, # document names
                 ngram_window = c(1, 2), # minimum and maximum n-gram length
                 stopword_vec = c(stopwords::stopwords("en"), # English stopwords (stopwords package)
                                  stopwords::stopwords(source = "smart")), # this is the default value
                 lower = TRUE, # lowercase - this is the default value
                 remove_punctuation = TRUE, # punctuation - this is the default
                 remove_numbers = TRUE, # numbers - this is the default
                 verbose = FALSE, # turn off the status bar
                 cpus = 2) # default is all available cpus on the system
# term frequencies, document frequencies and the IDF vector for every term in the DTM
tf_mat <- TermDocFreq(dtm)
# TF-IDF and cosine similarity
tfidf <- t(dtm[ , tf_mat$term ]) * tf_mat$idf
tfidf <- t(tfidf)
csim <- tfidf / sqrt(rowSums(tfidf * tfidf))
csim <- csim %*% t(csim)
cdist <- as.dist(1 - csim)
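The TF-IDF weighting and cosine distance above can be checked by hand on a tiny dense matrix. This is a sketch with hypothetical counts and plain base R instead of textmineR's sparse `dtm`; the IDF is computed as log(N / document frequency), which I am assuming matches the `idf` column returned by `TermDocFreq()`. `sweep()` multiplies each term column by its IDF, which is what the `t(...) * idf` then `t(...)` pair does.

```r
# Hypothetical 3-document x 4-term count matrix
dtm_toy <- matrix(c(2, 1, 0, 0,
                    1, 1, 0, 0,
                    0, 0, 3, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("d1", "d2", "d3"),
                                  c("sars", "mxa", "ifn", "cell")))

# IDF = log(N / number of documents containing the term) (assumed TermDocFreq definition)
doc_freq <- colSums(dtm_toy > 0)
idf <- log(nrow(dtm_toy) / doc_freq)

# Multiply each term column by its IDF
tfidf <- sweep(dtm_toy, 2, idf, `*`)

# Divide each row by its L2 norm; a length-nrow vector recycles down the columns,
# so row i is divided by the i-th norm
tfidf_norm <- tfidf / sqrt(rowSums(tfidf * tfidf))

# Dot products of unit-length rows are cosine similarities
csim <- tfidf_norm %*% t(tfidf_norm)
cdist <- as.dist(1 - csim)

# d1 and d2 share terms (cosine 3/sqrt(10), about 0.949); d3 shares none (cosine 0)
round(csim, 3)
```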
hc <- hclust(cdist, "ward.D")
clustering <- cutree(hc, 10)
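Because 17 abstracts are cut into 10 clusters, it is worth checking cluster sizes before looping over them. The sketch below uses 17 random stand-in points (hypothetical data, not the abstracts) and the same `hclust()`/`cutree()` calls; on the real data, `table(clustering)` does the same job.

```r
set.seed(42)

# 17 hypothetical "documents" in a 5-dimensional feature space (random stand-in data)
x <- matrix(rnorm(17 * 5), nrow = 17)

hc <- hclust(dist(x), method = "ward.D")
clustering <- cutree(hc, k = 10)

# Number of documents per cluster; any 1 here is a single-row cluster
sizes <- table(clustering)
print(sizes)

# Clusters that contain exactly one document
singletons <- as.integer(names(sizes)[sizes == 1])
print(singletons)
```

Whatever the data, 17 documents in 10 clusters always produce at least three singletons, because ten clusters of two or more documents would need at least 20.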
p_words <- colSums(dtm) / sum(dtm)
cluster_words <- lapply(unique(clustering), function(x){
rows <- dtm[ clustering == x , ]
# for memory's sake, drop all words that don't appear in the cluster
rows <- rows[ , colSums(rows) > 0 ]
colSums(rows) / sum(rows) - p_words[ colnames(rows) ]
})
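dtm is a sparse matrix. Your filter rows <- dtm[ clustering == x , ]
returns a new sparse matrix for x = 1 or 2, but in your example, for x = 3 it collapses into a plain numeric vector. That is what triggers the error in colSums(), which cannot take a vector as input.
This is probably because cluster 3 contains a single document, i.e. only one row of the matrix matches, and subsetting a single row drops the dimensions by default (drop = TRUE). With 17 documents cut into 10 clusters, at least three clusters must be singletons, since ten clusters of two or more documents would need at least 20. Subsetting with dtm[ clustering == x , , drop = FALSE ] keeps a one-row matrix; the column filter on the next line needs drop = FALSE for the same reason.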
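A minimal sketch of the fix, using a small hypothetical sparse matrix of class dgCMatrix from the Matrix package (the same sparse class that textmineR's `CreateDtm()` returns). The document names, terms and cluster labels are made up; cluster 2 deliberately contains a single document so the failure and the fix can both be seen.

```r
library(Matrix)

# Hypothetical sparse DTM: 3 documents x 3 terms
dtm <- Matrix(c(2, 0, 1,
                0, 1, 1,
                3, 0, 0),
              nrow = 3, byrow = TRUE, sparse = TRUE,
              dimnames = list(c("doc1", "doc2", "doc3"),
                              c("mxa", "sars", "ifn")))
clustering <- c(doc1 = 1, doc2 = 1, doc3 = 2)  # cluster 2 is a singleton

# Default drop = TRUE: a single matching row becomes a plain numeric vector
single <- dtm[clustering == 2, ]
is.null(dim(single))  # TRUE
err <- tryCatch(colSums(single), error = function(e) e)
# err: 'x' must be an array of at least two dimensions

p_words <- colSums(dtm) / sum(dtm)

cluster_words <- lapply(unique(clustering), function(x) {
  # drop = FALSE keeps a 1 x n matrix even for single-document clusters
  rows <- dtm[clustering == x, , drop = FALSE]
  # keep only words that occur in the cluster; drop = FALSE again in case only one remains
  rows <- rows[, colSums(rows) > 0, drop = FALSE]
  colSums(rows) / sum(rows) - p_words[colnames(rows)]
})
```

Here `cluster_words[[2]]` is a single named value for `mxa` instead of an error, because the one-row subset is still a matrix.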