I am sorry if this is basic, I am new to R and starting from scratch here.
I would like to plot a histogram of the following data (each sample has a Shannon diversity metric representing the richness and abundance of species in that sample).
Here is my data, currently a data frame (called shannon_divplot) with one column called shannon_diversity and 6 observations.
shannon_diversity
Control1 3.309361
Control2 3.664494
Control3 3.269842
Disease1 2.572888
Disease2 1.530877
Disease3 2.357401
I would like to plot a histogram that shows the Shannon diversity value for each sample. I then wish to compute a one-way ANOVA followed by a post-hoc Tukey test. Here is the code I have used; strangely, the hist() function is just creating a data frame, not an actual graph.
hist(shannon_divplot$shannondiversity,
main="Shannon Diversity",
xlab="Samples", breaks=15)
Will I need to convert my data frame to this:
Samples shannon_div
1 Control1 3.309361
2 Control2 3.664494
3 Control3 3.269842
4 Disease1 2.572888
5 Disease2 1.530877
6 Disease3 2.357401
And use code such as
plot(shannon_div ~Samples,
data=shannon_divplot,
main="Shannon Diversity", xlab="Sample" )
?
When calculating the ANOVA I am also going wrong, as p-values are not computed. It just gives me this:
Df Sum Sq Mean Sq
Samples 5 3.084 0.6168
aov.shannon = aov(shannon_div ~Samples, data=shannon_divplot)
summary(aov.shannon)
TukeyHSD(aov.shannon)
Apologies again if this is too basic, any help would be appreciated.
Edit: If I wanted to compare collectively control vs disease, so there would be more than one value for each group, how would I need to arrange my data frame or code in order to do this?
What you want is more usually called a bar plot: "histogram" is usually reserved for a plot that shows the frequency distribution of a continuous variable. barplot() will do what you want; you don't need to change the data format:
## input data
dd <- read.table(header=TRUE,text="
shannon_diversity
Control1 3.309361
Control2 3.664494
Control3 3.269842
Disease1 2.572888
Disease2 1.530877
Disease3 2.357401
")
barplot(dd$shannon_diversity,names.arg=rownames(dd),
ylab="Shannon diversity")
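To make the bar plot vs. histogram distinction concrete, here is a small sketch (reusing the dd data frame from above; the object name h is just for illustration). A histogram bins the six values and counts how many fall in each bin, so the sample names play no part in it:

```r
## same input data as above
dd <- read.table(header=TRUE,text="
shannon_diversity
Control1 3.309361
Control2 3.664494
Control3 3.269842
Disease1 2.572888
Disease2 1.530877
Disease3 2.357401
")
## plot=FALSE returns the binning without drawing anything
h <- hist(dd$shannon_diversity, plot=FALSE)
h$breaks   ## bin edges chosen by hist()
h$counts   ## number of samples per bin (adds up to 6)
```

So a histogram answers "how are the diversity values distributed?", whereas the bar plot answers "what is the value for each sample?", which is what the question asks for.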
Your other question is harder (and it's a statistical question, not a programming question). You can't do an ANOVA across groups unless you have replication within the groups. Since you only have one data point per treatment, ANOVA doesn't produce a p-value.
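A quick way to see this, as a sketch: if every sample is its own factor level (the Samples column here is created only for illustration, mirroring the layout in the question), the model fits each point exactly and there are no residual degrees of freedom left to estimate the error variance, so no F statistic or p-value can be computed.

```r
dd <- read.table(header=TRUE,text="
shannon_diversity
Control1 3.309361
Control2 3.664494
Control3 3.269842
Disease1 2.572888
Disease2 1.530877
Disease3 2.357401
")
## one factor level per sample, as in the question
dd$Samples <- factor(rownames(dd))
fit <- aov(shannon_diversity ~ Samples, data=dd)
df.residual(fit)   ## 6 observations - 6 estimated means = 0
summary(fit)       ## only Df, Sum Sq and Mean Sq are reported
```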
If you want to compare control vs disease (3 observations each):
dd$grp <- sub("[1-3]","",rownames(dd))
anova(lm(shannon_diversity ~ grp, data=dd))
(There's no point doing a Tukey post-hoc on an ANOVA with two groups [and in my opinion Tukey post-hoc tests are overused anyway ...])
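As an aside, with exactly two groups a one-way ANOVA is equivalent to a two-sample t-test with pooled variance, so either can be used. A minimal sketch checking this on the data above (the names p_anova and p_ttest are just for illustration):

```r
dd <- read.table(header=TRUE,text="
shannon_diversity
Control1 3.309361
Control2 3.664494
Control3 3.269842
Disease1 2.572888
Disease2 1.530877
Disease3 2.357401
")
## strip the replicate number to get the group label
dd$grp <- sub("[1-3]","",rownames(dd))
table(dd$grp)      ## 3 observations per group
## p-value from the one-way ANOVA
p_anova <- anova(lm(shannon_diversity ~ grp, data=dd))[["Pr(>F)"]][1]
## p-value from the equal-variance t-test
p_ttest <- t.test(shannon_diversity ~ grp, data=dd, var.equal=TRUE)$p.value
all.equal(p_anova, p_ttest)
```

Note that t.test() defaults to the Welch (unequal-variance) version, which would give a slightly different p-value; var.equal=TRUE is needed to match the ANOVA.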