Posts

Final Project: disordR

disordR: a minimal IDP toolkit in R

Project Goal

I built disordR to provide a small toolkit for analyzing intrinsically disordered proteins (IDPs). The package includes functions for amino-acid properties, classic Uversky charge-hydropathy metrics and plots, and a simple consensus combiner for per-residue disorder scores. A small bundled dataset makes testing the functions simple and reproducible.

Design Reasoning

I chose four functions to cover common IDP tasks:

aa_props() → mean Kyte–Doolittle hydropathy, net charge, and fraction of charged residues (FCR).
uversky_metrics() → scaled hydropathy (0–1) and mean net charge per residue, with a simple IDP vs. Ordered call.
uversky_plot() → classic Uversky scatter with the boundary line for visual interpretation.
consensus_disorder() → mean consensus of predictor scores.

I limited dependencies to tibble and ggplot2 to make installs reliable and simple. A future direction of this package could include lig...
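To make the charge-hydropathy idea concrete, here is a minimal base-R sketch of the kind of calculation described above. It is not the package's actual code: the function name uversky_sketch() is hypothetical, histidine is treated as uncharged, and the boundary line used is the commonly cited Uversky relation (mean net charge = 2.785 × scaled hydropathy − 1.151).

```r
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5,
        Q = -3.5, E = -3.5, G = -0.4, H = -3.2, I = 4.5,
        L = 3.8, K = -3.9, M = 1.9, F = 2.8, P = -1.6,
        S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

# Hypothetical helper (not part of disordR): Uversky-style metrics for one sequence
uversky_sketch <- function(seq) {
  aa <- strsplit(toupper(seq), "")[[1]]
  # Defensive checks: non-empty input made only of standard residues
  stopifnot(length(aa) > 0, all(aa %in% names(kd)))

  # Rescale Kyte-Doolittle values from [-4.5, 4.5] to [0, 1], then average
  h_scaled <- mean((kd[aa] + 4.5) / 9)

  # Assumption: K and R count as +1, D and E as -1, histidine as neutral
  n_pos <- sum(aa %in% c("K", "R"))
  n_neg <- sum(aa %in% c("D", "E"))
  mean_net_charge <- abs(n_pos - n_neg) / length(aa)
  fcr <- (n_pos + n_neg) / length(aa)

  # Sequences above the boundary line are called disordered
  boundary <- 2.785 * h_scaled - 1.151
  list(hydropathy = h_scaled,
       mean_net_charge = mean_net_charge,
       fcr = fcr,
       call = if (mean_net_charge > boundary) "IDP" else "Ordered")
}

charged <- uversky_sketch("KKKKEEEE")
greasy  <- uversky_sketch("IIIILLLL")
```

In this sketch the all-charged toy sequence falls on the disordered side of the line and the all-hydrophobic one on the ordered side, which matches the intuition behind the plot.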

Assignment #12 - R Markdown

For this week's assignment, we were tasked with learning the basics of R Markdown authoring and document structure by practicing embedding R code, narrative text, and LaTeX math in a single file. Here are some of the things I learned:  While building this document, I learned that Markdown has specific syntax to follow. Starting a line with "##" generates a second-level header, and any text written without it displays as regular paragraph text. To create a code chunk, you wrap it in triple backticks (```) and include {r chunk_name} after the opening backticks. If you want to hide the code in the rendered output while keeping its results, you would write {r chunk_name, echo=FALSE}; include=FALSE runs the chunk but hides both the code and its output. The code chunk generated with the document (shown below) essentially creates a rule that the R code will be displayed in each chunk by default after knitting.  LaTeX math has two modes, inline or display. The inline mode uses a single $ on each end of the equation (e.g., $a^2 + b^2 = c^2$), whereas the display mode uses two $$ on each end (e....

Assignment #11 - Debugging and Defensive Programming in R

This week, we learned about debugging and defensive programming in R. We were provided with the code below and asked to test it on a matrix to determine whether any issues arose when running it.

The initial error that RStudio displayed was:

    Error in tukey_multiple(test_mat) :
      could not find function "tukey_multiple"

When running the test matrix below, I discovered two bugs in the code.

    set.seed(123)
    test_mat <- matrix(rnorm(50), nrow = 10)
    tukey_multiple(test_mat)

The first issue was that tukey.outlier() had not yet been defined. I defined that function first, using these lines, so that R knew which function I was trying to call.

    tukey.outlier <- function(x) {
      q <- quantile(x, c(0.25, 0.75))
      H <- 1.5 * IQR(x)
      x < (q[1] - H) | x > (q[2] + H)
    }

The second issue came from the use of && in this line: "outliers[, j] <- outliers[, j] && tukey.outlier(x[, j])". The issue with this is that it is scalar lo...
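As an illustration of the fix, the sketch below swaps the scalar && for the elementwise &. Only the tukey.outlier() helper and the single && line are quoted in the post, so the rest of tukey_multiple() here is an assumed reconstruction, not the original assignment code. For context, & compares vectors element by element, while && expects length-one operands (recent R versions, 4.3.0 and later, raise an error when it receives longer vectors).

```r
# Helper quoted in the post: flags values more than 1.5 * IQR beyond the quartiles
tukey.outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  H <- 1.5 * IQR(x)
  x < (q[1] - H) | x > (q[2] + H)
}

# Assumed reconstruction: a row is flagged only if it is an outlier in every column
tukey_multiple <- function(x) {
  # Defensive check: fail early with a clear error on unsupported input
  stopifnot(is.matrix(x), is.numeric(x))

  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # Elementwise & keeps one logical value per row; && would not
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  # Collapse each row to a single TRUE/FALSE
  apply(outliers, 1, all)
}

set.seed(123)
test_mat <- matrix(rnorm(50), nrow = 10)
result <- tukey_multiple(test_mat)
```

The stopifnot() line is a small example of the defensive style: it rejects non-matrix or non-numeric input before the loop runs rather than failing somewhere deeper in the function.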

Assignment #10 - Building your own R package

This week, our objectives were:
- Learn the structure of an R package and the role of the DESCRIPTION file
- Practice writing machine-readable metadata (authors, dependencies, versioning)
- Draft a coherent package proposal for your final project
- Publish your package scaffold proposal online (GitHub and blog)

My R package: disordR

Purpose and scope of the project

The disordR package is a small toolkit designed for students or researchers studying intrinsically disordered proteins (IDPs). It has a simple pipeline that can be used in class projects or advanced research. First, it will calculate charge-hydropathy metrics (based on the Uversky plot). It will then create a concise output of disorder predictors, and finally, it will map AlphaFold pLDDT to assumed intrinsically disordered regions (IDRs).

Key functions I plan to implement

aa_props() - mean hydropathy, net charge, and fraction of charged residues
uversky_metrics() - hydropathy and charge per residue; classi...

Assignment #9 - Visualization in R – Base Graphics, Lattice, and ggplot2

R code: GitHub link

Generated plots:
Scatter plot:
Histogram:
Conditional scatter plot (lattice):
Scatter with linear trend (ggplot2):

Discussion:

How do the syntax and workflow differ between base, lattice, and ggplot2?
Base R is simple and step-by-step: you first call 'plot()', then add what you want line by line, such as points, lines, or legends. With lattice, you describe the plot once with a formula, 'xyplot(y ~ x | group, data=df)', plus optional formatting arguments, and it renders the plot from that. It is more rigid to edit: changes go into the original call rather than into additional lines. Finally, ggplot2 builds a plot by composition, data + aesthetics + layers, as in 'ggplot(df, aes(x, y, color=group)) + geom_point() + geom_smooth()', which makes it easy to tweak.

Which system gave you the most control or produced the most “publication‑quality” output with minimal code?
I would say that ggplot2 pr...

Assignment #8: Input/Output, String Manipulation, and the plyr Package

R script: GitHub link: Assignment_08_Input_Output_String_Manipulation

Outputs for each step:

Descriptions of each step:

Import data: I loaded the Assignment 6 dataset file using file.choose() so the file can be selected manually. The import then reads the data into a data frame with proper headers and types.
Load plyr and compute mean of grade by sex: I installed and loaded the 'plyr' package, grouped the rows by 'Sex', then calculated the average 'Grade' to summarize performance by gender.
Write grouped means to text file: I exported the summary to a tab-delimited "gender_mean.txt" file, making the results easy to read.
Filter names containing "i": Using grepl("i", Name, ignore.case=TRUE), I selected only students whose names contained "i" or "I" to practice filtering.
Export names only to a CSV: I saved the filtered names to "i_students.csv" to produce a compact output using the re...
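The steps above can be sketched end to end with a toy example. Several things here are assumptions for illustration only: the four-row data frame is invented (it is not the Assignment 6 dataset), base R's aggregate() stands in for the plyr grouping call so the sketch runs without extra packages, and the files are written to a temporary directory instead of the working directory.

```r
# Invented toy data standing in for the Assignment 6 dataset
students <- data.frame(
  Name  = c("Alice", "Bob", "Nina", "Omar"),
  Sex   = c("Female", "Male", "Female", "Male"),
  Grade = c(90, 80, 70, 60),
  stringsAsFactors = FALSE
)

# Mean Grade by Sex (base-R stand-in for the plyr grouping step)
gender_mean <- aggregate(Grade ~ Sex, data = students, FUN = mean)

# Write the grouped means to a tab-delimited text file
out_dir  <- tempdir()
txt_path <- file.path(out_dir, "gender_mean.txt")
write.table(gender_mean, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Keep only names containing "i" or "I"
i_names <- students$Name[grepl("i", students$Name, ignore.case = TRUE)]

# Export just the filtered names to a CSV
csv_path <- file.path(out_dir, "i_students.csv")
write.csv(data.frame(Name = i_names), csv_path, row.names = FALSE)
```

Reading the two files back with read.delim() and read.csv() is a quick way to confirm that the headers and values round-trip as expected.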

Assignment #7 - Exploring R's Object Oriented System (S3 & S4)

R code and outputs:

GitHub repo link: https://github.com/hannahcardenas4/r-programming-assignments/tree/main/R_Programming_Fall2025_Cardenas_Hannah/Assignment_07_Object_Oriented_Systems

Questions:

How can you tell whether an object uses S3 or S4? (Which functions inspect its class system?)
To determine whether an object uses S3 or S4, the first check can be done with the isS4() function. If it returns TRUE, it is an S4 object; otherwise, it is likely an S3 object or a base type. Another check is class(x), which shows the class label(s) attached to the object. S3 objects are typically simple structures, such as lists and data frames, with a class attribute. S4 objects are more formal, with declared slots and slot types.

How do you determine an object's underlying type (e.g., integer vs list)?
To determine an object's underlying type, you can use the function typeof(). Example outputs are "list", "double", or "character". To specifically test for an int...
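The checks described above can be tried on two small objects. The class names student_s3 and StudentS4 and their fields are hypothetical, chosen only for this sketch, and is.integer() is shown as one plausible way to test for integers rather than as a quote from the original answer.

```r
library(methods)  # provides setClass() and new()

# Hypothetical S3 object: a plain list with a class attribute attached
s3_obj <- structure(list(name = "Ada", age = 30L), class = "student_s3")

# Hypothetical S4 class with declared slots and slot types
setClass("StudentS4", representation(name = "character", age = "numeric"))
s4_obj <- new("StudentS4", name = "Ada", age = 30)

# Which class system is in use?
s3_is_s4 <- isS4(s3_obj)   # FALSE: S3 objects are not formal S4 instances
s4_is_s4 <- isS4(s4_obj)   # TRUE
s3_class <- class(s3_obj)  # the class label attached to the list

# Underlying storage type, independent of the class label
s3_type  <- typeof(s3_obj) # "list"
int_type <- typeof(1L)     # "integer"
dbl_type <- typeof(1)      # "double"

# Testing for a specific type directly
int_check <- is.integer(1L)  # TRUE
dbl_check <- is.integer(1)   # FALSE: a bare 1 is stored as a double
```

Note the difference between class() and typeof(): the S3 object reports its class label as "student_s3" while its storage type is still "list".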