
45 Analytic Techniques Used by Data Scientists
Saurabh Jain posted a blog entry in Analytics techniques
These techniques cover most of what data scientists and related practitioners use in their daily activities, whether they use solutions offered by a vendor or design proprietary tools.

The 45 data science techniques

1. Linear Regression
2. Logistic Regression
3. Jackknife Regression *
4. Density Estimation
5. Confidence Interval
6. Test of Hypotheses
7. Pattern Recognition
8. Clustering (aka Unsupervised Learning)
9. Supervised Learning
10. Time Series
11. Decision Trees
12. Random Numbers
13. Monte-Carlo Simulation
14. Bayesian Statistics
15. Naive Bayes
16. Principal Component Analysis (PCA)
17. Ensembles
18. Neural Networks
19. Support Vector Machine (SVM)
20. Nearest Neighbors (k-NN)
21. Feature Selection (aka Variable Reduction)
22. Indexation / Cataloguing *
23. (Geo-) Spatial Modeling
24. Recommendation Engine *
25. Search Engine *
26. Attribution Modeling *
27. Collaborative Filtering *
28. Rule System
29. Linkage Analysis
30. Association Rules
31. Scoring Engine
32. Segmentation
33. Predictive Modeling
34. Graphs
35. Deep Learning
36. Game Theory
37. Imputation
38. Survival Analysis
39. Arbitrage
40. Lift Modeling
41. Yield Optimization
42. Cross-Validation
43. Model Fitting
44. Relevancy Algorithm *
45. Experimental Design

Here are 11 basic Excel tricks that will change your life
Saurabh Jain posted a topic in Excel Fundamentals
Microsoft Excel is an amazing piece of software, and even regular users might not be getting as much out of it as they can. Improve your Excel efficiency and proficiency with these basic shortcuts and functions that absolutely everyone needs to know.

1. Jump from worksheet to worksheet with Ctrl + PgDn and Ctrl + PgUp

2. Jump to the end of a data range or the next data range with Ctrl + Arrow
Of course you can move from cell to cell with the arrow keys. But if you want to get around faster, hold down the Ctrl key and hit the arrow keys to jump farther.

3. Add the Shift key to select data
Ctrl + Shift + Arrow will extend the current selection to the last nonblank cell in that direction.

4. Double click to copy down
To copy a formula or value down the length of your data set, you don't need to hold and drag the mouse all the way down. Just double click the tiny box at the bottom right-hand corner of the cell.

5. Use shortcuts to quickly format values
For a number with two decimal places, use Ctrl + Shift + !. For dollars use Ctrl + Shift + $. For percentages it's Ctrl + Shift + %. The last two should be pretty easy to remember.

6. Lock cells with F4
When copying formulas in Excel, sometimes you want your input cells to move with your formulas BUT SOMETIMES YOU DON'T. When you want to lock one of your inputs, you need to put dollar signs before the column letter and row number. Typing in the dollar signs by hand is a huge waste of time. Instead, after you select your cell, hit F4 to insert the dollar signs and lock the cell. If you continue to hit the F4 key, it will cycle through the options: lock cell, lock row number, lock column letter, no lock.

7. Summarize data with COUNTIF and SUMIF
COUNTIF counts the number of times a value appears in a selected range. The first input is the range of values you want to count in. The second input is the criteria, or particular value, you are looking for. For example, to count the number of stories in column B written by the selected author: COUNTIF(range, criteria)
SUMIF adds up values in a range when the value in a corresponding range matches your criteria. Say we want the total number of views for each author. Our sum range is different from the range with the authors' names, but the two ranges are the same size. We add up the number of views in column E when the author name in column B matches the selected name: SUMIF(range, criteria, sum range)

8. Pull out the exact data you want with VLOOKUP
VLOOKUP looks for a value in the leftmost column of a data range and returns a value from any column to the right of it. Say we have a list of law schools with school rankings in the first column, and we want to use VLOOKUP to create a list of the top 5 ranked schools: VLOOKUP(lookup value, data range, column number, type)
The first input is the lookup value. Here we use the ranking we want to find. The second input is the data range, which contains the values we are looking up in its leftmost column and the information we're trying to get in the columns to the right. The third input is the column number of the value you want to return. We want the school name, which is in the second column of our data range. The last input tells Excel whether you want an exact match or an approximate match. For an exact match write FALSE or 0.

9. Use & to combine text strings
Say we have a column of first names and a column of last names. We can create a column with full names by using &. In Excel, & joins together two or more pieces of text. Don't forget to put a space between the names. Your formula will look like this: =[First Name]&" "&[Last Name]. You can mix cell references with actual text as long as the text you want to include is surrounded by quotes.

10. Clean up text with LEFT, RIGHT and LEN
These text formulas are great for cleaning up data. Say we have state abbreviations combined with state names, with a dash in between. We can use the LEFT function to return the state abbreviation. LEFT grabs a specified number of characters from the start of a text string. The first input is the text string. The second input is the number of characters you want. In our case, we want the first two characters: LEFT(text string, number of characters)
If you want to pull the names of the states out of this text string, you have to use the RIGHT function. RIGHT grabs a number of characters from the right end of a text string. But how many characters on the right do you want? All but three, since the state names all come after the state's two-letter abbreviation and a dash. This is where LEN comes in handy. LEN counts the number of characters, or length, of the text string: LEN(text string)
Now you can use a combination of RIGHT and LEN to pull out the state names. Since we want all but the first three characters, we take the length of our string, subtract 3, and pull that many characters from the right end of the string: RIGHT(text string, LEN(text string) - 3)

11. Generate random values with RAND
You can use the RAND() function to generate a random value between 0 and 1. Do not include any inputs; just leave the parentheses empty. New random values will be generated every time the workbook recalculates. You can force it to recalculate by hitting F9. But be careful: it also recalculates when you make other changes to the workbook. RAND()
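For readers who also work in Python, the LEFT / RIGHT / LEN combination from tip 10 can be sketched with string slicing. This is only an illustration: the sample strings and the function name split_state_label are made up here, and the sketch assumes every value follows the "two-letter abbreviation, dash, state name" layout described above.

```python
# Hypothetical sample values in the "XX-State Name" layout from tip 10.
labels = ["NY-New York", "CA-California", "TX-Texas"]


def split_state_label(text):
    """Mirror LEFT(text, 2) and RIGHT(text, LEN(text) - 3)."""
    abbreviation = text[:2]                    # LEFT(text, 2)
    keep = len(text) - 3                       # LEN(text) - 3
    name = text[-keep:] if keep > 0 else ""    # RIGHT(text, keep)
    return abbreviation, name


for label in labels:
    print(split_state_label(label))
```

The guard on keep avoids a slicing surprise: in Python, text[-0:] returns the whole string rather than an empty one.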
Hi friends, this is my first video tutorial on VLOOKUP. I hope you all like it. Download the exercise file: Vlookup example.xlsx

Standardization vs. normalization?
Saurabh Jain posted a topic in Data Management  Data Cleaning & Data Plumbing
In the overall knowledge discovery process, data preprocessing plays a crucial role before data mining itself. One of the first steps is rescaling the data. This step is very important when dealing with parameters of different units and scales. For example, some data mining techniques use the Euclidean distance, so all parameters should have the same scale for a fair comparison between them.

Two methods are well known for rescaling data. Normalization scales all numeric variables to the range [0, 1]. One possible formula is:

x' = (x - min(x)) / (max(x) - min(x))

On the other hand, you can use standardization on your data set. It transforms the data to have zero mean and unit variance, for example using:

z = (x - mean(x)) / std(x)

Both of these techniques have their drawbacks. If you have outliers in your data set, normalizing your data will certainly squeeze the "normal" data into a very small interval, and most data sets have outliers. When using standardization, your new data aren't bounded (unlike normalization).

So my question is: what do you usually use when mining your data, and why?
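To make the trade-off concrete, here is a minimal Python sketch of both rescalings. It assumes the usual min-max formula for normalization and the population standard deviation for standardization. The data values, including the single outlier, are invented for illustration, and the function names are my own.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit variance with (x - mean) / std."""
    mu = mean(values)
    sigma = pstdev(values)            # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical data: five ordinary readings and one outlier.
data = [10, 12, 11, 13, 12, 100]

normalized = min_max_normalize(data)
standardized = standardize(data)

print([round(v, 3) for v in normalized])
print([round(v, 3) for v in standardized])
```

With this data the five ordinary readings all land below 0.05 after normalization, which is the squeezing effect mentioned above, while the standardized outlier sits well outside any fixed interval.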
These 15 Excel keyboard shortcuts will help you become a spreadsheet master
Saurabh Jain posted a topic in Excel Fundamentals
There are tons of shortcuts for Excel out there; here's a list of 200 for Excel 2013 alone. But trying to learn such a large number can be overwhelming, confusing, and ultimately inefficient. Instead, we've put together a list of 15 shortcuts that will be immediately useful for most users. This list of keyboard shortcuts includes quick ways to format cells, navigate the program, and carry out a few operations. The list is based on Excel 2016, but most will also work on Excel 2013. When that's not the case, we've noted it.

Keyboard access to the ribbon: Similar to the Vim-inspired add-ons for Chrome and Firefox, Excel 2013 and 2016 have a feature called Key Tips. When you press Alt, Key Tips appear and the Ribbon menu is overlaid with letters. Pressing a letter launches the corresponding menu item.
Ctrl + PgDn: Switch between worksheet tabs, moving left to right.
Ctrl + PgUp: Switch between worksheet tabs, moving right to left.
F12: Display the "Save As" dialog.
Ctrl + Shift + $: (Excel 2016) Current cell formatted as currency, with two decimal places and negative numbers in parentheses.
Ctrl + Shift + %: (Excel 2016) Current cell formatted as percentage with no decimal places.
Ctrl + Shift + #: (Excel 2016) Current cell formatted as date with day, month, year.
Ctrl + Shift + ":": Insert current time.
Ctrl + ";": Insert current date.
F4: Repeats the last command or action, if possible.
Shift + Arrow key: Extends your current cell selection by one additional cell in the direction specified.
Ctrl + F1: Display or hide the Ribbon.
Alt + Shift + F1: Insert a new worksheet tab.
Ctrl + F4: Close the current workbook.
Ctrl + D: Launches the Fill Down command for the selected cells below. Fill Down copies the contents and format of the topmost cell in the column.
15 simple, yet powerful Excel functions you need to know
Saurabh Jain posted a topic in Formulas & Functions
Power users love to talk about how powerful and awesome Excel is, what with its Pivot Tables, nested formulas, and Boolean logic. But many of us barely know how to find the Autosum feature, let alone use Excel's functions to create powerful formulas. If that's you, here are 15 handy Excel functions that will get you well on your way to spreadsheet mastery.

=SUM()
The first function anyone should know tells Excel to carry out basic addition. Let's say you wanted the numbers in cell A2 and cell B2 to be added together in cell B3. All you'd do is enter =SUM into B3, click the =SUM entry in the pop-up that appears, Control + Click cells A2 and B2 to highlight both, then hit Enter. In an instant you'd see the sum of the two cells appear. You can use this function, and most of the functions listed here, to encompass as many cells as you require; just select all the ones you need included.

=AVERAGE()
AVERAGE does exactly what it says and works similarly to SUM. If you have a worksheet containing your monthly earnings over the past year in cells A2 through A13, type =AVERAGE(A2:A13) into A14 to get the average. Again, you can also use your mouse cursor to click-and-drag highlight a range of cells, or use Control + Click to handpick individual cells.

=MIN()
If you need to find the smallest number in a range of cells, MIN can help you do that. Using =MIN(B3:B39) will give you the smallest number contained in those cells.

=MAX()
MAX is the counterpart to MIN and does the opposite, showing you the largest number contained in a range of cells.

=TRIM()
If you copy text from another program into Excel, you can often end up with excess whitespace that turns your spreadsheet into a visual horror. TRIM can help you clean it up. TRIM can only deal with text from a single cell. So start by cleaning up the text in cell B1 by typing =TRIM(B1) into cell C1, for example, and then repeat the function for any other cells you need tidied up. If you're looking to clean up line breaks, try CLEAN instead of TRIM.

=COUNT()
If you need to know how many cells in a given range contain numbers, don't bother counting by hand; just use the COUNT function. If you have a mix of numbers and text in cells A1 through A20, for example, type =COUNT(A1:A20) into cell A21 and you'll have the answer in no time.

=COUNTA()
Similar to the above example, you can use =COUNTA() to count the number of cells in a given range that contain any characters, such as numbers, text, or symbols.

=LEN()
If you want to count the number of characters in a single cell, including white spaces, check out LEN. Want to know how many characters are in cell A1? Just type =LEN(A1) into a different cell and you'll find out.

=CONCATENATE()
This takes data from two cells and turns it into one. Check out our previous look at Excel functions to see how this can be useful.

=DAYS()
Want to know the number of days between two dates in a spreadsheet? If you had, for example, September 5, 2015 in cell A4 and December 27, 2015 in A5, just use =DAYS(A5, A4) to get the answer. (Reversing the order of the cells would give you a negative number.)

=NETWORKDAYS()
Knowing the number of days is great, but if you need to know how many work days that range encompasses, turn to NETWORKDAYS (that's "net work days") instead. This function takes the same two dates as DAYS, but you need to list the earlier date first to get a non-negative number. So =NETWORKDAYS(A4, A5) will give you 80 and not -80.

=SQRT()
Need to know the square root of 1764? Type =SQRT(1764) into a cell, hit Enter, and you'll find the answer to life, the universe, and everything, including the example formula.

=NOW()
Want to see the current date and time whenever you open a particular worksheet? Type =NOW() into the cell where you want the date and you're done. If you want to see a future date from right now, you can use something like =NOW()+32. The NOW function does not take any arguments, so don't put anything in between the brackets.

=ROUND()
As its name suggests, this function lets you round off numbers. ROUND requires two arguments: a number or cell, and the number of digits to round to. If you have the number 231.852645 in A1, for example, =ROUND(A1, 0) gives you 232, =ROUND(A1, 1) gives you 231.9, and =ROUND(A1, -1) returns 230.

=ROUNDUP(), =ROUNDDOWN()
If you want more direct control over rounding up or down, there are functions for that too. ROUNDUP and ROUNDDOWN use exactly the same argument format as ROUND.
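As a cross-check on the DAYS and NETWORKDAYS figures above, the same two dates can be run through a short Python sketch. The helper names days_between and network_days are my own, not Excel or library functions, and the sketch assumes a Monday to Friday work week with no holiday list, which matches NETWORKDAYS when its optional holidays argument is left out.

```python
from datetime import date, timedelta


def days_between(end, start):
    """Like =DAYS(end, start): positive when end is the later date."""
    return (end - start).days


def network_days(start, end):
    """Count Monday-Friday dates from start to end, inclusive.

    Returns a negative count when start is later than end.
    """
    if start > end:
        return -network_days(end, start)
    total = 0
    current = start
    while current <= end:
        if current.weekday() < 5:     # 0-4 are Monday to Friday
            total += 1
        current += timedelta(days=1)
    return total


a4 = date(2015, 9, 5)     # September 5, 2015
a5 = date(2015, 12, 27)   # December 27, 2015

print(days_between(a5, a4))   # 113
print(days_between(a4, a5))   # -113
print(network_days(a4, a5))   # 80
print(network_days(a5, a4))   # -80
```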
For data analysis and interactive, exploratory computing and data visualization, Python will inevitably draw comparisons with the many other domain-specific open source and commercial programming languages and tools in wide use, such as R, MATLAB, SAS, Stata, and others. In recent years, Python's improved library support (primarily pandas) has made it a strong alternative for data manipulation tasks. Combined with Python's strength in general-purpose programming, it is an excellent choice as a single language for building data-centric applications.

Solving the "Two-Language" Problem

In many organizations, it is common to research, prototype, and test new ideas using a more domain-specific computing language like MATLAB or R, then later port those ideas to be part of a larger production system written in, say, Java, C#, or C++. What people are increasingly finding is that Python is a suitable language not only for research and prototyping but also for building the production systems. I believe that more and more companies will go down this path, as there are often significant organizational benefits to having both scientists and technologists using the same set of programmatic tools.

Essential Python Libraries

For those who are less familiar with the scientific Python ecosystem and the libraries used in data analysis, I present the following overview of some libraries.

1. NumPy

NumPy, short for Numerical Python, is the foundational package for scientific computing in Python. It provides, among other things:

• A fast and efficient multidimensional array object, ndarray
• Functions for performing element-wise computations with arrays or mathematical operations between arrays
• Tools for reading and writing array-based data sets to disk
• Linear algebra operations, Fourier transform, and random number generation
• Tools for connecting C, C++, and Fortran code to Python

Beyond the fast array-processing capabilities that NumPy adds to Python, one of its primary purposes with regard to data analysis is as the primary container for data to be passed between algorithms. For numerical data, NumPy arrays are a much more efficient way of storing and manipulating data than the other built-in Python data structures. Also, libraries written in a lower-level language, such as C or Fortran, can operate on the data stored in a NumPy array without copying any data.

2. pandas

pandas provides rich data structures and functions designed to make working with structured data fast, easy, and expressive. It is, as you will see, one of the critical ingredients enabling Python to be a powerful and productive data analysis environment. The primary object in pandas that will be used is the DataFrame, a two-dimensional tabular, column-oriented data structure with both row and column labels.

pandas combines the high-performance array-computing features of NumPy with the flexible data manipulation capabilities of spreadsheets and relational databases (such as SQL). It provides sophisticated indexing functionality to make it easy to reshape, slice and dice, perform aggregations, and select subsets of data.

For financial users, pandas features rich, high-performance time series functionality and tools well-suited for working with financial data. In fact, I initially designed pandas as an ideal tool for financial data analysis applications.

For users of the R language for statistical computing, the DataFrame name will be familiar, as the object was named after the similar R data.frame object. They are not the same, however; the functionality provided by data.frame in R is essentially a strict subset of that provided by the pandas DataFrame. While this is a book about Python, I will occasionally draw comparisons with R, as it is one of the most widely used open source data analysis environments and will be familiar to many readers.

The pandas name itself is derived from panel data, an econometrics term for multidimensional structured data sets, and Python data analysis itself.

3. matplotlib

matplotlib is the most popular Python library for producing plots and other 2D data visualizations. It was originally created by John D. Hunter (JDH) and is now maintained by a large team of developers. It is well-suited for creating plots suitable for publication. It integrates well with IPython (see below), thus providing a comfortable interactive environment for plotting and exploring data. The plots are also interactive; you can zoom in on a section of the plot and pan around the plot using the toolbar in the plot window.

4. IPython

IPython is the component in the standard scientific Python toolset that ties everything together. It provides a robust and productive environment for interactive and exploratory computing. It is an enhanced Python shell designed to accelerate the writing, testing, and debugging of Python code. It is particularly useful for interactively working with data and visualizing data with matplotlib. IPython is usually involved with the majority of my Python work, including running, debugging, and testing code.

Aside from the standard terminal-based IPython shell, the project also provides:

• A Mathematica-like HTML notebook for connecting to IPython through a web browser (more on this later)
• A Qt framework-based GUI console with inline plotting, multiline editing, and syntax highlighting
• An infrastructure for interactive parallel and distributed computing

5. SciPy

SciPy is a collection of packages addressing a number of different standard problem domains in scientific computing. Here is a sampling of the packages included:

• scipy.integrate: numerical integration routines and differential equation solvers
• scipy.linalg: linear algebra routines and matrix decompositions extending beyond those provided in numpy.linalg
• scipy.optimize: function optimizers (minimizers) and root finding algorithms
• scipy.signal: signal processing tools
• scipy.sparse: sparse matrices and sparse linear system solvers
• scipy.special: wrapper around SPECFUN, a Fortran library implementing many common mathematical functions, such as the gamma function
• scipy.stats: standard continuous and discrete probability distributions (density functions, samplers, continuous distribution functions), various statistical tests, and more descriptive statistics
• scipy.weave: tool for using inline C++ code to accelerate array computations

Together NumPy and SciPy form a reasonably complete computational replacement for much of MATLAB along with some of its add-on toolboxes.

10 Excel basic formulas and functions to supercharge your reporting skills.
Saurabh Jain posted a topic in Excel Fundamentals
Learning how to use pivot tables in Excel has made all the difference for me. It enabled me to take a series of data and analyze it in a variety of different ways rather than waiting for a programmer to create a static report. As I was establishing KPIs for my company, this helped me make sense of the data. It saved incredible amounts of time, and I started to enjoy playing with data.

Here I'm sharing the ten basic formulas and functions that one can't live without when it comes to building reports to track the KPIs that are essential to managing a program. This isn't meant to be a tutorial but more of a "hey, did you know that you can do that in Excel or Google Sheets?" Here are ten that you should add to your repertoire if you haven't already.

1. Filter
Filtering allows you to quickly sort any column and include or exclude rows containing certain data. For example, if you have a list of all of your phone calls for the month, you can use filtering to display only the calls handled by certain agents.

2. Conditional Formatting
Conditional Formatting is a great way to add visuals to your spreadsheets based on the data. For example, if you want to rank a set of data, you can have all rows within a certain parameter be one color and anything outside of that parameter another color. It's great for seeing where you're hitting your goals and where you might need to improve.

3. Format Painter
The Format Painter is a simple tool that allows you to copy just the formatting from one set of cells and quickly apply it to a new set of cells.

4. Date
The DATE function is incredibly useful in Excel. It's common to have a bunch of rows with dates, but it can be difficult to group them by month. You can quickly create a column for month next to your date column by using this formula: =DATE(YEAR(A2),MONTH(A2),1)
A2 represents the cell that has the date in it. If the date in that cell looks like 2016-04-03, the month cell will look like 2016-04-01. Regardless of the date, the month cell will display the first day of that month, and now you can quickly filter or group by month in your pivot table.
Bonus: Once you do this in your first cell, double click the bottom right-hand corner of that cell and it will copy the formula all the way to the bottom of your spreadsheet. You don't have to write that formula 25,000 times. Win.

5. Dollar sign
The dollar sign is an important little symbol to remember in your formulas. As the previous point showed, Excel adjusts a copied formula relative to the row it's in. But perhaps you want to multiply the cells in a particular column by a value in one single cell. The dollar sign turns off that relativity and makes sure that the cell reference doesn't change in your formula. So your formula might look something like: =A2*$D$16
If you copy that formula to the next cell in the column, the formula will look like: =A3*$D$16

6. VLOOKUP & HLOOKUP
Learn this formula and you will really feel like an Excel wizard. It allows you to grab data from another sheet using a common identifier. For example, you may have one sheet that has all of your data for your support tickets for the month, including the name of the agent who handled each ticket. You'd really like to group ticket results by the team the agents work on. In another sheet, you can create a simple two-column table that has the name of the agent and the team they are on. Once you have that sheet created, go back to your main sheet and create a "team" column. In this case you will use VLOOKUP because the data is laid out vertically. HLOOKUP is for data laid out horizontally. The VLOOKUP will look up the name of the agent, match it with the name on the other sheet, and fill in the corresponding team name. A formula might look like this: =VLOOKUP(A2,'OtherSheet'!$A$1:$B$25,2,FALSE)
Note that the aforementioned dollar signs are often really useful in this formula.

7. Data Validation
Data Validation allows you to create cool things like drop-down lists in a cell. Consider a time when you had to enter "yes" or "no" in a certain column, or possibly choose from a wider range of options. A drop-down list allows you to quickly select from the options you want in a couple of clicks rather than several keystrokes, and it ensures consistent formatting.

8. Sum, Average, Count
These simple functions are great to remember so you don't have to hunt around for the function button. Add an "=" before each and you can count, add, or average any range of cells that you select. These are essential building blocks to know for the next tip.

9. IF and IFS
Add IF or IFS after SUM, AVERAGE, and COUNT to add some criteria to those formulas. Adding the "S" allows you to attach multiple criteria to the formula. Let's say I have a table full of quality scores and I want to count how many perfect 100% scores my team earned. My formula would look something like this: =COUNTIF(A1:A500,"100")

10. IF Statements
IF statements allow you to apply logic to a certain cell. For example, if you have a list of all of your phone calls and a column that displays how many seconds each caller waited before their call was answered, you can determine which calls were answered within 60 seconds. Your formula would look something like this: =IF(A2<=60,"Yes","No")
So if the caller waited 60 seconds or less, the value printed in the cell will be "Yes." If it was more than 60 seconds, the value will be "No."

That's ten rather amazing tools to supercharge the way you use Microsoft Excel or Google Sheets for your reporting, and we barely scratched the surface of the things you can do. A rule of thumb I've learned with these programs is that if there's something you're doing manually that's time consuming, there's probably a formula or function that will help you do it faster.

What are your go-to Excel tips and tricks? Leave us a comment and let us know.
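If it helps to see the logic behind tips 6, 9, and 10 outside a spreadsheet, here is a rough Python sketch. Everything in it is hypothetical: the agent names, teams, wait times, and scores are made up, and vlookup is just a small helper written for this example, not a library function. It treats the two-column lookup sheet as a dictionary, which behaves like an exact-match VLOOKUP.

```python
# Hypothetical lookup sheet: agent name -> team (the two-column table in tip 6).
team_by_agent = {"Asha": "Billing", "Ben": "Technical", "Chen": "Billing"}

# Hypothetical main sheet rows: (agent, seconds waited, quality score).
tickets = [
    ("Asha", 45, 100),
    ("Ben", 75, 90),
    ("Chen", 60, 100),
    ("Dana", 30, 80),   # Dana is missing from the lookup sheet on purpose
]


def vlookup(key, table, default="#N/A"):
    """Exact-match lookup, like VLOOKUP(..., FALSE); missing keys give #N/A."""
    return table.get(key, default)


# Tip 6 (team column) and tip 10 (=IF(A2<=60,"Yes","No")) applied to each row.
report = [
    (agent, vlookup(agent, team_by_agent), "Yes" if wait <= 60 else "No")
    for agent, wait, _score in tickets
]

# Tip 9: the counterpart of =COUNTIF(range, "100").
perfect_scores = sum(1 for _agent, _wait, score in tickets if score == 100)

for row in report:
    print(row)
print("Perfect scores:", perfect_scores)
```

The "#N/A" default mirrors what an exact-match VLOOKUP shows when the agent is not in the lookup table.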


Charts for Business Intelligence, Reports and Dashboards preparation
Saurabh Jain posted a gallery album in Member Albums




