# Saurabh Jain


4. ## Predictions about Data Science, Machine Learning, AI & Analytics for 2018

What are your predictions about Data Science, Machine Learning, AI & Analytics for 2018?
5. ## Excel VLookUp made easy

Hi friends, this is my first video tutorial for VLookUp. I hope you all like it. Download the exercise file: Vlookup example.xlsx
6. ## 45 Analytic Techniques Used by Data Scientists

These techniques cover most of what data scientists and related practitioners are using in their daily activities, whether they use solutions offered by a vendor or design proprietary tools.

The 45 data science techniques:

1. Linear Regression
2. Logistic Regression
3. Jackknife Regression *
4. Density Estimation
5. Confidence Interval
6. Test of Hypotheses
7. Pattern Recognition
8. Clustering (aka Unsupervised Learning)
9. Supervised Learning
10. Time Series
11. Decision Trees
12. Random Numbers
13. Monte-Carlo Simulation
14. Bayesian Statistics
15. Naive Bayes
16. Principal Component Analysis (PCA)
17. Ensembles
18. Neural Networks
19. Support Vector Machine (SVM)
20. Nearest Neighbors (k-NN)
21. Feature Selection (aka Variable Reduction)
22. Indexation / Cataloguing *
23. (Geo-) Spatial Modeling
24. Recommendation Engine *
25. Search Engine *
26. Attribution Modeling *
27. Collaborative Filtering *
28. Rule System
29. Linkage Analysis
30. Association Rules
31. Scoring Engine
32. Segmentation
33. Predictive Modeling
34. Graphs
35. Deep Learning
36. Game Theory
37. Imputation
38. Survival Analysis
39. Arbitrage
40. Lift Modeling
41. Yield Optimization
42. Cross-Validation
43. Model Fitting
44. Relevancy Algorithm *
45. Experimental Design

8. ## Standardization vs. normalization?

In the overall knowledge discovery process, before data mining itself, data preprocessing plays a crucial role. One of the first steps concerns the normalization of the data. This step is very important when dealing with parameters of different units and scales. For example, some data mining techniques use the Euclidean distance, so all parameters should have the same scale for a fair comparison between them.

Two methods are well known for rescaling data. The first is normalization, which scales all numeric variables to the range [0, 1]. One possible formula is:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

On the other hand, you can use standardization on your data set. It transforms the data to have zero mean and unit variance, for example using the equation:

$$z = \frac{x - \mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ the standard deviation of the variable.

Both of these techniques have their drawbacks. If you have outliers in your data set, normalizing your data will certainly scale the “normal” data to a very small interval, and most data sets have outliers. When using standardization, your new data aren’t bounded (unlike normalization).

So my question is: what do you usually use when mining your data, and why?
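The two rescaling methods described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `min_max_normalize` and `standardize` and the sample values are made up for this example, and the population standard deviation is used for the unit-variance step. The sample contains one deliberate outlier to show the drawback mentioned in the post.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


# Hypothetical sample: four "normal" points and one outlier.
data = [10.0, 12.0, 11.0, 13.0, 1000.0]

normalized = min_max_normalize(data)
standardized = standardize(data)

print("normalized:  ", [round(v, 4) for v in normalized])
print("standardized:", [round(v, 4) for v in standardized])
```

With this sample, the four non-outlier points end up squeezed into a tiny slice of [0, 1] after normalization, while the standardized values have no fixed bounds.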

There are tons of shortcuts for Excel out there; here’s a list of 200 for Excel 2013 alone. But trying to learn such a large number can be overwhelming, confusing, and ultimately inefficient. Instead, we’ve put together a list of 15 shortcuts that will be immediately useful for most users. This list of keyboard shortcuts includes quick ways to format cells, navigate the program, and carry out a few operations. The list is based on Excel 2016, but most will also work on Excel 2013. When that’s not the case, we’ve noted it.

- Keyboard access to the ribbon: Similar to the Vim-inspired add-ons for Chrome and Firefox, Excel 2013 and 2016 have a feature called Key Tips. When you press Alt, Key Tips appear and the Ribbon menu is overlaid with letters. Pressing a letter launches the corresponding menu item.
- Ctrl + PgDn: Switch between worksheet tabs, moving left to right.
- Ctrl + PgUp: Switch between worksheet tabs, moving right to left.
- F12: Display the “Save As” dialog.
- Ctrl + Shift + \$: (Excel 2016) Current cell formatted as currency, with two decimal places and negative numbers in parentheses.
- Ctrl + Shift + %: (Excel 2016) Current cell formatted as percentage with no decimal places.
- Ctrl + Shift + #: (Excel 2016) Current cell formatted as date with day, month, year.
- Ctrl + Shift + “:”: Insert the current time.
- Ctrl + “;”: Insert the current date.
- F4: Repeats the last command or action, if possible.
- Shift + Arrow key: Extends your current cell selection by one additional cell in the direction specified.
- Ctrl + F1: Display or hide the Ribbon.
- Alt + Shift + F1: Insert a new worksheet tab.
- Ctrl + F4: Close the current workbook.
- Ctrl + D: Launches the Fill Down command for the selected cells below. Fill Down copies the contents and format of the topmost cell in the column.

11. ## Why Python for Data Analysis?

For data analysis and interactive, exploratory computing and data visualization, Python will inevitably draw comparisons with the many other domain-specific open source and commercial programming languages and tools in wide use, such as R, MATLAB, SAS, Stata, and others. In recent years, Python’s improved library support (primarily pandas) has made it a strong alternative for data manipulation tasks. Combined with Python’s strength in general purpose programming, it is an excellent choice as a single language for building data-centric applications.

### Solving the “Two-Language” Problem

In many organizations, it is common to research, prototype, and test new ideas using a more domain-specific computing language like MATLAB or R, then later port those ideas to be part of a larger production system written in, say, Java, C#, or C++. What people are increasingly finding is that Python is a suitable language not only for doing research and prototyping but also for building the production systems. I believe that more and more companies will go down this path, as there are often significant organizational benefits to having both scientists and technologists using the same set of programmatic tools.

### Essential Python Libraries

For those who are less familiar with the scientific Python ecosystem and the libraries used in data analysis, I present the following overview of some libraries.

1. NumPy

NumPy, short for Numerical Python, is the foundational package for scientific computing in Python.
It provides, among other things:

- A fast and efficient multidimensional array object, ndarray
- Functions for performing element-wise computations with arrays or mathematical operations between arrays
- Tools for reading and writing array-based data sets to disk
- Linear algebra operations, Fourier transform, and random number generation
- Tools for connecting C, C++, and Fortran code to Python

Beyond the fast array-processing capabilities that NumPy adds to Python, one of its primary purposes with regard to data analysis is as the primary container for data to be passed between algorithms. For numerical data, NumPy arrays are a much more efficient way of storing and manipulating data than the other built-in Python data structures. Also, libraries written in a lower-level language, such as C or Fortran, can operate on the data stored in a NumPy array without copying any data.

2. pandas

pandas provides rich data structures and functions designed to make working with structured data fast, easy, and expressive. It is, as you will see, one of the critical ingredients enabling Python to be a powerful and productive data analysis environment. The primary object in pandas that will be used is the DataFrame, a two-dimensional tabular, column-oriented data structure with both row and column labels.

pandas combines the high-performance array-computing features of NumPy with the flexible data manipulation capabilities of spreadsheets and relational databases (such as SQL). It provides sophisticated indexing functionality to make it easy to reshape, slice and dice, perform aggregations, and select subsets of data. For financial users, pandas features rich, high-performance time series functionality and tools well-suited for working with financial data. In fact, I initially designed pandas as an ideal tool for financial data analysis applications.
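As a small illustration of the NumPy array and the pandas DataFrame described above, the sketch below builds an ndarray, runs an element-wise computation on it, and wraps it in a DataFrame with row and column labels. The price values and the column names `AAA` and `BBB` are invented for this example; it assumes NumPy and pandas are installed.

```python
import numpy as np
import pandas as pd

# A 3x2 ndarray of made-up daily prices for two hypothetical tickers.
prices = np.array([[10.0, 20.0],
                   [11.0, 19.0],
                   [12.0, 21.0]])

# Element-wise arithmetic between array slices: day-over-day returns.
returns = prices[1:] / prices[:-1] - 1.0

# The same data as a DataFrame with a date index (row labels)
# and named columns (column labels).
frame = pd.DataFrame(
    prices,
    index=pd.date_range("2018-01-01", periods=3, freq="D"),
    columns=["AAA", "BBB"],
)

# Aggregation: the mean of each column.
column_means = frame.mean()

# Selecting a subset: rows where AAA is above 10, keeping only column BBB.
subset = frame.loc[frame["AAA"] > 10.0, ["BBB"]]

print(frame)
print(column_means)
print(subset)
```

The DataFrame keeps the NumPy values underneath while adding the labels that make slicing and aggregation read more like spreadsheet or SQL operations.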
For users of the R language for statistical computing, the DataFrame name will be familiar, as the object was named after the similar R data.frame object. They are not the same, however; the functionality provided by data.frame in R is essentially a strict subset of that provided by the pandas DataFrame. While this is a book about Python, I will occasionally draw comparisons with R, as it is one of the most widely used open source data analysis environments and will be familiar to many readers. The pandas name itself is derived from panel data, an econometrics term for multidimensional structured data sets, and Python data analysis itself.

3. matplotlib

matplotlib is the most popular Python library for producing plots and other 2D data visualizations. It was originally created by John D. Hunter (JDH) and is now maintained by a large team of developers. It is well-suited for creating plots suitable for publication. It integrates well with IPython (see below), thus providing a comfortable interactive environment for plotting and exploring data. The plots are also interactive; you can zoom in on a section of the plot and pan around the plot using the toolbar in the plot window.

4. IPython

IPython is the component in the standard scientific Python toolset that ties everything together. It provides a robust and productive environment for interactive and exploratory computing. It is an enhanced Python shell designed to accelerate the writing, testing, and debugging of Python code. It is particularly useful for interactively working with data and visualizing data with matplotlib. IPython is usually involved with the majority of my Python work, including running, debugging, and testing code. Aside from the standard terminal-based IPython shell, the project also provides:

- A Mathematica-like HTML notebook for connecting to IPython through a web browser (more on this later)
- A Qt framework-based GUI console with inline plotting, multiline editing, and syntax highlighting
- An infrastructure for interactive parallel and distributed computing

5. SciPy

SciPy is a collection of packages addressing a number of different standard problem domains in scientific computing. Here is a sampling of the packages included:

- scipy.integrate: numerical integration routines and differential equation solvers
- scipy.linalg: linear algebra routines and matrix decompositions extending beyond those provided in numpy.linalg
- scipy.optimize: function optimizers (minimizers) and root finding algorithms
- scipy.signal: signal processing tools
- scipy.sparse: sparse matrices and sparse linear system solvers
- scipy.special: wrapper around SPECFUN, a Fortran library implementing many common mathematical functions, such as the gamma function
- scipy.stats: standard continuous and discrete probability distributions (density functions, samplers, cumulative distribution functions), various statistical tests, and more descriptive statistics
- scipy.weave: tool for using inline C++ code to accelerate array computations (since removed from SciPy and distributed as the separate weave package)

Together NumPy and SciPy form a reasonably complete computational replacement for much of MATLAB along with some of its add-on toolboxes.
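To give a feel for a few of the SciPy packages listed above, here is a minimal sketch that calls one routine each from scipy.integrate, scipy.optimize, scipy.special, and scipy.stats. The toy functions being integrated and solved are chosen only for illustration, and the sketch assumes SciPy is installed.

```python
import math

from scipy import integrate, optimize, special, stats

# scipy.integrate: numerically integrate x^2 over [0, 1] (exact value is 1/3).
area, abs_error = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# scipy.optimize: find the root of x^2 - 2 on [0, 2], i.e. sqrt(2).
root = optimize.brentq(lambda x: x ** 2 - 2.0, 0.0, 2.0)

# scipy.special: the gamma function, where gamma(n) = (n - 1)! for integers.
gamma_5 = special.gamma(5.0)

# scipy.stats: cumulative distribution function of the standard normal at 0.
p_below_zero = stats.norm.cdf(0.0)

print(area, root, gamma_5, p_below_zero)
```

Each call returns plain Python or NumPy numbers, so the results can be passed directly into further NumPy or pandas computations.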
12. ## Charts for Business Intelligence, Reports and Dashboards preparation

Which chart do you often use in your presentations or analysis? Share your experience. For help, please visit the forums.